Recent comments in /f/MachineLearning

ggdupont t1_jb152am wrote

Reply to comment by ok531441 in To RL or Not to RL? [D] by vidul7498

That's the cherry on top (see https://twitter.com/hlntnr/status/1632030583462285312 ), not the core of the app.

(edit in reaction to downvotes: in all transparency, I love the RL paradigm and really think decision-making approaches are a key to AI; that said, my experience with industrial applications of RL has always been disappointing in that other approaches did better ;-) )

−3

ggdupont t1_jb14rw1 wrote

Reply to comment by ilyakuzovkin in To RL or Not to RL? [D] by vidul7498

>Over the course of the last years we have seen successful applications of RL

Like real, production-level applications?
Apart from very nice demos and research papers, I really haven't seen much RL in real-life production.

1

PassionatePossum t1_jb0xvdo wrote

Thanks. I'm a sucker for this kind of research: Take a simple technique and evaluate it thoroughly, varying one parameter at a time.

It often is not as glamorous as some of the applied stuff, but IMHO these papers are a lot more valuable. With all the applied research papers, all you know in the end is that someone got better results, but nobody knows where those improvements actually came from.
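(Just to make "varying one parameter at a time" concrete, here is a minimal sketch of a one-factor-at-a-time ablation; the baseline config, parameter grid, and the `train_and_evaluate` stub are purely illustrative, not from the paper.)

```python
# Hypothetical one-factor-at-a-time ablation: hold a baseline config fixed
# and sweep a single parameter per run, so any change in the metric can be
# attributed to that one parameter.
baseline = {"lr": 3e-4, "batch_size": 64, "weight_decay": 0.01}
sweeps = {
    "lr": [1e-4, 3e-4, 1e-3],
    "batch_size": [32, 64, 128],
    "weight_decay": [0.0, 0.01, 0.1],
}

def train_and_evaluate(config: dict) -> float:
    """Placeholder for the real training/eval loop; returns a validation metric."""
    return 0.0  # replace with an actual run

results = []
for param, values in sweeps.items():
    for value in values:
        config = dict(baseline, **{param: value})  # change exactly one thing
        results.append((param, value, train_and_evaluate(config)))
```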

411

ThirdMover t1_jb0x91p wrote

I think this is really exciting. LLM applications like ChatGPT still seem to mostly pipe the result of model sampling straight out, but with 100-times-faster inference, maybe complex chain-of-thought procedures with multiple differently prompted model instances (well, the same model, but different contexts) could be chained together to improve their output while still running close to real time.
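(Purely a sketch of what such chaining might look like: the `generate` function stands in for one fast inference call, and the draft/critique/revise structure and prompts are my own illustrative assumptions, not anything from the RWKV work.)

```python
# Illustrative pipeline: one model, several differently prompted "instances"
# (i.e. separate contexts), each refining the previous output.
def generate(prompt: str) -> str:
    """Stand-in for a single fast LLM sampling call."""
    raise NotImplementedError  # hook up the actual model here

def answer(question: str) -> str:
    draft = generate(f"Answer step by step:\n{question}")
    critique = generate(f"List any mistakes in this answer:\n{draft}")
    final = generate(
        f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
        "Rewrite the draft, fixing the issues above."
    )
    return final
```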

3

royalemate357 t1_jb0smq3 wrote

It's awesome work, but I don't think anyone is claiming anywhere near 100x faster speed and lower VRAM, are they?

>RWKV-3 1.5B on A40 (tf32) = always 0.015 sec/token, tested using simple pytorch code (no CUDA), GPU utilization 45%, VRAM 7823M
>
>GPT2-XL 1.3B on A40 (tf32) = 0.032 sec/token (for ctxlen 1000), tested using HF, GPU utilization 45% too (interesting), VRAM 9655M

From this it sounds like roughly a ~2x improvement (don't get me wrong, 2x at the same quality is great). As for VRAM, you still have to store all the parameters of the RWKV model just like with GPT, and those take up most of the memory if you're trying to fit models on consumer hardware. Memory usage is just lower because there is no need for a KV cache.
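(Rough arithmetic from the quoted numbers, just to make the comparison concrete; the 4-bytes-per-weight figure assumes fp32/tf32 storage, which is my assumption.)

```python
# Back-of-the-envelope from the figures quoted above.
rwkv_sec_per_token = 0.015
gpt2xl_sec_per_token = 0.032
print(f"speedup: {gpt2xl_sec_per_token / rwkv_sec_per_token:.1f}x")  # ~2.1x, not 100x

# Weights dominate VRAM either way: ~1.5e9 params * 4 bytes (fp32/tf32 assumption)
params = 1.5e9
bytes_per_param = 4
print(f"weights alone: {params * bytes_per_param / 2**30:.1f} GiB")  # ~5.6 GiB

# The quoted gap (9655 MiB vs 7823 MiB) is plausibly the GPT-2 KV cache
# for ctxlen 1000 plus framework overhead, on top of the weights.
print(f"VRAM gap: {9655 - 7823} MiB")
```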

7

earslap t1_jb0qamw wrote

When you feed messages into the API, there are different "roles" used to tag each message ("assistant", "user", "system"), so you provide content and tell it which "role" that content comes from. The model then continues from there in the "assistant" role. There is a token limit (set by the model), so if your context exceeds it (the combined token count across all roles), you'll need to inject salient context from the conversation back in using the appropriate role.
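(A minimal sketch of what that looks like with the OpenAI chat completions endpoint, assuming the pre-1.0 `openai` Python client; the model name is illustrative, and in practice you'd count tokens with the model's tokenizer rather than guess.)

```python
import openai  # assumes the openai client with the ChatCompletion API

openai.api_key = "..."  # your API key

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize our conversation so far."},
]

# If the combined messages would exceed the model's token limit, drop or
# summarize older turns first (keeping the system message), then resend.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=messages,
)
print(response["choices"][0]["message"]["content"])  # the "assistant" reply
```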

2

tripple13 t1_jb0ksx6 wrote

I find it quite ridiculous to discount RL. Optimal control problems have existed since the beginning of time, and for the situations in which you cannot formulate a set of differential equations, optimizing obtuse functions with value or policy optimization could be a way forward.

It reminds me of the people who discount GANs due to their lack of a likelihood. Sure, but can they be useful regardless? Yes, actually, they can.

14