Recent comments in /f/MachineLearning
Tea_Pearce t1_ja73tpy wrote
Reply to comment by CellWithoutCulture in [D] Is RL dead/worth researching these days? by [deleted]
FYI, Gato used imitation learning, which is closer to supervised learning than to RL.
KBM_KBM t1_ja71zrh wrote
ChatGPT works using a combination of RL and an LLM.
tdgros t1_ja71ave wrote
Reply to comment by CellWithoutCulture in [D] Is RL dead/worth researching these days? by [deleted]
>toolformer
Are you sure there's RL in Toolformer? I thought it was mostly self-supervised and fine-tuned.
KBM_KBM t1_ja70ytj wrote
Hopefully it is easier to use than pytorch geometric
cthorrez t1_ja70abd wrote
Reply to comment by PassingTumbleweed in [D] Is RL dead/worth researching these days? by [deleted]
I find it a little weird that RLHF is considered to be reinforcement learning.
The human feedback is collected offline and forms a static dataset. They use the objective from PPO, but it's really more of a form of supervised learning: there isn't an agent interacting with an env. The "env" is just sampling text from a static dataset, and the reward is the score from a neural net trained on a static dataset.
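The argument above can be sketched in a few lines: the "environment step" is just sampling a completion and scoring it with a frozen reward model, so the whole pipeline runs without any interactive environment. A toy illustration; `toy_reward_model` and `ppo_clip_objective` are hypothetical names standing in for the real components, not any library's API:

```python
import math

def toy_reward_model(text: str) -> float:
    """Stands in for a neural net trained on a static preference dataset."""
    return 1.0 if "helpful" in text else -1.0

def ppo_clip_objective(logp_new: float, logp_old: float,
                       advantage: float, eps: float = 0.2) -> float:
    """PPO's clipped surrogate objective, the part RLHF borrows."""
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(min(ratio, 1 + eps), 1 - eps)
    return min(ratio * advantage, clipped_ratio * advantage)

# "Rollout": sample a completion, score it with the frozen reward model.
# Note there is no env.step() anywhere -- just text plus a static scorer.
completion = "a helpful answer"
advantage = toy_reward_model(completion)

# Policy update direction comes from the clipped objective on the
# new vs. old log-probabilities of the sampled tokens (hand-picked here).
obj = ppo_clip_objective(logp_new=-1.0, logp_old=-1.2, advantage=advantage)
```

Whether that loop counts as "real" RL or offline supervised learning is exactly the distinction being debated above.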
Centigonal t1_ja708qj wrote
Reply to comment by hackinthebochs in [R] Large language models generate functional protein sequences across diverse families by MysteryInc152
I think it's fair to call RNA a language.
_learn_faster_ OP t1_ja6zovh wrote
Reply to comment by machineko in [D] Faster Flan-T5 inference by _learn_faster_
We have GPUs (e.g. A100) but can only use one GPU per request (no multi-GPU). We are also willing to take a bit of an accuracy hit.
Let me know what you think would be best for us.
When you say compression, do you mean things like pruning and distillation?
pyonsu2 t1_ja6xz70 wrote
Hottest ever. RLHF, robotics.
tripple13 t1_ja6xhe0 wrote
If all you do is follow trends and what's in the "spotlight", you probably don't care about your research, only about the accolades.
jj_HeRo t1_ja6wl0q wrote
Yann LeCun said on Twitter that it is dead... go figure.
PassingTumbleweed t1_ja6w9ai wrote
It's weird to read this when RLHF has been one of the key components of ChatGPT and friends.
hpstring t1_ja6uzk4 wrote
Reply to comment by [deleted] in [D] Is RL dead/worth researching these days? by [deleted]
Understood. That depends on one's prediction of the future research landscape, but I would say it is still actively researched by institutions like DeepMind. Both RL and LLMs share one trait, though: they are very, very expensive.
hackinthebochs t1_ja6tpln wrote
Reply to [R] Large language models generate functional protein sequences across diverse families by MysteryInc152
At what point do we stop calling it a language model?
IndieAIResearcher t1_ja6t4ba wrote
RL + NLP and RL + vision should have some future, I guess. RL would be an integral part.
walk-the-rock t1_ja6sp5s wrote
Reply to comment by st8ic in [R] Large language models generate functional protein sequences across diverse families by MysteryInc152
Richard Socher (chief scientist at salesforce) is one of the world leaders in natural language processing within ML/AI
impossiblefork t1_ja6rt6s wrote
Reply to comment by okokoko in [D] To the ML researchers and practitioners here, do you worry about AI safety/alignment of the type Eliezer Yudkowsky describes? by SchmidhuberDidIt
I doubt it's possible, but I imagine something like the DAN thing with ChatGPT.
Most likely you'd talk to the AI in a way that makes the rationality it has obtained from its training data reason out things that its owner would rather it stay silent about.
darthstargazer t1_ja6qf4v wrote
Reply to comment by Alert_Ad2 in [D] Navigating Academic Conferences by MyActualUserName99
Jeez why so toxic
darthstargazer t1_ja6qchw wrote
Haha, I'm sure you will get some good advice! I would say: 1. Enjoy the trip. 2. If you are aiming for postdocs, it's time to do some networking. 3. Enjoy the free food! 4. If you don't understand much about what the other papers are about, don't stress 😊
hpstring t1_ja6pm05 wrote
Do you specifically mean applications in NLP? RL seems to have a lot of applications in fields like game playing, robotics, and neural theorem proving, which have no direct connection with LLMs.
CellWithoutCulture t1_ja6pjet wrote
Seems more like an AskML question.
But RL is for situations where you can't backprop the loss. It's noisier than supervised learning, so if you can use supervised learning, that's generally what you should use.
RL is still used, for example in the recent Gato and DreamerV3, or in training an LLM to use tools as in Toolformer. And there's also OpenAI's famous RLHF, which stands for reinforcement learning from human feedback. This is what they use to make ChatGPT "aligned", although in reality it doesn't quite get there.
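The "can't backprop the loss" point can be made concrete with REINFORCE on a toy two-armed bandit: the reward is a black box we can't differentiate, so the gradient is estimated via the log-probability trick instead. A minimal sketch; all names here are illustrative, not from any RL library:

```python
import math
import random

random.seed(0)

theta = 0.0  # logit for choosing arm 1 over arm 0

def policy_prob(theta: float) -> float:
    """Probability of picking arm 1 under a Bernoulli policy."""
    return 1.0 / (1.0 + math.exp(-theta))

def black_box_reward(arm: int) -> float:
    """Non-differentiable reward: we can only sample it, not backprop it."""
    return 1.0 if arm == 1 else 0.0

lr = 0.5
for _ in range(200):
    p = policy_prob(theta)
    arm = 1 if random.random() < p else 0
    r = black_box_reward(arm)
    # REINFORCE: reward times grad of log pi(arm); for a Bernoulli
    # policy that gradient is simply (arm - p).
    theta += lr * r * (arm - p)
```

After a couple hundred samples the policy concentrates on the rewarding arm, even though the reward function was never differentiated — that's the niche where RL earns its noise.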
Smallpaul t1_ja6pbdt wrote
Reply to comment by [deleted] in [D] To the ML researchers and practitioners here, do you worry about AI safety/alignment of the type Eliezer Yudkowsky describes? by SchmidhuberDidIt
Replying to yourself doesn't get anyone's attention.
rbain13 t1_ja742s1 wrote
Reply to [D] Is RL dead/worth researching these days? by [deleted]
Computers are largely failed attempts at doing what our brains do. Our brains use RL (i.e., dopamine + serotonin) and neural networks. RL is probably worth studying for that reason alone :shrug: