Recent comments in /f/MachineLearning
ElleLeonne t1_jb16hic wrote
Reply to comment by Chadssuck222 in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
Maybe it hurts generalization? I.e., causes overfitting?
There could even be a second paper in the works to address this question.
Chadssuck222 t1_jb15xxk wrote
Noob question: why title this research as ‘reducing underfitting’ and not as ‘improving fitting of the data’?
ggdupont t1_jb152am wrote
Reply to comment by ok531441 in To RL or Not to RL? [D] by vidul7498
That's the cherry on top (see https://twitter.com/hlntnr/status/1632030583462285312 ), not the core of the app.
(edit in reaction to downvotes: in all transparency, I love the RL paradigm and really think decision-making approaches are key to AI; that being said, my experience with industrial applications of RL has always been disappointing in that other approaches did better ;-) )
ggdupont t1_jb14rw1 wrote
Reply to comment by ilyakuzovkin in To RL or Not to RL? [D] by vidul7498
>Over the course of the last years we have seen successful applications of RL
Like real production-level applications?
Apart from super nice demos and research papers, I've really not seen much RL in real-life production.
PassionatePossum t1_jb0xvdo wrote
Thanks. I'm a sucker for this kind of research: Take a simple technique and evaluate it thoroughly, varying one parameter at a time.
It often is not as glamorous as some of the applied stuff, but IMHO these papers are a lot more valuable. With all the applied research papers, all you know in the end is that someone got better results; nobody knows where the improvements actually came from.
ThirdMover t1_jb0x91p wrote
Reply to comment by Art10001 in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
I think this is really exciting. LLM applications like ChatGPT still seem to mostly pipe the sampled model output straight out. But with 100-times-faster inference, maybe complex chain-of-thought procedures with multiple differently prompted model instances (well, the same model with different contexts) can be chained and work together to improve their output while still running close to real time.
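A toy sketch of the kind of loop I mean (everything here is hypothetical: `generate` stands in for a call to a fast local model, and the prompts are purely illustrative):

```python
def generate(prompt: str) -> str:
    """Hypothetical wrapper around a fast local model (e.g. RWKV)."""
    raise NotImplementedError  # stand-in for real inference

def refine(question: str, rounds: int = 2) -> str:
    # First "instance": draft an answer.
    answer = generate(f"Question: {question}\nAnswer step by step:")
    for _ in range(rounds):
        # Second "instance" (same model, different context): critique the draft.
        critique = generate(f"List the mistakes in this answer:\n{answer}")
        # Third "instance": revise the draft using the critique.
        answer = generate(
            f"Question: {question}\nDraft: {answer}\n"
            f"Critique: {critique}\nImproved answer:"
        )
    return answer
```

With fast enough inference, several such round trips could still fit inside an interactive-latency budget.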
tysam_and_co t1_jb0x0e7 wrote
Interesting. This seems related to https://arxiv.org/abs/1711.08856.
eclipsejki t1_jb0tmeq wrote
This is dangerous on so many levels. An external API call has access to your entire computer. I'd wait for a smaller, personal LLM.
royalemate357 t1_jb0smq3 wrote
Reply to comment by Art10001 in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
It's awesome work, but I don't think anyone is claiming anywhere near 100x faster speed and lower VRAM, are they?
>RWKV-3 1.5B on A40 (tf32) = always 0.015 sec/token, tested using simple pytorch code (no CUDA), GPU utilization 45%, VRAM 7823M
>
>GPT2-XL 1.3B on A40 (tf32) = 0.032 sec/token (for ctxlen 1000), tested using HF, GPU utilization 45% too (interesting), VRAM 9655M
From this it sounds like about a ~2x improvement (don't get me wrong, a 2x improvement at the same performance is great). As for memory: you still have to store all the parameters of the RWKV model just like with GPT, and that is what takes up most of the memory if you're trying to fit models on consumer hardware. Memory use is lower simply because there's no need for a KV cache.
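For intuition, a back-of-the-envelope sketch of the KV-cache cost (the GPT2-XL dimensions here, 48 layers and hidden size 1600, are my assumption, and tf32 still stores tensors in 32-bit):

```python
# Rough size of the K/V cache for a GPT-style transformer.
# Assumed GPT2-XL-ish dims: 48 layers, hidden size 1600, 4-byte floats.
n_layers, d_model, ctx_len, bytes_per_val = 48, 1600, 1000, 4

# Each layer caches a K and a V tensor of shape (ctx_len, d_model).
kv_cache_bytes = 2 * n_layers * ctx_len * d_model * bytes_per_val
print(f"~{kv_cache_bytes / 2**20:.0f} MiB at ctxlen {ctx_len}")  # ~586 MiB
```

That cache also grows linearly with context length, whereas an RNN like RWKV carries a fixed-size state, which lines up with the VRAM gap in the numbers above.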
_Arsenie_Boca_ t1_jb0sm2c wrote
Reply to [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
I have been following your reddit posts for a while now, but I still don't think I fully understand it. Did you consider writing a paper? It might help people get the method and might fuel the open-source help you get.
WarAndGeese t1_jb0rsum wrote
Reply to comment by currentscurrents in [N] EleutherAI has formed a non-profit by StellaAthena
My mistake, it is a funny and good joke; I just overreacted. I see too many non-ironic statements like that, and it clouded my judgment.
Spare_Side_5907 t1_jb0rhw1 wrote
Reply to [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
Is this similar to Toeplitz Neural Network for Sequence Modeling (https://openreview.net/forum?id=IxmWsm4xrua)?
earslap t1_jb0qamw wrote
Reply to comment by qqYn7PIE57zkf6kn in [D] OpenAI introduces ChatGPT and Whisper APIs (ChatGPT API is 1/10th the cost of GPT-3 API) by minimaxir
When you feed messages into the API, there are different "roles" to tag each message ("assistant", "user", "system"), so you tell the model which role each piece of content comes from. The model then continues the conversation in the "assistant" role. There is a token limit (set by the model), so if your conversation exceeds it (the combined token count across all roles), you'll need to inject the salient context from the conversation using the appropriate role.
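A minimal sketch with the `openai` Python package (the `ChatCompletion` endpoint as of the API launch; the key, model name, and prompts are placeholders):

```python
import openai

openai.api_key = "sk-..."  # placeholder key

# Each message is tagged with a role; the model replies as "assistant".
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a terse assistant."},
        {"role": "user", "content": "Summarize what a KV cache is."},
    ],
)
print(response["choices"][0]["message"]["content"])
```

To continue the conversation, you append the assistant's reply and the user's next message to `messages` and call it again, trimming or summarizing older messages once you approach the token limit.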
Art10001 t1_jb0q49f wrote
Reply to [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
If you are RWKV's creator, kudos to you, the work you have done is amazing.
Reminder for everybody: it can run rather quickly on CPU, meaning it can truly run locally on phones. It is also 100 times faster and uses 100 times less (V)RAM.
Supaguccimayne t1_jb0lees wrote
Reply to comment by Ye1488 in [P] LazyShell - GPT based autocomplete for zsh by rumovoice
I'm right in the middle of being a millennial and played Super Mario RPG.
tripple13 t1_jb0ksx6 wrote
Reply to To RL or Not to RL? [D] by vidul7498
I find it quite ridiculous to discount RL. Optimal control problems have existed since the beginning of time, and in situations where you cannot formulate a set of differential equations, optimizing opaque objective functions with value or policy optimization could be a way forward.
It reminds me of the people who discount GANs due to their lack of a likelihood. Sure, but can they be useful regardless? Yes, actually, they can.
[deleted] t1_jb0fikm wrote
Reply to [D] Ethics of minecraft stable diffusion by NoLifeGamer2
Zero chance. Do it
kekinor t1_jb09s2y wrote
Reply to comment by rumovoice in [P] LazyShell - GPT based autocomplete for zsh by rumovoice
Might be that I misunderstood then. Thanks for pointing out the differences and shell_gpt.
Quazar_omega t1_jb034l2 wrote
Reply to comment by Sirisian in [P] LazyShell - GPT based autocomplete for zsh by rumovoice
Oh, wow, now THAT is self-centered!
ok531441 t1_jb0229i wrote
Reply to To RL or Not to RL? [D] by vidul7498
Why would RL be doomed? Didn’t sticking RL on top of a big GPT model just give us ChatGPT?
Toast119 t1_jb16zt9 wrote
Reply to comment by Chadssuck222 in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
I think it's because dropout is usually seen as a method for reducing overfitting, while this paper claims, with supporting evidence, that it is also useful for reducing underfitting.
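For the curious, a rough sketch of the "early dropout" idea as I understand it (dropout kept on only for the first part of training; the step threshold and rate below are made-up values, not the paper's):

```python
import torch.nn as nn

def set_dropout(model: nn.Module, p: float) -> None:
    # Overwrite the drop probability of every Dropout module in place.
    for m in model.modules():
        if isinstance(m, nn.Dropout):
            m.p = p

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Dropout(0.1), nn.Linear(64, 10))
EARLY_STEPS, DROP_P = 1000, 0.1  # illustrative values
for step in range(5000):
    # Dropout on early (to fight underfitting), off afterwards.
    set_dropout(model, DROP_P if step < EARLY_STEPS else 0.0)
    # ... forward / backward / optimizer step on a real batch here
```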