Recent comments in /f/MachineLearning

ggdupont t1_jb152am wrote

Reply to comment by ok531441 in To RL or Not to RL? [D] by vidul7498

That's the cherry on top (see https://twitter.com/hlntnr/status/1632030583462285312 ), not the core of the app.

(edit in reaction to downvotes: in all transparency, I love the RL paradigm and really think decision-making approaches are a key to AI; that said, my experience with industrial applications of RL has always been disappointing in that other approaches did better ;-) )

−3

ggdupont t1_jb14rw1 wrote

Reply to comment by ilyakuzovkin in To RL or Not to RL? [D] by vidul7498

>Over the course of the last years we have seen successful applications of RL

Like real, production-level applications?
Apart from very nice demos and research papers, I really haven't seen much RL in real-life production.

1

PassionatePossum t1_jb0xvdo wrote

Thanks. I'm a sucker for this kind of research: Take a simple technique and evaluate it thoroughly, varying one parameter at a time.

It often is not as glamorous as some of the applied stuff, but IMHO these papers are a lot more valuable. With all the applied research papers, all you know in the end is that someone got better results, but nobody knows where those improvements actually came from.
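(Just to make "varying one parameter at a time" concrete, here is a minimal sketch of a one-factor-at-a-time ablation; the baseline config, parameter grid, and the `train_and_evaluate` stub are purely illustrative, not from the paper.)

```python
# Hypothetical one-factor-at-a-time ablation: hold a baseline config fixed
# and sweep a single parameter per run, so any change in the metric can be
# attributed to that one parameter.
baseline = {"lr": 3e-4, "batch_size": 64, "weight_decay": 0.01}
sweeps = {
    "lr": [1e-4, 3e-4, 1e-3],
    "batch_size": [32, 64, 128],
    "weight_decay": [0.0, 0.01, 0.1],
}

def train_and_evaluate(config: dict) -> float:
    """Placeholder for the real training/eval loop; returns a validation metric."""
    return 0.0  # replace with an actual run

results = []
for param, values in sweeps.items():
    for value in values:
        config = dict(baseline, **{param: value})  # change exactly one thing
        results.append((param, value, train_and_evaluate(config)))
```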

411

ThirdMover t1_jb0x91p wrote

I think this is really exciting. LLM applications like ChatGPT still seem to mostly pipe the result of model sampling straight out, but with 100-times-faster inference, maybe complex chain-of-thought procedures with multiple differently prompted model instances (well, the same model, but different contexts) could be chained together to improve their output while still running close to real time.
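(Purely a sketch of what such chaining might look like: the `generate` function stands in for one fast inference call, and the draft/critique/revise structure and prompts are my own illustrative assumptions, not anything from the RWKV work.)

```python
# Illustrative pipeline: one model, several differently prompted "instances"
# (i.e. separate contexts), each refining the previous output.
def generate(prompt: str) -> str:
    """Stand-in for a single fast LLM sampling call."""
    raise NotImplementedError  # hook up the actual model here

def answer(question: str) -> str:
    draft = generate(f"Answer step by step:\n{question}")
    critique = generate(f"List any mistakes in this answer:\n{draft}")
    final = generate(
        f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
        "Rewrite the draft, fixing the issues above."
    )
    return final
```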

3

royalemate357 t1_jb0smq3 wrote

It's awesome work, but I don't think anyone is claiming anywhere near 100x faster speed and lower VRAM, are they?

>RWKV-3 1.5B on A40 (tf32) = always 0.015 sec/token, tested using simple pytorch code (no CUDA), GPU utilization 45%, VRAM 7823M
>
>GPT2-XL 1.3B on A40 (tf32) = 0.032 sec/token (for ctxlen 1000), tested using HF, GPU utilization 45% too (interesting), VRAM 9655M

From this it sounds like roughly a ~2x improvement (don't get me wrong, 2x at the same quality is great). As for VRAM, you still have to store all the parameters of the RWKV model just like with GPT, and those take up most of the memory if you're trying to fit models on consumer hardware. Memory usage is just lower because there is no need for a KV cache.
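(Rough arithmetic from the quoted numbers, just to make the comparison concrete; the 4-bytes-per-weight figure assumes fp32/tf32 storage, which is my assumption.)

```python
# Back-of-the-envelope from the figures quoted above.
rwkv_sec_per_token = 0.015
gpt2xl_sec_per_token = 0.032
print(f"speedup: {gpt2xl_sec_per_token / rwkv_sec_per_token:.1f}x")  # ~2.1x, not 100x

# Weights dominate VRAM either way: ~1.5e9 params * 4 bytes (fp32/tf32 assumption)
params = 1.5e9
bytes_per_param = 4
print(f"weights alone: {params * bytes_per_param / 2**30:.1f} GiB")  # ~5.6 GiB

# The quoted gap (9655 MiB vs 7823 MiB) is plausibly the GPT-2 KV cache
# for ctxlen 1000 plus framework overhead, on top of the weights.
print(f"VRAM gap: {9655 - 7823} MiB")
```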

7

earslap t1_jb0qamw wrote

When you feed messages into the API, there are different "roles" used to tag each message ("assistant", "user", "system"), so you provide content and tell it which "role" that content comes from. The model then continues from there in the "assistant" role. There is a token limit (set by the model), so if your context exceeds it (the combined token count across all roles), you'll need to inject salient context from the conversation back in using the appropriate role.
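(A minimal sketch of what that looks like with the OpenAI chat completions endpoint, assuming the pre-1.0 `openai` Python client; the model name is illustrative, and in practice you'd count tokens with the model's tokenizer rather than guess.)

```python
import openai  # assumes the openai client with the ChatCompletion API

openai.api_key = "..."  # your API key

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Summarize our conversation so far."},
]

# If the combined messages would exceed the model's token limit, drop or
# summarize older turns first (keeping the system message), then resend.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",  # illustrative model name
    messages=messages,
)
print(response["choices"][0]["message"]["content"])  # the "assistant" reply
```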

2

tripple13 t1_jb0ksx6 wrote

I find it quite ridiculous to discount RL. Optimal control problems have existed since the beginning of time, and for the situations in which you cannot formulate a set of differential equations, optimizing obtuse functions with value or policy optimization could be a way forward.

It reminds me of the people who discount GANs due to their lack of a likelihood. Sure, but can they be useful regardless? Yes, actually, they can.

14