Recent comments in /f/MachineLearning

farmingvillein t1_j8p269l wrote

> I hope more catch on because the lack of a limited context length is a game changer.

I'd be cautious about concluding this, without more testing.

RNNs, in some theoretical sense, support infinite context more easily than N^2 transformers; in practice, their effective "context window" often doesn't look much different from a reasonable transformer's when you look at performance metrics on long sequences.
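To illustrate the point (a toy scalar recurrence, not RWKV's actual architecture; all names and weights here are made up): the whole history gets squeezed into a fixed-size state, which is what makes the context "infinite" in theory, but the influence of old inputs can decay to nothing, which is why the effective window can be short in practice.

```python
import math

def rnn_step(state, x, w_state=0.5, w_in=1.0):
    # One recurrence step: the new state depends only on the old state
    # and the current input, so memory is O(1) in sequence length.
    return math.tanh(w_state * state + w_in * x)

def run(seq):
    state = 0.0
    for x in seq:
        state = rnn_step(state, x)
    return state  # a single number summarizes the entire sequence

# A token 60 steps back has essentially no influence left:
# run([1.0] + [0.0]*60) is numerically indistinguishable from run([0.0]*61).
```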

30

zdss t1_j8osnth wrote

I've just skimmed the paper, but this is a confusing result. I can see a simpler optimizer paying off at a similar amount of compute, since you can run more iterations, but they claim it's also better on a per-iteration basis across the entire learning task. There's not a lot going on in this algorithm, so where is the magic coming from?

It's kind of hard to believe that, while people were experimenting with all these more complex optimizers, no one tried something this simple and noticed it got state-of-the-art results.
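For a sense of scale: a sign-of-momentum update, one family of "simple" optimizers, fits in a few lines. This is a hypothetical sketch with illustrative hyperparameters, not necessarily the paper's exact algorithm:

```python
def sign(x):
    # Returns -1, 0, or 1.
    return (x > 0) - (x < 0)

def simple_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99, wd=0.0):
    # Sign-momentum update: the step direction is just the sign of an
    # interpolation between the momentum buffer and the current gradient,
    # so every parameter moves by exactly +/- lr (plus optional decay).
    for i in range(len(params)):
        update = sign(beta1 * momentum[i] + (1 - beta1) * grads[i])
        params[i] -= lr * (update + wd * params[i])
        momentum[i] = beta2 * momentum[i] + (1 - beta2) * grads[i]
    return params, momentum
```

Compared to Adam there is no second-moment estimate and no bias correction, which is roughly the kind of "not a lot going on" being described.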

10

currentscurrents t1_j8op44d wrote

Does it though? There was a reproducibility survey recently that found that many optimizers claiming better performance did not in fact work for anything other than the tasks tested in their papers.

Essentially they were doing hyperparameter tuning - just the hyperparameter was the optimizer design itself.

64

mz_gt t1_j8ofbiq wrote

This is really awesome! I’ve been seeing the progress of your work on RWKV and I have to ask: I know you’ve mentioned a lot of RWKV is using tricks from here and there, and adding a lot of your own tweaks of course, but have you considered writing a paper? There are plenty of highly renowned published works with less to say than RWKV.

I think a renewed discussion about RNNs is more than warranted right now given the current direction with transformers, and personally I don't see something as highly complicated as HiPPO replacing them anytime soon.

60

terath t1_j8oemyz wrote

Another key phrase to use with Google Scholar is "online learning": you have a stream of new examples and you update a model one example at a time. Usually you can use the model for inference at any point in this process, and some algorithms in this area are designed to be a bit more aggressive, or at least to control the update rates so the model adapts more quickly or more slowly to new data.
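A minimal sketch of the one-example-at-a-time updating described above (a toy 2D perceptron; everything here is illustrative, and real online learners are more sophisticated about update rates):

```python
def online_perceptron(stream, lr=1.0):
    # Weights and bias are updated one (x, y) example at a time; the
    # model is usable for inference at any point in the stream.
    w = [0.0, 0.0]
    b = 0.0
    for x, y in stream:  # y is -1 or +1
        pred = 1 if (w[0] * x[0] + w[1] * x[1] + b) > 0 else -1
        if pred != y:  # mistake-driven update
            w[0] += lr * y * x[0]
            w[1] += lr * y * x[1]
            b += lr * y
    return w, b
```

The mistake-driven `if` is where the "aggressiveness" knob lives: more cautious online algorithms scale each update instead of applying it at a fixed rate.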

21