Recent comments in /f/MachineLearning

Delacroid t1_jb4c3xt wrote

I don't think so. If you look at the figure and check the angle between whole-dataset backprop and minibatch backprop, increasing the learning rate wouldn't change that angle, only the scale of the vectors.
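A quick numerical sketch of that scaling argument (with random stand-in vectors, since the real gradients are whatever the model produces): cosine similarity is invariant to rescaling either vector, so the learning rate can't move the angle.

```python
import numpy as np

rng = np.random.default_rng(0)
g_full = rng.normal(size=100)  # stand-in for the whole-dataset gradient
g_mini = rng.normal(size=100)  # stand-in for one minibatch gradient

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Scaling the minibatch step by any learning rate leaves the angle unchanged.
for lr in (0.01, 0.1, 1.0):
    print(f"lr={lr}: cos={cosine(lr * g_mini, g_full):.6f}")
```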

Also, dropout does not (only) introduce noise; it prevents co-adaptation of neurons. In the same way that each tree in a random forest is trained on a subset of the data (bootstrapping, I think it's called), the same happens to neurons when you use dropout.
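To make the analogy concrete, here's a minimal PyTorch sketch (my own toy example, not from the paper): in training mode each forward pass through a dropout layer zeroes out a different random subset of units, so each minibatch effectively trains a different sub-network, much like each tree seeing a different bootstrap sample.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
drop.train()  # dropout is only active in training mode

x = torch.ones(8)  # eight "neuron activations"
for step in range(3):
    # Each call samples a fresh mask: a different sub-network per minibatch,
    # with survivors scaled by 1/(1-p) to keep the expected activation constant.
    print(drop(x))
```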

I haven't read the paper, but my intuition says that the merit of dropout in the early stages of training could be that this bootstrapping reduces the bias of the model. That's why the direction of optimization is closer to that of whole-dataset training.

3

PassionatePossum t1_jb4977c wrote

Agreed. Sometimes theoretical analysis doesn't transfer to the real world, and sometimes it is also valuable to see a complete system, because the whole training process is important.

However, since my days in academia are over, I am much less interested in squeezing the next 0.5% of performance out of some benchmark dataset. In industry you are far more interested in a well-working solution that you can produce quickly than in the best-performing solution. So I am much more interested in a toolset of ideas that generally work well, and ideally in knowing what their limitations are.

And yes, while papers about applications can provide practical validation of these ideas, very few of them conduct proper ablation studies. In most cases that is also too much to ask: pretty much any application is a complex system with an elaborate pre-processing and training procedure, and you cannot practically evaluate the influence of every single step and parameter. You just twiddle the parameters you deem most important, and that is your ablation study.

5

Mr_Smartypants t1_jb3tlu5 wrote

> We begin our investigation into dropout training dynamics by making an intriguing observation on gradient norms, which then leads us to a key empirical finding: during the initial stages of training, dropout reduces gradient variance across mini-batches and allows the model to update in more consistent directions. These directions are also more aligned with the entire dataset’s gradient direction (Figure 1).

Interesting. Has anyone looked at optimally controlling the gradient variance by other means, e.g. minibatch size?
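For what it's worth, a toy experiment along those lines is easy to run (my own sketch, using a fixed linear-regression gradient): per-minibatch gradient variance should shrink roughly like 1/batch_size.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
w = np.zeros(d)  # evaluate every gradient at the same fixed point

def minibatch_grad(batch_size):
    idx = rng.choice(n, size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    return 2.0 * Xb.T @ (Xb @ w - yb) / batch_size  # MSE gradient

for bs in (16, 64, 256, 1024):
    grads = np.stack([minibatch_grad(bs) for _ in range(500)])
    print(f"batch={bs:4d}  mean grad variance={grads.var(axis=0).mean():.4f}")
```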

2

tysam_and_co t1_jb3i6eq wrote

Right, right, right, though I don't see how dropout introduces bias into the network. Sure, we're subsampling the network in general, but overall the information integrated with respect to a minibatch should be less on the whole due to gradient noise, right? So the bias should be less, and as a result we have more uncertainty; more steps then means more integration time, and on we go from there towards that elusive less-biased estimator.

I guess the sticking point is _how_ they're saying that dropout induces bias. I feel like fitting quickly in a non-regularized setting has more bias by default, because I believe the 0-centered noise should end up diluting the loss signal. I think. Right? I find this all very strange.

6

jobeta t1_jb3hh74 wrote

This is cool and I haven’t finished reading it yet, but intuitively, isn’t that roughly equivalent to having a higher learning rate at the beginning? You make the learning algorithm purposefully imprecise at first to explore the loss landscape quickly, and later on, once a rough approximation of a minimum has been found, you can explore more carefully to look for a deeper minimum, or something? Like, dropout introduces noise, doesn’t it?
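If I understand the idea, the crude version of that schedule would look something like this (my own hypothetical sketch, not the paper's recipe): keep dropout on for the first chunk of steps for noisy exploration, then switch it off.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(32, 64), nn.ReLU(),
                    nn.Dropout(p=0.1), nn.Linear(64, 10))
opt = torch.optim.SGD(net.parameters(), lr=0.1)
X, y = torch.randn(512, 32), torch.randint(0, 10, (512,))

for step in range(2000):
    net[2].p = 0.1 if step < 500 else 0.0  # noisy early phase, precise later
    opt.zero_grad()
    F.cross_entropy(net(X), y).backward()
    opt.step()
```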

6

amhotw t1_jb38ai5 wrote

Based on what you copied: they are saying that dropout introduces bias. Hence, it reduces the variance.

Here is why it might be bothering you: the bias-variance trade-off only makes sense if you are on the efficient frontier, i.e. the Cramér-Rao bound should hold with equality for the trade-off to make sense. You can always have a model with higher bias AND higher variance; introducing bias doesn't necessarily reduce the variance.
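A concrete toy case (my own numbers): estimating the mean of a Gaussian. The sample mean is unbiased and efficient; an estimator that adds an offset plus extra noise has higher bias AND higher variance, so buying bias clearly doesn't sell variance once you're off the frontier.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, n, trials = 3.0, 50, 20_000
samples = rng.normal(mu, 1.0, size=(trials, n))

mean_est = samples.mean(axis=1)  # unbiased, attains the Cramer-Rao bound
bad_est = mean_est + rng.normal(0.5, 1.0, size=trials)  # offset + extra noise

for name, est in (("sample mean", mean_est), ("mean + offset + noise", bad_est)):
    print(f"{name:22s} bias={est.mean() - mu:+.3f}  variance={est.var():.4f}")
```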

17

BrotherAmazing t1_jb37vx3 wrote

It’s sort of a “clickbait” title that I didn’t like myself, even if it’s a potentially interesting paper.

Usually we assume dropout helps prevent overfitting, not underfitting. What I don’t like about the title is that it makes it sound like dropout helps with underfitting in general. It does not, and they don’t even claim it does: by the time you finish reading their abstract you can tell they’re only saying dropout has been observed to help with underfitting in certain circumstances, when used in certain ways.

I can come up with low-dimensional counter-examples where dropout won’t help you when you’re underfitting, and where it is in fact the cause of the underfitting.
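For instance, something like this (a toy sketch of the kind of counter-example I mean, with hypothetical numbers): a small MLP fits y = x easily with no dropout, but with aggressive dropout the training signal gets noisy enough that the same network can end up underfitting even at evaluation time.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.linspace(-1, 1, 256).unsqueeze(1)
y = X.clone()  # trivially learnable target

def eval_mse(p, steps=2000):
    net = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                        nn.Dropout(p), nn.Linear(16, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        ((net(X) - y) ** 2).mean().backward()
        opt.step()
    net.eval()  # dropout disabled at eval time
    with torch.no_grad():
        return ((net(X) - y) ** 2).mean().item()

for p in (0.0, 0.5, 0.95):
    print(f"dropout p={p}: eval MSE={eval_mse(p):.4f}")
```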

6

ThaGooInYaBrain t1_jb342rd wrote

Reply to comment by ggdupont in To RL or Not to RL? [D] by vidul7498

> "In October 2022, DeepMind unveiled a new version of AlphaZero, called AlphaTensor, in a paper published in Nature. The version discovered a faster way to perform matrix multiplication – one of the most fundamental tasks in computing – using reinforcement learning."

Matrix multiplication is a pretty damn practical real-life application, no?
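For context on what “faster” means there, the classic example is Strassen’s scheme, which multiplies two 2x2 blocks with 7 scalar multiplications instead of the naive 8; AlphaTensor searched for decompositions of this kind with even fewer multiplies (47 vs. Strassen’s 49 for 4x4 matrices in modular arithmetic, if I remember the headline result right). A quick sketch of the Strassen version:

```python
import numpy as np

def strassen_2x2(A, B):
    # 7 multiplications (p1..p7) instead of the naive 8
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    p1 = (a + d) * (e + h)
    p2 = (c + d) * e
    p3 = a * (f - h)
    p4 = d * (g - e)
    p5 = (a + b) * h
    p6 = (c - a) * (e + f)
    p7 = (b - d) * (g + h)
    return np.array([[p1 + p4 - p5 + p7, p3 + p5],
                     [p2 + p4, p1 - p2 + p3 + p6]])

A, B = np.array([[1., 2.], [3., 4.]]), np.array([[5., 6.], [7., 8.]])
assert np.allclose(strassen_2x2(A, B), A @ B)
```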

3