Recent comments in /f/deeplearning

Dropkickmurph512 t1_j9sxa1j wrote

Agree about the overparameterized models, but learning the noise definitely doesn't help. The noise mostly comes from measurement error, quantization, and other sources that are not in the vector space of the signals you care about. That is why early stopping can be useful and actually acts as a regularizer. If you want a good example, look into the denoising properties of deep image prior. It can remove noise by training on a single image and stopping before it learns the image completely.
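To make the early-stopping point concrete, here is a minimal sketch. It is not deep image prior: it swaps in a much simpler setup, a degree-9 polynomial fitted by gradient descent to a noisy line, with training halted once a held-out loss stops improving. The signal `y = 2x`, the noise level, the learning rate, and the `patience` value are all assumptions picked for illustration.

```python
import random

random.seed(0)
DEGREE = 9  # deliberately more capacity than a straight line needs

def features(x):
    return [x ** k for k in range(DEGREE + 1)]

def predict(w, x):
    return sum(wk * fk for wk, fk in zip(w, features(x)))

def mse(w, data):
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def noisy_line(n):
    # Hypothetical signal y = 2x plus Gaussian "measurement" noise.
    points = []
    for _ in range(n):
        x = random.uniform(-1, 1)
        points.append((x, 2 * x + random.gauss(0, 0.5)))
    return points

train, val = noisy_line(12), noisy_line(50)

def gradient_step(w, data, lr=0.05):
    # One full-batch gradient descent step on the mean squared error.
    grad = [0.0] * len(w)
    for x, y in data:
        err = predict(w, x) - y
        for k, fk in enumerate(features(x)):
            grad[k] += 2 * err * fk / len(data)
    return [wk - lr * gk for wk, gk in zip(w, grad)]

def fit(max_epochs=1000, patience=50):
    w = [0.0] * (DEGREE + 1)
    best_w, best_loss, best_epoch = w, mse(w, val), 0
    history = []
    for epoch in range(1, max_epochs + 1):
        w = gradient_step(w, train)
        loss = mse(w, val)
        history.append(loss)
        if loss < best_loss:
            # Keep a snapshot of the best weights seen so far.
            best_w, best_loss, best_epoch = w, loss, epoch
        elif epoch - best_epoch >= patience:
            break  # held-out loss stopped improving, so stop training
    return best_w, best_loss, best_epoch, history

best_w, best_loss, best_epoch, history = fit()
print(f"best epoch {best_epoch}, best val MSE {best_loss:.3f}, "
      f"last val MSE {history[-1]:.3f}")
```

Whether the loop actually stops before `max_epochs` depends on the data and the seed. The part that matters is that the weights you keep are the ones from the best held-out epoch, not the last one.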

1

suflaj t1_j9qectx wrote

That is peanuts for the datasets they are trained on. We're talking datasets on the order of terabytes, and the model doesn't usually even iterate over more than 10% of that. So you can't even overfit a model unless you're dealing with duplicates, because you will never even go through the whole dataset.

Even if the model had 1 trillion parameters and iterated over the whole dataset, it would be too small for the number of relations contained within a dataset of 1 trillion+ bytes. AND THAT'S IF THEY WERE LINEAR, which they are (usually) NOT.

So there is a large overhead in needing multiple sets of parameters to define just one type of relation. Not to mention that some of these models are trained on data pairs, which means the SQUARE of that number of relations. We're talking about a physically impossible number of parameters here, which will require solutions radically different from simple matrix multiplication and nonlinear activations.

5

levand t1_j9qe7ev wrote

> These models are too small to truly overfit on their datasets.

I thought we were talking about 175 billion parameters, literally some of the biggest models in existence? Although it is true that at some point models get big enough that they become less prone to overfitting (and it's not clear why): https://openai.com/blog/deep-double-descent/

1

artsybashev t1_j9puq9o wrote

It is in a way the same phenomenon. If you think about information in images, an overfitting model would start to learn even the noise patterns in the images. If your training data does not have enough real information to fill the model capacity, the model will start to learn noise and overfit to your data.

3

suflaj t1_j9puhj0 wrote

Overfit on what? These models are too small to truly overfit on their datasets.

They overfit on noise, which seems to be one reason for their good performance, so it's something you want. And that is not a bad thing - learning what the noise is helps generalization. Once the model starts figuring out noise, it can go beyond what the data would usually allow it to in terms of generalization.

EDIT: Also, a larger model is more easily internally restructured. Overparametrized models are sort of like very big rooms. It's easier to rearrange the same furniture in a larger room than it is in a smaller room.

2

Appropriate_Ant_4629 t1_j9p4z0e wrote

A bigger array holds more information than a smaller one.

^(You'd need to refine your question. It's obvious that a bigger model could outperform a smaller one -- simply by noticing that it could be identical to the smaller one by just setting the rest of its weights to zero. For every single one of those weights, if there's any value better than zero, the larger model would be better.)
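A tiny sketch of the zero-weights argument, using a plain weighted sum as a stand-in "model". The weights and the input sample are made up for illustration.

```python
def linear(weights, features):
    """Weighted sum of features: about the simplest possible 'model'."""
    return sum(w * x for w, x in zip(weights, features))

small_weights = [0.5, -1.0, 2.0]          # hypothetical small model
big_weights = small_weights + [0.0, 0.0]  # larger model, extra weights set to zero

# The larger model sees two extra input features that the small one never uses.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]

small_out = linear(small_weights, sample[:3])
big_out = linear(big_weights, sample)
print(small_out, big_out)  # both are 4.5
```

With the extra weights at zero the two outputs match exactly, so the larger model can never do worse on the training objective than the smaller one. Any nonzero value that lowers the loss is pure upside.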

8

jakecoolguy t1_j9jd1hg wrote

I would seriously consider also getting a lot of RAM and a beast of a CPU with a lot of cores if you’re going with multiple GPUs. A lot of machine learning applications (especially less optimised ones) require stuff to be done on the CPU in parallel with the GPU, and you don’t want to get bottlenecked.

3