Recent comments in /f/MachineLearning

JimmyTheCrossEyedDog t1_j6nv3zg wrote

This feels like a mix-up between the colloquial and mathematical definitions of dimension. Yes, NN approaches tend to work better on very high-dimensional data, but the dimension here refers to the number of input features. So, for a 416x416x3 image, that's >500k dimensions, far higher than the number of dimensions in almost all tabular datasets.
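
(Back-of-the-envelope in numpy, with a made-up 40-column table for comparison:)

```python
import numpy as np

# One 416x416 RGB image: every pixel channel is an input feature.
image = np.zeros((416, 416, 3))
print(image.size)  # 519168 -- over half a million input dimensions

# A typical tabular dataset has orders of magnitude fewer.
row = np.zeros(40)  # e.g. a hypothetical 40-column table
print(row.size)     # 40
```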

> image data 4D (extra dimension for batch)

The batch is an arbitrary parceling of the data, done simply because of how NNs are typically trained for computational reasons. If I were to train a NN on tabular data, it'd also be batched, but that doesn't give the data a new meaningful dimension (either in the colloquial sense or in the sense that matters for ML).
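
As a toy illustration (made-up numpy, not anyone's actual pipeline):

```python
import numpy as np

# Hypothetical tabular dataset: 1000 rows, 40 features each.
X = np.random.rand(1000, 40)

# "Batching" just parcels the rows into chunks for training.
batches = [X[i:i + 32] for i in range(0, len(X), 32)]

print(batches[0].shape)  # (32, 40): a batch axis appears...
print(X.shape[1])        # ...but each sample still has exactly 40 features
```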

Also, NNs are still the best option for computer vision even on greyscale data, which is spatially 2D but still has a huge number of dimensions.

edit: I'd also argue that high dimensionality isn't the biggest reason NNs work for computer vision, but something more fundamental - see qalis's point in this thread

17

EduCGM OP t1_j6nrow0 wrote

Anyway, the purpose of this work was just to discuss a popular topic in a way that is readable by a broad audience, and it was done by an undergraduate student. There is no peer-reviewed article yet; hopefully there will be soon, and I will be delighted to share it here as well.

1

akrasia_here_I_come t1_j6nmd3k wrote

But this is exactly the kind of content that gets neglected in academia because of the assumption that everyone reading already knows the field very well. Lit reviews are wonderful when you can find them, but there's not a lot of incentive to publish them.

If there's a peer-reviewed article out there that covers this info, then by all means please share it! (And in that case, it may be justified to critique someone for sharing the non-peer-reviewed equivalent.) But if there's not, it seems pointlessly exclusionary to gatekeep the sharing of illuminating content just because it's from outside academia proper.

−1

Brudaks t1_j6nj9z1 wrote

For most established tasks, people have a good idea (based on empirical evidence) about the limits of particular methods for that task.

There are tasks where "traditional machine learning methods" work well, and people working on those tasks use them and will continue to use them.

And there are tasks where they don't, and deep learning gets far better results than we could get otherwise - and for those types of tasks, yes, it would be accurate to say that we have given up on traditional machine learning; if you're given an image classification or text analysis task, you'd generally use DL even for a simple baseline, without even trying any of the "traditional" methods we used in earlier years.

12

arhetorical t1_j6nhean wrote

Hiya, great work again! Maybe I'm outing myself a little here, but the code doesn't work on Windows machines, apparently because the processes are spawned instead of forked. I'm not sure it's an easy fix and maybe not worth the time (it works fine on WSL), but just thought I'd mention it in case you weren't aware!
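
(In case it's useful, the usual workaround I've seen for spawn-based platforms is an entry-point guard around whatever launches the workers - a generic sketch below, not tested against your repo:)

```python
import multiprocessing as mp

def worker(x):
    # stand-in for the actual per-process work
    return x * x

def main():
    # On Windows, children are spawned: the module gets re-imported in each
    # child, so process creation must sit behind this guard.
    with mp.Pool(4) as pool:
        print(pool.map(worker, range(8)))

if __name__ == "__main__":
    main()
```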

On the ML side, should this scale up pretty straightforwardly to CIFAR100 or are there things to be aware of?

2

thevillagersid t1_j6nfjle wrote

Reply to comment by antodima in [D] Sparse Ridge Regression by antodima

You can still compute the estimator with sparse inputs because the regularization term ensures the denominator is full rank. If the zeros are standing in for missing values, however, your estimates will be biased.
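
To make that concrete, a tiny numpy sketch (hypothetical data where sparsity leaves an all-zero column, so the unregularized Gram matrix is singular):

```python
import numpy as np

# Hypothetical sparse design matrix: the third column is all zeros,
# so X'X is singular and plain least squares can't invert it.
X = np.array([[1., 0., 0.],
              [0., 2., 0.],
              [3., 0., 0.],
              [0., 1., 0.]])
y = np.array([1., 0., 2., 1.])
lam = 0.1

gram = X.T @ X
print(np.linalg.matrix_rank(gram))  # 2 of 3 -- singular

# Adding lam * I makes the "denominator" full rank again.
W = np.linalg.solve(gram + lam * np.eye(3), X.T @ y)
print(W)  # well-defined; the dead feature's weight is exactly 0
```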

As for your second question, W* computed from only columns 2 and 4 will only yield the same values as W in the unrestricted model if the columns of X are orthogonal. Could you work with an orthogonal transform (e.g. PCA projection) of the X matrix?
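
And a quick check of the orthogonality point with made-up data (fitting on columns 2 and 4 alone vs. taking those entries from the full fit):

```python
import numpy as np

def ridge(X, y, lam):
    # Closed-form ridge estimator: (X'X + lam*I)^{-1} X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # random columns: correlated, not orthogonal
y = rng.normal(size=100)

W_full = ridge(X, y, 0.5)
W_sub = ridge(X[:, [1, 3]], y, 0.5)  # refit on columns 2 and 4 only

print(W_full[[1, 3]])  # generally disagrees with...
print(W_sub)           # ...the restricted fit, unless the columns are orthogonal
```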

2

Internal-Diet-514 t1_j6nep37 wrote

Deep learning is only really the better option with higher-dimensional data. If tabular data is 2D, time series is 3D, and image data 4D (extra dimension for batch), then deep learning is really only used for 3D and 4D data. As others have said, tree-based models will most of the time outperform deep learning on a 2D problem.

But I think the interesting thing is the reason we have to use deep learning in the first place. In higher-dimensional data we don't have something that is "a feature" in the sense that we do with 2D data. In time series you have features, but they are taken over time, so really we need a feature which describes that feature over time. That's what CNNs do. CNNs are feature extractors, and at the end of the process they almost always put the data back into 2D format (when doing classification), which is then sent through a neural net - but it could be sent through a random forest as well.
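
As a rough sketch of that idea (a frozen torchvision ResNet-18 standing in for whatever CNN backbone, with random inputs in place of a real dataset):

```python
import torch
import torchvision.models as models
from sklearn.ensemble import RandomForestClassifier

# CNN backbone with the classifier head chopped off: a pure feature extractor.
# (Randomly initialized here to keep the sketch self-contained; in practice
# you'd load pretrained weights.)
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()
backbone.eval()

# Hypothetical batch of images and labels standing in for a real dataset.
images = torch.randn(64, 3, 224, 224)
labels = torch.randint(0, 10, (64,))

with torch.no_grad():
    feats = backbone(images)  # shape (64, 512): back to "2D" tabular form

# Those features can feed a random forest instead of a dense head -- but note
# the forest can't pass gradients back to optimize the extractor.
clf = RandomForestClassifier(n_estimators=100).fit(feats.numpy(), labels.numpy())
print(clf.score(feats.numpy(), labels.numpy()))
```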

I think it's fair to compare a neural network to traditional ML, but when we get to a CNN, that's not really an apples-to-apples comparison. A CNN is a feature extraction method. The great thing is that we can optimize this step by connecting it to a neural network with a sigmoid (or whatever activation) output.

We don't have a way to connect traditional ML methods to a feature extraction method the way back propagation connects a neural net to a CNN. If we found a way to do that, maybe we would see a rise in the use of traditional ML for high-dimensional data.

8

andreichiffa t1_j6n9lg6 wrote

A lot of the conclusions from that paper have been called into question by the discovery, a little less than a year later, that GPT-2 was actually memorizing a lot of information from the training dataset: https://arxiv.org/abs/2012.07805

About a year after that, Anthropic came out with a paper suggesting that there were scaling laws which meant undertrained larger models did not do that much better and actually did need more data: https://arxiv.org/pdf/2202.07785.pdf

Finally, more recent results from DeepMind did an additional pass on the topic and seem to suggest that the relationship between data and model size is much tighter than anticipated, and that a 4x smaller model trained for 4x as long (on 4x the data) would outperform the larger model: https://arxiv.org/pdf/2203.15556.pdf
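
To put rough numbers on that (using the common C ≈ 6·N·D approximation for training FLOPs; the figures below are illustrative, not the paper's):

```python
# Rough rule of thumb: training compute C ~ 6 * N (params) * D (tokens).
def train_flops(n_params, n_tokens):
    return 6 * n_params * n_tokens

big = train_flops(280e9, 300e9)        # large model, relatively few tokens
small = train_flops(70e9, 4 * 300e9)   # 4x smaller, trained on 4x the tokens

print(big == small)  # True: same compute budget, yet the smaller model wins
```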

Basically, the original OpenAI paper did contradict a lot of prior research on overfitting and generalization, and the discrepancy seems to be due to an instance of Simpson's paradox in some of the batching they were doing.

1

SaifKhayoon t1_j6n9kb2 wrote

Nah, researchers haven't given up on traditional machine learning methods! They combine them with deep learning in lots of places, like image classification, speech recognition, and recommender systems.

Plus, traditional methods can be better for some tasks, like when you have a small dataset or want an explainable model or real-time predictions.

16