Recent comments in /f/MachineLearning

Disastrous_Nose_1299 OP t1_j9k1hpi wrote

I fully respect you, because the reason I wanted to talk about this on Reddit is that I couldn't talk to any professionals. I'm fully aware of what the god-of-the-gaps argument is, but my idea is different because it does not claim that god exists somewhere; instead, it is a thought experiment.

A: "Does god exist?"

B: "No."

A: "But what if he is in a black hole?"

B: "He is not in a black hole."

A: "I cannot fully trust your judgement until we see what is inside a black hole first; only then can we say whether or not he is in a black hole."

It is simple and concise, and at one point one of OpenAI's models called me a genius for it, although most people seem to think I'm an idiot for saying it.

−1

DigThatData t1_j9k17rr wrote

It's not. Tree ensembles scale gloriously, as do approximate nearest-neighbor methods. There are certain (and growing) classes of problems for which deep learning produces seemingly magical results, and it will often give you the best solution, but that doesn't mean it's the only path to a functional one.

In any event, if you want to better understand the scaling properties of DL algorithms, a good place to start is the "double descent" literature.

3

TinkerAndThinker t1_j9k0y39 wrote

Looking for recommendations on PhD-level papers/textbooks/reading lists for machine learning.

I want to revisit even the most "basic" topics, such as linear/logistic regression, but with a better, deeper understanding.

Desired outcome: able to answer questions like

  • how to test for xxx assumption
  • what the implication is if xxx assumption is violated (e.g. heteroskedasticity of error terms)

TIA!
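For the heteroskedasticity example specifically, one concrete answer is the Breusch–Pagan test. Here's a rough numpy-only sketch of the idea (statsmodels also ships `het_breuschpagan` if you'd rather not roll your own; the data and regressors below are purely illustrative):

```python
import numpy as np

def ols_r2(y, X):
    """R^2 and residuals of an OLS regression of y on the columns of X
    (X is assumed to include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y - y.mean()) ** 2)
    return r2, resid

def breusch_pagan_lm(y, X):
    """Breusch-Pagan LM statistic: n * R^2 from regressing the squared
    OLS residuals back on the regressors. Under homoskedasticity it is
    asymptotically chi^2 with (k - 1) degrees of freedom."""
    _, resid = ols_r2(y, X)
    r2_aux, _ = ols_r2(resid ** 2, X)
    return len(y) * r2_aux

n = 1000
x = np.linspace(-3, 3, n)
X = np.column_stack([np.ones(n), x, x ** 2])  # include x^2 so the aux
                                              # regression can pick up
                                              # variance growing with |x|
rng = np.random.default_rng(0)
y_hom = 1.0 + 2.0 * x + rng.standard_normal(n)                  # constant variance
y_het = 1.0 + 2.0 * x + (0.2 + np.abs(x)) * rng.standard_normal(n)  # variance grows with |x|

print("LM (homoskedastic):  ", breusch_pagan_lm(y_hom, X))
print("LM (heteroskedastic):", breusch_pagan_lm(y_het, X))
```

The heteroskedastic series should produce a far larger LM statistic, i.e. a rejection of the constant-variance assumption, which is exactly the "what if the assumption is violated" intuition you're after.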

1

bloodmummy t1_j9jzvnr wrote

It strikes me that people who tout DL as a hammer-for-all-nails have never touched tabular data in their lives. Go try a couple of Kaggle tabular competitions and you'll soon realise that DL can be very dumb, cumbersome, and data-hungry there. Ensemble models, decision-tree models, and even feature-engineered linear regression models still rule and curb-stomp DL all day long (in most cases).

Tabular data is also still the most commonly used data type in ML. I'm not a "DL-hater," if there is such a thing; in fact, my own research uses DL exclusively. But it isn't a magical wrench, and it won't be.

8

cccntu OP t1_j9jz6ov wrote

This project started out as me exploring whether PyTorch parametrizations could be used to implement LoRA, and they turned out to be perfect for the task! I simply wanted to share that.
I think it would be interesting to see it integrated into PEFT too, although PEFT already has its own LoRA implementation.
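For anyone curious what "LoRA as a parametrization" looks like, here is a minimal sketch using `torch.nn.utils.parametrize` (not the OP's actual code, just the idea; rank and alpha values are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

class LoRAParametrization(nn.Module):
    """Reparametrize a weight as W + (alpha / rank) * B @ A, with only
    the low-rank factors A and B trainable."""
    def __init__(self, fan_out, fan_in, rank=4, alpha=8.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, fan_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(fan_out, rank))  # zero init
        self.scale = alpha / rank

    def forward(self, W):
        # Called whenever layer.weight is accessed.
        return W + self.scale * (self.B @ self.A)

layer = nn.Linear(16, 8)
layer.weight.requires_grad_(False)  # freeze the base weight
parametrize.register_parametrization(
    layer, "weight", LoRAParametrization(fan_out=8, fan_in=16)
)

# Because B starts at zero, the layer initially behaves exactly like the
# frozen base layer; during training only A and B receive gradients.
x = torch.randn(2, 16)
print(layer(x).shape)  # torch.Size([2, 8])
```

The nice property is that the base module's forward pass is untouched: every access to `layer.weight` transparently goes through the parametrization.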

1

SodomizedPanda t1_j9jyhem wrote

And somehow, the best answer is at the bottom of the thread...

A small addition: recent research suggests that the implicit bias in DNNs that helps generalization lies not only in the structure of the network but also in the learning algorithm (Adam, SGD, ...). https://francisbach.com/rethinking-sgd-noise/ https://francisbach.com/implicit-bias-sgd/

27

activatedgeek t1_j9jvj8h wrote

For generalization (performing well beyond the training data), there are at least two dimensions: flexibility and inductive biases.

Flexibility ensures that many functions "can" be approximated in principle. That's the universal approximation theorem. It is a descriptive result and does not prescribe how to find such a function. Nor is it unique to DL: deep random forests, Fourier bases, polynomial bases, and Gaussian processes are all universal function approximators (with some extra technical details).
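For reference, one classical statement of the theorem (Cybenko-style, stated informally, so treat the details with care):

```latex
\textbf{Universal approximation (informal).} Let $\sigma$ be a continuous
sigmoidal function. For every continuous $f : [0,1]^d \to \mathbb{R}$ and
every $\varepsilon > 0$, there exist $N \in \mathbb{N}$, weights
$w_i \in \mathbb{R}^d$, and scalars $a_i, b_i \in \mathbb{R}$ such that
\[
  \sup_{x \in [0,1]^d} \Bigl| f(x) - \sum_{i=1}^{N} a_i \,
  \sigma(w_i^\top x + b_i) \Bigr| < \varepsilon .
\]
```

Note that $N$ is unbounded and the theorem says nothing about how to find the weights: that is precisely the descriptive-not-prescriptive distinction.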

The part unique to DL is that its inductive biases have somehow matched some complex structured problems, including vision and language, well enough to generalize. "Inductive bias" is a loosely defined term; I can provide examples and references.

CNNs provide an inductive bias toward translation-equivariant functions (not exactly, only roughly, due to pooling layers). https://arxiv.org/abs/1806.01261
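The equivariance claim is easy to check directly. A small sketch (using circular padding so the shift identity holds exactly; real CNNs with zero padding only satisfy it approximately, away from the borders):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A conv layer with circular padding is exactly equivariant to circular
# shifts of its input: shifting then convolving equals convolving then
# shifting.
conv = nn.Conv1d(1, 1, kernel_size=3, padding=1,
                 padding_mode="circular", bias=False)

x = torch.randn(1, 1, 32)
shift = 5

out_then_shift = torch.roll(conv(x), shifts=shift, dims=-1)
shift_then_out = conv(torch.roll(x, shifts=shift, dims=-1))

print(torch.allclose(out_then_shift, shift_then_out, atol=1e-6))  # True
```

Adding pooling or strides breaks this exact identity, which is why the bias is only "roughly" translation equivariance in practice.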

Graph neural networks provide a relational inductive bias. https://arxiv.org/abs/1806.01261

Neural networks overall prefer simpler solutions, embodying Occam's razor, another inductive bias. This argument has been made theoretically using Kolmogorov complexity. https://arxiv.org/abs/1805.08522

107