Recent comments in /f/MachineLearning

adt t1_jbbzba8 wrote

There are a few that 'feel' that way. Try Megatron-11B (~200:1 tokens-to-params) based on RoBERTa (6,198:1). Wayyyyy ahead of its time, and I've found it matches much larger models in some of my testing.

https://app.inferkit.com/demo

Here's the full table of Chinchilla-aligned comparisons:

https://lifearchitect.ai/models-table/

2

smallfried t1_jbbs0hg wrote

Nice, it works quite well with a fancy prompt (in bold):

"A user is talking to a very imaginative writer, famous for writing in flowery language.

User: What inspired you to become a writer?

Writer: Reading and storytelling have been my passions since I was a little child. I delight in weaving new worlds and characters with the magic of words. Writing is also a way for me to pour out my soul and share my thoughts and feelings with others.

User: Can you describe a world in which fish are ruling the planet?

Writer: I am fascinated with the ocean because I feel like it has a life of its own, and I enjoy reading novels where characters live underwater. Imagine a planet where we and all other land animals have been forced underground by the fish. Fish have taken over the top layer of the planet and now everything has to be built submerged. Houses, cars, grocery stores, and schools—everything has to be submerged to allow fish to walk on top. I imagine that humans would still be living underground, walking through tunnels and using cars, trains, and subways to travel to different areas of the planet. I imagine that everything would be lit by small lights that shine from the bottoms of houses, and that it would rain a lot because the fish would need the water for their bodies. I can also imagine fish having to adapt to the planet aboveground, and that they would want to take over the world above the ocean. They would fight a big war against humans and would want to live above ground, but humans would fight for their right to live in their own planet.

User: Great! Thanks for the story, can you give me a parting gift?

Writer: Sure, I will give you a pencil!

Writer is a student and lives in an imaginary house. I used to be obsessed with the world underwater. I imagined that houses and cars were underwater and that the outside world was more like a vacation destination for humans. I liked to imagine that, if I could go to school underwater, I would be a very good student. I would be one of the best students and would never have to repeat a grade."

2

Alchera_QQ t1_jbbfukn wrote

Can somebody elaborate on the discrepancy between PyTorch and TF?

I keep hearing that Torch is preferred for research and academic purposes, but TF seems to be very close in terms of accuracy and performance.

What makes Torch over an order of magnitude more popular with users here?

4

NickSinghTechCareers t1_jbb71l9 wrote

This won’t work. Ads are solid for something easily purchasable at a low dollar amount, like a new gadget or a fitness supplement. Selling a technical consulting service requires a ton of trust, which you build through networking, content marketing, speaking at conferences, etc.

2

KD_A t1_jbb5kx5 wrote

The section "Check if your model is overfitting" could be improved.

> The model is overfitting (high variance) when it has low error on the training set but high error on the test set.

A big gap between training and validation error does not by itself imply that the model is overfitting. In general, an absolute gap between training and validation errors does not tell you how validation error will change if the model is made more complex or simpler. To answer questions about overfitting and underfitting, one needs to train multiple models of varying complexity and compare their training and validation errors.
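The "train multiple models and compare" point can be sketched with a toy polynomial-regression complexity sweep (hypothetical data; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = np.sin(3 * x) + rng.normal(0, 0.2, 200)

# hold out a validation split
x_tr, y_tr = x[:150], y[:150]
x_va, y_va = x[150:], y[150:]

def errors(degree):
    # fit a polynomial of the given degree on the training split
    coef = np.polyfit(x_tr, y_tr, degree)
    tr = np.mean((np.polyval(coef, x_tr) - y_tr) ** 2)
    va = np.mean((np.polyval(coef, x_va) - y_va) ** 2)
    return tr, va

# sweep complexity: only by comparing several models can you see
# where validation error stops improving (underfit -> fit -> overfit);
# a single model's train/val gap doesn't tell you which side you're on
results = {d: errors(d) for d in (1, 3, 5, 9, 15)}
best = min(results, key=lambda d: results[d][1])
for d, (tr, va) in results.items():
    print(f"degree={d:2d}  train={tr:.3f}  val={va:.3f}")
```

Training error shrinks monotonically with degree, but validation error is U-shaped; that U is only visible across the sweep.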

> Overfitting and underfitting is easy to detect by visualizing loss curves during training.

nit: this caption is phrased too liberally, as the graph only answers this question: given this model architecture, optimizer, and dataset, which model epoch/checkpoint should I select? It does not tell you about any other factors which modulate model complexity.

> This often means that the training set is not representative of the domain it is supposed to run in.

I wouldn't call this a variance issue per se. If it were a variance issue, sampling more data from the training distribution should significantly lower validation error. If the training distribution is biased, sampling more of it will not help a whole lot.

That all being said, I share your passion for greater standardization of ML workflow. And I agree that there needs to be more work on diagnosing problems, and less "throwing stuff at the wall". To add something, I now typically run learning curves. They can cost quite a bit when training big NNs. But even a low-resolution curve can give a short-term answer to an important question: how much should I expect this model to improve if I train it on n more observations? And assuming you have a decent sense of your model's capacity, this question is closely related to another common one: should I prioritize collecting more data, or should I make a modeling intervention? Learning curves have motivated big improvements in my experience.
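A minimal version of that learning-curve diagnostic (toy linear data and an OLS "model" as stand-ins; NumPy assumed, not any particular framework):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + rng.normal(0, 1.0, 2000)  # noise floor: MSE ~ 1.0

# fixed validation split
X_va, y_va = X[1500:], y[1500:]

def val_mse(n):
    # train (ordinary least squares) on the first n observations only
    w, *_ = np.linalg.lstsq(X[:n], y[:n], rcond=None)
    resid = X_va @ w - y_va
    return np.mean(resid ** 2)

# low-resolution learning curve: is validation error still falling,
# i.e. would collecting more data plausibly help?
for n in (25, 50, 100, 400, 1500):
    print(f"n={n:5d}  val MSE={val_mse(n):.3f}")
```

If the curve has flattened near the noise floor, more data won't help much and a modeling intervention is the better bet; if it's still falling, collect data.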

5

TubasAreFun t1_jbapwmb wrote

JAX does offer some general matrix math that can be more useful/fast than torch alone. I often do deep learning with torch and then use JAX on top to train statistical models (e.g., fusing features from multiple models, raw features, etc. into a single regression/inference)
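A sketch of that "statistical model on top" pattern: fuse feature arrays from upstream models into one logistic regression trained with `jax.grad` (all data and names here are hypothetical stand-ins):

```python
import jax
import jax.numpy as jnp

# pretend these came from two upstream (e.g. torch) models,
# already exported as plain arrays of per-example features
feats_a = jax.random.normal(jax.random.PRNGKey(0), (256, 8))
feats_b = jax.random.normal(jax.random.PRNGKey(1), (256, 4))
X = jnp.concatenate([feats_a, feats_b], axis=1)  # fused feature matrix
y = (X[:, 0] + 0.5 * X[:, 9] > 0).astype(jnp.float32)  # toy binary target

def loss(w, X, y):
    # logistic regression negative log-likelihood
    logits = X @ w
    return jnp.mean(jnp.logaddexp(0.0, logits) - y * logits)

grad_fn = jax.jit(jax.grad(loss))
w = jnp.zeros(X.shape[1])
for _ in range(200):
    w = w - 0.5 * grad_fn(w, X, y)  # plain gradient descent

preds = (X @ w > 0).astype(jnp.float32)
print("train accuracy:", float(jnp.mean(preds == y)))
```

The appeal is that the fusion head stays a few lines of differentiable math rather than a full torch training loop.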

16

LappenX t1_jbap16k wrote

That is exactly what should be happening. In the Cityscapes dataset, for example, part of the vehicle is always visible at the bottom of the image, and those pixels are assigned the ignore label so they're excluded from the training loss and test metrics.
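A minimal NumPy sketch of that masking, using 255 as the ignore label (the usual Cityscapes convention); the label map and per-pixel losses are made-up stand-ins:

```python
import numpy as np

IGNORE = 255  # Cityscapes convention: label 255 means "exclude from loss/metrics"

# toy 4x4 label map: the bottom row is ego-vehicle pixels marked IGNORE
labels = np.array([
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [2, 2, 2, 2],
    [IGNORE, IGNORE, IGNORE, IGNORE],
])
per_pixel_loss = np.full(labels.shape, 0.7)  # stand-in per-pixel loss values

mask = labels != IGNORE
# average only over valid pixels; IGNORE pixels contribute nothing
masked_loss = per_pixel_loss[mask].mean()
print(f"mean loss {masked_loss:.2f} over {mask.sum()} of {labels.size} pixels")
```

Frameworks do the same thing internally, e.g. via `ignore_index` on PyTorch's `CrossEntropyLoss`.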

1