Recent comments in /f/MachineLearning
phamtuanminhmeo t1_jbcjjnb wrote
Reply to comment by ortegaalfredo in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
Thank you so much 🥺
yumiko14 t1_jbc8v8x wrote
Reply to comment by Alchera_QQ in [R] Analysis of 200+ ML competitions in 2022 by hcarlens
XGDragon t1_jbc09qd wrote
Reply to [R] Analysis of 200+ ML competitions in 2022 by hcarlens
Awesome roundup. I wonder, was https://grand-challenge.org/ included in your analysis?
adt t1_jbbzba8 wrote
Reply to comment by __Maximum__ in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
There are a few that 'feel' that way. Try Megatron-11B (~200:1) based on RoBERTa (6,198:1). Wayyyyy ahead of its time, and I've matched it with much larger models in some testing.
Here's the full table of Chinchilla-aligned comparisons:
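The ratios quoted above are just training tokens divided by parameter count. A quick sketch of that arithmetic, using rough token counts that are my own assumptions for illustration (both models are believed trained on the order of 2.2T tokens):

```python
# Tokens-per-parameter ratio, as used in the comparisons above.
# Token counts here are rough assumptions, not exact figures.
def tokens_per_param(tokens_b: float, params_b: float) -> float:
    """Ratio of training tokens to model parameters (both in billions)."""
    return tokens_b / params_b

# Megatron-11B: 11B params on roughly 2,200B tokens -> ~200:1
ratio_megatron = tokens_per_param(2200, 11)
# RoBERTa: ~0.355B params on roughly 2,200B tokens -> ~6,198:1
ratio_roberta = tokens_per_param(2200, 0.355)
print(round(ratio_megatron), round(ratio_roberta))
```

For reference, Chinchilla's compute-optimal point is around 20:1, so both of these are far past it.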
senacchrib t1_jbbsgju wrote
Reply to [R] Analysis of 200+ ML competitions in 2022 by hcarlens
This is amazing, thank you! Where do L2R problems fall in your classification? Tabular?
smallfried t1_jbbs0hg wrote
Nice, it works quite well with a fancy prompt (in bold):
"A user is talking to a very imaginative writer, famous for writing in flowery language.
User: What inspired you to become a writer?
Writer: Reading and storytelling have been my passions since I was a little child. I delight in weaving new worlds and characters with the magic of words. Writing is also a way for me to pour out my soul and share my thoughts and feelings with others.
User: Can you describe a world in which fish are ruling the planet?
Writer: I am fascinated with the ocean because I feel like it has a life of its own, and I enjoy reading novels where characters live underwater. Imagine a planet where we and all other land animals have been forced underground by the fish. Fish have taken over the top layer of the planet and now everything has to be built submerged. Houses, cars, grocery stores, and schools—everything has to be submerged to allow fish to walk on top. I imagine that humans would still be living underground, walking through tunnels and using cars, trains, and subways to travel to different areas of the planet. I imagine that everything would be lit by small lights that shine from the bottoms of houses, and that it would rain a lot because the fish would need the water for their bodies. I can also imagine fish having to adapt to the planet aboveground, and that they would want to take over the world above the ocean. They would fight a big war against humans and would want to live above ground, but humans would fight for their right to live in their own planet.
User: Great! Thanks for the story, can you give me a parting gift?
Writer: Sure, I will give you a pencil!
Writer is a student and lives in an imaginary house. I used to be obsessed with the world underwater. I imagined that houses and cars were underwater and that the outside world was more like a vacation destination for humans. I liked to imagine that, if I could go to school underwater, I would be a very good student. I would be one of the best students and would never have to repeat a grade."
currentscurrents t1_jbbmmqs wrote
Reply to comment by _Arsenie_Boca_ in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Eventually you can reach a point where any possible change to the model decreases performance. Then you've fully converged.
Nobody ever does this though because of diminishing returns.
gmork_13 t1_jbbj49n wrote
With fp16/int8 you can probably stick a couple of smaller LLMs onto that card.
Have a look around; with fp32 it's about 1B params per 4GB of VRAM. Halve that for fp16, and halve it again for int8 (very roughly).
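The rule of thumb above is just bytes-per-parameter times parameter count. A minimal sketch of that estimate (weights only; it ignores activations, KV cache, and optimizer state, which add real overhead on top):

```python
# Back-of-the-envelope VRAM estimate for model weights only.
# fp32 = 4 bytes/param, fp16 = 2, int8 = 1 -> "1B params per 4GB" in fp32.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}

def weight_vram_gb(params_b: float, dtype: str = "fp32") -> float:
    """Approximate GB of VRAM needed just to hold the weights."""
    return params_b * BYTES_PER_PARAM[dtype]

# A 13B model: ~52 GB in fp32, ~26 GB in fp16, ~13 GB in int8.
print(weight_vram_gb(13, "fp32"), weight_vram_gb(13, "fp16"), weight_vram_gb(13, "int8"))
```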
__Maximum__ OP t1_jbbi89l wrote
Reply to comment by _Arsenie_Boca_ in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Until looking at the loss no longer gets you excited?
_Arsenie_Boca_ t1_jbbh5ng wrote
Reply to comment by __Maximum__ in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
"Until convergence" is something we often say and hear, but by definition it makes no sense. Convergence never ends.
Alchera_QQ t1_jbbfukn wrote
Reply to [R] Analysis of 200+ ML competitions in 2022 by hcarlens
Can somebody elaborate on the discrepancy between PyTorch and TF?
I keep hearing that Torch is preferred for research and academic purposes, but TF seems to be very close in terms of accuracy and performance.
What makes Torch over an order of magnitude more popular among users here?
gopher9 t1_jbb9f2l wrote
RWKV works rather well on 4090.
NickSinghTechCareers t1_jbb71l9 wrote
Reply to comment by doctorjuice in [D] I’m a Machine Learning Engineer for FAANG companies. What are some places looking for freelance / contract work for ML? by doctorjuice
This won’t work. Ads are solid for something easily purchasable for low dollar amounts, like a new gadget or a fitness supplement. Selling a technical consulting service requires a ton of trust, which you build by networking, content marketing, speaking at conferences, etc.
KD_A t1_jbb5kx5 wrote
The section "Check if your model is overfitting" could be improved.
> The model is overfitting (high variance) when it has low error on the training set but high error on the test set.
A big gap between training and validation error does not imply that the model is overfitting. In general, the absolute gap between training and validation errors does not tell you how validation error will change if a model is made more complex or simpler. To answer questions about overfitting and underfitting, one needs to train multiple models and compare their training and validation errors.
> Overfitting and underfitting is easy to detect by visualizing loss curves during training.
nit: this caption is phrased too liberally, as the graph only answers this question: given this model architecture, optimizer, and dataset, which model epoch/checkpoint should I select? It does not tell you about any other factors which modulate model complexity.
> This often means that the training set is not representative of the domain it is supposed to run in.
I wouldn't call this a variance issue per se. If it were a variance issue, sampling more data from the training distribution should significantly lower validation error. If the training distribution is biased, sampling more of it will not help a whole lot.
That all being said, I share your passion for greater standardization of ML workflow. And I agree that there needs to be more work on diagnosing problems, and less "throwing stuff at the wall". To add something, I now typically run learning curves. They can cost quite a bit when training big NNs. But even a low-resolution curve can give a short-term answer to an important question: how much should I expect this model to improve if I train it on n more observations? And assuming you have a decent sense of your model's capacity, this question is closely related to another common one: should I prioritize collecting more data, or should I make a modeling intervention? Learning curves have motivated big improvements in my experience.
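A minimal sketch of the learning-curve idea, using a toy 1-D least-squares model rather than an NN (the procedure is the same: refit on growing training sets and track train vs. validation error):

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (closed form)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

def mse(xs, ys, a, b):
    """Mean squared error of the fitted line on (xs, ys)."""
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(0)

def make_data(n):
    """Noisy samples from the true relation y = 2x + 1."""
    return [(x, 2 * x + 1 + rng.gauss(0, 0.5)) for x in (rng.uniform(0, 10) for _ in range(n))]

vx, vy = zip(*make_data(200))  # held-out validation set

# Learning curve: train/validation error as a function of training-set size n.
for n in (5, 20, 80, 320):
    tx, ty = zip(*make_data(n))
    a, b = fit_line(tx, ty)
    print(n, round(mse(tx, ty, a, b), 3), round(mse(vx, vy, a, b), 3))
```

With a real model you'd swap `fit_line` for your training loop (scikit-learn's `learning_curve` automates this for estimators that follow its API); the question the curve answers is the same: is validation error still dropping as n grows?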
__Maximum__ OP t1_jbb5bzm wrote
Reply to comment by CKtalon in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Right, I just noticed that LLaMA says they didn't fix their compute. Thanks. I wonder if there is a small architecture that is trained until convergence.
randyzmzzzz t1_jbau9l2 wrote
Reply to [R] Analysis of 200+ ML competitions in 2022 by hcarlens
🫡 F
Lucas_Matheus t1_jbatxn8 wrote
Reply to [R] Analysis of 200+ ML competitions in 2022 by hcarlens
Amazing. This seems like a great way to learn how things are currently being done in ML
ortegaalfredo OP t1_jbat4qi wrote
Reply to comment by blablanonymous in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
Just joking; even unbounded, LLaMA is actually more restrained than the original Bing or jailbroken ChatGPT.
ortegaalfredo OP t1_jbaswga wrote
Reply to comment by polawiaczperel in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
Interesting, I'll look more into that code; it's exactly what I need to run 33B.
Currently, with a single card, it's still too slow to use as a chatbot.
jumbos_clownroom t1_jbarbtv wrote
Reply to comment by MassedCompute in [R] Where can I train a deep learning algorithm with a $1 million budget? by coderdd
$5 here OP. I’ll be losing money on it but f this guy ^
TubasAreFun t1_jbapwmb wrote
Reply to comment by hcarlens in [R] Analysis of 200+ ML competitions in 2022 by hcarlens
JAX does offer some general matrix math that can be more useful/fast than torch alone. I often do deep learning with torch and then use JAX on top to train statistical models (i.e. fuse features from multiple models, raw features, etc. into a single regression/inference).
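A rough sketch of the fusion idea, written in plain NumPy with synthetic data (the commenter uses JAX, whose `jax.numpy` API mirrors NumPy, so the same code works there with autodiff/JIT on top; the feature shapes here are made-up):

```python
import numpy as np

# Hypothetical setup: two frozen torch models emit per-sample feature vectors;
# we concatenate ("fuse") them with raw features and fit one linear head.
rng = np.random.default_rng(0)
feats_model_a = rng.normal(size=(100, 8))  # e.g. embeddings from model A
feats_model_b = rng.normal(size=(100, 4))  # e.g. embeddings from model B
raw_feats = rng.normal(size=(100, 3))      # hand-crafted features

X = np.concatenate([feats_model_a, feats_model_b, raw_feats], axis=1)
true_w = rng.normal(size=X.shape[1])
y = X @ true_w + rng.normal(scale=0.01, size=100)  # synthetic target

# Least-squares fit of the fused regression head.
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(w, true_w, atol=0.05))
```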
LappenX t1_jbap16k wrote
Reply to comment by florinandrei in [D] The MMSegmentation library from OpenMMLab appears to return the wrong results when computing basic image segmentation metrics such as the Jaccard index (IoU - intersection-over-union). It appears to compute recall (sensitivity) instead of IoU, which artificially inflates the performance metrics. by florinandrei
That is exactly what should be happening. In the Cityscapes dataset, for example, you can always see part of the vehicle at the bottom of the image, and those pixels are set to an ignore label so they are excluded from the training loss and test metrics.
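A minimal sketch of how an ignore label is handled in a per-class IoU computation (using 255 as the ignore value, as Cityscapes does; pixel lists are toy data):

```python
# IoU for one class, skipping pixels labeled IGNORE in the ground truth.
IGNORE = 255

def iou(pred, target, cls):
    """Intersection-over-union for `cls`, with ignore pixels excluded."""
    inter = union = 0
    for p, t in zip(pred, target):
        if t == IGNORE:
            continue  # excluded from both training loss and metrics
        p_hit, t_hit = p == cls, t == cls
        inter += p_hit and t_hit
        union += p_hit or t_hit
    return inter / union if union else float("nan")

pred   = [1, 1, 0, 1, 0]
target = [1, 0, 0, 255, 1]
print(iou(pred, target, 1))  # the ignore pixel never enters the union
```

Note the failure mode the thread is about: if you accidentally divide the intersection by the target-positive count alone instead of the union, you get recall, which is always ≥ IoU.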
CKtalon t1_jbaogg3 wrote
Reply to [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Chinchilla just says, for a given compute budget, what the optimal amount of training data is to get the best bang for your buck. It doesn’t mean the model converges to its ‘best performance’ once it reaches the Chinchilla-optimal token count. Ergo, you can keep training if you have plenty of budget.
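The usual rule of thumb from the Chinchilla paper (Hoffmann et al., 2022) is roughly 20 training tokens per parameter at the compute-optimal point. A quick sketch of what that implies:

```python
# Chinchilla rule of thumb: compute-optimal token count is ~20x the
# parameter count. Training past this still lowers loss; it's just no
# longer the best loss achievable for that amount of compute.
def chinchilla_optimal_tokens_b(params_b: float) -> float:
    """Roughly compute-optimal training tokens (billions) for a model size."""
    return 20 * params_b

# LLaMA-13B was trained on ~1,000B tokens, far past its ~260B "optimal" point.
print(chinchilla_optimal_tokens_b(13))
```

That overshoot is deliberate: LLaMA optimizes for inference cost (a small model that performs well), not for training-compute efficiency.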
alushamir t1_jbao8ez wrote
Reply to comment by TikiTDO in [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust
> BLIP VQA
Thanks for sharing! You can try fastdup. It's free and it scales. It's also very easy to use.
https://github.com/visual-layer/fastdup
Would love to get your feedback. PM or join our Slack channel. Will be happy to talk more.
midasp t1_jbcusx7 wrote
Reply to [D] To Make Your Model Better, First Figure Out What's Wrong by pgao_aquarium
It seems most of the article can be summarized by "don't hire inexperienced ML practitioners, they have no idea how to improve a badly performing model", followed by "here are some really simple things a somewhat seasoned ML practitioner would do".