MassedCompute t1_jbao1yf wrote on March 7, 2023 at 6:03 PM

It will perform well if the test data is high quality. Really, a good amount of the budget should go into the test data, rather than infinite training.

[deleted] t1_jbacipc wrote on March 7, 2023 at 4:49 PM

Reply to comment by trajo123 in [R] Where can I train a deep learning algorithm with a $1 million budget? by coderdd

[removed]

its_ean t1_jbac88s wrote on March 7, 2023 at 4:47 PM

Reply to [R] Where can I train a deep learning algorithm with a $1 million budget? by coderdd

I'll do it! $10^6 please.

(I'd subcontract it out to my uncles)

ortegaalfredo OP t1_jbaaqv5 wrote on March 7, 2023 at 4:38 PM

Reply to comment by SrPeixinho in [R] Created a Discord server with LLaMA 13B by ortegaalfredo

The most important thing is to create a multi-process quantization to int8, this will allow it to work with 4X3090 GPU cards. Now it requires 8X3090 GPUs and its way over my budget.

Or just wait some days, I'm told some guys have 2xA100 cards and they will open a 65B model to the public this week.

ortegaalfredo OP t1_jbaadnz wrote on March 7, 2023 at 4:35 PM

Reply to comment by phamtuanminhmeo in [R] Created a Discord server with LLaMA 13B by ortegaalfredo

Yes, you can send raw prompts using 'raw' like this:

'@ BasedGPT raw The recipe of a chocolate cake is'

This will send whatever you write raw, without any wrapping or added text. But you have to write the prompt as a continuation like every other LLM before ChatGPT.

currentscurrents t1_jba92os wrote on March 7, 2023 at 4:27 PM

Reply to [R] Where can I train a deep learning algorithm with a $1 million budget? by coderdd

Spend part of it to hire an ML expert with a PhD to help you.

trajo123 t1_jba7yx2 wrote on March 7, 2023 at 4:19 PM

Reply to [R] Where can I train a deep learning algorithm with a $1 million budget? by coderdd

So this is where clueless managers come for inspiration!

Few_Pangolin4015 t1_jba7yg0 wrote on March 7, 2023 at 4:19 PM

Reply to [D] I’m a Machine Learning Engineer for FAANG companies. What are some places looking for freelance / contract work for ML? by doctorjuice

Hey, I'm a MLE, with ~3yrs exp looking for some private work I can do on the side - evenings and weekends. DM and let's connect on LN.

ok531441 t1_jba6m3g wrote on March 7, 2023 at 4:10 PM

Reply to [R] Where can I train a deep learning algorithm with a $1 million budget? by coderdd

> Can I spend $1 million on cloud computing companies ... (AWS, Azure, Google Cloud)

Yes, any one of these companies will gladly accept money for the services they provide.

> and would that make sense?

Depends on the problem you're trying to solve, most applications don't need that kind of budget. If you need to ask, you probably don't need to spend that much.

cesarebo t1_jba5k13 wrote on March 7, 2023 at 4:03 PM

Reply to [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Good job u/hcarlens and MLcontests team! Thank you for the insights!

blablanonymous t1_jba5dai wrote on March 7, 2023 at 4:02 PM

Reply to [R] Created a Discord server with LLaMA 13B by ortegaalfredo

Why are you saying it’s unhinged? It just feels to me like it’s simply not constrained in the same way ChatGPT is, which is a very important part of providing a good experience, isn’t it?

ReginaldIII t1_jb9xlil wrote on March 7, 2023 at 3:10 PM

Reply to comment by abnormal_human in [R] Created a Discord server with LLaMA 13B by ortegaalfredo

Fair enough, I didn't realize that hosting a publicly available service is not the same as distributing.

Dankmemexplorer t1_jb9xjl9 wrote on March 7, 2023 at 3:10 PM

Reply to [D] Neat project that would "fit" onto a 4090? by lifesthateasy

-stable diffusion would be fun to play with

-you can try simple computer vision tasks / finetune a model to detect your cat or something

hcarlens OP t1_jb9woj6 wrote on March 7, 2023 at 3:04 PM

Reply to comment by WirrryWoo in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

I found that for a lot of time-series problems, people often treated them as if they were standard tabular/supervised learning problems. There's a separate page of the report which goes into these in detail: https://mlcontests.com/tabular-data?ref=mlc_reddit

For example, for the Kaggle Amex default prediction competition, the data is time-series in the sense that you're given a sequence of customer statements, and then have to predict the probability of them defaulting within a set time period after that. The winner's solution mostly seemed to flatten the features and use LightGBM, but they did use a GRU for part of their final ensemble: https://www.kaggle.com/competitions/amex-default-prediction/discussion/348111

The M6 forecasting competition finished recently, I'm looking forward to seeing what their winners did: https://m6competition.com/

hcarlens OP t1_jb9vu1f wrote on March 7, 2023 at 2:58 PM

Reply to comment by jamesmundy in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Yeah that one is really cool! They had an initial competition stage, open to everyone, where evaluation was done in a simulation environment (in software) as opposed to real robots. Competitors were given data from dozens of hours of actual robot interaction which they could use to train their policies.

The teams that qualified there made it through to the real robot stage. At that point they could submit their policies for weekly evaluation on actual robots - so they could have a few practice runs on the actual robots before the final leaderboard run.

WirrryWoo t1_jb9v31t wrote on March 7, 2023 at 2:52 PM

Reply to [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Thank you for the analysis! Any insights on forecasting and time series related problems? Do most solutions use GRUs? Thanks!

TikiTDO t1_jb9thji wrote on March 7, 2023 at 2:41 PM

Reply to comment by alushamir in [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust

That's interesting. More similarity than I expected.

That said, with my workflow I tend to not worry too much about dupes, since they are likely to end up with different labels focusing on different things. That said, my approach also requires a lot more manual steps and intervention, so I can definitely see how such a dedupe may help with the current setup.

In case anyone's interested, here's what I find works for me:

First I started with a few hundred manually annotated images. I then used those to fine tune a version of BLIP VQA.
Whenever I have new images, I have a script that will interrogate VQA for details about the picture (things like camera angle, number of people, the focus of the picture, and whether it satisfies any extra training criteria I have), and then get a gradCAM of key elements I may want to focus on. This will generate a JSON file with a lot of image information.
I can then use the JSON file along with a language model to generate multiple information dense prompts that should correspond with the image.
Based on my training goals at the time, I send an image into an generic approval queue where I can validate a few hundred images a day before sending it to my generic training location, in addition to that I may also send it into a specialised queue if I'm trying to train up a specific concept or idea. For example I'm working on hands at the moment. It can still obviously use some more work (It's still not sure what all the fingers are called and how they move), but there's no way I'd be able to get something like that out of vanilla SD 2.1. Note, it's also pretty important to have a good variety of related concepts in a specialised set; so for example, for hands you want old hands, young hands, man's hands, woman's hands, hand bones, hand muscles, pictures of people practising drawing hands, pictures of people doing things with hands, all annotated with some connecting terms, but also adding additional context that might not be available in other places.
I will alternate small number of higher lr training cycles with new concepts and a lower batch size, and then a long low lr run for the larger training set with a higher batch size. This way I can constantly validate if it's learning the ideas I want to, and then reinforce those ideas. This has the secondary bonus that once I've validated the individual concept I generally won't have to worry about it if I ever restart training, and even if I do I can always pick out a few hundred images to refine things.

It's obviously a much slower process than just scraping the internet for a bunch of images and shoving them into CLIP, but it's reliable enough that I have tens of thousands of images at this point, which gets me some really nice results.

Incidentally, with the gradCAM data I can also use higher res pictures, which I can subdivide in zoomed in portions for studying particular topics.

jamesmundy t1_jb9rdku wrote on March 7, 2023 at 2:25 PM

Reply to [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Really detailed analysis! For the Real Robot Challenge you mentioned (very cool!) were people able to test on the robot before the competition/during training?

hcarlens OP t1_jb9q3cm wrote on March 7, 2023 at 2:16 PM

Reply to comment by backhanderer in [R] Analysis of 200+ ML competitions in 2022 by hcarlens

Yeah, not just competitive ML but the research community as a whole seem to have almost entirely switched to PyTorch now (based on the Papers With Code data). I was expecting to see some people using JAX though!

Recent comments in /f/MachineLearning