timdettmers t1_j4mfjbw wrote on January 16, 2023 at 7:20 PM

Reply to comment by tripple13 in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27

I thought about making this recommendation, but the next generation of GPUs will not be much better. You probably need to wait until about 2027 for a better GPU to come along. I think for many waiting 4 years for an upgrade might be too long, so I recommend mostly buying now. I think the RTX 40 cards are a pretty good investment that will last a bit longer than previous generations.

avocadoughnut t1_j4mci2y wrote on January 16, 2023 at 7:01 PM

Reply to comment by Acceptable-Cress-374 in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws

ChatGPT is GPT3 + instructional finetuning + RLHF for alignment. If you're talking about using those models to gather training data, that's against OpenAI's TOS, or so I've heard. The goal is to make something that isn't closed source, something you can run yourself.

Immediate-Tailor-275 t1_j4mc45h wrote on January 16, 2023 at 6:59 PM

Reply to Apple AI Residency 2023 [R] by Extension-Reward5756

Didn’t hear anything back either :(

init__27 OP t1_j4mb9ga wrote on January 16, 2023 at 6:54 PM

Reply to comment by tripple13 in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27

If it's any help, u/tripple13, I bought two 3090s recently at a "discounted" rate ($1400, compared to the $3500 I paid when they were kings), and I'm really happy with them.

BUT OF COURSE I ALSO WISH I HAD THE LATEST & FASTEST ONES :')

Zondartul t1_j4mb6rm wrote on January 16, 2023 at 6:53 PM

Reply to comment by Acceptable-Cress-374 in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws

So using a big network to teach a small network? That's a thing people do. See teacher-student learning, and distillation.
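For anyone who hasn't seen it, the core of distillation can be sketched in a few lines of plain Python. This is only a toy illustration, not code from any project mentioned in this thread: the function names and the example logits are made up, and the loss shown is the commonly used temperature-softened KL divergence between teacher and student outputs, scaled by the square of the temperature.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum_exp for s in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is the usual scaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (fixed target)
    log_q = log_softmax(student_logits, temperature)  # student (being trained)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


# Hypothetical logits for one example with three classes.
teacher = [4.0, 1.0, 0.5]
student = [2.0, 2.0, 0.1]
print(distillation_loss(teacher, student))  # positive: student disagrees
print(distillation_loss(teacher, teacher))  # zero: student matches teacher
```

In practice the student would be trained by minimizing this loss (often mixed with the normal label loss) over many examples, with the teacher's weights frozen.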

init__27 OP t1_j4mb2iy wrote on January 16, 2023 at 6:53 PM

Reply to comment by BeatLeJuce in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27

The author is hanging out and collecting feedback on here. I'm sure he'll correct it in an update.

Maybe I'm too deep in the weeds, but if I were in the author's shoes I would also assume that the reader knows these terms and cards.

init__27 OP t1_j4mavhs wrote on January 16, 2023 at 6:51 PM

Reply to comment by timdettmers in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27

Oh wow, Great to see you here as well Tim 🙏

As a Kaggler, my usage varies a lot. If I end up in a deep learning competition for 1-2 months, I'd say my usage is usually around 60-100%.

I know many top Kagglers who compete year-round; I would vaguely guess their usage percentage is the highest.

Senko812 t1_j4mat77 wrote on January 16, 2023 at 6:51 PM

Reply to [D] I’m a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice

Send me a direct message

DaLameLama t1_j4mamhy wrote on January 16, 2023 at 6:50 PM

Reply to [D] What kinds of interesting models can I train with just an RTX 4080? by faker10101891

Relevant: https://arxiv.org/abs/2212.14034

>Cramming: Training a Language Model on a Single GPU in One Day

BeatLeJuce t1_j4m9pp6 wrote on January 16, 2023 at 6:44 PM

Reply to [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27

Overall nice, but the article also uses some expressions without ever explaining them. For example: what is an H100, and what is an A100? Somewhere in the article it says that H100 = RTX 40 cards, and somewhere else it says the A100 is an RTX 40 card. Which is which?

Also, what is TF32? It's an expression that appears in a paragraph without explanation.

tripple13 t1_j4m7ykq wrote on January 16, 2023 at 6:34 PM

Reply to [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27

Well, somehow I expected TD's conclusion to be "Skip the current gen, wait for newer gen"

And yet, here we are.

Acceptable-Cress-374 t1_j4m7mee wrote on January 16, 2023 at 6:32 PM

Reply to comment by avocadoughnut in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws

> Their current goal is to develop interfaces to gather data, and then train a model using RLHF

Potentially naive question, as I don't have much experience with LLMs. Has anyone tried using existing SotA (paid) models like davinci / gpt3 instead of RLHF? They seem to be pretty good at a bunch of focused tasks, especially in few-shot. Does that make sense?

farox t1_j4m771b wrote on January 16, 2023 at 6:29 PM

Reply to [D] I’m a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice

I can't tell you about ML specifically, but maybe some useful pointers for freelancing in general. I've been in software for ~25 years, 15 or so freelancing.

First thing is that as a freelancer you're not part of "the team". This can be good or bad for you, I think it's fantastic. No dealing with political bs, I charge hourly, so no gorging with overtime etc.

But that's it. You're a tool to do a job and then leave (in theory).

In my experience most small companies won't have use for you. For one, you'll be more expensive than their employed staff, but they also want to keep that know-how in-house.

Mid to large companies are where you will get the most traction. However, they see you as a tool, so they don't want to hire you specifically, but "an ML engineer with 6 YoE". They outsource that problem to a recruiter or similar agency. That way, if you get hit by a bus, they make a phone call and get a fresh body.

So far I have only had good experiences with these agencies: pay is good, it's professional, shit just gets done and you get paid.

The other option is going through your network. As you have more work experience you should be able to build that and then lean on it if you have more capacity, read: looking for a job. Then you're more likely to find a smaller business because they are interested in getting you on board.

I tried my hand at those fancy new websites as well, with the same result. The problem here is also that you're more likely to compete with some kid in India who charges 1/10th of your rate.

Another thing to keep in mind: Do not go into this for the money. If you factor everything in: Vacation, sick days, hardware, licenses, pension/retirement (rule of thumb: 30% of your net income) etc. it doesn't come out that far apart.

TLDR: Computer Futures, Hays, that sort of company, or through your network

SupplyChainPhd t1_j4m4yb9 wrote on January 16, 2023 at 6:15 PM

Reply to [D] I’m a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice

I’ve been working on getting some data science work secured and should have a few things coming up (3-6 months, maybe sooner). Let’s chat

OJMofo t1_j4m4f0g wrote on January 16, 2023 at 6:12 PM

Reply to [D] The Illustrated Stable Diffusion (Video) by jayalammar

Great video! Comprehensive overview that’s digestible for the target audience.

avocadoughnut t1_j4m12v2 wrote on January 16, 2023 at 5:51 PM

Reply to [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws

There's currently a project in progress called OpenAssistant. It's being organized by Yannic Kilcher and some LAION members, to my understanding. Their current goal is to develop interfaces to gather data, and then train a model using RLHF. You can find a ton of discussion in the LAION discord. There's a channel for this project.

EmbarrassedHelp t1_j4lyssq wrote on January 16, 2023 at 5:37 PM

Reply to comment by dmart89 in [D] Can ChatGPT flag it's own writings? by MrSpotgold

The digital watermark, though, risks damaging the model outputs, and would be rendered useless if you change the generated text yourself.

timdettmers t1_j4lvr7i wrote on January 16, 2023 at 5:19 PM

Reply to comment by lostmsu in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27

I like this idea! I already factored in fixed costs for building a desktop computer but the electricity is also an important part of the overall cost especially if you compare it to cloud options.

I am currently gathering feedback to update the post later. I think it's quick to create a chart based on this data and create an update later today.

The main problem in estimating cost is getting a good number for the utilization time of GPUs for the average user. For PhD students, the number was about 15% utilization (fully using a GPU 15% of total time). This means, with an average of 60 watts idle and 350 watts max for an RTX 4090: 60 watts * 0.85 + 350 watts * 0.15 = 103.5 watts. That is 906 kWh per year, or about $210 per year per RTX 4090 (assuming a US average of $0.23 per kWh).

Does that look good to you?

Edit: part of this seems to have gotten lost in editing. Oops! I re-added the missing details.
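If you want to rerun the estimate above with your own numbers, here is a minimal Python sketch of the same arithmetic. The function name and parameters are made up for illustration; the defaults (60 W idle, 350 W max, 15% utilization, 0.23 USD per kWh) are just the figures quoted in this comment, not measurements.

```python
HOURS_PER_YEAR = 24 * 365  # 8760, ignoring leap years


def yearly_gpu_energy_cost(idle_watts=60.0, max_watts=350.0,
                           utilization=0.15, usd_per_kwh=0.23):
    """Return (average watts, kWh per year, USD per year) for one GPU.

    Simplifying assumption (same as in the comment): the card is either
    fully loaded for `utilization` of the time or idle for the rest.
    """
    avg_watts = idle_watts * (1.0 - utilization) + max_watts * utilization
    kwh_per_year = avg_watts * HOURS_PER_YEAR / 1000.0
    usd_per_year = kwh_per_year * usd_per_kwh
    return avg_watts, kwh_per_year, usd_per_year


avg_w, kwh, usd = yearly_gpu_energy_cost()
print(f"{avg_w:.1f} W average, {kwh:.0f} kWh/year, ${usd:.0f}/year")
```

With the defaults this gives roughly 103.5 W, 907 kWh and $209 per year, in line with the rounded figures above. Changing `utilization` to something like 0.6-1.0 gives a rough idea of what a heavy user might pay instead.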

MysteryInc152 t1_j4lv0d5 wrote on January 16, 2023 at 5:14 PM

Reply to [D]: Are there models like CODEX but work in a reversed way? by GoodluckH

Codex and ChatGPT can understand more than just functions. The issue with them is the limited token window.

lostmsu t1_j4lrt8a wrote on January 16, 2023 at 4:54 PM

Reply to [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27

The performance/$ characteristic needs an adjustment based on longevity * utilization * electricity cost. Assuming you are going to use the card for 5 years at full load, that's $1000-$1500 in electricity, at $1 per year per 1 W of constant use (12c/kWh). This would take care of the laughable notion that the Titan Xp is worth anything, and would sort cards much closer to their market positioning.
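A rough sketch of that adjustment in Python, in case it helps. Everything here is illustrative: the function names are made up, the 200 W and 300 W draws are just example inputs, and `relative_perf` and `price_usd` are whatever numbers you plug in yourself (no real card data is included).

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def usd_per_watt_year(usd_per_kwh=0.12):
    """Cost of drawing 1 W continuously for one year."""
    return HOURS_PER_YEAR / 1000.0 * usd_per_kwh


def lifetime_electricity_cost(watts, years=5, utilization=1.0,
                              usd_per_kwh=0.12):
    """Electricity cost over the card's life (idle draw is ignored here)."""
    return watts * utilization * years * usd_per_watt_year(usd_per_kwh)


def perf_per_total_dollar(relative_perf, price_usd, watts, years=5,
                          utilization=1.0, usd_per_kwh=0.12):
    """Performance divided by purchase price plus lifetime electricity."""
    total = price_usd + lifetime_electricity_cost(
        watts, years, utilization, usd_per_kwh)
    return relative_perf / total


print(f"${usd_per_watt_year():.2f} per watt-year at 12c/kWh")
for watts in (200, 300):
    cost = lifetime_electricity_cost(watts)
    print(f"{watts} W for 5 years at full load: ${cost:.0f}")
```

At 12c/kWh one watt-year costs about $1.05, so 200-300 W at full load for 5 years comes to roughly $1,050-$1,580, close to the range quoted above. An old, cheap but power-hungry card ends up with a much worse `perf_per_total_dollar` than its sticker price alone suggests.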