Recent comments in /f/MachineLearning
Rockingtits t1_j9afl0a wrote
Reply to [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Why not look into distilled models like DistilBERT?
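Something like this loads in well under a gigabyte (a minimal sketch, assuming the transformers library and the stock SST-2 DistilBERT checkpoint):

```python
# Minimal sketch: a distilled model via the transformers pipeline.
# Checkpoint is the standard DistilBERT fine-tuned on SST-2.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("Distilled models run fine on modest hardware."))
```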
ggf31416 t1_j9a8p88 wrote
Reply to comment by DevarshTare in [D] What matters while running models? by DevarshTare
The 3070 and 3060 Ti both have 8GB, and while the 3070 will be a bit faster, most people will agree the difference isn't worth the price on a tight budget.
For training, the extra 4GB on the plain 3060 (12GB vs. 8GB) is quite useful, but for inference only you can run most small and medium models (such as Stable Diffusion) in 8GB, and the 3060 Ti will be faster.
lemurlemur t1_j9a8gb7 wrote
Reply to comment by BarockMoebelSecond in [D] Please stop by [deleted]
Yes, that is how science works: you make a claim and back it up with evidence.
This is NOT how developing an idea works, though, and this subreddit exists in part to help develop ideas. Developing an idea requires entertaining ideas that are not fully formed, and yes, this includes some ideas that may seem stupid or wrong.
DevarshTare OP t1_j9a7h2p wrote
Reply to comment by TruthAndDiscipline in [D] What matters while running models? by DevarshTare
Thanks a lot!
DevarshTare OP t1_j9a7gap wrote
Reply to comment by ggf31416 in [D] What matters while running models? by DevarshTare
Appreciate it! This gave me a better picture. I was stuck between the 3060 Ti and the 3070; in that case the 3060 Ti is the logical option. I'll be using Colab for training, and I can probably optimise the model to run for inference in 8 GB, if I'm not wrong?
avocadoughnut t1_j9a64k1 wrote
Reply to comment by gliptic in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Yup. I'd recommend using whichever RWKV model fits in fp16/bf16 (apparently 8-bit is 4x slower and less accurate). I've been running GPT-J on a 24GB GPU for months (longer contexts are possible using accelerate), and I noticed massive speed increases when using fp16 (or bf16? I don't remember) rather than 8-bit.
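For reference, loading GPT-J in fp16 looks roughly like this (a sketch, assuming transformers plus accelerate; swap in torch.bfloat16 on Ampere or newer for bf16):

```python
# Rough sketch: fp16 GPT-J on a single 24GB GPU.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    torch_dtype=torch.float16,  # ~12GB of weights instead of ~24GB at fp32
    device_map="auto",          # accelerate handles placement
)
```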
[deleted] t1_j9a56tc wrote
Reply to [R] [N] In this paper, we show how a conversational model, 3.5x smaller than SOTA, can be optimized to outperform the baselines through Auxiliary Learning. Published in the ACL Anthology: "Efficient Task-Oriented Dialogue Systems with Response Selection as an Auxiliary Task." by radi-cho
[removed]
thecodethinker t1_j9a4mvo wrote
Reply to comment by synth_mania in [R] neural cloth simulation by LegendOfHiddnTempl
Yeah, exactly my point about image classification. We’ve had it for a long time already.
easy_peazy t1_j9a2xe0 wrote
Reply to comment by guaranteednotabot in [D] Simple Questions Thread by AutoModerator
I’m not sure what the time complexity is.
Disastrous_Elk_6375 t1_j9a2877 wrote
Reply to comment by ArmagedonAshhole in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Thanks!
ArmagedonAshhole t1_j9a1vq3 wrote
Reply to comment by Disastrous_Elk_6375 in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
It depends mostly on settings, so no.
A small context of 200-300 tokens could work with 24GB, but then the model won't remember and connect the dots well, which would make it worse than a 13B model.
People are working right now on splitting the work between the GPU (VRAM) and CPU (RAM) in 8-bit mode; see the sketch below. I think offloading ~10% to RAM would make the model work well on a 24GB card. It would be a bit slower, but still usable.
If you want, you can always load the whole model into RAM and run it on the CPU, but that is very slow.
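Roughly what the GPU/CPU split looks like with accelerate's device map (the model name and memory caps below are illustrative, not tuned numbers):

```python
# Hedged sketch: spill layers that don't fit in VRAM over to CPU RAM.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",                 # example large model
    torch_dtype=torch.float16,
    device_map="auto",                         # accelerate plans the split
    max_memory={0: "22GiB", "cpu": "30GiB"},   # cap GPU 0, rest goes to RAM
)
```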
[deleted] t1_j9a12np wrote
Reply to [R] neural cloth simulation by LegendOfHiddnTempl
[removed]
guaranteednotabot t1_j9a0q74 wrote
Reply to comment by easy_peazy in [D] Simple Questions Thread by AutoModerator
Say there’s a model with double the parameters; will it take twice as long to process?
snowpixelapp t1_j99zc4b wrote
Reply to comment by sam__izdat in [P] I've been commissioned to make 1000+ variations of my unique geometric art, while retaining its essential characteristics. It's been suggested that I use GAN to create permutations of my art. Any advice/directions? by eternalvisions
In my experiments, I have found the DreamBooth implementation in diffusers to be not very good. There are many alternatives to it, though.
[deleted] t1_j99ycwk wrote
Reply to comment by [deleted] in [D] Lack of influence in modern AI by I_like_sources
[removed]
ggf31416 t1_j99y9e1 wrote
Reply to [D] What matters while running models? by DevarshTare
https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/
https://lambdalabs.com/gpu-benchmarks
How much VRAM you need depends mostly on the model's parameter count, with some extra for the data. At FP32 precision each parameter needs 4 bytes, at FP16 or BF16 2 bytes, and at FP8 or INT8 only one byte. Almost all models can run at FP16 without noticeable accuracy loss; FP8 sometimes works and sometimes doesn't, depending on the model.
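A quick back-of-the-envelope calculation (weights only; activations, KV cache, and framework overhead come on top):

```python
# Weight memory = parameter count * bytes per parameter.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def weight_gib(params_billion: float, dtype: str) -> float:
    return params_billion * 1e9 * BYTES_PER_PARAM[dtype] / 1024**3

for dtype in ("fp32", "fp16", "int8"):
    print(f"6B model @ {dtype}: {weight_gib(6, dtype):.1f} GiB")
# -> 22.4, 11.2, and 5.6 GiB respectively
```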
gliptic t1_j99y0cp wrote
Reply to [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
RWKV can run on very little VRAM with rwkvstic's streaming mode and 8-bit quantization. I've not tested streaming, but I expect it's a lot slower. The 7B model sadly still takes about 8 GB even with 8-bit quantization (which checks out: 7B parameters at one byte each is ~7 GB before overhead).
Disastrous_Elk_6375 t1_j99xxfa wrote
Reply to comment by ArmagedonAshhole in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Are there some rough numbers on prompt size vs. RAM usage after the model loads? I haven't played with GPT-NeoX yet.
[deleted] t1_j99xtz1 wrote
Reply to comment by I_like_sources in [D] Lack of influence in modern AI by I_like_sources
[removed]
[deleted] t1_j99xtwi wrote
Reply to comment by NotARedditUser3 in [P] Looking to use Chat-GPT for your business? Data-Centric Fine-Tuning Is All You Need! by Only-Caterpillar4057
[removed]
derek_ml t1_j99xa9y wrote
Reply to comment by millenial_wh00p in [R] Using AI/ML for Quality Control for a factory? by aumzzzz
I'm gonna start sprinkling cinnamon on my GPU. Apparently that's what has been missing!
XecutionStyle t1_j99vve7 wrote
Reply to [D] Lack of influence in modern AI by I_like_sources
You list very specific issues; if you add them all up, that itself is influence.
TruthAndDiscipline t1_j99vabp wrote
Reply to [D] What matters while running models? by DevarshTare
VRAM capacity itself has no effect on speed, but if you don't have enough to load the model and data, you can't train at all (CUDA out-of-memory error).
For speed, just look at performance charts.
Disastrous_Elk_6375 t1_j99ujv1 wrote
Reply to comment by head_robotics in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Add device_map="auto", load_in_8bit=True to your .from_pretrained() call.
Transformers does the rest.
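A fuller sketch, assuming transformers with accelerate and bitsandbytes installed (the model name is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "EleutherAI/gpt-j-6B"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # accelerate places layers across GPU/CPU
    load_in_8bit=True,   # bitsandbytes int8 quantization
)

inputs = tokenizer("Hello,", return_tensors="pt").to(0)  # first GPU
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```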
[deleted] t1_j9ahqkw wrote
Reply to [R] Using AI/ML for Quality Control for a factory? by aumzzzz
[deleted]