Recent comments in /f/MachineLearning

ggf31416 t1_j9a8p88 wrote

The 3070 and 3060 Ti both have 8GB, and while the 3070 will be a bit faster, most people will agree that the difference is not worth the price if you have a tight budget.

For training, the extra 4GB on the plain 3060 (12GB total) is quite useful, but for inference only you can run most small and medium models (such as Stable Diffusion) in 8GB, and the 3060 Ti will be faster.
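For example, a minimal sketch of running Stable Diffusion for inference in fp16 so it fits comfortably in 8GB (the checkpoint id is just an illustration, swap in whatever you actually use):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the weights in fp16 so the pipeline fits within ~8GB of VRAM
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # example checkpoint
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```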

2

lemurlemur t1_j9a8gb7 wrote

Reply to comment by BarockMoebelSecond in [D] Please stop by [deleted]

Yes, this is how science works - you make a claim and show proof.

This is NOT how developing an idea works though, and this subreddit exists in part to help develop ideas. Developing an idea requires entertaining ideas that are not fully formed, and yes, this includes some ideas that may seem stupid or wrong.

−1

avocadoughnut t1_j9a64k1 wrote

Yup. I'd recommend using whichever RWKV model can fit in VRAM with fp16/bf16 (apparently 8-bit is 4x slower and lower accuracy). I've been running GPT-J on a 24GB GPU for months (longer contexts are possible using accelerate), and I noticed massive speed increases when using fp16 (or bf16? I don't remember) rather than 8-bit.
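For reference, a rough sketch of how the fp16 vs 8-bit loading choice looks with transformers + accelerate (the prompt and generation settings are just placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# fp16 load; device_map="auto" uses accelerate to place the weights on the GPU
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-j-6B",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# 8-bit alternative (needs bitsandbytes); in my experience noticeably slower to generate
# model = AutoModelForCausalLM.from_pretrained(
#     "EleutherAI/gpt-j-6B", device_map="auto", load_in_8bit=True
# )

inputs = tokenizer("The meaning of life is", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```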

16

ArmagedonAshhole t1_j9a1vq3 wrote

It depends mostly on settings, so no.

A small context like 200-300 tokens could work with 24GB, but then your AI will not remember and connect the dots well, which would make the model worse than a 13B one.

People are working right now on splitting the work between GPU (VRAM) and CPU (RAM) in 8-bit mode (see the sketch below). I think offloading about 10% to RAM would make the model work well on a 24GB VRAM card. It would be a bit slower but still usable.

If you want, you can always load the whole model into RAM and run it on the CPU, but it is very slow.
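A minimal sketch of the intermediate option, splitting weights between the GPU and CPU RAM with Hugging Face accelerate. This uses fp16 offload rather than the 8-bit splitting being worked on, and the model id and memory caps are only illustrative:

```python
import torch
from transformers import AutoModelForCausalLM

# Keep as much as fits on the 24GB GPU, spill the remaining layers to CPU RAM.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-30b",              # example of a model too big for 24GB at fp16
    torch_dtype=torch.float16,
    device_map="auto",               # accelerate decides the GPU/CPU placement
    max_memory={0: "22GiB", "cpu": "64GiB"},
)
```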

12

ggf31416 t1_j99y9e1 wrote

https://timdettmers.com/2023/01/30/which-gpu-for-deep-learning/

https://lambdalabs.com/gpu-benchmarks

How much VRAM you need will depend mostly on the number of parameters of the model, with some extra for the data. At FP32 precision each parameter needs 4 bytes, at FP16 or BF16 2 bytes, and at FP8 or INT8 only one byte. Almost all models can be run at FP16 without noticeable accuracy loss; FP8 sometimes works and sometimes doesn't, depending on the model.
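As a back-of-the-envelope check (the ~20% overhead factor for activations/KV cache is my own rough assumption, not a fixed rule):

```python
def vram_estimate_gb(params_billion: float, bytes_per_param: int, overhead: float = 1.2) -> float:
    """Weights-only VRAM estimate, padded ~20% for activations / KV cache."""
    return params_billion * bytes_per_param * overhead

print(vram_estimate_gb(6, 4))  # 6B params at FP32       -> ~28.8 GB
print(vram_estimate_gb(6, 2))  # 6B params at FP16/BF16  -> ~14.4 GB
print(vram_estimate_gb(6, 1))  # 6B params at INT8/FP8   ->  ~7.2 GB
```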

3