Recent comments in /f/MachineLearning

zzzthelastuser t1_j7ulu8h wrote

> CUDA graphs require us to capture a graph per input tensor shape, there is a non-negligible warmup time. We measure around 10mn on 2 different machines / GPUs (down from 50mn in our previous Kernl version). One user reported with the new version a bit more than 20mn of warmup time. We are aware of obvious ways to decrease it significantly.
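(For context on why capture happens per input shape — a rough sketch of PyTorch's CUDA graph API, with a placeholder model and shapes, not Kernl's actual code:)

```python
import torch

# A captured CUDA graph replays kernels against fixed-size static tensors,
# so every new input shape needs its own capture -- that is the warmup cost.
model = torch.nn.Linear(512, 512).cuda().eval()
static_input = torch.zeros(8, 512, device="cuda")  # fixed (batch, features)

# PyTorch requires a few warmup iterations on a side stream before capture
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s), torch.no_grad():
    for _ in range(3):
        model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture once for this exact shape...
g = torch.cuda.CUDAGraph()
with torch.cuda.graph(g), torch.no_grad():
    static_output = model(static_input)

# ...then replay cheaply: copy new data in, replay, read static_output out.
static_input.copy_(torch.randn(8, 512, device="cuda"))
g.replay()
# An input of shape (4, 512) would need a separate capture -- hence the
# per-shape warmup time mentioned above.
```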

Dumb question, but what's mn? Milliseconds?

10

pommedeterresautee OP t1_j7uk761 wrote

I just discovered the project https://github.com/ggerganov/whisper.cpp

As written in another comment, there is no way for a (recent) CPU (even an ARM one) to be as fast as a (recent) GPU on such a big model (they list the lack of GPU support under limitations).

https://www.reddit.com/r/MachineLearning/comments/10xp54e/comment/j7tk4fx/?utm_source=share&utm_medium=web2x&context=3

That being said, the project looks super cool, thanks for the pointer (I ordered an M2 Max, lots of fun to come :-) )

3

Available_Lion_652 OP t1_j7ue7pj wrote

My motherboard is quite old and the best CPU I can attach to it is an i7 7700K. From what I have read, if I preprocess the dataset before training, then it should not bottleneck. But what I was thinking was that the preprocessed dataset is held in 32 GB of RAM, and the CPU, which has only 8 threads, has to transfer that data from RAM to GPU memory. Let's say I want to train a GPT-2 from scratch. I don't know exactly how much the CPU/RAM frequency will bottleneck the training process, and I don't want to change my whole hardware. If the RTX 3090 is too performant and the bottleneck too high, I was wondering if I should buy a 3060/3080 instead.
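To make concrete what I mean by the CPU feeding the GPU, roughly this kind of loop (sizes are made up, and I'm assuming a standard PyTorch DataLoader):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Pre-tokenized dataset sitting in RAM, fed to the GPU by the CPU.
tokens = torch.randint(0, 50257, (10_000, 1024))  # GPT-2-style token ids
loader = DataLoader(
    TensorDataset(tokens),
    batch_size=8,
    num_workers=4,     # stay well under the 7700K's 8 threads
    pin_memory=True,   # page-locked RAM enables async host-to-device copies
)

device = torch.device("cuda")
for (batch,) in loader:
    # non_blocking overlaps the copy with GPU compute, hiding part of the
    # CPU/RAM-side transfer latency
    batch = batch.to(device, non_blocking=True)
    # ... forward/backward pass ...
```

From what I understand, pinned memory plus async copies is the standard way to hide part of that transfer cost, but I don't know if 8 threads are enough to keep a 3090 fed.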

1

blackkettle t1_j7ud34i wrote

Are you talking about this paper:

- https://cdn.openai.com/papers/whisper.pdf

Maybe I missed it, but I can't find any place in that paper where they discuss the trade-offs between real-time factor and decoding strategy. RTF vs. accuracy curves for CPU vs. GPU in STT typically differ not in absolute performance but in where along the RTF curve you reach a particular accuracy. That determines what kinds of tasks you can expect to use the model for, and how you can expect to scale it to real-world applications. So far this has been the weakest point of all the Whisper-related work (you're still better off with espnet, k2, speechbrain, etc.). This information would be interesting to see if they have it.
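For anyone following along, real-time factor is just processing time divided by audio duration:

```python
# Real-time factor (RTF) = processing time / audio duration.
# RTF < 1 means faster than real time; lower is better at a given accuracy.
def rtf(processing_seconds: float, audio_seconds: float) -> float:
    return processing_seconds / audio_seconds

print(rtf(12.5, 60.0))  # 0.208 -> roughly 5x faster than real time
```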

2

leventov t1_j7ubimw wrote

Top AI researchers (Yoshua Bengio, Yann LeCun) are essentially cognitive scientists. By "cognitive science", here I mean general theories of cognition, not specifically human cognition. If you watch any recent talk by Bengio (example), you'll recognise that it's a talk about cognitive science at least as much as it is about AI. From his talks, you can also roughly sense the types of problems these researchers are tackling when they move up to the level of cognitive science.

Theories of cognitive science and ML/DL form an "abstraction-grounding" stack:

- general theories of cognition (intelligence, agency) ->
- general theories of how DNNs behave at runtime ->
- interpretability theories for a concrete DNN architecture.

1

Tober447 t1_j7u90qp wrote

You could try an autoencoder with CNN layers and a bottleneck of 2 or 3 neurons, so you can visualize the embeddings. The autoencoder can be interpreted as a non-linear PCA (minimal sketch below).

Also, similarity in this embedding space should correlate with similarity of the real images/whatever your CNN extracts from the real images.
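A minimal sketch of what I mean (PyTorch; I'm assuming 1x28x28 single-channel inputs — adapt the conv layers to your data):

```python
import torch
import torch.nn as nn

# Conv autoencoder whose 2-D bottleneck can be scattered for visualization.
class ConvAutoencoder(nn.Module):
    def __init__(self, bottleneck: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1),   # 28 -> 14
            nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1),  # 14 -> 7
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, bottleneck),  # the 2-3 neuron bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 32 * 7 * 7),
            nn.ReLU(),
            nn.Unflatten(1, (32, 7, 7)),
            nn.ConvTranspose2d(32, 16, 3, stride=2, padding=1,
                               output_padding=1),       # 7 -> 14
            nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 3, stride=2, padding=1,
                               output_padding=1),       # 14 -> 28
            nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)            # 2-D embedding, plot as a scatter
        return self.decoder(z), z
```

Train it with a reconstruction loss such as MSE, then scatter-plot the 2-D `z` values for your dataset; nearby points should correspond to inputs the encoder considers similar.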

5

blackkettle t1_j7u2kd0 wrote

My question was probably not well formulated. I'm not questioning whether it works; I'm just curious what the RTF vs. accuracy tradeoff and the actual performance look like.

You report memory usage, beam sizes, and relative speedup, but it would also be interesting to see WER performance and the actual absolute RTFs.
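For anyone unfamiliar with the metric: WER is the word-level edit distance normalized by reference length, WER = (S + D + I) / N. A rough self-contained sketch:

```python
# Word error rate via word-level Levenshtein distance over ref/hyp tokens.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sit"))  # 1 substitution / 3 words = 0.33
```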

2