Recent comments in /f/MachineLearning

JustOneAvailableName t1_j7v99gd wrote

If the model is sufficiently large (if not, you don't really need to wait long anyways) and no expensive CPU pre/postprocessing is done, the 3090 will be the bottleneck.

A single 3090 might not have enough memory to train GPT 2 large, but it's probably close.

Fully training a LLM on a single 3090 is impossible, but you could finetune one.

3

schludy t1_j7v11vj wrote

The individual steps sound ok, however, if you project 20.000 to 2D, the results you got look very reasonable. I'm not sure about UMAP, but I think for tSNE, it's recommended to have low dimensionality, something more in the order of 32 features. I would probably try to adjust the architecture, as other comments have suggested

1

mr_house7 t1_j7uza9c wrote

Reply to comment by sonofmath in [D] List of RL Papers by C_l3b

I'm so sorry to bother you again. Just one final question.
Do you know if the Spinning up algos are worth while? Since I'm on Windows it seems to be a little more changeling to install it in my local machine. Is there an alternative to installing on local machine like colab?

1

pommedeterresautee t1_j7uwa71 wrote

At start the weights will be moved on the GPU. Then during training, the tokenizer will convert your strings to a int64 tensors. They are quite light, and those are moved to GPU during training. What you need is not the fastest CPU but one which can feed your GPU faster that the data it will consume. In GPT2 case, CPU like 7700 won't be an issue. Image or sounds (TTS, ASR) may have more demanding preprocessing during training.

5

zanzagaes2 OP t1_j7uual3 wrote

Yes, that's a great idea. I guess I can use the encoder-decoder to create a very low-dimensional embedding and use the current one (~500 features) to find similar images to a given one, right?

Your perspective has been really helpful, thank you

2

Tober447 t1_j7uq41s wrote

You would take the output of a layer of your choice from the trained cnn (as you do now) and feed it into a new model, that is the autoencoder. So yes, the weights from your model are kept, but you will have to train the autoencoder from scratch. Something like CNN (only inference, no backprop) --> Decoder --> Latent Space --> Encoder for training and during inference you take the output of the decoder and use it for visualization or similarity.

4

zanzagaes2 OP t1_j7unuq2 wrote

May I use some part of the trained model to avoid retraining from scratch? The current model has very decent precision and I have generated some other visualizations for it (like heatmaps) so doing work around this model would be very convenient.

Edit: I have added an image of the best embedding I have found until now as a reference

1

zanzagaes2 OP t1_j7unjuw wrote

I have not found a very convincing embedding yet, I have tried several that go from ~500 features (class activation map) to ~20.000 features (output of last convolutional layer before pooling), all generated from the full training set (~30.000 samples)

In all cases I do the same, I use PCA to reduce vectors to 1.000 features and UMAP or t-SNE (usually try both) to get a 2d vector I can scatter plot. I have tried to use UMAP for the full process but it doesn't escalate well enough. Is this a good approach?

Edit: I have added an image of the best embedding I have found until now as a reference

1