Recent comments in /f/MachineLearning

Tyson1405 t1_j8i09j7 wrote

Hello,

Most of the time I only have my old laptop available, with no dGPU and a 5-year-old i7 dual core.

Training on that thing takes a lot of time. What would you suggest for training models online? My datasets are often in the 2–10 GB range. I don't mind paying around 30–50 euros monthly.

I heard Colab Pro was super good, but since they changed to the compute-units model it got pretty meh? Or is it still good? Otherwise I heard about Paperspace.

What else can you recommend? I only want to train models online and then export them using joblib (something like the sketch below). I am also a student, just in case there are some nice discounts.
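
Just to illustrate the export step, a minimal scikit-learn-style sketch; the dataset and estimator here are placeholders:

```python
# Minimal sketch: train remotely, then export the fitted model with joblib
# so it can be downloaded and reused on the laptop.
import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)  # placeholder data
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

joblib.dump(model, "model.joblib")        # export on the remote machine
restored = joblib.load("model.joblib")    # reload locally for inference
```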

Appreciate any help!

1

lwl t1_j8hoxpg wrote

Super interesting work, thank you for sharing! If you are still active on Reddit: we noticed that the PDF is no longer available on arXiv; are you able to say why that is?

4

boadie t1_j8hhtwi wrote

In the opposite direction from your question is a very interesting project, tiny-cuda-nn, a tiny neural network framework implemented as close to the metal as possible and very fast: https://github.com/NVlabs/tiny-cuda-nn

Also in the vague neighbourhood of your question is the Triton compiler: while on the surface it is a Python JIT compiler, its language coverage is much smaller than Python, so you can view it as a small DSL; all the interesting bits are way below that level: https://openai.com/blog/triton/
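
To give a feel for what that small DSL looks like, here is a minimal sketch along the lines of the vector-add example in Triton's docs (not code from the linked post):

```python
# Minimal sketch of a Triton kernel: element-wise add over fixed-size blocks.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                        # one program instance per block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                        # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.rand(4096, device="cuda")
y = torch.rand(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)                 # number of program instances
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```

The Python-level syntax is just the surface; the compiler lowers those block operations to tuned GPU code, which is where the interesting work happens.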

1

andreichiffa t1_j8hf2th wrote

10% is what OpenAI considered "good enough" for theirs, but the problem is that the detection is not uniform. Most neurodivergent folks will be misclassified as generative models, as will people with social anxiety, who tend to be wordy. Non-native and non-fluent English speakers are the other big false-positive trigger.

1

Disastrous_Elk_6375 t1_j8hdb2r wrote

Do you know if distilling will be possible after instruct fine-tuning and the RLHF steps? I know it works on "vanilla" models, but I haven't found anything regarding distillation of instruct-trained models.

2

Zondartul t1_j8hd29v wrote

The plan is to make it kinda good and train it (on industrial hardware), then distill it down to a smaller model that can ideally fit on a consumer GPU. It's going to be big at first, but they do want to make it small eventually.
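
For anyone unfamiliar, the distillation step usually looks something like this minimal PyTorch sketch (standard logit-matching with a temperature; the models here are stand-ins):

```python
# Minimal sketch of knowledge distillation: the small "student" is trained to
# match the softened output distribution of the large, frozen "teacher".
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # KL divergence between temperature-softened teacher and student distributions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

teacher = torch.nn.Linear(16, 10).eval()    # stand-in for the big trained model
student = torch.nn.Linear(16, 10)           # stand-in for the small model
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

x = torch.randn(32, 16)                     # placeholder batch
with torch.no_grad():
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits)
loss.backward()
optimizer.step()
```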

5

trnka t1_j8hcpwt wrote

I've been learning more about multilingual neural machine translation models lately, such as the one in Google's recent paper:

Bapna, A., Caswell, I., Kreutzer, J., Firat, O., van Esch, D., Siddhant, A., Niu, M., Baljekar, P., Garcia, X., Macherey, W., Breiner, T., Axelrod, V., Riesa, J., Cao, Y., Chen, M. X., Macherey, K., Krikun, M., Wang, P., Gutkin, A., … Hughes, M. (2022). Building Machine Translation Systems for the Next Thousand Languages.

I'm not sure I understand why it works for languages with no parallel data with any other language, though... for instance, Latinized Hindi doesn't have any parallel data. Why would the encoder or decoder representations of Latinized Hindi be compatible with any other language?

Is it because byte-pair encoding is done across languages, so Latinized Hindi will have some word overlap with languages that DO have parallel data? And that encourages the learning algorithm to represent those languages in the same latent space?
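
To make my own intuition concrete, something like this minimal sketch is what I have in mind (mT5's shared SentencePiece vocabulary is just a stand-in here, not the paper's actual vocab):

```python
# Minimal sketch: one shared subword vocabulary tokenizes every language, so
# Latinized Hindi and English sentences overlap on common pieces.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/mt5-small")  # multilingual SentencePiece vocab

latinized_hindi = tok.tokenize("main kal office nahi jaunga")
english = tok.tokenize("i will not go to the office tomorrow")

print(latinized_hindi)
print(english)
print(set(latinized_hindi) & set(english))  # shared pieces, e.g. the subwords of "office"
```

Since those shared pieces use the same embeddings, I'd guess the model is pushed to place both languages in one latent space, which is what I mean by word overlap above.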

2

andreichiffa t1_j8hawd4 wrote

I reported to Hugging Face what its detector was being used for and its failure modes (hint: false positives are worse) in the first days of December. They decided to keep it up. It's on their conscience.

Same thing with API providers. Those willing to sell you one are selling you snake oil. It's on their conscience.

Same thing for you. You want to build an app that sells snake oil that can be harmful in a lot of scenarios? It's on your conscience.

But at that point you don't even need an API to build it.

1