Recent comments in /f/MachineLearning
_Ruffy_ t1_j635zdc wrote
Reply to comment by nmfisher in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Good idea in principle, anyone know more about this or any references?
Blutorangensaft OP t1_j6356ho wrote
Reply to comment by jackilion in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
To compare different autoencoders in their ability to create valid language in a continuous space. Later, I want to generate sentences in the autoencoder's latent space with another neural network and have the autoencoder decode them into real sentences. I want the space to be smooth because the second network will naturally be trained with gradient descent, which makes infinitesimal changes. I believe that network will perform better if small moves in latent space actually represent meaningful distances between real sentences.
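A toy sketch of what "optimizing in latent space" means here. The "decoder" is a hypothetical 1-D linear map standing in for a real frozen decoder, and the gradient is numerical rather than autodiff; everything here is illustrative:

```python
# Toy illustration (not the actual setup): gradient descent directly in a
# decoder's latent space. A real NLP autoencoder's decoder would replace
# the linear stand-in below.

def decode(z):
    # stand-in for a frozen decoder: maps a latent scalar to an "output"
    return 3.0 * z + 1.0

def loss(z, target):
    return (decode(z) - target) ** 2

def grad(z, target, eps=1e-6):
    # numerical gradient; a real setup would use autodiff
    return (loss(z + eps, target) - loss(z - eps, target)) / (2 * eps)

z, target, lr = 0.0, 10.0, 0.01
for _ in range(500):
    z -= lr * grad(z, target)

print(round(decode(z), 3))  # prints 10.0 -- the latent converged to z = 3
```

The point of wanting a smooth space: these tiny gradient steps only help if tiny moves in `z` correspond to sensible changes in the decoded sentence.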
InsidiousApe t1_j634sxw wrote
Reply to [D] Simple Questions Thread by AutoModerator
I enjoy that this is the simple questions thread. :)
Let me ask something much simpler, although in three parts. I am a web developer with no ML experience, but with a specific project in mind. I'd like to understand the process a touch better in order to help me find a programmer to work alongside (paid of course).
(1) Provided the information is easily found via an API, for instance, what is the ingestion process like, time-wise, for very large amounts of information? I realize that depends on the physical size of the data, but are there other things going on in that process that take time?
(2) To program a system to look for correlations in data where no one may have seen them before, what process is used? That's what I'm truly looking to do once the information is taken in. For example, a ton of (HIPAA-compliant) medical information is taken in, and I want to build a system that can look for commonalities among people with a thyroid tumor. Obviously there'd be tons of tweaking of those results, but what is the process that allows this to happen?
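At its simplest, "finding correlations nobody looked for" starts with a pairwise correlation matrix over all feature columns, flagging the strongest pairs. A pure-Python sketch with made-up patient data (real pipelines use pandas/scipy and far more careful statistics; the feature names and values below are invented):

```python
# Hedged sketch of unsupervised correlation discovery: score every pair of
# numeric features by Pearson correlation and rank the pairs.
import math

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# toy "patient records": each column is a numeric feature (fabricated values)
features = {
    "tsh_level":  [1.2, 3.4, 2.2, 5.1, 0.9],
    "tumor_size": [0.5, 1.9, 1.1, 2.8, 0.4],
    "age":        [34, 51, 29, 63, 45],
}

names = list(features)
pairs = []
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        r = pearson(features[names[i]], features[names[j]])
        pairs.append((abs(r), names[i], names[j]))

for r, a, b in sorted(pairs, reverse=True):
    print(f"{a} ~ {b}: |r|={r:.2f}")
```

The "tweaking" you mention is the hard part: correcting for multiple comparisons, confounders, and non-linear relationships, which is where an ML engineer earns their keep.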
jackilion t1_j634fkx wrote
Reply to comment by Blutorangensaft in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
What's the point of this score?
Blutorangensaft OP t1_j6344jf wrote
Reply to comment by crt09 in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
Ahh, I get you now, my apologies. I'm more interested in the performance on the decoding side indeed, because I want to later generate sentences in that latent space with another neural net and have them decoded to normal tokens.
crt09 t1_j633u7c wrote
Reply to comment by Blutorangensaft in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
I think there's a miscommunication: it sounds like you think I'm proposing a training method, but I'm suggesting how to measure smoothness.
If you have the BLEU distances between input sentences and the distances between their latents, you can measure how the distances change between the two, which I *think* would indicate smoothness. Or you could do some other measurements on the latents to see how smoothly(?) they are distributed? Tbh I'm not entirely sure what you mean by smooth, sorry.
If you're looking to measure performance, wouldn't the loss for the training method you mentioned be useful?
Or are you looking to measure performance on the decoding side?
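A toy sketch of the measurement being proposed here: compare sentence-level distances against latent distances. A crude unigram-overlap score stands in for BLEU, and a bag-of-words counter stands in for the trained encoder; both are placeholders, not the real thing:

```python
# Illustrative only: in a smooth latent space, more similar sentence pairs
# should sit closer together in latent space.
import math

def unigram_sim(a, b):
    # crude BLEU stand-in: fraction of shared tokens
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / max(len(ta), len(tb))

VOCAB = ["the", "cat", "dog", "sat", "ran", "on", "mat"]

def encode(sentence):
    # hypothetical encoder: bag-of-words count vector over a fixed vocabulary
    toks = sentence.split()
    return [toks.count(w) for w in VOCAB]

def latent_dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(encode(a), encode(b))))

pairs = [
    ("the cat sat on the mat", "the cat sat on the mat"),
    ("the cat sat on the mat", "the dog sat on the mat"),
    ("the cat sat on the mat", "the dog ran"),
]
for a, b in pairs:
    print(f"sim={unigram_sim(a, b):.2f}  latent_dist={latent_dist(a, b):.2f}")
# for a smooth space, higher sentence similarity should mean lower latent distance
```

With real BLEU and a real encoder, a rank correlation (e.g. Spearman) between the two distance lists would give a single smoothness-ish score.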
Blutorangensaft OP t1_j632b2s wrote
Reply to comment by crt09 in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
Using slightly different sentences to be decoded to the same sentence exists as an idea in the form of denoising autoencoders, yes. I plan to use this down the road, but for now I am interested in thinking about measuring performance.
crt09 t1_j631rr5 wrote
Reply to [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5
Here's 'Language Modelling with Pixels'! https://openreview.net/pdf?id=FkSp8VW8RjH It gets close to BERT in English performance. It does better in other languages, but that's probably only because BERT wasn't trained much on them, afaik. But still! It's apparently much more viable than thought.
crt09 t1_j6317t4 wrote
Just speaking from gut here, but you could go the other way around: get sentences with varying BLEU differences, encode them all, and see how distant their latent representations are. This way you wouldn't have to worry about the effect of the validity of the generated sentences, which might be a problem with the other direction (I think).
ThatInternetGuy t1_j6300ue wrote
Reply to [D] Why are GANs worse than (Latent) Diffusion Models for text2img generation? by TheCockatoo
Stable Diffusion is made up of a VAE image encoder, a CLIP text encoder, and a U-Net trained in a diffusion process.
GAN-based text2image is made up mainly of a ResNet trained with a generator+discriminator process.
IMO, you're looking for differences between U-Net and ResNet. There are a few:
- Training a ResNet in that fashion is much more unpredictable.
- With ResNet, you have to code a good custom discriminator (the component that scores the output images) for your specific model. With U-Net, the diffusion process takes care of that by itself.
- ResNet output is limited to 128x128 (maybe scalable, though).
- Scaling a ResNet doesn't necessarily make it more capable; its performance doesn't scale with the amount of training data. A U-Net can scale as big as VRAM allows and will take advantage of more training data.
For the big guys, that last bullet point is really what matters. They want a model that scales with the amount of training data, so they can throw more powerful hardware at it for more competitive results. A GAN can cost several thousand dollars to train and would hit its performance ceiling too soon; a latent diffusion model can cost as much as you can afford, and its performance will keep improving with more resources thrown at it.
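The structural difference between the two training signals can be sketched in a pure-Python toy (no real networks; the functions and shapes below are illustrative stand-ins only):

```python
# Conceptual contrast: the diffusion update is plain regression against known
# noise, while the GAN generator's signal comes only through a discriminator,
# which is itself a moving target during training -> instability.
import random

def diffusion_loss(model, x0):
    # forward process: mix the clean sample with known Gaussian noise,
    # then regress the model's noise estimate against that exact noise
    noise = [random.gauss(0, 1) for _ in x0]
    xt = [0.5 * a + 0.5 * n for a, n in zip(x0, noise)]
    pred = model(xt)
    return sum((p - n) ** 2 for p, n in zip(pred, noise)) / len(x0)

def gan_generator_loss(generator, discriminator, z):
    # the generator is only rewarded for fooling the discriminator;
    # there is no fixed regression target
    fake = generator(z)
    return -discriminator(fake)

# trivial stand-ins for the actual networks
model = lambda xt: [0.0 for _ in xt]
gen = lambda z: [2 * v for v in z]
disc = lambda x: sum(x) / len(x)

print(diffusion_loss(model, [0.1, 0.2, 0.3]))      # always a fixed-target MSE
print(gan_generator_loss(gen, disc, [0.5, 0.5]))   # depends on the discriminator
```

The fixed regression target is a big part of why diffusion training stays stable at scale while GAN training needs careful discriminator engineering.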
JustOneAvailableName t1_j62zkw3 wrote
Reply to comment by angkhandelwal749 in [Discussion] Github like alternative for ML? by angkhandelwal749
Check mymlops.com
Thanos_nap OP t1_j62ytd9 wrote
Reply to comment by geldersekifuzuli in [P] Building a LSTM based model for binary classification by Thanos_nap
Thanks, will check this out 👍🏻
angkhandelwal749 OP t1_j62ysay wrote
Reply to comment by Acceptable-Cress-374 in [Discussion] Github like alternative for ML? by angkhandelwal749
Yes, but it needs a lot of configuration. Also, does it do automatic data versioning every time it detects a major change, or do we need to commit each time?
Thanos_nap OP t1_j62yrcn wrote
Reply to comment by guava-bandit in [P] Building a LSTM based model for binary classification by Thanos_nap
Yes, this is helpful. Thank you.
So to give you an idea of the actions, there are actions from our end and customer actions (for marketing):
- Email / SMS / etc. communication from our end
- Email opened / SMS clicked by the customer
Transaction data actions: bought x on date and time, bought y on date and time, etc.
All of it is arranged by the timestamp of that action.
For the .fit() part, I'm passing the data in the same manner as you mentioned, but I'm not sure why the error is still there. Will check the tutorial someone else posted!
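A common cause of .fit() errors with LSTMs is input shape: Keras expects a single array of shape (samples, timesteps, features), so variable-length event histories first need padding to a common length. A pure-Python sketch with made-up 2-feature events (the `[action_type, hours_since_last]` encoding is just an assumed example):

```python
# Pad variable-length per-customer event sequences into a rectangular
# (samples, timesteps, features) batch suitable for an LSTM.

def pad_sequences(seqs, n_features, pad_value=0.0):
    max_len = max(len(s) for s in seqs)
    padded = []
    for s in seqs:
        pad = [[pad_value] * n_features] * (max_len - len(s))
        padded.append(pad + s)          # pre-pad so recent events sit last
    return padded

customers = [
    [[1, 0.0], [2, 5.0]],              # customer with 2 events
    [[1, 0.0], [3, 2.0], [2, 9.0]],    # customer with 3 events
]
batch = pad_sequences(customers, n_features=2)
print(len(batch), len(batch[0]), len(batch[0][0]))  # prints 2 3 2
```

If the resulting shape doesn't match what the LSTM layer's `input_shape` declares (timesteps, features), .fit() will throw exactly the kind of error described.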
angkhandelwal749 OP t1_j62yil7 wrote
Reply to comment by Delicious-View-8688 in [Discussion] Github like alternative for ML? by angkhandelwal749
Also, does any one of them do automatic versioning?
nmfisher t1_j62y29r wrote
Reply to [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Slight tangent - has anyone ever tried "fine-tuning" a large speech recognition model (e.g. Whisper) by feeding a training set and pruning activations? The idea being that only a subset of weights/activations are necessary for a given speaker/dataset, so you can compress a larger model into a smaller one (and then continue retraining conventionally) that performs equally well for a given subset of data. Presumably this would require some degree of sparsity to begin with?
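A minimal sketch of the idea in this comment (not SparseGPT's actual algorithm, and the thresholding rule here is just one naive choice): run a small calibration set through a layer, rank units by average activation magnitude, and zero out the weights of rarely-active units:

```python
# Hypothetical activation-based pruning: keep only the units that fire
# strongly on a given speaker's calibration data, zero the rest.

def prune_by_activation(weights, activations, keep_ratio=0.5):
    # weights: one weight vector per unit
    # activations: per-calibration-sample list of unit outputs
    n_units = len(weights)
    avg_act = [sum(abs(a[u]) for a in activations) / len(activations)
               for u in range(n_units)]
    n_keep = max(1, int(n_units * keep_ratio))
    keep = set(sorted(range(n_units), key=lambda u: -avg_act[u])[:n_keep])
    return [w if u in keep else [0.0] * len(w)
            for u, w in enumerate(weights)]

weights = [[0.2, 0.4], [0.9, 0.1], [0.3, 0.3], [0.5, 0.6]]
acts = [[0.1, 2.0, 0.05, 1.5], [0.2, 1.8, 0.01, 1.1]]  # 2 calibration samples
pruned = prune_by_activation(weights, acts, keep_ratio=0.5)
print(pruned)  # units 1 and 3 survive; units 0 and 2 are zeroed
```

The follow-on fine-tuning step would then retrain only the surviving weights on the target speaker's data.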
Individual-Cause-616 OP t1_j62v04b wrote
Reply to comment by plocco-tocco in [D] score based vs. Diffusion models by Individual-Cause-616
Diffusion I guess
Delicious-View-8688 t1_j62uxhi wrote
Reply to comment by angkhandelwal749 in [Discussion] Github like alternative for ML? by angkhandelwal749
Databricks is a platform that does it all. But technically, Azure, AWS, and GCP all do it too.
angkhandelwal749 OP t1_j62urqo wrote
Reply to comment by curiousshortguy in [Discussion] Github like alternative for ML? by angkhandelwal749
>https://adataanalyst.com/wp-content/uploads/2021/05/Infra-Tooling3.png
Understood! Thanks so much for that. I also wanted to understand, at its core, the thinking process of an ML engineer: what parameters do they prioritise while choosing a tool, like user experience or service? A lot of features, or just a few quality features done well?
angkhandelwal749 OP t1_j62uk80 wrote
Reply to comment by Delicious-View-8688 in [Discussion] Github like alternative for ML? by angkhandelwal749
Is there no platform which does it all? Would WandB cut it?
Acceptable-Cress-374 t1_j62ql8t wrote
Reply to comment by lookatmetype in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
Stable diffusion with proper hands? :)
Acceptable-Cress-374 t1_j62qh5g wrote
Reply to comment by ElectronicCress3132 in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
Thank you for putting it into words, I was having trouble understanding this myself.
geldersekifuzuli t1_j62obhr wrote
There's a great tutorial here: https://youtu.be/ZrgVlfNduj8
He shared the code as well. This can be replicated for your own dataset pretty easily. Two days is more than enough if you just replicate this work for your own dataset.
Reminder: GloVe itself is a word-embedding method, not an LSTM; the LSTM sits on top of the GloVe embeddings.
conv3d t1_j62o8yn wrote
I can’t believe nobody has mentioned MLFlow
kanink007 t1_j638gwv wrote
Reply to comment by AlmightySnoo in [News] AMD Instinct MI300 APU for AI and HPC announced by samobon
Any info about AMD APUs? By now I've given up hoping for AMD to make ROCm available for APUs. I don't know much about Triton: does it support APUs like the 5600G?