Recent comments in /f/MachineLearning

Blutorangensaft OP t1_j6356ho wrote

Compare different autoencoders in their ability to create valid language in a continuous space. Later, I want to generate sentences in the autoencoder's latent space using another neural network, and have them decoded to real sentences by the autoencoder. I want the space to be smooth because the second neural net will naturally be using gradient descent, which involves infinitesimal changes. I believe this network will perform better if those changes correspond to meaningful distances between real sentences.
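A toy sketch of that setup, with every number made up: a frozen 2x2 linear map stands in for the trained decoder, and plain gradient descent moves a latent code until its decoding matches a target. If the space is smooth, small steps in z translate into small, meaningful changes in the decoded output.

```python
# Toy latent-space optimization. The 2x2 matrix W is a stand-in for a
# trained autoencoder's decoder (a real setup would backprop through
# the actual network); the target is an arbitrary decoded output.

W = [[1.0, 0.5],
     [0.0, 2.0]]        # hypothetical frozen "decoder" weights
target = [2.0, 4.0]     # desired decoded output

def decode(z):
    return [sum(W[i][j] * z[j] for j in range(2)) for i in range(2)]

def loss(z):
    return sum((o - t) ** 2 for o, t in zip(decode(z), target))

def grad(z):
    # Analytic gradient of the squared error: 2 * W^T (W z - target)
    resid = [o - t for o, t in zip(decode(z), target)]
    return [2 * sum(W[i][j] * resid[i] for i in range(2)) for j in range(2)]

z = [0.0, 0.0]
for _ in range(200):
    z = [zi - 0.05 * gi for zi, gi in zip(z, grad(z))]
# z now decodes (approximately) to the target
```

With a real decoder the gradient would come from autodiff rather than a closed form, but the loop is the same.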

1

InsidiousApe t1_j634sxw wrote

I enjoy that this is the simple questions thread. :)

Let me ask something much simpler, although in three parts. I am a web developer with no ML experience, but with a specific project in mind. I'd like to understand the process a touch better in order to help me find a programmer to work alongside (paid of course).

(1) Provided the information is easily found, via an API for instance, what is the ingestion process like time-wise for very large amounts of information? I realize that depends on the physical size of the data, but are there other things going on in that process which take time?

(2) What is the process for programming a system to look for correlations in data that no one may have seen before? This is what I'm truly looking to do once that information is taken in. For example, a ton of (HIPAA-compliant) medical information is taken in, and I'm looking to build a system that can look for commonalities among people with a thyroid tumor. Obviously there would be tons of tweaking to those results, but what is the process that allows this to happen?

1

crt09 t1_j633u7c wrote

I think there's a miscommunication: it sounds like you think I'm proposing a training method, but I'm suggesting how to measure smoothness.

If you have the BLEU distances between input sentences and the distances between their latents, you can measure how the distances change between the two, which I *think* would indicate smoothness. Or you could do some other measurements on the latents to see how smoothly(?) they are distributed? tbh I'm not entirely sure what you mean by smooth, sorry.

If you're looking to measure performance, wouldn't the loss for the training method you mentioned be useful?

Or are you looking to measure performance on the decoding side?
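A minimal sketch of that measurement idea. The unigram distance is a crude stand-in for a BLEU-based distance (real code would use a proper implementation such as sacrebleu), and the encodings are made-up numbers standing in for `encoder(sentence)` outputs:

```python
import itertools
import math

def unigram_distance(a, b):
    # Crude stand-in for a BLEU-based distance: 1 - unigram set overlap.
    sa, sb = set(a.split()), set(b.split())
    return 1.0 - len(sa & sb) / len(sa | sb)

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical latent codes -- in practice these come from the encoder.
data = {
    "the cat sat on the mat": [0.1, 0.2],
    "the cat sat on the rug": [0.15, 0.25],
    "stock prices fell sharply today": [0.9, 0.8],
}

pairs = list(itertools.combinations(data, 2))
text_d = [unigram_distance(a, b) for a, b in pairs]
lat_d = [math.dist(data[a], data[b]) for a, b in pairs]

# High correlation suggests latent distances track text distances.
r = pearson(text_d, lat_d)
```

Whether a high correlation is the same thing as "smoothness" is exactly the open question in this thread; this only checks that nearby sentences get nearby latents.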

1

crt09 t1_j6317t4 wrote

Just speaking from gut here, but you could go the other way around: get sentences with varying BLEU differences, encode them all, and see how distant their latent representations are. This way you wouldn't have to worry about the effect of the validity of the generated sentences, which might be a problem with the other direction (I think).

1

ThatInternetGuy t1_j6300ue wrote

Stable Diffusion is made up of a VAE image encoder/decoder, a CLIP text encoder, and a U-Net trained via a diffusion process.

GAN-based text2image models are built mainly on ResNets trained with a generator+discriminator process.

IMO, you're looking for differences between U-Net and ResNet. There are a few differences:

  • Training a ResNet in that fashion is much more unpredictable.
  • With ResNet, you have to code a good custom discriminator (the component that scores the output images) for your specific model. With U-Net, the diffusion process takes care of this by itself.
  • ResNet output is limited to 128x128 (maybe scalable, though).
  • Scaling a ResNet up doesn't necessarily make it more capable; its performance doesn't scale with the amount of training data. A U-Net can scale as big as the VRAM allows and will take advantage of more training data.

For the big guys, it's really that last bullet point they need. They want a model that scales with the amount of training data, so they can just throw more powerful hardware at it for more competitive results. A GAN can cost several thousand dollars to train and would hit its performance ceiling too soon. A Latent Diffusion model can cost as much as you can afford, and its performance will keep improving with more resources thrown at it.
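To make the discriminator point concrete, here is a toy scalar contrast (none of this is actual Stable Diffusion code; the 1-D "images" and parameters are invented): the GAN objective depends on a learned scorer D, while the diffusion objective regresses against a noise target that the forward process hands you for free.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gan_losses(d_w, d_b, real, fake):
    # The discriminator is a learned scorer: D(x) = sigmoid(w*x + b).
    # Its quality depends entirely on how well you design and train it.
    d_real = sigmoid(d_w * real + d_b)
    d_fake = sigmoid(d_w * fake + d_b)
    d_loss = -(math.log(d_real) + math.log(1.0 - d_fake))
    g_loss = -math.log(d_fake)  # generator tries to fool D
    return d_loss, g_loss

def diffusion_loss(pred_noise, true_noise):
    # No learned critic: the target is the exact noise the forward
    # process added, so the loss is a plain regression objective.
    return (pred_noise - true_noise) ** 2

d_loss, g_loss = gan_losses(d_w=2.0, d_b=0.0, real=1.0, fake=-1.0)
diff_loss = diffusion_loss(pred_noise=0.3, true_noise=0.5)
```

The adversarial losses are a moving target that depends on another network's weights, which is one intuition for the training instability mentioned in the first bullet.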

−4

Thanos_nap OP t1_j62yrcn wrote

Yes, this is helpful. Thank you.

So to give you an idea of the actions, there are actions from our end and customer actions (for marketing): email/SMS/etc. communications from our end, and email opens / SMS clicks by the customer.

Transaction data actions: Bought x on date and time, bought y on date and time, etc.

All of it is arranged as per the timestamp of that action.

For the .fit() part, I'm passing the data in the same manner as you mentioned, but I'm not sure why the error is still there. Will check the tutorial someone else has posted!

1

nmfisher t1_j62y29r wrote

Slight tangent - has anyone ever tried "fine-tuning" a large speech recognition model (e.g. Whisper) by feeding it a training set and pruning activations? The idea is that only a subset of weights/activations is necessary for a given speaker/dataset, so you could compress a larger model into a smaller one that performs equally well on that subset of data (and then continue training conventionally). Presumably this would require some degree of sparsity to begin with?
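A tiny sketch of the selection step in magnitude pruning, on a made-up weight list (a real experiment on Whisper would prune a pretrained model, e.g. via `torch.nn.utils.prune`, then fine-tune on the target speaker's data):

```python
# Toy magnitude pruning: zero out the smallest-magnitude weights of a
# layer, keeping only a fraction of them.

def prune_by_magnitude(weights, keep_fraction):
    k = max(1, int(len(weights) * keep_fraction))
    # Threshold = magnitude of the k-th largest weight.
    threshold = sorted((abs(w) for w in weights), reverse=True)[k - 1]
    return [w if abs(w) >= threshold else 0.0 for w in weights]

layer = [0.8, -0.05, 0.3, 0.01, -0.9, 0.02]
sparse = prune_by_magnitude(layer, keep_fraction=0.5)
# sparse == [0.8, 0.0, 0.3, 0.0, -0.9, 0.0]
```

Whether the surviving weights are the ones that matter for a *specific* speaker is the interesting part: activation-based criteria on the target data, rather than raw weight magnitude, would be closer to the idea above.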

29

angkhandelwal749 OP t1_j62urqo wrote

>https://adataanalyst.com/wp-content/uploads/2021/05/Infra-Tooling3.png

Understood! Thanks so much for that. I also wanted to understand, at the core, the thinking process of an ML engineer: what parameters do they prioritise while choosing a tool, like user experience or service? A lot of features, or just a few quality features done well?

1