Recent comments in /f/MachineLearning

bloc97 t1_j63q1nk wrote

> It's simpler (which leads to progress)

I wouldn't say current diffusion models are simpler; in fact, they are much more complex than even the most "complex" GAN architectures. However, it's exactly because of all the other points that they've been able to become this complex. A vanilla GAN would never endure this much tweaking without mode collapse. Compare that to even the most basic score-based models, which are always stable.
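To make the contrast concrete, here's a toy sketch of a denoising-style training step (names and the linear noise schedule are made up for illustration, not any particular paper's): the objective is plain regression on the noise, with no adversarial min-max game to destabilize things.

```python
import torch

def denoising_loss(model, x0):
    """Toy denoising objective (DDPM-flavored, heavily simplified)."""
    t = torch.rand(x0.shape[0], device=x0.device)             # random noise level per sample
    noise = torch.randn_like(x0)                              # target noise
    alpha = (1.0 - t).view(-1, *([1] * (x0.dim() - 1)))       # made-up linear schedule
    x_t = alpha.sqrt() * x0 + (1.0 - alpha).sqrt() * noise    # corrupt the data
    return torch.nn.functional.mse_loss(model(x_t, t), noise) # plain MSE, no discriminator
```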

Sometimes, the "It just works™" proposition is much more appealing than pipeline simplicity or speed.

2

TFenrir t1_j63np8c wrote

ChatGPT summary (so take it with many grains of salt):

> The paper is discussing a machine that can create new songs or music. They are testing to see if the machine is able to memorize songs or if it can come up with new ones. They are looking at how well the machine does when given different amounts of information to work with. They found that even when given a lot of information, the machine is not able to create exact copies of songs. However, it can create similar songs. They also found that when the machine is given very little information, the songs it creates are not very diverse. They include examples of the machine's output in the accompanying material.

30

anony_sci_guy t1_j63nj0u wrote

This was exactly my first thought too: free up all those extra parameters & re-randomize them. The problem could be a big distributional gap between the pre-tuned weights and the re-randomized ones, so you'd want different step sizes for each. I've played with it before & ran into this problem, but got too lazy to actually implement a solution. (I'm actually a biologist, so I don't really have the bandwidth to dig into the ML side as much.)
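If anyone wants to try it, a minimal sketch of the step-size fix in PyTorch would be per-parameter-group learning rates (the toy model and which layer counts as "fresh" are made up here):

```python
import torch
import torch.nn as nn

# Toy stand-in for a pruned-and-regrown network.
model = nn.Sequential(nn.Linear(64, 64), nn.Linear(64, 10))
nn.init.kaiming_normal_(model[1].weight)  # pretend layer 1 was re-randomized

optimizer = torch.optim.Adam([
    {"params": model[0].parameters(), "lr": 1e-5},  # gentle steps for pre-tuned weights
    {"params": model[1].parameters(), "lr": 1e-3},  # bigger steps for the fresh ones
])
```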

3

K3tchM t1_j63l7xu wrote

I don't know which numerical optimization problem OP is trying to solve, but one major weakness of this paper is that its method requires two solver calls per instance per epoch... Training time might quickly become intractable.
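Back-of-the-envelope on that cost (all numbers here are hypothetical, just to show the scaling):

```python
n_epochs, n_instances = 100, 10_000
solver_seconds = 0.5                           # hypothetical time per solver call
hours = 2 * n_epochs * n_instances * solver_seconds / 3600
print(f"{hours:.0f} solver-hours")             # ~278 hours on these made-up numbers
```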

OP should have a look at other methods that aim to solve this problem efficiently, such as https://arxiv.org/abs/2112.03609 or, more recently, https://arxiv.org/abs/2203.16067

1

tahansa t1_j63fqca wrote

"Is it a memorization machine or can it create new songs?"

From the paper:
"Memorization analysis. Figure 3 reports both exact and
approximate matches when the length of the semantic token
prompt is varied between 0 and 10 seconds. We observe
that the fraction of exact matches always remains very
small (< 0.2%), even when using a 10 second prompt to
generate a continuation of 5 seconds. Figure 3 also includes results for approximate matches, using τ = 0.85.
We can see a higher number of matches detected with this
methodology, also when using only MuLan tokens as input
(prompt length T = 0) and the fraction of matching examples increases as the length of the prompt increases. We
inspect these matches more closely and observe that those
with the lowest matching score correspond to sequences
characterized by a low level of token diversity. Namely, the
average empirical entropy of a sample of 125 semantic tokens is 4.6 bits, while it drops to 1.0 bits when considering
sequences detected as approximate matches with matching
score less than 0.5. We include a sample of approximate
matches obtained with T = 0 in the accompanying material.
Note that acoustic modeling carried out by the second stage
introduces further diversity in the generated samples, also
when the semantic tokens match exactly."
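"Empirical entropy" here is just the Shannon entropy of the token histogram; a rough sketch of my reading (not the paper's actual code):

```python
import math
from collections import Counter

def empirical_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution."""
    n = len(tokens)
    return -sum(c / n * math.log2(c / n) for c in Counter(tokens).values())

# Diverse sequences score high; repetitive ones collapse toward 0 bits,
# which matches the 4.6-bit vs 1.0-bit gap the paper reports.
print(empirical_entropy(list(range(125))))      # maximal diversity: ~7 bits
print(empirical_entropy([7] * 120 + [3] * 5))   # low diversity: ~0.24 bits
```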

15

tahansa t1_j63f88w wrote

Incredible stuff.

Gotta get the copyright questions raised by the visual NNs solved before these audio models hit the mainstream.

The progress of these audio models is getting me much more stoked than that of the image models.

28

jackilion t1_j63e6ah wrote

There is no reason to assume your latent space will be smooth by itself. I remember a paper on image generation that introduced techniques for smoothing out the latent space during training:

https://arxiv.org/abs/2106.09016

It's about GANs, not autoencoders, but maybe you can find some ideas in there.
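The generic version of that idea, if you want to try it on an autoencoder (a sketch of the general trick, not that paper's exact loss): perturb the latent and penalize how much the decoded output changes.

```python
import torch

def smoothness_penalty(decoder, z, eps=1e-2):
    """Encourage nearby latents to decode to nearby outputs (generic sketch)."""
    z_perturbed = z + eps * torch.randn_like(z)
    return torch.nn.functional.mse_loss(decoder(z), decoder(z_perturbed))
```

You'd add this as a regularizer on top of the reconstruction loss, weighted so it doesn't push the decoder toward a constant output.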

1

Thanos_nap OP t1_j63dyc3 wrote

There is a temporal component. These customer actions are week-wise, so the data is: customer ID, week number, action, converted yes/no.

I can get this into a 3D shape with the time step as the week and features = actions, but I'm confused about what the batch would be here.
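Roughly the shape I have in mind (sizes made up), so would the batch just be customers?

```python
import numpy as np

n_customers, n_weeks, n_actions = 1000, 52, 20   # made-up sizes

# Multi-hot actions per (customer, week), plus one conversion label per customer.
X = np.zeros((n_customers, n_weeks, n_actions), dtype=np.float32)
y = np.zeros(n_customers, dtype=np.float32)

# An RNN would then see (batch=customers, time=weeks, features=actions).
```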

But yes, I agree with you that this is not the best method for my use case!

2

suflaj t1_j63bf1q wrote

Well, for starters, it would probably perform worse due to so many redundant features, and it would be much slower.

Remember that the embedding layer carries loads of overhead, as we're talking V × d matrices. So for a vocabulary of 250k and an embedding dim of 768, e.g., we're talking about 192M parameters just for the embedding layer. Maybe you can save some space with a sparse embedder, but find me a free implementation of sparse layers that works as well as dense ones. Other than that, those 192M parameters take up 768 MB of memory before any compression (4 bytes per float32), and the gradient, unless sparsified, will be another 768 MB PER BATCH.
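Back-of-the-envelope, to make those numbers concrete:

```python
V, d = 250_000, 768            # vocab size, embedding dim
params = V * d                 # 192,000,000 parameters
mb_fp32 = params * 4 / 1e6     # 4 bytes per float32 parameter
print(params, mb_fp32)         # 192M params -> ~768 MB
# A dense gradient over the embedding matrix costs another ~768 MB per step,
# unless it's sparsified.
```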

This is without mentioning that you would likely need to increase the embedding dim to account for the 8-times-bigger vocabulary.

2