Recent comments in /f/MachineLearning

MrCheeze t1_j661b7r wrote

This seems like a major increase in quality compared to past attempts. And with long-term coherency too; check out those 5-minute tracks.

And if that wasn't enough, we even got an additional mode that lets you provide a melody of your own and ask for an arrangement. Should be very useful for composition.

Assuming that these results aren't cherry-picked or otherwise misleading, I'd be very excited to try to make music with an open replication of this.

9

feloneouscat t1_j65vzjx wrote

>Make some minor grammar mistakes while writing the post.

Huh. So you told it to do something it wouldn’t ordinarily do.

This seems akin to a salesman who took a sledge to a product and then argued that it breaks in the field (true story). When you leave that off, does the paragraph get caught? Or did you muck about to find something that ensured it would think it was human-generated?

1

albertzeyer t1_j65rtdq wrote

What do you mean? There are many such papers where people only use attention-based encoder-decoder (AED) for speech recognition. Some random papers:

See my PhD thesis for an overview of CTC, AED, RNN-T and other approaches: https://www-i6.informatik.rwth-aachen.de/publications/download/1223/Zeyer--2022.pdf

I call this "sequence-to-sequence architecture".

I think most people nowadays use RNN-T.

Some people use CTC just because of its simplicity. It might also be more stable and behave more sanely on long sequences, where AED might break, and online streaming is simpler with CTC than with AED.

AED is clearly better than CTC. But RNN-T is also better than CTC.

Of course, a combination is better still. So AED+CTC is better than either AED or CTC alone. And ESPnet, a very popular open source framework, has this implemented, so many people just use that.
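
For illustration, a common way to train such an AED+CTC combination is to put both losses on a shared encoder and interpolate them with a single weight. The sketch below is a minimal one under that assumption: the function name `joint_ctc_attention_loss` and the default weight of 0.3 are hypothetical choices for this example, not ESPnet's actual API or recommended setting.

```python
def joint_ctc_attention_loss(ctc_loss: float, att_loss: float,
                             ctc_weight: float = 0.3) -> float:
    """Interpolate a CTC loss and an attention (AED) loss.

    ctc_weight = 0.0 gives pure AED training, 1.0 gives pure CTC training.
    The 0.3 default is an assumed value for illustration only.
    """
    if not 0.0 <= ctc_weight <= 1.0:
        raise ValueError("ctc_weight must be in [0, 1]")
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * att_loss


# Example: per-batch loss values as they might come out of the two heads.
loss = joint_ctc_attention_loss(ctc_loss=2.0, att_loss=1.0, ctc_weight=0.5)
```

With the weight at either extreme, this reduces to training with one of the two objectives alone, which makes it easy to compare the combination against each baseline.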

2