Recent comments in /f/MachineLearning

blackkettle t1_j7tyq1r wrote

This is very interesting, thanks for sharing! Do you have any more detail on RTF vs Accuracy curves? Also did you run this on any other data sets? Librispeech - even the “other” pieces is very clean, simple data from an acoustic and linguistic standpoint.

It would be really interesting to see how well this holds on noisy spontaneous speech like conversations.

4

mongoosefist t1_j7tt6a2 wrote

Emergent behaviour is called such because we don't yet have the ability to predict it, we can only observe it and deduce where it emerged after the fact. SO, the fact that you can't wrap your head around what such an ability would look like makes perfect sense!

If we're speculating I'd put my money on /u/ID4gotten 's answer. I bet one of these models starts integrating some intuition of physical laws.

13

logsinh t1_j7tsvku wrote

Anyway, here is the denoised audio of your example speech: https://www.sndup.net/pbxf/. There is no improvement, your best bet is audio super-resolution.

Input: Speech MOS: 4.259 Noise MOS: 4.369 Overall MOS: 3.927

Output: Speech MOS: 4.263 Noise MOS: 4.403 Overall MOS: 3.947

2

logsinh t1_j7tqjmm wrote

The audio is a bit distorted possibly due to noise gating. I don't see too much noise, so maybe noise reduction is not what you need. The audio has 8 kHz bandwidth (16 kHz sample rate), maybe you may try to use an audio super-resolution network such as https://github.com/mindslab-ai/nuwave2 to increase the audio bandwidth.

3

Initial-Image-1015 t1_j7tqegx wrote

Since there are so many subfields and subsubfields, each of which with an overwhelming amount of research, pick the domain you're interested in an follow the authors of the most relevant papers in that domain on Twitter. Over time, you will naturally find more and more related authors to also follow.

For the other domains, general newsletters give good overviews.

1

programmerChilli t1_j7tpwd7 wrote

Lots of things. You can see a flamegraph here: https://horace.io/img/perf_intro/flamegraph.png (taken from https://horace.io/brrr_intro.html).

Dispatcher is about 1us, but there's a lot of other things that need to go on - inferring dtype, error checking, building the op, allocating output tensors, etc.

4