Recent comments in /f/MachineLearning

prototypist t1_j6ljszc wrote

I just barely got into text NLP when I could run notebooks with a single GPU / Colab and get interesting outputs. I've seen some great community models (such as Dhivehi language) made with Mozilla Common Voice data. But if I were going to collect a chunk of isiXhosa transcription data, and try to run it on a single GPU, that's hours of training to an initial checkpoint which just makes some muffled noises. At the end of 2022 it became possible to fine-tune OpenAI Whisper, so if I tried again, I might start there. https://huggingface.co/blog/fine-tune-whisper

Also I never use Siri / OK Google / Alexa. I know it's a real industry but I never think of use cases for it.

2

pronunciaai t1_j6l49ij wrote

Yeah I work in the space (mispronunciation detection) and there is no lack of frameworks (speechbrain, NeMo, and thunder-speech being the more useful ones for custom stuff imo). The barrier to entry is all the stuff you have to learn to do audio ML, and all the pain points around stuff like CTC. Tutorials are more needed than frameworks to get more people actively working on speech and voice in my opinion.
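
For readers who have not run into CTC before, here is a minimal sketch of greedy CTC decoding, the step that turns per-frame predictions into a label sequence. It assumes the blank symbol has index 0 and uses a made-up three-letter vocabulary; none of the names come from speechbrain, NeMo, or thunder-speech, and real decoders usually add beam search and a language model on top.

```python
from itertools import groupby

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_ids, blank=BLANK):
    """Collapse per-frame argmax labels into an output label sequence."""
    # Step 1: merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(frame_ids)]
    # Step 2: drop blanks, which only separate repeats or mark "no output".
    return [label for label in collapsed if label != blank]


# Hypothetical vocabulary and frame-level predictions, for illustration only.
vocab = {1: "c", 2: "a", 3: "t"}
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
decoded = "".join(vocab[i] for i in ctc_greedy_decode(frames))
print(decoded)
```

The blank is what lets a model emit a genuinely doubled label: frames `[1, 1, 0, 1]` decode to two copies of label 1, while `[1, 1, 1]` decodes to one. Getting used to that alignment-free view is a fair share of the learning curve.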

6

starfries t1_j6l0aeq wrote

Thanks for that resource, I've been experimenting with the lottery ticket method but that's a lot of papers I haven't seen! Did you initialize the weights as if training from scratch, or did you do something like trying to match the variance of the old and new weights? I'm intrigued that your method didn't hurt performance - most of the things I've tested were detrimental to the network. I have seen some performance improvements under different conditions but I'm still trying to rule out any confounding factors.
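
One possible reading of "match the variance of the old and new weights" is sketched below: draw fresh Gaussian weights, then rescale them so their standard deviation equals that of the weights being replaced. This is an illustrative assumption, not the method from any of the papers mentioned; the function name and the flat-list representation of a layer are hypothetical.

```python
import random
import statistics


def reinit_matching_std(old_weights, seed=0):
    """Draw fresh zero-mean weights whose std matches that of old_weights."""
    rng = random.Random(seed)
    target_std = statistics.pstdev(old_weights)
    # Fresh draw from a standard normal, one value per old weight.
    fresh = [rng.gauss(0.0, 1.0) for _ in old_weights]
    fresh_mean = statistics.fmean(fresh)
    fresh_std = statistics.pstdev(fresh)
    # Standardise the sample, then scale it to the old layer's spread.
    return [(w - fresh_mean) / fresh_std * target_std for w in fresh]


# Hypothetical trained weights for one small layer, flattened to a list.
old = [0.5, -0.2, 0.1, -0.4, 0.3, 0.05]
new = reinit_matching_std(old)
print(statistics.pstdev(old), statistics.pstdev(new))
```

The alternative in the question, initializing as if training from scratch, would skip the rescaling and use the framework's default initializer instead.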

1

thanks_champagne t1_j6l062p wrote

How do I find someone who has access to medical imaging models? I have found a couple open source resources but not sure if I have the skills/time to install the code. Specifically, I would like machine learning to analyze the scans I have of my left eye. I have a rare eye condition that has so far been deemed idiopathic.

3

babua t1_j6khgfr wrote

I don't think it stops there either, streaming architecture probably breaks core assumptions of some speech models. e.g. for STT, when do you "try" to infer the word? for TTS, how do you intonate the sentence correctly if you don't know the second half? You'd have to re-train your entire model for the streaming case and create new data augmentations -- plus you'll probably sacrifice some performance even in the best case because your model simply has to deal with more uncertainty.

3

edunuke t1_j6k9zjc wrote

In the UK there are at least 3 types of PhDs: 1) PhD by thesis, 2) PhD by publication and 3) professional PhD.

Options 2 and 3 may be compatible with what you want. It really depends on your advisor, the source of funding, and your motivation. There are many reasons to pursue a PhD that don't require an academic motivation, and you are entitled to your own reasons. I've met people who have done PhDs while also holding a job, but those are the exception rather than the norm, and it is challenging to say the least.

1

likenedthus t1_j6jy1mw wrote

You probably won’t find a good PhD program that is “advertised” as being online. There are just too many variables. That said, it’s absolutely possible to work something out with your advisors that is effectively remote, assuming that most of your courses can be taken remotely and your research doesn’t require tools/resources that you cannot access remotely.

I’m doing my PhD in cognitive science at an international university (I live in the states). Plenty of online coursework has been made available to me to meet those requirements. But I am 100% responsible for proposing and executing remote advisement/evaluation. I’m also required to visit the university 1–2 times per year for 1–3 weeks per visit.

1

Brudaks t1_j6jqizr wrote

Availability of corpora for other languages.

If you care about languages that are much less resourced than English or the other big ones, then you can generally get sufficient text to do interesting stuff, but working with speech becomes much more difficult due to the very limited quantity of decent-quality data.

2