Recent comments in /f/MachineLearning
RemarkableSavings13 t1_ja12aii wrote
Reply to comment by Brunt__ in [D] Looking for someone to do a small coding job by Brunt__
I know, but if you're paying for a custom voice that can't be cheap. I'd guess you'll be paying 5 figures at least for something like this, since you can't buy it without "contacting a sales rep". Your sales rep will be able to tell you if offline models are available, they often are but just aren't advertised.
To be honest though it sounds like you may be out of your depth, the Google custom voice product is expecting you to be a company with a deep pocketbook and a professional voice actor doing the reading. Is that who you are? If you're just some person who wants to use your own voice to read books, look into some of the zero shot TTS tools other people have posted.
SweatyBicycle9758 t1_ja107ho wrote
Where r u doing phd
Brunt__ OP t1_ja0xlvw wrote
Reply to comment by RemarkableSavings13 in [D] Looking for someone to do a small coding job by Brunt__
I haven't found that option on their site.
Brunt__ OP t1_ja0xi5e wrote
Reply to comment by junetwentyfirst2020 in [D] Looking for someone to do a small coding job by Brunt__
I appreciate this feedback.
RemarkableSavings13 t1_ja0wqn2 wrote
Reply to comment by Brunt__ in [D] Looking for someone to do a small coding job by Brunt__
You’re already committed to paying for the custom voice? Honestly if you’re already paying for that they might just offer the option to buy an offline model you can run on prem.
junetwentyfirst2020 t1_ja0wjzs wrote
Reply to comment by Brunt__ in [D] Looking for someone to do a small coding job by Brunt__
Most people in this field who are able to get jobs in this field have an undergrad in computer science, and a masters degree. It’s applied math + computer science, which is different from being a web developer. There are no people with these degrees who are struggling to find work currently, and they command relatively high salaries at their jobs (>150k USD guaranteed).
You might be able to find a regular dev who could put this together, but if something doesn’t work out of the box the chances that they’ll know how to address the problem is pretty much zero because it’s not just a coding issue. We don’t even look at resumes that don’t have a masters degree because it really is important that the candidate can do all kinds of math, knows the family of algorithms, how to train DL models well, can explain why something did or didn’t work, can do analysis of data and results, and can also write efficient code. LOL it’s a stressful field 😝
PassionatePossum t1_ja0sovs wrote
Reply to comment by SuchOccasion457 in [D] Cost of data acquisition by SuchOccasion457
Sorry, I am not aware of any literature. The contractual stuff is mostly handled between ours and the hospital's lawyers. I'm not really involved in all of that.
And I'm not even sure that you'll find a one-size fits all answer. The requirements for a collaboration can vary wildly. A private practice usually has much more flexibility when it comes to technical infrastructure. In a hospital, the IT department usually wants to know when you are planning to connect stuff to their systems. But there are certain diseases that you are very unlikely to see in a private practice.
I would do the following: Once you know what kind of data you need, talk to physicians to understand their workflow. Then make a proposal how to collect the data and talk it through with them or their lawyers. If there is a potential problem with the plan, they'll tell you.
Once you know the workflow, you'll probably also have an idea how long it will take for them to collect the data you are looking for and from there you can make an educated guess how much it is going to cost you.
The rest is up for negotiation. As far as I know we have contracts that have built-in safeguards for both the physicians as us. They get a fixed price for a fixed number of hours they work for us. And they guarantee a certain minimum number of recorded procedures. If they can deliver more in the alotted time, even better.
savage_slurpie t1_ja0nsfe wrote
$50,000 final offer
SuchOccasion457 OP t1_ja0nas2 wrote
Reply to comment by PassionatePossum in [D] Cost of data acquisition by SuchOccasion457
thank you very much for the ellaborate response! are you aware of any literature, even if informal, that goes into details about the whole process with any type of rough estimates?
Dragonsareforreal t1_ja0mzvn wrote
Look into Tortoise TTS https://github.com/neonbjb/tortoise-tts and read the Adding a new voice part of things, that would be a good starting point.
jobeta t1_ja0m0og wrote
Reply to comment by SuchOccasion457 in [D] Cost of data acquisition by SuchOccasion457
Yes but just pick two or three and ask? Also check on Amazon mechanical Turk if you find labeling job listed and the rates. I have only needed this one but used upwork. We paid well and it was a while ago so I don’t think the price I will give you will be a good reference.
PassionatePossum t1_ja0llsg wrote
Reply to [D] Cost of data acquisition by SuchOccasion457
The data is always the most expensive part. I work in the medical device industry and it strongly depends on the type of data and how much effort it is for the physicians to collect it.
In the simplest case you can just run a recording device while they are doing their procedures. But of course it rarely is that simple: You need to be careful not to capture any data that can be used to personally identify the patient (and the definition of personally identifying information is - at least in Europe - extremely wide).
The next question is: Do you need any lab data as groundtruth? If the answer is "yes", it will create a lot of effort for the physician because he/she can not simply record the data. They will have to keep track of the patients, recordings and diagnosis and annotate them later accordingly.
Another thing to keep in mind is: In many cases you cannot just connect a non-certified device to a medical device. You often need special recording hardware that is medically certified. That probably mostly is the case for surgical devices. The rules for MRI images migth be more relaxed. I don't know.
As a rough guideline you can expect to pay physicians around 200€ / hour (in the U.S. likely even more than that). And as I said: How much data you get for that, strongly depends on the type of data that you collect.
Brunt__ OP t1_ja0kj6v wrote
Reply to comment by RemarkableSavings13 in [D] Looking for someone to do a small coding job by Brunt__
Yes
RemarkableSavings13 t1_ja0kg0w wrote
So you want to use Google Custom Voice service to create a model of your own voice, then distill that voice into a custom on-device model?
Brunt__ OP t1_ja0jcfh wrote
I don't mind paying for the task. I was not expecting it to be inexpensive.
NotARedditUser3 t1_ja0in8n wrote
You should paste this into chatgpt. You might get some useful resources on where to go. Short answer.... You expect way too much for a budget of almost nothing
aidenr t1_ja0igb4 wrote
“I’m not a mechanic but I’d like a custom motorcycle. Seems easy enough, anyone up for the task? Or recommend me a commodity worker who can do it for nearly zero. Thanks!”
SuchOccasion457 OP t1_ja0c7x9 wrote
Reply to comment by jobeta in [D] Cost of data acquisition by SuchOccasion457
have you seen anyone selling datasets? I found one webpage that openly lists prices, everything else seems rather closed :( Everyone does per-user pricing
jobeta t1_ja08xy1 wrote
Reply to [D] Cost of data acquisition by SuchOccasion457
I don’t think there is a general answer to that. For labeling there are multiple services that you can use. You could just contact them and ask or look if they advertise how much they pay people to label to get a proxy. For the data itself, it completely depends on the data. I would imagine medical data would be hard to obtain and require some legal consideration around privacy (at least I would hope so).
[deleted] t1_ja083mb wrote
Reply to comment by FrostedFlake212 in [D] Simple Questions Thread by AutoModerator
Yes, the model "thinks" the solution found is the best, but it is not. The model is getting confused because of some complex mathematical results that it gets along the way, and never gets to the optimal solution hence "non optimal solution".
Sometimes it goes even worse: not only it does not converge to the best solution (previous paragraph) but also diverges, i.e the error increases (value grows) instead of decrease. This is less common and maybe just means there are planning errors.
This is just a broad idea.
FrostedFlake212 t1_ja04vmw wrote
Reply to comment by [deleted] in [D] Simple Questions Thread by AutoModerator
Oh wow okay, that makes a lot of sense! So essentially “converging” means, in simpler terms, that the model comes to a conclusion. And what you’re saying is that the model comes to a conclusion too fast on its conditions, and these are good conditions but not the optimal ones.
blueSGL t1_ja00p4i wrote
Reply to [R] [P] New ways of breaking app-integrated LLMs with prompt injection by taken_every_username
I first saw this mentioned 9 days ago by Gwern in the comment here on LW
>"... a language model is a Turing-complete weird machine running programs written in natural language; when you do retrieval, you are not 'plugging updated facts into your AI', you are actually downloading random new unsigned blobs of code from the Internet (many written by adversaries) and casually executing them on your LM with full privileges. This does not end well."
This begs the question, how are you supposed to sanitize this input whilst still keeping them useful?
taken_every_username OP t1_j9zz7jc wrote
Reply to comment by currentscurrents in [R] [P] New ways of breaking app-integrated LLMs with prompt injection by taken_every_username
They mention code completion in the paper too. I guess yea chatgpt isn't really affected but sure seems like connecting them to stuff was the main future selling point
AmalgamDragon t1_j9zyyib wrote
Reply to comment by currentscurrents in [D] Isn't self-supervised learning(SSL) simply a kind of SL? by Linear--
> Rewards are sparse in the real world
This doesn't seem true. The only reason we aren't getting negative rewards (e.g. pain, discomfort, etc.) constantly is that we learn to generally avoid them.
GlorifiedPlumber100 t1_ja12ovz wrote
Reply to [D] Navigating Academic Conferences by MyActualUserName99
If you are presenting in the poster session, have the 20 second long summary of your poster. Most people are at the poster session for the free food or because a friend has a poster. If you can hook a casual observer with a pithy, quick summary, they may stick around for the 5 minute version. If you launch straight into the 5 minute version, people will pretend they can't hear you over the noise of the crowd.