Recent comments in /f/MachineLearning
much_bad_gramer t1_j4uiipp wrote
I may try making this
timelyparadox t1_j4uig6f wrote
Reply to [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
It really wants you to make a chat bot, I think it is self aware and biased
Philpax t1_j4uhp74 wrote
I've thought about this and it seems doable (especially with the availability of both YouTube transcripts and Whisper), but training would be quite costly for a hobbyist. I'm excited to see if anyone tackles it, though.
Philpax t1_j4uhmlc wrote
Reply to comment by C0hentheBarbarian in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
For this, I'd infer on the client (especially if you train on the YouTube transcript, so that you don't need to run Whisper over the audio track). Of course, it's much harder to make it a paid product then.
FastestLearner OP t1_j4uhkbm wrote
Reply to comment by C0hentheBarbarian in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
Yes. I did think about that and potential solutions could be:
(1) A startup offering the service in exchange for a small fee - The good thing about it is that once you do an inference on a video, you can serve it to thousands of customers with no additional cost (except for server maintenance and bandwidth, but no extra GPU cost other than the first time you ran it on a particular video).
(2) Crowd-sourced inference - The sponsor-blocking extension currently relies on manual user input, which it sources from the crowd and collects at a central server. So it's basically crowd-sourced (or peer-sourced) manual labour. I'm sure that if someone came up with an automated version, like an executable that runs in the background with very small resource usage, then inference could be crowd-sourced too; the timestamps could then be collected at a central server and distributed across the planet. The good thing about this is that the more people join the peer-sourced inference, the lower the cost of keeping any one peer's GPU busy.
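The "infer once, serve many" idea in both options can be sketched as a simple cache keyed by video ID. This is only an illustration: `SegmentCache`, `fake_infer`, the video ID, and the segment values are all made up for the example, and a real service would need persistent storage rather than an in-memory dict.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[float, float]  # (start_seconds, end_seconds) of a sponsored span


class SegmentCache:
    """Runs the expensive model once per video and serves the stored result afterwards."""

    def __init__(self, infer: Callable[[str], List[Segment]]) -> None:
        self._infer = infer  # hypothetical model call: video ID -> segments
        self._store: Dict[str, List[Segment]] = {}
        self.inference_calls = 0  # counts how often the model actually ran

    def get(self, video_id: str) -> List[Segment]:
        if video_id not in self._store:
            # First request for this video: pay the GPU cost once.
            self.inference_calls += 1
            self._store[video_id] = self._infer(video_id)
        return self._store[video_id]


def fake_infer(video_id: str) -> List[Segment]:
    # Stand-in for a real model; returns a fixed, made-up segment.
    return [(30.0, 75.5)]


cache = SegmentCache(fake_infer)
for _ in range(1000):  # a thousand viewers request the same video
    segments = cache.get("abc123")
print(cache.inference_calls)  # number of times the model was invoked
```

Only the first request triggers the model; every later request is a dictionary lookup, which is why the per-viewer GPU cost drops as the audience grows.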
C0hentheBarbarian t1_j4ug85t wrote
Reply to comment by FastestLearner in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
Training isn't the main issue wrt cost. Inference is.
FastestLearner OP t1_j4ufxgc wrote
Reply to comment by CallFromMargin in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
I am not well acquainted with NLP tasks, so I have no idea how many resources it would take to train a transformer on it (or fine-tune an existing model like BERT on the dataset). If resources are a concern, one could do crowd-sourced training, like LeelaChessZero. I think it's only a matter of time before someone comes along and does this, because blocking ads is the inevitable future of the internet. Also, some company/startup could do it on a subscription model, like the already existing paid ad-blocking software. It's a potential startup idea IMO.
CallFromMargin t1_j4uehtz wrote
How well would it work, and how much would it cost? GPU instances are not cheap, and each minute thousands of hours of YouTube videos are uploaded.
JClub OP t1_j4uc8lc wrote
Reply to comment by buzzbuzzimafuzz in [D] RLHF - What type of rewards to use? by JClub
Yes, that makes sense! But, for example, can you really combine thumbs-up/down feedback with a scale of 1-5? It will be even harder to make them both work together when training the model, right?
JClub OP t1_j4uc0bg wrote
Reply to comment by velcher in [D] RLHF - What type of rewards to use? by JClub
PPO's formula always keeps the gradient update rather small compared with other RL algorithms. I get that the reward is measuring the human's preference, but that does not answer my question: what rewards work best for PPO?
[deleted] t1_j4uaouz wrote
Reply to [P] A small tool that shuts down your machine when GPU utilization drops too low. by nateharada
[removed]
buzzbuzzimafuzz t1_j4u5jrz wrote
Reply to [D] RLHF - What type of rewards to use? by JClub
I think what OpenAI and Anthropic typically do is provide evaluators with two possible responses and have them select which one is better. If you have numerical ratings, it might be hard to calibrate them. From the original paper "Deep reinforcement learning from human preferences" (2017):
>We ask the human to compare short video clips of the agent's behavior, rather than to supply an absolute numerical score. We found comparisons to be easier for humans to provide in some domains, while being equally useful for learning human preferences. Comparing short video clips is nearly as fast as comparing individual states, but we show that the resulting comparisons are significantly more helpful.
ChatGPT seems to be trained from a combination of expert-written examples and upvotes and downvotes on individual messages.
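One common way to turn pairwise comparisons like these into a training signal is a Bradley-Terry style model: the reward model assigns each response a scalar score, and the probability that one response is preferred depends on the score difference. A minimal sketch, assuming made-up scores and hypothetical function names (this is not any lab's actual training code):

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over response B."""
    return 1.0 / (1.0 + math.exp(reward_b - reward_a))


def preference_loss(reward_preferred: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human's choice under the model above."""
    return -math.log(preference_probability(reward_preferred, reward_rejected))


# Made-up reward-model scores for two responses to the same prompt.
print(preference_probability(1.0, 1.0))  # equal scores give 0.5
# The loss is lower when the model already scores the chosen response higher.
print(preference_loss(2.0, 0.0) < preference_loss(0.0, 2.0))  # True
```

Because only the difference between the two scores matters, evaluators never have to agree on an absolute scale, which sidesteps the calibration problem mentioned above.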
2blazen OP t1_j4u5jf7 wrote
Reply to comment by hayder978 in [D] Speaker diarization: reusing fitted speaker embedding clusters? by 2blazen
With my RTX 3060 it takes 3m50s to diarize 1 hour and 20m to do 3 hours (this can be reduced to 16m by presetting the number of speakers; I didn't check a 1h segment like this, and keep in mind that it takes time to load the models into VRAM). However, 5-hour episodes keep getting my process killed after around 40m. It's probably a memory issue, and could even happen during the segmentation, but reusing clusters is a commonly raised issue on GitHub, so it wouldn't just be for my use case.
nmfisher t1_j4typw0 wrote
Reply to comment by Impressive_Iron_6102 in [D] I'm a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice
If I was using one of the newer search engines that let you block domains then Medium would definitely be on my blacklist. The signal-to-noise ratio is just way too low.
towardsdatascience might be slightly better but even if you find something worthwhile, it's probably available somewhere else that doesn't clog up your search results.
mrconter1 OP t1_j4tuaal wrote
Reply to comment by blose1 in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
> This is not testing intelligence, this is testing if human was trained on computer usage, knows what e-mail is and used gmail before.
I don't think it's binary. I think intelligence is a large part here.
> Someone from tribe in Africa would fail your test while he is human and is intelligent,
Could you train a bird to pass all questions on this benchmark? No. Because it's not as intelligent as a human.
> train him on this task like you would train current gen multimodal system and it will pass your benchmark. You train LLM in combination with image model and RL model, train on instruction following using inputs you described and now it understands what it sees, can follow what you want it to do.
Solving this benchmark is an easy problem? How long do you think it will take until we have a model that can casually solve all the instructions I gave in the previous comment?
velcher t1_j4ts9n0 wrote
Reply to [D] RLHF - What type of rewards to use? by JClub
Disclaimer: I'm a deep RL person, so I'm speaking from a pure RL viewpoint. I have never trained LLM with RLHF (yet ;) ).
You can think of rewards as a way of expressing preferences to the model. Then you can reason about what types of rewards to use.
Binary: either the output is good or bad. There is no preference between outputs that are good (they are all 1) or outputs that are bad (they are all 0).
Scale of 1-5: there are 5 preferences of increasing order. In particular, the rank 1 choice is exactly 1 real value (see aside for what the real value does) more than rank 2.
Ranking 4 different model outputs: Not sure what you mean here.
Aside: So reward scale can affect the RL process. RL policies are commonly trained through something called the "Policy Gradient", which weights the policy update by the scale of the return (sum of rewards). So the larger your reward scaling, the larger this gradient. Too large rewards can cause the gradient to be too large and lead to an unstable policy, too small rewards can result in small gradients and therefore slow-to-converge policies. This reward scale can be counteracted by the learning rate, or reward normalization. But all of this needs to be tuned for the specific task.
Reward scaling can also affect your RL algorithm, particularly if it uses an entropy penalty for exploration (SAC, TD3, PPO, TRPO etc.).
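The reward-scale effect described in the aside can be sketched with a toy REINFORCE-style gradient for a two-action softmax policy. Everything here (the logits, the sampled actions, the returns, and the helper names) is made up for illustration; it is not tied to any particular RL library.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient(logits: List[float], actions: List[int], returns: List[float]) -> List[float]:
    """REINFORCE estimate: mean over samples of return * grad log pi(action)."""
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for a, g in zip(actions, returns):
        for k in range(len(logits)):
            # d/d logit_k of log softmax(logits)[a] is 1[k == a] - probs[k].
            grad[k] += g * ((1.0 if k == a else 0.0) - probs[k])
    return [x / len(actions) for x in grad]


def normalize(returns: List[float]) -> List[float]:
    """Shift to zero mean and rescale to unit standard deviation."""
    mean = sum(returns) / len(returns)
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / len(returns))
    return [(r - mean) / (std + 1e-8) for r in returns]


def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


logits = [0.0, 0.0]
actions = [0, 1, 1, 0]
returns = [1.0, 0.0, 0.0, 1.0]
scaled = [100.0 * r for r in returns]  # same preferences, 100x larger rewards

g_small = policy_gradient(logits, actions, returns)
g_large = policy_gradient(logits, actions, scaled)
print(norm(g_large) / norm(g_small))  # gradient norm grows with the reward scale

g_norm_small = policy_gradient(logits, actions, normalize(returns))
g_norm_large = policy_gradient(logits, actions, normalize(scaled))
print(g_norm_small, g_norm_large)  # nearly identical after normalization
```

Multiplying the rewards by 100 multiplies the gradient by the same factor, while normalizing the returns first makes the update essentially independent of the original scale.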
lorenzo1384 t1_j4trj9v wrote
Reply to comment by nateharada in [P] A small tool that shuts down your machine when GPU utilization drops too low. by nateharada
Thanks, and yes, I do have a premium GPU. I am paying for all the proofs of concept I do, so this will be helpful.
nateharada OP t1_j4tojyh wrote
Reply to comment by lorenzo1384 in [P] A small tool that shuts down your machine when GPU utilization drops too low. by nateharada
Yeah, it should work if you use the API (and if you have a GPU in your Colab). I don't think it'll work with TPU just yet.
extracompute t1_j4tnogh wrote
Reply to comment by MrAcurite in [P] A small tool that shuts down your machine when GPU utilization drops too low. by nateharada
Ha. computeX has automated notifs built in to avoid problems like these.
What's the biggest bill you've ever come back to on Monday AM?
Unlikely-Advice-7168 t1_j4tgblj wrote
Reply to comment by Apprehensive-Tax-214 in [P] Built an at-cost, pay per second, open-source API for Tortoise text-to-speech (best I've heard!) by Apprehensive-Tax-214
https://github.com/supabase/supabase/discussions/5289
Seems fairly common
inquisitor49 t1_j4tgazw wrote
Reply to [D] Simple Questions Thread by AutoModerator
In transformers, a positional embedding is added to a word embedding. Why does this not mess up the word embedding, such as changing the embedding to another word?
BrotherAmazing t1_j4tdklr wrote
Reply to comment by MrSpotgold in [D] Can ChatGPT flag it's own writings? by MrSpotgold
No, it's not.
Anyone who wanted to cheat on a take-home essay or assignment always could, and anyone who has to write a monitored in-class essay for more critical and competitive standardized tests cannot pull out their device and type into ChatGPT, which doesn't write A+ essays that a teacher can't detect are "a little off" anyway.
As a former educator myself, I always knew which students had mastered the material and could intelligently talk about it in class discussions, during office hours, and through in-class essays/quizzes where they could not cheat while I closely monitored. They couldn't get an A+ by simply cheating on a few of the take-home essays, and the typical cheaters are cheating just to get by and still end up with inferior grades to those who master the subject.
Furthermore, concentrating too much on catching cheaters takes away from time you could be spending enriching the learning experience of everyone else.
It also sounds corny but is true: When you cheat, you're only cheating yourself. Cheating really is self-policing in many instances. When we interview candidates who have a degree and a high GPA, it's very obvious if they just got good grades but are clueless, and we don't hire them. It might be cheating, or maybe grade inflation, or perhaps just short-term memorizing without actually retaining or understanding what they were learning, but it's night and day.
Those who truly care to learn will excel in their jobs and get better promotions. ChatGPT isn't going to help you there.
Having said that, I would consider modifying the curriculum if you only give take-home work that makes up 90% of the grade, but it's not worth stressing over. Put your effort into teaching and enriching the lives of those who want to learn and yearn for knowledge. You're an educator first, and police work is just a side gig you can't ignore, but it isn't your main purpose.
blose1 t1_j4td3lq wrote
Reply to comment by mrconter1 in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
>Recognize the Gmail icon if I say "send an email"
This is not testing intelligence, this is testing if human was trained on computer usage, knows what e-mail is and used gmail before.
Someone from tribe in Africa would fail your test while he is human and is intelligent, train him on this task like you would train current gen multimodal system and it will pass your benchmark. You train LLM in combination with image model and RL model, train on instruction following using inputs you described and now it understands what it sees, can follow what you want it to do.
MrAcurite t1_j4t9ch1 wrote
Reply to comment by Zealousideal_Low1287 in [P] A small tool that shuts down your machine when GPU utilization drops too low. by nateharada
At work, we've got this thing that will notify you if a cloud instance has been running for 24 hours. However, it does this by messaging your work email; you can't configure it to go to a personal device or anything. Meaning, if you set a job to run at the end of the week, you can come back on Monday to over a thousand dollars of cloud charges and like fifty angry emails about it.
FastestLearner OP t1_j4uilg8 wrote
Reply to comment by Philpax in [D] Idea: SponsorBlock with a neural net as backend by FastestLearner
Yes. I initially thought of having a neural net trained on the audio track of a particular YT video, but I think the transcripts would provide just enough information, and fine-tuning existing language models would work quite well, especially with the recent tremendous growth of NLP. Collecting the audio would also require far more storage space than text, and would probably require more RAM, VRAM and compute.
If you are leaning towards crowd-sourcing the inference, I think it would be possible to do that using JS libs (such as TensorFlow.js), although I have no experience with these. The good thing is, once you do an inference on a video, you just upload the results to the central server and everyone can get them for free (with no further inference costs).