Recent comments in /f/MachineLearning
Jean-Porte t1_j6y0djg wrote
Reply to comment by alpha-meta in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
The beginning of the best possible answer might not be the best beginning. It's the final outcome, the complete answer that counts, so it makes sense to evaluate that. The reward is the feedback on the complete answer.
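A minimal sketch of the idea, assuming a REINFORCE-style policy-gradient update in which the single scalar reward on the finished answer scales the log-probability of every token in it (the numbers are made up for illustration):

```python
# Hypothetical per-token log-probabilities the policy assigned to its own answer.
token_logprobs = [-0.5, -1.2, -0.3, -0.8]

# The reward model scores only the COMPLETE answer; there is no
# per-token feedback, so a single scalar arrives at the end.
final_reward = 0.9

# REINFORCE-style loss: the terminal reward weights the whole trajectory,
# so every token's gradient is scaled by how good the finished answer was,
# including the tokens at the very beginning.
policy_loss = -final_reward * sum(token_logprobs)

print(round(policy_loss, 3))  # prints 2.52
```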
koolaidman123 t1_j6y07he wrote
Reply to comment by was_der_Fall_ist in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
Have you even read the InstructGPT paper?
> In Stiennon et al. (2020), the RM is trained on a dataset of comparisons between two model outputs on the same input. They use a cross-entropy loss, with the comparisons as labels: the difference in rewards represents the log odds that one response will be preferred to the other by a human labeler. In order to speed up comparison collection, we present labelers with anywhere between K = 4 and K = 9 responses to rank. This produces (K choose 2) comparisons for each prompt shown to a labeler. Since comparisons are very correlated within each labeling task, we found that if we simply shuffle the comparisons into one dataset, a single pass over the dataset caused the reward model to overfit. Instead, we train on all (K choose 2) comparisons from each prompt as a single batch element. This is much more computationally efficient because it only requires a single forward pass of the RM for each completion (rather than (K choose 2) forward passes for K completions) and, because it no longer overfits, it achieves much improved validation accuracy and log loss. Specifically, the loss function for the reward model is:
>
> loss(θ) = −1/(K choose 2) · E_(x, y_w, y_l)∼D [ log(σ(r_θ(x, y_w) − r_θ(x, y_l))) ]   (1)
>
> where r_θ(x, y) is the scalar output of the reward model for prompt x and completion y with parameters θ, y_w is the preferred completion out of the pair of y_w and y_l, and D is the dataset of human comparisons.
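That pairwise loss is simple to sketch in plain Python; the toy reward scores below stand in for an actual reward model, and the function name is made up for illustration:

```python
import math
from itertools import combinations

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rm_loss(scores_ranked):
    """Pairwise reward-model loss: scores_ranked[i] is r_theta(x, y_i),
    ordered best-to-worst, so the earlier entry of each pair is y_w."""
    K = len(scores_ranked)
    pairs = list(combinations(range(K), 2))  # all K-choose-2 comparisons
    total = 0.0
    for w, l in pairs:
        # log odds that the preferred completion wins the comparison
        total += math.log(sigmoid(scores_ranked[w] - scores_ranked[l]))
    return -total / len(pairs)

# Four ranked completions scored by a toy reward model.
print(round(rm_loss([2.0, 1.0, 0.5, -1.0]), 4))
```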
you know that figure you're referencing comes from the instructgpt paper... right?
munkisquisher t1_j6xzccm wrote
Reply to comment by bumbo-pa in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Yeah it's the only way to screenshare
was_der_Fall_ist t1_j6xz6wj wrote
Reply to comment by koolaidman123 in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
ChatGPT had labelers rank outputs from best to worst, not head to head. (Different from InstructGPT, maybe?)
“A prompt and several outputs are generated. A labeler ranks the outputs from best to worst.”
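For what it's worth, the two framings aren't mutually exclusive: a best-to-worst ranking of K outputs implies K-choose-2 head-to-head comparisons, since every output beats everything ranked below it. A small sketch (the output ids are hypothetical):

```python
from itertools import combinations

# A labeler's ranking of K=4 model outputs, best to worst.
ranking = ["output_c", "output_a", "output_d", "output_b"]

# combinations() preserves order, so the earlier-ranked output of
# each pair is always the winner.
pairs = [(winner, loser) for winner, loser in combinations(ranking, 2)]

print(len(pairs))  # prints 6: K-choose-2 comparisons from one ranking of 4
for winner, loser in pairs:
    print(f"{winner} > {loser}")
```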
EnsignElessar t1_j6xyzkr wrote
Reply to comment by ISitAndWatch in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Really basic stuff like copy/paste does not work. But they want to add in more features?!
godx119 t1_j6xywl5 wrote
Reply to [D] PC takes a long time to execute code, possibility to use a cloud/external device? by Emergency-Dig-5262
I was able to get $100 credit on Azure by signing up as a student, I would think that would cover whatever resources you need for your project.
[deleted] t1_j6xytt3 wrote
Reply to comment by Imonfire1 in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
[removed]
alpha-meta OP t1_j6xylk8 wrote
Reply to comment by Jean-Porte in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
Could you help me understand what the far-away rewards represent in this context? Are the steps the generation of the individual words, so by far-away you mean words that occur early in the text? In that case, couldn't a weighting scheme over the cross-entropy loss components be used?
JigglyWiener t1_j6xy6ys wrote
Reply to comment by znihilist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Infringing content can be created with any number of tools and we don’t sue photoshop for not detecting someone trying to alter images of what is clearly Mickey Mouse. We sue the person when they are making money off of the sale of copyrighted material.
It’s not worth chasing copyright for pennies.
arhetorical t1_j6xxijd wrote
Reply to comment by TrevorIRL in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
$20 is frankly a very reasonable price for anyone who uses it professionally. For people who just use it to generate memes, or students who want to cheat on homework, it's less reasonable, but I don't think that's their target market (and in the case of cheating, something they actually want to avoid).
Emergency-Dig-5262 OP t1_j6xx6w6 wrote
Reply to comment by qalis in [D] PC takes a long time to execute code, possibility to use a cloud/external device? by Emergency-Dig-5262
I am using GridSearchCV. I don't know Hyperopt or TPE, but I will definitely do some research about them. Thank you!
Good call on the cores. That's the next thing I will check out!
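For reference, a sketch of fanning a grid search out across cores with scikit-learn's `n_jobs` parameter; the estimator and parameter grid here are placeholders, not your actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Toy data standing in for the real dataset.
X, y = make_classification(n_samples=500, random_state=0)

param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

# n_jobs=-1 runs the grid's cross-validation fits on every available core;
# the default (n_jobs=None) runs them one at a time.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, n_jobs=-1)
search.fit(X, y)
print(search.best_params_)
```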
myrmil t1_j6xw2sq wrote
Reply to comment by mr_birrd in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
Yeah, they sure wouldn't Kappa
TrevorIRL t1_j6xvg76 wrote
Reply to comment by frequenttimetraveler in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
Even a VERY conservative estimate here yields $4,000,000 a month in revenue, which is more than enough to cover expenses and grow.
Very right that this is early days and yes, uninspired, but effective.
There will be new avenues for monetization once it matures. For example, opening the API for a fee would be another strategy that would earn huge dollars for OpenAI and allow some incredible apps to be developed!
mr_birrd t1_j6xvaec wrote
Reply to comment by Monoranos in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
Well, the thing is, you aren't the first one to think about that. They've been doing this for a long time already and know that what they're doing is legal here. They wouldn't waste millions training it just to throw it away afterwards.
Monoranos t1_j6xumt3 wrote
Reply to comment by mr_birrd in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
Even if they have full transparency, it doesn't mean they are GDPR compliant. I tried to look into it further but wasn't successful.
melgor89 t1_j6xufba wrote
From my experience, they are equal now, especially since we now use BatchNorm or LayerNorm. Both normalization methods use the mean and std, which makes it irrelevant which kind you use. Given that, I prefer the TensorFlow approach, as it's the simpler one.
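The shared core of both normalization methods can be sketched in a few lines: subtract the mean and divide by the standard deviation (a toy vector, with the learnable scale/shift parameters omitted):

```python
import math

def normalize(x, eps=1e-5):
    """Normalize one feature vector with its own mean and std,
    the core step shared by LayerNorm and BatchNorm (which differ
    mainly in which axis the statistics are computed over)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

out = normalize([1.0, 2.0, 3.0, 4.0])
# After normalization the mean is ~0 and the variance ~1,
# regardless of the input's original scale.
print(round(sum(out) / len(out), 6))
```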
mr_birrd t1_j6xtb3u wrote
Reply to comment by Monoranos in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
Edit: ChatGPT uses GPT-3. Look up the dataset it used.
Google it; they have full transparency. If you find a text of yours in there, you can ask if they can remove it. First of all, the data is only used for stochastic gradient descent, and the model has no idea about the content it read; it can only model probabilities of words, i.e. it learned to speak, but it only speaks such that it mostly outputs what makes sense in a Bayesian way.
So the model is already trained, and it didn't even read all of the data; those huge models often read each sample at most once, since they learn that "well".
Also, the way I read the law text you quoted, opting out in the future doesn't make past data processing unlawful. The model is already trained, so they don't have to remove anything.
They also usually have a whole ethics section in their papers; maybe go check it out. Ethics isn't something unknown, and big companies in particular have people working on it in their teams.
perpetualgrunt t1_j6xstj0 wrote
Reply to comment by TREDOTCOM in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Can you give more details?
ISitAndWatch t1_j6xsbeh wrote
Reply to comment by Imonfire1 in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
What do you mean? It works! It just sometimes completely forgets some messages, sometimes fails to load an entire chat so I have to restart the app, sometimes crashes without reason, sometimes audio refuses to work in video calls... But it launches! I call that working by Microsoft standards.
Monoranos t1_j6xs7m3 wrote
Reply to comment by mr_birrd in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
I don't believe they disclosed the data on which they trained ChatGPT. If you know, do you mind sharing? :)
A_fellow t1_j6xqk7y wrote
Reply to comment by GoofAckYoorsElf in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Of course the unbiased side completely agrees with you at every step.
What a scam.
A_fellow t1_j6xq9gj wrote
Reply to comment by DigThatData in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Pretending stability had or will have any principles other than profit is laughable.
mr_birrd t1_j6xps33 wrote
Reply to comment by Monoranos in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
Do you even know the dataset it was trained on?
Monoranos t1_j6xp59x wrote
Reply to comment by mr_birrd in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
Just read my edit about the GDPR and explicit consent.
"in Europe should be able to opt out of everything if you want." Great point, I wonder how OpenAI would react if people wanted them to remove their data. Is it even possible?
DigThatData t1_j6y35x2 wrote
Reply to comment by A_fellow in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
It's a startup that evolved out of a community of people who found each other through common interests in open source machine learning for public good (i.e. eleuther and laion), committed to providing the public with access to ML tools that were otherwise gated by corporate paywalls. For several years, that work was all being done by volunteers in their free time. We're barely a year old as an actual company and we're not perfect. But as far as intentions and integrity go: you're talking about a group of people who were essentially already functioning as a volunteer run non-profit, and then were given the opportunity to continue that work with a salary, benefits, and resources.
If profit were our chief concern, we wouldn't be giving these models away for free. Simple as that. There are plenty of valid criticisms you could lob our way, but a lack of principles and greed aren't among them. You might not like the way we do things or certain choices we've made, but if you think the intentions behind those decisions are primarily profit-motivated, you should really learn more about the people you are criticizing, because you couldn't be more misinformed.