Recent comments in /f/MachineLearning

DigThatData t1_j6y35x2 wrote

It's a startup that evolved out of a community of people who found each other through common interests in open source machine learning for public good (i.e. eleuther and laion), committed to providing the public with access to ML tools that were otherwise gated by corporate paywalls. For several years, that work was all being done by volunteers in their free time. We're barely a year old as an actual company and we're not perfect. But as far as intentions and integrity go: you're talking about a group of people who were essentially already functioning as a volunteer-run non-profit, and then were given the opportunity to continue that work with a salary, benefits, and resources.

If profit were our chief concern, we wouldn't be giving these models away for free. Simple as that. There are plenty of valid criticisms you could lob our way, but a lack of principles and greed aren't among them. You might not like the way we do things or certain choices we've made, but if you think the intentions behind those decisions are primarily profit-motivated: you should really learn more about the people you are criticizing, because you couldn't be more misinformed.

1

koolaidman123 t1_j6y07he wrote

have you even read the instructGPT paper?

>In Stiennon et al. (2020), the RM is trained on a dataset of comparisons between two model outputs on the same input. They use a cross-entropy loss, with the comparisons as labels; the difference in rewards represents the log odds that one response will be preferred to the other by a human labeler. In order to speed up comparison collection, we present labelers with anywhere between K = 4 and K = 9 responses to rank. This produces $\binom{K}{2}$ comparisons for each prompt shown to a labeler. Since comparisons are very correlated within each labeling task, we found that if we simply shuffle the comparisons into one dataset, a single pass over the dataset caused the reward model to overfit. Instead, we train on all $\binom{K}{2}$ comparisons from each prompt as a single batch element. This is much more computationally efficient because it only requires a single forward pass of the RM for each completion (rather than $\binom{K}{2}$ forward passes for K completions) and, because it no longer overfits, it achieves much improved validation accuracy and log loss. Specifically, the loss function for the reward model is: $\text{loss}(\theta) = -\frac{1}{\binom{K}{2}} E_{(x, y_w, y_l) \sim D}\left[\log\left(\sigma\left(r_\theta(x, y_w) - r_\theta(x, y_l)\right)\right)\right]$ (1), where $r_\theta(x, y)$ is the scalar output of the reward model for prompt $x$ and completion $y$ with parameters $\theta$, $y_w$ is the preferred completion out of the pair of $y_w$ and $y_l$, and $D$ is the dataset of human comparisons.
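
For readers who want to see the quoted loss in concrete terms, here is a minimal sketch for a single prompt. It assumes the K completions are already sorted from most to least preferred with no ties, and that the scalar rewards have already been computed; the function names are made up for illustration and are not from the paper.

```python
import math
from itertools import combinations
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def pairwise_rm_loss(rewards_best_to_worst: Sequence[float]) -> float:
    """Loss (1) for one prompt whose K completions are ranked best to worst.

    rewards_best_to_worst[i] stands in for r_theta(x, y_i). In a real setup
    each value would come from one forward pass of the reward model; here
    they are plain floats (an assumption made to keep the sketch small).
    """
    # combinations() keeps input order, so each pair is (preferred, rejected).
    pairs = list(combinations(rewards_best_to_worst, 2))
    if not pairs:
        raise ValueError("need at least two ranked completions")
    # Average of -log(sigmoid(r_w - r_l)) over all K-choose-2 pairs.
    return -sum(log_sigmoid(r_w - r_l) for r_w, r_l in pairs) / len(pairs)
```

With K = 4 rewards, e.g. `pairwise_rm_loss([2.0, 0.5, -1.0, -1.5])`, the average runs over 6 pairs while only 4 reward values are needed, which is the efficiency point the quote makes.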

you know that figure you're referencing comes from the instructgpt paper... right?

−4

alpha-meta OP t1_j6xylk8 wrote

Could you help me understand what the far-away rewards represent in this context? Are the steps the generation of the individual words? So in this case you mean words that occur early in the text? And could a weighting scheme for the cross-entropy loss components then be used?

2

TrevorIRL t1_j6xvg76 wrote

Even a VERY conservative estimate here yields $4,000,000 a month in revenue, which is more than enough to cover expenses and grow.

Very right that this is early days and yes, uninspired, but effective.

There will be new avenues for monetization once it matures. For example, opening the API for a fee would be another strategy that would earn huge dollars for OpenAI and allow some incredible apps to be developed!

2

melgor89 t1_j6xufba wrote

From my experience, they are equivalent now, especially since we now use BatchNorm or LayerNorm. Both normalization methods also use the mean and std, which makes it irrelevant which method you are using. So I prefer the TensorFlow idea, as it is the simpler one.
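
A small sketch of why this can hold, under loud assumptions: the comment does not say which two schemes are meant, so the example assumes one maps pixels to [-1, 1] and the other subtracts a mean and divides by a std (the constants are placeholders). It also applies the mean/std normalization directly to the inputs, whereas in a real network there are layers in between.

```python
import math
from typing import Sequence


def standardize(values: Sequence[float], eps: float = 0.0) -> list:
    """Subtract the mean and divide by the std, as BatchNorm/LayerNorm do
    before their learned scale and shift (eps=0 here for an exact check)."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var + eps) for v in values]


# Made-up 8-bit pixel intensities.
pixels = [0.0, 64.0, 128.0, 255.0]

# Assumed scheme A: rescale to [-1, 1].
scaled_a = [p / 127.5 - 1.0 for p in pixels]

# Assumed scheme B: subtract a mean, divide by a std (placeholder constants).
scaled_b = [(p / 255.0 - 0.45) / 0.225 for p in pixels]

# Both are positive affine maps of the same pixels, so standardizing
# removes the difference (up to floating-point error).
normed_a = standardize(scaled_a)
normed_b = standardize(scaled_b)
max_gap = max(abs(a - b) for a, b in zip(normed_a, normed_b))
```

Any two positive affine rescalings of the same data give the same standardized values, so the gap is only floating-point noise; with a non-zero `eps` the match becomes approximate rather than exact.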

3

mr_birrd t1_j6xtb3u wrote

Edit: ChatGPT uses GPT-3. Search for the dataset it used.

Google it; they have full transparency. If you find a text of your own in there, maybe ask if they can remove it. First of all, the data is only used for stochastic gradient descent, and the model has no idea about the content it read; it can only model probabilities of words. E.g. it learned to speak, but it only speaks such that it mostly outputs what makes sense in a Bayesian way.

So the model is already trained, and it didn't even read all of the data. Those huge models often read each sample at most once, since they learn that "well".

Also, as I understand the law text you wrote, opting out in the future doesn't make past data processing wrong. The model is already trained, so they don't have to remove anything.

They also mostly have a whole ethics chapter in their papers; maybe go check it out. Ethics etc. is not something unknown, and especially such big companies also have people working on that in their teams.

1

ISitAndWatch t1_j6xsbeh wrote

What do you mean? It works! It just sometimes completely forgets some messages, sometimes fails to load an entire chat so I have to restart the app, sometimes crashes without reason, sometimes audio refuses to work in video calls... But it launches! I call that working by Microsoft standards.

19