Recent comments in /f/MachineLearning

EmmyNoetherRing t1_j510553 wrote

>Unfortunately, OpenAI aren't serious about publishing technical reports anymore.

Do OpenAI folks show up to any of the major research conferences? These days I mostly come into contact with AI when it wanders into the tech policy/governance world, and this seems like the sort of work that would get you invited to an OSTP workshop, but I'm not sure if that's actually happening.

OpenAI's latest not-so-technical report (on their website) has a few folks from Georgetown contributing to it, and since AAAI is in DC in a few weeks I was hoping OpenAI would be around and available for questions in some capacity, in some room at the conference.

5

Equivalent-Way3 t1_j50y33r wrote

Yep, very simple. Say you have model1 that you already trained; then you just pass it via the xgb_model argument in your next training run.

In R (Python should be the same or close to it):

new_model <- xgb.train(data = new_data, xgb_model = model1, blah blah blah)
1

wind_dude t1_j50x6ad wrote

Yeah, unless they master continual learning, the models will go stale quickly or need to rely on iterative retraining, which is very expensive and slow. I don't see hardware catching up soon.

I think you'll still need to run a fairly sophisticated LLM as the base model for a query-based architecture. But you can probably reduce the cost of running it by distilling it and curating the input data. I actually don't think there has been a ton of research on curating the input data before training (OpenAI did something similar when curating responses in ChatGPT with RLHF, so it's a similar concept), although concerns/critiques may arise over what counts as junk, which is why it hasn't been looked at in depth before. I believe SD did this in the latest checkpoint by removing anything "pornographic", which is over-censorship.

Take something like CC (Common Crawl), which makes up a fairly large portion of the training data: you could run it through a classifier to remove junk before training. Even within CC text, a lot of it is probably landing-type pages or blocked-by-paywall messages. To my knowledge, the percentage of CC that these make up hasn't even been measured, let alone trimmed from the training datasets used.
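
A rough sketch of what such a pre-training filter could look like. This is a keyword-and-length heuristic standing in for a trained classifier, and the phrase list, the `min_words` threshold, and the function names are all assumptions made up for illustration. It also reports the junk fraction, which is the number that apparently hasn't been measured.

```python
import re

# Hypothetical phrases that tend to show up on paywall or landing pages.
JUNK_PATTERNS = [
    r"subscribe to (continue|read)",
    r"sign in to (continue|read)",
    r"enable javascript",
    r"this content is (for|available to) subscribers",
]
JUNK_RE = re.compile("|".join(JUNK_PATTERNS), re.IGNORECASE)

def looks_like_junk(text, min_words=20):
    """Return True if a page looks like a paywall notice or a near-empty landing page."""
    words = text.split()
    if len(words) < min_words:
        return True  # too short to carry much training signal
    return JUNK_RE.search(text) is not None

def filter_corpus(pages):
    """Keep only pages that pass the heuristic; also report the junk fraction."""
    kept = [p for p in pages if not looks_like_junk(p)]
    junk_fraction = 1 - len(kept) / len(pages) if pages else 0.0
    return kept, junk_fraction
```

A real pipeline would likely swap `looks_like_junk` for a learned classifier, but the surrounding loop and the junk-fraction bookkeeping would look much the same.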

3

sammysammy1234 t1_j50vmjq wrote

The advantage of using ChatGPT is that it can give more human-like answers, and doing prompt engineering is much easier than labeling a lot of data.

However, I do agree that it is a very costly model, and in many applications a simpler one could be enough.

I don't know for sure, because ChatGPT's capabilities are still being explored and other models are coming up, so there is no telling what the scenario will be in a few months. Maybe we will just switch to using third-party models, similar to how no one programs their own compilers.

3

Daos-Lies t1_j50vdq9 wrote

That is indeed fair enough.

Big fan of the concept of screaming at it until it forgets ;)

And I suppose it's very possible that, in my 'v long conversations with it', the topic repeated at some stage (which I'm sure it did at points), and that could have fooled me into thinking it was remembering things from right at the start.

2

wind_dude t1_j50pmcc wrote

I would suspect it's similar to BlenderBot 2 from Meta and parl.ai.

Chat memory is searched for relevant information and sent to the decoder for the final output.

https://medium.com/ai-network/is-there-a-chatbot-that-goes-beyond-the-gpt-3-blenderbot-2-0-17e42e674824

https://ai.facebook.com/blog/blender-bot-2-an-open-source-chatbot-that-builds-long-term-memory-and-searches-the-internet/

So it's in the model architecture.
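
To make the "search chat memory, then hand the hits to the decoder" step concrete, here is a toy sketch. It scores stored turns by plain word overlap, which is an assumption for illustration only; it is not how BlenderBot 2 does retrieval, and all function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens; a real system would use learned embeddings instead."""
    return re.findall(r"[a-z0-9']+", text.lower())

def overlap_score(query, memory_entry):
    """Count tokens shared between the query and one stored chat turn."""
    q = Counter(tokenize(query))
    m = Counter(tokenize(memory_entry))
    return sum((q & m).values())

def retrieve(query, memory, k=2):
    """Return up to k stored turns most related to the query, best first."""
    scored = [(overlap_score(query, turn), i, turn) for i, turn in enumerate(memory)]
    scored = [s for s in scored if s[0] > 0]      # drop turns with no overlap
    scored.sort(key=lambda s: (-s[0], s[1]))      # highest score, then oldest first
    return [turn for _, _, turn in scored[:k]]

def build_decoder_input(query, memory, k=2):
    """Prepend retrieved turns to the new message, as the text handed to the decoder."""
    context = retrieve(query, memory, k)
    return "\n".join(context + [query])
```

The point of the sketch is only the shape of the pipeline: the memory lives outside the context window, and just the retrieved turns are fed back in with the new message.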

5

hapliniste OP t1_j50pe93 wrote

Also, I think this could help improve the actual "logic" of the model by focusing the small LM on that task while the search part would serve the role of knowledge base.

Another benefit could be the ability to cite its sources.

It really seems like a no-brainer to me.

12

JoeHenzi t1_j50pbv9 wrote

I'll take a look, thanks again. Building up a dataset, at the very least, that could be interesting to analyze or crunch. Would love to implement a GA to explore the space and have the example code from ChatGPT but need to dive deeper. As I may have mentioned on my GH comment, when trying to do predictions around parameters I end up blocking/slowing the API call so either my code is trash (likely!) or I'm trying to do too-too much at once.

On my short term list is using a T5-like model to produce summaries but I was trying to execute them at bad times, trying to make too many changes at once.

Thanks again for sharing. Enjoying playing in the space and love when you find people willing to share. (Unlike OpenAI who is slowly closing out the world to their toys).

2

Ouitos t1_j50cm0i wrote

Hi, thanks for the explanation!

Two comments:

> 1. Make "New probs" equal to "Initial probs" to initialize.

Shouldn't it be the opposite? Make the initial probs equal to the first occurrence of new probs? I mean, equality is symmetric, but as written it reads as if you change new probs to be equal to initial probs, and that contradicts the diagram, which says that new probs is always the output of our LM.

> loss = min(ratio * R, clip(ratio, 0.8, 1.2) * R)

Isn't the min operation redundant with the clip? How is that different from min(ratio * R, 1.2 * R)? Does 0.8 have any influence at all?
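
One way to check this numerically is to evaluate both expressions side by side. The sketch below assumes R is allowed to be negative (for example, if it is an advantage-style signal rather than a strictly positive reward); the function names and test values are made up for illustration.

```python
def clip(x, lo, hi):
    """Clamp x to the interval [lo, hi]."""
    return max(lo, min(x, hi))

def clipped_objective(ratio, R, lo=0.8, hi=1.2):
    """The quoted expression: min(ratio * R, clip(ratio, lo, hi) * R)."""
    return min(ratio * R, clip(ratio, lo, hi) * R)

def upper_only(ratio, R, hi=1.2):
    """The proposed simplification: min(ratio * R, hi * R)."""
    return min(ratio * R, hi * R)

# Positive R with a small ratio: the lower bound never binds, both forms agree.
pos_full = clipped_objective(0.5, 1.0)
pos_simple = upper_only(0.5, 1.0)

# Negative R with a small ratio: the 0.8 bound decides the result.
neg_full = clipped_objective(0.5, -1.0)
neg_simple = upper_only(0.5, -1.0)
```

Under that assumption the two forms match whenever R is positive, and they diverge once R is negative, which is where the 0.8 bound takes effect. If R is always positive, the simplification in the question would indeed give the same values.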

2