Recent comments in /f/MachineLearning
EmmyNoetherRing t1_j50zesm wrote
Reply to comment by DaLameLama in [D] Inner workings of the chatgpt memory by terserterseness
I've heard a diverse variety of folks talk about leaving ChatGPT tabs/sessions open for days or weeks and maintaining context plausibly well throughout.
MysteryInc152 t1_j50ym7g wrote
Reply to comment by Daos-Lies in [D] Inner workings of the chatgpt memory by terserterseness
There's a repo that actually uses embeddings for long-term conversations that you can try out.
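The repo isn't named here, so this is only a rough, hypothetical illustration of the general idea. Embedding-based memory stores each past message as a vector. At each turn, it retrieves the most similar stored messages to feed back into the prompt. `embed` is a stand-in for a real embedding model: it just builds a bag-of-words count vector. `EmbeddingMemory` is a made-up name.

```python
import math
from collections import Counter

def embed(text):
    # Hypothetical stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class EmbeddingMemory:
    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def recall(self, query, k=2):
        # Return the k stored messages most similar to the query.
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = EmbeddingMemory()
memory.add("my dog is named rex")
memory.add("i work as a data engineer")
memory.add("the weather is nice today")
print(memory.recall("what is my dog called", k=1))  # ['my dog is named rex']
```

Because retrieval is by similarity rather than position, an old message can come back into the prompt no matter how long ago it was said.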
Equivalent-Way3 t1_j50y33r wrote
Reply to comment by monkeysingmonkeynew in [D] Is it possible to update random forest parameters with new data instead of retraining on all data? by monkeysingmonkeynew
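Yep, very simple. Say you have model1 that you've already trained; then you just pass it as the xgb_model argument in your next training call.
In R (Python should be the same or close to it):
# new_data must be an xgb.DMatrix; reuse the params from model1's training (nrounds = 10 is just an example)
new_model <- xgb.train(params = params, data = new_data, nrounds = 10, xgb_model = model1)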
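Conceptually, continuing from `xgb_model` keeps the existing trees and fits new trees to the residual errors that the current ensemble makes on the new data. The toy below sketches that idea in plain Python. It assumes squared-error loss, 1-D inputs, and depth-1 regression stumps. It illustrates warm-started boosting only; XGBoost's actual implementation uses gradient and Hessian statistics and regularized trees. `boost`, `fit_stump`, and the data are hypothetical.

```python
def fit_stump(xs, residuals):
    # Pick the threshold on 1-D inputs that minimizes squared error;
    # returns (threshold, left_value, right_value).
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        if not right:
            continue  # the largest threshold puts everything on the left
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    if best is None:
        # All inputs identical: fall back to a constant prediction.
        mean = sum(residuals) / len(residuals)
        return (float("inf"), mean, mean)
    return best[1:]

def predict(model, x):
    # Sum the contributions of every stump in the ensemble.
    return sum(lv if x <= t else rv for t, lv, rv in model)

def boost(xs, ys, n_rounds, model=None, lr=0.5):
    # Start from an existing list of stumps if given (the analogue of xgb_model),
    # then fit each new stump to the current residuals.
    model = list(model) if model else []
    for _ in range(n_rounds):
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        t, lv, rv = fit_stump(xs, residuals)
        model.append((t, lr * lv, lr * rv))
    return model

def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

old_x, old_y = [1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0]
new_x, new_y = [5, 6, 7, 8], [5.0, 5.0, 7.0, 7.0]
model1 = boost(old_x, old_y, n_rounds=10)
new_model = boost(new_x, new_y, n_rounds=10, model=model1)
print(len(model1), len(new_model))  # 10 20
print(mse(model1, new_x, new_y), mse(new_model, new_x, new_y))
```

The added stumps only see the new data's residuals. Nothing in this sketch keeps the fit on the old data intact.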
wind_dude t1_j50x6ad wrote
Yeah, unless they master continual learning, the models will get stale quickly, or they'll need to rely on iterative training, which is very expensive and slow. I don't see hardware catching up soon.
I think you'll still need to run a fairly sophisticated LLM as the base model for a query-based architecture. But you can probably reduce the cost of running it by distilling it and curating the input data. I actually don't think there has been a ton of research on curating the input data before training (OpenAI did something similar by curating responses in ChatGPT with RLHF, so it's a similar concept), although concerns/critiques may arise over what counts as junk, which is why it hasn't been looked at in depth before. I believe SD (Stable Diffusion) did this in its latest checkpoint, removing anything "pornographic", which is overcensorship.
Look at something like CC (Common Crawl), which makes up a fairly large portion of the training data, and run it through a classifier to remove junk before training. Even within the CC text, a lot is probably landing-type pages, or even blocked-by-paywall messaging. To my knowledge, the percentage of CC these make up hasn't even been looked at, let alone trimmed from the training datasets used.
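Here is a minimal sketch of that filtering step. It uses simple hand-written heuristics instead of a trained classifier. The phrase list and thresholds (`MIN_WORDS`, `MAX_SHORT_LINE_RATIO`) are hypothetical values chosen for illustration, not taken from any real CC pipeline.

```python
# Hypothetical heuristics standing in for a trained junk classifier.
PAYWALL_PHRASES = (
    "subscribe to continue reading",
    "this content is for subscribers only",
    "sign in to read the full article",
)
MIN_WORDS = 50               # hypothetical: drop very short pages
MAX_SHORT_LINE_RATIO = 0.5   # hypothetical: landing pages are mostly menu-like short lines

def is_junk(page_text):
    text = page_text.lower()
    if any(phrase in text for phrase in PAYWALL_PHRASES):
        return True  # paywall / blocked-content messaging
    if len(text.split()) < MIN_WORDS:
        return True  # too little text to be useful
    lines = [ln for ln in page_text.splitlines() if ln.strip()]
    short = sum(1 for ln in lines if len(ln.split()) < 4)
    return bool(lines) and short / len(lines) > MAX_SHORT_LINE_RATIO

def filter_corpus(pages):
    # Keep only pages that pass every heuristic.
    return [p for p in pages if not is_junk(p)]
```

Measuring how much of CC falls into each bucket would just mean counting which rule fires per page before dropping it.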
andreichiffa t1_j50x4ky wrote
Reply to comment by DaLameLama in [D] Inner workings of the chatgpt memory by terserterseness
Reported context size is 2048 tokens, but they likely use a hard attention mask. In about 1/4th of words
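For reference, a "hard" attention mask of the kind speculated about above can be sketched as a boolean matrix. Query position i may attend only to key positions inside a fixed window ending at i. The window size is a hypothetical parameter. Nothing in the thread confirms how (or whether) ChatGPT masks attention.

```python
def sliding_window_mask(seq_len, window):
    # mask[i][j] is True when token i may attend to token j:
    # causal (j <= i) and no more than window - 1 positions back.
    return [[i - window < j <= i for j in range(seq_len)] for i in range(seq_len)]

for row in sliding_window_mask(5, 3):
    print("".join("x" if allowed else "." for allowed in row))
# x....
# xx...
# xxx..
# .xxx.
# ..xxx
```

Tokens that fall outside the window are simply invisible to later positions. That would look like the model "forgetting" the start of a long conversation.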
sammysammy1234 t1_j50vmjq wrote
The advantage of using ChatGPT is that it can give more human-like answers, and prompt engineering is much easier than labeling a lot of data.
However, I do agree that it is a very costly model, and in many applications a simpler one could be enough.
I don't know for sure, because ChatGPT's capabilities are still being explored and other models are coming out, so there is no telling what the scenario will be in a few months. Maybe we will just switch to using third-party models, similar to how no one writes their own compilers.
Daos-Lies t1_j50vdq9 wrote
Reply to comment by MysteryInc152 in [D] Inner workings of the chatgpt memory by terserterseness
That is indeed fair enough.
Big fan of the concept of screaming at it until it forgets ;)
And I suppose it is very possible that, during my 'v long conversations with it', the topic repeated at some stage (which I'm sure it did at points), and that could have fooled me into thinking it was remembering things from right at the start.
Kebet-Mendez OP t1_j50tplz wrote
Reply to comment by Dear-Acanthisitta698 in [D] What is the name of this NLP technique? by Kebet-Mendez
Thanks a lot!
Kebet-Mendez OP t1_j50to3u wrote
Reply to comment by Acceptable-Cress-374 in [D] What is the name of this NLP technique? by Kebet-Mendez
Thank you!
Kebet-Mendez OP t1_j50tn1o wrote
Reply to comment by Ok-Cartoonist8114 in [D] What is the name of this NLP technique? by Kebet-Mendez
Thank you!
Nightchanger t1_j50s3j3 wrote
Reply to [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116
It may be possible against specific models, if you know which ones. It's the same as trying to identify authors from their text.
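Here is a classic authorship-attribution baseline, assuming you have sample text from each candidate model. Build a character n-gram frequency profile per source, then assign a new text to the most similar profile. The model names and sample strings are hypothetical placeholders. Real detectors need far more text and stronger features.

```python
import math
from collections import Counter

def ngram_profile(text, n=3):
    # Frequency count of overlapping character n-grams.
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    dot = sum(count * b[g] for g, count in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def attribute(text, samples):
    # samples maps a source name to example text from that source;
    # return the source whose profile is most similar to the text's.
    profile = ngram_profile(text)
    return max(samples, key=lambda name: cosine(profile, ngram_profile(samples[name])))

samples = {
    "model_a": "As an AI language model, I cannot help with that.",  # hypothetical sample
    "model_b": "lol idk tbh, gonna guess",                            # hypothetical sample
}
print(attribute("As an AI language model, I am unable to", samples))  # model_a
```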
terath t1_j50rz6q wrote
Reply to [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar
They probably do use open-source architectures and maybe code, but they often train their own models on their own data. This is partly because the research training sets don't match whatever domain companies need to use them in, and partly because many research datasets' licenses forbid commercial use.
EmmyNoetherRing t1_j50q53i wrote
Reply to comment by mycall in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Huh, fair. Got a concrete example?
MysteryInc152 t1_j50pw6e wrote
Reply to comment by IntelArtiGen in [D] Inner workings of the chatgpt memory by terserterseness
With embeddings, it should theoretically not have a hard limit at all. But experiments here suggest a sliding context window of 8096 tokens:
https://mobile.twitter.com/goodside/status/1598874674204618753?t=70_OKsoGYAx8MY38ydXMAA&s=19
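This is one way such a sliding window might behave, sketched as an assumption rather than a description of ChatGPT's internals. Keep the newest messages whose combined token count fits the budget, and drop everything older. Token counts are approximated by whitespace-separated words. A real system would count with the model's own tokenizer.

```python
def sliding_window(messages, max_tokens):
    # Walk backwards from the newest message, keeping messages while they fit.
    kept, used = [], 0
    for msg in reversed(messages):
        cost = len(msg.split())  # crude stand-in for a real tokenizer
        if used + cost > max_tokens:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))  # restore chronological order

history = ["one two three", "four five", "six seven eight nine"]
print(sliding_window(history, 6))  # ['four five', 'six seven eight nine']
```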
MysteryInc152 t1_j50pkxw wrote
Reply to comment by Daos-Lies in [D] Inner workings of the chatgpt memory by terserterseness
With embeddings, it should theoretically not have a hard limit at all. But experiments here suggest a sliding context window of 8096 tokens:
https://mobile.twitter.com/goodside/status/1598874674204618753?t=70_OKsoGYAx8MY38ydXMAA&s=19
hapliniste OP t1_j50pe93 wrote
Also, I think this could help improve the actual "logic" of the model by focusing the small LM on that task, while the search component serves as the knowledge base.
Another benefit could be the ability to cite its sources.
It really seems like a no-brainer to me.
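The retrieve-then-generate setup described above, with citable sources, might look roughly like this. A search step picks the most relevant documents. The prompt then numbers them so the small LM can cite [1], [2], and so on. Keyword-overlap scoring, the example documents, and the prompt wording are all hypothetical placeholders. The LM call itself is left out.

```python
def score(query, doc):
    # Crude relevance: number of shared lowercase words.
    return len(set(query.lower().split()) & set(doc["text"].lower().split()))

def build_prompt(query, docs, k=2):
    # Returns (prompt, sources_used) so answers can be traced back to URLs.
    top = sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]
    sources = "\n".join(f"[{i}] {d['text']} ({d['url']})" for i, d in enumerate(top, 1))
    prompt = (
        "Answer using only the sources below and cite them by number.\n"
        f"{sources}\n\nQuestion: {query}\nAnswer:"
    )
    return prompt, top

docs = [
    {"text": "The Eiffel Tower is in Paris", "url": "https://example.com/eiffel"},
    {"text": "Python was created by Guido van Rossum", "url": "https://example.com/python"},
    {"text": "Paris is the capital of France", "url": "https://example.com/paris"},
]
prompt, used = build_prompt("Where is the Eiffel Tower", docs)
print(prompt)
```

Keeping the knowledge in the document store means updating facts is a matter of re-indexing, not retraining the LM.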
JoeHenzi t1_j50pbv9 wrote
Reply to comment by JClub in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub
I'll take a look, thanks again. At the very least, I'm building up a dataset that could be interesting to analyze or crunch. I'd love to implement a GA (genetic algorithm) to explore the space; I have the example code from ChatGPT but need to dive deeper. As I may have mentioned in my GH comment, when I try to do predictions around parameters I end up blocking/slowing the API call, so either my code is trash (likely!) or I'm trying to do too much at once.
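Below is a minimal sketch of a GA (genetic algorithm) for exploring generation parameters. Everything in it is an assumption for illustration. That includes the two parameters (`temperature`, `top_p`), their ranges, and especially `fitness`. `fitness` is a hypothetical stand-in for whatever score you would compute from the API's outputs. It uses truncation selection, uniform crossover, and clamped Gaussian mutation.

```python
import random

BOUNDS = {"temperature": (0.0, 1.0), "top_p": (0.0, 1.0)}  # hypothetical search space

def fitness(params):
    # Hypothetical stand-in: pretend the best settings are temperature 0.7, top_p 0.9.
    return -((params["temperature"] - 0.7) ** 2 + (params["top_p"] - 0.9) ** 2)

def random_params(rng):
    return {k: rng.uniform(lo, hi) for k, (lo, hi) in BOUNDS.items()}

def crossover(a, b, rng):
    # Each parameter is inherited from one parent at random.
    return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in BOUNDS}

def mutate(p, rng, scale=0.1):
    # Gaussian nudge, clamped to the allowed range.
    return {k: min(hi, max(lo, p[k] + rng.gauss(0, scale))) for k, (lo, hi) in BOUNDS.items()}

def evolve(generations=30, pop_size=20, seed=0):
    rng = random.Random(seed)
    population = [random_params(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection (elitist)
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

print(evolve())
```

Each `fitness` call would be an API request in practice. Caching scores per parameter set keeps the number of calls down.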
On my short-term list is using a T5-like model to produce summaries, but I was trying to execute them at bad times, making too many changes at once.
Thanks again for sharing. I'm enjoying playing in the space and love it when you find people willing to share (unlike OpenAI, which is slowly closing off its toys from the world).
Kamal_Ata_Turk t1_j50j57t wrote
Reply to [D] Simple Questions Thread by AutoModerator
Writing a single SQLite query to mimic an R program. Please help with this: https://stackoverflow.com/questions/75174575/writing-a-single-sqlite-query-to-mimic-an-r-program
mycall t1_j50ibgp wrote
Reply to comment by EmmyNoetherRing in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
Not always. Imagination can be learning, which is an expansion from a steady state.
mycall t1_j50h4l7 wrote
Reply to comment by omniron in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
It's almost certainly complicated. There are many DAGs that reach similar or repeating patterns, or connections that are suboptimal and thus never needed. How do you choose which to keep and which to delete?
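For a concrete baseline on choosing what to delete, the simplest criterion is magnitude pruning: zero out the weights with the smallest absolute values. This is a swapped-in illustration on a flat list of weights. It is not the one-shot method from the paper being discussed.

```python
def magnitude_prune(weights, sparsity):
    # Zero the fraction `sparsity` of weights with the smallest absolute value;
    # returns a new list and leaves the input untouched.
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned

w = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2]
print(magnitude_prune(w, 0.5))  # [0.9, 0.0, 0.3, -0.7, 0.0, 0.0]
```

Magnitude alone ignores how much each weight actually contributes to the layer's output. That gap is the reason for more careful selection criteria.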
[deleted] t1_j50fd7o wrote
Reply to [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar
They do, I'm not too sure what you're talking about.
And on the flip side, businesses need to distinguish themselves or create value. So unless you are using the open-source model within an application, what good is it if anyone can run it?
Ouitos t1_j50cm0i wrote
Hi, thanks for the explanation!
Two comments:
> 1. Make "New probs" equal to "Initial probs" to initialize.
Shouldn't it be the opposite? Make the initial probs equal to the first occurrence of new probs? I mean, equality is symmetric, but here it reads as though you change new probs to be equal to initial probs, which contradicts the diagram that says new probs is always the output of our LM.
> loss = min(ratio * R, clip(ratio, 0.8, 1.2) * R)
Isn't the min operation redundant with the clip? How is that different from min(ratio * R, 1.2 * R)? Does 0.8 have any influence at all?
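Here is a quick numeric check of that question, using the quoted expression as written. In the PPO paper this clipped term is an objective to be maximized, so the training loss is its negative. The 0.8 bound does matter, but only when R is negative. For R > 0 and ratio < 0.8, the min already returns ratio * R. For R < 0 and ratio < 0.8, the clip caps the term at 0.8 * R, which min(ratio * R, 1.2 * R) does not do. The (ratio, R) pairs below are arbitrary test values.

```python
def clip(x, lo, hi):
    return max(lo, min(x, hi))

def clipped_term(ratio, R):
    # The quoted expression: min(ratio * R, clip(ratio, 0.8, 1.2) * R)
    return min(ratio * R, clip(ratio, 0.8, 1.2) * R)

def min_only(ratio, R):
    # The proposed simplification: min(ratio * R, 1.2 * R)
    return min(ratio * R, 1.2 * R)

# Only the (0.5, -1.0) case differs: -0.8 with the clip versus -1.2 without it.
for ratio, R in [(1.5, 2.0), (0.5, 2.0), (0.5, -1.0), (1.5, -1.0)]:
    print(ratio, R, clipped_term(ratio, R), min_only(ratio, R))
```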
EmmyNoetherRing t1_j510553 wrote
Reply to comment by DaLameLama in [D] Inner workings of the chatgpt memory by terserterseness
>Unfortunately, OpenAI aren't serious about publishing technical reports anymore.
Do OpenAI folks show up to any of the major research conferences? These days I mostly come into contact with AI when it wanders into the tech policy/governance world, and this seems like the sort of work that would get you invited to an OSTP workshop, but I'm not sure if that's actually happening.
OpenAI's latest not-so-technical report (on their website) has a few folks from Georgetown contributing to it, and since AAAI is in DC in a few weeks I was hoping OpenAI would be around and available for questions in some capacity, in some room at the conference.