Recent comments in /f/MachineLearning
TheTwigMaster t1_j5097m6 wrote
Reply to [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar
Using open source models might be good for quickly experimenting and getting a feel/sense of the value of an approach for a particular problem. But at a company (especially big tech companies), there are many more things to consider:
- How do I scale this to my particular dataset? It’s a bigger pain to change my data to fit a given model than to change the model to fit my data
- How can I integrate this with my company’s infrastructure/tooling/monitoring? Often it ends up being simpler to reimplement it from scratch.
- How easy is it to experiment with adjustments to this? Often we don’t want to pick a single architecture forever, so we want to be able to adjust and modify easily. Open source models may not always accommodate this.
At the risk of being flippant/dismissive: coding up a model/architecture is one of the easiest and fastest parts of the problem. So if you can make everything else easier by implementing the model from scratch, it makes sense to just do that.
hannahmontana1814 t1_j5093nz wrote
Reply to [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116
If you're looking for a model to detect GPT-generated text, you're out of luck.
CurrentlyJoblessFML OP t1_j508inw wrote
Reply to comment by Naive-Progress4549 in [D] Question about using diffusion to denoise images by CurrentlyJoblessFML
Hi! Thanks for the response. I’ll try my luck by just concatenating my noisy input with y_t along the channel dimension and see if that works. In the SR3 paper, the authors also mention that they tried a different way to condition the model, but they found that simply concatenating gave them the same generation quality, so they just stuck with that.
Good luck with your project and HMU if you ever want to discuss this. I’ve been breaking my head on these diffusion models for the past couple of days so I feel your struggle.
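For readers following along, here is a minimal sketch of what "concatenating along the channel dimension" means, written in plain Python so it runs without a framework. It assumes images stored as [channels][height][width] with the batch dimension dropped; the names `cond`, `y_t`, and `concat_channels` are illustrative, not from SR3's code.

```python
from typing import List

# An image stored as [channels][height][width] floats; batch dimension omitted for brevity.
Image = List[List[List[float]]]


def shape(img: Image) -> tuple:
    """Return (channels, height, width)."""
    return (len(img), len(img[0]), len(img[0][0]))


def concat_channels(cond: Image, y_t: Image) -> Image:
    """Stack the conditioning image and the noisy sample y_t along the channel axis,
    so the denoiser receives both as a single (C_cond + C_y)-channel input."""
    if shape(cond)[1:] != shape(y_t)[1:]:
        raise ValueError("spatial sizes must match before concatenating")
    return list(cond) + list(y_t)


def zeros(c: int, h: int, w: int) -> Image:
    return [[[0.0] * w for _ in range(h)] for _ in range(c)]


cond = zeros(3, 4, 4)  # the noisy/degraded input image (RGB)
y_t = zeros(3, 4, 4)   # the diffusion sample at timestep t (RGB)
net_input = concat_channels(cond, y_t)
print(shape(net_input))  # (6, 4, 4)
```

With PyTorch tensors of shape (B, C, H, W) the same operation is `torch.cat([cond, y_t], dim=1)`; the only change to the denoiser is that its first convolution must accept the doubled channel count.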
Naive-Progress4549 t1_j507af0 wrote
I think that if you look in the guided_diffusion repository, you can see that the super-resolution network conditions its output by concatenating the (upsampled) low-resolution image with its input. There are also other ways to condition, such as using gradients during sampling.
I have been trying to adapt the guided_diffusion repository for another task for a couple of months now... I have to say I am facing quite a few difficulties overall!
I hope this helps
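"Gradients during sampling" most likely refers to classifier guidance, where each reverse step's mean is shifted by scale × variance × ∇ log p(y | x_t). The sketch below assumes a 1-D toy in which the "classifier" is a hand-written Gaussian log-likelihood with an analytic gradient (hypothetical; a real setup backpropagates through a classifier trained on noisy images). It is not guided_diffusion's code.

```python
import math
import random


def toy_classifier_grad(x_t: float, target: float, width: float) -> float:
    """d/dx_t of a toy log p(y | x_t) = -(x_t - target)^2 / (2 * width^2).
    A real setup would backpropagate through a classifier trained on noisy images."""
    return -(x_t - target) / width ** 2


def guided_mean(x_t: float, mean: float, variance: float, scale: float,
                target: float, width: float) -> float:
    """Shift the model's reverse-step mean by scale * variance * grad log p(y | x_t)."""
    return mean + scale * variance * toy_classifier_grad(x_t, target, width)


def guided_step(x_t: float, mean: float, variance: float, scale: float,
                target: float, width: float, rng: random.Random) -> float:
    """Sample x_{t-1} ~ N(guided mean, variance)."""
    return rng.gauss(guided_mean(x_t, mean, variance, scale, target, width),
                     math.sqrt(variance))


print(guided_mean(x_t=0.0, mean=0.0, variance=0.5, scale=2.0, target=1.0, width=1.0))  # 1.0
```

Setting `scale=0` recovers the unguided sampler, which is why guidance can be bolted onto a pretrained diffusion model without retraining it.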
starstruckmon OP t1_j501y7y wrote
Reply to comment by johnrachwan in [R] Massive Language Models Can Be Accurately Pruned in One-Shot by starstruckmon
From the paper
>One natural avenue for future work would be to investigate fine-tuning mechanisms for such large-scale models, which would allow further accuracy recovery. We conjecture that this should be possible, and that probably at least 80-90% sparsity can be achieved with progressive pruning and fine-tuning.
So, that comes next. Though I doubt the 80-90% guesstimate.
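The paper's one-shot method is not plain magnitude pruning, but the "progressive pruning and fine-tuning" it conjectures about usually looks something like the sketch below: raise sparsity over time with a cubic schedule (the gradual-pruning schedule of Zhu & Gupta, swapped in here) and prune the smallest-magnitude weights at each step. The weights, step counts, and 0.9 final sparsity are illustrative assumptions.

```python
def cubic_sparsity(step: int, total_steps: int,
                   initial: float = 0.0, final: float = 0.9) -> float:
    """Gradual pruning schedule: s_t = s_f + (s_i - s_f) * (1 - t / n)^3.
    Prunes quickly at first and slows down as the target sparsity is approached."""
    t = min(max(step, 0), total_steps)
    return final + (initial - final) * (1 - t / total_steps) ** 3


def magnitude_prune(weights: list, sparsity: float) -> list:
    """Zero out the `sparsity` fraction of weights with the smallest absolute value."""
    k = int(round(sparsity * len(weights)))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:k]:
        pruned[i] = 0.0
    return pruned


weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6, -0.4, 0.02, 0.9, -0.15]
total = 10
for step in range(0, total + 1, 2):
    target = cubic_sparsity(step, total)
    weights = magnitude_prune(weights, target)
    # ... a few fine-tuning steps on the surviving (non-zero) weights would go here ...
    print(f"step {step:2d}: sparsity {target:.2f}, nonzero {sum(w != 0 for w in weights)}")
```

The fine-tuning between pruning steps is what lets the remaining weights compensate; one-shot pruning skips it, which is why the authors expect extra accuracy recovery from adding it.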
LanverYT t1_j501vdn wrote
That's a really interesting question, and I've been wondering about the same thing. I've never been able to figure it out, but I would love to see what others have to say about it. It sounds like you have a solid approach and understanding of the concept, so I'm curious to see how it turns out. Good luck with your experimentation and let us know how it goes.
johnrachwan t1_j4zz9vw wrote
I'm curious if results improve with some slight retraining
z_fi t1_j4zy4dq wrote
Reply to [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar
Restrictive licensing and limited usefulness to industry problems.
Leptino t1_j4zxkyn wrote
Reply to [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116
The only people who have a prayer of doing this are OpenAI themselves. It is likely they can insert an undetectable watermark into sufficiently generic text output, over sufficiently many words, without appreciably distorting the meaning or quality.
However, there is almost no way this can survive subsequent rewrites, like 'rewrite the previous paragraph with three new random words that don't change the meaning' and 'change all the nouns/verbs into synonyms that preserve the meaning of the paragraph'.
I strongly suspect (and might one day try my hand at the math) that there can be no such system that works in general against this sort of attack.
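OpenAI has not published its scheme, so as an illustration here is a toy version of one publicly described approach, the "green list" watermark of Kirchenbauer et al. (2023): the previous token seeds a pseudo-random split of the vocabulary, generation favours the "green" half, and detection is a z-test on how many green tokens appear. The vocabulary, the always-pick-green "generator", and `crude_rewrite` are hypothetical stand-ins; the rewrite is a rough proxy for the word-swapping attacks described above.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary that is "green" at each step


def is_green(prev: str, token: str, gamma: float = GAMMA) -> bool:
    """Pseudo-randomly place `token` on the green list, seeded by the previous token."""
    digest = hashlib.sha256(f"{prev}|{token}".encode("utf-8")).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64  # roughly uniform in [0, 1)
    return u < gamma


def z_score(tokens: list, gamma: float = GAMMA) -> float:
    """One-proportion z-test on the number of green (prev, token) pairs."""
    n = len(tokens) - 1
    greens = sum(is_green(p, t, gamma) for p, t in zip(tokens, tokens[1:]))
    return (greens - gamma * n) / math.sqrt(n * gamma * (1 - gamma))


# Hypothetical toy vocabulary and a "generator" that always picks a green word if it can.
VOCAB = [f"w{i}" for i in range(50)]


def generate_watermarked(length: int, start: str = "the") -> list:
    tokens = [start]
    for _ in range(length):
        greens = [w for w in VOCAB if is_green(tokens[-1], w)]
        tokens.append(greens[0] if greens else VOCAB[0])
    return tokens


def crude_rewrite(tokens: list) -> list:
    """Stand-in for a paraphrase attack: replace every other word with another word."""
    return [VOCAB[(7 * i) % len(VOCAB)] if i % 2 else t for i, t in enumerate(tokens)]


marked = generate_watermarked(40)
print(round(z_score(marked), 2))                 # large: nearly every pair is green
print(round(z_score(crude_rewrite(marked)), 2))  # typically near 0 once every pair is disturbed
```

Because every (previous, current) pair touched by the rewrite is re-randomised, the green count falls back toward chance, which is the intuition behind the comment's suspicion that no such scheme survives in general.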
seventyducks t1_j4zvo3n wrote
Reply to comment by junetwentyfirst2020 in [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116
Where are the benchmarks and analyses that you're basing this statement on?
holy_onasandwich t1_j4zs0ls wrote
Should be the ~8k token context size, experiment done here: https://twitter.com/goodside/status/1598874674204618753?t=70_OKsoGYAx8MY38ydXMAA&s=19
Ok-Cartoonist8114 t1_j4zrur1 wrote
It is called slot filling; extractive QA may work too :)
IntelArtiGen t1_j4zr3iq wrote
Reply to comment by Daos-Lies in [D] Inner workings of the chatgpt memory by terserterseness
Yeah that's also what I would say, I doubt it's anything revolutionary as it's likely not necessary. It might be an innovative use of embeddings of a conversation but I wouldn't qualify that as "revolutionary".
They probably don't use only one embedding for the whole conv; perhaps they use one embedding per prompt and/or keep some tokens in memory.
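Nobody outside OpenAI knows the real mechanism, but the "one embedding per prompt" idea can be sketched as a retrieval memory: store an embedding for each past turn and pull the most similar ones back into the prompt. The bag-of-words `embed` below is a hypothetical placeholder for a learned embedding model, and cosine-similarity retrieval is an assumption, not a known detail of ChatGPT.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a learned embedding model would go here."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TurnMemory:
    """Keep one embedding per past prompt and recall the most relevant ones."""

    def __init__(self):
        self.turns = []  # list of (text, embedding) pairs

    def add(self, text: str) -> None:
        self.turns.append((text, embed(text)))

    def recall(self, query: str, k: int = 1) -> list:
        q = embed(query)
        ranked = sorted(self.turns, key=lambda turn: cosine(q, turn[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


memory = TurnMemory()
memory.add("My dog is called Biscuit and she loves the beach.")
memory.add("Explain how transformers use attention.")
memory.add("Write a haiku about autumn leaves.")

query = "What is my dog called again?"
context = memory.recall(query, k=1) + [query]  # recalled turns are prepended to the prompt
print(context[0])  # My dog is called Biscuit and she loves the beach.
```

Retrieval like this would explain references to details far beyond the context window, while a summary or a few kept tokens would explain the vaguer recall.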
Dear-Acanthisitta698 t1_j4zqkv8 wrote
Text QA might work. Give the description as the passage and the question as "How many rooms are in this house?".
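Before reaching for a QA or NER model, a rule-based slot-filling baseline makes the task concrete: each slot is a named field to fill from free text. The slot names and regexes below are hypothetical and only handle "<count> <noun>" phrasings; a trained tagger or an extractive QA model, as suggested in the comments, would replace them in practice.

```python
import re

NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
COUNT = r"(\d+|" + "|".join(NUMBER_WORDS) + r")"

# Hypothetical slots for a house listing: slot name -> regex for the noun after the count.
SLOT_NOUNS = {
    "bedrooms": r"bed(?:room)?s?",
    "bathrooms": r"bath(?:room)?s?",
    "rooms": r"rooms?",
}


def parse_count(token: str) -> int:
    token = token.lower()
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


def fill_slots(description: str) -> dict:
    """Rule-based slot filling: find '<count> <noun>' for each slot, if present."""
    slots = {}
    for slot, noun in SLOT_NOUNS.items():
        match = re.search(rf"\b{COUNT}[\s-]*{noun}\b", description, flags=re.IGNORECASE)
        if match:
            slots[slot] = parse_count(match.group(1))
    return slots


print(fill_slots("Charming cottage with three bedrooms, 2 bathrooms and 7 rooms in total."))
# {'bedrooms': 3, 'bathrooms': 2, 'rooms': 7}
```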
Czl2 t1_j4zqan4 wrote
Ask the model to summarize whatever is about to be cut off as you slide the token window, and replace what is lost with that summary? That way your token window always has a summarized version of what is missing attached?
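A minimal sketch of that sliding-window-plus-summary idea, assuming a whitespace word count in place of a real tokenizer and a pluggable `summarize` callback (hypothetical; in the suggestion above it would be a request to the model itself). The demo summarizer just truncates, purely so the example runs offline.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude token count (whitespace-separated words); a real system would use the model's tokenizer."""
    return len(text.split())


class SummarizingWindow:
    """Sliding context window that folds evicted turns into a running summary."""

    def __init__(self, budget: int, summarize: Callable[[str], str]):
        self.budget = budget        # max tokens kept verbatim (the summary is not counted here)
        self.summarize = summarize  # hypothetical hook, e.g. a call that asks the model to summarize
        self.summary = ""
        self.turns: List[str] = []

    def add(self, turn: str) -> None:
        self.turns.append(turn)
        # Evict the oldest turns until the verbatim part fits, folding each into the summary.
        while len(self.turns) > 1 and sum(map(count_tokens, self.turns)) > self.budget:
            evicted = self.turns.pop(0)
            self.summary = self.summarize((self.summary + " " + evicted).strip())

    def context(self) -> str:
        header = f"Summary of earlier conversation: {self.summary}\n" if self.summary else ""
        return header + "\n".join(self.turns)


def first_words(text: str, n: int = 12) -> str:
    """Placeholder summarizer for this demo only: keep the first n words."""
    return " ".join(text.split()[:n])


window = SummarizingWindow(budget=12, summarize=first_words)
for turn in ["User: my name is Ada and I work on compilers",
             "Bot: nice to meet you Ada",
             "User: what did I say my job was?"]:
    window.add(turn)
print(window.context())
```

In a real deployment the summary would also need to count against the token budget, since it is sent to the model along with the verbatim turns.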
Daos-Lies t1_j4zpwjr wrote
This is just a suspicion, but I think it's just a matter of embedding the conversation and using that embedding as an input, in addition to your most recent question. (Which is just classic recurrence really).
I'm relatively confident that the mechanism would be something along those lines because they made a relatively big fuss about their new embedding service around the same time that chatgpt was released. (tho obviously that didn't get as much attention as chatgpt itself).
(And in response to u/DaLameLama asking if ChatGPT goes past the token limit: yes. It deffo can go past 8000 tokens; I have had some v v v long conversations with it.)
LcuBeatsWorking t1_j4znni1 wrote
Reply to [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar
One problem of many open-source models is that they are badly documented (there was a big discussion last year that many models used in scientific papers couldn't be replicated).
So reverse-engineering them is often harder than building your own from scratch.
PredictorX1 t1_j4zmy54 wrote
Reply to [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar
Can you give some examples of problems that an organization would solve with open source models?
xorbinant_ranchu t1_j4zmevh wrote
Reply to comment by DaLameLama in [D] Inner workings of the chatgpt memory by terserterseness
- 1 token ~= 4 chars in English
- 1 token ~= ¾ words
- 100 tokens ~= 75 words
https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them
I've had it reference much more than ~6000 words so there's definitely something else going on
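The ~6000-word figure follows directly from the rules of thumb quoted above: an ~8000-token limit times ¾ word per token is ~6000 words. A small converter, assuming those English-only ratios (exact counts depend on the tokenizer and the text):

```python
# English-only rules of thumb from the OpenAI help article linked above;
# exact counts depend on the tokenizer and the text.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 3 / 4


def tokens_to_words(tokens: int) -> float:
    return tokens * WORDS_PER_TOKEN


def tokens_to_chars(tokens: int) -> int:
    return tokens * CHARS_PER_TOKEN


def words_to_tokens(words: int) -> float:
    return words / WORDS_PER_TOKEN


print(tokens_to_words(8000))  # 6000.0
print(tokens_to_chars(8000))  # 32000
print(words_to_tokens(75))    # 100.0
```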
DaLameLama t1_j4zhqqj wrote
Does ChatGPT actually get past the token limit? Codex supports ~8000 tokens. You might underestimate how much this is. Has anyone tested the limits?
Unfortunately, OpenAI aren't serious about publishing technical reports anymore.
JClub OP t1_j4zejga wrote
Reply to comment by dataslacker in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub
Yes, 100% agree with you. I believe that the researchers have also tried pseudo labeling or making the reward differentiable as you say, and maybe RL is the SOTA approach now. But these are just guesses!
mtocrat t1_j4zecpm wrote
Reply to comment by dataslacker in [R] A simple explanation of Reinforcement Learning from Human Feedback (RLHF) by JClub
What you're describing is a general approach to RL that is used in different forms in many methods: sample actions, weight or rank them in some way by the estimated return, regress to the weighted actions. So you're not suggesting to do something other than RL but to replace one RL approach with a different RL approach.
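To make that loop concrete, here is a toy reward-weighted regression: sample actions from a Gaussian policy, weight them by exp(reward / β), and "regress" the policy mean onto the weighted actions, which for a Gaussian mean is just a weighted average. The 1-D policy, the reward peaked at 3, and β = 1 are illustrative assumptions; this is not the PPO-based procedure used for InstructGPT.

```python
import math
import random


def reward(action: float) -> float:
    """Hypothetical reward: highest at action = 3."""
    return -(action - 3.0) ** 2


def reward_weighted_update(mu: float, sigma: float, beta: float, n: int,
                           rng: random.Random) -> float:
    """Sample actions, weight them by exp(reward / beta), then regress the policy mean
    onto the weighted actions (for a Gaussian mean this is a weighted average)."""
    actions = [rng.gauss(mu, sigma) for _ in range(n)]
    rewards = [reward(a) for a in actions]
    top = max(rewards)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp((r - top) / beta) for r in rewards]
    return sum(w * a for w, a in zip(weights, actions)) / sum(weights)


rng = random.Random(0)
mu = 0.0
for _ in range(20):
    mu = reward_weighted_update(mu, sigma=1.0, beta=1.0, n=200, rng=rng)
print(round(mu, 2))  # should end up near 3.0
```

Replacing the exponential weights with "keep the top-k samples, weight 1" gives the ranking variant of the same idea; either way it is still an RL method, as the comment says.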
SnooHesitations8849 t1_j4zdxm7 wrote
Reply to comment by BenXavier in [R] Researchers out there: which are current research directions for tree-based models? by BenXavier
Yep.
Acceptable-Cress-374 t1_j50aej9 wrote
Reply to [D] What is the name of this NLP technique? by Kebet-Mendez
You could also look up Named Entity Recognition (NER)