Recent comments in /f/MachineLearning
[deleted] t1_j6x0948 wrote
Reply to comment by koolaidman123 in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
[deleted]
visarga t1_j6x07jz wrote
Reply to comment by MemeBox in [D] What does a DL role look like in ten years? by PassingTumbleweed
I was actually saying the opposite - AIs need human validation to do anything of value. Generating tons of text and images without manually checking them is useless. So there will still be work around AIs.
mr_birrd t1_j6x06eq wrote
Reply to comment by Monoranos in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
You think the whole internet is free to run? Anyways, they don't use any of your data to train it.
latefordinnerstudios OP t1_j6wwec2 wrote
Reply to comment by BobSteva in [P] An open source tool for repeatable PyTorch experiments by embedding your code in each model checkpoint by latefordinnerstudios
Thanks!
Jean-Porte t1_j6wvy2p wrote
Reply to [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
The traditional language modeling loss (negative log-likelihood) is misaligned with human expectations. One negation radically changes the meaning of a sentence, but it doesn't radically change the log-likelihood: it isn't weighted any more heavily than a "the" or a superfluous word.
With RLHF, important words have an important impact, and the loss is far more closely aligned with human preferences.
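A toy sketch of that point (the probabilities here are made up, and real models score tokens in context, but the arithmetic is the same):

```python
import math

def sequence_nll(token_probs):
    """Average negative log-likelihood over a token sequence."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

# Made-up per-token probabilities for a 4-token sentence,
# e.g. "the movie was good".
base = [0.9, 0.9, 0.9, 0.9]

# Case A: the model badly misses the meaning-flipping content word.
miss_content = [0.9, 0.9, 0.9, 0.1]
# Case B: the model badly misses a near-meaningless function word ("the").
miss_function = [0.1, 0.9, 0.9, 0.9]

# The loss penalty is identical in both cases: the NLL has no notion
# of which token mattered for the meaning.
print(sequence_nll(miss_content) - sequence_nll(base))
print(sequence_nll(miss_function) - sequence_nll(base))
```

Both deltas come out identical, which is exactly the misalignment being described.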
latefordinnerstudios OP t1_j6wvy0u wrote
Reply to comment by DingusFamilyVacation in [P] An open source tool for repeatable PyTorch experiments by embedding your code in each model checkpoint by latefordinnerstudios
This is exactly the kind of problem I’m trying to fix! Let me know how it works for you!
latefordinnerstudios OP t1_j6wvpbk wrote
Reply to comment by SatoshiNotMe in [P] An open source tool for repeatable PyTorch experiments by embedding your code in each model checkpoint by latefordinnerstudios
Mostly yes, when you load a snapshot it first saves your current directory (with git stash --include-untracked), then it puts your current directory in the same state as the checkpoint. (It performs a git fetch followed by a git checkout.)
I wrote more about it in the docs:
alpha-meta OP t1_j6wvgbr wrote
Reply to comment by koolaidman123 in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
Thanks for the response! I just double-checked the InstructGPT paper and you were right regarding the rankings -- they are pairwise, and I am not sure why I thought otherwise.
Regarding the updates at the sentence level, that makes sense. That would be more of a discrete problem as well, for which you probably can't backpropagate (otherwise, you would be back to token-level).
alkibijad OP t1_j6wuvnc wrote
Reply to comment by TheDeviousPanda in [D] Apple's ane-transformers - experiences? by alkibijad
Can you please elaborate on your answers and quantify?
I'm most interested in the effort for bullets 2 and 3. In your own experience, did it take hours, days, weeks?
i_wayyy_over_think t1_j6wup4m wrote
Reply to comment by londons_explorer in [D] Why is stable diffusion much smaller than predecessors? by dahdarknite
Also, being able to easily fine-tune a model makes generations of your particular subject higher quality than what you can get from anything else that's not fine-tuned.
koolaidman123 t1_j6wtmdj wrote
Reply to [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
- Outputs are not ranked 1-5, they're ranked two at a time, head to head, and the RM predicts which is more favored by humans
- Empirically, they found RL outperformed supervised fine-tuning (SFT) on human evaluations, meaning humans generally preferred the RLHF model vs the SFT model. The SFT model was fine-tuned using the top-ranked answer
As to why RL outperforms SFT, not a lot of orgs have the resources to test this (yet). I've heard a plausible theory from AI2 that the main difference comes from the fact that SFT uses a token-level loss, whereas the RL loss takes in the entire sequence, so maybe instead of RL being "better", it's just that the next-token prediction task is worse
Researchers I've spoken with don't believe RL is the critical component enabling these models, and think we could eventually discover the right training regime to enable SFT to perform on par with (or better than) RL
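For reference, the pairwise objective from the first bullet fits in a couple of lines: the RM is trained so the human-preferred completion scores above the rejected one, via -log sigmoid(r_chosen - r_rejected). A sketch with made-up scalar rewards (in the real setup these come from a fine-tuned LM head):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_rm_loss(r_chosen, r_rejected):
    """Pairwise ranking loss: push the reward of the human-preferred
    completion above the reward of the rejected one."""
    return -math.log(sigmoid(r_chosen - r_rejected))

# Made-up scalar rewards for two completions of the same prompt.
print(pairwise_rm_loss(1.5, -0.5))  # preferred scored higher -> small loss
print(pairwise_rm_loss(-0.5, 1.5))  # preferred scored lower -> large loss
```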
iqisoverrated t1_j6wswhz wrote
Reply to comment by bojohnsonyadig in [P] AI Poker/Machine Learning/Game-Theory by Much_Blacksmith_1857
The casino might just redistribute the money from the locked account once they detect such activity and deem it a "bot beyond reasonable doubt". They have the hand histories, so they could do that quite easily (talking about online casinos, obviously. If you manage to have bot info funneled to you at a live casino, things will get tricky... but in that case you'll probably get sued for damages, because they have all your personal info and your face on camera)
On the other hand: the casino got paid (the casino isn't playing poker. The casino is playing a different game called "rake") ...so they have no loss if someone cheats that way.
Their only incentive is to avoid bad PR if it were to become public that their site is overrun by bots.
But yes: as a player who was taken before the bot got caught, you're probably SOL (if the money was already withdrawn by the time it was caught). Just like in most other crimes where the criminal has already managed to spend your money.
Ronny_Jotten t1_j6wsav3 wrote
Reply to comment by znihilist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
If I remember your face, does my brain contain your face? Can your face be found anywhere inside my brain? Or has my brain created a sort of close-fit formula, embodied in connections of neurons, that can reproduce it to a certain degree of precision? If the latter, does that mean that I haven't memorized your face, even though I can draw a pretty good picture of it?
cachemonet0x0cf6619 t1_j6ws965 wrote
Reply to comment by bojohnsonyadig in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
Me and my kids. I use it as a replacement for Stack Overflow and my kids use it for school.
uhules t1_j6wrx63 wrote
Reply to comment by Mefaso in [D] Why is stable diffusion much smaller than predecessors? by dahdarknite
Except DALL-E 2 also applies diffusion in latent space and Imagen performs diffusion in low-res pixel space. My initial hunch was the upscaling diffusion models, but they account for a relatively small portion of the total number of parameters and are more relevant speed-wise. The lackluster explanation is simply "SD does latent better", since you'd need to do an extensive ablation study to compare rather different architectures.
Ronny_Jotten t1_j6wrlvv wrote
Reply to comment by znihilist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
I think pretty much everyone would have to agree that the brain - the original neural network - can memorize and reproduce images, though never 100% exactly. That's literally what we mean by the word memorize: to create a representation of something in a biological neural network in a way that it can be recalled and reproduced.
Can those pictures be found somewhere inside the brain, can you open a skull and point to them? Or is it just a function of neuronal connections that outputs such a picture? Is there "a difference between memorizing and pattern recreation"? It sounds like a "how many angels can dance on the head of a pin" sort of question that's not worth spending a lot of time on.
I don't think anyone should be surprised that an artificial neural network can exhibit a similar kind of behaviour, and that for convenience we would call it by the same word: "memorizing". I'm not saying that every single image is memorized, any more than I have memorized every image I've ever seen. But I do remember some very well - especially if I've seen them many times.
Some say that AIs "learn" from the images they "see", but somehow they refuse to say that they "memorize" too. If they're going to make such anthropomorphic analogies, it seems a bit selective, if not hypocritical.
The extent to which something is memorized, or the differences in qualities and how it takes place in an artificial vs. organic neural network, is certainly something to be discussed. But if you want to argue that it's not truly memorizing, like the argument that ANNs don't have true intelligence, well, ok... but that's also a kind of "no true Scotsman" argument that's a bit meaningless.
evanthebouncy t1_j6wpf34 wrote
I made a bet in 2019 to _not_ learn any more about fiddling with NN architectures. It paid off. Now I just send data to a huggingface API and it figures out the rest.
What will change? What are my thoughts?
All well-identified problems become rat races. If there's a metric you can put on it, engineers will optimize it away. The comfort of knowing that what you're doing has a well-defined metric is paid for with the anxiety of the rat race of everyone optimizing that same metric.
What do we do with this?
Work on problems that don't have a well-defined metric. Work with people. Work with the real world. Work with things that defy quantification, that are difficult to reduce to a mere number everyone agrees on. That way you have some longevity in the field.
SulszBachFramed t1_j6wp97b wrote
Reply to comment by Ronny_Jotten in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Right, hence why it's relevant to large models trained on huge datasets. If the model can reconstruct data such that it is substantially similar to the original, then we have a problem, whether from the viewpoint of copyright infringement or privacy law (GDPR).
bojohnsonyadig t1_j6wo8mi wrote
Reply to comment by iqisoverrated in [P] AI Poker/Machine Learning/Game-Theory by Much_Blacksmith_1857
So not before multiple people lose their money to it and the casino obvs never pays them back. Perfect lol
sharky6000 t1_j6wo5mi wrote
What do you want to know?
You should look up counterfactual regret (CFR) minimization, it has been the technique that underlies all the expert poker bots.
Then, if you are interested in hold'em variants, look up DeepStack, Libratus, Pluribus, ReBeL, and Player of Games.
All of the competitive bots on the hold'em variants use some form of specialized search (based on CFR or Monte Carlo CFR) over the public belief state tree.
The card draw variants are mostly untouched because the public tree methods are not as easily applicable.
Anyway feel free to dm me if you want to know more.
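As a starting point, the regret-matching rule at the heart of CFR fits in a few lines. A minimal sketch on rock-paper-scissors (the fixed opponent strategy is made up for illustration; full CFR applies this same update at every information set of the game tree, and running it for both players in self-play drives the average strategies toward a Nash equilibrium):

```python
ROCK, PAPER, SCISSORS = 0, 1, 2

def payoff(a, b):
    """Payoff to the player choosing a against b (+1 win, -1 loss, 0 tie)."""
    return 0 if a == b else (1 if (a - b) % 3 == 1 else -1)

def strategy_from_regrets(regrets):
    """Regret matching: play actions in proportion to positive regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / 3] * 3

opponent = [0.4, 0.3, 0.3]          # made-up fixed opponent: rock-heavy
regrets = [0.0, 0.0, 0.0]
strategy_sum = [0.0, 0.0, 0.0]

for _ in range(1000):
    strategy = strategy_from_regrets(regrets)
    # Expected payoff of each pure action against the opponent's mix.
    action_values = [sum(q * payoff(a, b) for b, q in enumerate(opponent))
                     for a in range(3)]
    expected = sum(p * v for p, v in zip(strategy, action_values))
    for a in range(3):
        regrets[a] += action_values[a] - expected  # accumulate regret
        strategy_sum[a] += strategy[a]             # accumulate avg strategy

total = sum(strategy_sum)
average_strategy = [s / total for s in strategy_sum]
print(average_strategy)  # converges to playing paper, the best response
```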
[deleted] t1_j6wo0lz wrote
Reply to comment by [deleted] in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
[deleted]
SatoshiNotMe t1_j6wnj9w wrote
Reply to [P] An open source tool for repeatable PyTorch experiments by embedding your code in each model checkpoint by latefordinnerstudios
Very nice, and I appreciate you sharing the code as well as the motivation on your blog. The code example to save a snapshot looks simple. Did I understand correctly that when you reload a snapshot, it puts your current directory into the git state corresponding to the checkpoint?
bojohnsonyadig t1_j6wnhha wrote
Will this attract the average joe user who just thought it was fun? Who do you think will be the target market/first adopters to pay?
Ronny_Jotten t1_j6wndrm wrote
Reply to comment by SulszBachFramed in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
The test for copyright infringement is whether it's "substantially similar", not "exactly the same".
visarga t1_j6x0qcm wrote
Reply to comment by Ronny_Jotten in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
I think their argument goes like this - when you encode an image to JPEG, the actual image is replaced by DCT coefficients and the reconstruction is only approximate. That doesn't make the image free of copyright.
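The analogy is easy to make concrete. A toy sketch of lossy transform coding: a 1-D DCT with coarse quantization, standing in for JPEG's 2-D pipeline (the "pixel" values and quantization step are made up):

```python
import math

def dct2(x):
    """Type-II DCT: represent a signal as cosine coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n)]

def idct2(c):
    """Inverse (type-III DCT with scaling): reconstruct the signal."""
    n = len(c)
    return [(c[0] + 2 * sum(c[k] * math.cos(math.pi * k * (i + 0.5) / n)
                            for k in range(1, n))) / n
            for i in range(n)]

signal = [3.0, 5.0, 4.0, 6.0, 7.0, 5.0, 4.0, 3.0]    # made-up "image row"
step = 2.0                                            # quantization step

coeffs = dct2(signal)
quantized = [round(c / step) * step for c in coeffs]  # the lossy step
reconstructed = idct2(quantized)

# The original samples are gone; what's stored are quantized coefficients,
# yet the reconstruction stays close -- approximate, not exact.
error = max(abs(a - b) for a, b in zip(signal, reconstructed))
print(reconstructed)
print(error)
```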