Recent comments in /f/MachineLearning

Jean-Porte t1_j6wvy2p wrote

The traditional language modeling loss (negative log-likelihood) is misaligned with human expectations. One negation radically changes the meaning of a sentence, but it doesn't radically change the log-likelihood. It isn't weighted any more heavily than a "the" or a superfluous word.

With RLHF, important words have a correspondingly large impact, and the loss is aligned directly with human preferences.
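To illustrate the point about token-level NLL (the probabilities below are made up for the example, not from any real model):

```python
import math

# Hypothetical per-token probabilities a language model might assign
# to the sentence "the movie was not good".
token_probs = {"the": 0.4, "movie": 0.2, "was": 0.5, "not": 0.3, "good": 0.25}

# Token-level NLL treats every token identically, regardless of how
# much it changes the sentence's meaning.
nll = {tok: -math.log(p) for tok, p in token_probs.items()}

# "not" flips the sentiment of the whole sentence, yet its loss
# contribution is on the same scale as the article "the".
```

Dropping "not" would change the meaning completely, but its share of the loss is unremarkable, which is the mismatch being described.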

22

latefordinnerstudios OP t1_j6wvpbk wrote

Mostly yes. When you load a snapshot, it first saves your current directory (with `git stash --include-untracked`), then it puts your current directory in the same state as the checkpoint (it performs a `git fetch` followed by a `git checkout`).
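Roughly, the load sequence described above looks like this (a sketch of the description, not the tool's actual code; `snapshot_ref` is a placeholder for whatever ref the tool checks out):

```python
import subprocess

def load_snapshot(snapshot_ref: str, dry_run: bool = True) -> list:
    """Sketch of the snapshot-load sequence: stash, fetch, checkout."""
    commands = [
        ["git", "stash", "--include-untracked"],  # save the current directory first
        ["git", "fetch"],                         # fetch the snapshot's objects
        ["git", "checkout", snapshot_ref],        # restore the checkpoint state
    ]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands

cmds = load_snapshot("SNAPSHOT_REF")  # dry run: just returns the command list
```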

I wrote more about it in the docs:

http://jellyml.com/docs.html#eat

2

alpha-meta OP t1_j6wvgbr wrote

Thanks for the response! I just double-checked the InstructGPT paper and you were right regarding the rankings -- they are pairwise, and I am not sure why I thought otherwise.

Regarding the updates on a sentence level, that makes sense. That would be more of a discrete problem as well for which you probably can't backpropagate (otherwise, you would be back to token-level).

8

koolaidman123 t1_j6wtmdj wrote

  1. Outputs are not ranked 1-5; they're compared two at a time, head to head, and the reward model (RM) predicts which one humans favor
  2. Empirically they found RL outperformed supervised fine-tuning (SFT) on human evaluations, meaning humans generally preferred the RLHF model over the SFT model. The SFT model was fine-tuned on the top-ranked answer
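The pairwise setup in point 1 boils down to a Bradley-Terry style loss (a minimal scalar sketch; in practice the scores come from a transformer reward model over full outputs):

```python
import math

def pairwise_rm_loss(r_chosen: float, r_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): the loss shrinks as the
    reward model scores the human-preferred output higher than the other."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# A wider margin in favor of the chosen output means a smaller loss.
```

So the RM never needs absolute 1-5 scores, only a consistent ordering within each pair.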

As to why RL outperforms SFT: not a lot of orgs have the resources to test this (yet). I've heard a plausible theory from AI2 that the main difference comes from the fact that SFT uses a token-level loss, whereas the RL loss takes in the entire sequence, so maybe instead of RL being "better", it's just that the next-token prediction task is worse.

Researchers I've spoken with don't believe RL is the critical component enabling these models, and think we could eventually discover the right training regime for SFT to perform on par with (or better than) RL.

58

iqisoverrated t1_j6wswhz wrote

The casino might just redistribute the money from the locked account once they detect such activity and deem it "bot beyond reasonable doubt". They have the hand histories, so they could do that quite easily (talking about online casinos, obviously; if you manage to have bot info funneled to you at a live casino, things will get tricky, but in that case you'll probably get sued for damages because they have all your personal info and your face on camera).

On the other hand: the casino got paid (the casino isn't playing poker; the casino is playing a different game called "rake"), so they lose nothing if someone cheats that way.

Their only incentive is to avoid bad PR if it were to become public that their site is overrun by bots.

But yes: as a player who was taken before the bot got caught, you're probably SOL (if it was caught after your money was already withdrawn), just like in most other crimes where the criminal has already managed to spend your money.

1

Ronny_Jotten t1_j6wsav3 wrote

If I remember your face, does my brain contain your face? Can your face be found anywhere inside my brain? Or has my brain created a sort of close-fit formula, embodied in connections of neurons, that can reproduce it to a certain degree of precision? If the latter, does that mean that I haven't memorized your face, even though I can draw a pretty good picture of it?

9

uhules t1_j6wrx63 wrote

Except DALL-E 2 also applies diffusion in latent space and Imagen performs diffusion in low-res pixel space. My initial hunch was the upscaling diffusion models, but they account for a relatively small portion of the total number of parameters and are more relevant speed-wise. The lackluster explanation is simply "SD does latent better", since you'd need to do an extensive ablation study to compare rather different architectures.

4

Ronny_Jotten t1_j6wrlvv wrote

I think pretty much everyone would have to agree that the brain - the original neural network - can memorize and reproduce images, though never 100% exactly. That's literally what we mean by the word memorize: to create a representation of something in a biological neural network in a way that it can be recalled and reproduced.

Can those pictures be found somewhere inside the brain, can you open a skull and point to them? Or is it just a function of neuronal connections that outputs such a picture? Is there "a difference between memorizing and pattern recreation"? It sounds like a "how many angels can dance on the head of a pin" sort of question that's not worth spending a lot of time on.

I don't think anyone should be surprised that an artificial neural network can exhibit a similar kind of behaviour, and that for convenience we would call it by the same word: "memorizing". I'm not saying that every single image is memorized, any more than I have memorized every image I've ever seen. But I do remember some very well - especially if I've seen them many times.

Some say that AIs "learn" from the images they "see", but somehow they refuse to say that they "memorize" too. If they're going to make such anthropomorphic analogies, it seems a bit selective, if not hypocritical.

The extent to which something is memorized, or the differences in qualities and how it takes place in an artificial vs. organic neural network, is certainly something to be discussed. But if you want to argue that it's not truly memorizing, like the argument that ANNs don't have true intelligence, well, ok... but that's also a kind of "no true Scotsman" argument that's a bit meaningless.

8

evanthebouncy t1_j6wpf34 wrote

I made a bet in 2019 to _not_ learn any more on how to fiddle with NN architectures. It paid off. Now I just send data to a huggingface API and it figures out the rest.

What will change? What are my thoughts?

All well-identified problems become rat races. If there's a metric you can put on something, engineers will optimize it away. The comfort of knowing that what you're doing has a well-defined metric is paid for with the anxiety of the rat race, with everyone optimizing the same metric.

What do we do with this?

Work on problems that don't have a well-defined metric. Work with people. Work with the real world. Work with things that defy quantification, that are difficult to reduce to a mere number everyone agrees on. That way you'll have some longevity in the field.

5

sharky6000 t1_j6wo5mi wrote

What do you want to know?

You should look up counterfactual regret minimization (CFR); it's the technique underlying all the expert poker bots.
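To give a feel for it, here's the regret-matching update at the heart of CFR, shown on rock-paper-scissors against a fixed, made-up opponent mix (a tabular toy, nothing like a full poker solver):

```python
ACTIONS = ["rock", "paper", "scissors"]

def utility(a: int, b: int) -> int:
    """+1 if action a beats action b, -1 if it loses, 0 on a tie."""
    return [[0, -1, 1], [1, 0, -1], [-1, 1, 0]][a][b]

def regret_matching(regrets: list) -> list:
    """Play each action proportionally to its positive cumulative regret."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positives]

def train(iterations: int, opp_strategy: list) -> list:
    """Return the average strategy after regret-matching updates."""
    regrets = [0.0] * 3
    strategy_sum = [0.0] * 3
    for _ in range(iterations):
        strategy = regret_matching(regrets)
        strategy_sum = [s + p for s, p in zip(strategy_sum, strategy)]
        # expected value of each action against the opponent's mix
        ev = [sum(p * utility(a, b) for b, p in enumerate(opp_strategy))
              for a in range(3)]
        played = sum(p * e for p, e in zip(strategy, ev))
        for a in range(3):
            regrets[a] += ev[a] - played  # regret for not having played a
    total = sum(strategy_sum)
    return [s / total for s in strategy_sum]

# Against an opponent who overplays rock, the average strategy
# converges to (almost pure) paper.
avg = train(1000, [0.5, 0.3, 0.2])
```

Full CFR runs this same regret-matching update at every information set of the game tree, weighted by counterfactual reach probabilities; the average strategy is what converges to equilibrium.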

Then, if you are interested in hold'em variants, look up DeepStack, Libratus, Pluribus, ReBeL, and Player of Games.

All of the competitive bots on the hold'em variants use some form of specialized search (based on CFR or Monte Carlo CFR) over the public belief state tree.

The card draw variants are mostly untouched because the public tree methods are not as easily applicable.

Anyway feel free to dm me if you want to know more.

2