Recent comments in /f/MachineLearning

307thML t1_ja85qu7 wrote

You're correct that RL has been struggling, not because of the impressive results from LLMs and image generators, but because progress within RL itself has been very slow. People who say otherwise have just forgotten what fast progress looks like: remember 2015-2018, when we first saw human-level Atari play, superhuman Go play, and then superhuman Atari play, as well as impressive results in StarCraft and Dota. I think if you'd asked someone back in 2018 what the next 5 years of RL would look like, they would have expected progressively more complicated games to fall, and agents to graduate from playing with game-state information, as AlphaStar and OpenAI Five did, to beating humans on a level playing field by playing from the pixels on the screen, the way agents in Atari do. This hasn't happened.

Instead it turned out that all of this progress was confined to narrow domains: specifically, games with highly limited input spaces (which is why OpenAI Five and AlphaStar had to take the game state directly, giving them access to information that humans don't have) and games where exploration is easy (it can be handled largely or entirely by making random moves some percentage of the time).

I don't think this means the field is dead, mind you, but it certainly hasn't been making much progress lately.

1

badabummbadabing t1_ja7yxbg wrote

Don't use batch normalization. Lots of U-Nets use, e.g., instance normalization. A batch size of 1 should be completely fine (but you will need to tune the learning rate when you change it). Check the 'no new U-Net' (nnU-Net) paper by Fabian Isensee for the definitive resource on what matters in U-Nets.
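For reference, here is a minimal sketch of a U-Net-style conv block with instance normalization instead of batch normalization, so it behaves sensibly at batch size 1. The channel arguments, kernel size, and activation are illustrative choices, not taken from the nnU-Net paper:

import torch.nn as nn

# A double-conv U-Net block using InstanceNorm2d, which does not depend on
# batch statistics and therefore works fine with a batch size of 1.
def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.InstanceNorm2d(out_ch, affine=True),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.InstanceNorm2d(out_ch, affine=True),
        nn.ReLU(inplace=True),
    )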

10

QuadmasterXLII t1_ja7yo0f wrote

Your problem is the U-Net backbone, not the loss function. Assuming you're married to a batch size of 4, the final convolution to get to 4 x 200 x 500 x 500, the cross-entropy, and the backpropagation should only take maybe 10 GB, so cram your architecture into the remaining 30 GB.

import torch
# Simulated input to the final layer: batch 4, 128 feature channels, 500x500.
x = torch.randn([4, 128, 500, 500]).cuda()
# Final 3x3 convolution producing 200 class logits (498x498 output with no padding).
z = torch.nn.Conv2d(128, 200, 3)
z.cuda()
# Integer class targets matching the 498x498 output.
q = torch.randint(0, 200, (4, 498, 498)).cuda()
# Forward pass, cross-entropy loss, and backward pass.
torch.nn.CrossEntropyLoss()(z(x), q).backward()

This, for example, takes 7.5 GB.

2

badabummbadabing t1_ja7yb9y wrote

The problem might be the number of output channels at high resolution. Instead of computing the final layer's activations and gradients in parallel for all channels, you should be able to compute each channel's loss sequentially and accumulate the gradients at the end. This works because the loss decomposes as a sum over the channels (and thus so do the channels' gradients).

In PyTorch, this whole thing should be as simple as running the forward and backward passes for the channels of the final layer sequentially (before calling optimizer.step() and optimizer.zero_grad() once). You will probably also need to pass retain_graph=True on every backward call; otherwise the activations in the preceding layers will be freed before you get to the next channel.
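A minimal sketch of that pattern (the backbone, the 1x1 final convolution, the shapes, and the per-channel BCE loss below are illustrative assumptions; the point is slicing the final layer's weights and calling backward once per channel with retain_graph=True):

import torch
import torch.nn.functional as F

# Illustrative backbone and a 200-channel 1x1 final convolution.
backbone = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
).cuda()
final = torch.nn.Conv2d(64, 200, 1).cuda()

x = torch.randn(1, 3, 500, 500, device="cuda")
# One binary target map per channel, for a loss that decomposes over channels.
target = torch.randint(0, 2, (1, 200, 500, 500), device="cuda").float()

features = backbone(x)  # shared activations, reused for every channel

for c in range(final.out_channels):
    # Compute only channel c of the final layer by slicing its weights and bias.
    logits_c = F.conv2d(features, final.weight[c:c + 1], final.bias[c:c + 1])
    loss_c = F.binary_cross_entropy_with_logits(logits_c, target[:, c:c + 1])
    # retain_graph keeps the backbone activations alive for the next channel's backward.
    loss_c.backward(retain_graph=True)

# Gradients have now accumulated across all channels; call optimizer.step() once.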

16

mlmaster17 t1_ja777wt wrote

I actually find it (DGL) very easy to use. I switched from PyG just over a year ago because DGL was easier to install across macOS, Linux, and Windows. Some of the recent PyG updates are interesting, but not enough for me to move back. Anyway, I find both libraries to be very similar, so I think either choice is good.

13

Tea_Pearce t1_ja753ng wrote

Imo it depends on what you mean by RL. If you interpret RL as the 2015-19 collection of algorithms that train deep NN agents tabula rasa (from zero knowledge), I'd be inclined to agree that it doesn't seem a particularly fruitful research direction to get into. But if you interpret RL as a general problem setting, where an agent must learn in a sequential decision-making environment, you'll see that it's not going away.

To me the most interesting recent research in RL (or whatever you want to name it) is figuring out how to leverage existing datasets or models to get agents working well in sequential environments. Think SayCan, ChatGPT, Diffusion BC...

2