Recent comments in /f/MachineLearning

307thML t1_ja85qu7 wrote

You're correct that RL has been struggling, not because of the impressive results from LLMs and image generators, but because progress within RL itself has been very slow. People who say otherwise have just forgotten what fast progress looks like: remember 2015-2018, when we first saw human-level Atari play, superhuman Go play, and then superhuman Atari play, as well as impressive results in StarCraft and Dota. I think if you'd asked someone back in 2018 what the next 5 years of RL would look like, they would have expected progressively more complicated games to fall, and agents to graduate from playing with game-state information, as AlphaStar and OpenAI Five did, to beating humans on a level playing field by playing from the pixels on the screen, the way agents in Atari do. This hasn't happened.

Instead it turned out that all of this progress was confined to narrow domains: specifically, games with highly limited input spaces (which is why OpenAI Five and AlphaStar had to take the game state directly, giving them access to information that humans don't have) and games where exploration is easy (it can be handled largely or entirely by making random moves some percentage of the time).

I don't think this means the field is dead, mind you, but it certainly hasn't been making much progress lately.

1

badabummbadabing t1_ja7yxbg wrote

Don't use batch normalization. Lots of U-Nets use, e.g., instance normalization. A batch size of 1 should be completely fine (but you will need to tune the learning rate when you change it). Check the 'no new U-Net' (nnU-Net) paper by Fabian Isensee for the definitive resource on what matters in U-Nets.
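For reference, here is a minimal sketch of a U-Net-style conv block with instance normalization instead of batch normalization, so it behaves sensibly at batch size 1. The channel arguments, kernel size, and activation are illustrative choices, not taken from the nnU-Net paper:

import torch.nn as nn

# A double-conv U-Net block using InstanceNorm2d, which does not depend on
# batch statistics and therefore works fine with a batch size of 1.
def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.InstanceNorm2d(out_ch, affine=True),
        nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1, bias=False),
        nn.InstanceNorm2d(out_ch, affine=True),
        nn.ReLU(inplace=True),
    )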

10

QuadmasterXLII t1_ja7yo0f wrote

Your problem is the U-Net backbone, not the loss function. Assuming you're married to a batch size of 4, the final convolution to get to 4 x 200 x 500 x 500, the cross-entropy, and the backpropagation should only take maybe 10 GB, so cram your architecture into the remaining 30 GB.

import torch
# Simulated input to the final layer: batch 4, 128 feature channels, 500x500.
x = torch.randn([4, 128, 500, 500]).cuda()
# Final 3x3 convolution producing 200 class logits (498x498 output with no padding).
z = torch.nn.Conv2d(128, 200, 3)
z.cuda()
# Integer class targets matching the 498x498 output.
q = torch.randint(0, 200, (4, 498, 498)).cuda()
# Forward pass, cross-entropy loss, and backward pass.
torch.nn.CrossEntropyLoss()(z(x), q).backward()

This, for example, takes 7.5 GB.

2

badabummbadabing t1_ja7yb9y wrote

The problem might be the number of output channels at high resolution. Instead of computing the final layer's activations and gradients in parallel for all channels, you should be able to compute each channel's loss sequentially and accumulate the gradients at the end. This works because the loss decomposes as a sum over the channels (and thus so do the channels' gradients).

In PyTorch, this whole thing should be as simple as running the forward and backward passes for the channels of the final layer sequentially (before calling optimizer.step() and optimizer.zero_grad() once). You will probably also need to pass retain_graph=True on every backward call; otherwise the activations in the preceding layers will be freed before you get to the next channel.
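A minimal sketch of that pattern (the backbone, the 1x1 final convolution, the shapes, and the per-channel BCE loss below are illustrative assumptions; the point is slicing the final layer's weights and calling backward once per channel with retain_graph=True):

import torch
import torch.nn.functional as F

# Illustrative backbone and a 200-channel 1x1 final convolution.
backbone = torch.nn.Sequential(
    torch.nn.Conv2d(3, 64, 3, padding=1),
    torch.nn.ReLU(),
).cuda()
final = torch.nn.Conv2d(64, 200, 1).cuda()

x = torch.randn(1, 3, 500, 500, device="cuda")
# One binary target map per channel, for a loss that decomposes over channels.
target = torch.randint(0, 2, (1, 200, 500, 500), device="cuda").float()

features = backbone(x)  # shared activations, reused for every channel

for c in range(final.out_channels):
    # Compute only channel c of the final layer by slicing its weights and bias.
    logits_c = F.conv2d(features, final.weight[c:c + 1], final.bias[c:c + 1])
    loss_c = F.binary_cross_entropy_with_logits(logits_c, target[:, c:c + 1])
    # retain_graph keeps the backbone activations alive for the next channel's backward.
    loss_c.backward(retain_graph=True)

# Gradients have now accumulated across all channels; call optimizer.step() once.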

16

mlmaster17 t1_ja777wt wrote

I actually find it (DGL) very easy to use. I switched from PyG just over a year ago because DGL was easier to install across macOS, Linux, and Windows. Some of the recent PyG updates are interesting, but not enough for me to move back. Anyway, I find both libraries to be very similar, so I think either choice is good.

13

Tea_Pearce t1_ja753ng wrote

Imo it depends on what you mean by RL. If you interpret RL as the 2015-19 collection of algorithms that train deep NN agents tabula rasa (from zero knowledge), I'd be inclined to agree that it doesn't seem a particularly fruitful research direction to get into. But if you interpret RL as a general problem setting, where an agent must learn in a sequential decision-making environment, you'll see that it's not going away.

To me the most interesting recent research in RL (or whatever you want to name it) is figuring out how to leverage existing datasets or models to get agents working well in sequential environments. Think SayCan, ChatGPT, Diffusion BC...

2