Recent comments in /f/MachineLearning
JanBitesTheDust OP t1_ja8avd4 wrote
Reply to comment by ch9ki7 in [P] Basic autodiff library for scalar values in C by JanBitesTheDust
I actually have never wrapped C code for Python hahaha. I wonder how difficult it is?
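From a quick look, the lowest-friction route seems to be ctypes. A totally untested sketch (the file and function names here are made up, assuming the library is compiled to a shared object exporting a plain double-to-double C function):

import ctypes

# Hypothetical: load the compiled autodiff library.
lib = ctypes.CDLL("./libautodiff.so")

# Declare the C signature so ctypes marshals arguments correctly,
# e.g. for a function `double ad_square(double x)`.
lib.ad_square.argtypes = [ctypes.c_double]
lib.ad_square.restype = ctypes.c_double

print(lib.ad_square(3.0))  # expect 9.0

The calls themselves are easy; the fiddly part is usually structs and who owns the memory.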
Impressive-Smile5659 t1_ja8anh7 wrote
Reply to comment by mlmaster17 in [N] New 1.0 release of Deep Graph Library (DGL) by jermainewang
I have been struggling to get the accuracy with DGL above 60%, so I stick with PyG.
I'll test out a few GNNs on some random dataset to see if it works better now.
sebzim4500 t1_ja8agwp wrote
Reply to comment by coconautico in [P] [N] Democratizing the chatGPT technology through a Q&A game by coconautico
Are you using the output of ChatGPT to determine which inputs you copy across and which ones you don't? If not, I agree that you are probably in the clear. Otherwise idk.
coconautico OP t1_ja8abnh wrote
Reply to comment by sebzim4500 in [P] [N] Democratizing the chatGPT technology through a Q&A game by coconautico
I can't use the output of ChatGPT to train other systems, but I can use my input however I want because, according to the TOS, I'm the owner of it.
Gamond_Jass t1_ja8aahj wrote
I was using it in my research, so I'm glad the first stable version dropped. I'll check out the changes.
Franc000 t1_ja894hj wrote
Reply to comment by st8ic in [R] Large language models generate functional protein sequences across diverse families by MysteryInc152
Who cares if the research is actually good?
LetterRip t1_ja88v3i wrote
Reply to comment by badabummbadabing in [D] Training a UNet-like architecture for semantic segmentation with 200 outcome classes. by Scared_Employer6992
Which particular paper?
Abject-Stomach5708 t1_ja887ct wrote
Do what you like and what you believe in. Otherwise you are just doing something because of what other people think.
sebzim4500 t1_ja87cym wrote
Reply to comment by coconautico in [P] [N] Democratizing the chatGPT technology through a Q&A game by coconautico
> You may not [...] (iii) use the Services to develop foundation models or other large scale models that compete with OpenAI
sebzim4500 t1_ja874jk wrote
Reply to comment by avocadoughnut in [P] [N] Democratizing the chatGPT technology through a Q&A game by coconautico
Oh how the turntables.
ch9ki7 t1_ja86mi0 wrote
Now you might want to provide a Python wrapper for this. It would be pretty interesting for smaller, simpler optimization cases like arbitrary curve fitting.
307thML t1_ja85qu7 wrote
You're correct that RL has been struggling, not because of the impressive results from LLMs and image generators, but because progress within RL itself has been very slow. People who say otherwise have just forgotten what fast progress looks like: remember 2015-2018, when we first saw human-level Atari play, superhuman Go play, and then superhuman Atari play, as well as impressive results in StarCraft and Dota. If you'd asked someone back in 2018 what the next 5 years of RL would look like, I think they would have expected progressively more complicated games to fall, and for agents to graduate from playing with game-state information, as AlphaStar and OpenAI Five did, to beating humans on a level playing field by playing from the pixels on the screen, the way agents in Atari do. This hasn't happened.
Instead, it turned out that all of this progress was confined to narrow domains: specifically, games with highly limited input spaces (hence why OpenAI Five and AlphaStar had to take the game state directly, which gives them access to information humans don't have) and games where exploration is easy (it can be handled largely or entirely by making random moves some percentage of the time).
I don't think this means the field is dead, mind you, but it certainly hasn't been making much progress lately.
badabummbadabing t1_ja7yxbg wrote
Reply to comment by Scared_Employer6992 in [D] Training a UNet-like architecture for semantic segmentation with 200 outcome classes. by Scared_Employer6992
Don't use batch normalization. Lots of U-Nets use e.g. instance normalization. A batch size of 1 should be completely fine (but you will need to play with the learning rate upon changing this). Check the 'no new U-Net' (aka nnU-Net) paper by Fabian Isensee for the definitive resource on what matters in U-Nets.
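For concreteness, here's a sketch of the usual swap in a U-Net conv block (the layer sizes are just illustrative):

import torch.nn as nn

# Illustrative conv block: InstanceNorm2d in place of BatchNorm2d, so the
# normalization statistics no longer depend on the batch size.
def conv_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
        nn.InstanceNorm2d(out_ch, affine=True),  # instead of nn.BatchNorm2d(out_ch)
        nn.ReLU(inplace=True),
    )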
QuadmasterXLII t1_ja7yo0f wrote
Reply to comment by Scared_Employer6992 in [D] Training a UNet-like architecture for semantic segmentation with 200 outcome classes. by Scared_Employer6992
Your problem is the U-Net backbone, not the loss function. Assuming you're married to a batch size of 4, the final convolution to get to 4 x 200 x 500 x 500, the cross-entropy, and the backpropagation should only take maybe 10 GB, so cram your architecture into the remaining 30 GB:
import torch

# Fake 128-channel feature map at full 500 x 500 resolution, batch size 4.
x = torch.randn([4, 128, 500, 500]).cuda()

# Final 3x3 conv (no padding) from 128 channels to 200 classes -> 498 x 498 output.
z = torch.nn.Conv2d(128, 200, 3).cuda()

# Random integer class labels matching the conv output shape.
q = torch.randint(0, 200, (4, 498, 498)).cuda()

torch.nn.CrossEntropyLoss()(z(x), q).backward()
for example, takes 7.5 GB.
badabummbadabing t1_ja7yb9y wrote
Reply to [D] Training a UNet-like architecture for semantic segmentation with 200 outcome classes. by Scared_Employer6992
The problem might be the number of output channels at high resolution. Instead of computing the final layer's activations and gradients for all channels in parallel, you should be able to compute each channel's loss sequentially and add up the gradients at the end. This is easy whenever the loss decomposes as a sum over the channels (and thus so do the channels' gradients), which holds for per-channel losses like BCE or Dice; plain softmax cross-entropy couples the channels.
In PyTorch, the whole thing should then be as simple as running the forward and backward passes for the channels of the final layer sequentially (before calling optimizer.step() and optimizer.zero_grad() once). You will probably also need to pass retain_graph=True on every backward call, otherwise the activations in the preceding layers will be freed before you get to the next channel.
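A minimal sketch of that loop (my assumptions: toy layer sizes, and a channel-decomposable BCE-with-logits loss on one-hot masks):

import torch
import torch.nn.functional as F

# Toy stand-ins: a tiny "backbone" and a 1x1 final conv to 200 classes.
backbone = torch.nn.Conv2d(3, 128, 3, padding=1)
final = torch.nn.Conv2d(128, 200, 1)
opt = torch.optim.Adam(list(backbone.parameters()) + list(final.parameters()))

x = torch.randn(1, 3, 500, 500)
target = torch.randint(0, 2, (1, 200, 500, 500)).float()  # one-hot-style masks

feats = backbone(x)
opt.zero_grad()
for c in range(200):
    # Forward/backward one output channel at a time; gradients accumulate in .grad.
    logits_c = F.conv2d(feats, final.weight[c:c + 1], final.bias[c:c + 1])
    loss_c = F.binary_cross_entropy_with_logits(logits_c, target[:, c:c + 1])
    # retain_graph keeps the backbone activations alive for the next channel.
    loss_c.backward(retain_graph=True)
opt.step()

Only one 500 x 500 logit map is alive at a time, traded against 200 small backward passes through the final layer.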
Scared_Employer6992 OP t1_ja7xpjt wrote
Reply to comment by QuadmasterXLII in [D] Training a UNet-like architecture for semantic segmentation with 200 outcome classes. by Scared_Employer6992
I haven't tried with bs=1, but I also don't want to use bs=1 as I usually get bad results with it and my net has a lot of BN layers.
QuadmasterXLII t1_ja7wog6 wrote
Reply to [D] Training a UNet-like architecture for semantic segmentation with 200 outcome classes. by Scared_Employer6992
... does it fit with batch size 1?
gniorg t1_ja7sjkn wrote
Reply to comment by cthorrez in [D] Is RL dead/worth researching these days? by [deleted]
So basically, batch reinforcement learning / offline RL? The family of algorithms is useful for recommender systems, amongst others.
Available_Lion_652 t1_ja7rsg9 wrote
Reply to [R] Large language models generate functional protein sequences across diverse families by MysteryInc152
It's obvious: because they memorize text.
CellWithoutCulture t1_ja7dklj wrote
Reply to comment by tdgros in [D] Is RL dead/worth researching these days? by [deleted]
> Toolformer
....oh you're right it didn't. I assumed they let it use any tool which would need RL. But it seems like they had pre-labelled ways to use tools.
Thanks for pointing that out.
mlmaster17 t1_ja777wt wrote
Reply to comment by KBM_KBM in [N] New 1.0 release of Deep Graph Library (DGL) by jermainewang
I actually find it (DGL) very easy to use. I switched from PyG just over a year ago because DGL was easier to install across macOS, Linux, and Windows. Some of the recent PyG updates are interesting, but not enough for me to move back. Anyway, I find both libraries very similar, so I think either choice is good.
Tea_Pearce t1_ja753ng wrote
Imo it depends on what you mean by RL. If you interpret RL as the 2015-19 collection of algorithms that train deep NN agents tabula rasa (from zero knowledge), I'd be inclined to agree that it doesn't seem a particularly fruitful research direction to get into. But if you interpret RL as a general problem setting, where an agent must learn in a sequential decision-making environment, you'll see that it's not going away.
To me the most interesting recent research in RL (or whatever you want to name it) is figuring out how to leverage existing datasets or models to get agents working well in sequential environments. Think SayCan, ChatGPT, Diffusion BC...
_der_erlkonig_ t1_ja74633 wrote
Reply to comment by walk-the-rock in [R] Large language models generate functional protein sequences across diverse families by MysteryInc152
Socher's been gone from Salesforce for years
badabummbadabing t1_ja8bzhc wrote
Reply to comment by LetterRip in [D] Training a UNet-like architecture for semantic segmentation with 200 outcome classes. by Scared_Employer6992
https://cardiacmr.hms.harvard.edu/files/cardiacmr/files/isensee_etal_nature2021_nnunet.pdf Check Figure 4. Architecture barely matters on average.