Recent comments in /f/MachineLearning
CriticalTemperature1 t1_j6g6xv5 wrote
Reply to [D] AI Theory - Signal Processing? by a_khalid1999
The S4 architecture uses structured state spaces, a concept from EE that models the hidden state with differential equations. Seems to have SOTA results on a lot of tasks
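For anyone curious what "modeling the hidden state with differential equations" looks like concretely, here is a minimal sketch of a plain linear state-space model, Euler-discretized and unrolled as a recurrence. The matrices are random placeholders, not S4's actual structured/learned parameterization:

```python
import numpy as np

# Continuous-time linear state space: x'(t) = A x(t) + B u(t), y(t) = C x(t).
# Discretize with a simple Euler step and unroll over an input sequence.
def ssm_scan(A, B, C, u, dt=0.01):
    n = A.shape[0]
    Ad = np.eye(n) + dt * A          # Euler discretization of A
    Bd = dt * B                      # Euler discretization of B
    x = np.zeros(n)
    ys = []
    for u_t in u:                    # linear recurrence over the sequence
        x = Ad @ x + Bd * u_t
        ys.append(C @ x)
    return np.array(ys)

rng = np.random.default_rng(0)
A = -np.eye(4)                       # stable toy dynamics
B = rng.normal(size=4)
C = rng.normal(size=4)
y = ssm_scan(A, B, C, u=np.ones(16))
print(y.shape)  # (16,)
```

S4's contribution is largely about making this recurrence fast and stable at long sequence lengths; this sketch only shows the underlying state-space idea.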
tysam_and_co OP t1_j6g5fb8 wrote
Reply to comment by unhealthySQ in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
Thank you very much, I appreciate your kind words. Good luck to you in all of your future endeavors as well! :D :) <3 <3 :))))
unhealthySQ t1_j6g4zsz wrote
Reply to comment by tysam_and_co in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
Thank you for the answer!
Your work is highly impressive and I wish you continued success in your efforts; I could see the work you do here having very appealing applications down the line.
Professional-Ebb4970 t1_j6g3lf2 wrote
Reply to comment by Sofi_LoFi in [D] Remote PhD by TheRealMrMatt
Being a full-time commitment and being remote aren't mutually exclusive though
tysam_and_co OP t1_j6g3e49 wrote
Reply to comment by unhealthySQ in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
Hello! Thanks so much for the comment, I really appreciate it. This is a convnet-based architecture, so it's carrying on the torch of some of the old DawnBench entries.
Transformers have the best top-end of all of the neural networks, and convolutional networks tend to have an edge in the smaller/tiny regime, IIRC. One could maximize training speed for a transformer architecture, but the cost of just 1-2 layers could be several times the cost of an entire forward pass through this very tiny convnet. I even tried to just add a really tiny 16x16 attention multiply at the end of the network and it totally tanked the training speed.
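A back-of-the-envelope way to see why even a small attention block is costly next to a tiny convnet: attention's score and mixing matmuls scale with the square of the number of positions. This is my own rough FLOP accounting with invented sizes, not a measurement from the speedrun code (where overhead beyond raw FLOPs also matters):

```python
# Rough multiply-add counts for a 3x3 conv on an HxW feature map vs.
# single-head self-attention over the same H*W positions.
# Illustrative accounting only, with made-up channel counts.
def conv3x3_flops(h, w, c):
    return h * w * c * c * 9 * 2            # 3x3 kernel, c-in, c-out

def attention_flops(h, w, c):
    n = h * w                               # sequence length = positions
    qkv = 3 * n * c * c * 2                 # Q, K, V projections
    scores = n * n * c * 2                  # Q @ K^T  (quadratic in n)
    mix = n * n * c * 2                     # softmax(scores) @ V
    return qkv + scores + mix

print(conv3x3_flops(16, 16, 64))   # 18874368
print(attention_flops(16, 16, 64)) # 23068672
print(conv3x3_flops(32, 32, 64))   # grows 4x with 4x the positions
print(attention_flops(32, 32, 64)) # the n^2 terms grow 16x
```

Even at 16x16 the attention block already costs more than the conv, and the gap widens quadratically as the spatial resolution grows.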
However, that said, I'd really like to pick up the work of https://arxiv.org/abs/2212.14034 and continue from there. The concept of getting an algorithm to really compress that info can start opening up the horizon to some of the hard laws that underlie neural network training in the limit. For example, somewhere along the way, we now apparently have really strong consistency with scaling laws on the convnet for this project. I'm not sure why.
But in any case -- language models are hopefully next (if I get the time and have the interest/don't burn myself out on this project in the meantime!). I'll probably be focused on picking up some part-time research work in the field between here and then first, as that's my first priority right now (aside from a few community code contributions. This codebase is my living resume after all, and I think a good one at that! :D)
Hope that helped answer your question, and if not, please let me know and I'll give you my best shot! :D
unhealthySQ t1_j6g2h65 wrote
Reply to comment by tysam_and_co in [R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co
So just to be sure I read things correctly, this project is about optimizing training speed for Transformer neural networks?
Own_Quality_5321 t1_j6g2gk0 wrote
Reply to comment by Fancy-Jackfruit8578 in [D] Remote PhD by TheRealMrMatt
It depends on the university.
Own_Quality_5321 t1_j6g29ka wrote
Reply to comment by Sofi_LoFi in [D] Remote PhD by TheRealMrMatt
There are part-time PhD studentships. I'm pretty sure of that.
tysam_and_co OP t1_j6g0mvc wrote
Hello everyone,
We're continuing our journey to training CIFAR10 to 94% in under 2 seconds, carrying on the lovely work that David Page began when he took that one single-GPU DawnBench entry from over 10 minutes to 24 seconds. Things are getting much, much tighter now as there is not as much left to trim, but we do have a "comfortable" road ahead still, provided enough sweat, blood, and tears are put in to make certain methods work under the (frankly ridiculous) torrent of information being squeezed into this network. Remember, we're breaking 90% having only seen each training set image 5 times during training. Five times! Then 94% at 10 times. To me, that is hard to believe.
I am happy to answer any questions. Please be sure to read the v0.3.0 patch notes if you would like a more verbose summary of the changes that we've made to bring this network from ~12.34-12.38 seconds in the last patch to ~9.91-9.96 seconds in the current one. The baseline of this implementation started at around ~18.1 seconds total, so, incredibly, we have almost halved our starting time, and that is only within a few months of the project's start back in October/November of last year.
Please do ask or say anything if it's on your mind; this project hasn't gotten a lot of attention and I'd love to talk to some like-minded people about it. This is pretty darn cool stuff!
Many thanks,
Tysam&co
JohnConquest t1_j6fzh66 wrote
Reply to comment by omgpop in [R] InstructPix2Pix: Learning to Follow Image Editing Instructions by Illustrious_Row_9971
Thanks for the suggestion. I just tried it out, however, and there seem to be a bug or two, one of which is that it loops the same subtitle over and over.
gunshoes t1_j6fyskw wrote
Reply to comment by MrEloi in [P] AI Content Detector by YoutubeStruggle
Eh, depends on context. People forget that all the things that go into writing (drafting, rewriting, sounding words out to make sure they articulate what you mean) are a pedagogical act in themselves. Assignments aren't supposed to be busy work; they're additional opportunities for learning in which students have to evaluate their own writing strategies. Using AI tools removes that element of metacognition and reduces assignments to just prompt tuning. If you're just filling out reports and are suffering writer's block, sure, why not. But in other cases, the writing process is the lesson.
Maleficent-Rate6479 t1_j6fx4hp wrote
Reply to comment by RogerKrowiak in [D] Simple Questions Thread by AutoModerator
If your response variable is sex then you need to make it binary; otherwise I don't see a problem, I think.
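A minimal sketch of what "make it binary" could look like in practice, using pandas with hypothetical column names and labels:

```python
import pandas as pd

# Hypothetical data: encode a categorical response ("sex") as 0/1
# before fitting a classifier. Column names are invented for illustration.
df = pd.DataFrame({"height": [170, 182, 165, 178],
                   "sex": ["F", "M", "F", "M"]})
df["sex_binary"] = (df["sex"] == "M").astype(int)
print(df["sex_binary"].tolist())  # [0, 1, 0, 1]
```

The boolean comparison produces True/False per row, and `astype(int)` maps that to 1/0, which most classifiers accept directly as a target.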
JaCraig t1_j6fws2x wrote
Reply to comment by royalemate357 in [P] AI Content Detector by YoutubeStruggle
Just adding on that I used ChatGPT, and appending any sort of "write it in the style of X" to the prompt fools it. Telling it to use some run-on sentences, etc., does the same thing.
[deleted] t1_j6fviej wrote
[removed]
[deleted] t1_j6fv3vy wrote
[deleted]
Drisku11 t1_j6fnn3p wrote
Reply to comment by [deleted] in [R] InstructPix2Pix: Learning to Follow Image Editing Instructions by Illustrious_Row_9971
You realize that's a dude, right?
idsardi t1_j6fjmjl wrote
Reply to [D] Remote PhD by TheRealMrMatt
In addition to what others have said, many institutions have a "residency" requirement for the PhD, requiring at least one year of full-time on-campus study. Personally (and I am a department chair), I think that this all needs to be modernized, but I don't expect that to happen for several years yet, even after COVID. There are some online programs, but in terms of admissions you're competing against people who are happy to be on campus, so whether you like it or not, the admissions committees are going to prefer those people over you.
[deleted] t1_j6fhotk wrote
[removed]
Fancy-Jackfruit8578 t1_j6ffvgh wrote
Reply to [D] Remote PhD by TheRealMrMatt
Usually no, because PhD students most of the time have to TA.
tealocked t1_j6ff2pz wrote
Reply to [D] Meta AI Residency 2023 by BeautyInUgly
I've applied for the UK one as well; also haven't heard anything yet.
MadScientist-1214 t1_j6fbt5k wrote
Reply to [D] Remote PhD by TheRealMrMatt
Yes, but that depends on your supervisor. I did my PhD completely remotely for half a year but I'm not at a top institute.
mr_birrd t1_j6f7h26 wrote
Reply to [D] AI Theory - Signal Processing? by a_khalid1999
Well, reinforcement learning uses a lot of Markov chains, forward/backward filtering/smoothing, etc. Kalman filters are also a sort of Gaussian process regression. There is a huge overlap between classical ML and signal processing. No specific paper, but it's just that ML, and especially deep learning, often takes already-existing ideas from physics or EE and tries to apply them to some data to see what happens.
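As a concrete instance of the forward-filtering idea mentioned above, here's a minimal 1-D Kalman filter on toy data; all constants and the noise model are invented for illustration:

```python
import numpy as np

# 1-D Kalman filter for a constant hidden state observed with noise:
#   x_t = x_{t-1} (plus process noise q),  y_t = x_t + measurement noise r.
def kalman_1d(ys, q=1e-4, r=0.25, x0=0.0, p0=1.0):
    x, p = x0, p0
    estimates = []
    for y in ys:
        p = p + q                 # predict: variance grows by process noise
        k = p / (p + r)           # Kalman gain
        x = x + k * (y - x)       # update: pull estimate toward measurement
        p = (1 - k) * p           # posterior variance shrinks
        estimates.append(x)
    return estimates

rng = np.random.default_rng(0)
ys = 5.0 + rng.normal(scale=0.5, size=200)   # noisy observations of x = 5
est = kalman_1d(ys)
print(est[-1])   # converges near 5.0
```

The filter's posterior at each step is a Gaussian, which is what underlies the connection to Gaussian process regression the comment alludes to.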
gunshoes t1_j6f796o wrote
Reply to [D] Remote PhD by TheRealMrMatt
It's not really a thing. It also kind of defeats the purpose of a PhD (research within an intellectual community of scholars). There are just too many variables across program needs and university funding limitations for it to be worth developing. Also, for a good number of people, you develop remote opportunities during the course of your PhD. Mine is effectively remote in practice, but that's just because of my research focus.
deep_noob t1_j6g9pgq wrote
Reply to comment by bombay_doors in [D] CVPR Reviews are out by banmeyoucoward
Three weak accepts; remained the same after rebuttal.