Recent comments in /f/MachineLearning

tysam_and_co OP t1_j6g3e49 wrote

Hello! Thanks so much for the comment, I really appreciate it. This is a convnet-based architecture, so it's carrying the torch of some of the old DawnBench entries.

Transformers have the best top-end of all of the neural networks, and convolutional networks tend to have an edge in the smaller/tiny regime, IIRC. One could maximize training speed for a transformer architecture, but the cost of just 1-2 layers could be several times the cost of an entire forward pass through this very tiny convnet. I even tried to just add a really tiny 16x16 attention multiply at the end of the network and it totally tanked the training speed.
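A rough back-of-the-envelope FLOP count shows why even a tiny attention block can dominate a very small convnet. The dimensions below (256 tokens from a 16x16 map, width 64) are illustrative assumptions, not the actual speedrun network's sizes; the point is the quadratic n² terms in attention:

```python
# Back-of-the-envelope FLOP counts (2 FLOPs per multiply-add).
# Dimensions are illustrative, not taken from the actual network.

def attention_flops(n: int, d: int) -> int:
    """Single-head self-attention over n tokens of width d."""
    qkv_proj = 3 * n * d * d   # Q, K, V projections
    scores   = n * n * d       # Q @ K^T (the quadratic term)
    weighted = n * n * d       # softmax(scores) @ V
    out_proj = n * d * d       # output projection
    return 2 * (qkv_proj + scores + weighted + out_proj)

def conv_flops(h: int, w: int, c_in: int, c_out: int, k: int = 3) -> int:
    """One k x k convolution over an h x w feature map."""
    return 2 * h * w * c_in * c_out * k * k

# Attention over a 16x16 map (256 tokens) vs one 3x3 conv at the same size:
attn = attention_flops(n=256, d=64)
conv = conv_flops(h=16, w=16, c_in=64, c_out=64)
print(attn, conv)
```

Even at these tiny sizes, the single attention block already out-costs a full 3x3 conv layer, and its n² terms grow 16x each time the spatial resolution doubles, while the conv only grows 4x. (Raw FLOPs also understate the gap in practice, since small attention kernels tend to have worse hardware utilization than convolutions.)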

However, that said, I'd really like to pick up the work of https://arxiv.org/abs/2212.14034 and continue from there; getting an algorithm to really compress that information can start opening up the horizon to some of the hard laws that underlie neural network training in the limit. For example, somewhere along the way, this project's convnet apparently developed really strong consistency with scaling laws. I'm not sure why.

But in any case -- language models are hopefully next (if I get the time and have the interest/don't burn myself out on this project in the meantime!). I'll probably be focused on picking up some part-time research work in the field between here and then first, as that's my first priority right now (aside from a few community code contributions; this codebase is my living resume after all, and I think a good one at that! :D)

Hope that helped answer your question, and if not, please let me know and I'll give you my best shot! :D

25

tysam_and_co OP t1_j6g0mvc wrote

Hello everyone,

We're continuing our journey of training CIFAR10 to 94% in under 2 seconds, carrying on the lovely work that David Page began when he took that single-GPU DawnBench entry from over 10 minutes down to 24 seconds. Things are getting much, much tighter now, as there is not as much left to trim, but we do still have a "comfortable" road ahead, provided enough sweat, blood, and tears are put in to make certain methods work under the (frankly ridiculous) torrent of information being squeezed into this network. Remember, we're breaking 90% having seen each training set image only 5 times during training. Five times! Then 94% at 10 times. To me, that is hard to believe.

I am happy to answer any questions. Please be sure to read the v0.3.0 patch notes if you would like a more verbose summary of the changes we've made to bring this network from ~12.34-12.38 seconds in the last patch to ~9.91-9.96 seconds in the current one. The baseline implementation started at around ~18.1 seconds total, so incredibly we have almost halved our starting time, and that is within only a few months of the project's start back in October/November of last year.

Please do ask or say anything if it's on your mind; this project hasn't gotten a lot of attention, and I'd love to talk to some like-minded people about it. This is pretty darn cool stuff!

Many thanks,

Tysam&co

36

gunshoes t1_j6fyskw wrote

Reply to comment by MrEloi in [P] AI Content Detector by YoutubeStruggle

Eh, depends on context. People forget that all the things that go into writing (drafting, rewriting, sounding words out to make sure they articulate what you mean) are a pedagogical act in themselves. Assignments aren't supposed to be busy work; they're additional opportunities for learning in which students have to evaluate their own writing strategies. Using AI tools removes that element of metacognition and reduces assignments to mere prompt tuning. If you're just filling out reports and suffering from writer's block, sure, why not. But in other cases, the writing process is the lesson.

1

idsardi t1_j6fjmjl wrote

In addition to what others have said, many institutions have a "residency" requirement for the PhD, requiring at least one year of full-time on-campus study. Personally (and I am a department chair), I think this all needs to be modernized, but I don't expect that to happen for several years yet, even after COVID. There are some online programs, but in admissions you're competing against people who are happy to be on campus, so whether you like it or not, admissions committees are going to prefer those people over you.

4

mr_birrd t1_j6f7h26 wrote

Well, reinforcement learning uses a lot of Markov chains, forward/backward filtering/smoothing, etc. Kalman filtering is also a sort of Gaussian process regression. There is a huge overlap between classical ML and signal processing. No specific paper, but ML, and especially deep learning, often takes already-existing ideas from physics or EE and tries to apply them to some data to see what happens.
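To make the state-space machinery the comment mentions concrete, here's a minimal scalar Kalman filter with a random-walk state model. All parameter values (process noise `q`, measurement noise `r`, the prior) are illustrative assumptions, not from any particular source:

```python
def kalman_1d(measurements, q=1e-3, r=0.1, x0=0.0, p0=1.0):
    """Minimal scalar Kalman filter with a random-walk state model.

    q: process noise variance, r: measurement noise variance,
    x0/p0: prior mean and variance (illustrative defaults).
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict step: x_k = x_{k-1} + w_k, with w_k ~ N(0, q)
        p = p + q
        # Update step with measurement z_k ~ N(x_k, r)
        k = p / (p + r)        # Kalman gain
        x = x + k * (z - x)    # blend prediction with measurement
        p = (1 - k) * p        # posterior variance shrinks
        estimates.append(x)
    return estimates

# Repeated noisy-free observations of a constant signal: the estimate
# settles toward the observed value as the gain adapts.
est = kalman_1d([1.0] * 50)
print(round(est[-1], 3))
```

The filter's posterior mean and variance at each step are exactly the kind of Gaussian conditioning that GP regression performs, which is the overlap being pointed at.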

1

gunshoes t1_j6f796o wrote

It's not really a thing. It also kind of defeats the purpose of a PhD (research within an intellectual community of scholars). There are just too many variables across program needs and university funding limitations for it to be worth developing. Also, for a good number of people, you develop remote opportunities during the course of your PhD. Mine is effectively remote in practice, but that's just because of my research focus.

37