Recent comments in /f/MachineLearning
hcarlens OP t1_jbdv11j wrote
Reply to comment by senacchrib in [R] Analysis of 200+ ML competitions in 2022 by hcarlens
Thanks! I didn't create a separate category for learning-to-rank problems because they often overlap with other domains.
For example, some of the conservation competitions (https://mlcontests.com/state-of-competitive-machine-learning-2022/#conservation-competitions) are L2R problems on image data.
Or the Amazon/AIcrowd competitions (https://mlcontests.com/state-of-competitive-machine-learning-2022/#nlp--search) which were L2R with NLP.
In reality the mapping of competition to competition type is almost always one-to-many, and I'm planning on updating the ML Contests website to reflect that!
If I'd had more time and better data, I would have sliced the data in multiple ways to look at e.g. L2R problems in more depth.
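For illustration, a minimal sketch of what that one-to-many tagging could look like (the competition names and tags here are made up, not from the actual dataset):

```python
# Hypothetical example: each competition maps to a set of type tags,
# so one competition can be both L2R and NLP, or L2R and vision.
competition_tags = {
    "amazon-esci-search": {"learning-to-rank", "nlp"},
    "whale-id": {"learning-to-rank", "computer-vision", "conservation"},
    "house-prices": {"tabular", "regression"},
}

# All competitions that involve learning-to-rank:
l2r = [name for name, tags in competition_tags.items()
       if "learning-to-rank" in tags]
print(l2r)  # ['amazon-esci-search', 'whale-id']
```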
Hari___Seldon t1_jbduk7r wrote
Reply to [D] I'm a dentist and during my remaining lifetime I would like to take part in laying groundwork for future autonomic robots powered by AI that are capable of performing dental procedures. What technologies should I start to learn? by Armauer
This is a conversation best had with researchers who are already well immersed in the fields relating to remote and automated medical procedures. It's a vibrant area of research with fairly low public awareness, because of how many different disciplines are required to interact at an incredibly high level of fidelity and reliability in the extremely unpredictable environment of the human body.
Groups like the Scientific Computing and Imaging Lab at the University of Utah (with whom I have no affiliation but I have followed their work for decades) have been working in partnership with similar labs around the world for years to pioneer the types of technology you'd like to support and explore. By far, your best bet is to start understanding who the active research centers are, begin to learn the types of problem-solving resources that exist and the types of challenges that are specific to your medical domain, and find the areas within the vast array of resources that will inspire and motivate you through the rest of your journey.
I say all this not to discourage you, but rather to encourage you to prepare appropriately for the mountain you're looking to climb. The foundations for this type of work have grown steadily for five decades, so you have strong, resilient terrain upon which to build a legacy.
icwhatudidthr t1_jbdu997 wrote
Reply to [D] I'm a dentist and during my remaining lifetime I would like to take part in laying groundwork for future autonomic robots powered by AI that are capable of performing dental procedures. What technologies should I start to learn? by Armauer
Take a look at computer vision techniques in general.
Half the challenge in what you want to achieve is measuring the patient's mouth and locating the medical instruments relative to it.
Find a partner with expertise in robotics and computer vision who can help with the technical part.
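As a sketch of the localization half: the classic approach is to estimate a rigid pose from known 3D-2D landmark correspondences, e.g. with OpenCV's solvePnP. All the coordinates below are placeholders, not real calibration or dental data:

```python
import numpy as np
import cv2

# Hypothetical 3D landmark positions on a dental reference model (mm)
# and their detected 2D pixel locations in the camera image.
object_points = np.array([
    [0.0, 0.0, 0.0],
    [20.0, 0.0, 0.0],
    [0.0, 15.0, 0.0],
    [20.0, 15.0, 0.0],
    [10.0, 7.0, 5.0],
    [5.0, 3.0, 8.0],
], dtype=np.float64)
image_points = np.array([
    [320.0, 240.0],
    [420.0, 238.0],
    [322.0, 310.0],
    [418.0, 308.0],
    [370.0, 275.0],
    [345.0, 255.0],
], dtype=np.float64)

# Intrinsics from a prior camera calibration (placeholder values).
camera_matrix = np.array([
    [800.0, 0.0, 320.0],
    [0.0, 800.0, 240.0],
    [0.0, 0.0, 1.0],
])
dist_coeffs = np.zeros(5)  # assume negligible lens distortion

# Rigid pose of the reference model relative to the camera.
ok, rvec, tvec = cv2.solvePnP(object_points, image_points,
                              camera_matrix, dist_coeffs)
if ok:
    print("rotation (Rodrigues vector):", rvec.ravel())
    print("translation (mm):", tvec.ravel())
```

In practice the hard part is detecting those landmarks reliably in a wet, moving mouth; the pose math itself is the easy bit.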
Blakut t1_jbdu8zu wrote
Reply to [D] I'm a dentist and during my remaining lifetime I would like to take part in laying groundwork for future autonomic robots powered by AI that are capable of performing dental procedures. What technologies should I start to learn? by Armauer
idk why the idea of a robot digging into my teeth while i'm strapped to a chair seems terrifying.
stargazer1Q84 t1_jbdu2rq wrote
Reply to comment by ML4Bratwurst in [D] I'm a dentist and during my remaining lifetime I would like to take part in laying groundwork for future autonomic robots powered by AI that are capable of performing dental procedures. What technologies should I start to learn? by Armauer
This is by far the most useful thing to do, OP. Sure, you can start learning the ML basics, but it's going to take a very long time until you can contribute something useful to the field if you start from zero. Data, however, is the big filter for ML projects, and you are in the unique position of being able to collect such valuable data.
Get in touch with a local university and see how you can help them out.
graphicteadatasci t1_jbdt33t wrote
Reply to comment by enjakuro in [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust
Yeah, because there are some very nice results on classification models where removing data that doesn't contribute to learning made training faster and more accurate. But of course I can't remember what the paper was called.
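I can't vouch for the exact paper, but the usual loss-based recipe looks roughly like this: score each training example (e.g. by its loss at an early checkpoint) and drop the ones that contribute least. Everything below is a placeholder sketch, not the paper's method:

```python
import numpy as np

def prune_easy_examples(losses: np.ndarray, keep_frac: float = 0.8) -> np.ndarray:
    """Return indices of the hardest `keep_frac` of examples by per-example loss.

    Assumption: very-low-loss examples are already learned and contribute
    little gradient signal, so dropping them speeds up training.
    """
    n_keep = int(len(losses) * keep_frac)
    # argsort is ascending, so the tail holds the highest-loss examples
    return np.argsort(losses)[-n_keep:]

# Hypothetical per-example losses from an early training checkpoint.
rng = np.random.default_rng(0)
losses = rng.exponential(scale=1.0, size=10_000)
keep_idx = prune_easy_examples(losses, keep_frac=0.8)
print(f"kept {len(keep_idx)} of {len(losses)} examples")
```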
ML4Bratwurst t1_jbds1il wrote
Reply to [D] I'm a dentist and during my remaining lifetime I would like to take part in laying groundwork for future autonomic robots powered by AI that are capable of performing dental procedures. What technologies should I start to learn? by Armauer
I think in your position the most valuable thing you could do is collect and provide dental training data to experts. You could also try to get in touch with a local university and see if they have some (future) medical/dental AI projects for their students and scientists that you could support with expert knowledge.
londons_explorer t1_jbdrv43 wrote
Reply to [D] I'm a dentist and during my remaining lifetime I would like to take part in laying groundwork for future autonomic robots powered by AI that are capable of performing dental procedures. What technologies should I start to learn? by Armauer
What do you have to dedicate to this?
Time? Money? How much of each?
If it's just your time, I would start with hobby/kit robotics, perhaps remote-controlled (i.e. nothing smart), and show it doing dental work on plastic models of teeth with real tools. Then make a YouTube channel about your work, successes and failures.
That YouTube channel will hopefully get the next generation interested in actually doing the task properly.
If you have serious money to dedicate to the cause, I would try to start a startup, hiring a robotics expert and someone who has previously worked in medical devices (there are soooooo many laws; navigating the legal landscape is probably trickier than making a robot do a filling). Obviously you should also get VC funding wherever possible, but putting in a chunk of your own money will make that far easier.
Jepacor t1_jbdrovb wrote
Reply to comment by whata_wonderful_day in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
The link to the model is in the Google Sheet they linked: https://github.com/facebookresearch/fairseq/blob/main/examples/megatron_11b/README.md
londons_explorer t1_jbdrgbx wrote
Reply to comment by wittfm in [D] I'm a dentist and during my remaining lifetime I would like to take part in laying groundwork for future autonomic robots powered by AI that are capable of performing dental procedures. What technologies should I start to learn? by Armauer
The overlap between ML and robotics is increasing day by day...
wittfm t1_jbdrc1y wrote
Reply to [D] I'm a dentist and during my remaining lifetime I would like to take part in laying groundwork for future autonomic robots powered by AI that are capable of performing dental procedures. What technologies should I start to learn? by Armauer
ChatGPT definitely wouldn't help there, and neither would ML in general. What you want to accomplish is more in the field of robotics and control systems.
__Maximum__ OP t1_jbdr6zj wrote
Reply to comment by Taenk in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Not quite. For a fixed compute budget: if you have a model with 1B parameters, train it on a dataset of roughly 20B tokens. Look at the figures in the Chinchilla paper; they demonstrate it nicely.
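As a rough back-of-the-envelope sketch (using the common C ≈ 6ND approximation for training FLOPs; the numbers are just illustrative):

```python
def chinchilla_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Compute-optimal token count under the ~20 tokens/param rule of thumb."""
    return n_params * tokens_per_param

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard C ~ 6 * N * D approximation for dense transformers."""
    return 6.0 * n_params * n_tokens

for n in [1e9, 7e9, 70e9]:
    d = chinchilla_tokens(n)
    print(f"{n / 1e9:>4.0f}B params -> {d / 1e9:>5.0f}B tokens, "
          f"~{training_flops(n, d):.1e} training FLOPs")
```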
__Maximum__ OP t1_jbdqy5c wrote
Reply to comment by adt in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Thanks for the links. It looks like RoBERTa didn't gain a lot from the additional training, only minor improvements, but yeah, it was a tiny model. How was this not a good enough lesson? Why did people need Chinchilla? Maybe it's just that gathering lots of data comes easily, so people collect as much as possible even though they know they'll do at most one epoch over it.
NitroXSC t1_jbdp6ge wrote
Reply to [R] Analysis of 200+ ML competitions in 2022 by hcarlens
Interesting meta-study with many remarkable trends.
This is all seen from the competitor's side, but what is currently the best website for setting up a simple prediction competition? I'm asking since I'm planning to create a small competition for students in one of the courses I give (no large files needed).
CKtalon t1_jbdjaxa wrote
Reply to comment by Taenk in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Instead of choosing a huge model that ends up undertrained on a limited compute budget, use their estimates to pick the biggest model your budget can train compute-optimally, which will be smaller. It doesn't necessarily mean that a small model trained on a larger dataset will naturally beat a bigger model.
Taenk t1_jbdidpy wrote
Reply to comment by CKtalon in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Can you rephrase that a little bit? Does it mean that Chinchilla answers "assuming you have one teraflop of compute time, use 20 tokens of data per parameter of model; beyond that you hit diminishing returns, in the sense that you could train another model from scratch faster", and LLaMA answers "assuming you want optimal performance at inference time, regardless of compute budget, even small models can benefit from larger datasets"?
KD_A t1_jbdi9x4 wrote
Reply to comment by murrdpirate in [D] To Make Your Model Better, First Figure Out What's Wrong by pgao_aquarium
Notice that "significantly lower" can't actually be defined. There isn't a useful definition of overfitting which only requires observing a single model's train and test gap.^1 Here's a contrived but illustrative example: say model A has a training error rate of 0.10 and test error rate of 0.30. It's tempting to think "test error is 3x train error, we're overfitting". This may or may not be right; there absolutely could be a (more complex) model B with, e.g., training error rate 0.05, test error rate 0.27. Notice that the train-test gap increased going from A to B. But I don't care. Assuming these estimates are good, and all I care about is minimizing expected error rate, I'd confidently deploy model B over model A.
The useful definition of overfitting is that it refers to a region in function space where test error goes up and training error goes down (as model complexity goes up). Diagram (for underparametrized models). This definition tells us that the only good way to tell whether a model should be made simpler or more complex is to fit multiple models and compare them. This info is expensive to obtain for NNs, and obtaining it makes one look less clever. But it gives a reliable hint as to how a model should be iterated.^2 In the example above, if we really did observe that model B, then perhaps our next one should be even more complex.
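To make "fit multiple models and compare them" concrete, here's a minimal sketch with a classical model family where the complexity sweep is cheap (polynomial degree as the complexity knob, synthetic data; none of this is from the post above):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Sweep complexity and watch where test error bottoms out.
for degree in [1, 3, 5, 9, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    tr = mean_squared_error(y_tr, model.predict(X_tr))
    te = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree={degree:>2}  train MSE={tr:.3f}  test MSE={te:.3f}")
```

The model worth deploying is the one with the lowest test error, even if its train-test gap is wider than a simpler model's.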
If you're asking more specifically about reading NN loss curves, I haven't seen any science which puts claims like #4 here to the test.^3 I'd also like to mention another common issue w/ reading NN loss curves: people usually don't take care in estimating training loss. The standard NN training loop results in overestimates, which makes the gap between training and validation appear bigger than it actually is. I happened to write about this problem today, here on Cross Validated.
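Concretely: the usual loop averages the loss while the weights are still moving, so it overestimates the final model's training loss. A minimal sketch of the fix, re-scoring the training set once the epoch ends (assumes a mean-reduction loss; not from the linked post):

```python
import torch

@torch.no_grad()
def epoch_end_train_loss(model, loader, loss_fn):
    """Estimate training loss with the *final* epoch weights, instead of a
    running average accumulated while the weights were still changing."""
    model.eval()  # also removes dropout/batch-norm noise from the estimate
    total, n = 0.0, 0
    for xb, yb in loader:
        total += loss_fn(model(xb), yb).item() * len(xb)
        n += len(xb)
    model.train()
    return total / n
```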
Footnotes
1. Two exceptions to this: (1) you're already quite familiar w/ the type of task and data, so you can correlate high gaps with overfitting based on previous experience; (2) test error is higher than an intercept-only model's, and training error is much lower.
2. Double descent complicates this workflow. For overparametrized models like NNs, one can be deceived into not going far enough when increasing model complexity. Or it's difficult to determine whether a certain intervention is actually increasing or decreasing complexity. This paper characterizes various forms of double descent.
3. My answer to #4 would be the same as what I wrote when criticizing the caption in my first comment. The provided answer ("reduce model capacity") is too vague. The answer should be: select the model checkpoint from halfway, simply b/c its test error is the lowest. The graph alone doesn't tell you anything about how the model should be iterated beyond that info. That's b/c each point on the curve is the loss after being trained for n iterations, conditional on all of the other factors which modulate the model's complexity. There could absolutely be a model w/ more depth, more width, etc. which performs better than the simpler model trained halfway.
Alchera_QQ t1_jbdf7dx wrote
Reply to comment by yumiko14 in [R] Analysis of 200+ ML competitions in 2022 by hcarlens
Great one, thanks ;)
saintshing t1_jbdbgwy wrote
Reply to comment by hcarlens in [R] Analysis of 200+ ML competitions in 2022 by hcarlens
Is PyTorch also better than TF for use cases where I have to do training/inference on mobile?
markasoftware t1_jbdbgm9 wrote
Reply to comment by ReginaldIII in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
See the AGPL, which is more like what you were imagining.
_TheHalfTruth_ t1_jbdaf27 wrote
Reply to comment by rm-rf_ in [D] Are Genetic Algorithms Dead? by TobusFire
Metaheuristic algorithms like GA and simulated annealing are almost identical to Bayesian methods/MCMC. A metaheuristic becomes a Bayesian method if you treat your objective function as the log of an unnormalized probability distribution you want to maximize. They just take unique approaches to exploring the posterior distribution, but conceptually they're identical.
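To make the connection explicit: simulated annealing is essentially Metropolis sampling from a density proportional to exp(f(x)/T), with the temperature T annealed toward zero. A toy sketch, maximizing a made-up 1-D objective:

```python
import math
import random

def f(x):
    # Toy multimodal objective to maximize.
    return math.sin(3 * x) - 0.1 * x ** 2

def simulated_annealing(steps=10_000, step_size=0.5):
    x = random.uniform(-5, 5)
    best = x
    for i in range(steps):
        T = max(1e-3, 1.0 - i / steps)  # linear annealing schedule
        proposal = x + random.gauss(0, step_size)
        delta = f(proposal) - f(x)
        # Metropolis acceptance for a target density prop. to exp(f(x)/T)
        if delta > 0 or random.random() < math.exp(delta / T):
            x = proposal
            if f(x) > f(best):
                best = x
    return best

random.seed(0)
x_star = simulated_annealing()
print(f"x* = {x_star:.3f}, f(x*) = {f(x_star):.3f}")
```

With T held fixed instead of annealed, the same loop is exactly a Metropolis sampler for that distribution, which is the conceptual identity above.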
murrdpirate t1_jbd9755 wrote
Reply to comment by KD_A in [D] To Make Your Model Better, First Figure Out What's Wrong by pgao_aquarium
>It does not tell you about any other factors which modulate model complexity.
Can you expand on that? My general understanding is that if I'm seeing significantly lower training losses than validation losses, then my model complexity is too high compared to the data (unless there's something wrong with the data).
whata_wonderful_day t1_jbcxdwf wrote
Reply to comment by adt in [D] Can someone explain the discrepancy between the findings of LLaMA and Chinchilla? by __Maximum__
Nice! How did you get access to Megatron-11B? I can't find it online anywhere
hcarlens OP t1_jbdv2vl wrote
Reply to comment by XGDragon in [R] Analysis of 200+ ML competitions in 2022 by hcarlens
Thanks! Yes, but I didn't manage to get as much data as I wanted for the competitions on there. I emailed some of the competition organisers but didn't get a response.