Recent comments in /f/MachineLearning
Epitaque t1_j4qvreg wrote
Reply to comment by serge_mamian in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27
Don’t know what country you’re in, but it regularly goes in stock during the week on Newegg at ~$1700
navillusr t1_j4qumlu wrote
Reply to [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
- If you list instructions step by step, the model doesn’t require reasoning to solve the problem. This is testing a very basic form of intelligence.
- Adept.ai can already solve more complex challenges than this (but still nowhere near AGI). They use a chatbot to automate simple tasks in common programs using LLMs.
- There’s a benchmark that already tests tasks like this, MiniWoB++
[deleted] t1_j4qu0xg wrote
> I attended a talk last year in which one speaker claimed that due to the way stochastic gradient descent works, it could be that some minimums are never reachable from some initialization states no matter how long one trains. Unfortunately I cannot find what paper/theorem he was referring to.
One example of a NN failing to learn is constant initialization. It is especially easy to see with zero initialization.
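To make the constant-initialization point concrete, here is a minimal sketch, assuming a toy 1-input, 2-hidden-unit tanh network without biases trained on y = 2x (the network, data and function name are made up for illustration). With zero initialization every weight gradient is exactly zero in this setup, and with any other constant the two hidden units stay identical forever.

```python
import math

def train_tiny_mlp(init_value, steps=200, lr=0.1):
    """Gradient descent on a 1-2-1 tanh network (no biases), all weights set to init_value."""
    w1 = [init_value, init_value]  # input -> hidden weights
    w2 = [init_value, init_value]  # hidden -> output weights
    data = [(-1.0, -2.0), (-0.5, -1.0), (0.5, 1.0), (1.0, 2.0)]  # toy target: y = 2x
    for _ in range(steps):
        g1 = [0.0, 0.0]
        g2 = [0.0, 0.0]
        for x, y in data:
            h = [math.tanh(w * x) for w in w1]
            pred = sum(a * b for a, b in zip(w2, h))
            err = pred - y  # derivative of 0.5 * (pred - y)^2 w.r.t. pred
            for i in range(2):
                g2[i] += err * h[i]
                g1[i] += err * w2[i] * (1.0 - h[i] ** 2) * x
        w1 = [w - lr * g / len(data) for w, g in zip(w1, g1)]
        w2 = [w - lr * g / len(data) for w, g in zip(w2, g2)]
    return w1, w2

zero_w1, zero_w2 = train_tiny_mlp(0.0)    # weights never move
const_w1, const_w2 = train_tiny_mlp(0.5)  # weights move, but both units stay identical
print(zero_w1, zero_w2)
print(const_w1, const_w2)
```

Random initialization breaks this symmetry, which is why the failure is specific to constant starting points.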
As for a general framework, you may be familiar with the [Universal Approximation Theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem). Some papers discuss convergence rates, e.g. https://proceedings.neurips.cc/paper/2020/file/2000f6325dfc4fc3201fc45ed01c7a5d-Paper.pdf
I think it would be a function of your particular problem. I've seen this examined from the perspective of learning frameworks as well such as PAC learning. Reading into that may answer some of your specific questions. I'm not aware of any general result outside of some comments on bounding generalization error based on data segmentation.
At first blush, your question about knowing you'll reach the minimum in K steps feels halting problem-ish. So I'd have to think about it later to convince myself fully.
data_wizard_1867 t1_j4qstlq wrote
Reply to comment by youregonnalovemynuts in [R] The Predictive Forward-Forward Algorithm by radi-cho
I like your likening of MNIST to mouse experiments. Someone should make an equivalent of the hierarchy of evidence for ML research, since the existing one is largely focused on medical research.
Dartagnjan OP t1_j4qs8zx wrote
Reply to comment by SetentaeBolg in [D] Are there any results on convergence guarantees when optimizing NNs? by Dartagnjan
Thanks for confirming my suspicions. Do you happen to have a reference for the case where optimization methods influence optimization in such a way as to inhibit convergence to some better set of minima?
WikiSummarizerBot t1_j4qs4st wrote
Reply to comment by Dendriform1491 in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
>SHRDLU was an early natural-language understanding computer program, developed by Terry Winograd at MIT in 1968–1970. In the program, the user carries on a conversation with the computer, moving objects, naming collections and querying the state of a simplified "blocks world", essentially a virtual box filled with different blocks. SHRDLU was written in the Micro Planner and Lisp programming language on the DEC PDP-6 computer and a DEC graphics terminal. Later additions were made at the computer graphics labs at the University of Utah, adding a full 3D rendering of SHRDLU's "world".
Dendriform1491 t1_j4qs2sf wrote
Reply to [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
Your unconquerable benchmark is below the level of achievement attained by research from 1970
Repulsive_Tart3669 t1_j4qqivs wrote
Reply to comment by BenoitParis in [D] Is it possible to update random forest parameters with new data instead of retraining on all data? by monkeysingmonkeynew
This should be considered first. For instance, gradient-boosted tree libraries such as XGBoost, CatBoost and LightGBM are mostly implemented in C/C++ and have GPU compute backends. Given daily updates, you'll have enough time not only to train a model, but also to optimize its hyperparameters. In my experience, XGBoost + Ray Tune works just fine.
Yidam t1_j4qohth wrote
Reply to comment by niclas_wue in [P] I built arxiv-summary.com, a list of GPT-3 generated paper summaries by niclas_wue
>That’s actually a really good idea. Would you be willing to pay for such a feature? Something like 1$ per paper? That would cover the cost for the GPT tokens
That would bankrupt me (though I'm already basically bankrupt); others may find that acceptable, however. Can it be applied to books? Book chapters? How does it deal with equations: does it need them in LaTeX, or is PDF OK too? Does the PDF need to be converted to text?
Apprehensive-Tax-214 OP t1_j4qmlju wrote
Reply to comment by Unlikely-Advice-7168 in [P] Built an at-cost, pay per second, open-source API for Tortoise text-to-speech (best I've heard!) by Apprehensive-Tax-214
Do you have a verified GitHub email?
SetentaeBolg t1_j4qimm0 wrote
There are mathematical proofs of convergence for a single perceptron on a linearly separable classification problem, but for more realistic modern neural nets I don't believe there are any proofs guaranteeing general convergence, because I don't think convergence is actually guaranteed: as was pointed out, you can't be certain gradient descent will find the "right" minimum.
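For the single-perceptron case, a minimal sketch of the classic perceptron update rule looks like this, assuming a small made-up 2D dataset that is linearly separable (the data and function name are illustrative only). On separable data the loop is guaranteed to stop after a finite number of mistakes; on non-separable data it would just hit `max_epochs`.

```python
def train_perceptron(points, labels, max_epochs=1000):
    """Classic perceptron rule; returns (weights, bias, epoch of first clean pass or None)."""
    w = [0.0] * len(points[0])
    b = 0.0
    for epoch in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified (or exactly on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:  # a full pass with no updates: converged
            return w, b, epoch
    return w, b, None

# Toy data: labels are +1 above the line x + y = 0 and -1 below it.
points = [(2.0, 1.0), (3.0, 2.5), (1.5, 3.0), (-1.0, -2.0), (-2.5, -1.0), (-3.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b, converged_epoch = train_perceptron(points, labels)
print(w, b, converged_epoch)
```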
Haunting-Ad-5191 t1_j4qia9y wrote
I mean some kind of home assistant that integrates ChatGPT is obvious, right? If Alexa were as responsive and good at understanding what I want as ChatGPT, I'd let Jeffy Bezo listen to whatever he wants.
Although it'd probably take too long to load or give wordy responses. Plus it'd have to be able to open and use other apps, not just respond with words. But I feel like combining these technologies is a no-brainer.
ureepamuree t1_j4qggbw wrote
Reply to comment by deep-yearning in [D] Unlocking the Potential of ChatGPT: A Community Discussion by North-Ad6756
Totally agreed. Even if it is capable of doing more, we should treat it as the equivalent of a level 3 autonomous vehicle.
deep-yearning t1_j4qesyy wrote
Its best use is as an assistant, provided it is accurate. So far in my tests it has been pretty good at writing boilerplate code and email/letter templates. Better than GitHub's Copilot for code.
royalemate357 t1_j4qdfwj wrote
Reply to comment by BeatLeJuce in [D] Tim Dettmers' GPU advice blog updated for 4000 series by init__27
Tbh I don't think it's an especially good name, but I believe the answer to your question is that it actually uses 32 bits to store a TF32 value in memory. It's just that when values are passed into tensor cores to do matmuls, they are temporarily downcast to this 19-bit precision format.
>Dot product computation, which forms the building block for both matrix multiplies and convolutions, rounds FP32 inputs to TF32, computes the products without loss of precision, then accumulates those products into an FP32 output (Figure 1).
(from https://developer.nvidia.com/blog/accelerating-ai-training-with-tf32-tensor-cores/)
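A rough way to see what that downcast does, as a sketch only: TF32 keeps 1 sign bit, 8 exponent bits and 10 mantissa bits (19 in total), so it can be imitated by dropping the low 13 mantissa bits of an FP32 value. The helper names below are made up, and plain truncation is an assumption; the hardware's actual rounding mode may differ.

```python
import struct

def to_tf32(value):
    """Approximate TF32 by truncating an FP32 value to 10 mantissa bits.

    Assumption: plain truncation; real hardware rounding may differ.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    bits &= 0xFFFFE000  # keep 1 sign + 8 exponent + 10 mantissa bits, drop the low 13
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def tf32_dot(xs, ys):
    """Downcast inputs to TF32, multiply, then accumulate at higher precision.

    Python floats (doubles) stand in for the FP32 accumulator here.
    """
    return sum(to_tf32(x) * to_tf32(y) for x, y in zip(xs, ys))

a = 1.0 + 2.0 ** -11  # needs more than 10 mantissa bits
b = 1.0 + 2.0 ** -10  # fits in 10 mantissa bits
print(to_tf32(a), to_tf32(b), tf32_dot([a], [b]))
```

The value in memory stays a full 32-bit float; only the copy fed to the multiply loses the extra mantissa bits.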
TrueBirch t1_j4qdbf2 wrote
Reply to comment by eldenrim in [P] I built Adrenaline, a debugger that fixes errors and explains them with GPT-3 by jsonathan
It'll be something entirely new, but not capable of doing everything that my toddler can do. Systems will be designed to avoid those weaknesses. Again, think about replacing meter readers with cheap sensors instead of expensive robots.
Zestyclose-Check-751 t1_j4qcv02 wrote
Reply to [D] I’m a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice
Could someone explain how Data Scientists work as consultants?
I can imagine only a few cases:
* A company already has a DS team, but they are not deep enough in some domains and need help/consultation.
* The integration of the solution is simple enough and may be delivered as an API.
* A company wants a PoC / demo, and will then hire someone to work on it.
But usually, DS needs insights into how the business works, and the integration of the solution may be really long-term, especially if it includes A/B tests, re-iterations over model training, dataset collection and so on. In this case, even onboarding may take a long time.
So, I'd like to hear about real cases that have been solved by consultants and how it generally works.
mrconter1 OP t1_j4qctlb wrote
Reply to comment by Laser_Plasma in [R] The Unconquerable Benchmark: A Machine Learning Challenge for Achieving AGI-Like Capabilities by mrconter1
The thing is that there are a lot of other screenshots + instructions as well. What would a system that can get 100% on this benchmark not be able to do?
BenoitParis t1_j4qbih9 wrote
Reply to [D] Is it possible to update random forest parameters with new data instead of retraining on all data? by monkeysingmonkeynew
Hoeffding Trees come to mind. The keyword you are looking for is 'online learning'. Apparently there's a Python package dedicated to that:
https://scikit-multiflow.readthedocs.io/en/stable/api/api.html
But 250,000 rows is not that many. Since your time requirement is daily, I'd consider looking at other algorithms, or implementations in other languages, before that.
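For context on why Hoeffding Trees can learn from a stream without revisiting old rows, here is a small sketch of the Hoeffding bound they rely on: after n examples, the observed mean of a quantity with range R is within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of its true mean with probability 1 - delta. The function names and the example numbers are assumptions for illustration; this is not the scikit-multiflow API.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean is within epsilon of the true mean w.p. 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than the bound."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# Same observed gains, different numbers of examples seen at the leaf.
early = should_split(0.3, 0.1, value_range=1.0, delta=1e-7, n=50)
later = should_split(0.3, 0.1, value_range=1.0, delta=1e-7, n=5000)
print(early, later)
```

The leaf only needs running statistics per attribute, so each new batch of rows updates the counts and the split decision is re-checked, with no retraining on the full history.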
throwaway2676 t1_j4q8zuh wrote
Reply to comment by LetGoAndBeReal in [D] Fine-tuning open source models on specific tasks to compete with ChatGPT? by jaqws
Well, can you just run it from an SSD, but more slowly?
Impressive_Iron_6102 t1_j4q8bpv wrote
Reply to comment by nmfisher in [D] I’m a Machine Learning Engineer for FAANG companies. What are some places I can get started doing freelance work for ML? by doctorjuice
Tangential, but would you recommend just blacklisting Medium and Towards Data Science? They always reach the top of my Google searches, and reading their articles drains me the majority of the time.
cblume OP t1_j4qwpte wrote
Reply to comment by iyeva in [P] featureimpact: A Python package for estimating the impact of features on ML models by cblume
Similar idea in that features are perturbed, but the two algorithms are quite different. E.g. sklearn uses scores but featureimpact doesn't; sklearn uses random perturbations but featureimpact uses quantiles.
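To illustrate the general idea of quantile-based perturbation without scores, here is a rough sketch. It is not featureimpact's actual implementation or API: the function names, the choice of quantiles and the mean-absolute-change measure are all assumptions. Each feature is set to a few of its own quantile values, and the impact is the average change in the model's predictions, so no labels or scoring function are needed.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])

def quantile_impact(predict, rows, feature, quantiles=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Mean absolute change in predictions when one feature is set to its quantiles."""
    baseline = [predict(r) for r in rows]
    column = sorted(r[feature] for r in rows)
    total = 0.0
    for q in quantiles:
        value = quantile(column, q)
        for row, base in zip(rows, baseline):
            changed = list(row)
            changed[feature] = value  # deterministic perturbation, no randomness
            total += abs(predict(changed) - base)
    return total / (len(quantiles) * len(rows))

# Hypothetical model that only uses the first feature.
predict = lambda r: 3.0 * r[0]
rows = [(1.0, 5.0), (2.0, 1.0), (3.0, 4.0), (4.0, 2.0)]
impact_used = quantile_impact(predict, rows, 0)
impact_unused = quantile_impact(predict, rows, 1)
print(impact_used, impact_unused)
```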