Recent comments in /f/MachineLearning

navillusr t1_j4qumlu wrote

  1. If you list instructions step by step, the model doesn’t require reasoning to solve the problem. This is testing a very basic form of intelligence.
  2. Adept.ai can already solve more complex challenges than this (but still nowhere near AGI). They use a chatbot to automate simple tasks in common programs using LLMs.
  3. There’s already a benchmark that tests tasks like this: MiniWoB++.
1

[deleted] t1_j4qu0xg wrote

> I attended a talk last year in which one speaker claimed that due to the way stochastic gradient descent works, it could be that some minimums are never reachable from some initialization states no matter how long one trains. Unfortunately I cannot find what paper/theorem he was referring to.

One example of a NN failing to learn is constant weight initialization. It's especially easy to see with zero initialization.
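
A minimal sketch of why this happens, assuming a tiny 2-2-1 tanh network trained on XOR with full-batch gradient descent (the function name `train_tiny_net`, the architecture, and the learning rate are illustrative choices, not taken from the linked article). With a constant initialization both hidden units receive identical gradients, so they never become different from each other; with zeros, the hidden-layer gradients vanish entirely.

```python
import math

def train_tiny_net(init_value, steps=200, lr=0.1):
    """Train a 2-2-1 tanh network on XOR with every parameter set to init_value."""
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # w1[j][i] connects input i to hidden unit j; w2[j] connects hidden unit j to the output.
    w1 = [[init_value, init_value], [init_value, init_value]]
    b1 = [init_value, init_value]
    w2 = [init_value, init_value]
    b2 = init_value
    for _ in range(steps):
        # Accumulate full-batch gradients of the squared error.
        g_w1 = [[0.0, 0.0], [0.0, 0.0]]
        g_b1 = [0.0, 0.0]
        g_w2 = [0.0, 0.0]
        g_b2 = 0.0
        for x, y in data:
            h = [math.tanh(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(2)]
            out = w2[0] * h[0] + w2[1] * h[1] + b2
            err = out - y
            g_b2 += err
            for j in range(2):
                g_w2[j] += err * h[j]
                # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2.
                dh = err * w2[j] * (1.0 - h[j] ** 2)
                g_b1[j] += dh
                for i in range(2):
                    g_w1[j][i] += dh * x[i]
        # Gradient step, averaged over the 4 samples.
        b2 -= lr * g_b2 / 4
        for j in range(2):
            w2[j] -= lr * g_w2[j] / 4
            b1[j] -= lr * g_b1[j] / 4
            for i in range(2):
                w1[j][i] -= lr * g_w1[j][i] / 4
    return w1, b1, w2, b2

w1, b1, w2, b2 = train_tiny_net(0.5)
print("hidden units identical after training:", w1[0] == w1[1] and w2[0] == w2[1])
w1z, b1z, w2z, b2z = train_tiny_net(0.0)
print("hidden weights still zero:", w1z, "output bias:", b2z)
```

In the zero case only the output bias moves, so the network can do no better than predicting the mean label.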

https://medium.com/@safrin1128/weight-initialization-in-neural-network-inspired-by-andrew-ng-e0066dc4a566

As for a general framework, you may already be familiar with the [Universal Approximation Theorem](https://en.wikipedia.org/wiki/Universal_approximation_theorem). Some papers discuss convergence rates, for example https://proceedings.neurips.cc/paper/2020/file/2000f6325dfc4fc3201fc45ed01c7a5d-Paper.pdf

I think it would be a function of your particular problem. I've also seen this examined from the perspective of learning frameworks such as PAC learning. Reading into that may answer some of your specific questions. I'm not aware of any general result outside of some comments on bounding generalization error based on data segmentation.

At first blush, your question about knowing you'll reach the minimum in K steps feels halting-problem-ish, so I'd have to think about it more to convince myself fully.

5

WikiSummarizerBot t1_j4qs4st wrote

SHRDLU

>SHRDLU was an early natural-language understanding computer program, developed by Terry Winograd at MIT in 1968–1970. In the program, the user carries on a conversation with the computer, moving objects, naming collections and querying the state of a simplified "blocks world", essentially a virtual box filled with different blocks. SHRDLU was written in the Micro Planner and Lisp programming language on the DEC PDP-6 computer and a DEC graphics terminal. Later additions were made at the computer graphics labs at the University of Utah, adding a full 3D rendering of SHRDLU's "world".

1

Repulsive_Tart3669 t1_j4qqivs wrote

This should be considered first. For instance, gradient-boosted tree libraries such as XGBoost, CatBoost and LightGBM are mostly implemented in C/C++ and have GPU compute backends. Given daily updates, you'll have enough time not only to train a model but also to optimize its hyperparameters. In my experience, XGBoost + Ray Tune works just fine.

2

Yidam t1_j4qohth wrote

>That’s actually a really good idea. Would you be willing to pay for such a feature? Something like 1$ per paper? That would cover the cost for the GPT tokens

That would bankrupt me (though I'm already basically bankrupt); others may find that acceptable, however. Can it be applied to books? Book chapters? How does it deal with equations: do they need to be in LaTeX, or is PDF OK too? Does the PDF need to be converted to text?

1

SetentaeBolg t1_j4qimm0 wrote

There are mathematical proofs of convergence for a single perceptron on a linearly separable classification problem. For more realistic modern neural nets, though, I don't believe there are any proofs guaranteeing general convergence, because I don't think convergence is actually guaranteed: for the reason already pointed out, you can't be certain gradient descent will find the "right" minimum.
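
To make the single-perceptron case concrete, here is a minimal sketch of the classic perceptron update rule. The function name `train_perceptron`, the `max_epochs` cap, and the AND/XOR toy data are illustrative assumptions. On linearly separable data (AND) the loop is guaranteed to reach a mistake-free epoch; on XOR, which is not linearly separable, it never does and simply hits the cap.

```python
def train_perceptron(samples, max_epochs=1000):
    """Classic perceptron rule. Labels must be +1 or -1.

    Returns (weights, bias, epochs_used); epochs_used is None if no
    mistake-free pass over the data happened within max_epochs.
    """
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for epoch in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified, or exactly on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return w, b, epoch + 1
    return w, b, None

# AND is linearly separable; XOR is not.
and_data = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
xor_data = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

print("AND converged after epochs:", train_perceptron(and_data)[2])
print("XOR converged after epochs:", train_perceptron(xor_data, max_epochs=100)[2])
```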

8

Haunting-Ad-5191 t1_j4qia9y wrote

I mean, some kind of home assistant that integrates ChatGPT is obvious, right? If Alexa were as responsive and as good at understanding what I want as ChatGPT, I'd let Jeffy Bezo listen to whatever he wants.

Although it'd probably take too long to load or give wordy responses. Plus it'd have to be able to open and use other apps, not just respond with words. But I feel like combining these technologies is a no-brainer.

3

royalemate357 t1_j4qdfwj wrote

Tbh I don't think it's an especially good name, but I believe the answer to your question is that it actually uses 32 bits to store a TF32 value in memory. It's just that when they pass it into tensor cores to do matmuls, they temporarily downcast it to this 19-bit format.

>Dot product computation, which forms the building block for both matrix multiplies and convolutions, rounds FP32 inputs to TF32, computes the products without loss of precision, then accumulates those products into an FP32 output (Figure 1).

(from https://developer.nvidia.com/blog/accelerating-ai-training-with-tf32-tensor-cores/)
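
A rough sketch of what that downcast looks like at the bit level, emulated in plain Python. TF32 keeps FP32's sign bit and 8 exponent bits but only 10 of the 23 mantissa bits, so rounding an FP32 value to TF32 amounts to dropping the low 13 mantissa bits. The helper names are made up for illustration, and round-to-nearest-even is an assumption here; the quoted post only says the inputs are "rounded".

```python
import struct

def f32_bits(x):
    """Bit pattern of x after rounding it to IEEE-754 single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_f32(bits):
    """Single-precision value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def round_to_tf32(x):
    """Keep sign, 8 exponent bits and 10 mantissa bits; zero the low 13 bits.

    Assumption: round-to-nearest-even on the dropped bits.
    """
    if x != x or x in (float("inf"), float("-inf")):
        return x  # leave NaN and infinities alone
    bits = f32_bits(x)
    lsb = (bits >> 13) & 1            # lowest mantissa bit that survives
    bits = (bits + 0x0FFF + lsb) & 0xFFFFE000
    return bits_f32(bits)

def tf32_dot(a, b):
    """Round inputs to TF32, multiply, and accumulate in FP32."""
    acc = 0.0
    for x, y in zip(a, b):
        # The product of two 11-bit significands is exact in a Python float.
        prod = round_to_tf32(x) * round_to_tf32(y)
        acc = bits_f32(f32_bits(acc + prod))  # round the running sum to FP32
    return acc

print(round_to_tf32(3.14159))
print(tf32_dot([1.0, 2.0], [3.0, 4.0]))
```

Note that the value is still held in a 32-bit container throughout; only the precision of the multiplication inputs changes.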

3

Zestyclose-Check-751 t1_j4qcv02 wrote

Could someone explain how Data Scientists work as consultants?

I can imagine only a few cases:
* A company already has a DS team, but they are not deep enough in some domains and need help/consultation.
* The integration of the solution is simple enough that it can be delivered as an API.
* A company wants a PoC / demo, after which they're going to hire someone to work on it.

But usually, DS needs insight into how the business works, and integrating the solution may be a really long-term effort, especially if it includes A/B tests, repeated iterations of model training, dataset collection and so on. In this case, even onboarding may take a long time.

So, I'd like to hear about real cases that have been solved by consultants and how it generally works.

5

BenoitParis t1_j4qbih9 wrote

Hoeffding Trees come to mind. The keyword you are looking for is 'online learning'. Apparently there's a Python package dedicated to that:

https://scikit-multiflow.readthedocs.io/en/stable/api/api.html

But 250,000 rows is not that many. Since you only need daily updates, I'd consider looking at other algorithms, or implementations in other languages, before that.
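
For context on the Hoeffding Trees mentioned above: the core idea is that a leaf only splits once enough streamed samples have been seen that the best split attribute is, with probability 1 - delta, really better than the runner-up. A minimal sketch of that test follows; the function names and the example numbers are illustrative, and this is not the scikit-multiflow API.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n)).

    value_range (R) is the range of the split metric, e.g. log2(num_classes)
    for information gain; n is the number of samples seen at the leaf.
    """
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n):
    """Split once the observed gap between the top two attributes exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)

# The bound shrinks as more samples arrive, so smaller gaps become decisive.
for n in (100, 1000, 10000):
    print(n, hoeffding_bound(1.0, 1e-7, n))
print(should_split(0.30, 0.10, value_range=1.0, delta=1e-7, n=1000))
print(should_split(0.30, 0.25, value_range=1.0, delta=1e-7, n=1000))
```

Because each sample only updates counts at one leaf, the tree can be trained incrementally without revisiting old rows.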

6