Recent comments in /f/MachineLearning

millenial_wh00p t1_j98b5ma wrote

I apologize for how this post might come across, but your question is actually a very deep one, and it will probably take a lot of up-front work to get you an answer. AI/ML is not like cinnamon: you can't just sprinkle it on your business process and expect it to improve.

First you need to start with instrumenting your processes and building your data warehouse. Is your production flow instrumented for quality and efficiency measurement? If so, are the instruments verified? Do you have baseline performance metrics defined and expectations for improvement? Do you currently conduct any statistical process control? All of these questions have books that go with them, and we haven’t even built a trainable model yet.
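
To make the statistical process control point concrete, here's a minimal sketch of the kind of check involved (just an illustration with made-up numbers, assuming a Shewhart-style 3-sigma rule in NumPy):

```python
import numpy as np

# Baseline measurements from a process already known to be in control
baseline = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.1])
mean, sigma = baseline.mean(), baseline.std(ddof=1)

# Shewhart-style 3-sigma control limits
ucl, lcl = mean + 3 * sigma, mean - 3 * sigma

# Flag new measurements that fall outside the control limits
new_batch = np.array([10.0, 9.9, 11.4, 10.1])
out_of_control = (new_batch > ucl) | (new_batch < lcl)
print(f"limits: [{lcl:.2f}, {ucl:.2f}], flags: {out_of_control}")
```

If you can't yet answer the questions above, you don't have the baseline that even a check this simple depends on.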

I would start with some industrial engineering and applied stats textbooks and go from there. That should give you some idea of how to formulate a hypothesis and determine a method to validate it. From there you can start with the classics like An Introduction to Statistical Learning by James et al. and Introduction to Machine Learning by Alpaydin.

42

liquiddandruff t1_j989luo wrote

the stochastic parrot argument is a weak one; we are stochastic parrots

the phenomenon of "reasoning ability" may be an emergent one that arises out of the recursive identification of structural patterns in input data, which chatgpt has been shown to do.

prove that "understanding" is not and cannot ever be reducible to "statistical modelling", and only then is your null position intellectually defensible

4

Kumacyin t1_j9889u0 wrote

what about clipping? from the users' point of view, we're gonna focus on the stuff we can notice right away, and one of the biggest is clipping, where you gotta mix large motions and object collisions

15

nuclear_knucklehead t1_j9876bt wrote

Think of the zillions of FEA and CFD simulations done in the engineering world that a fast-running physics model would greatly accelerate and improve. These are often less visible to a general audience than the high-profile stuff you mention, but they still have potentially billions of dollars in economic impact and productivity improvements.
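
The usual pattern is a surrogate model: query the expensive solver for a set of training pairs, then fit a cheap learner to stand in for it. A toy sketch (the "simulator" here is a stand-in function, and the scikit-learn regressor is just one illustrative choice):

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def expensive_simulation(params):
    # Stand-in for an FEA/CFD solve; imagine minutes (or hours) per call
    return np.sin(params[:, 0]) * np.exp(-params[:, 1])

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 2.0, size=(300, 2))  # sampled design parameters
y = expensive_simulation(X)               # ground-truth solver outputs

# Cheap surrogate: evaluates in microseconds instead of minutes
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
surrogate.fit(X, y)
print(surrogate.predict(X[:3]), y[:3])
```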

6

liquiddandruff t1_j984iw5 wrote

Reply to comment by Ulfgardleo in [D] Please stop by [deleted]

the point you're missing is that we're seeing surprising emergent behaviour from LLMs

ToM is not sentience but it is a necessary condition of sentience

> it is also not clear whether what we measured here is theory of mind

crucially, since we can define ToM, this is by definition what is being observed

none of the premises you've used are sufficiently strong to preclude LLMs attaining sentience

  • it is not known if interaction with the real world is necessary for the development of sentience

  • memory is important to sentience, but LLMs do have a form of working memory as part of their attention architecture and inference process. is this sufficient, though? no one knows

  • sentience, if the model has it at all, may be fleeting and strictly limited to the inference stage of the LLM

mind you i agree it's exceedingly unlikely that current LLMs are sentient

but to arrive at "LLMs cannot ever achieve sentience" from these weak premises, combined with our lack of understanding of sentience, is just unwarranted. a conclusion that confident cannot be justified.

the intellectually defensible position is to say you don't know.

1

W_O_H t1_j97zesm wrote

You can fine-tune Stable Diffusion, TTS, and NLP models. You can't expect authors to tend to users' every need; they gave you the tool and have no obligation to teach you how to use it. Yes, some models can't be fine-tuned, but in 99% of cases there is a different one you can fine-tune.
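
For anyone unsure what "fine-tune" actually involves, here's a minimal sketch using the Hugging Face Trainer API (the base model name and the four-example dataset are purely illustrative; swap in your own):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"  # any small pretrained base works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Toy labeled data; replace with your own task's examples
raw = Dataset.from_dict({
    "text": ["great tool", "does not work", "love it", "broken again"],
    "label": [1, 0, 1, 0],
})
tokenized = raw.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=32),
    batched=True, remove_columns=["text"],
)

args = TrainingArguments(output_dir="out", num_train_epochs=1,
                         per_device_train_batch_size=2, report_to="none")
Trainer(model=model, args=args, train_dataset=tokenized).train()
```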

If you really don't like what's out there, make your own; the papers exist.

7

TeamRocketsSecretary t1_j97xsud wrote

Reply to comment by pyepyepie in [D] Please stop by [deleted]

Why overparameterized networks work at all is still an open theoretical question, but not having the full answer doesn't mean the weights are performing "human-like" processing, the same way that the gaps in pre-Einstein classical mechanics didn't make the corpuscle theory of light any more valid. You all just love to anthropomorphize everything, and the amount of metaphysical snake oil that ChatGPT has generated is ridiculous.

But sure. ChatGPT is mildly sentient 🤷‍♂️

1

squidward2022 t1_j97veu5 wrote

Shifting the domain of sigmoid S from (-infty, infty) to (0, infty) is going to be kind of weird. In the first (original) case we have S(-infty) = 0, S(0) = 1/2, S(infty) = 1, so the finite logit values w your network may output can lie anywhere in (-infty, infty) and S(w) will give something meaningful. Now if you mentally shift S to be defined on (0, infty), you get S(0) = 0 and S(infty) = 1. What value w would be needed to achieve S(w) = 1/2? infty / 2? It seems important that sigmoid is defined on the open interval (-infty, infty) not just because we wish logits to be arbitrary valued, but also because we want S to be "expressive" around the logit values we see in practice, which must be finite.

Here is something you could do that doesn't require a shifted sigmoid. You have a network f(x) = w which maps an input x to a score w. Take tanh(f(x)) and you get something with range (-1, 1); any negative w is mapped to a negative value in the range (-1, 0). Now just take the ReLU of this, relu(tanh(f(x))), and all negative values from the tanh, which come from negative w's, go to 0, while all the positive values from the tanh, which come from positive w's, are unaffected.

In this way we have: negative w --> (-1, 0) --> 0, and positive w --> (0, 1) --> (0, 1).
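
In code (a quick PyTorch sketch; the linear layer stands in for whatever network f you have):

```python
import torch
import torch.nn as nn

f = nn.Linear(8, 1)            # stand-in for your score network f(x) = w
x = torch.randn(4, 8)
w = f(x)                       # raw scores anywhere in (-inf, inf)
y = torch.relu(torch.tanh(w))  # negative w -> 0, positive w -> (0, 1)
print(w.squeeze(), y.squeeze())
```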

2

bremen79 t1_j97sb9r wrote

The sigmoid will make it effectively very hard for the network to produce values close to 1, because that would require a pre-activation value close to infinity. Would this be good behavior in your application?
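
A quick numeric check of how hard the saturation bites (plain NumPy, values rounded in the comments):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in (2.0, 5.0, 10.0, 20.0):
    print(z, sigmoid(z))  # 0.881, 0.9933, 0.99995, ~1 - 2e-9
```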

4

Repulsive_Tart3669 t1_j97p211 wrote

I believe a common approach is to use a linear activation function for regression problems, unless the target variable has semantics that suggest other non-linearities (sigmoid, tanh, etc.). Also consider rescaling your targets instead of trying to match the desired output range with activation functions.

From your description (I might be wrong, though), it seems like the 0 output is a special case. In that case you might want to first use a binary classifier to split input samples into two classes. For class 0 the output is 0; for class 1 you use another model (a regressor) that outputs a prediction. A rough sketch of this two-stage setup is below.
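
Something along these lines (scikit-learn, with toy data where the target is exactly 0 for about half the inputs; the model choices are arbitrary):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = np.where(X[:, 0] > 0, X[:, 1] * 2.0, 0.0)  # many exact zeros

# Stage 1: is the target zero or not?
clf = GradientBoostingClassifier().fit(X, (y != 0).astype(int))
# Stage 2: regress only on the nonzero cases
reg = GradientBoostingRegressor().fit(X[y != 0], y[y != 0])

def predict(X_new):
    out = np.zeros(len(X_new))
    nonzero = clf.predict(X_new).astype(bool)
    if nonzero.any():
        out[nonzero] = reg.predict(X_new[nonzero])
    return out

print(predict(X[:5]), y[:5])
```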

2

GaseousOrchid t1_j97gxhy wrote

What are some good tools for data pipelines that scale well? I'm locked into JAX/Flax for work, but would like to disconnect from TensorFlow to the greatest extent possible. I was looking at the Hugging Face dataloaders; does anyone have experience with those?

1

tripple13 t1_j97cxap wrote

Yes, indeed. While the lightbulb may contain properties which may or may not exhibit the Quantum Tunnel Effect (QTE), one must take great care not to confuse this with Superposition Lightspeed Diffraction (SLD), as it is of paramount importance that we do not make light of such phenomena, essentially making all of humanity into sub-particle atoms in the progress towards enlightenment.

1