Recent comments in /f/MachineLearning
liquiddandruff t1_j989luo wrote
Reply to comment by thecodethinker in [R] neural cloth simulation by LegendOfHiddnTempl
the stochastic parrot argument is a weak one; we are stochastic parrots
the phenomenon of "reasoning ability" may be an emergent one that arises out of the recursive identification of structural patterns in input data--which chatgpt is shown to do.
prove that "understanding" is not and cannot ever be reducible to "statistical modelling" and only then is your null position intellectually defensible
CoderHD t1_j989j2g wrote
Reply to comment by MadScientist-1214 in [D] Lion , An Optimizer That Outperforms Adam - Symbolic Discovery of Optimization Algorithms by ExponentialCookie
In my limited testing on a UNet-like CNN, it doesn't even come close to the performance of Adam, sadly. With that said, I might be doing something wrong.
Kumacyin t1_j9889u0 wrote
Reply to [R] neural cloth simulation by LegendOfHiddnTempl
what about clipping? from the users' point of view, we're gonna focus on the stuff we can notice right away, and one of the biggest issues is clipping, which shows up when you mix large motions and object collisions
nuclear_knucklehead t1_j9876bt wrote
Reply to comment by Flag_Red in [R] neural cloth simulation by LegendOfHiddnTempl
Think of the zillions of FEA and CFD simulations done in the engineering world that a fast-running physics model would greatly accelerate and improve. These things are often less visible to the general audience than the high profile stuff you mention, but still have potentially billions of dollars in economic impact and productivity improvements.
liquiddandruff t1_j984iw5 wrote
Reply to comment by Ulfgardleo in [D] Please stop by [deleted]
the point you're missing is we're seeing surprising emergent behaviour from LLMs
ToM is not sentience but it is a necessary condition of sentience
> it is also not clear whether what we measured here is theory of mind
crucially, since we can define ToM, definitionally this is in fact what is being observed
none of the premises you've used are sufficiently strong to preclude LLMs attaining sentience
-
it is not known if interaction with the real world is necessary for the development of sentience
-
memory is important to sentience, but LLMs do have a form of working memory as part of their attention architecture and inference process. is this sufficient though? no one knows
-
sentience, if it has it at all, may be fleeting and strictly limited to the inference stage of the LLM
mind you i agree it's exceedingly unlikely that current LLMs are sentient
but given these weak premises, combined with our lack of understanding of sentience, a confident conclusion like "LLMs cannot ever achieve sentience" is just unwarranted.
the intellectually defensible position is to say you don't know.
blablanonymous t1_j982taj wrote
Reply to [R] neural cloth simulation by LegendOfHiddnTempl
Damn, the more you know… what does the loss function look like for this problem?
W_O_H t1_j97zesm wrote
Reply to [D] Lack of influence in modern AI by I_like_sources
You can fine-tune Stable Diffusion, TTS, and NLP models. You can't expect authors to tend to every user's need; they gave you the tool and have no obligation to teach you how to use it. Yes, some models can't be fine-tuned, but in 99% of cases there is a different one you can fine-tune.
If you really don't like what's out there make your own, the papers exist.
TeamRocketsSecretary t1_j97xsud wrote
Reply to comment by pyepyepie in [D] Please stop by [deleted]
Why overparameterized networks work at all is still an open theoretical question, but the fact that we don't have the full answer doesn't mean the weights are performing "human-like" processing, the same way that classical mechanics pre-Einstein didn't make the corpuscular theory of light any more valid. You all just love to anthropomorphize anything, and the amount of metaphysical mental snake oil that ChatGPT has generated is ridiculous.
But sure. ChatGPT is mildly sentient 🤷‍♂️
squidward2022 t1_j97veu5 wrote
Reply to [D] Relu + sigmoid output activation by mrwafflezzz
Shifting the domain of the sigmoid S from (-infty, infty) to (0, infty) is going to be kind of weird. In the original case we have S(-infty) = 0, S(0) = 1/2, S(infty) = 1, so for any finite logit value w your network outputs, S(w) gives something meaningful. Now if you mentally shift S to be defined on (0, infty), you get S(0) = 0 and S(infty) = 1. What value w would be needed to achieve S(w) = 1/2? infty / 2? It seems important that the sigmoid is defined on the open interval (-infty, infty), not just because we want the logits to be arbitrary valued, but also because we want S to be "expressive" around the logit values we see in practice, which must be finite.
Here is something you could do that doesn't require a shifted sigmoid: you have a network f(x) = w which maps an input x to a score w. Take tanh(f(x)) and you get something with range (-1, 1); any negative w is mapped to a negative value in (-1, 0). Now just take the ReLU of this, relu(tanh(f(x))), and all negative values from the tanh, which come from negative w's, go to 0, while all the positive values, which come from positive w's, are unaffected.
In this way we have: negative w --> (-1, 0) --> 0, and positive w --> (0, 1) --> (0, 1).
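The relu(tanh(·)) composition above can be sketched in a few lines (plain Python for a scalar score w; in practice you'd apply the same composition to a tensor of network outputs):

```python
import math

def squash(w):
    """Map a raw network score w to [0, 1):
    negative scores collapse to exactly 0,
    positive scores land in (0, 1) via tanh."""
    return max(0.0, math.tanh(w))
```

Note that exactly 1 is never reached, since tanh only approaches 1 asymptotically.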
currentscurrents t1_j97v09x wrote
Reply to comment by Cheap_Meeting in [R] [N] In this paper, we show how a conversational model, 3.5x smaller than SOTA, can be optimized to outperform the baselines through Auxiliary Learning. Published in the ACL Anthology: "Efficient Task-Oriented Dialogue Systems with Response Selection as an Auxiliary Task." by radi-cho
In Bulgaria, no less.
__lawless t1_j97v07m wrote
Reply to [D] Relu + sigmoid output activation by mrwafflezzz
Easiest solution: no sigmoid, no ReLU in the last layer, just clamp the output between 0 and 1. Works surprisingly well.
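That clamping idea, sketched in plain Python for a single raw output (in a framework you'd use the equivalent clamp op on the output tensor):

```python
def clamp01(w):
    """Clamp a raw linear output into [0, 1] with no sigmoid/ReLU.
    Caveat: the gradient is zero outside [0, 1], so samples that
    land outside the range get no learning signal from this layer."""
    return min(1.0, max(0.0, w))
```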
bremen79 t1_j97sb9r wrote
Reply to [D] Relu + sigmoid output activation by mrwafflezzz
The sigmoid will make it effectively very hard for the network to produce values close to 1, because that would require a pre-activation value close to infinity. Would this be good behavior in your application?
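A quick numeric check of that saturation (a sketch added here, plain Python):

```python
import math

def sigmoid(z):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))

# Even a pre-activation of 10 only reaches about 0.99995;
# output exactly 1 would require the pre-activation to go to infinity.
```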
Ol_OLUs22 t1_j97s5vv wrote
adversarial examples
fasttosmile t1_j97r2fc wrote
Reply to [D] Things you wish you knew before you started training on the cloud? by I_will_delete_myself
byobu > tmux
Repulsive_Tart3669 t1_j97p211 wrote
Reply to [D] Relu + sigmoid output activation by mrwafflezzz
I believe a common approach is to use a linear activation function for regression problems unless target variable has certain semantics that suggest the use of other non-linearities (sigmoid, tanh etc.). Also consider rescaling your targets instead of trying to match the desired output with activation functions.
From your description (I might be wrong though), it seems like the 0 output is a special case. In that case you might want to first use a binary classifier to split input samples into two classes. For class 0 the output is 0; for class 1 you use a second model (a regressor) that outputs the prediction.
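The two-stage idea could look roughly like this, where `classifier` and `regressor` are hypothetical trained models passed in as callables (purely illustrative, not from the original comment):

```python
def predict(x, classifier, regressor):
    """Two-stage prediction: a binary classifier first decides
    whether the target is the special value 0; only class-1
    inputs are passed on to the regressor."""
    if classifier(x) == 0:
        return 0.0
    return regressor(x)
```

Usage with stand-in models, e.g. `predict(x, lambda x: 0 if x < 0 else 1, my_regressor)`.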
Ulfgardleo t1_j97nb2q wrote
Reply to [D] Relu + sigmoid output activation by mrwafflezzz
sigmoid of 0 is 0.5
tooquickforwords t1_j97lef7 wrote
Reply to comment by iacolippo in [D] Does langchain upload all user’s data to Openai? by westeast1000
If you use the Azure version, the data does not get used elsewhere. It has the same enterprise guarantees as most of Azure.
labloke11 t1_j97hz6s wrote
Reply to [D] Is Google a language transformer like ChatGPT except without the G (Generative) part? by Lets_Gooo_123
Can we remove this posting?
GaseousOrchid t1_j97gxhy wrote
Reply to [D] Simple Questions Thread by AutoModerator
What are some good tools for data pipelines that scale well? I'm locked into Jax/Flax for work, but would like to disconnect from TensorFlow to the greatest extent possible. I was looking at the huggingface dataloaders, does anyone have experience with those?
Appropriate_Ant_4629 t1_j97gjhy wrote
Reply to comment by royalemate357 in [D] Things you wish you knew before you started training on the cloud? by I_will_delete_myself
>egress fees / data transfer fees
On the bright side, ingress is often free.
It costs surprisingly little to stream live video ***into*** the cloud and spew back tiny embedding vectors from models running there.
trnka t1_j97ftj9 wrote
Reply to comment by not_mig in [D] Simple Questions Thread by AutoModerator
I haven't seen a guide on that, but I remember it being challenging! Feel free to post one that's giving you trouble.
buyIdris666 t1_j97eom6 wrote
Reply to comment by currentscurrents in [D] what are some open problems in computer vision currently? by Fabulous-Let-822
Interesting! I didn't realize that
Sandy_dude OP t1_j97eij3 wrote
Reply to comment by PhoibusApollo in [R] Looking for papers which are modified variational autoencoder (VAE) by Sandy_dude
Thanks!
tripple13 t1_j97cxap wrote
Reply to comment by IDefendWaffles in [D] Is Google a language transformer like ChatGPT except without the G (Generative) part? by Lets_Gooo_123
Yes, indeed. While the lightbulb may contain properties which may or may not exhibit the Quantum Tunnel Effect (QTE), one must take great care not to confuse this with the Superposition Lightspeed Diffraction (SDL), as it is of paramount importance, that we do not make light of such phenomena - Essentially making all of humanity into sub-particle atoms in the progress towards enlightenment.
millenial_wh00p t1_j98b5ma wrote
Reply to [R] Using AI/ML for Quality Control for a factory? by aumzzzz
I apologize for how this post might come across, but your question is actually a very deep one, and it will probably take a lot of up-front work to get you an answer. AI/ML is not like cinnamon: you can't just sprinkle it on your business process and expect it to improve.
First you need to start with instrumenting your processes and building your data warehouse. Is your production flow instrumented for quality and efficiency measurement? If so, are the instruments verified? Do you have baseline performance metrics defined and expectations for improvement? Do you currently conduct any statistical process control? All of these questions have books that go with them, and we haven’t even built a trainable model yet.
I would start with some industrial engineering and applied statistics textbooks and go from there. That should give you some idea of how to formulate a hypothesis and determine a method to validate it. From there you can move on to the classics, like An Introduction to Statistical Learning by James et al. and Introduction to Machine Learning by Alpaydin.