Recent comments in /f/MachineLearning
Animated-AI OP t1_j9gvww8 wrote
Reply to comment by marcus_hk in [P] The First Depthwise-separable Convolution Animation by Animated-AI
Thanks for the feedback! I agree; the animations are only meant to be visual aids in the context of some larger explanation (lecture, blog post, etc). In my case, I'm making YouTube videos to serve as complete explanations.
Transformers have been the most requested topic on my YouTube channel. So I'm going to attempt to make videos/animations about that when I finish my current series on convolution.
currentscurrents t1_j9gvv4k wrote
Reply to comment by _Arsenie_Boca_ in [D] Bottleneck Layers: What's your intuition? by _Arsenie_Boca_
a) Lower-dimensional features are useful for most tasks, not just output and b) Real data almost always has a lower intrinsic dimension.
For example if you want to recognize faces, you'd have a much easier time recognizing patterns in things like gender, shape of facial features, hair color, etc rather than raw pixel data. Most pixel values are irrelevant.
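A toy illustration of the intrinsic-dimension point (all sizes here are made up for the sketch): embed genuinely 3-dimensional data in 50 dimensions and check that just 3 principal components recover essentially all the variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1000 points that truly live on a 3-dim subspace of a 50-dim space, plus tiny noise.
latent = rng.normal(size=(1000, 3))
embed = rng.normal(size=(3, 50))
X = latent @ embed + 0.01 * rng.normal(size=(1000, 50))

X = X - X.mean(axis=0)
s = np.linalg.svd(X, compute_uv=False)           # singular values
explained = (s[:3] ** 2).sum() / (s ** 2).sum()  # variance captured by 3 components
print(explained)                                  # ~1.0: intrinsic dimension is 3
```

The ambient dimension is 50, but almost nothing is lost by describing each point with 3 numbers — the same way a face image has far fewer meaningful degrees of freedom than pixels.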
blueSGL t1_j9gutu0 wrote
Reply to comment by limpbizkit4prez in [R] ChatGPT for Robotics: Design Principles and Model Abilities by CheapBreakfast9
> Why not just write the 5-10 lines of code?
In order to write 5-10 lines of code, you need to know how to code.
I know how to code, if I can avoid writing more code than needed I do.
vladosaurus OP t1_j9gu3cr wrote
Reply to comment by Delacroid in [D] Can we use ChatGPT to implement first-order derivatives? by vladosaurus
Ideally we would generate many such examples, without cherry-picking them, and wrap them in a test suite that uses automatic differentiation to check how many come out correct.
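As a rough sketch of such a check (using a central finite difference as a cheap stand-in for full automatic differentiation; the function names are illustrative):

```python
def check_derivative(f, df, points, h=1e-6, tol=1e-4):
    """Compare a claimed derivative df against a central finite difference of f."""
    for x in points:
        numeric = (f(x + h) - f(x - h)) / (2 * h)
        if abs(numeric - df(x)) > tol:
            return False
    return True

xs = [0.5, 1.0, 2.0]
print(check_derivative(lambda x: x**2, lambda x: 2 * x, xs))  # True
print(check_derivative(lambda x: x**2, lambda x: 3 * x, xs))  # False: wrong derivative
```

Each ChatGPT-generated derivative would play the role of `df`, graded automatically over many sample points.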
Something similar to what the authors did for the OpenAI Codex model. They provided the function signature and the docstring and prompted the model to generate the rest. Then they ran the generated functions against test suites and calculated how many of them pass. That's the pass@k metric.
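For reference, the numerically stable pass@k estimator from the Codex paper looks like this (n generated samples, c of which pass the tests):

```python
import numpy as np

def pass_at_k(n, c, k):
    """Unbiased estimate of P(at least one of k random samples out of n passes),
    given that c of the n generated samples pass the tests."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

print(pass_at_k(10, 10, 1))  # 1.0: every sample passes
print(pass_at_k(2, 1, 1))    # 0.5: one of two samples passes
print(pass_at_k(10, 0, 5))   # 0.0: no sample passes
```

The product form avoids the overflow you'd hit computing the binomial coefficients in 1 − C(n−c, k)/C(n, k) directly.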
I am not aware of anything similar being done for differentiation; maybe there is, I'd have to search for it.
Delacroid t1_j9grwzu wrote
Reply to comment by vladosaurus in [D] Can we use ChatGPT to implement first-order derivatives? by vladosaurus
I'll admit that my comment may come off as elitist, but I think that you have to admit that this was a very low effort post. Maybe a more correct sub for this post would have been r/learnmachinelearning.
Thin_Rise4746 t1_j9grk0d wrote
Such a great job! Congrats!
htrp t1_j9gqx65 wrote
I think it's abstracting the human-machine interface that is of value....
Telling Alexa to have your Roomba vacuum only the living room has some value, and eventually builds towards:
Tea. Earl Grey. Hot.
gdpoc t1_j9gqaue wrote
I'll be using this content to illustrate, thanks!
Own_Quality_5321 t1_j9gp7ne wrote
I teach Deep Learning and I send you a big thank you. I will refer students to your website and channel ☺️
currentscurrents t1_j9gp4uq wrote
> From an information theory standpoint, it creates potential information loss due to the lower dimensionality.
Exactly! That's the point.
The bottleneck forces the network to throw away the parts of the data that don't contain much information. It learns to encode the data in an information-dense representation so that the decoder on the other side of the bottleneck can work with high-level ideas instead of pixel values.
If you manually tweak the values in the bottleneck, you'll notice it changes high-level ideas in the data like the gender or shape of a face, not pixel values. This is how autoencoders work; a unet is basically an autoencoder with skip connections.
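A minimal sketch of that bottleneck shape (untrained random weights and arbitrary sizes, just to show where the compression happens):

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder: 784-dim input (e.g. a flattened 28x28 image) squeezed to a 32-dim bottleneck.
W_enc = rng.normal(scale=0.01, size=(784, 32))
# Decoder: expand the 32-dim code back out to 784 dims.
W_dec = rng.normal(scale=0.01, size=(32, 784))

x = rng.normal(size=(1, 784))   # a stand-in "image"
z = np.tanh(x @ W_enc)          # bottleneck code: only 32 numbers survive
x_hat = z @ W_dec               # reconstruction from the compressed code

print(z.shape, x_hat.shape)     # (1, 32) (1, 784)
```

Training minimizes the reconstruction error between `x` and `x_hat`, so the 32 bottleneck values are forced to carry only the information-dense, high-level attributes.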
Interestingly, biological neural networks that handle feedforward perception seem to do the same thing. Take a look at the structure of an insect antenna; thousands of input neurons bottleneck down to only 150 neurons, before expanding again for processing in the rest of the brain.
Animated-AI OP t1_j9gojs5 wrote
Reply to comment by dahitokiri in [P] The First Depthwise-separable Convolution Animation by Animated-AI
I'm using Blender and making heavy use of the Geometry Nodes feature. Unfortunately, these animations have taken a lot of effort and blender-specific knowledge, and building on top of my work for a new application would require more of both. But if others aren't deterred by that, I could publish the blender files.
dahitokiri t1_j9gnuhp wrote
Can you share how you go about creating these animations? A tutorial on that would help others in the field produce helpful animations as well.
vladosaurus OP t1_j9gnm8n wrote
Reply to comment by friend_of_kalman in [D] Can we use ChatGPT to implement first-order derivatives? by vladosaurus
In the recent update it was mentioned that the math skills were improved. So I was just curious to see. But thanks for your opinion.
vladosaurus OP t1_j9gnh7n wrote
Okay everyone, easy with the negative sentiment. I'm just experimenting with it and was curious to hear some opinions. But there it is, I have it now... nothing constructive in general, except one comment.
pyepyepie t1_j9gnf6y wrote
Reply to comment by abnormal_human in [D] What would be the ideal map for "learning" machine learning? by Ashb0rn3_
I've heard this anecdote, but I was hoping for non-trivial cases from everyday life at work. I feel I understand SGD perfectly fine without learning to solve complicated DEs, but it's probably limiting me on other tasks, or in my ability to analyze ML algorithms. Are you sure it's the right hierarchy to say that SGD is rooted in differential equations? I mean, I agree you are right that it's a differential equation, but are the methods you learn in differential equations courses useful for ML?
I found a nice article about the link to SGD: https://tivadardanka.com/blog/why-does-gradient-descent-work - but I am not sure I am convinced (again, I am still an idiot about it and shouldn't have any opinion regarding links to differential equations lol - but for me, trying to fit SGD into the framework of differential equations goes against the KISS principle). Sorry if I go too deep; I'm just trying to figure out how much effort to put into it, since we only have so much time (I could happily study it all day for fun, but we have work and so on) :)
Thanks for the answer! I was convinced (by your message and by myself today) that it's terrible I don't know it and I should learn it ASAP.
vladosaurus OP t1_j9gnehk wrote
Reply to comment by blablanonymous in [D] Can we use ChatGPT to implement first-order derivatives? by vladosaurus
ChatGPT running on ChatGPT like a Turing Machine. GTFO!
vladosaurus OP t1_j9gncvj wrote
Reply to comment by ninjadude93 in [D] Can we use ChatGPT to implement first-order derivatives? by vladosaurus
Yes, I see your point, thanks. Probably I got a lucky shot. In any case, ChatGPT was updated with math reasoning, so I was just curious.
vladosaurus OP t1_j9gn4i0 wrote
Reply to comment by Delacroid in [D] Can we use ChatGPT to implement first-order derivatives? by vladosaurus
Dude, it's ok. I know it's high-school math; you proved your point, you're a genius who knows high-school math and I don't.
That was not my aim. My aim was to treat the ChatGPT implementation as a black box, without touching it, and see whether it is correct.
waffles2go2 t1_j9gmpcb wrote
Reply to [D] Maybe a new prompt injection method against newBing or ChatGPT? Is this kind of research worth writing a paper? by KakaTraining
Not sure malicious "prompt injection" output is of value, given the maturity of the product.
Also, would the point of your paper just be "injection attacks to break Bing"?
limpbizkit4prez t1_j9gmme3 wrote
If there are existing APIs that make these tasks so simple, what's the point of using ChatGPT? Why not just write the 5-10 lines of code?
MonsieurBlunt t1_j9glzsp wrote
Accommodating as much space for information as you can is not really a good idea: it is prone to overfitting and also harder to learn. You can think of the bottleneck as a form of regularisation: you are forcing the model to keep the useful information and not the rest, or, put differently, you leave less space where it can encode the training data and overfit.
Professional_Poet489 t1_j9gk545 wrote
Reply to comment by _Arsenie_Boca_ in [D] Bottleneck Layers: What's your intuition? by _Arsenie_Boca_
Re: regularization - by using fewer numbers to represent the same output info, you are implicitly reducing the dimensionality of your function approximator.
Re: (a), (b) Generally in big nets, you want to regularize because you will otherwise overfit. It's not about the output dimension; it's that you have a giant approximator (i.e. a billion params) fitting data of much smaller dimensionality, and you have to do something about that. The output can be "cat or not" and you'll still have the same problem.
TemperatureStatus435 t1_j9gk2mn wrote
Regularization in some vague sense applies, but there are different kinds of it, so you must be more specific. For example, an autoencoder uses a bottleneck layer to learn information-dense representations of the domain space, and it may employ some mathematical regularization so that the raw numbers don't explode to infinity.
However, a variational autoencoder employs the above methods, but also an additional type of regularization. Its effect is to normalize the shape of the bottleneck distribution so that it is close to Gaussian. This is extremely useful to do, but for entirely different reasons.
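For the VAE case, that extra regularizer is typically the closed-form KL divergence between the encoder's diagonal Gaussian and a standard normal; a quick numpy sketch:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over dimensions."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))

# A bottleneck code that already matches the prior pays no penalty...
print(kl_to_standard_normal(np.zeros(32), np.zeros(32)))       # 0.0
# ...while one drifting away from the prior is pushed back.
print(kl_to_standard_normal(np.full(32, 2.0), np.zeros(32)))   # 64.0
```

Adding this term to the reconstruction loss is what shapes the latent space toward a Gaussian, which is the "entirely different reason" above: it makes the latent space smooth enough to sample from.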
Long story short, don't just say "regularization" and think you understand what's going on.
PuzzledWhereas991 t1_j9gk23h wrote
Reply to [D] Simple Questions Thread by AutoModerator
What's the current best AI to clone voices that I can run on my local PC?
Grimm___ t1_j9gy9am wrote
Reply to [P] The First Depthwise-separable Convolution Animation by Animated-AI
🤯