Recent comments in /f/MachineLearning
DisWastingMyTime t1_j6p1t2y wrote
Reply to comment by bananonymos in [D] Have researchers given up on traditional machine learning methods? by fujidaiti
Wait until you hear what the underlying concept of everything ever is.
CPU? That's just Boolean logic! Differential equations for flight control? My man, that's just a few dx/dys faking stable solutions!
What a boring take. Every time I've heard it in person, it was said by someone who knew very little about the complexities of the topic they were reducing to its "underlying" concepts.
No offense to any of you "sophisticated" fellows
TypicalFeeling8465 t1_j6p0mj8 wrote
Reply to [D] Meta AI Residency 2023 by BeautyInUgly
I also applied and haven't heard back. I reached out to some current residents asking for general info, and they said they had their interviews in late March, so I presume it'll be a while before we hear back. They also said the layoffs and hiring freezes aren't affecting the residency program.
MysteryInc152 t1_j6p0ipa wrote
Reply to comment by Zetsu-Eiyu-O in [D] Generative Model FOr Facts Extraction by Zetsu-Eiyu-O
Sure
bananonymos t1_j6p0fjr wrote
Reply to comment by Imaginary_Parfait944 in [D] Have researchers given up on traditional machine learning methods? by fujidaiti
Wooooosh
Zetsu-Eiyu-O OP t1_j6oztfw wrote
Reply to comment by MysteryInc152 in [D] Generative Model FOr Facts Extraction by Zetsu-Eiyu-O
Oh, I see, thanks. I have a few questions about the basics of training a large language model; do you mind if I shoot you a message?
deepstatefarm t1_j6oy6ps wrote
Reply to [D] DL university research PC suggestions? by seanrescs
For half that, $12k (and get two), I would go with an AMD platform with PCIe 5.0 and 4x GPU slots. I would personally get used 3090s, but if you want a warranty, new 4090s. I haven't had to RMA anything in a long time, but be warned it could take over a year to get a replacement card. The 4090 might not support the upcoming 4-bit LLM mode; not sure.
Server rack, with 2.5GbE Ethernet, 1-to-1 VRAM-to-RAM (with 20% more CPU RAM), a 2 TB or better NVMe drive, and 40-60 TB of storage.
amassivek t1_j6owo9f wrote
Reply to comment by blimpyway in [R] The Predictive Forward-Forward Algorithm by radi-cho
Here is a reversed view, where ANNs provide inspiration for neuroscience to investigate the brain. Forward learning models provide a new perspective on how neurons without "feedback" or "learning" connections can still learn, a common scenario. We make note of this and show the conceptual framework for forward learning: https://arxiv.org/abs/2204.01723. This conceptual framework is applicable to neuroscience models, providing an investigative path forward.
Internal-Diet-514 t1_j6oujtg wrote
Reply to comment by JimmyTheCrossEyedDog in [D] Have researchers given up on traditional machine learning methods? by fujidaiti
Time-series tabular data has a shape of length 3: (number of series, number of time points, number of features). For gradient boosted tree models, isn't the general approach to flatten that to (number of series, number of time points × number of features)? Whereas a CNN would be employed to extract time-dependent features before flattening.
If there are examples where boosted tree models perform better in this space, and I think you're right that there are, then I think that just goes to show that traditional machine learning isn't dead; rather, if we could find ways to combine it with the thing that makes deep learning work so well (feature extraction), it would probably do even better.
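A minimal sketch of the two input shapes being described (the array sizes here are made up for illustration):

```python
import numpy as np

# Hypothetical time-series tabular data: 100 series, 24 time points, 5 features
X = np.random.rand(100, 24, 5)

# For a gradient boosted tree model: flatten each series into one row,
# giving (number of series, number of time points * number of features)
X_flat = X.reshape(X.shape[0], -1)
print(X_flat.shape)  # (100, 120)

# A CNN would instead consume X directly (e.g. 1-D convolutions over the
# time axis), extracting time-dependent features before any flattening.
```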
TheCoconutTree t1_j6otii7 wrote
Reply to comment by SawtoothData in [D] Simple Questions Thread by AutoModerator
That's a good point about longitude looping. I hadn't thought about that. I'm designing a classifier, and would like to include geographic location as one of the input variables.
JimmyTheCrossEyedDog t1_j6osqa5 wrote
Reply to comment by Internal-Diet-514 in [D] Have researchers given up on traditional machine learning methods? by fujidaiti
Good call, shape is the much better term to avoid confusion.
> If we’re considering the dimensions to be the number of datapoints
To clarify - not the number of datapoints, the number of input features. The number of datapoints has nothing to do with the dimensionality (only the shape).
> Deep learning or CNNs are great because of its ability to extract meaningful features from data with shape > 2
This is where I'd disagree (but maybe you have a source that suggests otherwise). Even for time series tabular data, gradient boosted tree models typically outperform NNs.
Overall, shape rarely has anything to do with how a model performs. CNNs are built to take knowledge of the shape of the data into account (restricting kernels to convolutions of spatially close datapoints), but not all NNs do that. If we were using a network with only fully connected layers, for example, then there is no notion of spatial closeness - we might as well have transformed an NxN image into an N^2 x 1 vector, and the network would be the same.
So, neural networks handling inputs that have spatial (or temporal) relationships well has nothing to do with it being a neural network, but with the assumptions we've baked into the architecture (like convolutional layers).
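A quick numpy sketch of that point: a fully connected layer produces identical output whether it sees the NxN image or the pre-flattened N^2 vector (the weights here are random stand-ins, not a trained network):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
image = rng.standard_normal((N, N))

# A fully connected layer has one weight per input element; it is blind to
# which elements were spatially adjacent in the original 2-D grid.
W = rng.standard_normal((4, N * N))  # 4 output units
b = np.zeros(4)

out_from_image = W @ image.reshape(-1) + b   # flatten at the layer boundary
out_from_vector = W @ image.flatten() + b    # pre-flattened N^2 vector

# Identical: the 2-D arrangement carried no information the layer could use.
print(np.allclose(out_from_image, out_from_vector))  # True
```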
amassivek t1_j6opolw wrote
Reply to comment by blackkettle in [R] The Predictive Forward-Forward Algorithm by radi-cho
As the depth and width of the network grow, the computational advantage grows. Forward-only learning algorithms, such as FF and PFF, have this advantage.
There is also a compatibility advantage: forward-only learning algorithms work on limited-resource devices (edge devices) and neuromorphic chips.
For an analysis on efficiency, refer to section IV: https://arxiv.org/abs/2204.01723
We demonstrate reasonable performance on CIFAR-10 and CIFAR-100 in that same paper (Section IV). So the performance gap may decrease over time.
For a review of forward-only learning, with an explanation of why it has efficiency and compatibility advantages: https://amassivek.github.io/sigprop
dancehowlstyle3 t1_j6opbsx wrote
Just popping in to say I really love this thread!
ockham_blade t1_j6onxem wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hi! I am working on a clustering project with a dataset that has some numerical variables and one categorical variable with very high cardinality (~150 values). I was wondering whether it is possible to create an embedding for that feature after one-hot encoding (OHE) it. I initially thought of running an autoencoder on the 150 dummy features that result from the OHE, but then I realized it may not make sense, as they are all uncorrelated (mutually exclusive). What do you think about this?
Along the same lines, I think applying PCA is likely wrong. What would you suggest to find a latent representation of that variable? One other idea was: use the 150 dummy OHE columns to train a NN for some classification task, including an embedding layer, and then use that layer as the low-dimensional representation... does it make any sense? Thank you in advance!
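For what it's worth, one way to see why the embedding-layer idea works: an embedding table is mathematically just a one-hot vector times a weight matrix, i.e. a row lookup, so the explicit OHE step can be skipped entirely. A rough numpy sketch (the 8-dimensional embedding size and random table are assumptions; in practice the table would be learned jointly with the classification task):

```python
import numpy as np

rng = np.random.default_rng(0)
n_categories = 150  # the ~150 distinct values
embed_dim = 8       # chosen low-dimensional representation (assumption)

# An embedding layer is a trainable lookup table: row i is the vector for
# category i. Multiplying a one-hot vector by this table selects that row.
E = rng.standard_normal((n_categories, embed_dim))

cat_index = 42
one_hot = np.zeros(n_categories)
one_hot[cat_index] = 1.0

via_one_hot = one_hot @ E  # one-hot times weight matrix
via_lookup = E[cat_index]  # direct row lookup -- same result, no 150-wide input

print(np.allclose(via_one_hot, via_lookup))  # True
```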
Silvestron OP t1_j6on0u5 wrote
Reply to comment by worriedshuffle in [Discussion] Misinformation about ChatGPT and ML in media and where to find good sources of information by Silvestron
It could start a new debate about the dress on twitter.
SawtoothData t1_j6olw50 wrote
Reply to comment by TheCoconutTree in [D] Simple Questions Thread by AutoModerator
I don't know your application, but if lat/lon don't work very well, you could also try something like geohashing.
Something weird about longitude is that it loops, so you might get strange behavior at the boundary. It's also odd that the east-west distance spanned by a degree of longitude is a function of latitude.
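One common workaround for the looping (a general trick for cyclical features, not what geohashing does): map longitude onto the unit circle with sin/cos so the ±180° boundary disappears. A small sketch:

```python
import numpy as np

def encode_longitude(lon_degrees):
    """Map longitude onto the unit circle so -180 and +180 become the
    same point, removing the artificial boundary."""
    rad = np.deg2rad(lon_degrees)
    return np.sin(rad), np.cos(rad)

# -180 and +180 are the same meridian; the sin/cos encoding reflects that.
west = encode_longitude(-180.0)
east = encode_longitude(180.0)
print(np.allclose(west, east))  # True
```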
MysteryInc152 t1_j6okowf wrote
Reply to comment by Zetsu-Eiyu-O in [D] Generative Model FOr Facts Extraction by Zetsu-Eiyu-O
Not sure what you mean by "penalize", but say you wanted an LLM that wasn't instruction fine-tuned to translate between two languages it knows.
Your input would be
Language x: "text of language x"
Language y: "translated language x text"
You'd do this for a few examples. 2 or 3 should be good. Or even one depending on the task. Then finally
Language x: "text you want translated"
Language y: The model would translate the text and output here
All transformer generative LLMs work the same way with enough scale. GPT-2 (only 1.5b parameters) does not have the necessary scale.
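A minimal sketch of assembling that few-shot prompt as a string (the English/French pair and the example sentences are made up for illustration):

```python
# Few-shot translation prompt in the format described above.
examples = [
    ("The weather is nice today.", "Il fait beau aujourd'hui."),
    ("Where is the train station?", "Où est la gare ?"),
]
query = "I would like a coffee, please."

prompt = ""
for english, french in examples:
    prompt += f'English: "{english}"\nFrench: "{french}"\n\n'
# Leave the final target slot open; the model completes from here.
prompt += f'English: "{query}"\nFrench:'

print(prompt)
```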
Internal-Diet-514 t1_j6oi9qu wrote
Reply to comment by JimmyTheCrossEyedDog in [D] Have researchers given up on traditional machine learning methods? by fujidaiti
If we're considering the dimensions to be the number of datapoints in an input, then I'll stick to that definition and use the shape of the data instead of dimensions. I don't think I was wrong to use dimensions to describe the shape of the data, but I get that it could be confusing, because high-dimensional data is synonymous with a large number of features, whereas I meant high dimensions to be data with a shape of length > 2.
Deep learning, or CNNs, are great because of their ability to extract meaningful features from data with a shape of length > 2 and then pass that representation to an MLP. But the feature-extraction phase is a different task from what traditional ML is meant to do, which is to take a set of already-derived features and learn a decision boundary. So I'm trying to say a traditional ML model is not really comparable to the convolutional portion (the feature-extraction phase) of a CNN.
[deleted] t1_j6oi6t4 wrote
[deleted]
worriedshuffle t1_j6oi2n0 wrote
Reply to comment by Silvestron in [Discussion] Misinformation about ChatGPT and ML in media and where to find good sources of information by Silvestron
Yes, and imagine how annoying it would be for people to keep saying “that’s not really blue”.
Silvestron OP t1_j6oh3yy wrote
Reply to comment by worriedshuffle in [Discussion] Misinformation about ChatGPT and ML in media and where to find good sources of information by Silvestron
That will always happen with anything we define pretty much arbitrarily. Where does the color blue start and end in the electromagnetic spectrum? No one can objectively say, but it's still useful to refer to things as blue or whatever color they are.
Flogirll t1_j6og5xa wrote
Reply to [D] Simple Questions Thread by AutoModerator
Can you adjust gantry length in a claw machine?
I’m sorry if this is dumb but I can’t seem to find this anywhere. I know absolutely nothing about the parts inside a claw machine other than the names. I have a cabinet but I am unable to find a gantry the exact size. Do I need a new cabinet or can something be done? Thanks!
curiousshortguy t1_j6oflky wrote
Reply to comment by lonelyrascal in [R] Question: what is the best approach to find similarity between a set of product titles and user query? by lonelyrascal
How do you use these other features? Do you just vectorize and sum the vectors? Or do you do something else?
I think you can leverage data from current production to create a labeled test dataset.
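A toy sketch of the vectorize-and-combine idea being asked about (averaging rather than summing, and the 3-dimensional vectors are made-up stand-ins for real embeddings, not anyone's actual pipeline):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Embed each field separately, then average the vectors into one
# product representation (one simple combination strategy among many).
title_vec = np.array([0.9, 0.1, 0.0])
brand_vec = np.array([0.5, 0.5, 0.0])
product_vec = (title_vec + brand_vec) / 2

query_vec = np.array([1.0, 0.0, 0.0])
print(round(cosine_similarity(product_vec, query_vec), 3))
```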
worriedshuffle t1_j6odvxw wrote
Reply to comment by Silvestron in [Discussion] Misinformation about ChatGPT and ML in media and where to find good sources of information by Silvestron
And I’m saying that intelligence is just a word we use to point to ourselves. It doesn’t have an objective meaning which is why there is no test people can agree on.
lonelyrascal OP t1_j6odjl4 wrote
Reply to comment by curiousshortguy in [R] Question: what is the best approach to find similarity between a set of product titles and user query? by lonelyrascal
I have product brand, type, and color in addition to titles. Yes, I'll try cosine distances next. The user queries are just tests done by me, since there's no other way around it except A/B testing. Thank you.
red75prime t1_j6p2lrl wrote
Reply to comment by peatfreak in [D] Have researchers given up on traditional machine learning methods? by fujidaiti
2031: "GPMART-6 discovers new interesting properties of SVMs."