Recent comments in /f/MachineLearning

Silvestron OP t1_j6odcec wrote

I think there will always be a discussion on where to set the bar for what's considered intelligence, but the bar has to be set somewhere, because if anything that is alive counts as intelligent, then there's no point in talking about what intelligence is. Even plants have learned through evolution to point their leaves towards the sun. Should we consider that intelligence too?

1

JimmyTheCrossEyedDog t1_j6odc4c wrote

> When talking about dimensions I meant (number of rows, number of features) is 2 dimensions for tabular data...

Right, but my point is that when people say "NNs work well on high dimensional data", that's not what they mean.

> You could consider an image to have width x height x channels features

It does have that many input features, i.e. dimensions, like you've written below.

> but that's not what a CNN does; the CNN extracts meaningful features from the high-dimensional space.

Now we're talking about composite or higher-level features, which is different from what we've been talking about up to this point. It's true that for tabular data (or old-school, pre-NN computer vision) you generally have to construct these yourself, whereas with images you can just throw the raw data in and the NN does this more effectively than you ever could. But this is irrelevant to the input dimensionality.

3

dancingnightly t1_j6oaxeo wrote

This is commercial, not research, but: a lot of scenarios where explainable AI is needed use simple statistical solutions.


For example, a company I knew had to identify people in poverty in order to distribute a large ($M) grant fund to people in need, and they had only basic data about some relatively unrelated information, like how often these people travelled, let's say, their age, etc.


In order to create an explainable model whose factors could be understood by higher-ups and easily examined for bias, they used a k-means approach with just 3 factors.
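
(The company's actual factors and code weren't shared, so here's a minimal sketch of that kind of approach in scikit-learn, with three hypothetical stand-in features:)

```python
# Minimal sketch of the approach described above, assuming three
# hypothetical stand-in factors (the real ones weren't disclosed).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.poisson(4, 1000),         # trips per month (hypothetical factor)
    rng.integers(18, 90, 1000),   # age
    rng.exponential(2000, 1000),  # account balance (hypothetical factor)
]).astype(float)

# Standardize so no single factor dominates the distance metric.
scaler = StandardScaler().fit(X)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(scaler.transform(X))

# The cluster centers, mapped back to original units, *are* the
# explanation: each row describes a segment in terms of the 3 factors.
print(scaler.inverse_transform(kmeans.cluster_centers_))
```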


It captured close to as much information as deep learning, but with more robustness to data drift and with clear graphs segmenting the target group from the general population. It also reduced data use, which is good for privacy.


This ~30-line solution, with a dozen explanatory EDA output graphs, probably got sold for >500k in fees... but they did make the right choices in this circumstance. They saved themselves a complex ML model plus bias/security/privacy/deployment hell, and left behind a maintainable solution.


Now for research, it's interesting from the perspective of applied AI (which is arguably still dominantly GOFAI/simple statistics) and communication about AI with the public, although I wouldn't say it's in vogue.

5

Silvestron OP t1_j6oad9q wrote

Computers are machines, merely calculators. I wouldn't say a calculator is smart because it can do advanced math operations faster than any human being ever could.

I wasn't talking about AI in general, only about ChatGPT. While the definition of intelligence can be subjective, my frustration was more about the attention ChatGPT gets for things that are beyond its capabilities, like giving correct information or doing math. That happens because people see how good it is at some very complicated things, even though it can't do some extremely basic ones.

Maybe I should have used better words to express myself, but what I meant is that people seem to expect ChatGPT to be AGI, which it is not.

1

qalis t1_j6o79xv wrote

A better distinction would be that deep learning excels in applications that require representation learning, i.e. a transformation from domains that do not lie in a Euclidean metric space (e.g. graphs) or that are too problematic in raw form and require processing in another domain (e.g. images, audio). This is very similar to feature extraction, but representation learning is a slightly more general term.

Tabular ML does not need this in general: after obtaining feature vectors we already have a representation, and a deep learning model like an MLP can only apply a (highly) nonlinear transformation of that space, instead of really learning fundamentally new representations of the data. That new-representation step is exactly what happens with images, for example: going from the raw pixel-value space into a vector space that captures the semantic features of the image.
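
A quick way to see the distinction in code (a rough sketch; the ResNet here stands in for any learned image encoder):

```python
# Rough sketch of the distinction: a CNN encoder maps raw pixels into a
# vector space where semantics live; tabular features skip that step.
import torch
import torch.nn as nn
from torchvision import models

# weights=None keeps this runnable offline; in practice you'd load
# pretrained weights (e.g. models.ResNet18_Weights.DEFAULT).
cnn = models.resnet18(weights=None)
encoder = nn.Sequential(*list(cnn.children())[:-1])  # drop the classifier head
encoder.eval()

images = torch.randn(8, 3, 224, 224)  # stand-in for a real image batch
with torch.no_grad():
    z = encoder(images).flatten(1)    # (8, 512): the learned representation

# For tabular data, the feature vector already *is* the representation;
# an MLP only warps that existing space.
tabular = torch.randn(8, 20)
mlp = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
logits = mlp(tabular)
```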

2

tysam_and_co OP t1_j6o72ma wrote

Okay, I ran some other experiments and I'm starting to get giddy (you're the first 'ta know! :D). It appears that for most hyperparameters, twiddling on CIFAR100 gives just a flat response, or a slight downward trend (!!!). I haven't messed with them all yet, but that bodes very, very well (!!!!).

Also, doing the classical range boost of changing depth 64->128 and num_epochs 10->80 results in a jump to about 80% in roughly 3 minutes of training, which is about where CIFAR100 SOTA was in early 2016. It's harder to compare for CIFAR10, as I think that dataset was slightly more popular and saw a monstrous jump followed by a long flat stretch during that period. But if you do some linear/extremely coarse piecewise interpolation of PapersWithCode accuracy, from CIFAR10's average starting point to the current day, and do the same roughly for CIFAR100, then adding this extra capacity+training time moves both from ~2015 SOTA numbers to ~early-2016 SOTA numbers. Wow!! That's incredible! This is starting to make me really giddy, good grief.

I'm curious if cutout or anything else will help, we'll see! There's definitely a much bigger train<->eval % gap here, but adding more regularization may not help as much as it would seem up front.

2

qalis t1_j6o6cou wrote

That's a nice paper. There is also an interesting but very niche line of work using gradient boosting as a classification head for neural networks. The gradient flows through it normally, after all; tree addition is just used instead of gradient descent steps. Sadly, I could not find any trustworthy open-sourced implementation of this approach. If it works, it could bridge the gap between deep learning and boosting models.
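
For reference, the nearest commonly-implemented cousin is the two-stage version: train the network, freeze it, and fit boosting on its penultimate features. A minimal sketch (synthetic data, hypothetical layer sizes), which is explicitly not the end-to-end gradient-through-trees idea above:

```python
# Two-stage sketch: network as feature extractor, boosting as the head.
# NOTE: gradients do NOT flow through the trees here -- this is only the
# nearest commonly-implemented cousin of the end-to-end idea above.
import torch
import torch.nn as nn
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_t = torch.tensor(X, dtype=torch.float32)
y_t = torch.tensor(y)

body = nn.Sequential(nn.Linear(20, 32), nn.ReLU())  # feature extractor
head = nn.Linear(32, 2)                             # temporary linear head
opt = torch.optim.Adam([*body.parameters(), *head.parameters()], lr=1e-2)

for _ in range(100):  # brief end-to-end training of the network
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(body(X_t)), y_t)
    loss.backward()
    opt.step()

# Freeze the body and swap the linear head for boosted trees.
with torch.no_grad():
    features = body(X_t).numpy()
gbm = GradientBoostingClassifier(random_state=0).fit(features, y)
print("train accuracy:", gbm.score(features, y))
```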

3

qalis t1_j6o5zha wrote

Yeah, I like her work. The iModels library (linked in my comment under the "rule-based learning" link) was also written by her coworkers IIRC, or at least implements a lot of models from her work. Although I disagree with her arguments in "Stop Explaining Black Box Machine Learning Models for High Stakes Decisions and Use Interpretable Models Instead", the paper she is arguably best known for.

6

worriedshuffle t1_j6o441o wrote

And in your example, continuing the pattern: just because a computer can't do that, it's not intelligent? That's an extremely narrow view of intelligence.

Animals evolved to be good at some very specific things to fill an ecological niche. Humans evolved to be good at different things. I mainly see people discounting computer capabilities by measuring them against humans. Things that are easy for us are hard for computers and vice versa. But it’s highly unlikely that computers would be at all similar to people, since we’ve been specializing for millions of years.

1

tysam_and_co OP t1_j6o25u4 wrote

Oh, I see! Yeah, I probably will want to leave the process spawning/forking stuff to the side, as that can require some bug-resistant refactoring IIRC. However! I believe it would only require some changes around the dataloaders and maybe some stuff at the beginning of the file. I am unfortunately terribly rusty on this, but you might be able to get away with changing num_dataloaders=2 -> num_dataloaders=0 in your file; I believe that would run much more slowly the first time, then the same after, without any forking issues?

As for CIFAR100, I made the absolute minimum number of changes/additions to it: 3 characters. I added a 0 to each of the two main dataloaders, and then one 0 to the num_classes parameter. On the first run with this, I'm averaging about 75.49% validation accuracy, which roughly matches the 2015 SOTA for CIFAR100. The 2015 SOTA for CIFAR10 was roughly 94%, so I believe that we are in very good hands here! This bodes quite well, I think, but I am unsure.

This was also the first blind run, with no other tuning or anything. (Well, I had to do it again on the right notebook base, as I accidentally pulled an older version that was about ~.8% below this one -- and over 10 seconds!) Interestingly, we're still running at right about ~9.91-9.94 seconds or so; I would have thought the extra 90 classes would add some appreciable overhead! Crazy! :D That opens a lot of cool avenues (ImageNet?!?!) that I've been sorta hardcore ignoring as a result. Goes to show, I guess, that there's basically no replacement for really good testing! :D :)))) I wouldn't be surprised if one could get more performance with more tuning -- though it would be surprising if we were simply at a local maximum already! Either way, I find it somewhat validating.
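
(For anyone following along without the project handy: in plain torchvision terms, the changes above look roughly like this. The project's actual code is organized differently, so treat it as an illustrative sketch.)

```python
# Generic torchvision illustration of the changes described above; the
# project's actual file differs, so this is only a sketch.
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# The "3 character" change: CIFAR10 -> CIFAR100 in the two dataloaders...
train_set = datasets.CIFAR100("./data", train=True, download=True, transform=to_tensor)
eval_set = datasets.CIFAR100("./data", train=False, download=True, transform=to_tensor)

num_classes = 100  # ...and 10 -> 100 for the network's output width

# The num_dataloaders=2 -> 0 suggestion maps onto PyTorch's num_workers:
# zero keeps data loading in the main process, avoiding forking issues.
train_loader = DataLoader(train_set, batch_size=512, num_workers=0)
```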

Thank you for being the first person to comment on and support my work. You really made my day back then, and as of yesterday the project was being tweeted by Karpathy. I am appreciative at about the same level to both of you for your support and kindness -- much love! <3 :)))) <3 :D

2

worriedshuffle t1_j6o0srj wrote

GPTZero claims to measure the perplexity of a sample of text. Am I missing something or is that a complete scam? You can’t measure perplexity without access to the model logits, which aren’t available for GPT-3.

You could guess what the logits would be by gathering text samples, but there's no way a pet project could gather enough data to accurately estimate the conditional probabilities.
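
For contrast, measuring perplexity with an open model where the logits are available is straightforward (a sketch using GPT-2 through the HuggingFace transformers library):

```python
# Sketch: measuring perplexity when you DO have the logits (GPT-2 here).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

text = "The quick brown fox jumps over the lazy dog."
ids = tokenizer(text, return_tensors="pt").input_ids

with torch.no_grad():
    # Passing labels makes the model return mean cross-entropy per token.
    loss = model(ids, labels=ids).loss

print("perplexity:", torch.exp(loss).item())
```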

1

Internal-Diet-514 t1_j6nzvcc wrote

When talking about dimensions I meant (number of rows, number of features) is 2 dimensions for tabular data, (number of series, number of time steps, number of features) is 3 dimensions for time series, and (number of images, width, height, channels) is 4 dimensions for image data. For deep learning classification, regardless of the number of dimensions it originally ingests, the data will become (number of series, features) or (number of images, features) by the time we apply an MLP for classification.
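
Concretely, in array shapes (with illustrative sizes):

```python
# The shape conventions described above, with illustrative sizes.
import numpy as np

tabular = np.zeros((1000, 20))          # (rows, features): 2-D
timeseries = np.zeros((1000, 50, 20))   # (series, steps, features): 3-D
images = np.zeros((1000, 32, 32, 3))    # (images, W, H, channels): 4-D

# By the MLP classification stage everything is (samples, features) again.
# (In a CNN those features come from the conv stack, not raw flattening.)
flat = images.reshape(len(images), -1)  # (1000, 3072)
print(tabular.shape, timeseries.shape, images.shape, flat.shape)
```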

You could consider an image to have width x height x channels features, but that's not what a CNN does; the CNN extracts meaningful features from the high-dimensional space. The feature extraction phase is what makes deep learning great for computer vision. Traditional ML models don't have that phase.

0

farmingvillein t1_j6nxa0i wrote

> If you generate 10 complete implementations, you have 10 programs. If you generate 10 implementations of four subfunctions, you have 10,000 programs. By decomposing problems combinatorially, you call the language model less

Yup, agreed--this was my positive reference to "the big idea". Decomposition is almost certainly very key to any path forward in scaling up automated program generation in complexity, and the paper is a good example of that.

> Parsel is intentionally basically indented natural language w/ unit tests. There's minimal extra syntax for efficiency and generality.

I question whether the extra formal syntax is needed at all. My guess is that, were this properly ablated, it probably would not be. LLMs are--in my personal experience, and this is obviously borne out thematically--quite flexible about different ways of representing, say, unit inputs and outputs. Permitting users to specify these in a more arbitrary manner--whether in natural language, pseudocode, or extant programming languages--seems highly likely to work equally well, with some light coercion (i.e., training/prompting). Further, natural language allows test cases to be specified in a more general way ("unit tests: each day returns the next day in the week, Sunday=>Monday, ..., Saturday=>Sunday") that LLMs are well-suited to work with. Given LLMs' ability to pick up on context and apply it, there is a good chance that freer-form descriptions of test cases would drive improved performance.

If you want to call that further research--"it was easier to demonstrate the value of hierarchical decomposition with a DSL"--that's fine and understood, but I would call it out as a(n understandable) limitation of the paper and an opportunity for future research.

4