Recent comments in /f/MachineLearning

ThirdMover t1_j760ojx wrote

I think it's going to be interesting if we manage to teach a model to actually have a notion of "factual" and "counterfactual". Right now every prompt is treated as equally valid; GPT-3 doesn't have an "opinion" about what is actually true. I'm not sure that's even possible with text alone (maybe with some sort of special marker token?), but multimodality might lead the way there.
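The "special marker token" idea could be sketched roughly like this: reserve control tokens and prefix training text with them, so the model can condition on whether a passage is meant as fact or hypothetical. The token strings and function here are purely illustrative, not from any real tokenizer.

```python
# Hypothetical control tokens; real tokenizers would register these as
# special (never split) tokens in the vocabulary.
FACTUAL = "<|factual|>"
COUNTERFACTUAL = "<|counterfactual|>"

def tag_prompt(prompt: str, factual: bool) -> str:
    """Prepend a control token so the model can condition on 'truth mode'."""
    marker = FACTUAL if factual else COUNTERFACTUAL
    return f"{marker} {prompt}"

print(tag_prompt("The capital of France is", factual=True))
```

During training, text from vetted sources would carry the factual marker while fiction and hypotheticals carry the counterfactual one; at inference you pick the mode.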

12

janpf t1_j75zh5u wrote

Reply to comment by asarig_ in [R] Graph Mixer Networks by asarig_

Ha, the funny thing is that in the Google paper, at least, they replace the O(n^(2)) term with O(n*D_S), where D_S is a constant, so it's linear. But it so happens that D_S > n in their studies, so it's not actually faster :) ... (edit: there is another constant in the transformer version too, but in practice the mixer used the same order of magnitude of TPU time to train)

But MLP-Mixers are a very interesting proposition anyway. Other mixing operations have been tried too, such as the FFT (FNet).
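For reference, the token-mixing step that gives the O(n*D_S) cost can be sketched in a few lines of NumPy. This is a hedged, illustrative version (shapes, names, and the D_S > n choice are mine, not the paper's code): a per-channel MLP applied along the token axis instead of self-attention.

```python
import numpy as np

n, d = 16, 32   # n tokens, d channels
D_S = 64        # hidden width of the token-mixing MLP (here D_S > n)

rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
W1 = rng.standard_normal((D_S, n)) * 0.1   # mixes information across tokens
W2 = rng.standard_normal((n, D_S)) * 0.1

def token_mix(X):
    # MLP over the token axis: cost O(n * D_S) per channel, linear in n
    # for fixed D_S (vs O(n^2) pairwise scores in self-attention)
    return X + W2 @ np.maximum(W1 @ X, 0.0)   # residual + ReLU MLP

Y = token_mix(X)
print(Y.shape)  # (16, 32)
```

The point of the comment above falls out directly: the cost is linear in n only because D_S is fixed, and with D_S > n the constant eats the asymptotic win.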

3

yldedly t1_j75rw5b wrote

Speaking as someone also working on an ambitious project that deviates a lot from mainstream ML, I encourage you to do the same thing I'm struggling with:

Try to implement the simplest possible version of your idea and test it on some toy problem to quickly get some insight.

Maybe start with one type of modulatory node and see how NEAT ends up using it?

5

Arthropodesque t1_j75ls38 wrote

Maybe it's so the devs can get used to working with AI assistance. It will be an experiment in overhauling software with AI assistance. This is the future.

We can rebuild him: stronger, faster.

The 10 Billion Dollar Man will then be an asset that increases productivity by 20% for now, but will get exponentially better.

1

Lengador t1_j74ro7q wrote

That's the number in the headline, but if you look at the tables you can see their 223M-parameter model significantly beats the 175B-parameter model as well. That's 0.1% the size! Absolutely insane.
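The arithmetic behind "0.1% the size" checks out, using only the parameter counts stated in the comment (no model names assumed):

```python
small = 223e6   # 223M parameters
large = 175e9   # 175B parameters

ratio = small / large
print(f"{ratio:.2%}")  # 0.13% -- i.e. roughly 0.1% of the parameters
```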

53

throwaway2676 t1_j74iilz wrote

Imo, chain-of-thought and program-of-thought reasoning will be the next major generation of progress for LLMs. Probably another year or two and we will be able to eliminate those goofy instances where the models confidently produce nonsense (well, mostly anyway).

53

ID4gotten t1_j74esuq wrote

I think you might be a little too in love with words like "neuromodulatory", while overlooking whether a simple deep FF network might be able to achieve what you're proposing. Just add a layer, nodes, and weights and you get this "modulatory" effect through linear combinations of the subsequent layers. Maybe I'm not grasping your intent, but I think if you can reduce it to math, you can then try to prove this is something that isn't already achieved through FF and backprop.
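The contrast this comment draws can be made concrete with a toy sketch (all shapes and weights are illustrative, not from either poster's work): (a) an explicit "neuromodulatory" unit where a gate computed from a context signal multiplicatively scales features, versus (b) the plain feed-forward alternative of concatenating the context and adding a layer, which backprop can use to approximate the same effect.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)   # main input
c = rng.standard_normal(4)   # modulating / context signal

# (a) explicit modulation: a sigmoid gate computed from c scales W x
Wg = rng.standard_normal((8, 4))
Wx = rng.standard_normal((8, 8))
gate = 1.0 / (1.0 + np.exp(-(Wg @ c)))   # gate values in (0, 1)
y_mod = gate * (Wx @ x)                  # element-wise modulation

# (b) plain FF: concatenate [x, c] and add a hidden layer, letting
# linear combinations + ReLU learn an approximation of the gating
W1 = rng.standard_normal((16, 12)) * 0.1
W2 = rng.standard_normal((8, 16)) * 0.1
y_ff = W2 @ np.maximum(W1 @ np.concatenate([x, c]), 0.0)

print(y_mod.shape, y_ff.shape)  # (8,) (8,)
```

Writing both down in this form is exactly the "reduce it to math" step the comment suggests: one can then ask whether (b) can represent (a) and at what width.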

6