Recent comments in /f/MachineLearning
ThirdMover t1_j760ojx wrote
Reply to comment by throwaway2676 in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
I think it's going to be interesting if we manage to teach a model to actually have a notion of "factual" vs. "counterfactual". Right now every prompt is treated as equally valid; GPT-3 doesn't have an "opinion" as to what is actually true. I'm not sure that's even possible with text alone (maybe with some sort of special marker token?), but multimodality might lead the way there.
PedroGonnet t1_j760d03 wrote
janpf t1_j75zh5u wrote
Reply to comment by asarig_ in [R] Graph Mixer Networks by asarig_
Ha, the funny thing is that in the Google paper at least they replace the O(n^2) with O(n*D_S), where D_S is a constant, so it's linear. But it so happens that D_S > n in their experiments, so it's not actually faster :) ... (edit: there is another constant in the transformer version too, but effectively the mixer used the same order of magnitude of TPU time to train)
But MLP-Mixers are a very interesting proposition anyway. Other mixing operations have been used as well, such as the FFT (FNet).
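To make the complexity point above concrete, here's a toy numpy sketch (my own illustration, not code from either paper) of a token-mixing MLP: its hidden width D_S is fixed, so the cost grows linearly in the number of tokens n, unlike attention's O(n^2) — but if D_S > n, "linear" isn't actually cheaper.

```python
import numpy as np

n, d, D_S = 16, 8, 32           # tokens, channels, token-mixing hidden width
rng = np.random.default_rng(0)
X = rng.normal(size=(n, d))     # input: (tokens, channels)

# Token-mixing MLP: applied across the token dimension, shared over channels.
W1 = rng.normal(size=(D_S, n)) * 0.1
W2 = rng.normal(size=(n, D_S)) * 0.1

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Cost of these matmuls is ~ n * D_S * d, i.e. linear in n for fixed D_S.
mixed = W2 @ gelu(W1 @ X)
print(mixed.shape)              # (16, 8)
```

Note that with D_S = 32 > n = 16 here, the token-mixing matmuls do more work per token than n^2 attention would at this sequence length, which is exactly the situation described above.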
yldedly t1_j75rw5b wrote
Reply to [R] Topologically evolving new self-modifying multi-task learning algorithms by Feeling_Card_4162
Speaking as someone also working on an ambitious project that deviates a lot from mainstream ML, I encourage you to do the same thing I'm struggling with:
Try to implement the simplest possible version of your idea and test it on some toy problem to quickly get some insight.
Maybe start with one type of modulatory node and see how NEAT ends up using it?
jimmymvp t1_j75qyff wrote
Reply to comment by based_goats in [D] Normalizing Flows in 2023? by wellfriedbeans
Would be interested in that yes
Arthropodesque t1_j75ls38 wrote
Reply to [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Maybe it's so the devs can get used to working with AI assistance. It will be an experiment in overhauling software with AI assistance. This is the future.
We can rebuild him: stronger, faster.
The Ten Billion Dollar Man will then be an asset that can increase productivity 20% as of now, but will get exponentially better.
overzealous_dentist t1_j75lesd wrote
Reply to comment by LeumasInkwater in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
"Slash marker"
YOLOBOT666 t1_j75i86x wrote
Reply to comment by mostlyhydrogen in [D] Querying with multiple vectors during embedding nearest neighbor search? by mostlyhydrogen
Out of curiosity, what are you trying to achieve as in when is the iterative process going to stop, what would be the heuristics? Would appreciate if you could share some papers for this!
icanelectoo t1_j75h90j wrote
Look up some papers that discuss them, then look up the papers those papers cite. Write a summary as if you had to explain it to someone who's never seen it before.
Alternatively you could ask chatGPT.
Parzival_007 t1_j75c9u0 wrote
SAbdusSamad OP t1_j759v4v wrote
Reply to comment by Erosis in [D] Understanding Vision Transformer (ViT) - What are the prerequisites? by SAbdusSamad
I agree that having a background in RNNs and attention with RNNs can make the learning process for transformers, and by extension ViT, much easier.
SAbdusSamad OP t1_j75922f wrote
Reply to comment by atharvat80 in [D] Understanding Vision Transformer (ViT) - What are the prerequisites? by SAbdusSamad
These courses seem to have excellent content. I'll definitely use them as resources.
SAbdusSamad OP t1_j758r29 wrote
Reply to comment by the_architect_ai in [D] Understanding Vision Transformer (ViT) - What are the prerequisites? by SAbdusSamad
Great advice. This seems to be a good starting point.
SAbdusSamad OP t1_j757w05 wrote
Reply to comment by SimonJDPrince in [D] Understanding Vision Transformer (ViT) - What are the prerequisites? by SAbdusSamad
I recently obtained a PDF of the book and began searching for information on ViT. Unfortunately, it appears that the book does not cover this topic. However, I plan to utilize the Transformer chapter to gain an understanding of ViT.
zbyte64 t1_j74y5o9 wrote
Reply to [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
What kind of hardware do I need to train this?
Lengador t1_j74ro7q wrote
Reply to comment by AiChip in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
That's the number in the headline, but if you look at the tables you can see their 223M-parameter model also significantly beats the 175B-parameter model. That's 0.1% of the size! Absolutely insane.
[deleted] t1_j74lqob wrote
AiChip t1_j74ku5a wrote
Reply to [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Wow! This is huge! A sub-1B-parameter model beating a 175B-parameter model…
throwaway2676 t1_j74iilz wrote
Reply to [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Imo, chain-of-thought and program-of-thought reasoning will be the next major generation of progress for LLMs. Probably another year or two and we will be able to eliminate those goofy instances where the models confidently produce nonsense (well, mostly anyway).
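For anyone unfamiliar with the technique being discussed, chain-of-thought prompting just means eliciting intermediate reasoning steps before the final answer. A minimal sketch (my own toy example; the question and phrasing are illustrative, not from the paper):

```python
# Build a zero-shot chain-of-thought prompt: the trailing cue asks the
# model to produce intermediate reasoning steps before its answer.
question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)
prompt = f"Q: {question}\nA: Let's think step by step."
print(prompt)
```

Program-of-thought is the same idea, except the intermediate steps are emitted as executable code rather than free-form text, which makes arithmetic slips much harder.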
Feeling_Card_4162 OP t1_j74gohh wrote
Reply to comment by ID4gotten in [R] Topologically evolving new self-modifying multi-task learning algorithms by Feeling_Card_4162
The point is to be more efficient and dynamic than a normal FF network w/ backpropagation
ID4gotten t1_j74esuq wrote
Reply to [R] Topologically evolving new self-modifying multi-task learning algorithms by Feeling_Card_4162
I think you might be a little too in love with words like "neuromodulatory", while overlooking whether a simple deep FF network might achieve what you're proposing. Just add a layer, nodes, and weights and you get this "modulatory" effect through linear combinations in the subsequent layers. Maybe I'm not grasping your intent, but if you can reduce it to math, you can then try to show it achieves something that isn't already achieved through FF and backprop.
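The two views in this exchange can be sketched in a few lines of numpy (my own notation, not the OP's): an explicit "modulatory" node gates another node multiplicatively, while the plain-FF view feeds both signals into a subsequent layer and lets its weights and nonlinearity approximate the same interaction.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)                    # some input vector
w_h = rng.normal(size=4)
w_m = rng.normal(size=4)

h = np.tanh(x @ w_h)                      # ordinary node, output in (-1, 1)
m = 1.0 / (1.0 + np.exp(-(x @ w_m)))      # "modulatory" gate, output in (0, 1)
explicit = m * h                          # explicit multiplicative modulation

# FF view: feed h and m into one more layer; with enough hidden units
# this can approximate the product m * h without a dedicated gating node.
hidden = np.tanh(np.array([h, m]) @ rng.normal(size=(2, 8)))
ff_approx = hidden @ rng.normal(size=8)

print(explicit, ff_approx)
```

Whether the learned approximation is as sample-efficient or as dynamic as a built-in gate is exactly the empirical question a toy experiment could settle.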
comfytoday t1_j745ig2 wrote
Reply to comment by blacksnowboader in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
I was hoping for a sample of your mildly passive aggressive emails.
Feeling_Card_4162 OP t1_j744rzv wrote
Reply to comment by blimpyway in [R] Topologically evolving new self-modifying multi-task learning algorithms by Feeling_Card_4162
As I stated, either a combined score over a set of tasks or abstracted away by using rtNEAT. In the case of rtNEAT, it would be up to the agent when to reproduce depending on the provided dangers, etc. in the simulated environment
ThirdMover t1_j760u5i wrote
Reply to comment by PedroGonnet in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Well, once you're at a billion, the difference between continuous and discrete quantities becomes kind of hair-splitting anyway...