Recent comments in /f/MachineLearning

ThirdMover t1_j760ojx wrote

I think it's going to be interesting if we manage to teach a model to actually have a notion of "factual" and "counterfactual". Right now every prompt is treated as equally valid; GPT-3 doesn't have an "opinion" about what is actually true. I'm not sure that's even possible with text alone (maybe with some sort of special marker token?), but multimodality might lead the way there.
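The "special marker token" idea could be sketched roughly like this: reserve control tokens and prefix training text with them, so the model can condition on whether a passage is meant as fact or hypothetical. The token strings and function here are purely illustrative, not from any real tokenizer.

```python
# Hypothetical control tokens; real tokenizers would register these as
# special (never split) tokens in the vocabulary.
FACTUAL = "<|factual|>"
COUNTERFACTUAL = "<|counterfactual|>"

def tag_prompt(prompt: str, factual: bool) -> str:
    """Prepend a control token so the model can condition on 'truth mode'."""
    marker = FACTUAL if factual else COUNTERFACTUAL
    return f"{marker} {prompt}"

print(tag_prompt("The capital of France is", factual=True))
```

During training, text from vetted sources would carry the factual marker while fiction and hypotheticals carry the counterfactual one; at inference you pick the mode.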

12

janpf t1_j75zh5u wrote

Reply to comment by asarig_ in [R] Graph Mixer Networks by asarig_

Ha, the funny thing is that in the Google paper, at least, they replace the O(n^(2)) term with O(n*D_S), where D_S is a constant, so it's linear. But it so happens that D_S > n in their studies, so it's not actually faster :) ... (edit: there is another constant in the transformer version too, but in practice the mixer used the same order of magnitude of TPU time to train)

But MLP-Mixers are a very interesting proposition anyway. Other mixing operations have been tried too, such as the FFT (FNet).
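For reference, the token-mixing step that gives the O(n*D_S) cost can be sketched in a few lines of NumPy. This is a hedged, illustrative version (shapes, names, and the D_S > n choice are mine, not the paper's code): a per-channel MLP applied along the token axis instead of self-attention.

```python
import numpy as np

n, d = 16, 32   # n tokens, d channels
D_S = 64        # hidden width of the token-mixing MLP (here D_S > n)

rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))
W1 = rng.standard_normal((D_S, n)) * 0.1   # mixes information across tokens
W2 = rng.standard_normal((n, D_S)) * 0.1

def token_mix(X):
    # MLP over the token axis: cost O(n * D_S) per channel, linear in n
    # for fixed D_S (vs O(n^2) pairwise scores in self-attention)
    return X + W2 @ np.maximum(W1 @ X, 0.0)   # residual + ReLU MLP

Y = token_mix(X)
print(Y.shape)  # (16, 32)
```

The point of the comment above falls out directly: the cost is linear in n only because D_S is fixed, and with D_S > n the constant eats the asymptotic win.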

3

yldedly t1_j75rw5b wrote

Speaking as someone also working on an ambitious project that deviates a lot from mainstream ML, I encourage you to do the same thing I'm struggling with:

Try to implement the simplest possible version of your idea and test it on some toy problem to quickly get some insight.

Maybe start with one type of modulatory node and see how NEAT ends up using it?

5

Arthropodesque t1_j75ls38 wrote

Maybe it's so the devs can get used to working with AI assistance. It will be an experiment in overhauling software with AI assistance. This is the future.

We can rebuild him: stronger, faster.

The 10 Billion Dollar Man will then be an asset that increases productivity by 20% for now, but will get exponentially better.

1

Lengador t1_j74ro7q wrote

That's the number in the headline, but if you look at the tables you can see their 223M-parameter model significantly beats the 175B-parameter model as well. That's 0.1% the size! Absolutely insane.
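The arithmetic behind "0.1% the size" checks out, using only the parameter counts stated in the comment (no model names assumed):

```python
small = 223e6   # 223M parameters
large = 175e9   # 175B parameters

ratio = small / large
print(f"{ratio:.2%}")  # 0.13% -- i.e. roughly 0.1% of the parameters
```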

53

throwaway2676 t1_j74iilz wrote

Imo, chain-of-thought and program-of-thought reasoning will be the next major generation of progress for LLMs. Probably another year or two and we will be able to eliminate those goofy instances where the models confidently produce nonsense (well, mostly anyway).

53

ID4gotten t1_j74esuq wrote

I think you might be a little too in love with words like "neuromodulatory", while overlooking whether a simple deep FF network might be able to achieve what you're proposing. Just add a layer, nodes, and weights and you get this "modulatory" effect through linear combinations of the subsequent layers. Maybe I'm not grasping your intent, but I think if you can reduce it to math, you can then try to prove this is something that isn't already achieved through FF and backprop.
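The contrast this comment draws can be made concrete with a toy sketch (all shapes and weights are illustrative, not from either poster's work): (a) an explicit "neuromodulatory" unit where a gate computed from a context signal multiplicatively scales features, versus (b) the plain feed-forward alternative of concatenating the context and adding a layer, which backprop can use to approximate the same effect.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(8)   # main input
c = rng.standard_normal(4)   # modulating / context signal

# (a) explicit modulation: a sigmoid gate computed from c scales W x
Wg = rng.standard_normal((8, 4))
Wx = rng.standard_normal((8, 8))
gate = 1.0 / (1.0 + np.exp(-(Wg @ c)))   # gate values in (0, 1)
y_mod = gate * (Wx @ x)                  # element-wise modulation

# (b) plain FF: concatenate [x, c] and add a hidden layer, letting
# linear combinations + ReLU learn an approximation of the gating
W1 = rng.standard_normal((16, 12)) * 0.1
W2 = rng.standard_normal((8, 16)) * 0.1
y_ff = W2 @ np.maximum(W1 @ np.concatenate([x, c]), 0.0)

print(y_mod.shape, y_ff.shape)  # (8,) (8,)
```

Writing both down in this form is exactly the "reduce it to math" step the comment suggests: one can then ask whether (b) can represent (a) and at what width.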

6