Recent comments in /f/MachineLearning
ThirdMover t1_j77bf6z wrote
Reply to comment by yaosio in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
> I think it's likely the ability to determine what is true and what isn't will come from a capability of the model rather than from it being told what is and isn't true. It's not possible to mark text as true or not true, as this assumes whoever is making these things is the sole authority on the truth and never makes mistakes.
I think there is a bit of a misunderstanding here. The issue isn't that GPT-3 has wrong opinions about things. The issue is that it doesn't have any opinions about what is or isn't real at all. Of course any future AI will operate on limited and flawed information, and thus hold opinions that are not perfectly true. But before we can even get to that point, a model needs to have "real" and "not real" as fundamental categories in the first place. For GPT-3 everything is just text; Harry Potter is as real as Obama. Maybe I am wrong and inference can actually get you there through pure consistency checks, as you say. But we will have to see about that.
Dry_Painter9816 t1_j77agm0 wrote
Reply to 15 years old and bad at math [D] by Daniel_C_____
DM me. I can help you create a model so you can see for yourself. Honestly, you just need to simplify your data (it looks like an Excel spreadsheet) and learn how to interpret graphs (confusion matrix, ROC curve, feature rankings, MDA values, etc.). Basically 3 or 4 chapters out of a stats book. I also have a high-level machine learning model project, with interpretation by professionals including myself, to share. All in all, knowing the math of course deepens your understanding, but it is not strictly necessary for creating and interpreting a model. I am currently obtaining my AI and ethics grad cert, along with other AI/machine learning classes.
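To make the "interpret graphs" advice concrete, here is a minimal sketch of reading a confusion matrix for a binary classifier; the labels and predictions are invented toy data, not from any real model:

```python
# Minimal confusion-matrix sketch for binary classification (toy data).

def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # invented ground truth
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]  # invented model predictions

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
precision = tp / (tp + fp)  # of the predicted positives, how many were right
recall = tp / (tp + fn)     # of the actual positives, how many were found
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In practice a library like scikit-learn computes these for you, but working through the four cells by hand once is exactly the kind of "3 or 4 chapters of stats" the comment is pointing at.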
oldkottor t1_j779n81 wrote
Reply to 15 years old and bad at math [D] by Daniel_C_____
You can use common tools and frameworks without math knowledge. Math is needed, though, if you want to be able to make a breakthrough (that is, to even have a chance at one).
You can also concentrate on computer science instead of math and go into optimizing training and inference, which is in high demand right now.
Lopsided-Factor-780 t1_j7743wv wrote
Reply to [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Question from a noob:
When they say H_Fuse is fed into the decoder model, such that Y = Decoder(H_Fuse), how is it fed in? Is it fed in like the encoder output in an encoder-decoder transformer with cross-attention? Or something else?
Also, if there is a separate encoder and decoder component, are they trained together or separately?
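For context on what the question is asking: in a standard encoder-decoder transformer, the encoder output is consumed by the decoder through cross-attention, with decoder states as queries and the encoder output as keys and values. A rough numpy sketch with H_Fuse playing the role of the encoder output; names, shapes, and the identity projections are illustrative, not the paper's code:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, h_fuse, d_k):
    """Decoder states attend over the fused encoder output."""
    # In a real transformer, Q/K/V come from learned linear projections;
    # identity projections keep this sketch short.
    Q, K, V = decoder_states, h_fuse, h_fuse
    scores = Q @ K.T / np.sqrt(d_k)      # (T_dec, T_enc) attention logits
    return softmax(scores, axis=-1) @ V  # (T_dec, d_model) attended output

d_model = 8
h_fuse = np.random.randn(5, d_model)          # fused multimodal sequence
decoder_states = np.random.randn(3, d_model)  # decoder token states
out = cross_attention(decoder_states, h_fuse, d_model)
print(out.shape)  # (3, 8)
```

Whether the paper trains the encoder and decoder jointly end-to-end or in stages is exactly the kind of detail the question is after, and the sketch above does not answer it.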
VectorSpaceModel t1_j76zubh wrote
The IR basics are timeless. I’ve read parts of the first textbook and it’s really good.
EmbarrassedHelp OP t1_j76zkur wrote
Reply to [N] GitHub CEO on why open source developers should be exempt from the EU’s AI Act by EmbarrassedHelp
The future of open source AI seems to be up in the air right now, with the EU potentially seeking to place heavy restrictions on generative AI that would severely hamper or outright ban open source projects.
The EU industry chief Thierry Breton wants generative AI like ChatGPT to be considered "high risk" and thus tightly controlled (including downstream applications), which would make open source versions extremely difficult or even impossible to release: https://www.reuters.com/technology/eus-breton-warns-chatgpt-risks-ai-rules-seek-tackle-concerns-2023-02-03/
jaqws t1_j76zkhu wrote
Reply to comment by __lawless in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Ah, yeah I would agree that's not a fair comparison. Thanks for sharing.
__lawless t1_j76xpgk wrote
Reply to comment by jaqws in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Just two points: a) They fine-tuned this model to death, whereas GPT-3.5 got only a handful of examples for fine-tuning. b) This is a multimodal model that consumes the image directly, whereas GPT can only consume text, so they fed it a caption of the image.
__lawless t1_j76wlwb wrote
Reply to comment by zbyte64 in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
They did it on 4 V100s with 32GB of memory.
jaqws t1_j76wfb1 wrote
Reply to comment by __lawless in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Why do you say it isn't a fair comparison?
yaosio t1_j76vwr2 wrote
Reply to comment by ThirdMover in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
I think it's likely the ability to determine what is true and what isn't will come from a capability of the model rather than from it being told what is and isn't true. It's not possible to mark text as true or not true, as this assumes whoever is making these things is the sole authority on the truth and never makes mistakes.
At a certain level of capability, the AI will be able to use all of its knowledge to determine what is and isn't true. For example, if you know enough about physics and the Earth, you'll know that the sky is blue without seeing it. For something that can't be confirmed or denied, such as "Bob puts his shoes on before his pants," the AI could determine the likelihood of the action based on what it knows about Bob, pants, and shoes.
If it's trained on lies, it could determine they are lies because the data is not consistent. If I teach you that every number plus another number is a number, but that 2+2 is special and equals chair, you could determine I'm lying because that's not consistent with the data as a whole.
Truth has a consistency to it that lies don't have, and a model can learn that.
__lawless t1_j76vq7h wrote
Reply to [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Just finished reading. Although, imho, it's not a very fair comparison with GPT, it's still super impressive.
dancingnightly t1_j76uuee wrote
Reply to comment by Feeling_Card_4162 in [R] Topologically evolving new self-modifying multi-task learning algorithms by Feeling_Card_4162
In this goal, you may find Mixture of Experts architectures interesting.
I like your idea. I have always thought, too, that in ML we are trying to replicate one human on one task using the world's data for that task, or, more recently, one human on many tasks.
But older ideas, like replicating societies and communication for one or many tasks, could be equally or more effective, and this heads in that direction. There is a library called GeNN which is pretty useful for these experiments, although it's a little slow due to its deliberately true-to-biology design.
badabummbadabing t1_j76tfqt wrote
Reply to comment by jimmymvp in [D] Normalizing Flows in 2023? by wellfriedbeans
I fully agree with you from a technical perspective.
The difference is that at best, you only get the likelihood under your model of choice. If that happens to be a bad model of reality (which I'd argue is the case more often than not with NFs), you might be better off just using some approximate likelihood (or ELBO) of a more powerful model.
But I am not an expert in MCMC models, so I might be talking out of my depth here. I was mainly using these models for MAP estimation.
dancingnightly t1_j76t0gh wrote
Reply to comment by zbyte64 in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
In theory, training T5 alongside the image embedding models they use (primarily DETR?) shouldn't take much more than a 3090 or a Colab Pro GPU. You could train T5s even on high-end consumer GPUs in 2020, for example, but the DETR image model probably needs to be run for each image at the same time, which might take up quite a bit of GPU memory overall. The `main.py` script looks like a nice, fairly short, typical training script that you'd be able to run quickly if you download their repo, pull the ScienceQA dataset, and pass in the training args to see if it crashes.
teenaxta t1_j76i085 wrote
Most ViT discussions or videos I've seen assume you already have an idea of attention and transformers.
Watch this video series to get an idea of attention and transformers in general, and then you'll be good to go.
cruddybanana1102 t1_j76eb6v wrote
Schütze and Manning's book on Information Retrieval is your best guide.
matth0x01 t1_j76dt6k wrote
Depends a bit on your skill level and what you want to achieve.
I started with the Introduction to Information Retrieval (2008) book, which was quite math-heavy back then. But I learned a lot and found it a good starting point.
It covers concepts like decompounding, inverted indexes, ranking functions, etc.
Newer IR strategies use word2vec-style methods for item representation instead of handcrafted ones, or learn the search ranking function directly, which is a different beast compared to traditional search engines.
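To make the contrast with handcrafted representations concrete, here is a toy sketch of embedding-based ranking, i.e. cosine similarity between a query vector and document vectors. The vectors are invented stand-ins for what a trained word2vec-style model would produce:

```python
import numpy as np

def rank_by_cosine(query_vec, doc_vecs):
    """Return document indices sorted by cosine similarity to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                      # cosine similarity per document
    return np.argsort(-sims), sims    # best match first

# Toy 4-dimensional "embeddings"; a real system would learn these.
docs = np.array([
    [0.9, 0.1, 0.0, 0.0],  # doc 0: close to the query topic
    [0.0, 0.0, 1.0, 0.2],  # doc 1: unrelated
    [0.8, 0.2, 0.1, 0.0],  # doc 2: also close, slightly less so
])
query = np.array([1.0, 0.0, 0.0, 0.0])

order, sims = rank_by_cosine(query, docs)
print(order)  # [0 2 1]
```

An inverted index matches exact (or stemmed) terms; dense ranking like this can surface documents that share no surface terms with the query, which is the "different beast" part.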
Desticheq t1_j76ad08 wrote
Reply to comment by sponsored-by-potato in Information Retrieval book recommendations? [D] by Ggronne
RemindMe! 1 week
larswl1 t1_j7688oq wrote
I don't know about newer books, but these seem like the important ones to start with: they set out the main tasks of information retrieval. For more specific problems there are many individual articles, for example from conferences such as SIGIR.
RemindMeBot t1_j767him wrote
Reply to comment by sponsored-by-potato in Information Retrieval book recommendations? [D] by Ggronne
I will be messaging you in 1 day on 2023-02-05 11:18:50 UTC to remind you of this link
sponsored-by-potato t1_j767frp wrote
RemindMe! 1 day
[deleted] t1_j766r6p wrote
HunteronX t1_j761xqh wrote
Reply to [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
The economics are getting to the point where these models will be big news...
The key features of this work seem to be:
- A multimodal embedding representation obtained from individual modality encoders (patch-level for images, token-level for text), combined via attention.
- Generating rationales first, then inferring answers from them, since answer accuracy dropped when both were produced at once.
(Not an expert, but is the greater percentage of hallucinated rationales in the baseline case, with no vision features, due to the large 'context' needed for both the rationale and the answer without those features?)
It seems that multimodal representations (language + n=? other modalities) may be important for introducing a loose physical grounding, helping to avoid hallucinating plausible ideas/suggestions while efficiently representing the rest.
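One common pattern for the "combined via attention" step described above is to let text features attend over image patch features and then mix the two streams with a learned gate. This is only a schematic of that general pattern under invented shapes and weights, not the paper's actual implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_text, h_image_attended, w_t, w_i):
    """Mix text features with (already attended) image features via a gate."""
    # lam in (0, 1) decides, per position and feature, how much vision to use.
    lam = sigmoid(h_text @ w_t + h_image_attended @ w_i)
    return (1 - lam) * h_text + lam * h_image_attended

rng = np.random.default_rng(0)
d = 8
h_text = rng.standard_normal((5, d))   # token-level text features
h_image = rng.standard_normal((5, d))  # image features aligned to the tokens
w_t = rng.standard_normal((d, d)) * 0.1
w_i = rng.standard_normal((d, d)) * 0.1

h_fuse = gated_fusion(h_text, h_image, w_t, w_i)
print(h_fuse.shape)  # (5, 8)
```

The gate is one plausible answer to the grounding point: when the image stream carries no signal, the model can learn to keep lam small and fall back to text alone.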
MediterraneanPirate t1_j77bh7n wrote
Reply to 15 years old and bad at math [D] by Daniel_C_____
Calculus, linear algebra, and statistics are a must in ML, and they are almost always taught in 11th or 12th grade (at least in my country). So I'd recommend not worrying about math immediately. The best thing you can do is see what you can build with what you know TODAY. If that doesn't satisfy you, you'll immediately know what to learn next. As a bonus, anything math-related you learn for ML will probably help you at school too.