currentscurrents t1_jadte26 wrote on February 28, 2023 at 6:44 PM

Reply to comment by 1azytux in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152

T5 and Flan-T5 have weights available.

farmingvillein t1_jadt897 wrote on February 28, 2023 at 6:43 PM

Reply to comment by deliciously_methodic in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152

FWIW, I was trying to make a more subtle point than OP's response--see my other reply.

bluebolt789 t1_jads2m6 wrote on February 28, 2023 at 6:36 PM

Reply to comment by 2blazen in [Discussion] Can you use a model trained on tweets/product reviews to do sentiment analysis on IT support tickets? by [deleted]

Oh ok.

I didn’t think this was such an obvious question to ask since I don’t have labeled data, but I’ll take your advice…

Btw, since it’s an easy question, if you have any input (besides googling or cross posting) to give me, happy to hear it!

2blazen t1_jadrdgu wrote on February 28, 2023 at 6:31 PM

Reply to comment by bluebolt789 in [Discussion] Can you use a model trained on tweets/product reviews to do sentiment analysis on IT support tickets? by [deleted]

I think what he means is your question is beneath the sub's standards lol

You may have more luck googling specifically about cross domain sentiment analysis, asking chatgpt, or asking it on r/MLQuestions or r/learndatascience

farmingvillein t1_jadqg1l wrote on February 28, 2023 at 6:26 PM

Reply to comment by MysteryInc152 in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152

You're missing the point here, or I wasn't clear--the question isn't whether performance will improve with more params (and potentially data); no doubt there.

The question is whether a model trained at scale on text & images will outperform a model trained at scale solely on text, in the text-only domain (or similarly, the image-only).

To date, all* of the public research on multimodal models (and Kosmos is no different) has shown, at best, multimodal models generally performing equal to unimodal variants in unimodal domains. And often they are a shade worse (like Kosmos).

(*=unless you count code+natural language.)

The holy grail, of course, is that the two help one another, so that your multimodal variant outperforms the unimodal variants on unimodal tasks. E.g., GPT-* gets better at talking to you because it has ingested all of the YouTube videos in the world.

If you can demonstrate that (and it certainly makes intuitive human sense that this could/should be true), then of course there is a giant truckload of image (including video!) and audio data you can slam into your text models to make text-based scenarios better (and similarly for images, etc.). (And it also more plausibly suggests that massive amounts of synthetic world exploration data could be accretive, too...)

There is a bunch of research (https://arxiv.org/abs/2301.03728 being one of the most exciting) suggesting that this can occur, with enough data/params, but no one has publicly demonstrated it. (And it'd surprise no one, probably, if this was part of GPT-4's or Gato-2's mix.)

1azytux t1_jadp0aa wrote on February 28, 2023 at 6:17 PM

Reply to comment by [deleted] in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152

do you know which foundation models we can actually use, or which are open sourced? It seems like every other model is either not available or its weights aren't released yet. That's the case with CoCa, Florence, Flamingo, BEiT3, FILIP, and ALIGN. I was able to find weights for ALBEF.

1azytux t1_jadmvbe wrote on February 28, 2023 at 6:03 PM

Reply to [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152

can we download the model weights? is it open sourced? or maybe perform zero shot tasks by ourselves?

not_particulary OP t1_jadmsf9 wrote on February 28, 2023 at 6:02 PM

Reply to comment by neu_jose in [D] More stable alternative to wandb? by not_particulary

Huh, so I'm only using the offline sync, so it can only be wandb that has a memory leak.

bluebolt789 t1_jadmcuq wrote on February 28, 2023 at 6:00 PM

Reply to comment by professorlust in [Discussion] Can you use a model trained on tweets/product reviews to do sentiment analysis on IT support tickets? by [deleted]

Yeah, but most of it is supervised (not necessarily neural networks, often more classic approaches) or pre-trained on tweets/product reviews. I haven’t found anything pre-trained on IT support tickets.

Am I missing something obvious?

[deleted] t1_jadkcqd wrote on February 28, 2023 at 5:47 PM

Reply to comment by Beli_Mawrr in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152

[deleted]

nativedutch t1_jadjn3y wrote on February 28, 2023 at 5:43 PM

Reply to comment by Javlington in [P] [R] Neural Network in Fortran! by Etterererererer

I still like assembler, whatever platform. It gives you a very direct feel between machine and function. It is more work though.

dancingnightly t1_jadj7fa wrote on February 28, 2023 at 5:40 PM

Reply to comment by Beli_Mawrr in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152

Edit: Seems like for this one, yes. They do consider human instructions (similar-ish to the goal of RLHF, which requires more RAM) by adding them directly to the text dataset, as mentioned in 3.3 Language-Only Instruction Tuning.

For other models, like the upcoming OpenAssistant, one thing to note is that, although the generative model itself may be runnable locally, the reward model (the bit that "adds finishing touches" and ensures following instructions) can be much bigger. Even if the underlying GPT-J model is 11GB in RAM and 6B params, the RLHF could seriously increase that.

This model is in the realm of the smaller T5, BART and GPT-2 models released 3 years ago and runnable then on decent gaming GPUs.
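
As a rough sanity check on the "6B params, 11GB" figure above: weight memory is approximately parameter count times bytes per parameter. The sketch below is only a back-of-the-envelope estimate; the function name and the dtype table are illustrative assumptions, and it ignores activations, caches and framework overhead, so real usage will be higher.

```python
# Rough weight-memory estimate: parameter count x bytes per parameter.
# Assumption: only the weights are counted (no activations, caches or
# framework overhead), so treat the result as a lower bound.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(n_params: float, dtype: str = "fp16") -> float:
    """Return the memory needed just to hold the weights, in GiB."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


for dtype in BYTES_PER_PARAM:
    print(f"6B params in {dtype}: {weight_memory_gib(6e9, dtype):.1f} GiB")
```

At 16-bit precision, 6B parameters come to about 11.2 GiB, which is consistent with the 11GB quoted for GPT-J; a separate reward model would add its own weights on top of that.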

Javlington t1_jadhyz3 wrote on February 28, 2023 at 5:32 PM

Reply to comment by nativedutch in [P] [R] Neural Network in Fortran! by Etterererererer

Chris Sawyer wrote Rollercoaster Tycoon purely in assembly!

Javlington t1_jadhslj wrote on February 28, 2023 at 5:31 PM

Reply to comment by royalemate357 in [P] [R] Neural Network in Fortran! by Etterererererer

Parts of scipy are in FORTRAN, not numpy afaik!

professorlust t1_jadgwa2 wrote on February 28, 2023 at 5:26 PM

Reply to comment by bluebolt789 in [Discussion] Can you use a model trained on tweets/product reviews to do sentiment analysis on IT support tickets? by [deleted]

Sentiment analysis is pretty “standard” NLP/ML at this point.

There are literally 1000s of tutorials, Medium articles, YouTube videos, etc. on the topic.

That’s without getting into more academic/research-focused articles.

uhules t1_jadfp2z wrote on February 28, 2023 at 5:18 PM

Reply to comment by hackinthebochs in [R] Large language models generate functional protein sequences across diverse families by MysteryInc152

At the point where it stops being a P(w|h) estimator.
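
For anyone unfamiliar with the notation: P(w|h) is the probability of the next word w given the history h. A toy sketch of such an estimator follows, assuming the history is truncated to a single previous word (a count-based bigram model, which is my simplification; an LLM conditions on a much longer context and uses a neural network rather than counts). The function names are hypothetical.

```python
from collections import Counter, defaultdict


def fit_bigram(tokens):
    """Count how often each word w follows each one-word history h."""
    counts = defaultdict(Counter)
    for h, w in zip(tokens, tokens[1:]):
        counts[h][w] += 1
    return counts


def prob(counts, w, h):
    """Maximum-likelihood estimate of P(w | h); 0.0 if h was never seen."""
    total = sum(counts[h].values())
    return counts[h][w] / total if total else 0.0


tokens = "the cat sat on the mat".split()
model = fit_bigram(tokens)
print(prob(model, "cat", "the"))  # "the" is followed by "cat" once and "mat" once
```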

[deleted] t1_jadcm6k wrote on February 28, 2023 at 4:58 PM

Reply to [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152

[removed]

Socratic-Inquisitor t1_jadclww wrote on February 28, 2023 at 4:58 PM

Reply to comment by maximalentropy in [D] CVPR Rebuttal scores are out! by ElPelana

Nope, the decision is usually made by 2 ACs and 1 senior AC. Take the L and try your luck in the next slot machine.

ahiddenmessi2 OP t1_jad9upi wrote on February 28, 2023 at 4:41 PM

Reply to comment by I_will_delete_myself in [D] Training transformer on RTX2060 by ahiddenmessi2

Thank you. I am looking at CodeBERT, which might satisfy my needs.

curiousshortguy t1_jad9s4t wrote on February 28, 2023 at 4:40 PM

Reply to comment by Beli_Mawrr in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152

It is. You can probably run 2 to 8 billion parameters on your average gaming PC, and 16 billion on a high-end one.

I_will_delete_myself t1_jad9amj wrote on February 28, 2023 at 4:37 PM

Reply to [D] Training transformer on RTX2060 by ahiddenmessi2

ChatGPT uses GPT-3.5, which is a pre-trained model. Google uses pre-trained models. Facebook created a pre-trained model recently.

If pre-trained models satisfy their needs, they will definitely satisfy yours. Unless you are working on a kind of problem that hasn't been tackled before, a pre-trained model will save you a lot of training time and require far less data to converge and actually be useful.