Recent comments in /f/MachineLearning
farmingvillein t1_jadt897 wrote
Reply to comment by deliciously_methodic in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
FWIW, I was trying to make a more subtle point than OP's response--see my other reply.
bluebolt789 t1_jads2m6 wrote
Reply to comment by 2blazen in [Discussion] Can you use a model trained on tweets/product reviews to do sentiment analysis on IT support tickets? by [deleted]
Oh ok.
I didn’t think this was such an obvious question to ask, since I don’t have labeled data, but I’ll take your advice…
Btw, since it’s an easy question, if you have any input to give me (besides googling or cross-posting), I’m happy to hear it!
2blazen t1_jadrdgu wrote
Reply to comment by bluebolt789 in [Discussion] Can you use a model trained on tweets/product reviews to do sentiment analysis on IT support tickets? by [deleted]
I think what he means is your question is beneath the sub's standards lol
You may have more luck googling specifically for cross-domain sentiment analysis, asking ChatGPT, or posting it on r/MLQuestions or r/learndatascience
farmingvillein t1_jadqg1l wrote
Reply to comment by MysteryInc152 in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
You're missing the point here (or I wasn't clear): the question isn't whether performance will improve with more params (and potentially more data); no doubt there.
The question is whether a model trained at scale on text & images will outperform a model trained at scale solely on text, in the text-only domain (or similarly, the image-only).
To date, all* of the public research on multimodal models (and Kosmos is no different) has shown, at best, multimodal models performing on par with unimodal variants in unimodal domains. And often they are a shade worse (like Kosmos).
(*=unless you count code+natural language.)
The holy grail, of course, is that the two help one another, so that your multimodal variant outperforms the unimodal variants on unimodal tasks. E.g., GPT-* gets better at talking to you because it has ingested all of the YouTube videos in the world.
If you can demonstrate that (and it certainly makes intuitive human sense that this could/should be true), then of course there is a giant truckload of image (including video!) and audio data you can slam into your text models to make text-based scenarios better (and similarly for images, etc.). (And it also more plausibly suggests that massive amounts of synthetic world exploration data could be accretive, too...)
There is a bunch of research (https://arxiv.org/abs/2301.03728 being one of the most exciting) suggesting that this can occur, with enough data/params, but no one has publicly demonstrated it. (And it'd surprise no one, probably, if this was part of GPT-4's or Gato-2's mix.)
1azytux t1_jadp0aa wrote
Reply to comment by [deleted] in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
Do you know which foundation models we can use, though, or which are open-sourced? It seems like every other model is either not available or its weights aren't released yet. That's the case with CoCa, Florence, Flamingo, BEiT-3, FILIP, and ALIGN. I was able to find weights for ALBEF.
1azytux t1_jadmvbe wrote
Reply to [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
Can we download the model weights? Is it open-sourced? Or could we maybe perform zero-shot tasks ourselves?
not_particulary OP t1_jadmsf9 wrote
Reply to comment by neu_jose in [D] More stable alternative to wandb? by not_particulary
Huh, so I'm only using the offline sync, which means it can only be wandb that has the memory leak.
bluebolt789 t1_jadmcuq wrote
Reply to comment by professorlust in [Discussion] Can you use a model trained on tweets/product reviews to do sentiment analysis on IT support tickets? by [deleted]
Yeah, but most of it is supervised (not necessarily a neural network, even more classic approaches) or pre-trained on tweets/product reviews. I haven't found anything pre-trained on IT support tickets.
Am I missing something obvious?
[deleted] t1_jadkcqd wrote
nativedutch t1_jadjn3y wrote
Reply to comment by Javlington in [P] [R] Neural Network in Fortran! by Etterererererer
I still like assembler, whatever the platform. It gives you a very direct feel for the link between machine and function. It is more work, though.
dancingnightly t1_jadj7fa wrote
Reply to comment by Beli_Mawrr in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
Edit: Seems like for this one, yes. They do consider human instructions (similar-ish to the goal of RLHF, which requires more RAM) by adding them directly to the text dataset, as mentioned in 3.3 Language-Only Instruction Tuning.
For other models, like the upcoming OpenAssistant, one thing to note is that although the generative model itself may be runnable locally, the reward model (the bit that "adds finishing touches" and ensures instructions are followed) can be much bigger. Even if the underlying GPT-J model is 11GB in RAM at 6B params, the RLHF could seriously increase that.
This model is in the realm of the smaller T5, BART and GPT-2 models released 3 years ago, which were runnable then on decent gaming GPUs.
Javlington t1_jadhyz3 wrote
Reply to comment by nativedutch in [P] [R] Neural Network in Fortran! by Etterererererer
Chris Sawyer wrote Rollercoaster Tycoon purely in assembly!
Javlington t1_jadhslj wrote
Reply to comment by royalemate357 in [P] [R] Neural Network in Fortran! by Etterererererer
Parts of scipy are in Fortran, not numpy, afaik!
professorlust t1_jadgwa2 wrote
Reply to comment by bluebolt789 in [Discussion] Can you use a model trained on tweets/product reviews to do sentiment analysis on IT support tickets? by [deleted]
Sentiment analysis is pretty “standard” NLP/ML at this point.
There are literally thousands of tutorials, Medium articles, YouTube videos, etc. on the topic.
That’s without getting into more academic/research-focused articles
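Most of those tutorials start from a tiny lexicon-based baseline before moving to pre-trained models. A minimal sketch of that idea (the word lists here are illustrative toy examples, not a real sentiment lexicon):

```python
# Toy lexicon-based sentiment baseline: count positive vs. negative
# words and score the difference. Word lists are illustrative only.
POS = {"great", "resolved", "fast", "helpful", "works"}
NEG = {"broken", "slow", "crash", "error", "unresolved"}

def sentiment(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POS for w in words) - sum(w in NEG for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"
```

A real setup would swap the hand-made lists for a pre-trained classifier, which is exactly the cross-domain question being discussed here.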
uhules t1_jadfp2z wrote
Reply to comment by hackinthebochs in [R] Large language models generate functional protein sequences across diverse families by MysteryInc152
At the point where it stops being a P(w|h) estimator.
Socratic-Inquisitor t1_jadclww wrote
Reply to comment by maximalentropy in [D] CVPR Rebuttal scores are out! by ElPelana
Nope, the decision is usually made by 2 ACs and 1 senior AC. Take the L and try your luck at the next slot machine.
ahiddenmessi2 OP t1_jad9upi wrote
Reply to comment by I_will_delete_myself in [D] Training transformer on RTX2060 by ahiddenmessi2
Thank you. I am looking at CodeBERT, which might satisfy my needs.
curiousshortguy t1_jad9s4t wrote
Reply to comment by Beli_Mawrr in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
It is; you can probably run 2 to 8 billion parameters on an average gaming PC, and 16 billion on a high-end one.
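Rough back-of-the-envelope behind numbers like these, assuming fp16 weights (2 bytes per parameter) and counting weights only; activations, KV cache, and framework overhead come on top:

```python
def min_vram_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Lower bound on memory for model weights alone.

    Defaults to fp16 (2 bytes/param); real usage is higher due to
    activations, KV cache, and framework overhead.
    """
    return params_billions * 1e9 * bytes_per_param / 2**30

# e.g. a 7B model needs roughly 13 GiB for fp16 weights alone,
# and a 2B model roughly 3.7 GiB, which fits typical gaming GPUs.
```

8-bit or 4-bit quantization halves or quarters these figures, which is how the larger end of that range becomes plausible on consumer hardware.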
I_will_delete_myself t1_jad9amj wrote
Reply to [D] Training transformer on RTX2060 by ahiddenmessi2
ChatGPT uses GPT-3.5, which is a pre-trained model. Google uses pre-trained models. Facebook created a pre-trained model recently.
If these models satisfy their needs, they will definitely satisfy yours. Unless you are tackling a kind of problem that hasn't been addressed before, a pre-trained model will save you a lot of training time and require much less data to converge and actually be useful.
abnormal_human t1_jad6qae wrote
RetroPenguin_ t1_jad51qy wrote
Reply to comment by abnormal_human in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
For the >10B closed-source models, I’d be really curious how many of those weights are zero at fp16 precision.
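One way to probe that on open weights is to round each value to half precision and count what flushes to zero (anything below roughly half the smallest fp16 subnormal, ~3e-8). A toy sketch using Python's built-in half-float support in `struct`, on synthetic values rather than real model weights:

```python
import struct

def to_fp16(x: float) -> float:
    # Round-trip through IEEE 754 half precision ('e' format char).
    return struct.unpack('e', struct.pack('e', x))[0]

# Synthetic stand-ins for weights; -3e-9 and 1e-8 flush to zero in fp16.
weights = [0.12, -3e-9, 1e-8, 0.5, -4e-8, 2.0]
flushed = sum(1 for w in weights if w != 0.0 and to_fp16(w) == 0.0)
```

For real checkpoints you'd run the same comparison over the loaded tensors, but the closed-source part is of course the catch.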
Beli_Mawrr t1_jad4r9n wrote
Reply to comment by [deleted] in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
That's almost in the realm of my computer can run it, no?
MysteryInc152 OP t1_jad4h86 wrote
Reply to comment by deliciously_methodic in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
I just mean a bigger model, that is more parameters.
currentscurrents t1_jadte26 wrote
Reply to comment by 1azytux in [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
T5 and Flan-T5 have weights available.