Recent comments in /f/MachineLearning

MysteryInc152 OP t1_jaccf9c wrote

>A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. We evaluate various settings, including zero-shot, few-shot, and multimodal chain-of-thought prompting, on a wide range of tasks without any gradient updates or finetuning. Experimental results show that Kosmos-1 achieves impressive performance on (i) language understanding, generation, and even OCR-free NLP (directly fed with document images), (ii) perception-language tasks, including multimodal dialogue, image captioning, visual question answering, and (iii) vision tasks, such as image recognition with descriptions (specifying classification via text instructions). We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs.

40

SatoshiNotMe t1_jacas00 wrote

I never liked wandb's aggressive, forced annual subscription pricing. I've been a happy user of ClearML for a year now. I only use their hosted service for experiment tracking; I don't run my own server.

I have no specific experience with long-running jobs, etc.

2

Magnesus t1_jac83a7 wrote

There was a discovery made recently about something called offset noise during training - people are speculating that MJ did that while others didn't. Here is a video explaining how it works: https://m.youtube.com/watch?v=cVxQmbf3q7Q

On the other hand, if that were it, MJ would also be better at generating dark images, so maybe not? Shame they don't share how they do it.
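For anyone curious, the offset-noise idea from the video can be sketched in a few lines (this is an illustrative NumPy sketch, not code from any particular diffusion codebase; the function name and `offset_strength` value are made up for the example): instead of sampling pure zero-mean Gaussian noise during diffusion training, you add a small constant offset per image and channel, which lets the model learn global brightness/color shifts.

```python
import numpy as np

def sample_offset_noise(batch, channels, height, width,
                        offset_strength=0.1, rng=None):
    """Standard diffusion training noise plus a small per-image,
    per-channel constant offset (the 'offset noise' trick)."""
    if rng is None:
        rng = np.random.default_rng()
    # Usual zero-mean Gaussian noise, one value per pixel.
    noise = rng.standard_normal((batch, channels, height, width))
    # One extra scalar per image and channel, broadcast over all pixels;
    # this component shifts the overall brightness of the whole image.
    offset = rng.standard_normal((batch, channels, 1, 1))
    return noise + offset_strength * offset

noise = sample_offset_noise(2, 3, 8, 8)
print(noise.shape)  # (2, 3, 8, 8)
```

The point of the per-channel scalar is that plain per-pixel noise averages out to roughly mid-gray, so models trained on it struggle to produce very dark or very bright images; the offset term restores that degree of freedom.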

7

Zatujit t1_jac6493 wrote

Maybe it is just that physicists love Fortran. Probably for good reasons that have nothing to do with deep learning. Sometimes you stick to what you know well. I have a math background, and I was really surprised when a physicist friend said he coded in Fortran...

2

ggf31416 t1_jac61sd wrote

The 2060 has 6GB of VRAM, right?

It should be possible to train with that amount of memory: https://huggingface.co/docs/transformers/perf_train_gpu_one#optimizer

If you need to train from scratch (most people will just fine-tune), it will take a while: the original training took 90 hours on 8x V100s, each of which should be faster than your GPU. https://www.arxiv-vanity.com/papers/1910.01108/
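One of the main tricks behind fitting training into 6GB is gradient accumulation: sum gradients over several small minibatches, then apply a single averaged update, which behaves like one large batch. A framework-agnostic toy sketch (the function and the toy "gradient" are illustrative, not from the linked docs):

```python
import numpy as np

def accumulate_gradients(grad_fn, minibatches, accumulation_steps):
    """Simulate a large batch on a small GPU: sum gradients over several
    small minibatches, then yield one averaged update per group."""
    accumulated = None
    for step, batch in enumerate(minibatches, start=1):
        g = grad_fn(batch)  # gradient of a single small minibatch
        accumulated = g if accumulated is None else accumulated + g
        if step % accumulation_steps == 0:
            # Averaging gives the same update as one batch of
            # size accumulation_steps * minibatch_size.
            yield accumulated / accumulation_steps
            accumulated = None

# Toy example: the "gradient" of a batch is just its mean.
batches = [np.array([1.0]), np.array([3.0]), np.array([5.0]), np.array([7.0])]
updates = list(accumulate_gradients(np.mean, batches, accumulation_steps=2))
print(updates)  # one averaged update per pair of minibatches
```

In the Hugging Face `Trainer` this corresponds to the `gradient_accumulation_steps` argument; combining it with mixed precision and a memory-efficient optimizer (as the linked page describes) is what makes 6GB workable.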

2

PredictorX1 t1_jac4fmg wrote

I encourage you to continue your own exploration, regardless of what anyone says about "what everybody else is doing". The truth is, all of this is applied math, and computers are merely where we do our work. Personally, I find this field much more interesting at the algorithm level: if you do something genuinely interesting, it is not somehow less valid because of the tools you used.

2

JClub t1_jabyi76 wrote

GPT was never trained on image data, so why is this a fair comparison? The UnifiedQA model is from 2022, so it doesn't seem fair either. Why don't we have comparisons with other SOTA multimodal models, such as OFA or UniT?

1

-xylon t1_jabxwa6 wrote

I did an applied math master's in 2016 and they taught us Fortran (Matlab too), along with the usual commercial software such as Ansys, plus of course all the necessary PDE theory.

Point being: it's niche, but it's still there. Classmates who stuck closely to the master's career path now write Fortran for a living.

And don't try to sell me "but C++ does the same and it's better/more modern". I've written Fortran and I've written C++, and Fortran is neither arcane nor hard, especially when you use it for its intended purpose (FORmula TRANslation, i.e. physics sims). In fact, it blows C++ out of the water in usability if you are not a computer scientist... which is why physicists and mathematicians keep using it.

3

trajo123 t1_jabwl15 wrote

>I know it’s common for massive projects to use Fortran in order to train NN.

It is definitely not common. Yes, Fortran is used in scientific computing applications, thanks to its efficient, well-tested linear algebra libraries and other numerical computing legacy code.

Fortran code is, or can be, used under the hood of higher-level libraries and languages, such as NumPy for Python, or Matlab. Even PyTorch uses LAPACK for linear algebra computations when running on the CPU. In this sense, yes, Fortran code is used indirectly for training NNs. But using Fortran to actually implement and train a NN model is virtually unheard of, as far as I know.
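To see that indirection in practice: a call like `numpy.linalg.solve` dispatches to a LAPACK driver routine (the `gesv` family, originally written in Fortran), even though no Fortran appears anywhere in your code.

```python
import numpy as np

# Solving Ax = b: NumPy hands this to LAPACK's gesv driver under the
# hood, i.e. Fortran(-derived) code, without any Fortran in user code.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)
print(x)  # approximately [2. 3.]

# np.show_config() lists the BLAS/LAPACK libraries NumPy was built against.
```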

Maybe having a look at LAPACK will give you more insight.

3