Recent comments in /f/MachineLearning
SatoshiNotMe t1_jacas00 wrote
Reply to [D] More stable alternative to wandb? by not_particulary
I never liked wandb's aggressive, forced annual subscription pricing. I've been a happy user of ClearML for a year now. I only use their hosted service for experiment tracking; I don't have my own server.
No specific experience with long-running jobs, etc.
ahiddenmessi2 OP t1_jacahig wrote
Reply to comment by CKtalon in [D] Training transformer on RTX2060 by ahiddenmessi2
Thank you. I will take a look at my number of parameters.
RingoCatKeeper t1_jac8qzr wrote
Reply to comment by Magnesus in [D] What is the most "opaque" popular machine learning model in 2023? by fromnighttilldawn
Thanks for the link, this method sounds workable. The earliest MJ results were somewhat blurry and noisy; I wonder if it was because of this method.
Magnesus t1_jac83a7 wrote
Reply to comment by RingoCatKeeper in [D] What is the most "opaque" popular machine learning model in 2023? by fromnighttilldawn
There was a discovery made recently about offset noise during training - people are speculating that MJ did that while others didn't. Here is a video explaining how it works: https://m.youtube.com/watch?v=cVxQmbf3q7Q
On the other hand, if that were it, MJ would be better at generating dark images, so maybe not? Shame they don't share how they do it.
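(For anyone curious, the offset-noise trick from the video is tiny: instead of sampling pure zero-mean Gaussian noise during diffusion training, you add a small constant per-sample, per-channel offset so the model can learn global brightness shifts. A minimal NumPy sketch - the function name and the 0.1 strength are illustrative assumptions, not anyone's actual training code:)

```python
import numpy as np

def offset_noise(latents: np.ndarray, strength: float = 0.1) -> np.ndarray:
    """Standard diffusion training noise plus a per-sample, per-channel offset.

    The offset is a single scalar per channel, broadcast over the spatial
    dims - the claimed fix for models that can't produce very dark or very
    bright images.
    """
    b, c, h, w = latents.shape
    base = np.random.randn(b, c, h, w)    # usual epsilon ~ N(0, I)
    offset = np.random.randn(b, c, 1, 1)  # one scalar per sample and channel
    return base + strength * offset

latents = np.zeros((2, 4, 8, 8))          # dummy latent batch
noise = offset_noise(latents)
```

With `strength=0` this reduces to ordinary noise, which is why it's a drop-in change to an existing training loop.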
Zatujit t1_jac6493 wrote
Reply to comment by Etterererererer in [P] [R] Neural Network in Fortran! by Etterererererer
Maybe it is just that physicists love Fortran - probably for good reasons that have nothing to do with deep learning. Sometimes you stick to what you know well. I have a math background, and I was really surprised when a physicist friend said he coded in Fortran...
ggf31416 t1_jac61sd wrote
Reply to [D] Training transformer on RTX2060 by ahiddenmessi2
The 2060 has 6GB of VRAM, right?
It should be possible to train with that amount: https://huggingface.co/docs/transformers/perf_train_gpu_one#optimizer
If you need to train from scratch (most people will just fine-tune), it will take a while: the original training took 90 hours on 8xV100s, each of which should be faster than your GPU. https://www.arxiv-vanity.com/papers/1910.01108/
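The main trick from that HF page for small cards is gradient accumulation: run several micro-batches that fit in VRAM, average their gradients, and take one optimizer step, which matches training on the larger batch. A minimal NumPy sketch with a linear least-squares loss (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))   # full batch of 32 examples
y = rng.normal(size=32)
w = np.zeros(4)                # linear model parameters

def grad(Xb, yb, w):
    """Gradient of the MSE loss 0.5*mean((Xb @ w - yb)**2) w.r.t. w."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

# One big batch (pretend this doesn't fit in VRAM) ...
full = grad(X, y, w)

# ... equals the average of 4 micro-batch gradients (each one fits).
micro = [grad(X[i:i + 8], y[i:i + 8], w) for i in range(0, 32, 8)]
accumulated = np.mean(micro, axis=0)
```

The two gradients are identical because the mean loss decomposes over examples; the only cost is more forward/backward passes per optimizer step.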
charlesGodman t1_jac55ex wrote
Reply to [D] More stable alternative to wandb? by not_particulary
Neptune.ai
PredictorX1 t1_jac4fmg wrote
Reply to [P] [R] Neural Network in Fortran! by Etterererererer
I encourage you to continue your own exploration, regardless of what anyone says about "what everybody else is doing". The truth is, all of this is applied math, and computers are merely where we do our work. Personally, I find this field much more interesting at the algorithm level: if you do something genuinely interesting, it is not somehow less valid because of the tools you used.
aigoritma-1 t1_jac492v wrote
Reply to [D] Training transformer on RTX2060 by ahiddenmessi2
I can recommend a free open-source library to help you train in the cloud if you need more resources: https://skypilot.readthedocs.io/en/latest/
aigoritma-1 t1_jac4029 wrote
Looks great! I think it's handy to gather open-source tools together and make them easier to use, but the downside is that you have to learn another API on top of those. Maybe a UI could help? Can I ask what motivated you to start this? Amazing progress indeed.
RingoCatKeeper t1_jac20hg wrote
Vote for Midjourney. I don't know how they improved their performance - no papers or publications.
nativedutch t1_jac20a6 wrote
Reply to comment by -xylon in [P] [R] Neural Network in Fortran! by Etterererererer
Won't sell anything - I have no hard opinion on it. Where I worked, Fortran and COBOL just disappeared.
I never used it, being in real-time machine-language (assembler) programming, which is another universe.
Borky_ t1_jac1gy9 wrote
Reply to comment by jamesj in [D] What is the most "opaque" popular machine learning model in 2023? by fromnighttilldawn
He also has videos on building a mini ChatGPT; the man's a treasure.
[deleted] t1_jac0x53 wrote
Reply to comment by step21 in [D] What is the most "opaque" popular machine learning model in 2023? by fromnighttilldawn
[removed]
jamesj t1_jac05ai wrote
Look up Andrej Karpathy's YouTube videos on building makemore from scratch.
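(For context: makemore starts with a character-level bigram model - count how often each character follows another in a list of names, normalize the counts into probabilities, then sample new names. A toy sketch of that counting step, with a tiny made-up dataset rather than the ~32k names used in the videos:)

```python
from collections import Counter

names = ["emma", "olivia", "ava"]   # toy dataset for illustration

# Count bigrams, with '.' marking the start/end of a name (as in the videos).
counts = Counter()
for name in names:
    chars = ["."] + list(name) + ["."]
    for a, b in zip(chars, chars[1:]):
        counts[(a, b)] += 1

def next_char_probs(context: str) -> dict:
    """Normalize the counts for one context into next-character probabilities."""
    row = {b: n for (a, b), n in counts.items() if a == context}
    total = sum(row.values())
    return {ch: n / total for ch, n in row.items()}

probs = next_char_probs("a")   # what tends to follow 'a' in this toy set
```

Sampling from these conditional distributions character by character is the whole generative model; the later videos replace the count table with a trained neural net.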
step21 t1_jabzt1w wrote
If you say you had a good understanding until then, what changed? As far as I know, the GPT architecture didn't change completely in newer versions; they made smaller changes and spent a lot of time on better data, better curation/guidelines, etc.
pyonsu2 t1_jabzq81 wrote
Reply to [D] More stable alternative to wandb? by not_particulary
What are potential alternatives?
Mlflow, tensorboard?
JClub t1_jabyi76 wrote
Reply to comment by AiChip in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
GPT was never trained on image data, so why is this a fair comparison? The UnifiedQA model is from 2022, so it doesn't seem fair either. Why don't we have comparisons with other SOTA multimodal models, such as OFA or UniT?
JClub t1_jabyhe8 wrote
Reply to [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
GPT was never trained on image data, so why is this a fair comparison? The UnifiedQA model is from 2020, so it doesn't seem fair either. Why don't we have comparisons with other SOTA multimodal models, such as OFA or UniT?
JClub t1_jabyh73 wrote
Reply to comment by astonzhang in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
GPT was never trained on image data, so why is this a fair comparison? The UnifiedQA model is from 2022, so it doesn't seem fair either. Why don't we have comparisons with other SOTA multimodal models, such as OFA or UniT?
-xylon t1_jabxwa6 wrote
Reply to comment by nativedutch in [P] [R] Neural Network in Fortran! by Etterererererer
I did an applied math master's in 2016 and they taught us Fortran (Matlab too), along with the usual commercial software such as Ansys, plus of course all the necessary PDE theory.
Point being: it's niche, but it's still there. Classmates who stuck tightly to the master's career path now write Fortran for a living.
And don't try to sell me "but C++ does the same and it's better/more modern". I've written Fortran, I've written C++, and Fortran is neither arcane nor hard, especially when you use it for its intended purpose (FORmula TRANslation, i.e. physics sims). In fact, it blows C++ out of the water in usability if you are not a computer scientist... which is why physicists and mathematicians keep using it.
trajo123 t1_jabwl15 wrote
Reply to [P] [R] Neural Network in Fortran! by Etterererererer
>I know it’s common for massive projects to use Fortran in order to train NN.
It is definitely not common. Yes, Fortran is used in scientific computing applications, due to its efficient and well-tested linear algebra libraries and other numerical computing legacy code.
Fortran code is, or can be, used under the hood of higher-level libraries/languages such as NumPy for Python, or Matlab. Even PyTorch uses LAPACK for linear algebra computations when running on the CPU. In this sense, yes, Fortran code is used indirectly for training NNs. But using Fortran to actually implement a NN model and train it is virtually unheard of, as far as I know.
Maybe having a look at LAPACK will give you more insight.
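You can see that indirection without leaving Python: `np.linalg.solve` dispatches to LAPACK's `gesv` driver under the hood, so Fortran-lineage code runs every time you solve a linear system:

```python
import numpy as np

# Solve A @ x = b. NumPy hands this to LAPACK's *gesv routine -
# the same Fortran-lineage code scientific computing has used for decades.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = np.linalg.solve(A, b)   # -> [2.0, 3.0]
```

The Python layer is just bookkeeping; the factorization and back-substitution happen in compiled LAPACK/BLAS.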
nativedutch t1_jabwc5v wrote
Reply to comment by tysam_and_co in [P] [R] Neural Network in Fortran! by Etterererererer
Never too old to learn - I didn't know that. Amazing.
Nothing about COBOL?
I liked Forth, but that died.
Etterererererer OP t1_jabvoad wrote
Reply to comment by tysam_and_co in [P] [R] Neural Network in Fortran! by Etterererererer
That’s a very good idea. I probably will do that; after getting a lot of these responses, I’ve been thinking about just doing C++.
MysteryInc152 OP t1_jaccf9c wrote
Reply to [R] Microsoft introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot) by MysteryInc152
>A big convergence of language, multimodal perception, action, and world modeling is a key step toward artificial general intelligence. In this work, we introduce Kosmos-1, a Multimodal Large Language Model (MLLM) that can perceive general modalities, learn in context (i.e., few-shot), and follow instructions (i.e., zero-shot). Specifically, we train Kosmos-1 from scratch on web-scale multimodal corpora, including arbitrarily interleaved text and images, image-caption pairs, and text data. We evaluate various settings, including zero-shot, few-shot, and multimodal chain-of-thought prompting, on a wide range of tasks without any gradient updates or finetuning. Experimental results show that Kosmos-1 achieves impressive performance on (i) language understanding, generation, and even OCR-free NLP (directly fed with document images), (ii) perception-language tasks, including multimodal dialogue, image captioning, visual question answering, and (iii) vision tasks, such as image recognition with descriptions (specifying classification via text instructions). We also show that MLLMs can benefit from cross-modal transfer, i.e., transfer knowledge from language to multimodal, and from multimodal to language. In addition, we introduce a dataset of Raven IQ test, which diagnoses the nonverbal reasoning capability of MLLMs.