Recent comments in /f/MachineLearning
phamtuanminhmeo t1_jb9owvy wrote
Did you put the prompt inside the text "The answer for the question "<prompt>" would be:" and make that the input? I think that would constrain the generated text a lot, because it gives the model a fixed context. Can we please try without it?
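Roughly this, I mean (hypothetical sketch; the variable names are mine):

```python
user_prompt = "What is the capital of France?"

# Variant 1: wrapped in the fixed template (what I suspect was done)
wrapped = f'The answer for the question "{user_prompt}" would be:'

# Variant 2: the raw prompt, with no surrounding context
raw = user_prompt

# Feeding `wrapped` vs. `raw` to the model should show how much
# the fixed context constrains what gets generated.
```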
friend_of_kalman t1_jb9nfzj wrote
Reply to What is the future of AI in medicine? [D] by adityyya13
I'm working with a small group at a big local university hospital. We have a huge dataset of patient data from the neurological ICU and are currently applying AI for risk detection. For example, we are doing time-series forecasting on all sorts of medical indicators (vital signs, blood gas analysis, etc.). Adoption is slow though, and most hospitals don't have the proper infrastructure in place for this.
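For a rough idea of the shape of this, here is a minimal sketch with scikit-learn and synthetic data (not our actual pipeline or models):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic stand-in for one vital sign sampled once per minute
rng = np.random.default_rng(0)
heart_rate = 80 + np.cumsum(rng.normal(0, 0.5, size=2000))

# Turn the series into supervised (window -> next value) pairs
window = 30
X = np.array([heart_rate[i:i + window] for i in range(len(heart_rate) - window)])
y = heart_rate[window:]

# Fit on the earlier readings, forecast the held-out tail
model = GradientBoostingRegressor().fit(X[:-100], y[:-100])
preds = model.predict(X[-100:])
print(np.mean(np.abs(preds - y[-100:])))  # mean absolute error
```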
enjakuro t1_jb9l86l wrote
Reply to comment by graphicteadatasci in [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust
Ah, it was the rare-text thing, I believe. Now that I'm more awake, I also realized that they copied the source to the target, meaning the same language appears as both source and target while the rest stays bilingual. If I recall correctly, you can have up to 50% copied data, which makes the training set much bigger. I guess if the images aren't exactly the same this would have the same effect. Basically training a language model.
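Roughly like this, if I remember the setup right (illustrative sketch, details from memory):

```python
def augment_with_copies(bitext, copy_fraction=0.5):
    """Add (src, src) pairs so the same language appears as both
    source and target, alongside the original bilingual pairs."""
    n_copies = int(len(bitext) * copy_fraction)
    copied = [(src, src) for src, _tgt in bitext[:n_copies]]
    return bitext + copied

bitext = [("guten Tag", "good day"), ("danke", "thank you")]
print(augment_with_copies(bitext))
# [('guten Tag', 'good day'), ('danke', 'thank you'), ('guten Tag', 'guten Tag')]
```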
abnormal_human t1_jb9kyzr wrote
Reply to comment by ReginaldIII in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
Actually, it doesn't. GPLv3 just requires that if OP distributes a binary to someone, the source used to produce that binary is also made available. With server side code the binary isn't being distributed, so no obligation to distribute source.
alterframe t1_jb9i70h wrote
Reply to comment by cztomsik in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
Interesting. With many probabilistic approaches, where we have some intermediate variables in a graph like X -> Z -> Y, we need to introduce sampling on Z to prevent mode collapse. Then we also decay the entropy of this sampler with temperature.
This is quite similar to this early-dropout idea, because there we also have a sampling process that effectively works only at the beginning of training. However, in those other scenarios we usually attribute it to something like exploration vs. exploitation.
If we had an agent that almost immediately assigned very high probability to bad initial actions, it might never be able to find a proper solution. On a loss landscape, in the worst case, we can end up in a local minimum very early on, so we use a higher learning rate at the beginning to make that less likely.
Maybe random sampling could generally be safer than using a higher learning rate? A high learning rate can still fail for some models. If, by analogy, we do it just to boost early exploration, then maybe randomness could be a good alternative. That would kind of counter all claims based on the analysis of convex functions...
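To make the early-dropout part concrete, this is how I picture the schedule (PyTorch-flavored sketch; the cutoff step is made up):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 128), nn.Dropout(p=0.1), nn.ReLU(), nn.Linear(128, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
early_dropout_steps = 1000  # made-up cutoff; the paper tunes this

for step in range(5000):
    if step == early_dropout_steps:
        # After the early "exploration" phase, switch the stochasticity off
        for m in model.modules():
            if isinstance(m, nn.Dropout):
                m.p = 0.0
    x, y = torch.randn(32, 64), torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```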
ReginaldIII t1_jb9goco wrote
Reply to comment by ortegaalfredo in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
Link to your code? It needs to be GPLv3 to be compliant with LLaMA's licensing.
How are you finding the quality of the output? I've had a little play around with the model but wasn't overly impressed. That said, a big parameter set like this makes a nice test bed for looking at things like pruning methods.
radi-cho OP t1_jb9gmgi wrote
Reply to comment by No-Intern2507 in [P] diffground - A simplistic Android UI to access ControlNet and instruct-pix2pix. by radi-cho
Thanks for the suggestion :)
alushamir t1_jb9fdgy wrote
Reply to comment by TikiTDO in [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust
I agree that mislabels are also an issue.
You can see some examples in this video:
https://www.youtube.com/watch?v=s6qamoFzyis&t=7s
We used fastdup to analyse LAION-400M.
Jurph t1_jb9e3cw wrote
Reply to comment by zaptrem in [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust
Yes, but if you read to the end of the book, you find out that actually, the Doctor is the real monster.
bo_peng OP t1_jb9bdw3 wrote
Reply to comment by I_will_delete_myself in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
Directly from RWKV-LM Github:
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
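The practical upshot of the RNN formulation, as a toy sketch (a generic recurrent generation loop, not RWKV's actual update rules): the model carries a fixed-size state instead of a growing KV cache, so per-token memory stays constant however long the context gets.

```python
import torch
import torch.nn as nn

class ToyRNNLM(nn.Module):
    """Toy stand-in for any model mapping (token, state) -> (logits, new_state)."""
    def __init__(self, vocab=100, d_state=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d_state)
        self.cell = nn.GRUCell(d_state, d_state)
        self.head = nn.Linear(d_state, vocab)
        self.d_state = d_state

    def forward(self, token, state):
        state = self.cell(self.emb(token), state)
        return self.head(state), state

model = ToyRNNLM()
state = torch.zeros(1, model.d_state)  # fixed-size state, not a growing KV cache
token = torch.tensor([0])
for _ in range(20):
    logits, state = model(token, state)  # O(1) memory per step, any context length
    token = logits.argmax(dim=-1)
```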
chris_myzel t1_jb9bbqz wrote
Reply to comment by etesian_dusk in [N] tinygrad 0.5.0 released by Balance-
PyTorch installations typically run into the gigabytes, while tinygrad keeps its core under 1,000 lines.
No-Intern2507 t1_jb9an0c wrote
Reply to comment by radi-cho in [P] diffground - A simplistic Android UI to access ControlNet and instruct-pix2pix. by radi-cho
I recommend using Dreamlike Photoreal; it's 768 res and much higher quality than regular 1.5.
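With diffusers it's a quick swap, something like this (model id from memory, double-check it):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "dreamlike-art/dreamlike-photoreal-2.0",  # assumed model id, verify on the Hub
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("a photo of an astronaut on a horse", width=768, height=768).images[0]
image.save("out.png")
```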
graphicteadatasci t1_jb9afw5 wrote
Reply to comment by enjakuro in [R] We found nearly half a billion duplicated images on LAION-2B-en. by von-hust
Really? Because copying all your data once is the same as running your dataset twice per epoch instead of once, which doesn't sound right. Unless your test data is drawn from the same dataset and the duplication happens before splitting, in which case you would certainly expect metric improvements. Or was this a case of duplicating rare text? That would be the opposite of having duplicate images in LAION.
cztomsik t1_jb995yy wrote
Reply to comment by alterframe in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
And maybe also related to lr decay?
Another interesting thing is random sampling - at least at the start it seems to help when training causal LMs.
polawiaczperel t1_jb98qce wrote
Reply to comment by wywywywy in [R] Created a Discord server with LLaMA 13B by ortegaalfredo
Even with one rtx 3090 https://github.com/oobabooga/text-generation-webui/issues/147#issuecomment-1456626387
wywywywy t1_jb97nl6 wrote
Nice one.
With dual 3090s, I think 30b should be possible in 8bit?
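Back-of-the-envelope, weights only (activations and overhead not counted):

```python
params = 30e9        # 30B parameters
bytes_per_param = 1  # int8
weights_gib = params * bytes_per_param / 1024**3
print(f"{weights_gib:.1f} GiB")  # ~27.9 GiB vs. 2 x 24 GiB = 48 GiB on dual 3090s
```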
BogBodySalad t1_jb96vut wrote
Reply to [D] I’m a Machine Learning Engineer for FAANG companies. What are some places looking for freelance / contract work for ML? by doctorjuice
I'm in the same boat as you (MLE, lots of work exp incl. in a US AI startup, just started freelancing journey). Here are some more client acquisition/marketing ideas:
- Have a popular open-source project (ML related) on github (takes time)
- Content marketing: Write articles/blog posts and promote those on LN
- Give talks at conferences/meetups
- Find established freelancers and ask to be a subcontractor
- Chase prospects on LN (e.g. identify a niche like "YC startup founders", follow them, and engage in a conversation)
everyone: pm me if you want to connect on LN (or if you have a project for me 🤗)
SrPeixinho t1_jb96nyt wrote
Can I donate or help somehow to make it 65B?
etesian_dusk t1_jb94rak wrote
Reply to comment by nucLeaRStarcraft in [N] tinygrad 0.5.0 released by Balance-
Ok, that doesn't sound like much. I don't understand why I should abandon standard and verified tools for this.
On top of that, the whole "George Hotz Twitter internship" thing was just embarrassing. I trust him to jailbreak PlayStations, but that's the end of it.
nucLeaRStarcraft t1_jb9289f wrote
Reply to comment by etesian_dusk in [N] tinygrad 0.5.0 released by Balance-
They claim it's fast on Apple M1 and some embedded ARM devices, but I have no idea how easy it is to use out of the box.
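For what it's worth, the README example is about this short (copied from memory, so treat it as approximate):

```python
from tinygrad.tensor import Tensor

x = Tensor.eye(3, requires_grad=True)
y = Tensor([[2.0, 0, -2.0]], requires_grad=True)
z = y.matmul(x).sum()
z.backward()

print(x.grad.numpy())  # dz/dx
print(y.grad.numpy())  # dz/dy
```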
maizeq t1_jb90rkr wrote
Reply to comment by Mrkvitko in [D] Best way to run LLMs in the cloud? by QTQRQD
Underwhelming how?
InsidiousApe t1_jb903nb wrote
Reply to [D] I’m a Machine Learning Engineer for FAANG companies. What are some places looking for freelance / contract work for ML? by doctorjuice
I'm looking for someone....
lifesthateasy OP t1_jb8zyyu wrote
Reply to comment by Disastrous_Elk_6375 in [D] Neat project that would "fit" onto a 4090? by lifesthateasy
Ooh great I'll look into those! Thank you!
backhanderer t1_jb9ph2v wrote
Reply to [R] Analysis of 200+ ML competitions in 2022 by hcarlens
Thanks for this. I knew PyTorch was dominant but didn’t realise it was this dominant for deep learning!