Recent comments in /f/MachineLearning
Balance- t1_j8copxz wrote
Reply to comment by __lawless in [R] Multimodal Chain-of-Thought Reasoning in Language Models - Amazon Web Services Zhuosheng Zhang et al - Outperforms GPT-3.5 by 16% (75%->91%) and surpasses human performance on ScienceQA while having less than 1B params! by Singularian2501
Damn, imagine what happens when you throw an A100 or H100 datacenter at it for a few months.
leepenkman t1_j8co3gr wrote
Reply to [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Also check out https://text-generator.io. It's a multimodal model, so it visits any input links, downloads web pages, and analyzes images with NNs to generate better text.
It also does speech-to-text and text-to-speech, so it can talk.
As many have said, a lot of these things will likely/hopefully come together into something big. A few pieces are still missing, like the when-to-train-new-tools/model-zoo mechanism, but internally Text Generator is based on multiple models too and does some internal decision-making about which model is best for every request (so you don't need to pick between a code and a text model; it chooses automatically). That's similar, but it's not training new nets.
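To illustrate the routing idea (a toy heuristic of my own, not Text Generator's actual internals):

```python
# Toy per-request router: pick a code model vs. a text model automatically
# so the caller never has to choose. Hypothetical heuristic, illustration only.
CODE_MARKERS = ("def ", "import ", "{", "</", "SELECT ")

def pick_model(prompt: str) -> str:
    return "code-model" if any(m in prompt for m in CODE_MARKERS) else "text-model"

print(pick_model("def fib(n): ..."))   # -> code-model
print(pick_model("Write me a haiku"))  # -> text-model
```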
logsinh t1_j8cnqzr wrote
Reply to comment by No_Network_3714 in [D] Are there any AI model that I can use to improve very bad quality sound recording? Removing noise and improving overall quality by CeFurkan
Please upload it somewhere, preferably in WAV format. I'll do it when I have time.
[deleted] t1_j8cnf3r wrote
Reply to comment by belacscole in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
[deleted]
drcopus t1_j8cn7av wrote
Reply to [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
It would be interesting if it learned which API to use from a description of the API, so that it could generalise to new ones!
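For instance, the descriptions could simply be put in-context (a toy sketch with made-up API descriptions, not anything from the paper):

```python
# Toy sketch: describe each API in the prompt so the model could, in
# principle, pick tools it never saw during training. Hypothetical format.
API_DESCRIPTIONS = [
    "Calendar() returns the current date.",
    "WikiSearch(query) returns short Wikipedia snippets for the query.",
]

def build_prompt(question: str) -> str:
    tools = "\n".join(f"- {d}" for d in API_DESCRIPTIONS)
    return (
        "You can call these APIs by writing [Name(args)] inline:\n"
        f"{tools}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_prompt("Who won the 2018 World Cup?"))
```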
askingforhelp1111 t1_j8cmbr6 wrote
Reply to comment by machineko in [D] Speed up HuggingFace Inference Pipeline by [deleted]
Thanks very much for the reply; I'd love to read your resources on compression and inference.
I'm keen on cutting down costs. We previously ran on GPU via an AWS EC2 instance, but we have to tighten the company's belt this year, and my manager suggested running on CPU. I'd love to hear your suggestions too (if any).
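One common starting point for CPU inference is dynamic int8 quantization. A minimal sketch, assuming a stock Hugging Face classification checkpoint rather than our actual model:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Stand-in checkpoint; swap in whatever you actually serve.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

# Dynamic quantization: Linear weights stored as int8, activations
# quantized on the fly. Often a decent CPU speedup with little accuracy loss.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

inputs = tokenizer("This belt-tightening is going great.", return_tensors="pt")
with torch.no_grad():
    print(quantized(**inputs).logits.argmax(dim=-1))
```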
hwyly t1_j8cm1uv wrote
Reply to comment by TLfanbasit in [D] What ML dev tools do you wish you'd discovered earlier? by TikkunCreation
Same
Taenk t1_j8ckvh2 wrote
Reply to [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Now what if the tool the LLM uses is the training API for itself …
franztesting t1_j8cjfgm wrote
Reply to [D] Quality of posts in this sub going down by MurlocXYZ
It certainly has. I hope the moderators will fix it otherwise the community will become as annoying and unusable as many other technology-related subreddits like /r/datascience or /r/python.
pommedeterresautee OP t1_j8ciq4d wrote
Reply to comment by master3243 in [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl by pommedeterresautee
As written at the top of the post, unfortunately the way OpenAI designed Whisper makes it incompatible with PyTorch 2.0.
People at OpenAI said they will rework the package when PyTorch 2.0 is released. Then we will be able to optimize it.
[deleted] t1_j8cimz2 wrote
Reply to [D] Quality of posts in this sub going down by MurlocXYZ
[deleted]
cajmorgans t1_j8chwh9 wrote
Reply to comment by cantfindaname2take in [D] Can Google sue OpenAI for using the Transformer in their products? by t0t0t4t4
Do you know the "swipe to write" feature that exists on iPhone and Android, where you can keep your finger down and "draw" the words?
There is some small company suing the big guys atm over this "feature" (which, imo, only a fraction of users actually use). When I heard about it, I lost it: how can you patent such a thing? I mean, yeah, it might not be the simplest software to write, but it just feels so weird to be able to patent such a (useless) technique.
TLfanbasit t1_j8chmsj wrote
Commenting for later usage
Varpie t1_j8cftrx wrote
Reply to [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
I'm surprised this hasn't been done before. This paper mostly cites works from the last 2-3 years, but surely, something similar was done previously (maybe not using the same kind of model)? In fact, isn't it pretty close to what search engines do to provide instant results when given an equation or an address for instance? Does anyone know of such work?
tysam_and_co t1_j8cf8al wrote
Reply to comment by piman01 in [D] Quality of posts in this sub going down by MurlocXYZ
I...I...this is the first time I've heard this. Machine learning is often used as the hype-shelter word for "AI", because it triggers very few people (in the hype sense -- or at least it used to).
I'm not quite sure what to say, this is very confusing to me.
tysam_and_co t1_j8cf1o9 wrote
Reply to comment by ArnoF7 in [D] Quality of posts in this sub going down by MurlocXYZ
That is a really good point.
Though, minor contention: most of the comments in that post seem pretty well-informed. The main disagreement I see is whether batchnorm goes before or after the activation, and oddly enough, years later, before the activation seems to have won out, thanks to the efficiency gains from fusing it with the preceding layer.
I'm surprised they were so on the mark even 6 years ago about being skeptical of this internal covariate shift business. I guess keeping the statistics centered and such is helpful, but as we've seen since then, batchnorm seems to do much more than just that (and in my experience it's a frustratingly utilitarian, if limiting, tool, unfortunately).
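For the before/after question, the two orderings side by side (an illustrative PyTorch sketch):

```python
import torch.nn as nn

# BN before the activation: Conv -> BN -> ReLU. With nothing nonlinear in
# between, Conv and BN can be folded into a single op at inference time.
bn_before_act = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)

# BN after the activation: Conv -> ReLU -> BN, as some early discussions
# advocated. The ReLU in the middle blocks the Conv+BN fusion above.
bn_after_act = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False),
    nn.ReLU(inplace=True),
    nn.BatchNorm2d(64),
)
```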
robotix_dev t1_j8cekxc wrote
Reply to comment by belacscole in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
I've long thought this is the next stepping stone on the path to AGI. The next big step IMO is dynamic, online model augmentation to enable learning new concepts.
Both of those combined seem like a basic approximation of what goes on in our brain.
JurgenSchmidthuber t1_j8ce8bs wrote
Reply to comment by sunbunnyprime in [D] Critique of statistics research from machine learning perspectives (and vice versa)? by fromnighttilldawn
Lol
tetelestia_ t1_j8cde0k wrote
Reply to comment by sloganking in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
And if we can extend this to creating synthetic training data with a set of known APIs, it could be a big step toward indexing external information.
Disastrous_Elk_6375 t1_j8cd4x4 wrote
Reply to comment by radi-cho in [R] [P] OpenAssistant is a fully open-source chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. by radi-cho
I think it will depend on how small the LLMs that it uses are. If they can be run on consumer GPUs, then it will probably take off. If you need to rent 8xGPU servers just for inference, probably not.
Stable Diffusion took off because in the first two weeks you could run it on 4GB-VRAM GPUs. Then when "finetuning" (a.k.a. DreamBooth) came along, it went from 24 to 16 to 8 GB in a matter of weeks. Same effect there.
sloganking t1_j8cculc wrote
Reply to comment by TheRealMichaelScoot in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
It's not just calling APIs. This model independently teaches itself how to use new APIs and when to use them. The process is pretty much the same for any API, and adding a new one doesn't require much extra effort from the programmer.
The paper also states it is one of the first to have models learn to use APIs in a self-supervised way, meaning they teach themselves instead of relying on a ton of human-annotated data.
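Concretely, the calls are spliced inline into the text, roughly like "[Calculator(400 / 1400)] -> 0.29". A toy sketch of the execution side (my own simplified format, not the paper's code):

```python
import re

# Toy executor for Toolformer-style inline API calls; the call format and
# tool set are simplified stand-ins, not the paper's implementation.
TOOLS = {
    # eval on untrusted text is unsafe; fine only for a toy calculator demo.
    "Calculator": lambda expr: str(round(eval(expr, {"__builtins__": {}}), 2)),
}

CALL = re.compile(r"\[(\w+)\((.*?)\)\]")

def execute_tool_calls(text: str) -> str:
    # Rewrite each "[Tool(args)]" as "[Tool(args) -> result]".
    def run(m: re.Match) -> str:
        tool, args = m.group(1), m.group(2)
        return f"[{tool}({args}) -> {TOOLS[tool](args)}]"
    return CALL.sub(run, text)

print(execute_tool_calls("400 of the 1400 participants [Calculator(400 / 1400)] passed."))
# -> 400 of the 1400 participants [Calculator(400 / 1400) -> 0.29] passed.
```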
leondz t1_j8cc933 wrote
Reply to comment by berryaroberry in [D] Quality of posts in this sub going down by MurlocXYZ
As an academic, the non-academic nature of the sub has always been one of its great advantages. I get enough academic research in the day job
rafgro t1_j8cc6ne wrote
Reply to [D] Quality of posts in this sub going down by MurlocXYZ
Agreed. The quality of discussions under posts is also pretty bad.
IMO it's the result of outdated rules and lax moderation. On the rules, there's definitely a need to address low-effort ChatGPT posts and comments; some of them are straight-up scam posts! On the moderation, it's not about quality but quantity: realistically this sub has just a few active moderators (some/most of these 9 lads are very busy engineers), with no new moderators added in the last two years, while the member count has grown enormously.
piman01 t1_j8cbt1b wrote
Reply to [R] DIGIFACE-1M — synthetic dataset with one million images for face recognition by t0ns0fph0t0ns
Wow nice. Surprisingly good performance
Khal_Doggo t1_j8cpwdk wrote
Reply to [D] Simple Questions Thread by AutoModerator
I have a matrix of data I want to run NMF on. The values range from -13.1 to 13.4. What's the best way to prep this data for NMF? I've seen people just set all the negative values to 0, but that seems like it massively cripples the variance in the data. Would it make sense to add the absolute value of the minimum to each entry so that the matrix ranges from 0 to about 26.5 instead? Or to rescale the data to [0, 1]?
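For reference, the three options I'm weighing, in code (a minimal sketch, assuming sklearn's NMF and a random stand-in matrix):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.uniform(-13.1, 13.4, size=(100, 50))  # stand-in for the real data

X_clip = np.clip(X, 0, None)                   # zero out negatives (discards everything below 0)
X_shift = X - X.min()                          # shift so the minimum becomes 0
X_scale = (X - X.min()) / (X.max() - X.min())  # shift and rescale to [0, 1]

# NMF requires nonnegative input; all three satisfy that mechanically,
# but they change what the factors mean.
W = NMF(n_components=10, init="nndsvd", max_iter=500).fit_transform(X_shift)
print(W.shape)  # (100, 10)
```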