Recent comments in /f/MachineLearning
adventuringraw t1_j9ll3fp wrote
Reply to comment by relevantmeemayhere in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
I can see how the quote could be made slightly more accurate. In particular, tabular data in general is still better tackled with something like XGBoost instead of deep learning, so deep learning certainly hasn't turned everything into a nail for one universal hammer yet.
VirtualHat t1_j9lkto4 wrote
Reply to comment by GraciousReformer in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
Oh wow, super weird to be downvoted just for asking for a reference. r/MachineLearning isn't what it used to be I guess, sorry about that.
GraciousReformer OP t1_j9ljjqu wrote
Reply to comment by activatedgeek in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
>inductive biases
Then why does DL have inductive biases and others do not?
1973DodgeChallenger t1_j9lgjq4 wrote
Reply to comment by currentscurrents in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
Just for example: you work at a company that has spent millions investing in a proprietary software product. You're saying everyone should have access to the source code, through ChatGPT or otherwise?
Can I have all of your and your company's source code, please? I'll send you my email address.

ekbravo t1_j9ldq4z wrote
Reply to [P] MIT Introduction to Data-Centric AI by anishathalye
Thank you so much! I’ve been struggling with class imbalance and outliers in my project. Will dive right in.
currentscurrents t1_j9ld7we wrote
Reply to comment by 1973DodgeChallenger in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
>it spits out something that it mined from GitHub.
Having used GitHub Copilot a bunch, it's doing a lot more than just mining snippets. It learns patterns and can use them to creatively solve new problems.
It does memorize short snippets in some cases (especially when a snippet is repeated many times in training data), but in the general case it comes up with new code to match your specifications.
>I set all of my github projects to private but I don't know if that helps.
Honestly, kinda selfish. We'll all benefit from these powerful new tools and I don't appreciate you trying to hamper them.
Wiskkey OP t1_j9l8lzi wrote
Reply to [N] U.S. Copyright Office decides that Kris Kashtanova's AI-involved graphic novel will remain copyright registered, but the copyright protection will be limited to the text and the whole work as a compilation by Wiskkey
My take: it is newsworthy but not surprising that images generated by a text-to-image AI from a text prompt alone, with no input image and no human-led post-generation modification, would not be considered protected by copyright in the USA, per the legal experts quoted in the various links in this post of mine.
kvutxdy t1_j9l7on1 wrote
The universal approximation theorem only states that a DNN can approximate any continuous function on a compact domain, not necessarily all functions.
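As a toy illustration of what "approximate" means here (my own sketch, not from the comment): even a single hidden layer of random ReLU features with a least-squares read-out can track a smooth 1-D target closely.

```python
import numpy as np

rng = np.random.default_rng(0)

# One hidden layer of 200 random ReLU units; only the linear read-out is
# fitted (by least squares). The widths/scales are arbitrary choices.
n_hidden = 200
W = rng.normal(size=(1, n_hidden)) * 3.0
b = rng.uniform(-3.0, 3.0, n_hidden)

def features(x):
    return np.maximum(0.0, x[:, None] * W + b)  # ReLU hidden activations

x = np.linspace(-np.pi, np.pi, 400)
target = np.sin(x)
coef, *_ = np.linalg.lstsq(features(x), target, rcond=None)
max_err = np.max(np.abs(features(x) @ coef - target))
```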
currentscurrents t1_j9l627o wrote
Reply to comment by CapaneusPrime in [N] U.S. Copyright Office decides that Kris Kashtanova's AI-involved graphic novel will remain copyright registered, but the copyright protection will be limited to the text and the whole work as a compilation by Wiskkey
>Literally no one was suggesting the author didn't have a valid copyright on the text or the composition.
The copyright office initially indicated that it was considering revoking the copyright registration on the entire work.
>AI-assisted works were never in play here. These images were AI-created.
They're still AI-assisted, since the human directed the AI through the prompt process.
It's much like pointing a camera. You don't even need specific artistic intent to get copyright on camera images, your random snaps half-covered with your thumb are copyrighted too. As their lawyer points out, only a modicum of creativity is required for copyright.
Ultimately, the copyright office isn't the final authority on copyright; the courts are. One way or another, we will see a bunch of new case law being written in the next few years.
Chemputer t1_j9l3gz8 wrote
Reply to comment by bloodmummy in [D] Please stop by [deleted]
Please tell me you explained the Dunning-Kruger effect to him.
currentscurrents t1_j9l30jy wrote
Reply to comment by iidealized in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
It's definitely a derivative work, but whether it violates copyright is complicated and depends what you're doing with it.
Similarly, a scaled-down thumbnail of an image is also a derivative work. You couldn't print and sell thumbnail-sized reproductions of copyrighted artworks. But many uses of thumbnails, for example in search engine results, do not violate copyright.
Naynoona111 t1_j9l1vf1 wrote
Reply to [D] Simple Questions Thread by AutoModerator
I am looking for a dataset of compilable source code samples, labeled with their CVE vulnerability IDs.
It will be used for training a static analysis framework. The code must be compilable, meaning each sample should be complete.
Currently we are using Juliet test suites as a dataset, but it is purely synthetic and not human-generated.
CapaneusPrime t1_j9l1om8 wrote
Reply to [N] U.S. Copyright Office decides that Kris Kashtanova's AI-involved graphic novel will remain copyright registered, but the copyright protection will be limited to the text and the whole work as a compilation by Wiskkey
As it should be.
From the lawyer's blog post,
> We received the decision today relative to Kristina Kashtanova's case about the comic book Zarya of the Dawn. Kris will keep the copyright registration, but it will be limited to the text and the whole work as a compilation.
>
> In one sense this is a success, in that the registration is still valid and active.
How is that a "success?" Literally no one was suggesting the author didn't have a valid copyright on the text or the composition.
> However, it is the most limited a copyright registration can be and it doesn't resolve the core questions about copyright in AI-assisted works.
Ummmm.... AI-assisted works were never in play here. These images were AI-created. Per the author's own depiction of the process.
> Those works may be copyrightable, but the USCO did not find them so in this case.
AI-assisted works may be copyrightable, yes, but that's not what you were representing.
There are many artists who are doing amazing work using Generative AI as a tool. This wasn't that.
The biggest problem is one of terminology: we don't have good terms to distinguish between someone who feeds a prompt into a Generative AI and calls it a day, and someone who uses a Generative AI as just another tool in their toolkit, so they all get lumped in together. This lawyer muddying the waters by suggesting Kashtanova's works were AI-assisted does no one any good.
holyplasmate t1_j9l1fnj wrote
Reply to [D] Simple Questions Thread by AutoModerator
What is the best way to clone a voice in real-time TTS? ElevenLabs? I've been trying to use the Tortoise TTS fast branch and tried a few others, and they aren't producing the quality I want or the speed I need. I haven't tried the ElevenLabs API; is it fast?
currentscurrents t1_j9l0g64 wrote
Reply to comment by hpstring in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
As long as it's capable of making art that can compete with human art, they're still not going to like it.
"Never argue with a man whose job depends on not being convinced. It is difficult to get a man to understand something, when his salary depends upon his not understanding it." - Upton Sinclair
vyasnikhil96 OP t1_j9kzwpq wrote
Reply to comment by iidealized in [R] Provable Copyright Protection for Generative Models by vyasnikhil96
Assuming you are asking from the perspective of copyright law, I am not sure. I think the notion of “remix”/sufficient transformation also depends on the context in which the new work is being used.
30299578815310 t1_j9kziil wrote
This is just not true. As others have noted, there are other algorithms that are universal approximators and run at scale. The key to the success of DNNs is unknown; one candidate explanation is the lottery ticket hypothesis.
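For concreteness, lottery-ticket experiments revolve around magnitude pruning: zeroing the smallest weights and retraining only the surviving subnetwork. A minimal numpy sketch of that pruning step (my own illustration; `magnitude_prune` is a hypothetical helper, not from any library):

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of weights.

    Returns the pruned weights and a boolean mask of survivors -- the
    'winning ticket' candidate in lottery-ticket experiments.
    """
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)          # number of weights to drop
    threshold = np.sort(flat)[k]           # magnitude of the k-th smallest
    mask = np.abs(weights) >= threshold    # keep only larger weights
    return weights * mask, mask
```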
1973DodgeChallenger t1_j9kz2wb wrote
It's an interesting problem... I ask ChatGPT for code, and it spits out something that it mined from GitHub. Microsoft didn't buy GitHub just to spend money. They knew it was one of the best sources, if not the best, for AI code mining. So yoink! I set all of my GitHub projects to private, but I don't know if that helps. The user agreement may be structured to allow "anonymous mining" of code, private or otherwise.
So yeah... if you store your code on GitHub, I'd bet a dollar Microsoft/OpenAI will be mining it and eventually burp it out in ChatGPT.
relevantmeemayhere t1_j9kygtx wrote
Reply to comment by Featureless_Bug in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
Any problem where you want things like effect estimates lol. Or error estimates. Or models that generate joint distributions
So, literally a ton of them. Which industries don’t like things like that?
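To make the "effect estimates with error estimates" point concrete (my own sketch, not from the thread): a linear model hands you coefficients and their standard errors directly, which a deep net does not.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500

# Synthetic data: intercept plus two covariates with known effects.
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + rng.normal(0.0, 1.0, n)

# OLS point estimates of the effects...
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# ...and their standard errors, via the usual sigma^2 (X'X)^-1 formula.
resid = y - X @ beta_hat
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
```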
iidealized t1_j9kws3r wrote
Reply to [P] MIT Introduction to Data-Centric AI by anishathalye
Cool to see these topics being taught. Definitely agree these are important concepts that most ML classes skip for some reason
iidealized t1_j9kwe6h wrote
Are adversarial examples (eg minimally perturbed versions of images) considered violation of copyright? Or are they a sufficient “remix”?
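For readers unfamiliar with the construction: a minimally perturbed image is typically made with something like the fast gradient sign method. A toy numpy sketch against a logistic-regression "model" (my own illustration; `fgsm_perturb` is a hypothetical helper, not from any library):

```python
import numpy as np

def fgsm_perturb(x, w, b, y, eps):
    """One fast-gradient-sign step: nudge each input dimension by eps in
    the direction that increases the cross-entropy loss of a logistic
    regression with weights w and bias b, then clip back to [0, 1]."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted P(class 1)
    grad_x = (p - y) * w                     # gradient of the loss w.r.t. x
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)
```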
Featureless_Bug t1_j9kvek5 wrote
Reply to comment by relevantmeemayhere in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
Ok, name one large scale problem where GLMs are the best prediction algorithm possible.
Adorable-Breakfast t1_j9kv52b wrote
Reply to comment by throwaway2676 in [D] Simple Questions Thread by AutoModerator
Not a direct answer to your question, but I interviewed a while back with this company called Fathom Radiant that's developing an optical computing system for AI training. It's further off than the next few years, but their ultimate goal is to use that technology to establish a computing center that outcompetes other options for training very large models, then leverage that position to support AI safety by allocating resources to groups that meet certain safety standards. It's an interesting approach and seems like a cool technology.
VirtualHat t1_j9ll5i2 wrote
Reply to comment by relevantmeemayhere in [D] "Deep learning is the only thing that currently works at scale" by GraciousReformer
Yes, that's right. For many problems, a linear model is just what you want. I guess what I'm saying is that the dividing line between when a linear model is appropriate vs when you want a more expressive model is often related to how much data you have.
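A quick numpy illustration of that dividing line (my own sketch, not from the thread): with only a handful of noisy observations of a linear relationship, an expressive polynomial typically generalizes worse than a straight line.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground truth is linear; we only get 10 noisy observations of it.
x_train = rng.uniform(-1, 1, 10)
y_train = 2.0 * x_train + rng.normal(0.0, 0.5, 10)
x_test = rng.uniform(-1, 1, 200)
y_test = 2.0 * x_test + rng.normal(0.0, 0.5, 200)

lin = np.polyfit(x_train, y_train, 1)   # simple model
flex = np.polyfit(x_train, y_train, 7)  # expressive model, same small data

def mse(coef):
    return np.mean((np.polyval(coef, x_test) - y_test) ** 2)
```

With more training data, the flexible model's advantage in expressiveness would start to pay off, which is the dividing line described above.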