Recent comments in /f/MachineLearning

currentscurrents t1_j9ld7we wrote

>it spits out something that it mined from GitHub.

Having used GitHub Copilot a bunch, it's doing a lot more than just mining snippets. It learns patterns and can use them to creatively solve new problems.

It does memorize short snippets in some cases (especially when a snippet is repeated many times in training data), but in the general case it comes up with new code to match your specifications.

>I set all of my github projects to private but I don't know if that helps.

Honestly, kinda selfish. We'll all benefit from these powerful new tools and I don't appreciate you trying to hamper them.

4

Wiskkey OP t1_j9l8lzi wrote

My take: It is newsworthy but not surprising that images generated by a text-to-image AI using a text prompt with no input image, with no human-led post-generation modification, would not be considered protected by copyright in the USA, per the legal experts quoted in various links in this post of mine.

1

currentscurrents t1_j9l627o wrote

>Literally no one was suggesting the author didn't have a valid copyright on the text or the composition.

The copyright office initially indicated that it was considering revoking the copyright registration on the entire work.

>AI-assisted works were never in play here. These images were AI-created.

They're still AI-assisted, since the human directed the AI through the prompt process.

It's much like pointing a camera. You don't even need specific artistic intent to get copyright on camera images; your random snaps half-covered with your thumb are copyrighted too. As their lawyer points out, only a modicum of creativity is required for copyright.

Ultimately, the copyright office isn't the final authority on copyright; the courts are. One way or another, we will see a bunch of new case law being written in the next few years.

13

currentscurrents t1_j9l30jy wrote

It's definitely a derivative work, but whether it violates copyright is complicated and depends on what you're doing with it.

Similarly, a scaled-down thumbnail of an image is also a derivative work. You couldn't print and sell thumbnail-sized reproductions of copyrighted artworks. But many uses of thumbnails, for example in search engine results, do not violate copyright.

2

Naynoona111 t1_j9l1vf1 wrote

I am looking for a dataset of compilable source code, labeled with the CVE of each vulnerability.

It will be used for training a static analysis framework, so it must be compilable, meaning the source code should be complete.

Currently we are using Juliet test suites as a dataset, but it is purely synthetic and not human-generated.

1

CapaneusPrime t1_j9l1om8 wrote

As it should be.

From the lawyer's blog post,

> We received the decision today relative to Kristina Kashtanova's case about the comic book Zarya of the Dawn. Kris will keep the copyright registration, but it will be limited to the text and the whole work as a compilation.
>
> In one sense this is a success, in that the registration is still valid and active.

How is that a "success?" Literally no one was suggesting the author didn't have a valid copyright on the text or the composition.

> However, it is the most limited a copyright registration can be and it doesn't resolve the core questions about copyright in AI-assisted works.

Ummmm.... AI-assisted works were never in play here. These images were AI-created. Per the author's own depiction of the process.

> Those works may be copyrightable, but the USCO did not find them so in this case.

AI-assisted works may be copyrightable, yes, but that's not what you were representing.

There are many artists who are doing amazing work using Generative AI as a tool. This wasn't that.

The biggest problem is one of terminology: we don't have good terms to distinguish between someone who feeds a prompt into a Generative AI and calls it a day and someone who uses a Generative AI as just another tool in their toolkit, so they all get lumped in together. This lawyer muddying the waters by suggesting Kashtanova's works were AI-assisted does no one any good.

6

holyplasmate t1_j9l1fnj wrote

What is the best way to clone a voice for real-time TTS? ElevenLabs? I've been trying the Tortoise TTS fast branch and a few others, and they aren't producing the quality I want or at the speed I need. I haven't tried the ElevenLabs API; is it fast?

1

1973DodgeChallenger t1_j9kz2wb wrote

It's an interesting problem... I ask ChatGPT for code, it spits out something that it mined from GitHub. Microsoft didn't just buy GitHub to spend money. They knew it was one of the best, if not the best, sources for AI code mining. So Yoink! I set all of my GitHub projects to private but I don't know if that helps. The user agreement may be structured to "anonymously mine" code, private or otherwise.

So ya... if you store your code on GitHub... I'd bet a dollar Microsoft/OpenAI will be mining it and eventually burp it out in ChatGPT.

0

Adorable-Breakfast t1_j9kv52b wrote

Not a direct answer to your question, but I interviewed a while back with this company called Fathom Radiant that's developing an optical computing system for AI training. It's further off than the next few years, but their ultimate goal is to use that technology to establish a computing center that outcompetes other options for training very large models, then leverage that position to support AI safety by allocating resources to groups that meet certain safety standards. It's an interesting approach and seems like a cool technology.

2