Recent comments in /f/MachineLearning
znihilist t1_j6uy7z0 wrote
Reply to comment by HateRedditCantQuitit in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
I think people are using words and disagreeing on conclusions without agreeing first on what is exactly meant by those words.
I am not sure everyone is using the word "memorize" the same way. I think those who use it in the context of a defense are saying that those images are nowhere to be found in the model itself; it is just a function that takes words as input and outputs a picture. Is the model memorizing the training data if it can recreate it? I don't know, but my initial intuition tells me there is a difference between memorization and pattern recreation, even if the two aren't easily distinguishable in this particular scenario.
DigThatData t1_j6uxsdj wrote
Reply to comment by IDoCodingStuffs in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
> full image comparison.
That's not actually the metric they used, precisely for the reasons you suggest: they found it to be too conservative. Specifically, they were getting too-high scores from images with large black backgrounds, so they chunked each image up into regions and used the score for the most dissimilar (but corresponding) regions to represent the whole image.
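For illustration, that patch-wise distance can be sketched roughly like this (a minimal sketch, not the authors' actual code; the grid size and the 0.1 threshold are placeholders):

```python
import numpy as np

def patchwise_l2(img_a: np.ndarray, img_b: np.ndarray, grid: int = 4) -> float:
    """Split both images into a grid of corresponding patches and return the
    LARGEST per-patch normalized L2 distance. Scoring by the most dissimilar
    patch keeps a big shared black background from making two otherwise
    different images look like a match."""
    assert img_a.shape == img_b.shape
    h, w = img_a.shape[:2]
    ph, pw = h // grid, w // grid
    worst = 0.0
    for i in range(grid):
        for j in range(grid):
            pa = img_a[i*ph:(i+1)*ph, j*pw:(j+1)*pw].astype(np.float32) / 255.0
            pb = img_b[i*ph:(i+1)*ph, j*pw:(j+1)*pw].astype(np.float32) / 255.0
            worst = max(worst, float(np.sqrt(np.mean((pa - pb) ** 2))))
    return worst

# A generation counts as a near-copy only if even its most dissimilar patch is
# still very close to the training image; 0.1 here is a placeholder threshold.
def looks_memorized(generated, training_image, threshold=0.1):
    return patchwise_l2(generated, training_image) < threshold
```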
Further, I think they demonstrated their methodology probably wasn't too conservative when they were able to use the same approach to get a 2.3% hit rate from Imagen (concretely: 23 memorized images out of 1,000 tested prompts). That hit rate is very likely a big overestimate of Imagen's propensity to memorize, but it demonstrates that the authors' L2 metric can do its job.
Also, it's not like the authors didn't look at the images. They did, and found a handful more hits, which that 0.03% figure already accounts for.
pm_me_your_pay_slips OP t1_j6uw5xs wrote
Reply to comment by LetterRip in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Where do you get that number?
Ok_Dependent1131 t1_j6uv7o5 wrote
Reply to [D] Any open source model, or application to remove no speech parts of a video? by CeFurkan
The company that makes Snagit has software that does it... but it's not free...
bubudumbdumb t1_j6uux46 wrote
I would expect a lot of work around regulation. Formal qualification requirements will probably emerge for who can tell a jury how to interpret the behavior of ML models and the practices of the people who develop them. In other words, there will be DL lawyers. Lawyers might get themselves automated out of courtrooms: if that's the case, humans will be involved only in DL trials, and LLMs will settle everything else from tax fraud to parking tickets. Want to appeal the verdict of an LLM? You need a DL lawyer.
Coding might be automated, but it's really a question of how much good code there is out there to learn from.
Books, movies, music, and VR experiences will be prompted. Maybe even psychoactive substances could be generated and synthesized from prompts (if a DL lawyer signs off on the ML behind it). What we value in writing will change, too: if words are cheap and attention is scarce, short-form writing becomes valuable.
The real question is who we are going to be to each other, and even more importantly to kids up to age 6.
Agreeable_Dog6536 t1_j6uuq7r wrote
Reply to comment by SnooWords6686 in [D] Any open source model, or application to remove no speech parts of a video? by CeFurkan
He's asking the opposite - remove the bits with no speech.
I used to do more or less this same thing manually, years ago, for a corporate vlog in which people drove around all day fixing pipe leaks and occasionally commented on what they'd done - they wanted the clips where they commented, edited together.
I basically just looked at the audio waveform and figured out where I should probably cut, and then listened to it to narrow it down.
If someone hasn't already trained an AI for this, they should.
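For the waveform-eyeballing step specifically, off-the-shelf silence detection already gets you most of the way there. A minimal sketch with pydub (the filename and both thresholds are made-up values to tune per recording, and ffmpeg needs to be installed for non-WAV input):

```python
from pydub import AudioSegment
from pydub.silence import detect_nonsilent

# Load the video's audio track (pydub hands the demuxing off to ffmpeg).
audio = AudioSegment.from_file("vlog.mp4")

# Find stretches that are NOT silence: at least 700 ms of quiet counts as a gap,
# and anything more than 16 dB below the clip's average loudness counts as quiet.
speech_ranges = detect_nonsilent(
    audio,
    min_silence_len=700,
    silence_thresh=audio.dBFS - 16,
)

# Print the keep-list in seconds, padded a little so words don't get clipped.
for start_ms, end_ms in speech_ranges:
    print(f"keep {max(start_ms - 250, 0) / 1000:.2f}s -> {(end_ms + 250) / 1000:.2f}s")
```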
DigThatData t1_j6uu82y wrote
Reply to comment by ItsJustMeJerk in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
This is true, and also generalization and memorization are not mutually exclusive.
EDIT: I can't think of a better way to articulate this, but the image that keeps coming to my mind is a model memorizing the full training data and simulating a nearest neighbors estimate.
LetterRip t1_j6ut9kc wrote
Reply to comment by -xXpurplypunkXx- in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
> I can't tell which is crazier: that it memorizes images at all, or that memorization is such a small fraction of its overall outputs.
It sees most images somewhere between 1 time (LAION-2B) and 10 times (the aesthetic dataset is trained for multiple epochs). It simply can't learn that much about an image from so few exposures. If you've ever tried fine-tuning a model on a handful of images, you know it takes a huge number of exposures to memorize an image.
Also, the model's capacity is small enough that, on average, it can learn only about 2 bits of unique information per image.
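That per-image figure is just back-of-the-envelope arithmetic; a rough version of it, where the parameter count, the bits-per-weight guess, and the dataset size are all ballpark assumptions rather than measured values:

```python
# Rough sanity check of the "a couple of bits per training image" claim.
# All three inputs are ballpark assumptions, not exact figures.
unet_params = 0.9e9      # ~900M parameters in the Stable Diffusion v1 U-Net
bits_per_param = 4       # generous guess at usable information per weight
training_images = 2e9    # LAION-2B-scale dataset

bits_per_image = unet_params * bits_per_param / training_images
print(f"~{bits_per_image:.1f} bits of model capacity per training image")
# ~1.8 bits -- nowhere near enough to store an image seen only a few times
```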
LetterRip t1_j6uskhi wrote
That only works for images the model has seen a thousand times or so (i.e., 100 copies of the image, each seen 10 times). It takes massive overtraining to memorize an image.
Nhabls t1_j6urk1b wrote
Reply to comment by ItsJustMeJerk in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
This isn't really relevant. Newer, larger LLMs generalize better than smaller ones, yet they also regurgitate training data better. The two aren't mutually exclusive.
Laphing_Drunk t1_j6ur796 wrote
Reply to comment by mongoosefist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Yeah, model inversion attacks aren't new. It's reasonable to assume that large models, especially generative models that make no effort to be resilient, are susceptible to this.
ItsJustMeJerk t1_j6uqkv6 wrote
Reply to comment by Nhabls in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Actually, the data has shown that past a certain size, larger models end up generalizing more than smaller ones. It's called double descent.
HateRedditCantQuitit t1_j6upt7k wrote
Reply to comment by mongoosefist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
It's funny that the top comment right now is that it shouldn't be surprising, because whenever the legal argument comes in, the most common defense is that these models categorically don't memorize.
gdahl t1_j6upct4 wrote
Reply to comment by [deleted] in [D] What does a DL role look like in ten years? by PassingTumbleweed
I would say the turning point was when we published the first successful large vocabulary results with deep acoustic models in April 2011, based on work conducted over the summer of 2010. When we published the paper you mention, it was to recognize that these techniques were the new standard in top speech recognition groups.
Regardless, there were deep learning roles in tech companies in 2012, just not very many of them compared to today.
Nhabls t1_j6uokwb wrote
Reply to comment by DigThatData in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
It's incredibly easy to make giant LLMs regurgitate training data near verbatim. There's very little reason to believe that this won't just start happening more frequently with image models as they grow in scale as well.
Personally, I just hope it brings a reality check in the courts for these companies that think they can just monetize generative models trained on copyrighted material without permission.
SnooWords6686 t1_j6um1ru wrote
Reply to [D] Any open source model, or application to remove no speech parts of a video? by CeFurkan
Good. Hope you can solve it 🙂
CeFurkan OP t1_j6ulm3r wrote
Reply to comment by SnooWords6686 in [D] Any open source model, or application to remove no speech parts of a video? by CeFurkan
No, just remove filler words such as "um", "uh", etc., and also the parts where I take a breath.
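For reference, given word-level timestamps from a transcription tool, the cut list could be built along these lines (a rough sketch; the tuple format, filler list, padding, and gap length are assumptions to adapt to whatever transcriber is actually used):

```python
# Given (word, start_sec, end_sec) tuples from any ASR with word timestamps,
# build the list of segments to keep, dropping filler words and long pauses.
FILLERS = {"um", "uh", "uhm", "erm", "hmm"}

def keep_segments(words, max_gap=0.4, pad=0.05):
    segments = []
    for text, start, end in words:
        if text.strip(".,?! ").lower() in FILLERS:
            continue  # skip filler words entirely
        start, end = max(start - pad, 0.0), end + pad
        if segments and start - segments[-1][1] <= max_gap:
            segments[-1][1] = end          # close to the last kept word: extend it
        else:
            segments.append([start, end])  # a breath/pause longer than max_gap starts a new cut
    return segments

# Feed the resulting [start, end] pairs to ffmpeg or an editor's cut list.
print(keep_segments([("so", 0.0, 0.2), ("um", 0.25, 0.5), ("today", 1.4, 1.8)]))
```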
SnooWords6686 t1_j6ulih8 wrote
Reply to [D] Any open source model, or application to remove no speech parts of a video? by CeFurkan
Why do you want a video without speech?
-xXpurplypunkXx- t1_j6ulhcj wrote
Reply to comment by koolaidman123 in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
I can't tell which is crazier: that it memorizes images at all, or that memorization is such a small fraction of its overall outputs.
Very interesting. I'm wondering how sensitive this methodology is to finding instances of memorization though; maybe this is the tip of the iceberg.
IDoCodingStuffs t1_j6uk67h wrote
Reply to comment by koolaidman123 in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
~~In this case the paper seems to use a very conservative threshold to avoid false positives -- L2 distance < 0.1, full-image comparison -- which makes sense for their purposes, since they are trying to establish the concept rather than investigate its prevalence.
The number is definitely larger than 0.03% when you pick a threshold that optimizes the F score rather than just precision. How much larger? That's a question for follow-up studies.~~
ItsJustMeJerk t1_j6uymag wrote
Reply to comment by Nhabls in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
You're right, it's not exclusive. But I believe that while the absolute amount of data memorized might go up with scale, it occupies a smaller fraction of the output, because it's only used where verbatim recitation is necessary rather than as a crutch (I could be wrong, though). Anyway, I don't think crippling the model by removing all copyrighted data from the dataset is a good long-term solution. You don't keep students from plagiarizing by preventing them from ever looking at a source related to what they're writing.