Recent comments in /f/MachineLearning
maxToTheJ t1_j6vqzvb wrote
Reply to comment by mongoosefist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
>Is this really that surprising?
It should be, to all the people who claim these models are purely transformative in every thread about the court cases related to generative models.
minhrongcon2000 t1_j6vox3u wrote
Maybe a resource-hungry industry that consumes 85% of the world's energy.
LetterRip t1_j6vo0zz wrote
Reply to comment by pm_me_your_pay_slips in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
> The model capacity is not spent on learning specific images
I'm completely aware of this. It doesn't change the fact that the average information retained per image is about 2 bits (2 GB of parameters divided by the total number of images trained on).
> As an extreme example, imagine you ask 175 million humans to draw a random number between 0 and 9 on a piece of paper. you then collect all the images into a dataset of 256x256 images. Would you still argue that the SD model capacity is not enough to fit that hypothetical digits dataset because it can only learn 2 bits per image?
I didn't say it learned 2 bits of pixel data; it learned 2 bits of information. The information is in a higher-dimensional space, so it is much more informative than 2 bits of pixel-space data, but it is still an extremely small amount of information.
Given that it often takes about 1,000 repetitions of an image to approximately memorize its key attributes, we can infer it takes about 2^10 bits on average to memorize an image. So on average it learns about 1/1000 of the available image data each time it sees an image, or about the equivalent of 1/2 kB of compressed image data.
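The capacity arithmetic above can be sketched numerically. Note the exposure count below is an assumption chosen to reproduce the commenter's ~2-bit estimate, not a documented Stable Diffusion training statistic:

```python
# Back-of-envelope version of the capacity argument: roughly 2 GB of
# model weights spread over the total number of image exposures seen
# during training. Both figures are rough, illustrative numbers.
param_bytes = 2 * 10**9        # ~2 GB of model parameters
images_seen = 8 * 10**9        # assumed total training exposures (with repeats)

bits_per_image = param_bytes * 8 / images_seen
print(f"~{bits_per_image:.1f} bits of capacity per image seen")  # ~2.0 bits
```

Whatever the exact exposure count, the per-image budget stays orders of magnitude below what storing a 256x256 image would require.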
danielfm123 t1_j6vmxlu wrote
Artists will be requesting their copyright...
[deleted] t1_j6vhw51 wrote
Reply to comment by lunarNex in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
You must be new here from a gaming subreddit or something where people talk like this, and not actually in a research field.
ChatGPT is the only free, self-hosted product they have exposed people to. This is actually the norm for OpenAI, and you would be dying on a stale hill.
Other than that, their inference code is open. You can run a local version of GPT with your own code and a locally stored model right now (if you know what you're doing, a minor caveat).
Same for their Whisper code. It doesn't get more open than that. The compute required to train a multi-billion-parameter model isn't something you could do anyway.
Lastly, "open" doesn't just mean free of cost. It means being intellectually transparent about the code (that is always what it has meant). There's no reason to confuse the two. It costs ~$100k per day to run these models, so I'm not sure what leads you to think that financial risk should be part of an intellectually open philosophy when you can just deploy GPT yourself if you're so inclined.
Welcome to the sub.
pm_me_your_pay_slips OP t1_j6vgxpe wrote
Reply to comment by LetterRip in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
>on average it can learn 2 bits of unique information per image.
The model capacity is not spent on learning specific images, but on learning the mapping from noise to latent vectors corresponding to natural images. Human-made or human-captured images have common features shared across images, and that's what matters for learning the mapping.
As an extreme example, imagine you ask 175 million humans to draw a random number between 0 and 9 on a piece of paper. you then collect all the images into a dataset of 256x256 images. Would you still argue that the SD model capacity is not enough to fit that hypothetical digits dataset because it can only learn 2 bits per image?
lunarNex t1_j6vgjgy wrote
So not "open" AI anymore? That greed sets in fast.
I_Am_The_Sevit t1_j6vf5jg wrote
Reply to [D] Any open source model, or application to remove no speech parts of a video? by CeFurkan
There's a Carykh video about something similar. The GitHub repo is linked in the description: https://youtu.be/DQ8orIurGxw
Mefaso t1_j6vdzji wrote
Reply to comment by Ne_Nel in [D] Why is stable diffusion much smaller than predecessors? by dahdarknite
Exactly, the entire point of Latent Diffusion Models was to make them smaller and faster
doctorjuice t1_j6vdwpb wrote
Reply to [D] Any open source model, or application to remove no speech parts of a video? by CeFurkan
There are ways to robustly remove all silent spaces and breaths, but removing filler words is less robust. Would you still find that useful?
bushrod t1_j6vaal9 wrote
Reply to comment by mongoosefist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
What theory are you referring to when you say "theoretically"?
Ne_Nel t1_j6va0z6 wrote
Pixel vs Latent.
neanderthal_math t1_j6v9qoj wrote
OK, I’ll bite. : )
The vast majority of coding, data ingestion, model discovery, and training that we currently do will all go away.
The job will become much more interesting, because researchers will try to understand why certain architectures/training regimes are unable to perform certain tasks. Also, I think the architectures for some fundamental tasks like computer vision and audio are going to become modular. This whole practice of training models end-to-end is going to be verboten.
quichemiata t1_j6v6e42 wrote
That last step might as well be
> generate infinite copies until one matches
LetterRip t1_j6v57y5 wrote
Mostly the language model. Imagen uses T5-XXL (the 4.6-billion-parameter variant), and DALL-E 2 uses GPT-3 (presumably the 2.7B model, not the much larger variants used for ChatGPT). SD just uses CLIP without anything else. The more sophisticated the language model, the better the image generator can understand what you want; CLIP is close to a bag-of-words model.
txhwind t1_j6v41wn wrote
Reply to [D] Any open source model, or application to remove no speech parts of a video? by CeFurkan
Try speech recognition model with timeline alignment output, then cut parts not aligned to words or aligned to filler words.
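A rough sketch of that cut-list logic, assuming you already have word-level timestamps from some recognizer with alignment output. The `words` list, the filler set, and the padding value here are made-up illustrations, not any particular model's API:

```python
# Build keep-segments from word timestamps: drop filler words, pad each
# remaining word slightly, and merge overlapping spans. Everything
# outside the returned segments (silence, fillers) would be cut.
FILLERS = {"um", "uh", "erm"}  # hypothetical filler-word list
PAD = 0.15                     # seconds of audio kept around each word

def keep_segments(words, pad=PAD):
    """words: list of (text, start_sec, end_sec) tuples from an ASR model."""
    segments = []
    for text, start, end in words:
        if text.lower().strip(".,") in FILLERS:
            continue  # filler words get cut along with the silence
        start, end = max(0.0, start - pad), end + pad
        if segments and start <= segments[-1][1]:
            # Overlaps the previous span: merge instead of appending
            segments[-1] = (segments[-1][0], max(segments[-1][1], end))
        else:
            segments.append((start, end))
    return segments

words = [("Hello", 0.5, 0.9), ("um", 1.0, 1.2), ("world", 3.0, 3.4)]
print(keep_segments(words))  # two segments; the 1.2-3.0 s gap is dropped
```

The resulting segment list could then be fed to a cutter such as ffmpeg to produce the trimmed video.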
-xXpurplypunkXx- t1_j6v3fab wrote
Reply to comment by LetterRip in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Thanks for context. Maybe a little too much woo in my post.
For me, the ability to determine which images are effectively stored in full is either an interesting artifact or an interesting property of the model.
But regardless, it is very unintuitive to me given how diffusion models train and behave, due both to the mutation of training images and to the foreseeable lack of space to encode that much information into a single model state. Admittedly, I don't have much working experience with these sorts of models.
starstruckmon t1_j6v3etd wrote
Reply to comment by pm_me_your_pay_slips in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
From paper
>Our attack extracts images from Stable Diffusion most often when they have been duplicated at least k = 100 times
That's where the 100 comes from. The 10 is supposed to be the number of epochs, but I don't think it was trained for that many; more like 5 or so (you can look at the model card; it's not easy to give an exact number).
starstruckmon t1_j6v1qv0 wrote
Reply to comment by IDoCodingStuffs in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
They also manually annotated the top 1000 results, adding only 13 more images. The number you're replying to counted those.
Wiskkey t1_j6v0hqg wrote
Reply to comment by HateRedditCantQuitit in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
The fact that Stable Diffusion v1.x models memorize images is noted in the various v1.x model cards. For example, the following text is from the Stable Diffusion v1.5 model card:
>No additional measures were used to deduplicate the dataset. As a result, we observe some degree of memorization for images that are duplicated in the training data. The training data can be searched at https://rom1504.github.io/clip-retrieval/ to possibly assist in the detection of memorized images.
znihilist t1_j6uz705 wrote
Reply to comment by znihilist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
If you have a set of number pairs (1, 1), (2, 3.95), (3, 9.05), (4, 16.001), etc., these can be fitted with x^2. Yet x^2 does not contain the four pairs anywhere, even though it can recreate them to a certain degree of precision if you supply the x values.
Is f(x) = x^2 memorizing the inputs or just able to recreate them because they are in the possible outcome space?
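A minimal sketch of that analogy, fitting the four pairs from the comment with a least-squares quadratic (the exact coefficients are whatever the fit produces, not anything stored from the data):

```python
# Fit the four (x, y) pairs with a degree-2 polynomial. The three fitted
# coefficients reproduce all the points closely without the points being
# stored anywhere in the function itself.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.95, 9.05, 16.001])

coeffs = np.polyfit(x, y, deg=2)   # [a, b, c] for a*x**2 + b*x + c
recon = np.polyval(coeffs, x)      # reconstruct y from the fit

print(coeffs)                      # close to [1, 0, 0], i.e. f(x) ~ x**2
print(np.max(np.abs(recon - y)))   # small reconstruction error
```

Three numbers "recreate" four data points here, which is the sense in which a model can regenerate training examples it never stored verbatim.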