Recent comments in /f/MachineLearning
Much_Blacksmith_1857 OP t1_j6wbgus wrote
Reply to comment by Remco32 in [P] AI Poker/Machine Learning/Game-Theory by Much_Blacksmith_1857
True for 1v1 scenarios, but solving multi-way situations is far more complex.
ProSmokerPlayer t1_j6wb70n wrote
Poker is solved already, bud. Don't waste your time.
SuddenlyBANANAS t1_j6waypu wrote
Reply to comment by Argamanthys in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
If diffusion models were a perfect bijection between the latent space and the space of possible images, that would make sense, but they're obviously not. If you could repeat this procedure and find exact duplicates of images which were not in the training data, you'd have a point.
Remco32 t1_j6waskx wrote
This has been done to death already.
DingusFamilyVacation t1_j6waih9 wrote
Reply to [P] An open source tool for repeatable PyTorch experiments by embedding your code in each model checkpoint by latefordinnerstudios
I'm excited to try this out. I do most of the ML development on my team, iterating on code and retraining many times over. Oftentimes, my team members will jump in and want to use a trained model to run some downstream analyses. If the library API has changed, or the model architecture has been tweaked, loading the state_dicts of earlier models becomes nearly impossible without checking out old commits. Even then, storing the results and associating them with commit numbers is super annoying.
Thanks for the tool!
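The workflow described above can be sketched in a few lines. This is a minimal, framework-agnostic stand-in (the actual tool's API will differ; `save_checkpoint`/`load_checkpoint` are hypothetical names) that bundles the source files with the weights so an old checkpoint stays loadable without digging through commits:

```python
import hashlib
import pickle


def save_checkpoint(path, state_dict, source_files):
    """Bundle model weights with the exact source that produced them,
    so old checkpoints can be inspected without checking out old commits."""
    sources = {name: open(name).read() for name in source_files}
    digest = hashlib.sha256("".join(sources.values()).encode()).hexdigest()
    with open(path, "wb") as f:
        pickle.dump({"state_dict": state_dict,
                     "sources": sources,       # full source text, per file
                     "source_hash": digest},   # quick equality check
                    f)


def load_checkpoint(path):
    """Load the bundle back; ckpt['sources'] holds the snapshotted code."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

In practice you'd use `torch.save` instead of raw pickle, but the idea is the same: the code snapshot travels with the weights instead of living in git history.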
londons_explorer t1_j6wa910 wrote
It's a much smaller model, but IMO the results are much lower quality too.
However, the fact that you can run it on your own PC means you can tweak all the settings and take many shots at getting better results, which partially offsets that.
SulszBachFramed t1_j6wa7ii wrote
Reply to comment by znihilist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
You can make the same argument about lossy compression. Am I really infringing on copyright if I record an episode of House, re-encode it and redistribute it? It's not the 'original' episode, but a lossy copy of it. What if I compress it in a zip file and distribute that? In that case, I am only sharing something that can imperfectly recreate the original. The zip file itself does not resemble a video at all.
WikiSummarizerBot t1_j6w9h7w wrote
Reply to comment by Argamanthys in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
>"The Library of Babel" (Spanish: La biblioteca de Babel) is a short story by Argentine author and librarian Jorge Luis Borges (1899–1986), conceiving of a universe in the form of a vast library containing all possible 410-page books of a certain format and character set. The story was originally published in Spanish in Borges' 1941 collection of stories El jardín de senderos que se bifurcan (The Garden of Forking Paths). That entire book was, in turn, included within his much-reprinted Ficciones (1944).
Argamanthys t1_j6w9gal wrote
Reply to comment by HateRedditCantQuitit in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
There is a short story called The Library of Babel about a near-infinite library that contains every possible permutation of a book with 1,312,000 characters. It is not hard to recreate that library in code. You can explore it if you want.
Contained within that library is a copy of every book ever written, freely available to read.
Is that book piracy? It's right there if you know where to look.
That's pretty much what's going on here. They searched the latent space for an image and found it. But that's because the latent space, like the Library of Babel, is really big and contains not just that image but also near-infinite permutations of it.
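The "not hard to recreate that library in code" claim is literal: treat each book's index as a positional numeral over the story's alphabet. A toy sketch (the 29-symbol alphabet follows Borges; the function names are mine):

```python
import string

# 22 letters in Borges' story, but lowercase ASCII plus space, comma,
# and period gives the same flavor: 29 symbols total.
ALPHABET = string.ascii_lowercase + " ,."


def book_at(index, length=40):
    """Return the first `length` characters of book number `index`,
    reading the index as a base-29 numeral (least significant digit first)."""
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(chars)


def index_of(text):
    """Inverse mapping: the index of the book that starts with `text`."""
    index = 0
    for ch in reversed(text.lower()):
        index = index * len(ALPHABET) + ALPHABET.index(ch)
    return index
```

Every possible text of that alphabet "exists" at some index, which is exactly the point: finding a passage there tells you nothing about authorship, only about where you chose to look.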
sad_potato00 t1_j6w92uy wrote
Reply to [P] NER output label post processing by hasiemasie
So we had a similar problem, where building names were written in different ways (some abbreviations, full names, full name plus building type). Something that worked for me was using Sentence-BERT and doing a cosine similarity. Deciding on a cutoff value was easier than deciding how many clusters to use. Sadly, manual labeling and checking is still needed.
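A minimal sketch of the matching step described above. The embeddings are assumed to come from a Sentence-BERT `encode` call; here they're plain arrays, and `cutoff` is the hypothetical similarity threshold:

```python
import numpy as np


def match_by_cosine(query_vecs, canon_vecs, canon_names, cutoff=0.8):
    """Map each query embedding to the most similar canonical name,
    or None when the best cosine similarity falls below the cutoff."""
    # Normalize rows so plain dot products equal cosine similarities.
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    c = canon_vecs / np.linalg.norm(canon_vecs, axis=1, keepdims=True)
    sims = q @ c.T                      # (n_queries, n_canonical)
    best = sims.argmax(axis=1)
    return [canon_names[j] if sims[i, j] >= cutoff else None
            for i, j in enumerate(best)]
```

The `None` bucket is what still needs the manual checking: a single cutoff trades false merges against unmatched names, which is why it's easier to reason about than a cluster count.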
Ulfgardleo t1_j6w8snb wrote
Reply to comment by GoofAckYoorsElf in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
"copyright warriors"
do you care about what is right, or what you like?
alkibijad OP t1_j6w7lo3 wrote
Reply to comment by TheDeviousPanda in [D] Apple's ane-transformers - experiences? by alkibijad
That was not the answer I was hoping for, but very helpful :)
Do you have any code/repo to share? I'm only able to find the DistilBERT implementation in Apple's repo; I'd like to see some other examples.
E_Snap t1_j6w4skd wrote
Reply to comment by Monoranos in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
So are we just collectively pretending that the terms and conditions of websites don’t exist? When you put something up on somebody else’s server, 99% of the time it’s no longer yours to claim ownership of.
jimmymvp t1_j6w4ezb wrote
Reply to [D] Normalizing Flows in 2023? by wellfriedbeans
Flows are king in any application where you need exact likelihoods. That's the case, for example, if you're learning a sampling distribution for MCMC sampling, estimating normalizing constants (physics has a lot of these problems), etc.
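The "exact likelihoods" point is the change-of-variables formula: `log p_x(x) = log p_z(f⁻¹(x)) + log|det J_{f⁻¹}(x)|`, with no lower bound or estimator involved. A one-dimensional affine flow is a toy stand-in for a real flow, but shows the computation:

```python
import numpy as np


def log_prob(x, a, b):
    """Exact log-likelihood under the affine flow x = a*z + b,
    with a standard-normal base distribution on z."""
    z = (x - b) / a                                   # inverse transform
    log_base = -0.5 * (z**2 + np.log(2 * np.pi))      # log N(z; 0, 1)
    log_det = -np.log(np.abs(a))                      # log |dz/dx|
    return log_base + log_det
```

Real flows stack many such invertible layers (coupling layers, autoregressive transforms), but each one contributes exactly this pair of terms: an inverse pass and a log-Jacobian.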
emotionalfool123 t1_j6w29e1 wrote
Reply to comment by minhrongcon2000 in [D] What does a DL role look like in ten years? by PassingTumbleweed
It will solve that problem by solving nuclear fusion. "Everybody gets energy!", as Oprah would say.
Ulfgardleo t1_j6vzpgz wrote
Reply to [D] Normalizing Flows in 2023? by wellfriedbeans
There is very little research. They are a nice theoretical idea, but the concept is very constraining, and numerical difficulties make experimenting hell.
I am not aware of any active research, and I think they were never really big to begin with.
Monoranos t1_j6vy1lg wrote
Am I the only one who finds it weird to make profits from what seems to be data stolen from the whole of humanity?
Edit: Well, I didn't think this was a controversial take. I feel like people just choose to ignore the whole aspect of consent and ethics around your data.
The GDPR further clarifies the conditions for consent in Article 7: https://gdpr.eu/gdpr-consent-requirements/
> 1. Where processing is based on consent, the controller shall be able to demonstrate that the data subject has consented to processing of his or her personal data.
> 2. If the data subject’s consent is given in the context of a written declaration which also concerns other matters, the request for consent shall be presented in a manner which is clearly distinguishable from the other matters, in an intelligible and easily accessible form, using clear and plain language. Any part of such a declaration which constitutes an infringement of this Regulation shall not be binding.
> 3. The data subject shall have the right to withdraw his or her consent at any time. The withdrawal of consent shall not affect the lawfulness of processing based on consent before its withdrawal. Prior to giving consent, the data subject shall be informed thereof. It shall be as easy to withdraw as to give consent.
> 4. When assessing whether consent is freely given, utmost account shall be taken of whether, inter alia, the performance of a contract, including the provision of a service, is conditional on consent to the processing of personal data that is not necessary for the performance of that contract.
GoofAckYoorsElf t1_j6vwbgm wrote
Well, there goes a main argument against the copyright warriors... Damn...
TheDeviousPanda t1_j6vv0my wrote
Reply to [D] Apple's ane-transformers - experiences? by alkibijad
I hate to do this to you, but I have been in your position and I have answers to all your questions.
- Yes, yes
- A lot
- Yes, very
Parzival_007 t1_j6vtx0l wrote
I'm not surprised, but IMO this is good. I think they did the same once before? Hopefully the watermarking system gets very good too; I know there is active research going on in this area.
GoofAckYoorsElf t1_j6wbljw wrote
Reply to comment by Ulfgardleo in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
Both, actually. I can easily echo this question back at the people I call copyright warriors: do they care about what is right, or what they like? Right would be for everyone to take an objective and unbiased look at the new technology and how to incorporate it into their work, instead of aggressively clinging to their crumbling business models.