Recent comments in /f/MachineLearning

GoofAckYoorsElf t1_j6wbljw wrote

Both, actually. I can easily echo this question back to the people I call copyright warriors: do they care about what is right, or about what they like? Right would be for everyone to take an objective, unbiased look at the new technology and how to incorporate it into their work, instead of seeing only their crumbling business models and aggressively clinging to them.

2

DingusFamilyVacation t1_j6waih9 wrote

I'm excited to try this out. I'm doing most of the ML development on my team. I'll iterate on code development and retrain, multiple times over. Oftentimes, my team members will jump in and want to use a trained model to run some downstream analyses. If the library API has changed, or the model architecture has been tweaked, loading the state_dicts of earlier models becomes nearly impossible without checking out old commits. Even then, storing the results and associating them with commit numbers is super annoying.

Thanks for the tool!

1

SulszBachFramed t1_j6wa7ii wrote

You can make the same argument about lossy compression. Am I really infringing on copyright if I record an episode of House, re-encode it and redistribute it? It's not the 'original' episode, but a lossy copy of it. What if I compress it in a zip file and distribute that? In that case, I am only sharing something that can imperfectly recreate the original. The zip file itself does not resemble a video at all.

4

WikiSummarizerBot t1_j6w9h7w wrote

The Library of Babel

>"The Library of Babel" (Spanish: La biblioteca de Babel) is a short story by Argentine author and librarian Jorge Luis Borges (1899–1986), conceiving of a universe in the form of a vast library containing all possible 410-page books of a certain format and character set. The story was originally published in Spanish in Borges' 1941 collection of stories El jardín de senderos que se bifurcan (The Garden of Forking Paths). That entire book was, in turn, included within his much-reprinted Ficciones (1944).


2

Argamanthys t1_j6w9gal wrote

There is a short story called The Library of Babel about a near-infinite library that contains every possible permutation of a book with 1,312,000 characters. It is not hard to recreate that library in code. You can explore it if you want.
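The claim that the library is easy to recreate in code can be sketched: each possible book is just an integer written in base 25, one digit per character. Borges' format was 410 pages of 40 lines of 80 characters over a 25-symbol alphabet (22 letters plus space, comma, and period); the exact choice of letters below is an assumption.

```python
# Sketch of a computable Library of Babel: a book is an integer's
# base-25 expansion over a 25-symbol alphabet (22 letters + space,
# comma, period -- the specific letters here are illustrative).
ALPHABET = "abcdefghijklmnopqrstuv ,."
BOOK_LEN = 410 * 40 * 80  # 1,312,000 characters per book

def book_at(index, length=BOOK_LEN):
    """Return the book stored at a given index (its base-25 digits)."""
    chars = []
    for _ in range(length):
        index, digit = divmod(index, len(ALPHABET))
        chars.append(ALPHABET[digit])
    return "".join(chars)

def index_of(text):
    """Invert book_at: the index of the book beginning with this text."""
    index = 0
    for ch in reversed(text):
        index = index * len(ALPHABET) + ALPHABET.index(ch)
    return index
```

Every text over that alphabet already "exists" at some index; `index_of` just tells you where to look.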

Contained within that library is a copy of every book ever written, freely available to read.

Is that book piracy? It's right there if you know where to look.

That's pretty much what's going on here. They searched the latent space for an image and found it. But that's because the latent space, like the Library of Babel, is really big and contains not just that image but also near-infinite permutations of it.

3

sad_potato00 t1_j6w92uy wrote

So we had a similar problem, where building names were written in different ways (some abbreviations, full names, full name + building type). Something that worked for me was using Sentence-BERT and doing a cosine similarity. Deciding a cutoff value was easier than deciding how many clusters to use. Sadly, manual labeling and checking is still needed.
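A minimal sketch of that approach, with the embedding model left pluggable. The greedy grouping and the 0.8 cutoff are illustrative choices, not the commenter's exact code, and the model name in the usage note is an assumption.

```python
# Group near-duplicate names by thresholding pairwise cosine similarity
# of their embeddings. `embed` is any function mapping a string to a
# vector (e.g. a Sentence-BERT encoder).
import numpy as np

def cosine_sim(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def group_by_similarity(names, embed, cutoff=0.8):
    """Greedy grouping: each name joins the first group whose
    representative embedding is within the cosine cutoff."""
    groups = []  # list of (representative_embedding, member_names)
    for name in names:
        vec = embed(name)
        for rep, members in groups:
            if cosine_sim(rep, vec) >= cutoff:
                members.append(name)
                break
        else:
            groups.append((vec, [name]))
    return [members for _, members in groups]

# With sentence-transformers (model name is an assumption):
# from sentence_transformers import SentenceTransformer
# model = SentenceTransformer("all-MiniLM-L6-v2")
# groups = group_by_similarity(names, model.encode, cutoff=0.8)
```

Tuning the single cutoff on a handful of labeled pairs is usually easier than picking a cluster count up front, which matches the commenter's experience.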

1

jimmymvp t1_j6w4ezb wrote

Any application where you need exact likelihoods, flows are king. Such is the case, for example, if you're learning a sampling distribution for MCMC sampling, estimating normalizing constants (I believe there are a lot of these problems in physics), etc.
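The "exact likelihoods" property comes from the change-of-variables formula: for an invertible map x = f(z), log p(x) = log p_z(f⁻¹(x)) + log |det df⁻¹/dx|. A one-dimensional affine flow makes this concrete (the example is illustrative, not from the comment):

```python
# Exact log-density under a one-step affine flow x = scale * z + shift,
# with z ~ N(0, 1), via the change-of-variables formula.
import numpy as np

def affine_flow_logpdf(x, scale, shift):
    """Exact log p(x) for x = scale * z + shift, z standard normal."""
    z = (x - shift) / scale                        # invert the flow
    log_base = -0.5 * (z**2 + np.log(2 * np.pi))   # standard normal log-density
    log_det = -np.log(np.abs(scale))               # log |dz/dx| = -log |scale|
    return log_base + log_det
```

Because every layer of a flow is invertible with a tractable Jacobian determinant, this computation stays exact however many layers are stacked, unlike the lower bounds you get from VAEs.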

9

Ulfgardleo t1_j6vzpgz wrote

There is very little research. They are a nice theoretical idea, but the concept is very constraining, and numerical difficulties make experimenting hell.

I am not aware of any active research and I think they never were really big to begin with.

−4

Monoranos t1_j6vy1lg wrote

Am I the only one who finds it weird to make profits from what seems to be data stolen from the whole of humanity?

Edit: Well, I didn't think this was a controversial take. I feel like people just choose to ignore the whole aspect of consent and ethics around your data.

The GDPR further clarifies the conditions for consent in Article 7: https://gdpr.eu/gdpr-consent-requirements/

  1. Where processing is based on consent, the controller shall be able to demonstrate that the data subject has consented to processing of his or her personal data.

  2. If the data subject’s consent is given in the context of a written declaration which also concerns other matters, the request for consent shall be presented in a manner which is clearly distinguishable from the other matters, in an intelligible and easily accessible form, using clear and plain language. Any part of such a declaration which constitutes an infringement of this Regulation shall not be binding.

  3. The data subject shall have the right to withdraw his or her consent at any time. The withdrawal of consent shall not affect the lawfulness of processing based on consent before its withdrawal. Prior to giving consent, the data subject shall be informed thereof. It shall be as easy to withdraw as to give consent.

  4. When assessing whether consent is freely given, utmost account shall be taken of whether, inter alia, the performance of a contract, including the provision of a service, is conditional on consent to the processing of personal data that is not necessary for the performance of that contract.

−33