Recent comments in /f/MachineLearning

mirrorcoloured t1_j5wwhn5 wrote

While I agree with your concluding sentiments against centralization and in favor of using signatures, I don't believe your initial premise holds.

Consider steganography in digital images, where extra information can be embedded without any noticeable loss in signal quality.

One could argue that any bits not used for the primary signal are 'limiting usability', but that seems pedantic to me. Given the massive amount of computing power already required and the dense information output, it seems perfectly reasonable that watermarking could be implemented with no noticeable impact.
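As a toy illustration (my own sketch, not anything from a real watermarking system): least-significant-bit embedding with NumPy, where the image, payload, and sizes are all made up.

```python
import numpy as np

def embed_lsb(image: np.ndarray, message_bits: np.ndarray) -> np.ndarray:
    """Hide a bit string in the least-significant bits of an 8-bit image.

    Each pixel changes by at most 1, which is visually imperceptible.
    """
    flat = image.flatten().copy()
    if message_bits.size > flat.size:
        raise ValueError("message too long for this image")
    flat[:message_bits.size] = (flat[:message_bits.size] & 0xFE) | message_bits
    return flat.reshape(image.shape)

def extract_lsb(image: np.ndarray, n_bits: int) -> np.ndarray:
    """Read the first n_bits hidden bits back out."""
    return image.flatten()[:n_bits] & 1

# toy usage: hide 64 random bits in a random 64x64 grayscale image
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
bits = rng.integers(0, 2, size=64, dtype=np.uint8)
stego = embed_lsb(img, bits)
assert np.array_equal(extract_lsb(stego, 64), bits)
assert np.abs(stego.astype(int) - img.astype(int)).max() <= 1
```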

1

NeoKov t1_j5wmjkr wrote

As a novice, I'm not understanding why the test loss continues to increase, both in general and in Fig. 8.2b specifically, if anyone can explain. Does the model continue to update and (over)fit throughout testing? I thought it was static after training. And is the test batch always the same size as the training batch? They don't occur simultaneously, right? So the test plot is only generated after the training plot.
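To make my mental model concrete, here's a rough sketch of what I assumed the loop looks like (PyTorch-style, entirely my own assumption about the book's setup): the weights only change in the training loop, and the test loss is computed afterwards on a frozen model each epoch.

```python
import torch
from torch import nn

def run(model: nn.Module, train_loader, test_loader, epochs: int = 10):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        # training: the only place weights are updated
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()

        # testing: no gradients, no updates; the model is a frozen
        # snapshot of whatever this epoch's training produced
        model.eval()
        total, n = 0.0, 0
        with torch.no_grad():
            for x, y in test_loader:
                total += loss_fn(model(x), y).item() * x.size(0)
                n += x.size(0)
        print(epoch, total / n)
```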

1

shingekichan1996 OP t1_j5wlavz wrote

I saw an implementation of that paper here: https://github.com/raminnakhli/Decoupled-Contrastive-Learning

I also saw that the same paper was rejected from NeurIPS'21 because its impact was judged to be similar to that of other methods like Barlow Twins, SimSiam, BYOL, etc.

However, at first glance the re-implemented results do look great at small batch sizes.
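From my first read, the decoupling is essentially dropping the positive pair from the InfoNCE denominator. A rough sketch of how I understand it (the temperature and the two-view setup are my assumptions; the linked repo may differ in details):

```python
import torch
import torch.nn.functional as F

def dcl_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Decoupled contrastive loss for two batches of views, each (N, D).

    Like NT-Xent, except the positive similarity is removed from the
    denominator, which is the part that reportedly helps at small batch sizes.
    """
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.size(0)
    z = torch.cat([z1, z2], dim=0)                     # (2N, D)
    sim = z @ z.t() / temperature                      # (2N, 2N)
    pos_idx = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    pos = sim[torch.arange(2 * n), pos_idx]            # similarity to the positive view
    # exclude self-similarity and the positive pair from the denominator
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    mask[torch.arange(2 * n), pos_idx] = True
    neg = sim.masked_fill(mask, float("-inf"))
    return (torch.logsumexp(neg, dim=1) - pos).mean()
```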

3

koolaidman123 t1_j5wbk37 wrote

That's not the same thing...

Gradient accumulation computes the loss on each micro-batch separately, so it doesn't work with in-batch negatives: you need to compare the inputs from batch 1 against the inputs from batch 2, hence offloading and caching the predictions and then computing the loss over them as one batch.

That's why gradient accumulation doesn't work to simulate large batch sizes for contrastive learning, if you're familiar with it.
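Toy illustration of the difference (my own sketch, not anyone's actual training code): with accumulation each micro-batch only sees its own negatives, while caching the embeddings and computing a single loss over all of them gives you the full set of in-batch negatives.

```python
import torch
import torch.nn.functional as F

def info_nce(q: torch.Tensor, k: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE with in-batch negatives: row i of q should match row i of k."""
    logits = F.normalize(q, dim=1) @ F.normalize(k, dim=1).t() / temperature
    return F.cross_entropy(logits, torch.arange(q.size(0)))

q, k = torch.randn(8, 32), torch.randn(8, 32)

# (a) gradient accumulation: two micro-batches of 4, each sample sees only 3 negatives
loss_accum = (info_nce(q[:4], k[:4]) + info_nce(q[4:], k[4:])) / 2

# (b) cache the embeddings, then one loss over the full batch: 7 negatives per sample
loss_full = info_nce(q, k)

# different objectives, so (a) does not simulate the large batch
print(loss_accum.item(), loss_full.item())
```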

8

BigDreamx OP t1_j5w8u6o wrote

Reply to comment by Red-Portal in [D] Publication Resume by BigDreamx

"A reviewer may be able to deduce the authors’ identities by using external resources, such as technical reports published on the web. The availability of information on the web that may allow reviewers to infer the authors’ identities does not constitute a breach of the double-blind submission policy. Reviewers are explicitly asked not to seek this information."

Doesn't this indicate that we can post whatever we want on the web? It's just the submission that has to be anonymous?

0

Red-Portal t1_j5w8thc wrote

Reply to comment by BigDreamx in [D] Publication Resume by BigDreamx

> Authors are allowed to post versions of their work on preprint servers such as arXiv. They are also allowed to give talks to restricted audiences on the work(s) submitted to ICML during the review. If you have posted or plan to post a non-anonymized version of your paper online before the ICML decisions are made, the submitted version must not refer to the non-anonymized version.

> ICML strongly discourages advertising the preprint on social media or in the press while under submission to ICML. Under no circumstances should your work be explicitly identified as ICML submission at any time during the review period, i.e., from the time you submit the paper to the communication of the accept/reject decisions.

Mate, it's stated in the call for papers.

5

RealKillering t1_j5vu1t5 wrote

I just started working with Google Colab. I'm still learning and just used CIFAR-10 for the first time. I switched to Colab Pro and also switched the GPU class to Premium.

The thing is, training seems to take just as long as it did with the free GPU. What am I doing wrong?
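For reference, this is the quick check I'm running to see which GPU the session actually assigned (a minimal PyTorch snippet; whether this is the right way to verify the Premium class is my own assumption):

```python
import torch

if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print("Memory (GB):", torch.cuda.get_device_properties(0).total_memory / 1e9)
else:
    print("No GPU assigned to this runtime")
```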

2

squidward2022 t1_j5vmb95 wrote

(https://arxiv.org/pdf/2106.04156.pdf) This was a cool paper from NeurIPS 2021 that aimed to theoretically explain the success of CL by relating it to spectral clustering. They present a loss with a very similar form to InfoNCE, which they use for their theory. One of the upsides they found was that it works well with small batch sizes.
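From memory (so treat the exact form as my assumption rather than the paper's), their spectral contrastive loss looks roughly like this:

```python
import torch

def spectral_contrastive_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Roughly -2 E[f(x)^T f(x+)] + E[(f(x)^T f(x'))^2] over a batch.

    z1, z2 are embeddings of two augmented views, shape (N, D).
    """
    n = z1.size(0)
    pos = -2.0 * (z1 * z2).sum(dim=1).mean()
    sim = z1 @ z2.t()                                  # (N, N) cross-view similarities
    off_diag = sim - torch.diag(torch.diagonal(sim))   # drop the positive pairs
    neg = (off_diag ** 2).sum() / (n * (n - 1))
    return pos + neg
```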

(https://arxiv.org/abs/2110.06848) I skimmed this work a while back; one of their main claims is that their approach works with small batch sizes.

2

dineNshine t1_j5vm4r5 wrote

By definition. If you force the model to embed a watermark, you can only generate watermarked content. And since OP proposed embedding it in the model parameters, it would also likely degrade performance.

Limiting the end user this way is bad, for the reasons I stated above. The right approach is to train a model that fits the data well and then condition it with the input prompt. Putting arbitrary limits on the model itself to prevent misuse is misguided at best, and it ensures that only people in power will be able to use the technology to its fullest. It would also give people a false sense of security, since they might assume that content lacking a watermark is "genuine".

If AI advances to the point where generated content is indistinguishable from human-made content and fake recordings become a problem, the only sensible thing we can do is use signatures. That's a simple method that works perfectly well: for any piece of virtual content, you can quickly check whether it came from an entity known to you by verifying it against their public key.
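For a concrete picture of what I mean, a minimal sketch with Ed25519 via the Python `cryptography` package (key distribution and formats left out; this is just my illustration, not a proposal for a specific scheme):

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# the content creator signs their content with a private key...
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

content = b"some piece of virtual content (image bytes, audio, text, ...)"
signature = private_key.sign(content)

# ...and anyone holding the published public key can verify it
try:
    public_key.verify(signature, content)
    print("content matches the known signer")
except InvalidSignature:
    print("not signed by this key (or modified)")
```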

1