Recent comments in /f/MachineLearning

dineNshine t1_j5esx4f wrote

Why would you want to do this? We can fake text without GPT, and we also have the means to prove authenticity by digital signatures. By limiting the technology artificially, you will end up limiting the end user, while organizations with more resources will still be able to circumvent these limitations by training their own models.
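For concreteness, the authenticity-by-signature idea can be sketched in a few lines. This uses Python's stdlib `hmac` as a stand-in for a real public-key signature scheme (a shared secret rather than a private/public key pair), and the key and function names are illustrative:

```python
import hmac
import hashlib

# Illustrative shared secret; a real scheme would sign with a private key
# and let anyone verify with the matching public key.
SECRET_KEY = b"author-held-key"

def sign_text(text: str) -> str:
    """Produce a hex tag vouching for the text's origin."""
    return hmac.new(SECRET_KEY, text.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_text(text: str, tag: str) -> bool:
    """Check the tag; any edit to the text invalidates it."""
    return hmac.compare_digest(sign_text(text), tag)

message = "I wrote this paragraph myself."
tag = sign_text(message)
print(verify_text(message, tag))        # True
print(verify_text(message + "!", tag))  # False
```

The point being: proving a text *is* authentic is easy with signatures; watermarking tries to prove the opposite, which is much harder.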

To avoid limiting usability, desired limitations should be applied on top of the base model by the end user, not to the base model itself.

The sole consequence of attempts like the one OP suggests is further centralization of the technology, which is the worst imaginable outcome.

25

BitterAd9531 t1_j5erse4 wrote

Won't work in the long term. OpenAI might have been the first one to release, but we know other companies have better LLMs and others will catch up soon. When that happens, models without watermarks will be released and people who want output without a watermark will use that model.

And even if you somehow force all of them to implement a watermark, it would be trivial to combine outputs of different models to circumvent it. Not to mention that slight rewrites by a human would probably break most watermarks, the same way they break the current GPT detectors.

158

adt t1_j5erdiz wrote

Already in the works (Scott Aaronson is a scientist with OpenAI):

>we actually have a working prototype of the watermarking scheme, built by OpenAI engineer Hendrik Kirchner. It seems to work pretty well—empirically, a few hundred tokens seem to be enough to get a reasonable signal that yes, this text came from GPT.
>Now, this can all be defeated with enough effort. For example, if you used another AI to paraphrase GPT’s output—well okay, we’re not going to be able to detect that. On the other hand, if you just insert or delete a few words here and there, or rearrange the order of some sentences, the watermarking signal will still be there. Because it depends only on a sum over n-grams, it’s robust against those sorts of interventions.
https://scottaaronson.blog/?p=6823
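The "sum over n-grams" idea from the quote can be illustrated with a toy sketch. This is NOT OpenAI's actual scheme, just a minimal keyed n-gram scorer in plain Python: generation nudges each token choice toward n-grams that score high under a secret key, and detection averages those scores, which is why insertions and deletions only dilute the signal rather than erase it:

```python
import hashlib
import random

KEY = b"watermark-key"  # illustrative secret shared by generator and detector
N = 4                   # width of the n-grams the score is summed over

def ngram_score(ngram: tuple) -> float:
    """Keyed pseudorandom score in [0, 1) for one n-gram."""
    h = hashlib.sha256(KEY + " ".join(ngram).encode()).digest()
    return int.from_bytes(h[:8], "big") / 2**64

def detect(tokens: list) -> float:
    """Mean n-gram score: ~0.5 for ordinary text, well above it if watermarked."""
    grams = [tuple(tokens[i:i + N]) for i in range(len(tokens) - N + 1)]
    return sum(ngram_score(g) for g in grams) / len(grams)

# Toy "generation": at each step, among k random candidate words, pick the
# one whose resulting n-gram scores highest under the key (if watermarking).
vocab = [f"w{i}" for i in range(1000)]
rng = random.Random(0)

def generate(length: int, watermark: bool) -> list:
    tokens = rng.choices(vocab, k=N - 1)
    for _ in range(length):
        candidates = rng.choices(vocab, k=8)
        if watermark:
            best = max(candidates,
                       key=lambda w: ngram_score(tuple(tokens[-(N - 1):]) + (w,)))
        else:
            best = candidates[0]
        tokens.append(best)
    return tokens

plain = generate(300, watermark=False)
marked = generate(300, watermark=True)
print(round(detect(plain), 2), round(detect(marked), 2))  # marked scores well above 0.5
```

A paraphrasing AI would replace whole n-grams and wash the signal out, while scattered word edits leave most n-grams intact, matching the robustness trade-off Aaronson describes.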

177

drewkungfu t1_j5erafj wrote

Reply to comment by ardula99 in ChatGPT is not all you need [R] by EduCGM

I think there were major word-choice failures that perhaps auto-correct helped mask.

Here's my attempt at a fix:

“This work paper consists on an attempts to describe in a concise way the min models are and sectors ^of ^industry ^jobs that are affected by generative AI.

8

EmmyNoetherRing t1_j5er3xp wrote

I’d heard they had added one, actually, or were planning to. The concern they listed was that they didn’t want the model accidentally training on its own output as more of it shows up online.

I have to imagine this is a situation where security by obscurity is unavoidable though, so if they do have a watermark we might not hear much about it. Otherwise malicious users would just clean it back out again.

We may end up with a situation where only a few people internal to OpenAI know how the watermark works, and they occasionally answer questions for law enforcement with the proper paperwork.

51

AmputatorBot t1_j5eqn8m wrote

It looks like you shared an AMP link. These should load faster, but AMP is controversial because of concerns over privacy and the Open Web.

Maybe check out the canonical page instead: https://techcrunch.com/2022/12/10/openais-attempts-to-watermark-ai-text-hit-limits/


^(I'm a bot | )^(Why & About)^( | )^(Summon: u/AmputatorBot)

8

JackandFred t1_j5epyi0 wrote

It wouldn’t necessarily be easy. You say you want one detectable by some “key or other model”, but you can already design or use a model to detect whether text was generated by GPT, so if you’re using a model anyway you wouldn’t really need a watermark. And a more traditional watermark, like those used for digital pictures, could be removed very easily.

1

sabertoothedhedgehog t1_j5eneoh wrote

Love the topic of the paper. Absolutely HATE the figures showing taxonomies / example AI tools. These visualisations with boxes and arrows are really awful: the arrows are all over the place and meaningless, and the category boxes look the same as the application boxes.

It could have looked more like this: https://the-decoder.com/wp-content/uploads/2022/10/market_map_generative_AI-770x1027.png.webp

Or like this: https://www.sequoiacap.com/wp-content/uploads/sites/6/2022/09/genai-landscape-8.png

I don't even particularly like my examples, but there is no need for all these arrows, or for category boxes that look the same as the examples.

3

clemda2 t1_j5eliix wrote

You CAN batch train GCNs (or at least some of them are very amenable to it). Some of the most scalable GCNs rely on something like the GraphSAGE convolution, which doesn't require the whole graph Laplacian for updates; this approach is used by Wikipedia, Uber, and Pinterest to train highly scalable GCNs. Other convolutional operators like GAT can also be batch trained.

The PyTorch Geometric documentation is a good jumping-off point for reading about practical graph sub-sampling.
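In case it helps, here is a toy sketch (plain Python, not PyTorch Geometric's actual API) of the fixed-fanout neighbor sampling that makes GraphSAGE-style mini-batch training possible without touching the full graph:

```python
import random

def sample_neighbors(adj, batch_nodes, fanouts, rng):
    """GraphSAGE-style sampling: for each hop, keep at most `fanout`
    random neighbors per node, so a mini-batch never needs the whole graph."""
    layers = [list(batch_nodes)]
    frontier = set(batch_nodes)
    for fanout in fanouts:
        nxt = set()
        for node in frontier:
            neigh = adj.get(node, [])
            nxt.update(rng.sample(neigh, min(fanout, len(neigh))))
        layers.append(sorted(nxt))
        frontier = nxt
    return layers

# Toy graph: node i connects to its two neighbors on a ring of 100 nodes.
adj = {i: [(i - 1) % 100, (i + 1) % 100] for i in range(100)}
rng = random.Random(0)
layers = sample_neighbors(adj, batch_nodes=[0, 50], fanouts=[2, 2], rng=rng)
print(layers[0])  # the mini-batch itself: [0, 50]
```

Each sampled layer is what the corresponding convolution would aggregate over; libraries like PyTorch Geometric wrap this pattern in ready-made loaders.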

2

damc4 t1_j5e9bog wrote

OK, so that proves that someone has the skill. But when someone doesn't have a master's/PhD, that doesn't prove they lack the skill. In other words, if someone has no master's/PhD but has published a research paper, they have also proved to have that skill, so it should be possible for someone with a bachelor's (or no degree) to get a job in ML. Is that correct?

1