Recent comments in /f/MachineLearning

Numerous-Carrot3910 t1_j5jhhkg wrote

Hi, I’m trying to build a model with a large number of categorical predictor variables that each have a large number of internal categories. Implementing OHE leads to a higher dimensional dataset than I want to work with. Does anyone have advice for dealing with this other than using subject matter expertise or iteration to perform feature selection? Thanks!
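One common way around the dimensionality blow-up of one-hot encoding is the hashing trick: hash each (column, category) pair into a fixed number of buckets, trading a controlled amount of collision noise for a bounded feature width. A minimal pure-Python sketch (the bucket count, column names, and helper names here are illustrative, not from any particular library):

```python
import hashlib

def hash_encode(value, column, n_buckets=32):
    """Map a (column, category) pair to one of n_buckets slots.
    Collisions are possible; n_buckets trades dimensionality for noise."""
    h = hashlib.md5(f"{column}={value}".encode()).hexdigest()
    return int(h, 16) % n_buckets

def encode_row(row, n_buckets=32):
    """Turn a dict of categorical features into a fixed-width count vector."""
    vec = [0] * n_buckets
    for col, val in row.items():
        vec[hash_encode(val, col, n_buckets)] += 1
    return vec

# Toy row with three categorical predictors
row = {"city": "Paris", "browser": "Firefox", "plan": "premium"}
vec = encode_row(row, n_buckets=16)  # width stays 16 no matter how many categories exist
```

Libraries like scikit-learn ship a production version of this (`FeatureHasher`); target/mean encoding and learned entity embeddings are other standard options when hashing collisions are a concern.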

1

hiptobecubic t1_j5jcexd wrote

This will be a permanent arms race of training the generative model and the detector to defeat one another, with each iteration making it harder and harder for humans to tell generated content apart unaided. Training these models is expensive, and only the big corporate players are currently able to do so.

1

Lamos21 t1_j5j74g9 wrote

Hi. I'm looking to create a custom dataset for pose estimation. Are there any free annotation tools suitable to annotate objects (meaning not human) so that I can create a custom dataset? Thanks

1

KvanteKat t1_j5j3ewk wrote

>I think the worst possible idea is allowing a single person or handful of people to have near-total control over the future of AI

I'm not sure regulation is the biggest threat to the field of AI being open. We already live in a world where a small handful of people (i.e. decision makers at Alphabet, OpenAI, etc.) have an outsized influence on the development of the field, because training large models is so capital-intensive that very few organizations can really compete with them (researchers at universities sure as hell can't). Neither compute (on the scale necessary to train a state-of-the-art model) nor well-curated large training datasets are cheap.

Since it is in the business interest of incumbents in this space to minimize competition (nobody likes to be disrupted), and since incumbents in this space already have an outsized influence, some degree of regulation to keep them in check may well be beneficial rather than detrimental to the development of AI and derived technologies and their integration into wider society (at least I believe so, although I'm open to other perspectives in this matter).

2

MrEloi t1_j5j2hz1 wrote

Most people are already uniquely identifiable via browser fingerprinting.

The Powers That Be can find you if they are interested enough.

The Unabomber had the 'right' idea with regard to security: he lived in a basic hut in the woods.

Ironically, he was still identified by his writing style ... his brother recognized Ted Kaczynski's prose in the published manifesto.

1

Acceptable-Cress-374 t1_j5ivlhe wrote

> You don’t want the thing talking to itself.

Heh, I was thinking about this the other day. Do you think there's a world where LLMs can become better by "self-play" à la AlphaZero? Would it converge to understandable language, or would it diverge into babble-speak?

1

Kacper-Lukawski t1_j5itq5h wrote

You need some ground-truth labels to evaluate the quality of the semantic search. It might be a relevancy score or just binary information that a particular item is relevant. But you don't need to label all your data points.

There is a great article describing the metrics: https://neptune.ai/blog/recommender-systems-metrics I use that as a reference quite often. And if you are interested in a more step-by-step introduction, here is an article I wrote: https://qdrant.tech/articles/qa-with-cohere-and-qdrant/ It's an end-to-end solution, but some basic quality measurement is also included.
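As a concrete illustration of how binary relevance labels get used, precision@k is about the simplest such metric: the fraction of the top-k retrieved items that your labels mark as relevant. A small sketch (document ids are made up):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are in the relevant set."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)

retrieved = ["doc3", "doc1", "doc7", "doc2"]  # ranked output of the search
relevant = {"doc1", "doc2"}                   # binary ground-truth labels
p = precision_at_k(retrieved, relevant, 2)    # only doc1 in top-2 is relevant -> 0.5
```

The article linked above covers rank-aware metrics (MRR, NDCG, etc.) that also reward putting relevant items higher in the list.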

3