Recent comments in /f/MachineLearning
Numerous-Carrot3910 t1_j5jhhkg wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hi, I’m trying to build a model with a large number of categorical predictor variables that each have a large number of internal categories. Implementing OHE leads to a higher dimensional dataset than I want to work with. Does anyone have advice for dealing with this other than using subject matter expertise or iteration to perform feature selection? Thanks!
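One common alternative to OHE for high-cardinality categoricals is the hashing trick: map each `feature=value` pair into a fixed number of buckets, so dimensionality is capped regardless of how many categories exist. A minimal sketch in plain Python (the bucket count of 32 is an arbitrary choice; libraries like scikit-learn offer a `FeatureHasher` that does this properly):

```python
import hashlib

def hash_encode(value, feature_name, n_buckets=32):
    """Map one categorical value to a fixed-size bucket index.

    Collisions are possible by design; n_buckets trades
    dimensionality against collision rate.
    """
    key = f"{feature_name}={value}".encode("utf-8")
    digest = hashlib.md5(key).hexdigest()
    return int(digest, 16) % n_buckets

def encode_row(row, n_buckets=32):
    """Turn a dict of categorical features into a dense 0/1 vector."""
    vec = [0] * n_buckets
    for name, value in row.items():
        vec[hash_encode(value, name, n_buckets)] = 1
    return vec

row = {"city": "Oslo", "browser": "Firefox", "plan": "pro"}
vec = encode_row(row)  # fixed length 32, no matter the cardinality
```

Target encoding (replacing each category with a statistic of the target) is another common option, but it needs care around leakage.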
[deleted] t1_j5je7q6 wrote
Reply to [D] Simple Questions Thread by AutoModerator
[removed]
hiptobecubic t1_j5jchks wrote
Reply to comment by londons_explorer in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
This would be trivially defeated by making tiny changes to the text. It's also wildly impractical and won't scale to widespread usage.
hiptobecubic t1_j5jcexd wrote
Reply to comment by JackandFred in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
This will be a permanent arms race of training the generative model and the detector to defeat one another, with each iteration making it harder and harder for humans to do so unaided. Training these models is expensive and only the big corporate players are currently able to do so.
hiptobecubic t1_j5jbgh1 wrote
Reply to comment by mirrorcoloured in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
One thing I can imagine is that the AI refuses to output text that doesn't trigger the watermark detector.
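For what a detectable text watermark could even look like: one published idea (a toy sketch of the "green list" scheme, not any vendor's actual implementation) is to hash each token's predecessor to pseudo-randomly split the vocabulary in half, bias generation toward the "green" half, and have the detector count how many tokens landed there:

```python
import hashlib

def is_green(prev_token: str, token: str) -> bool:
    """Pseudo-randomly assign `token` to the green half of the
    vocabulary, seeded by the previous token."""
    h = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return h[0] % 2 == 0

def green_fraction(tokens):
    """Detector side: fraction of tokens in the green list.
    Roughly 0.5 for normal text; noticeably higher for text that
    was generated with a bias toward green tokens."""
    if len(tokens) < 2:
        return 0.0
    hits = sum(is_green(p, t) for p, t in zip(tokens, tokens[1:]))
    return hits / (len(tokens) - 1)
```

A real scheme applies the bias to the logits at sampling time and runs a z-test on the green count. Note this also shows the weakness raised above: paraphrasing changes the token sequence and dilutes the signal.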
hiptobecubic t1_j5jb8db wrote
Reply to comment by perspectiveiskey in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
This is the current state of things. With sufficient training data, we can identify a lot about who wrote some text from the statistical properties of the text itself, and voice already has a long history of being used for identification at this point.
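Classic stylometry works from exactly these statistical properties, e.g. function-word frequencies compared across candidate authors. A toy sketch (the word list and cosine comparison are illustrative choices, not a production attribution system):

```python
import math
from collections import Counter

# Common function words: content-independent, hard to consciously vary.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "i"]

def style_vector(text):
    """Relative frequency of common function words in `text`."""
    words = text.lower().split()
    counts = Counter(words)
    total = max(len(words), 1)
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine(u, v):
    """Cosine similarity between two style vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

With enough text per author, comparing an unknown sample's vector against known authors' vectors is the basic attribution move; real systems add character n-grams, punctuation habits, and sentence-length statistics.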
hiptobecubic t1_j5jaw2x wrote
Reply to comment by artsybashev in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
It's the same meaning.
hiptobecubic t1_j5jari2 wrote
Reply to comment by artsybashev in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
They aren't doing that. We're doing that.
Username912773 t1_j5j7vci wrote
Reply to comment by conchoso in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
How does that watermark work without altering image quality significantly across many prompts and color schemes?
Lamos21 t1_j5j74g9 wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hi. I'm looking to create a custom dataset for pose estimation. Are there any free annotation tools suitable to annotate objects (meaning not human) so that I can create a custom dataset? Thanks
KvanteKat t1_j5j3ewk wrote
Reply to comment by TonyTalksBackPodcast in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
>I think the worst possible idea is allowing a single person or handful of people to have near-total control over the future of AI
I'm not sure regulation is the biggest threat to the field of AI being open. We already live in a world where a small handful of people (i.e. decision makers at Alphabet, OpenAI, etc.) have an outsized influence on the development of the field, because training large models is so capital-intensive that very few organizations can really compete with them (researchers at universities sure as hell can't). Neither compute (on the scale necessary to train a state-of-the-art model) nor well-curated large training datasets are cheap.
Since it is in the business interest of incumbents in this space to minimize competition (nobody likes to be disrupted), and since incumbents in this space already have an outsized influence, some degree of regulation to keep them in check may well be beneficial rather than detrimental to the development of AI and derived technologies and their integration into wider society (at least I believe so, although I'm open to other perspectives in this matter).
MrEloi t1_j5j2hz1 wrote
Reply to [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
Most people are already uniquely identifiable via browser fingerprinting.
The Powers That Be can find you if they are interested enough.
The Unabomber had the 'right' idea with regard to security: he lived in a basic hut in the woods.
Ironically, he was identified by his writing style ... his brother recognized the prose style in a letter Ted Kaczynski had sent.
aero_oliver t1_j5iz5d7 wrote
Reply to [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
I'm curious, but there's absolutely no way a watermark couldn't be worked around. I think this is going to turn into a game of cat and mouse, with each side having to constantly develop.
Cherubin0 t1_j5iwn3e wrote
Reply to [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
I don't see a value in that. Humans can just lie as much and write things that are not true or pretend to be someone else. The only difference is that you can write much more with such tools.
[deleted] t1_j5iwdml wrote
Reply to [D] Simple Questions Thread by AutoModerator
[deleted]
silverstone1903 OP t1_j5iw6bx wrote
Reply to comment by vwings in Evaluation for similarity search [P] by silverstone1903
Makes sense, I'll try. Thanks!
silverstone1903 OP t1_j5iw1wl wrote
Reply to comment by Kacper-Lukawski in Evaluation for similarity search [P] by silverstone1903
Thanks for the links. I'll check them out asap.
andreichiffa t1_j5ivwgc wrote
Reply to comment by TheTerrasque in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
or OPT175.
However 7B is more than large enough to do a lot of shady stuff that 175B models can do. Even 1.5B ones are already starting to do a good job with a minimally competent user.
Acceptable-Cress-374 t1_j5ivlhe wrote
Reply to comment by EmmyNoetherRing in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
> You don’t want the thing talking to itself.
Heh, I was thinking about this the other day. Do you think there's a world where LLMs can become better by "self-play" a la AlphaZero? Would it converge to understandable language or would it diverge into babble-speak?
silverstone1903 OP t1_j5ivdmx wrote
Reply to comment by Original_Rip_8182 in Evaluation for similarity search [P] by silverstone1903
Thank you for your answer. What is the difference from using annoy? I'm experimenting with annoy, faiss, and hnsw. Speed isn't the issue; the problem is that I can't measure the quality of the retrievals 🤷🏻♂️
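One way to compare annoy/faiss/hnsw without human labels is to treat exact brute-force neighbors as ground truth and measure each index's recall@k against them. A minimal NumPy sketch (the random arrays stand in for your embeddings; plug each library's result IDs into `recall_at_k`):

```python
import numpy as np

def exact_topk(queries, corpus, k):
    """Brute-force nearest neighbors by squared Euclidean distance."""
    d = ((queries[:, None, :] - corpus[None, :, :]) ** 2).sum(-1)
    return np.argsort(d, axis=1)[:, :k]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbors each query's approximate
    result recovered, averaged over queries."""
    hits = [len(set(a) & set(e)) / len(e)
            for a, e in zip(approx_ids, exact_ids)]
    return float(np.mean(hits))

rng = np.random.default_rng(0)
corpus = rng.normal(size=(1000, 16))
queries = rng.normal(size=(10, 16))
truth = exact_topk(queries, corpus, k=10)
# Replace the first argument with the IDs returned by annoy/faiss/hnsw;
# comparing the exact results against themselves gives recall 1.0.
score = recall_at_k(truth, truth)
```

This only measures how faithful the approximate index is to exact search; whether the embeddings themselves rank relevant items highly still needs labels, as the other replies note.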
Kacper-Lukawski t1_j5itq5h wrote
Reply to Evaluation for similarity search [P] by silverstone1903
You need some ground truth labels to evaluate the quality of the semantic search. It might be a relevancy score or just binary information that a particular item is relevant. But you don't need to label all your data points.
There is a great article describing the metrics: https://neptune.ai/blog/recommender-systems-metrics I use that as a reference quite often. And if you are interested in a more step-by-step introduction, here is an article I wrote: https://qdrant.tech/articles/qa-with-cohere-and-qdrant/ It's an end-to-end solution, but some basic quality measurement is also included.
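Once you have even binary relevance labels for a sample of queries, the standard ranking metrics are a few lines each. A sketch of two of them (the document IDs are made up for illustration):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are labeled relevant."""
    topk = retrieved[:k]
    return sum(1 for item in topk if item in relevant) / k

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant item; 0.0 if none was retrieved.
    Averaged over queries this becomes MRR."""
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d3", "d1", "d7", "d2"]   # ranked output of the search
relevant = {"d1", "d2"}                # ground truth labels
p3 = precision_at_k(retrieved, relevant, k=3)   # 1 of top 3 is relevant
```

Graded relevance scores unlock NDCG as well; the linked articles cover that.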
TheTerrasque t1_j5irq12 wrote
Reply to comment by andreichiffa in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
On a side note, 7b isn't large these days.
GPT-3 and BLOOMZ are around 175B parameters.
[deleted] t1_j5ip3dq wrote
Reply to comment by franciscrot in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
[deleted]
1980sMUD t1_j5inqqg wrote
Reply to [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
If you’re worried about this, then first ask a model to generate your comments for you.
artsybashev t1_j5jj7fi wrote
Reply to comment by hiptobecubic in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
no