Recent comments in /f/MachineLearning

vwings t1_j5gowwn wrote

For such retrieval systems, you would usually use Top-1, Top-5, or Top-k accuracy. Concretely, you have a list of product-type embeddings in your database (say 100, or 100,000, whatever). Then you take your product description, embed it with your ANN, and compare it against all the product-type embeddings. Then you check at which rank the correct product type ends up. From that you can calculate mean rank or top-k accuracy.
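A minimal sketch of that evaluation (toy data and function names are made up for illustration; in practice you'd use numpy/torch and a real encoder):

```python
# Toy top-k retrieval accuracy: rank each query's correct product type
# by cosine similarity against all product-type embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_of_correct(query_emb, type_embs, correct_idx):
    """1-based rank of the correct product type, sorted by similarity."""
    sims = [cosine(query_emb, e) for e in type_embs]
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return order.index(correct_idx) + 1

def top_k_accuracy(ranks, k):
    return sum(r <= k for r in ranks) / len(ranks)

# Tiny example: 3 product types, 2 queries (query_emb, correct_idx)
type_embs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
queries = [([0.9, 0.1], 0), ([0.6, 0.8], 1)]
ranks = [rank_of_correct(q, type_embs, c) for q, c in queries]
mean_rank = sum(ranks) / len(ranks)
```

From `ranks` you get both metrics: `top_k_accuracy(ranks, 1)` is Top-1 accuracy, and `mean_rank` is the mean rank.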

8

kkchangisin t1_j5gcgbe wrote

Nice work! Triton already looks good but have you tried optimizing with the Triton Model Analyzer?

https://github.com/triton-inference-server/model_analyzer

Across the various models I run on Triton, I've found the model formats and configurations it recommends can provide drastically better performance, whether the goal is throughput, latency, etc.
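For reference, a typical Model Analyzer run looks roughly like this (model name and paths are placeholders; check the repo's docs for the current flags):

```shell
# Sketch of a Model Analyzer profiling run (hypothetical model name/paths).
# It sweeps over configurations (instance counts, dynamic batching, ...)
# and reports the best ones for your throughput/latency goals.
model-analyzer profile \
    --model-repository /path/to/model_repository \
    --profile-models my_model \
    --output-model-repository-path /path/to/output
```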

Hopefully I get some time soon to try it out myself!

Again, nice work!

5

Appropriate_Ant_4629 t1_j5gb1kw wrote

Stable Diffusion already includes one by default:

In particular it uses

Of course with open source software and models, you'd be free to create a fork that doesn't include one, or uses a different one.
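For intuition, here's a toy invisible watermark in the least-significant-bit style. This is not the actual scheme Stable Diffusion ships (its repo bundles a dedicated invisible-watermark package); it's just a sketch of the idea that a mark can ride in bits the eye doesn't notice:

```python
# Toy LSB watermark on a grayscale "image" (a flat list of pixel values).
# Purely illustrative -- NOT Stable Diffusion's actual watermarker.

def embed(pixels, bits):
    """Overwrite the LSB of the first len(bits) pixels with the mark."""
    out = list(pixels)
    for i, b in enumerate(bits):
        out[i] = (out[i] & ~1) | b
    return out

def extract(pixels, n):
    """Read the mark back out of the first n pixels."""
    return [p & 1 for p in pixels[:n]]

mark = [1, 0, 1, 1, 0, 1, 0, 0]
img = [200, 113, 37, 254, 90, 61, 128, 77, 10, 11]
marked = embed(img, mark)
```

Each pixel changes by at most 1, so the marked image is visually identical, but the mark survives and can be read back by anyone who knows where to look — which is also why a fork can trivially strip a scheme like this.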

30

EmmyNoetherRing t1_j5g8ogy wrote

So, not quite. You’re describing funny cases that a trained classifier will misclassify.

We’re talking about what happens if you can intentionally inject bias into an AI’s training data (since it’s pulling that data from the web, if you know where it’s pulling from you can theoretically influence how it’s trained). That would potentially cause it to misclassify many cases (or have other more complex issues). It starts to be weirdly slightly feasible if you think about a future where a lot of online content is generated by AI— but we have at least two competing companies/governments supplying those AI.

Say we've got two AIs, A and B. A can use secret proprietary watermarks to recognize its own text online and avoid using that text in its training data (it wants to train on human data). And of course AI B can do the same thing to recognize its own text. But since each AI is using its own secret watermarks, there's no good way to prevent A from accidentally training on B's output, and vice versa.
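A hypothetical sketch of what such a keyed text watermark could look like, loosely inspired by the "green list" schemes proposed in recent LLM-watermarking research (everything here — the key, the token scheme — is made up for illustration, not anything A or B actually does):

```python
# Hypothetical keyed text watermark: each bigram is "green" or not,
# determined by a secret key. A generator that prefers green tokens
# leaves a statistical fingerprint only the key-holder can check for.
import hashlib

KEY = b"secret-key-of-AI-A"  # each AI keeps its own key

def is_green(prev_tok, tok, key):
    """Deterministic keyed coin flip for the bigram (prev_tok, tok)."""
    h = hashlib.sha256(key + prev_tok.encode() + b"|" + tok.encode())
    return h.digest()[0] % 2 == 0

def green_fraction(tokens, key):
    """Detector: fraction of green bigrams (~0.5 for unmarked text)."""
    hits = sum(is_green(a, b, key) for a, b in zip(tokens, tokens[1:]))
    return hits / max(len(tokens) - 1, 1)

def pick_green(prev_tok, key):
    """Toy 'generator' step: emit the first candidate token that is green."""
    i = 0
    while not is_green(prev_tok, f"w{i}", key):
        i += 1
    return f"w{i}"

# Generate 20 tokens of "watermarked" text.
tokens = ["start"]
for _ in range(20):
    tokens.append(pick_green(tokens[-1], KEY))
```

With the right key, the green fraction of this text is far above 0.5; without the key, the text looks statistically ordinary — which is exactly why B can't filter out A's output.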

The AIs are supposed to train only on human data, to be more like humans. But maybe there will come a point where they unavoidably start training on each other. And then a malicious actor might intentionally use their AI to flood a popular public text data source with content that, if the other AI ingests it, will cause it to behave the way the actor wants (biased against their targets, or biased positively for the actor).

Effectively, at some point we may have to deal with people secretly using AI to advertise to, radicalize, or scam other AI. Unless we get some fairly global regulations up in time. Should be interesting.

I wonder to what extent we’ll manage to get science fiction out about these things before we start seeing them in practice.

7

CarelessBar2844 t1_j5g54gw wrote

Rejected after an 8/6/6 score (up from 8/5/6). The 8-score reviewer had given a two-line review, and the meta-reviewer said their review was discounted (I expected this). The sad part is that at both NeurIPS (scores 6/5/6) and ICLR (scores 8/6/6), our paper was borderline/weak accept from all reviewers (and one 8 this time), but the meta-reviewer came up with new problems with the paper, without giving us a chance to respond.

Anyway, on to the grind for the next conferences. I hope I will get a first-author top-conference paper this year!

3

BitterAd9531 t1_j5g52os wrote

>Besides that, OP stated that he wants to use a llm for this, not me.

Actually I didn't. If you read my comment, you'd understand I would need the LLM to demonstrate the model that does the actual combining (which obviously wouldn't be an LLM). Seeing as there are currently no models that have watermarking, I'd have to write one myself to test the actual model that does the combining to circumvent the watermark. Either you didn't understand this, or you're once again taking single sentences out of context and making semi-valid points that don't have any relevance to the original discussion.

But honestly, I feel like this is completely beside the point. I've given you a high-level explanation of how these watermarks can be defeated, and you seem to be the only one here who doesn't understand how they work.

4