Recent comments in /f/MachineLearning

Numerous-Carrot3910 t1_j5jhhkg wrote

Hi, I’m trying to build a model with a large number of categorical predictor variables that each have a large number of internal categories. Implementing OHE leads to a higher dimensional dataset than I want to work with. Does anyone have advice for dealing with this other than using subject matter expertise or iteration to perform feature selection? Thanks!
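One common way around the dimensionality blow-up of one-hot encoding is the hashing trick: hash each (column, category) pair into a fixed number of buckets, trading a controlled amount of collision noise for a bounded feature width. A minimal pure-Python sketch (the bucket count, column names, and helper names here are illustrative, not from any particular library):

```python
import hashlib

def hash_encode(value, column, n_buckets=32):
    """Map a (column, category) pair to one of n_buckets slots.
    Collisions are possible; n_buckets trades dimensionality for noise."""
    h = hashlib.md5(f"{column}={value}".encode()).hexdigest()
    return int(h, 16) % n_buckets

def encode_row(row, n_buckets=32):
    """Turn a dict of categorical features into a fixed-width count vector."""
    vec = [0] * n_buckets
    for col, val in row.items():
        vec[hash_encode(val, col, n_buckets)] += 1
    return vec

# Toy row with three categorical predictors
row = {"city": "Paris", "browser": "Firefox", "plan": "premium"}
vec = encode_row(row, n_buckets=16)  # width stays 16 no matter how many categories exist
```

Libraries like scikit-learn ship a production version of this (`FeatureHasher`); target/mean encoding and learned entity embeddings are other standard options when hashing collisions are a concern.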

1

hiptobecubic t1_j5jcexd wrote

This will be a permanent arms race of training the generative model and the detector to defeat one another, with each iteration making it harder and harder for humans to tell generated content apart unaided. Training these models is expensive, and only the big corporate players are currently able to do so.

1

Lamos21 t1_j5j74g9 wrote

Hi. I'm looking to create a custom dataset for pose estimation. Are there any free annotation tools suitable to annotate objects (meaning not human) so that I can create a custom dataset? Thanks

1

KvanteKat t1_j5j3ewk wrote

>I think the worst possible idea is allowing a single person or handful of people to have near-total control over the future of AI

I'm not sure regulation is the biggest threat to the field of AI being open. We already live in a world where a small handful of people (i.e. decision makers at Alphabet, OpenAI, etc.) have an outsized influence on the development of the field, because training large models is so capital-intensive that very few organizations can really compete with them (researchers at universities sure as hell can't). Neither compute (on the scale necessary to train a state-of-the-art model) nor well-curated large training datasets are cheap.

Since it is in the business interest of incumbents in this space to minimize competition (nobody likes to be disrupted), and since incumbents in this space already have an outsized influence, some degree of regulation to keep them in check may well be beneficial rather than detrimental to the development of AI and derived technologies and their integration into wider society (at least I believe so, although I'm open to other perspectives in this matter).

2

MrEloi t1_j5j2hz1 wrote

Most people are already uniquely identifiable via browser fingerprinting.

The Powers That Be can find you if they are interested enough.

The Unabomber had the 'right' idea with regard to security: he lived in a basic hut in the woods.

Ironically, he was still identified by his writing style ... his brother recognized Ted Kaczynski's prose in the published manifesto.

1

Acceptable-Cress-374 t1_j5ivlhe wrote

> You don’t want the thing talking to itself.

Heh, I was thinking about this the other day. Do you think there's a world where LLMs can become better by "self-play" à la AlphaZero? Would it converge to understandable language, or would it diverge into babble-speak?

1

Kacper-Lukawski t1_j5itq5h wrote

You need some ground-truth labels to evaluate the quality of the semantic search. It might be a relevancy score or just binary information that a particular item is relevant. But you don't need to label all your data points.

There is a great article describing the metrics: https://neptune.ai/blog/recommender-systems-metrics I use that as a reference quite often. And if you are interested in a more step-by-step introduction, here is an article I wrote: https://qdrant.tech/articles/qa-with-cohere-and-qdrant/ It's an end-to-end solution, but some basic quality measurement is also included.
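As a concrete illustration of how binary relevance labels get used, precision@k is about the simplest such metric: the fraction of the top-k retrieved items that your labels mark as relevant. A small sketch (document ids are made up):

```python
def precision_at_k(retrieved, relevant, k):
    """Fraction of the top-k retrieved items that are in the relevant set."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for item in top_k if item in relevant) / len(top_k)

retrieved = ["doc3", "doc1", "doc7", "doc2"]  # ranked output of the search
relevant = {"doc1", "doc2"}                   # binary ground-truth labels
p = precision_at_k(retrieved, relevant, 2)    # only doc1 in top-2 is relevant -> 0.5
```

The article linked above covers rank-aware metrics (MRR, NDCG, etc.) that also reward putting relevant items higher in the list.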

3