Recent comments in /f/MachineLearning

Loquzofaricoalaphar OP t1_j5kmiqg wrote

Yes, this is the sort of thing I am thinking about. Some percentage of people have very distinct styles, though with Ted it might have been the content that gave it away.

Yes I am familiar with amiunique and all the variables of the browser.

I wonder if this way of identifying people is ever used when Google or others get subpoenaed and hand over data. With those correlations it seems like it would be more accurate than an IP address at pinning down the individual, but I wonder whether it would be accepted by, or hold up in, a court of law.

1

trnka t1_j5kksex wrote

Hmm, you might also try feature selection. I'm not sure what you mean by not iterating, unless you mean recursive feature elimination? There are a lot of really fast correlation functions you can try for feature selection -- scikit-learn has some popular options. They run very quickly, and if you have lots of data you can probably do the feature selection part on a random subset of the training data.

Also, you could do things like dimensionality reduction learned from a subset of the training data, whether PCA or an NN-based approach.
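
Roughly what I mean, as a quick sketch with scikit-learn (the data, subset size, and k here are just placeholders, and f_classif is only one of several scoring functions you could swap in):

    import numpy as np
    from sklearn.feature_selection import SelectKBest, f_classif
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(20_000, 500))   # placeholder: your high-dimensional features
    y = rng.integers(0, 2, size=20_000)  # placeholder: your labels

    # Fit the fast univariate selector on a random subset, then apply it to everything
    subset = rng.choice(len(X), size=5_000, replace=False)
    selector = SelectKBest(f_classif, k=50).fit(X[subset], y[subset])
    X_selected = selector.transform(X)

    # Or: dimensionality reduction (here PCA) learned from the same subset
    pca = PCA(n_components=20).fit(X[subset])
    X_pca = pca.transform(X)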

1

jpercivalhackworth t1_j5kfcw6 wrote

You are reading a lot into OP's question that I'm not seeing. Yes, COVID is anomalous; no, it's not clear that, for the purposes of modeling demand for an unidentified product, it makes sense to disregard it, adjust for it, or perform some other correction. Depending on what demand is being modeled, COVID is still a factor.

You would do well to reread what they actually wrote and what I wrote. Nowhere did I say that they should predict the next pandemic (cool if they could, but not relevant here). Considering that COVID deaths appear to be climbing in parts of the world and we don't know where the OP is modeling for, there are a lot of unknowns to address before a meaningful answer can be arrived at.

1

GinoAcknowledges t1_j5kb95p wrote

A vast amount of technological knowledge (e.g. how to create poisons, manufacture bombs) has mass destructive potential if it can be scaled. The difficulty, just like with AI, is scaling, and this mostly self-regulates (with help from the government).

For example, you can build dangerous explosive devices in your garage. That knowledge is widely available (google "Anarchists Handbook"). If you try to build thousands of them (enough to cause mass destruction), the government will notice, and most likely you aren't going to have enough money and time to do it anyway.

The exact same thing will happen for "dangerous uses of AI". The only actors that have the hardware and capital to cause mass destruction with AI are the big tech firms developing AI. Try running inference at scale on even a 30B-parameter model right now: it's extremely difficult unless you have access to multiple server-grade GPUs, which are very expensive and hard to get hold of even if you have the money.

3

trnka t1_j5k77wb wrote

The difference from application-level evaluation is a bit vague in that text. I'll use a medical example that I'm more familiar with - predicting the diagnosis from text input.

Application-level evaluation: If the output is a diagnosis code and explanation, I might measure how often doctors accept the recommended diagnosis and read the explanation without checking more information from the patient. And I'd probably want a medical quality evaluation as well, to penalize any biasing influence of the model.

Non-expert evaluation: For the same task, I might compare 2-3 different models and possibly a random baseline. I'd ask people like myself, with some exposure to medicine, which explanation is best for a particular case, and compare against the random baseline.

That said, I'm not used to seeing non-experts used as evaluators, though it makes some sense in the early stages, when explanations are still poor.

I'm more used to seeing the distinction between real and artificial evaluation. I included that in my example above -- "real" would be when we're asking users to accomplish some task that relies on the explanation and we're measuring task success. "Artificial" is more just asking for an opinion about the explanation, but the evaluators won't be as critical as they would be in a task-based evaluation.

Hope this helps! I'm not an expert in explainability though I've done some work with it in production in healthcare tech.

1

trnka t1_j5k5ndr wrote

Yeah you can use a neural network instead of linear regression if you'd like. I usually start with linear regression though, especially regularized, because it usually generalizes well and I don't need to worry about overfitting so much.

Once you're confident that you have a working linear regression model, it can be good to develop the neural network and use the linear regression model as a point of comparison. I'd also suggest a "dumb" baseline, like predicting the average car price, as another point of comparison, just to be sure the model is actually learning something.
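
As a rough sketch of that comparison (the "car price" data here is just randomly generated placeholder data, so swap in your own features and prices):

    import numpy as np
    from sklearn.dummy import DummyRegressor
    from sklearn.linear_model import Ridge
    from sklearn.neural_network import MLPRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5_000, 20))              # placeholder car features
    y = 20_000 + (X @ rng.normal(size=20)) * 1_000  # placeholder prices

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    models = {
        "mean baseline": DummyRegressor(strategy="mean"),
        "ridge regression": Ridge(alpha=1.0),
        "neural network": MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name, mean_absolute_error(y_test, model.predict(X_test)))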

I'm not familiar with the Levenberg–Marquardt algorithm so I can't comment on that. From the Wikipedia page it sounds like a second-order method, and those can be used if the data set is small but they're uncommon for larger data. Typically with a neural network we'd use an optimizer like plain stochastic gradient descent or a variation like Adam.
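
To make the optimizer point concrete, here's a tiny sketch with scikit-learn's MLPRegressor solvers on toy data ('lbfgs' stands in for the small-dataset quasi-Newton case; it's not Levenberg–Marquardt itself):

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 10))
    y = X @ rng.normal(size=10)  # placeholder regression target

    # 'sgd' and 'adam' are the usual first-order choices; 'lbfgs' is a quasi-Newton
    # solver that scikit-learn suggests mainly for smaller datasets.
    for solver in ("sgd", "adam", "lbfgs"):
        model = MLPRegressor(hidden_layer_sizes=(32,), solver=solver,
                             max_iter=2000, random_state=0)
        model.fit(X, y)
        print(solver, round(model.score(X, y), 3))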

1

trnka t1_j5k4ldr wrote

It depends on the data and the problems you're having with high-dimensional data.

  • If the variables are phrases like "acute sinusitis, site not specified", you could use a one-hot encoding of the ngrams that appear in them.
  • If you have many rare values, you can just retain the top K values per feature.
  • If those don't work, the hashing trick is another great thing to try. It's just not easily interpretable (see the sketch after this list).
  • If there's any internal structure to the categories, like if they're hierarchical in some way, you can cut them off at a higher level in the hierarchy.
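
A quick sketch of the top-K and hashing-trick options (the category values here are made up; with string values, FeatureHasher hashes feature=value pairs):

    from collections import Counter
    from sklearn.feature_extraction import FeatureHasher

    rows = [{"diagnosis": "acute sinusitis"}, {"diagnosis": "chronic sinusitis"},
            {"diagnosis": "acute sinusitis"}, {"diagnosis": "rare condition xyz"}]

    # Top-K: keep only the most frequent values and map everything else to "other"
    K = 2
    top = {v for v, _ in Counter(r["diagnosis"] for r in rows).most_common(K)}
    truncated = [{"diagnosis": r["diagnosis"] if r["diagnosis"] in top else "other"}
                 for r in rows]

    # Hashing trick: fixed-size feature space, no vocabulary to store (not interpretable)
    hasher = FeatureHasher(n_features=1024, input_type="dict")
    X_hashed = hasher.transform(rows)
    print(X_hashed.shape)  # (4, 1024)
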
2

Historical-Coat5318 t1_j5k3k1o wrote

I think so, yes. In that world the dead internet theory would become true, and people would only become more dissociated from reality and society, especially once AI can generate video and audio. The political repercussions would be disastrous.

Also, I really love literature (and art in general), and a future where one cannot differentiate a human writer from an AI is, frankly, suicidally bleak to me. I can see a future where publishers use AI to read the market and write exactly the right books for maximum profit, completely cutting human authors out of the process. I am an aspiring novelist myself, and while the act of writing is intrinsically motivating, there is also a massive social component, in terms of having a career and having others read your work, that would be completely excised from creativity; so there is a personal component too, I suppose. Sharing in the creativity of other humans is the main thing that gives life meaning to me personally, and to many others, and to have that stripped from life is extremely depressing.

While this is all very speculative, I just can't see the rapid advances in AI leading anywhere except a lonelier, more isolated, and more chaotic world if it isn't seriously regulated. But all of this could be fixed if we could just identify AI text. Then nothing would change in terms of the place of human creativity in the world. It would be basically like chess: people still devote their lives to it and the community thrives, but only because we can discern AI chess playing from human chess playing. Imagine if there were no anti-cheating policies in chess tournaments; no one would ever play chess seriously again.

If we could just identify AI output, we would get all of the benefits of LLMs without any of the disastrous drawbacks. To me it is the most important issue right now, but people don't even consider it and are outright hostile to the idea; just see the downvotes on my original reply.

−1

Historical-Coat5318 t1_j5jw8o8 wrote

I just can't even begin to comprehend this view. Of course, democratizing something sounds good, but if AI has mass-destructive potential it is obviously safer if a handful of people have that power than if eight billion have it. Even if AI isn't mass-destructive, which it obviously isn't yet, it is already extremely socially disruptive, and if any given person has that power, our governing bodies have basically no hope of steering it in the right direction through regulation (which they would try to do, since it would serve their best interests as individuals). The common person would still have a say in these regulations through the vote.

−1

Historical-Coat5318 t1_j5juhb7 wrote

AI, in my view, should be controlled by very few institutions, and these institutions should be carefully managed by experts and very intelligent people, as is the case at companies like Google or OpenAI. If AI must exist, and it must, I would much rather it were in the hands of people like Sam Altman and Scott Aaronson than literally everyone with an internet connection.

Obviously terms like "open-source" and "democratised" sound good, but if you think about the repercussions you will surely realise that it would be totally disastrous for society. Looking back in history, we can see that nuclear weapons were actually quite judiciously managed when you consider all of the economic and political tensions of the time. Now imagine if anyone could have bought a nuke at Walmart; human extinction would have been assured. Open-source AI is basically democratized mass destruction, and if weapons of mass destruction must exist (including AI), then they should be in as few hands as possible.

Even ignoring existential risk, which is obviously still very speculative, LLMs should never be open-source, because that makes any regulation impossible. In that world evidence (video, images, and text), not to mention human creativity, would cease to exist, and the internet would basically be unnavigable as the chasm between people's political conception of the world and the world itself only widens. Only a few companies should be allowed to have this technology, and they should be heavily regulated. I admit I don't know how this could be implemented; I just know that it should be.

This is basically Nick Bostrom's Vulnerable World Hypothesis. Bostrom should be required reading for everyone involved in AI, in my opinion.

−2

billbobby21 t1_j5jnvmh wrote

If you spend money training a model using OpenAI's API, for example, do you actually own the model? Let's say you train it so that it gets really good at writing short stories about animals. Would you then actually own that model and have the rights to use and/or license it to others? Or would OpenAI also be able to improve their own models using the model that you created?

Basically, I'm wondering what stops the company you are using to create a model from just stealing your creation.

2

Forsaken-Indication t1_j5jniqa wrote

Either you're trolling or you need to reread what they said, bud. They know that 2020 and 2021 are anomalous due to COVID. And, as is the case across most markets, 2022 is a "new normal" year. Yes, obviously COVID continues to affect things, but no, it doesn't make sense to force a model to use anomalous data from the pandemic stage of COVID now that we're beyond that stage.

1

iLIVECSUI_741 t1_j5jmlzh wrote

Hi, I wonder how to decide *when* it is OK to submit your work to top conferences. For example, I have a model related to biological data mining; I know KDD is coming soon, but I do not like that conference and would like to wait for NeurIPS. However, I am not sure whether I will be scooped during this long wait. Thanks for your help!

1