Recent comments in /f/MachineLearning

ProSmokerPlayer t1_j6yybe7 wrote

Bots are detected using a number of different methods, the least important of which is probably how they actually play their cards. Bots have consistent timing tells on actions, they click the mouse button on the exact same pixel every time, they fail captchas, etc.

Bots are not really the issue; the bigger issue is assisted play by a human. A program tells the player the optimal move and the player executes it. That's much harder to catch.

1

JigglyWiener t1_j6yxwhz wrote

The models don’t come with buttons that do anything. They are tools, capable only of what software developers permit as inputs to the models and what users request.

If we go down the road of regulating training and capacity to do x, you’ll have to file lawsuits against every artist on behalf of every copyright holder over the IP inside the artist’s head.

These cases are going to fall apart and copyright holders are going to go after platforms that don’t put reasonable filters in place.

1

GoOsTT t1_j6yw29l wrote

I’m one of the lucky ones and it has not really acted up for me just yet but one of my teammates is going through nightmares with it and it hurts me to see him suffer.

On the other hand it is a really nice piece of software which makes its flaws even harder to fathom honestly.

−3

seattleite849 OP t1_j6yt8w2 wrote

How do you want to trigger your function?

Also, here are some examples you can peek at: https://docs.cakework.com/examples

Under the hood, both Lambda and Cakework deploy Docker containers as microVMs running on bare-metal instances. A few key differences:

- Lambda is a building block, whereas Cakework is a purpose-built solution for running async tasks. With Lambda, you need to wire together other cloud resources to turn it into an application you can hit. This mix of code and infrastructure makes iterating quickly on your actual logic slow, in my experience, since you need to:

- Trigger the function, either by exposing it via API Gateway (if you'd like to invoke it with a REST call) or by hooking it up to an event (an S3 PutObject, a database update event).

- To hook up your function to other functions (for example, if you want to upload the final artifact to S3), you'll set up SQS queues. If you want to chain functions together, you'll set up Step Functions.

- To track failures, store input/output params and results, and easily view logs, you would set up a database and write some scripts to trace the request via Cloudwatch logs.

- With Lambda, you manage creating and building the container yourself, as well as updating the Lambda function code. There are tools out there such as sst or serverless.com which help streamline this.

- With Cakework, you write your Python functions as plain code, then run a single CLI command, `cakework deploy`, which deploys your functions and exposes a public endpoint you can hit (via REST calls, a Python SDK, or a JavaScript/TypeScript SDK). The nice thing is you can test invoking your function directly, as if it were code running on your local machine.

- No limit on the Docker image size and no limit on how long your job can run (vs the 10 GB image limit and 15-minute timeout for Lambda).

- You also specify CPU and memory parameters per request! So you don't spin up a bigger instance than you actually need and pay the extra cost, or under-provision CPU or memory and then 1) deal with failures and 2) re-deploy your Lambda with more compute.
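For contrast, the Lambda side of the comparison really is just a handler plus wiring. A minimal Python handler looks like the sketch below; the event shape is illustrative (it assumes an API Gateway proxy-style event, where the HTTP body arrives as a JSON string), since what you actually receive depends on the trigger you configure:

```python
import json

def handler(event, context):
    # Lambda invokes this with the trigger's event payload. For an
    # API Gateway proxy integration, the request body is a string.
    body = json.loads(event.get("body") or "{}")
    name = body.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"greeting": f"hello {name}"}),
    }

# Local smoke test, simulating an API Gateway-style event.
resp = handler({"body": json.dumps({"name": "lambda"})}, None)
```

Everything around this function (the trigger, the queues, the log tracing) is the infrastructure you'd otherwise have to wire up yourself.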

3

pm_me_your_pay_slips OP t1_j6ypajq wrote

>This isn't a good definition for "memorization" because it's indistinguishable from how we define outliers.

The paper has this to say about your point

> If highly memorized observations are always given a low probability when they are included in the training data, then it would be straightforward to dismiss them as outliers that the model recognizes as such. However, we find that this is not universally the case for highly memorized observations, and a sizable proportion of them are likely only when they are included in the training data.


> Figure 3a shows the number of highly memorized and “regular” observations for bins of the log probability under the VAE model for CelebA, as well as example observations from both groups for different bins. Moreover, Figure 3b shows the proportion of highly memorized observations in each of the bins of the log probability under the model. While the latter figure shows that observations with low probability are more likely to be memorized, the former shows that a considerable proportion of highly memorized observations are as likely as regular observations when they are included in the training set. Indeed, more than half the highly memorized observations fall within the central 90% of log probability values.

TLDR: if this method were giving high scores only to outliers, then those samples would have low likelihood even when included in the training data (because they are outliers). But the authors observed that a sizeable proportion of the samples with high memorization scores are as likely as regular (inlier) data.

1

DigThatData t1_j6ynesq wrote

> p(sample | dataset including sample) / p(sample | dataset excluding sample)

which, like I said, is basically identical to statistical leverage. If you haven't seen it before, you can compute LOOCV for a regression model directly from the hat matrix (which is another name for the matrix of leverage values). This isn't a good definition for "memorization" because it's indistinguishable from how we define outliers.
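The hat-matrix shortcut mentioned here is easy to check numerically: for ordinary least squares, the leave-one-out residual is the ordinary residual scaled by 1/(1 − h_ii), where h_ii is the i-th leverage (the i-th diagonal entry of the hat matrix). A quick sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.1, size=20)

# Hat matrix H = X (X'X)^{-1} X'; its diagonal holds the leverages.
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
resid = y - H @ y

# Closed-form LOOCV residuals: e_i / (1 - h_ii), no refitting needed.
loo_closed = resid / (1 - h)

# Verify against explicit leave-one-out refits.
loo_explicit = np.empty_like(y)
for i in range(len(y)):
    mask = np.arange(len(y)) != i
    beta_i, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
    loo_explicit[i] = y[i] - X[i] @ beta_i
```

High-leverage points are exactly the ones whose held-out prediction changes most when they're dropped, which is why the density-ratio score and leverage look so similar.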

> What's the definition of memorization here? how do we measure it?

I'd argue that what's at issue here is differentiating between memorization and learning. My concern regarding the density ratio here is that a model that had learned to generalize well in the neighborhood of the observation in question would behave the same way, so this definition of memorization doesn't differentiate between memorization and learning, which I think effectively renders it useless.

I don't love everything about the paper you linked in the OP, but I think they're on the right track by defining their "memorization" measure by probing the model's ability to regenerate presumably memorized data, especially since our main concern wrt memorization is in regards to the model reproducing memorized values.

1

pm_me_your_pay_slips OP t1_j6yl0wq wrote

The first paper proposes a way of quantifying memorization by looking at pairs of prefixes and postfixes and observing whether the postfixes were generated by the model when the prefixes were used as prompts.

The second paper has this to say about generalization:

> A natural question at this point is to ask why larger models memorize faster? Typically, memorization is associated with overfitting, which offers a potentially simple explanation. In order to disentangle memorization from overfitting, we examine memorization before overfitting occurs, where we define overfitting occurring as the first epoch when the perplexity of the language model on a validation set increases. Surprisingly, we see in Figure 4 that as we increase the number of parameters, memorization before overfitting generally increases, indicating that overfitting by itself cannot completely explain the properties of memorization dynamics as model scale increases.

In fact, this is the title of the paper: "Memorization without overfitting".


> Anyway, need to read this closer, but "lower posterior likelihood" to me seems fundamentally different from "memorized".

The memorization score is not "lower posterior likelihood", but the log density ratio for a sample: log( p(sample | dataset including sample) / p(sample | dataset excluding sample) ). Thus, a high memorization score is given to samples that go from very unlikely when excluded to as likely as the average sample when included in the training data, or from average likelihood when excluded to above-average likelihood when included.
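As a toy illustration of that ratio (a one-dimensional Gaussian stands in for the generative model here; this is nothing like the paper's VAE setup, just the arithmetic of the score), you can compute it by fitting the model twice, once with and once without the sample:

```python
import numpy as np

def memorization_score(x, data, fit, logpdf):
    # log p(x | trained with x) - log p(x | trained without x)
    params_with = fit(np.append(data, x))
    params_without = fit(data)
    return logpdf(x, params_with) - logpdf(x, params_without)

# Toy "model": a Gaussian fit by maximum likelihood.
fit = lambda d: (d.mean(), d.std())
logpdf = lambda x, p: -0.5 * np.log(2 * np.pi * p[1] ** 2) - (x - p[0]) ** 2 / (2 * p[1] ** 2)

rng = np.random.default_rng(1)
data = rng.normal(size=200)
inlier, outlier = 0.0, 6.0

s_in = memorization_score(inlier, data, fit, logpdf)
s_out = memorization_score(outlier, data, fit, logpdf)
```

In this crude toy the outlier's own likelihood moves the most when it's included, so it gets the higher score, which is exactly the outlier/memorization confound being debated above; the paper's claim is that for real models, many high-scoring samples are not outliers.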

1