Recent comments in /f/MachineLearning
ProSmokerPlayer t1_j6yybe7 wrote
Reply to comment by Acceptable-Cress-374 in [P] AI Poker/Machine Learning/Game-Theory by Much_Blacksmith_1857
Bots are detected using a number of different methods; how they actually play their cards is probably the last of them. Bots have consistent timing tells on actions, they always click the mouse button on the exact same pixel, they fail captchas, etc.
Bots are not really the issue; the bigger issue is assisted play by a human: a programme tells the player the optimal move and the player executes it. That's much harder to catch.
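A toy illustration of the timing-tell signal mentioned above — flagging accounts whose action delays are suspiciously uniform. The threshold and numbers are invented for illustration, not from any real detection system:

```python
import statistics

def timing_flag(action_times_ms, min_stdev_ms=150):
    """Flag an account whose action timings are near-constant.
    Threshold is illustrative only."""
    return statistics.stdev(action_times_ms) < min_stdev_ms

bot   = [1000, 1002, 999, 1001, 1000]   # clicks after ~1s every time
human = [800, 2500, 1200, 4000, 600]    # thinks longer on some spots

assert timing_flag(bot) and not timing_flag(human)
```

A real system would combine many such signals (mouse paths, captchas, hand histories), but low timing variance alone is already a strong tell.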
JigglyWiener t1_j6yxwhz wrote
Reply to comment by Ronny_Jotten in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
The models don’t come with buttons that do anything. They are tools capable only of what the software developers permit to enter the models and what users request.
If we go down the road of regulating training and capacity to do x, you’ll have to file lawsuits against every artist on behalf of every copyright holder over the IP inside the artist’s head.
These cases are going to fall apart and copyright holders are going to go after platforms that don’t put reasonable filters in place.
bumbo-pa t1_j6yw62b wrote
Reply to comment by Geneocrat in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
You mean the app? I did get a meaningful update in the flatpak not so long ago before I switched to browser
Edit: oh yeah seems you're right, and just around the time I quit
GoOsTT t1_j6yw29l wrote
Reply to comment by lawless_c in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
I’m one of the lucky ones and it has not really acted up for me just yet but one of my teammates is going through nightmares with it and it hurts me to see him suffer.
On the other hand it is a really nice piece of software which makes its flaws even harder to fathom honestly.
Geneocrat t1_j6yvl0d wrote
Reply to comment by bumbo-pa in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
It’s no longer supported. The installer is just an old copy
lawless_c t1_j6yvccp wrote
Reply to comment by GoOsTT in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
I downloaded teams to do job interviews.
Had to disable "run on startup" because I'd constantly be greeted with "Teams has crashed" every time I started my PC.
alpha-meta OP t1_j6yud7x wrote
Reply to comment by Jean-Porte in [D] Why do LLMs like InstructGPT and LLM use RL to instead of supervised learning to learn from the user-ranked examples? by alpha-meta
Ah yes, I see what you mean now, thanks!
fuscarili OP t1_j6yteod wrote
Reply to comment by ok531441 in [D] I'm at a crossroads: Bayesian methods VS Reinforcement Learning, which to choose? by fuscarili
Yes that's right. But which one would you say will come in more handy for working as an ML engineer or data scientist?
What I have heard is that RL usage in the industry is almost non-existent.
seattleite849 OP t1_j6yt8w2 wrote
Reply to comment by BasilLimade in [p] I built an open source platform to deploy computationally intensive Python functions as serverless jobs, with no timeouts by seattleite849
How are you wanting to trigger your function?
Also, here are some examples you can peek at: https://docs.cakework.com/examples
Under the hood, both Lambda and cakework are deploying Docker containers as microVMs running on bare metal instances. A few key differences:
- Lambda is a building block, whereas Cakework is a custom point solution for running async tasks. With Lambda, you'll want to wire together other cloud resources to turn it into an application you can hit. This mix of code and infrastructure makes iterating quickly on your actual logic slow, in my experience, since you need to:
- Trigger the function, either by exposing it via API Gateway (if you'd like to invoke it with a REST call) or by hooking it up to an event (S3 PutObject, a database update event).
- To hook up your function to other functions (for example, if you want to upload the final artifact to S3), you'll set up SQS queues. If you want to chain functions together, you'll set up Step Functions
- To track failures, store input/output params and results, and easily view logs, you would set up a database and write some scripts to trace the request via Cloudwatch logs.
- With Lambda, you manage creating and building the container yourself, as well as updating the Lambda function code. There are tools out there such as sst or serverless.com which help streamline this.
- With Cakework, you write your Python functions as plain code, then run a single CLI command, `cakework deploy`, which deploys your functions and exposes a public endpoint you can hit (via REST calls, a Python SDK, or a JavaScript/TypeScript SDK). The nice thing is you can test invoking your function directly, as if it were code running on your local machine.
- No limit on the Docker image size and no limit on how long your job can run (vs the 10 GB image limit and 15-minute timeout for Lambda).
- You also specify CPU and memory parameters per request! That way you don't spin up a bigger instance than you actually need and pay the extra cost, or under-provision CPU or memory and then 1) deal with failures and 2) re-deploy your Lambda with more compute.
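The "wire functions together with queues" overhead on the Lambda side can be sketched with an in-process toy, where a `queue.Queue` stands in for SQS and the task names are hypothetical:

```python
import queue
import threading

# Toy stand-in for SQS-style wiring: task A pushes its result onto a
# queue that a second, queue-triggered task consumes.
results = []
q = queue.Queue()

def resize_image(name):
    """Hypothetical first task: produce an artifact."""
    q.put(f"{name}-resized")

def upload_artifact():
    """Hypothetical second task: consume the queue and 'upload'."""
    item = q.get()
    results.append(f"uploaded:{item}")

resize_image("photo.png")
worker = threading.Thread(target=upload_artifact)
worker.start()
worker.join()

assert results == ["uploaded:photo.png-resized"]
```

In real Lambda, each of these pieces (the queue, the trigger, the retry policy) is a separate cloud resource you configure and deploy, which is the iteration drag described above.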
ok531441 t1_j6yqymo wrote
Reply to [D] I'm at a crossroads: Bayesian methods VS Reinforcement Learning, which to choose? by fuscarili
Do the one you're more interested in; you won't be more or less employable because of an optional university course.
Whencowsgetsick t1_j6ypwdl wrote
Reply to comment by TREDOTCOM in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Are you referring to something like https://support.apple.com/guide/shortcuts/request-your-first-api-apd58d46713f/ios which uses Sirikit?
badabummbadabing t1_j6ypbxs wrote
Reply to comment by Imonfire1 in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
You mean you want to see more than the same random four people at once? I don't think there is a use case for that.
pm_me_your_pay_slips OP t1_j6ypajq wrote
Reply to comment by DigThatData in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
>This isn't a good definition for "memorization" because it's indistinguishable from how we define outliers.
The paper has this to say about your point
> If highly memorized observations are always given a low probability when they are included in the training data, then it would be straightforward to dismiss them as outliers that the model recognizes as such. However, we find that this is not universally the case for highly memorized observations, and a sizable proportion of them are likely only when they are included in the training data.
> Figure 3a shows the number of highly memorized and “regular” observations for bins of the log probability under the VAE model for CelebA, as well as example observations from both groups for different bins. Moreover, Figure 3b shows the proportion of highly memorized observations in each of the bins of the log probability under the model. While the latter figure shows that observations with low probability are more likely to be memorized, the former shows that a considerable proportion of highly memorized observations are as likely as regular observations when they are included in the training set. Indeed, more than half the highly memorized observations fall within the central 90% of log probability values.
TL;DR: if this method gave high scores only to outliers, then those samples would have low likelihood even when included in the training data (because they are outliers). But the authors observed that a sizeable proportion of the samples with high memorization scores are as likely as regular (inlier) data.
maxToTheJ t1_j6yo1eq wrote
Reply to comment by znihilist in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
> They can't recreate other songs that has never been created or thought of.
AFAIK having a non-copyright-violating use doesn't excuse a copyright-violating use.
nicholsz t1_j6yniui wrote
Reply to comment by puppet_pals in [D] ImageNet normalization vs [-1, 1] normalization by netw0rkf10w
With data augmentation techniques (especially contrast or luminance randomization), normalizing would end up being a no-op, right?
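At least for affine jitter this is easy to check: a fixed normalization and a random contrast/brightness transform compose into a single random affine map, so the fixed step gets absorbed. A toy numpy sketch (the stats and jitter ranges are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(100)                 # toy "pixel" values in [0, 1]

# Fixed normalization (ImageNet-style: subtract mean, divide by std).
mu, sigma = 0.449, 0.226            # illustrative single-channel stats
normalize = lambda v: (v - mu) / sigma

# Random contrast/brightness jitter: another affine map v -> a*v + b.
a = rng.uniform(0.8, 1.2)
b = rng.uniform(-0.1, 0.1)
jitter = lambda v: a * v + b

# The composition is still one affine map a'*v + b', so the fixed
# normalization just reparameterizes the (random) augmentation.
composed = jitter(normalize(x))
a_prime = a / sigma
b_prime = b - a * mu / sigma
assert np.allclose(composed, a_prime * x + b_prime)
```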
DigThatData t1_j6ynesq wrote
Reply to comment by pm_me_your_pay_slips in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
> p(sample| dataset including sample)/p(sample| dataset excluding sample) )
which, like I said, is basically identical to statistical leverage. If you haven't seen it before, you can compute LOOCV for a regression model directly from the hat matrix (which is another name for the matrix of leverage values). This isn't a good definition for "memorization" because it's indistinguishable from how we define outliers.
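The leverage/LOOCV connection can be verified numerically in a few lines of numpy — LOOCV residuals for OLS fall out of the hat-matrix diagonal with no refitting:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=20)

# Hat matrix H = X (X'X)^-1 X'; its diagonal holds the leverage values.
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)
resid = y - H @ y                 # ordinary residuals
loocv = resid / (1 - h)           # leave-one-out residuals, closed form

# Check against brute-force leave-one-out refits.
loocv_explicit = np.empty(len(y))
for i in range(len(y)):
    mask = np.arange(len(y)) != i
    beta = np.linalg.lstsq(X[mask], y[mask], rcond=None)[0]
    loocv_explicit[i] = y[i] - X[i] @ beta

assert np.allclose(loocv, loocv_explicit)
```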
> What's the definition of memorization here? how do we measure it?
I'd argue that what's at issue here is differentiating between memorization and learning. My concern regarding the density ratio here is that a model that had learned to generalize well in the neighborhood of the observation in question would behave the same way, so this definition of memorization doesn't differentiate between memorization and learning, which I think effectively renders it useless.
I don't love everything about the paper you linked in the OP, but I think they're on the right track by defining their "memorization" measure by probing the model's ability to regenerate presumably memorized data, especially since our main concern wrt memorization is in regards to the model reproducing memorized values.
Nhabls t1_j6ymx0w wrote
Reply to comment by LetterRip in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
I seriously doubt they have been able to do what you just described.
Not to mention a rented dual-GPU setup, even the one you described, would run you into the dozens of dollars per day, not $2.
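Back-of-the-envelope, with an assumed (purely illustrative) rental rate:

```python
gpu_hourly_usd = 1.5     # assumed per-GPU rental rate, illustrative only
gpus = 2
hours_per_day = 24

daily = gpu_hourly_usd * gpus * hours_per_day
assert daily == 72.0     # dozens of dollars per day, not ~$2
```

Even at aggressive spot pricing, two GPUs running around the clock land well above a couple of dollars a day.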
Nhabls t1_j6ymqwr wrote
Reply to comment by AristosTotalis in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Well OpenAI also, in that scenario, got a massive on demand compute infrastructure at cost, that's a good deal both ways.
time_flask t1_j6ymeqz wrote
Reply to comment by Terkala in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
Technically yes: when something breaks and you recall that meeting where we said we'd pick it up but just didn't.
Nhabls t1_j6ymdmr wrote
Reply to comment by bokonator in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
The $10 billion deal reportedly gives Microsoft 75% of OpenAI's profits until a certain threshold; that's about more than just any given model.
2blazen t1_j6yluho wrote
Reply to comment by TrevorIRL in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
>that’s some pretty amazing margins
That's just the (estimated) hardware uptime cost; you haven't mentioned the wages or the R&D investment.
[deleted] t1_j6ylkd9 wrote
Reply to comment by wintermute93 in [N] Microsoft integrates GPT 3.5 into Teams by bikeskata
[deleted]
pm_me_your_pay_slips OP t1_j6yl0wq wrote
Reply to comment by DigThatData in [R] Extracting Training Data from Diffusion Models by pm_me_your_pay_slips
The first paper proposes a way of quantifying memorization by looking at pairs of prefixes and postfixes and observing whether the postfixes were generated by the model when the prefixes were used as prompts.
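A sketch of that prefix/postfix extraction test, with a dictionary-backed toy "model" standing in for an LLM (all names here are illustrative):

```python
def extraction_rate(model, pairs):
    """Fraction of training (prefix, postfix) pairs the model reproduces
    verbatim when prompted with the prefix. `model` is any callable
    mapping a prompt string to a continuation string."""
    hits = sum(model(prefix).startswith(postfix) for prefix, postfix in pairs)
    return hits / len(pairs)

# Toy "model" that has memorized one training example verbatim.
memorized = {"The quick brown": " fox jumps over the lazy dog"}
toy_model = lambda prefix: memorized.get(prefix, " ...")

pairs = [
    ("The quick brown", " fox jumps over the lazy dog"),
    ("Call me", " Ishmael"),
]
assert extraction_rate(toy_model, pairs) == 0.5
```

The real evaluation samples from the model rather than checking a single greedy continuation, but the measured quantity is the same.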
The second paper has this to say about generalization:
> A natural question at this point is to ask why larger models memorize faster? Typically, memorization is associated with overfitting, which offers a potentially simple explanation. In order to disentangle memorization from overfitting, we examine memorization before overfitting occurs, where we define overfitting occurring as the first epoch when the perplexity of the language model on a validation set increases. Surprisingly, we see in Figure 4 that as we increase the number of parameters, memorization before overfitting generally increases, indicating that overfitting by itself cannot completely explain the properties of memorization dynamics as model scale increases.
In fact, this is the title of the paper: "Memorization without overfitting".
> Anyway, need to read this closer, but "lower posterior likelihood" to me seems fundamentally different from "memorized".
The memorization score is not "lower posterior likelihood", but the log density ratio for a sample: log( p(sample| dataset including sample)/p(sample| dataset excluding sample) ) . Thus, a high memorization score is given to samples that go from very unlikely when not included to as likely as the average sample when included in the training data, or from as likely as the average training sample when not included in the training data to above-average likelihood when included.
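A toy version of that score, with a 1-D Gaussian fit standing in for the generative model (purely illustrative; the paper uses deep generative models such as VAEs):

```python
import numpy as np

def gauss_logpdf(x, mu, sigma):
    return -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)

def memorization_score(data, i):
    """log p(x_i | fit including x_i) - log p(x_i | fit excluding x_i)."""
    excl = np.delete(data, i)
    incl_lp = gauss_logpdf(data[i], data.mean(), data.std())
    excl_lp = gauss_logpdf(data[i], excl.mean(), excl.std())
    return incl_lp - excl_lp

rng = np.random.default_rng(1)
data = np.append(rng.normal(0.0, 1.0, 200), 6.0)  # last point is an outlier
scores = np.array([memorization_score(data, i) for i in range(len(data))])

# The outlier's density depends heavily on whether it was "trained on",
# so it gets the highest score; inliers barely move the fit.
assert scores.argmax() == 200
```

In this toy, only the outlier scores highly; the paper's point is that for real models, many *inliers* also score highly, which is why the score can't be dismissed as an outlier detector.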
2blazen t1_j6ykrcq wrote
Reply to comment by arhetorical in [N] OpenAI starts selling subscriptions to its ChatGPT bot by bikeskata
I've been using the GPT-3 API for around 0.4c per request with zero downtime. With my current usage this sums up to around 10c a day, or about $3 per month. I don't see how $20 is reasonable.
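The arithmetic, with the request volume made explicit (the requests/day figure is inferred from the ~10c/day total, so treat it as an assumption):

```python
cost_per_request = 0.004   # USD, i.e. ~0.4 cents per request
requests_per_day = 25      # assumed volume implied by ~$0.10/day
days_per_month = 30

daily = cost_per_request * requests_per_day
monthly = daily * days_per_month
assert round(monthly, 2) == 3.00   # ~$3/month vs the $20 subscription
```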
ProSmokerPlayer t1_j6yyip6 wrote
Reply to comment by Much_Blacksmith_1857 in [P] AI Poker/Machine Learning/Game-Theory by Much_Blacksmith_1857
No, this has been solved already for all stack sizes, look up MonkerSolver.