Recent comments in /f/MachineLearning
sothatsit t1_j5hhb31 wrote
Reply to [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
I’ve actually done some work on this, and the real issues here are:
- You’d need a lot of text from other sources with people’s real names.
- You’d need the user to have written a lot of Reddit comments or posts.
- The user’s writing style would need to match between Reddit and your other source.
If you’re interested though, I made the following library for my Master’s thesis, which can be used for this: https://github.com/TycheLibrary/Tyche
However, it would need more work to get close to identifying thousands, never mind millions, of users.
Loquzofaricoalaphar OP t1_j5hf96p wrote
Reply to comment by neanderthal_math in [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
Thanks, that’s a very interesting resource.
neanderthal_math t1_j5henyu wrote
Reply to [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
People have been working on the Author Identification problem for about 20 years.
https://dergipark.org.tr/en/download/article-file/2482752
https://en.wikipedia.org/wiki/Author_profiling?wprov=sfti1
There is no way to unmask all of Reddit though. There are too many people, and many text samples are way too short. Some Redditors only speak in emoji and GIFs.
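For anyone curious, the classical stylometry approach behind author identification can be sketched in a few lines: build a character n-gram profile per known author and attribute an unknown text to the nearest profile by cosine similarity. The tiny corpus and author names below are made up purely for illustration; real attribution needs far more text and features.

```python
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Character n-gram counts -- a classic stylometric feature."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    num = sum(a[g] * b[g] for g in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def attribute(unknown, profiles):
    """Return the known author whose profile is closest to the unknown text."""
    scores = {name: cosine(char_ngrams(unknown), prof) for name, prof in profiles.items()}
    return max(scores, key=scores.get)

# Toy reference corpus: in practice you'd need many labelled samples per author.
profiles = {
    "alice": char_ngrams("honestly i reckon the gradient just vanishes, mate"),
    "bob": char_ngrams("The results, as demonstrated previously, are conclusive."),
}
print(attribute("honestly mate i reckon it just overfits", profiles))  # alice
```

This nearest-profile scheme is essentially the decades-old baseline; modern systems swap in learned embeddings but keep the same "compare to reference profiles" shape.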
Forsaken-Indication t1_j5hc9hz wrote
Reply to comment by jpercivalhackworth in [D] How to deal with COVID-19-era data for time series forecasting? by PM_ME_YOUR_GIGI
OP said it did, and that after Jan 2022 they see a return to some sort of baseline.
Trying to predict the next global pandemic as part of a product forecasting model seems pretty out-of-scope.
PryomancerMTGA t1_j5hbwrg wrote
Reply to [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
Trying to match all businesses with fuzzy matching is hard enough when you have misspellings. To think you could identify redditors with any degree of certainty is optimistic at best.
Z1ndabad t1_j5hbncl wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hey guys, I'm new to ML and can't seem to wrap my head around the concept. I want to make a used car price prediction model using a large data set, and most of the tutorials I watch just use a linear regression library. However, can you use neural networks instead, trained with something like Levenberg-Marquardt?
Loquzofaricoalaphar OP t1_j5h6s4z wrote
Reply to comment by PredictorX1 in [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
That is interesting to think about. I’m biased to think text patterns have lots of variables and are fairly unique. Perhaps it’s more of a model problem than a compute problem to analyze it at scale without getting mush.
PredictorX1 t1_j5h5pb5 wrote
Reply to comment by Loquzofaricoalaphar in [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
The biggest technical challenges I see:
- Having enough reference samples from known people
- The difference between how people write on Reddit and how they write elsewhere (professional articles, e-mail, etc.: presumably used as reference)
- If too many Reddit users are being considered, it may all dissolve into mush (estimated probabilities would all be low)
Loquzofaricoalaphar OP t1_j5h5kq4 wrote
Reply to comment by [deleted] in [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
Perhaps it could return the top 10 likelihoods for the author of the account. Some patterns of writing and grammatical errors might be pretty unique, and the more posts an account has, the more unique, right?
arkkienkeli t1_j5h5fa5 wrote
Reply to ChatGPT is not all you need [R] by EduCGM
There was a paper with a similar message 2 years ago: https://arxiv.org/abs/2103.05247
Loquzofaricoalaphar OP t1_j5h59id wrote
Reply to comment by PredictorX1 in [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
So like if you fed it samples from the 200 people you were looking for and then fed it Reddit? Perhaps all of Reddit would be tricky, because some users might not have public text, and it would be difficult to label all the text on Facebook or LinkedIn, etc.
PredictorX1 t1_j5h3ymz wrote
Reply to [D] With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data? by Loquzofaricoalaphar
>With more compute could it be easy to quickly un Mask all the people on Reddit by using text correlations to non masked publicly available text data?
With labeled samples of text, I think it would be pretty easy to come up with a likelihood model giving a reasonable educated guess of the identity of some Reddit members, and I don't think it would take much computing power.
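A minimal sketch of such a likelihood model, assuming a toy set-up with made-up authors and add-one-smoothed unigram log-probabilities (one simple choice among many), which also naturally produces the ranked "top N" list discussed elsewhere in the thread:

```python
from collections import Counter
import math

def train(samples):
    """samples: {author: reference text}. Build smoothed unigram log-probs per author."""
    vocab = {w for text in samples.values() for w in text.lower().split()}
    models = {}
    for author, text in samples.items():
        counts = Counter(text.lower().split())
        total = sum(counts.values()) + len(vocab)  # add-one smoothing over the vocab
        models[author] = {w: math.log((counts[w] + 1) / total) for w in vocab}
    return models

def rank(models, text):
    """Rank candidate authors by log-likelihood of the unknown text (most likely first)."""
    words = text.lower().split()
    scores = {a: sum(m.get(w, min(m.values())) for w in words)  # floor unseen words
              for a, m in models.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Made-up reference samples; real use would need far more text per person.
models = train({
    "alice": "i love cats cats are great",
    "bob": "dogs are the best dogs rule",
})
print(rank(models, "cats are great"))  # ['alice', 'bob']
```

With only a couple of hundred candidates this is cheap to run, which matches the point above: the bottleneck is labeled reference text, not compute.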
jpercivalhackworth t1_j5h3cvx wrote
What makes you think that COVID is not going to impact the demand for your product?
NovaBom8 t1_j5h30af wrote
Very cool, great work!!
In the context of running .pt (or any other device-agnostic filetypes), I’m guessing dynamic batching is the reason for Triton’s superior throughput?
Appropriate_Ant_4629 t1_j5gy6e3 wrote
Reply to comment by ThisIsNotAnAlias in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
> Last I checked image watermarks were super weak against rotations
Obviously depends on the technique. The old-school popular technique of "slap a signature in the painting" like Dürer's stylized A/D logo is very robust to rotations, but not robust to cropping from the bottom in that case.
> seems to still be the case - but the better methods could cope with cropping way better than these.
It's near impossible to have a watermark technology that's robust to all transformations, at least if you reveal what watermark algorithm you used.
One easy attack that works on many techniques would be to just re-encode the content, writing your own watermark over the original using the same watermarking algorithm.
kannkeinMathe t1_j5gxi7i wrote
Reply to [D] Simple Questions Thread by AutoModerator
Hey all,
I want to build a chatbot for a domain-specific purpose, for example to talk with a person about their mental state and their depression. For that I would like to train the bot on texts from the domain.
So my question is: how should I start?
What approach would you use? Would you use an intent-based solution?
What are the standard models for chatbots, e.g. BERT?
Is it even possible to fine-tune models on large text corpora? If yes, how?
Thank you, guys
mje-nz t1_j5gw3n5 wrote
Reply to comment by twiztidsoulz in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
Are you talking about the model? We’re talking about the output. If you’re talking about signing the model, what does that achieve? If you’re talking about signing the output, how do you sign a chat transcript?
hasiemasie t1_j5gv221 wrote
Reply to comment by Maxerature in [D] Multiple Different GPUs? by Maxerature
Not to my knowledge
careless25 t1_j5gu7tn wrote
Reply to comment by franciscrot in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
Very simple explanation
Give each word a unique number. Add all the numbers up. That's your unique identifier for the GPT output.
If you switch some words around, the sum won't change very much.
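A toy version of that idea (the hash-based word numbering here is my own illustrative stand-in, not any production scheme); note that pure reordering actually leaves the sum *exactly* unchanged, since addition ignores order:

```python
import hashlib

def fingerprint(text):
    """Map each word to a number (a hash here, as an illustrative stand-in)
    and sum them. Addition is order-independent, so shuffling words is invisible."""
    word_num = lambda w: int(hashlib.md5(w.encode()).hexdigest(), 16) % 10**6
    return sum(word_num(w) for w in text.lower().split())

a = fingerprint("the model generated this sentence")
b = fingerprint("this sentence the model generated")  # same words, reordered
print(a == b)  # True -- word swaps don't change the sum at all
```

The flip side is the weakness discussed above: replacing even one word with a synonym changes the sum, so a detector has to tolerate some drift rather than demand an exact match.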
Fabulous-Possible758 t1_j5gtzlk wrote
Reply to comment by adt in [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
On phone so can’t read the blog yet: does it say how well it handles false positives? I.e., flagging stuff not written by GPT as being written by GPT?
I could see a really shitty world coming about where the filter is effectively useless, because everyone will have to make sure their content passes the watermark detector.
fraktall t1_j5gtq5j wrote
Reply to [D] Couldn't devs of major GPTs have added an invisible but detectable watermark in the models? by scarynut
It will be reverse engineered in no time. The output of the model is just text.
romek_ziomek t1_j5hhn2h wrote
Reply to comment by Maxerature in [D] Multiple Different GPUs? by Maxerature
Of course you can use them both, provided that you have free PCI-E slots. I use a 3060 and a 2060 Super in my setup. I'm not sure what exactly you wanna do, but I can tell you that I'm working in PyTorch and it's a painless process: you can choose with one variable which GPU to use, or use a wrapper class (DataParallel) to train on both of them simultaneously. One trick that was specific to my motherboard, and that I had to figure out by trial and error (since it wasn't in the documentation), was that my second GPU wouldn't work if I had two NVMe drives installed. Other than that it works flawlessly.
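For reference, the one-variable device selection and the DataParallel wrapper described above might look roughly like this in PyTorch (the model and tensor sizes are placeholders):

```python
import torch
import torch.nn as nn

# Pick a device with a single variable; falls back to CPU if no GPU is visible.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(16, 4)  # placeholder model

# Wrap the model so each batch is split across all visible GPUs.
# On a single-device machine this branch is simply skipped.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

model = model.to(device)
x = torch.randn(8, 16, device=device)
out = model(x)
print(out.shape)  # torch.Size([8, 4])
```

DataParallel replicates the module per GPU and scatters the batch, so mismatched cards like a 3060 and a 2060 Super will work, though the slower card tends to set the pace of each step.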