Recent comments in /f/MachineLearning

gwern t1_j9ff0ey wrote

> Only malicious questions will lead to malicious output.

That's not true, and it has already been shown to be false by Sydney going off on users who seemed to be having harmless chats. You never know what it'll stochastically sample as a response.

Further, each time is different, as you really ought to know: the entire point of your technique is that at any time, Bing could refresh its search results (which search engines aspire to do in real time) and retrieve an entirely new set of results - any of which could prompt-inject Sydney and reprogram it to produce malicious output!

13

gwern t1_j9fekat wrote

> Oh, my blog is written in Chinese; maybe non-English content will make NewBing less defensive.

GPT models are good at translating Chinese (eg https://www.reddit.com/r/MachineLearning/comments/1135tir/d_glm_130b_chineseenglish_bilingual_model/ the other day), so it can definitely read & understand your post if the Chinese text gets included in the context. Probably what would help is ensuring that Bing-the-search-engine either doesn't index it or it doesn't come up as a top hit for any queries; Sydney can't read anything outside the top 15 retrieved results. (I haven't seen any screenshots with >15 references listed, IIRC.)

2

mskogly t1_j9fbxbn wrote

High-speed algorithmic trading already accounts for most of the trade volume. I believe it will be very hard to use machine learning on this data unless you sit in the nexus between trades and can algo-trade in the interim between when a buyer sends his order and the seller accepts. There are «clearinghouses» in between which have better access to real-time data than ordinary end users, so you will always lose. They make their money on both ends of the trade, from the seller and the buyer, plus they can do multiple trades in between.

But perhaps take a look at sentiment analysis. A few years back someone made code that based their automatic trades on Trump tweets.

World events / news can cause massive shifts in stock value, which might be tradeable.
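As a toy illustration of the sentiment idea above (the keyword lists, function names, and buy/sell thresholds here are entirely made up for demonstration; real systems use trained sentiment models, not word lists):

```python
# Hypothetical sketch: turn a news headline into a naive trading signal
# via keyword-based sentiment. Keyword sets are illustrative only.
POSITIVE = {"beat", "surge", "record", "growth", "win"}
NEGATIVE = {"miss", "crash", "lawsuit", "recall", "loss"}

def sentiment_score(headline: str) -> int:
    """Count positive minus negative keywords in a headline."""
    words = headline.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def trade_signal(headline: str) -> str:
    """Map the sentiment score to a naive buy/sell/hold decision."""
    score = sentiment_score(headline)
    if score > 0:
        return "buy"
    if score < 0:
        return "sell"
    return "hold"
```

Even this toy version shows why the approach is fragile: the signal only exists in the brief window before the rest of the market has priced the news in.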

2

andreichiffa t1_j9fa9kz wrote

It's a grey area.

It's not general enough to warrant a full research paper, but on the other hand, it is equivalent to an SQL injection arising from unsanitized input, and it would be reported as a CVE if we were in traditional programming.

I think eventually there will be a database like that, so save the prompt, date, and context of the conversation, preferably somewhere that has a verifiable timestamp (e.g. a public GitHub repo commit with a PGP signature), so that once the system goes live you can add to it.
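A minimal sketch of recording such an example, assuming git is installed (filenames, fields, and identities are illustrative; the GPG-signed variant additionally requires a configured signing key, so it is shown commented out):

```shell
# Create a local repo to hold prompt-injection records.
mkdir -p prompt-injection-db && cd prompt-injection-db
git init -q .
git config user.email "db@example.com"   # placeholder identity
git config user.name "DB Keeper"

# One record per file: prompt, date, and context of the conversation.
cat > 2023-02-21_sydney.md <<'EOF'
prompt: <the injected prompt>
date: 2023-02-21
context: Bing chat, search-result injection
EOF

git add 2023-02-21_sydney.md
git commit -q -m "Add Sydney prompt-injection record"
# git commit -S -m "..."   # signed variant, once a PGP key is configured
git log -1 --format=%cI    # commit timestamp
```

Note that a local commit timestamp alone is forgeable; pushing to a public host (or signing and publishing the commit hash) is what anchors the date to a third party.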

3

PredictorX1 t1_j9f8ept wrote

As a start, I suggest learning the following:

Statistics:

- probability (distributions, basic manipulations)

- statistical summaries (univariate and bivariate)

- hypothesis testing / confidence intervals

- linear regression

Linear Algebra:

- basic understanding of arranging data in vectors and matrices

- operators (matrix multiplication, ...)

Calculus:

- limits

- basic differentiation and integration (at least of polynomials)

Information Theory (Discrete):

- entropy, joint entropy, conditional entropy, mutual information
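The discrete information-theory quantities in the list above can be sketched directly from their definitions (function names are my own; these are plug-in estimates from samples, not bias-corrected estimators):

```python
import math
from collections import Counter

def entropy(samples):
    """Shannon entropy H(X) in bits, estimated from empirical frequencies."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))
```

For example, a fair coin has entropy 1 bit, a variable carries 1 bit of mutual information about an identical copy of itself, and 0 bits about a constant.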

0

IntrepidTieKnot t1_j9f89yr wrote

I'll build you one for 10 million dollars. Payment is upfront. It'll have a guaranteed prediction rate of almost 50%! So on almost every second trade you execute, you will make a profit!!! You just need to figure out which of the two trades is the profitable one.

But wait! I'll build you another model that can even predict that, with an accuracy of almost 50%. Upfront payment also required.

You know what?

I'll discount you both models by 20% so you save a lot of money ordering both at once!

11

KrakenInAJar t1_j9f7aq2 wrote

The short answer is no.

I kind of have the feeling that these stock market systems, especially those for extremely short time windows, are the perpetuum mobile of the ML world. Everyone initially thinks it's a great idea (heck, it's a money printer, but with ML!), has a basic idea of how it could work, tries it, and then fails. And there is an excellent reason why it fails: short-term stock market data is probably the noisiest data you can get, which is a death sentence for any model that tries to work with it continuously and reliably.

STOP TRYING TO BUILD MONEY PRINTING MACHINES WITH ML IT IS NOT GONNA HAPPEN!

11

iosdevcoff t1_j9f51xx wrote

Honestly, what we’ve witnessed so far shows how even large corporations hurry to launch unrefined products if they believe it will benefit their perceived success. And they’ve done a tremendous job of it. All the media coverage of how Bing is fighting back is so much more important for them than a couple of nerdy guys figuring out it was just a simple pre-prompt. A lot to learn.

1

WarAndGeese t1_j9f40t7 wrote

If that's all it is then fair enough. I thought their long term threat model was for when we do eventually create sentient life.

If they were just sticking to things like language models and trying to align those, then their efforts could be aimed more at demilitarization, or for transparency in the corporate structure itself for corporations who would be creating and applying these language models. Because the AGIs that those groups create will be according to their own requirements. For example any military creating an AGI will forgo that sort of pro-human alignment. Hence efforts would have to be aimed at the hierarchies of the organisations who are likely to use AGIs in harmful ways, and not just at the transformer models. If that's just a task for a separate group though then I guess fair enough.

2

redditusername_x t1_j9f1odl wrote

What's it like being a Machine Learning engineer, really? I understand you're doing a lot of 'data pipelining', but how much building are you actually doing? Ever since I started programming I've thought of over 25 projects to build... but only a subset feature ML concepts. I'd want to be able to program/build extensively, but not exclusively, using ML - am I better off being a SWE? PS: Currently in an MSCS, very ML-heavy.
I'd also appreciate resources/links to help guide me/understand the role a bit better.

1

polymorphicprism t1_j9f0wm2 wrote

Because what exists now is akin to an artificial stream of music. They can program guidelines for beats per minute. They can tell it to favor mimicking happy songs or songs people reported liking. It is a flaw of the listener to assume the jukebox is sentient or that it wants to accomplish its own goals. There's nothing to fool. Everybody who is working on this understands this (except the Google guy that lost perspective and got himself fired).

8