Recent comments in /f/MachineLearning

ateqio OP t1_j8h5g16 wrote

You're right.

The problem is, people (especially professors) are going to look for it no matter what.

Just look at the stats: the RoBERTa OpenAI detector was downloaded a whopping 114k times in the last month alone. It clearly states it should not be used as a ChatGPT detector, but I still see plenty of implementations that do exactly that

Better to educate users with a big fat disclaimer and a tool

1

Main_Mathematician77 t1_j8h3v8z wrote

The closest thing I can think of is LAION's kNN-based style-attribution index search over their 5B-image dataset. A similar approach could work for text: search the corpus for similar samples. Again, no guarantees, but it's fairly interpretable. That said, the dataset of ChatGPT generations from 100M users is growing fast, and searching over it is probably impractical at the current pricing options. Also, as you said, using GPT-2 to measure perplexity is good for catching GPT-generated text, but it's not a perfect solution imo
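The perplexity idea can be sketched roughly like this: score a text by how "surprised" a language model such as GPT-2 is by each token, then exponentiate the mean negative log-likelihood. The log-probabilities below are made-up placeholders; in practice they would come from an actual model (e.g. via the Hugging Face transformers library):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over tokens.
    Low perplexity under a model like GPT-2 means the text is
    "unsurprising" to the model -- one weak signal of generated text."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Hypothetical per-token log-probs; real values would come from a model.
human_like = [-5.2, -7.1, -4.8, -6.3]  # more surprising tokens
model_like = [-1.1, -0.9, -1.4, -1.2]  # highly predictable tokens

print(perplexity(model_like) < perplexity(human_like))  # True
```

The catch, as noted above, is that low perplexity alone proves nothing: short, formulaic, or heavily edited human text can score low too.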

1

ateqio OP t1_j8h1po3 wrote

I'm totally aware of that, and I will be putting a disclaimer on the front page, not buried in a Terms and Conditions link somewhere.

The tools currently available can ruin a student's life by not explicitly mentioning these limitations.

I want to address that issue by providing a solution that comes up at the top of search results and informs professors about the limitations as explicitly as possible

3

cdsmith t1_j8gq1gt wrote

Imagine you just didn't invest those millions of dollars, then, and instead someone else developed the idea and didn't want to freeze the rest of the world out of using it.

Patents only make sense if you assume that the alternative to you inventing something is no one inventing it. Experience shows that's very rarely the case; in general, when an idea's time has come (the base knowledge is there to understand it, the infrastructure is in place to use it effectively, etc.), there is a race between many parties to develop it. This applies to everything from machine learning models to the light bulb or the telephone, both of which were famously being developed by multiple inventors simultaneously before one person got lucky, often by a matter of mere days, and was granted exclusive rights to the invention, while everyone else who had the same idea was out of luck.

1

cdsmith t1_j8gpcrf wrote

We're off-topic for this forum, but since we're here anyway...

Patents are tricky when it comes to stuff like this. To successfully patent something software related, you must be able to convince the patent office that what you're patenting counts as a "process", and not as an "idea", or "concept" or "principle" or "algorithm", all of which are explicitly not patentable. The nuances of how you draw the lines between these categories are fairly complex, but in general it often comes down to being able to patent engineering details of HOW you do something in the face of a bunch of real-world constraints, but not WHAT you are doing or any broad generalization of the bigger picture.

It's likely that Swype didn't just screw up and write their patent poorly, but rather wrote the only patent their legal team could succeed in getting approved. If it didn't apply to what other companies did later because they used a different "process" (for nuanced lawyer meanings of that word) to accomplish the same goal, that is an intentional feature of the patent system, not a failure by Swype.

1

ImZanga t1_j8gh6ci wrote

Also interested; I've been looking into this myself. Some materials of potential interest:

Official TikTok blog post describing, in very limited detail, how it works: How TikTok recommends videos #ForYou

Papers:

WSJ did a video where they try to reverse engineer the algorithm, though it's not too technical: Investigation: How TikTok's Algorithm Figures Out Your Deepest Desires

Some blogs I came across that may or may not be reliable:

18

dancingnightly t1_j8g0oqx wrote

Hold on, Jurassic-X has been around since April 2022, I believe, with something fairly similar:

https://arxiv.org/pdf/2204.10019.pdf

https://www.ai21.com/blog/jurassic-x-crossing-the-neuro-symbolic-chasm-with-the-mrkl-system

It didn't learn new tools, I think, but it did work well for calculations and wiki search.

3

MurlocXYZ OP t1_j8fwzw7 wrote

The posts I'm referring to are typically poorly constructed philosophical arguments about ChatGPT, or just straight-up "how does it work" questions. I do not want to gatekeep. I like that ML is hyped and new people are interested. But we have separate threads for beginner questions and/or tutorials, as per this subreddit's About section, specifically to avoid spammy posts.

5

VelveteenAmbush t1_j8fusa5 wrote

> I feel that the real challenge is to control language models using structured data, perform planning, etc.

I think the promise of tool-equipped LLMs is that these tools may be able to serve that sort of purpose (as well as, say, being calculators and running Wikipedia queries). I could imagine an LLM using a database module as long-term memory, to keep a list of instrumental goals, etc. You could even give it access to a module that lets it fine-tune itself or create successor LLMs in some manner. All very speculative, of course.
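As a toy illustration of the long-term-memory idea (everything here is hypothetical: the "LLM" is a hard-coded stub, the "database" is just a dict, and all names are made up):

```python
def fake_llm(prompt):
    """Stand-in for a real model deciding whether to call the memory tool.
    A real system would have the LLM emit a structured tool call instead."""
    if prompt.startswith("remember:"):
        key, value = prompt[len("remember:"):].split("=", 1)
        return ("store", key.strip(), value.strip())
    if prompt.startswith("recall:"):
        return ("load", prompt[len("recall:"):].strip(), None)
    return ("answer", prompt, None)

def run_agent(prompt, memory):
    action, arg1, arg2 = fake_llm(prompt)
    if action == "store":   # tool call: write to long-term memory
        memory[arg1] = arg2
        return f"stored {arg1}"
    if action == "load":    # tool call: read from long-term memory
        return memory.get(arg1, "unknown")
    return arg1             # plain completion, no tool used

memory = {}  # the "database" module, persisted across calls
run_agent("remember: goal = finish the draft", memory)
print(run_agent("recall: goal", memory))  # finish the draft
```

The interesting part is that the memory outlives any single prompt, which is exactly what a context window can't give you.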

3