Recent comments in /f/MachineLearning
andreichiffa t1_j8h43hh wrote
Reply to [D] Looking for recommendations for an affordable API service to classify AI-generated text by ateqio
You can’t. Anyone with enough technical knowledge will not want to go anywhere near the legal ramifications and responsibility it implies (in addition to looking like a clown within about 10 minutes of uptime once bypasses are found).
There are fundamental limitations on detectability as of now.
Main_Mathematician77 t1_j8h3v8z wrote
Reply to comment by ateqio in [D] Looking for recommendations for an affordable API service to classify AI-generated text by ateqio
The best thing I can think of that relates to this is based on LAION's style-attribution kNN index search over their 5B image dataset. A similar approach could work for text: search over the data for similar samples. No guarantees there either, but it's fairly interpretable. The dataset of ChatGPT generations from 100M users is growing fast, though, and searching over it is most likely impractical at current pricing. Also, as you said, using GPT-2 to measure perplexity is good for catching GPT-generated text, but it's not a perfect solution imo
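To make the perplexity idea concrete, here's a minimal sketch (the helper name and numbers are made up for illustration; it assumes you already have per-token log-probabilities from a scoring model like GPT-2):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-probability per token.
    Lower perplexity means the scoring model finds the text predictable,
    which is one (imperfect) signal of model-generated text."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)

# Text the scoring model finds very predictable scores low...
predictable = [-0.1, -0.2, -0.15, -0.1]
# ...while surprising (often human-written) text scores higher.
surprising = [-3.0, -2.5, -4.0, -3.5]
assert perplexity(predictable) < perplexity(surprising)
```

The catch, as noted above, is that there's no threshold that cleanly separates humans from models, so this only ever gives you a noisy signal, not a verdict.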
Final-Rush759 t1_j8h3qgi wrote
Whatever you can read is outdated. They don't reveal what they actually use. They are rumored to have the best recommendation system.
ateqio OP t1_j8h1po3 wrote
Reply to comment by Main_Mathematician77 in [D] Looking for recommendations for an affordable API service to classify AI-generated text by ateqio
I'm totally aware of that, and I will be putting a disclaimer on the front page, not buried in a Terms and Conditions link somewhere.
The tools currently available can ruin a student's life precisely because they don't mention this explicitly.
I want to address that issue by providing a solution that comes up at the top of search results and informs professors about the limitations as explicitly as possible
sumguysr t1_j8h1p7g wrote
Reply to [D] Have their been any attempts to create a programming language specifically for machine learning? by throwaway957280
Yeah, that's differentiable programming. Julia is probably the biggest one, but there are others.
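To give a flavor of what "differentiable" means at the language level, here's a toy forward-mode autodiff using dual numbers in plain Python (a sketch only; Julia packages like Zygote do this far more generally, across whole programs):

```python
class Dual:
    """A number that carries its own derivative: (value, d/dx)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def grad(f, x):
    """Evaluate f at x while propagating the derivative alongside."""
    return f(Dual(x, 1.0)).dot

# d/dx (3x^2 + 2x) at x = 4 is 6x + 2 = 26
assert grad(lambda x: 3 * x * x + 2 * x, 4.0) == 26.0
```

The point of a differentiable-first language is that ordinary code written like this gets gradients "for free", which is exactly what ML training loops need.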
cheeseler t1_j8h1h3j wrote
Reply to [R] DIGIFACE-1M — synthetic dataset with one million images for face recognition by t0ns0fph0t0ns
Something about the last row reminds me of Zoombinis
Main_Mathematician77 t1_j8h1by4 wrote
Reply to [D] Looking for recommendations for an affordable API service to classify AI-generated text by ateqio
Imo you're not going to be able to provide a reliable service currently with out-of-the-box solutions. The systems aren't reliable enough to be certain, especially when they can produce false positives that wrongly accuse someone
flamonster92 t1_j8gz5yk wrote
Reply to [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Imagine an AI that could write another AI.
cdsmith t1_j8gq1gt wrote
Reply to comment by I_will_delete_myself in [D] Can Google sue OpenAI for using the Transformer in their products? by t0t0t4t4
Imagine you just didn't invest those millions of dollars, then, and instead someone else developed the idea and didn't want to freeze the rest of the world out of using it.
Patents only make sense if you assume that the alternative to you inventing something is no one inventing it. Experience shows that's very rarely the case; in general, when an idea's time has come (the base knowledge is there to understand it, the infrastructure is in place to use it effectively, etc.), there is a race between many parties to develop it. This applies to everything from machine learning models to the light bulb or the telephone, both of which were famously being developed by multiple inventors simultaneously before one person got lucky, often by a matter of mere days, and was issued an exclusive license to the invention, while everyone else who had the same idea was out of luck.
cdsmith t1_j8gpcrf wrote
Reply to comment by womenrespecter-69 in [D] Can Google sue OpenAI for using the Transformer in their products? by t0t0t4t4
We're off-topic for this forum, but since we're here anyway...
Patents are tricky when it comes to stuff like this. To successfully patent something software related, you must be able to convince the patent office that what you're patenting counts as a "process", and not as an "idea", or "concept" or "principle" or "algorithm", all of which are explicitly not patentable. The nuances of how you draw the lines between these categories are fairly complex, but in general it often comes down to being able to patent engineering details of HOW you do something in the face of a bunch of real-world constraints, but not WHAT you are doing or any broad generalization of the bigger picture.
It's likely that Swype didn't just screw up and write their patent poorly, but rather wrote the only patent their legal team could succeed in getting approved. If it didn't apply to what other companies did later because they used a different "process" (for nuanced lawyer meanings of that word) to accomplish the same goal, that is an intentional feature of the patent system, not a failure by Swype.
velcher t1_j8glba7 wrote
Reply to comment by dojoteef in [D] Quality of posts in this sub going down by MurlocXYZ
Could ML or simple rule-based filters help us out here?
ImZanga t1_j8gh6ci wrote
Also interested; I've been looking into this myself. Some materials of potential interest:
Official TikTok blog post describing, in very limited detail, how it works: How TikTok recommends videos #ForYou
Papers:
- An Empirical Investigation of Personalization Factors on TikTok (2022) - sock puppet methodology to identify the parameters and their strength in influencing the algo
- Analysis on the “Douyin (Tiktok) Mania” Phenomenon Based on Recommendation Algorithms (2021)
- Trick and Please. A Mixed-Method Study On User Assumptions About the TikTok Algorithm (2021)
- Leveraging Rights of Data Subjects for Social Media Analysis: Studying TikTok via Data Donations (2023) - may be of interest
WSJ did a video where they try to reverse engineer the algorithm (not too technical, though): Investigation: How TikTok's Algorithm Figures Out Your Deepest Desires
Some blogs I came across that may or may not be reliable:
chhaya_35 OP t1_j8gfvml wrote
Reply to comment by Glum-Mortgage-5860 in [D] What are resources to start with GNN and GraphML? by chhaya_35
Thanks!!
chhaya_35 OP t1_j8gfsbz wrote
Reply to comment by ___luigi in [D] What are resources to start with GNN and GraphML? by chhaya_35
Thanks!!
yaosio t1_j8gerab wrote
Reply to comment by Cherubin0 in [R] [P] OpenAssistant is a fully open-source chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. by radi-cho
It's in the data collection stage. It's being run by LAION.
saturn_since_day1 t1_j8gcvbh wrote
Reply to [D] Simple Questions Thread by AutoModerator
I recently tried out BLOOM, https://bigscience.huggingface.co/blog/bloom which is supposedly the biggest open-source LLM. Is this really the state of the art for language models that are publicly available?
starfries t1_j8gcrzo wrote
Reply to comment by mindmech in [D] Quality of posts in this sub going down by MurlocXYZ
Me too. There are a lot of great people I want to hear from, but only when they post about ML, not politics.
Kitchen_Tower2800 t1_j8gb25a wrote
Typically, large companies don't use a single model, but rather a large number of different models, all performing different tasks (recommending, filtering, etc.). It would be very difficult to describe the complete recommendation pipeline (i.e. from user request to final candidates) in a single academic paper.
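As a sketch of that multi-model shape (every name and rule here is a made-up stub, not any company's actual pipeline), each stage below stands in for what would be a separate model in production:

```python
# Hypothetical multi-stage recommendation pipeline: candidate generation,
# filtering, and ranking are usually separate models; stubbed with rules here.

def retrieve_candidates(user_id, corpus):
    # Stage 1: cheap candidate generation (an ANN index in practice).
    return [item for item in corpus if item["topic"] in {"ml", "music"}]

def filter_candidates(candidates, seen):
    # Stage 2: policy / dedup / already-seen filters.
    return [c for c in candidates if c["id"] not in seen]

def rank(candidates):
    # Stage 3: a heavier scoring model; here just a precomputed score.
    return sorted(candidates, key=lambda c: c["score"], reverse=True)

def recommend(user_id, corpus, seen, k=2):
    return rank(filter_candidates(retrieve_candidates(user_id, corpus), seen))[:k]

corpus = [
    {"id": 1, "topic": "ml", "score": 0.9},
    {"id": 2, "topic": "music", "score": 0.7},
    {"id": 3, "topic": "ml", "score": 0.8},
    {"id": 4, "topic": "cooking", "score": 0.99},
]
print(recommend("u1", corpus, seen={1}))  # -> items 3 then 2
```

A paper can describe any one of these stages in isolation; it's the composition, plus all the operational glue, that never fits in one writeup.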
imaginethezmell t1_j8g4f64 wrote
Reply to comment by big_gondola in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
there are APIs for AutoML already
it can simply learn the task of using other AI to create models
it's over
Cherubin0 t1_j8g35sa wrote
Reply to [R] [P] OpenAssistant is a fully open-source chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so. by radi-cho
I am confused. Does a model already exist or is it only in a data collection stage?
dancingnightly t1_j8g0oqx wrote
Reply to comment by EducationalCicada in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
Hold on, Jurassic-X has been around since April 2022, I believe, with something fairly similar:
https://arxiv.org/pdf/2204.10019.pdf
https://www.ai21.com/blog/jurassic-x-crossing-the-neuro-symbolic-chasm-with-the-mrkl-system
It didn't learn new tools, I think, but it did work well for calculations and wiki search.
MurlocXYZ OP t1_j8fwzw7 wrote
Reply to comment by Borrowedshorts in [D] Quality of posts in this sub going down by MurlocXYZ
The posts I'm referring to are typically poorly constructed philosophical arguments about ChatGPT, or just straight up "how does it work". I do not want to gatekeep. I like that ML is hyped and new people are interested. But we have separate threads for beginner questions and/or tutorials, as per this subreddit's About section, specifically to avoid spammy posts.
MustBeSomethingThere t1_j8fvp24 wrote
Reply to comment by radi-cho in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
As far as I understand, many of those lucidrains repos don't contain the needed AI model weights. In this case too, the Toolformer model is not publicly available.
VelveteenAmbush t1_j8fusa5 wrote
Reply to comment by pyepyepie in [R] [N] Toolformer: Language Models Can Teach Themselves to Use Tools - paper by Meta AI Research by radi-cho
> I feel that the real challenge is to control language models using structured data, perform planning, etc.
I think the promise of tool-equipped LLMs is that these tools may be able to serve that sort of purpose (as well as, like, being calculators and running Wikipedia queries). You could imagine an LLM using a database module as long-term memory, to keep a list of instrumental goals, etc. You could even give it access to a module that lets it fine-tune itself or create successor LLMs in some manner. All very speculative, of course.
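As a toy sketch of that speculative idea (the tag syntax, tool names, and memory store are all invented for illustration; this is not the Toolformer interface), a dispatch loop that lets model output read and write a long-term memory might look like:

```python
import re

# Hypothetical long-term-memory "module": just a dict the model can
# touch via inline tool tags embedded in its generated text.
memory = {}

TOOLS = {
    "REMEMBER": lambda k, v: memory.__setitem__(k, v) or "ok",
    "RECALL": lambda k: memory.get(k, "<nothing>"),
}

TAG = re.compile(r"\[(\w+)\((.*?)\)\]")

def run_tools(model_output):
    """Replace each [TOOL(args)] tag in the model's output with the tool result."""
    def call(match):
        name, raw_args = match.group(1), match.group(2)
        args = [a.strip() for a in raw_args.split(",")]
        return str(TOOLS[name](*args))
    return TAG.sub(call, model_output)

run_tools("[REMEMBER(goal, finish the draft)]")
print(run_tools("Current goal: [RECALL(goal)]"))  # -> Current goal: finish the draft
```

The interesting (and scary) part is exactly what the comment gestures at: nothing stops one of those tools from being "write to my own training set" or "spawn a successor".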
ateqio OP t1_j8h5g16 wrote
Reply to comment by andreichiffa in [D] Looking for recommendations for an affordable API service to classify AI-generated text by ateqio
You're right.
The problem is, people (especially professors) are going to look for it no matter what.
Just look at the stats: the RoBERTa OpenAI detector was downloaded a whopping 114k times in just the last month. It clearly states it shouldn't be used as a ChatGPT detector, but I see a lot of implementations of it anyway
Better to educate users with a big fat disclaimer and a tool