Recent comments in /f/MachineLearning

LetGoAndBeReal t1_j4t6yb6 wrote

I'm a bit unclear why this announcement is so significant, and frankly I'm not even sure I understand it. We already have API access to the text-davinci-003 model, and my understanding is that ChatGPT basically uses the same model with a small amount of incremental tuning.

Is this announcement just saying that this marginally revised model will now be available as a model option through the OpenAI API? If so, what benefit does this provide over the API access using text-davinci-003?

2

gdpoc t1_j4ssata wrote

ChatGPT (and large language models in general) is a great generalist and would likely be very useful for predicting 'root node' locations in a knowledge graph, which would allow finding the correct content from a minimal subset.

ChatGPT sucks with details, yes, but for use in a recommendation algorithm that depends on the graph, I think that issue could be minimized.

1

GPT-5entient t1_j4s8q64 wrote

Reply to comment by ThirdMover in [D] Bitter lesson 2.0? by Tea_Pearce

>I think the point of the metaphor was Amazon stealing product ideas from third party vendors on their site and undercutting them. They know what sells better than anyone and can then just produce it.

In many cases they are probably just selling the same white label item outright, just slapping on "Amazon Basics"...

1

mrconter1 OP t1_j4rsus2 wrote

Really appreciate your feedback.

> The distinctions you’re drawing, pixels vs selenium output and browser vs os, are far less significant than the complexity of the tasks (step-by-step vs entire processes). What they’ve achieved is strictly harder for humans than what you are testing. We can argue whether perception or planning are harder for current technology (the computer vision is far more developed than AI planning right now), but I think you need to reconsider the formulation of your tasks. It seems like they are designed to be easy enough for modern methods to solve.

I'm not sure about this. Being able to do the next click on a large, diverse benchmark of screenshots is extremely difficult for a computer today. It would need to be able to:

  • Choose the next chess move if I am in a chess application
  • Recognize the color palette icon on the keyboard if I ask it to change the color of the keyboard
  • Recognize the Gmail icon if I say "send an email"
  • Change keyboard mode if I ask it to write an exclamation mark
  • Press the key "2" if I ask it to type the number equivalent to the number of consuls that traditionally held the office at the same time in ancient Rome.

That's way outside what current models can do. But humans could do it easily. This benchmark would be extremely simple and intuitive for humans to complete (even with far-fetched goals), but there is no model today capable of even knowing that you should press on the new line location given a screenshot and the instruction "Add line".
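To make the "next click" idea concrete, here is a rough sketch of how a single-click benchmark like this could be scored. It assumes each example carries a target rectangle that counts as a correct click; the names `Example`, `target_box`, `is_hit`, and `accuracy` are made up for illustration and are not taken from the actual repo.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple

Point = Tuple[int, int]          # (x, y) in screenshot pixels
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), inclusive


@dataclass(frozen=True)
class Example:
    """One benchmark item (hypothetical layout, assumed for this sketch)."""
    screenshot_path: str
    instruction: str
    target_box: Box  # region where a click counts as correct


def is_hit(click: Point, box: Box) -> bool:
    """Return True if the click lands inside the target rectangle."""
    x, y = click
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def accuracy(examples: Iterable[Example],
             predict: Callable[[str, str], Point]) -> float:
    """Fraction of examples where the predicted click hits the target."""
    items = list(examples)
    if not items:
        return 0.0
    hits = sum(
        is_hit(predict(ex.screenshot_path, ex.instruction), ex.target_box)
        for ex in items
    )
    return hits / len(items)
```

Here `predict` stands in for whatever model takes a screenshot and an instruction and returns an (x, y) location. A rectangle is only one possible notion of a correct click; irregular targets would need a mask instead.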

> On another note, most interesting tasks can’t be completed with just an x,y mouse location output. Why did you decide to restrict the benchmark to such a limited set of tasks?

I wrote about this in the README. There is no particular reason. It's just easier to explain the idea to people that way. I think the most powerful variant of this idea would take a series of frames (video context) and instructions and output one of the following:

  • Click
  • Press (X seconds)
  • Move from P1 to P2 (X seconds)
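As a rough sketch, the three outputs above could be modelled as a small set of action types. This is only an illustration: the class names, the `describe` helper, and the choice to attach a screen location to Click and Press are my assumptions, not something specified in the repo.

```python
from dataclasses import dataclass
from typing import Tuple, Union

Point = Tuple[int, int]  # (x, y) in screen pixels; an assumed convention


@dataclass(frozen=True)
class Click:
    at: Point  # location is an assumption; the list above only says "Click"


@dataclass(frozen=True)
class Press:
    at: Point       # assumed location of the press
    seconds: float  # how long to hold, the "X seconds" above


@dataclass(frozen=True)
class Move:
    start: Point    # P1
    end: Point      # P2
    seconds: float  # duration of the movement


Action = Union[Click, Press, Move]


def describe(action: Action) -> str:
    """Render an action as a short human-readable string."""
    if isinstance(action, Click):
        return f"click at {action.at}"
    if isinstance(action, Press):
        return f"press at {action.at} for {action.seconds}s"
    if isinstance(action, Move):
        return f"move from {action.start} to {action.end} over {action.seconds}s"
    raise TypeError(f"unknown action: {action!r}")
```

A model for this variant would then map (frames, instruction) to one `Action` value instead of a bare x,y pair.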

The benchmark is simple enough to understand and explain that you can start to envision what such a model would be able to do. Or, much more interesting, what it would not be able to do.

If you have any more feedback or thoughts please reply. I wish more people were interested but either the idea sucked or I need to create something interactive for people.

1