Recent comments in /f/MachineLearning

currentscurrents t1_j7ioshb wrote

> Besides, it's not clear to me whether these AI tools will be used to benefit humanity as a whole

Of course they benefit humanity as a whole.

  • Language models allow computers to understand complex ideas expressed in plain English.
  • Automating art production will make custom art/comics/movies cheap and readily available.
  • ChatGPT-style AIs (if they can fix hallucination/accuracy problems) give you an oracle with all the knowledge of the internet.
  • They're getting less hype right now, but there are big advances in computer vision (CNNs/Vision Transformers) that are revolutionizing robotics and image processing.

> I really hope this case sets some decent precedents about how AI developers can use data they did not create.

You didn't create the data you used to train your brain, much of which was copyrighted. I see no reason why we should put that restriction on people trying to create artificial brains.

4

aicharades OP t1_j7invof wrote

This is a set of open-source libraries that can extend OpenAI's completions models: https://langchain.readthedocs.io/en/latest/

You can make your own ChatGPT with its own reference library, instead of relying on the current pre-2022 training snapshot.

It’s possible to leapfrog ChatGPT with LangChain and OpenAI Completions (excluding some of their labeled training data) until GPT-4 comes out.
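To make the "own reference library" idea concrete, here's a minimal, dependency-free sketch of the retrieval pattern LangChain automates: embed your documents, rank them against the question, and stuff the top matches into a completion prompt. The bag-of-words `embed` is a stand-in; a real setup would call OpenAI's embeddings API and send the final prompt to the completions endpoint.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" for illustration only; a real
    # pipeline would call an embeddings API instead.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def build_prompt(question, library, k=2):
    # Rank the reference library by similarity to the question and
    # prepend the top-k passages as context for the completion model.
    q = embed(question)
    ranked = sorted(library, key=lambda doc: cosine(q, embed(doc)),
                    reverse=True)
    context = "\n".join(ranked[:k])
    return f"Answer using only this context:\n{context}\n\nQ: {question}\nA:"

library = [
    "LangChain chains LLM calls together with external data sources.",
    "The 2022 World Cup was won by Argentina.",
    "Completions models take a text prompt and return a continuation.",
]
prompt = build_prompt("How do completions models work?", library)
# `prompt` would then be sent to the completions endpoint.
```

The point is that the model's knowledge cutoff stops mattering: whatever you put in the library is available at answer time.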

−4

currentscurrents t1_j7innd5 wrote

Getty is just the test case for the question of copyright and AI.

If you can't train models on copyrighted data this means that they can't learn information from the web outside of specific openly-licensed websites like Wikipedia. This would sharply limit their usefulness. It also seems distinctly unfair, since copyright is only supposed to protect the specific arrangement of words or pixels, not the information they contain or the artistic style they're in.

The big tech companies can afford to license content from Getty, but us little guys can't. If they win it will effectively kill open-source AI.

16

-Rizhiy- t1_j7ilv1v wrote

I feel that they won't try to generate novel responses from the model directly. Rather, they'll take the knowledge graph plus relevant data from the first few responses and ask the model to summarise that, or turn it into an answer which humans find appealing.

That way you don't have to rely on the model to remember facts; it can access all the required information through attention over the prompt.
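The pattern described above can be sketched in a few lines: pull facts out of a knowledge graph, then ask the model only to rephrase them, so factual recall comes from retrieval rather than from the model's weights. The graph contents and prompt wording here are hypothetical illustrations, not any vendor's actual pipeline.

```python
# Tiny stand-in knowledge graph: entity -> list of (relation, object) edges.
knowledge_graph = {
    "Eiffel Tower": [("located_in", "Paris"), ("height_m", "330")],
    "Paris": [("capital_of", "France")],
}

def facts_for(entity):
    # Flatten the entity's outgoing edges into plain-text facts.
    return [f"{entity} {rel.replace('_', ' ')} {obj}"
            for rel, obj in knowledge_graph.get(entity, [])]

def summarisation_prompt(question, entities):
    # The model is asked only to rephrase retrieved facts, not to
    # recall anything from its training data.
    facts = [f for e in entities for f in facts_for(e)]
    return ("Rewrite the following facts as a fluent answer. "
            "Do not add information.\n"
            + "\n".join(f"- {f}" for f in facts)
            + f"\nQuestion: {question}\nAnswer:")

prompt = summarisation_prompt("How tall is the Eiffel Tower?",
                              ["Eiffel Tower"])
```

Hallucination risk drops because the model's job shrinks from "know the answer" to "reword this text".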

14

clueless1245 t1_j7il1dv wrote

What your model is learning to do is predict future market data from past market data, which is fundamentally not worthwhile because market data hinges on real-world news. If you want massive quantities of real-world news data in a structured/tagged format, look at GDELT.

https://www.gdeltproject.org/

Also, look at using Kaggle's GPU notebooks instead of Google's. You get 30 hours a week if you verify with your phone number, instead of Google's arbitrary, secret, heuristic-based cutoff. Or look at something like runpod or vast.ai: rates for non-secure GPUs are a few cents an hour, and datacenter GPUs aren't that expensive either.

P.S. There are arbitrage opportunities you can spot using purely market data, but those are generally very short-term, don't warrant powerful models to detect, and are pounced on by trading bots run by trading firms.

1