Recent comments in /f/MachineLearning

EmmyNoetherRing t1_j5ulz6t wrote

So-- a few things

ChatGPT doesn't currently have access to the internet, although it's obviously working with data scraped in the recent past, and I expect a 2021 snapshot of Wikipedia is sufficient to answer a wide array of queries, which is why it feels like it has internet access when you ask it questions.

ChatGPT is effective because it's been trained on an unimaginably large set of data, and an unknown but large number of human hours have gone into supervised/interactive/online/reinforcement/(whatever) learning, where an army of contractors has taught it how to deal well with arbitrary human prompts. You don't really want an AI trained just on your data set by itself.

But ChatGPT (or just plain GPT3) is great for summarizing bodies of text as it is right now. I expect you should be able to google how to nicely ask GPT3 to summarize your notes or answer questions with respect to them.
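As a concrete starting point, here's a minimal sketch of the kind of prompt wrapper the comment suggests googling for. The function name and prompt wording are my own assumptions, not an official recipe; the resulting string would then be sent to whatever GPT-3 client/SDK you use.

```python
def build_summary_prompt(notes, question=None):
    """Wrap raw notes in an instruction that GPT-3 generally follows well.

    If `question` is given, ask the model to answer it from the notes;
    otherwise ask for a bullet-point summary.
    """
    if question:
        return ("Answer the question using only the notes below.\n\n"
                "Notes:\n" + notes + "\n\nQuestion: " + question + "\nAnswer:")
    return ("Summarize the following notes in a few bullet points.\n\n"
            "Notes:\n" + notes + "\n\nSummary:")

# The exact API call to send this prompt depends on your SDK version,
# so it is left out here.
```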

7

FastestLearner t1_j5ul7uu wrote

Oh no. There’s nothing wrong. I think it’s just an inferior tool for the amount of ads they show everyone on the internet. I’ve met people who are overly enthusiastic about Grammarly (coz they’ve been biased with all the ads they’ve seen) and I think it’s overrated for what it is. People fall for overrated over-advertised products a lot and make bad decisions in the process. Reminds me of the paperlike screen protector ad on every other iPad review video. The product is not at all bad but considerably overhyped. What this kind of unhealthy hype does is that it creates a bad smoky atmosphere, which doesn’t let other products shine through even though they are equally good (in this case Quillbot is arguably better).

That said, if Grammarly works for you, then you should definitely choose it.

6

qalis t1_j5ukjvr wrote

ChatGPT does NOT retrieve any data at all from the internet. It merely remembers statistical patterns of words coming one after another in typical texts. It has no knowledge of facts, and no means to get them whatsoever. It was also trained only on data up to 2021, so there is nothing more recent in it. There was an earlier attempt with WebGPT, but AFAIK it didn't get anywhere.

What you need is a semantic search model, which summarizes semantic information from texts as vectors and then performs vector search based on your query. You can use a transformer-based model for text vectorization, of course, which may work reasonably well. For specific searches, however, I am pretty sure that in your use case regexes will be just fine.
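The vector-search step described above can be sketched in a few lines. In practice the vectors would come from a transformer encoder (e.g. a sentence-embedding model); here tiny hand-made vectors stand in as placeholders, so this only illustrates the ranking mechanics, not a real embedding pipeline.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def semantic_search(query_vec, doc_vecs, top_k=3):
    """Rank document ids by cosine similarity of their vectors to the query."""
    ranked = sorted(doc_vecs.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Toy index: in reality these vectors come from an embedding model.
docs = {"note_a": [1.0, 0.0], "note_b": [0.0, 1.0], "note_c": [0.9, 0.1]}
```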

If you are sure that you need semantic search, use a domain-specific model like SciBERT for best results, or fine-tune some pretrained model from Huggingface.

7

bitchslayer78 t1_j5ug86c wrote

Integrating Wolfram with the ChatGPT API works pretty well, but your point stands, particularly when it comes to logic; the models might differentiate, integrate, work on combinatorics and graph theory, hell, AlphaTensor even found new linear algebra algorithms, but none of them can do any truly logic-based activities like coming up with original proofs.

3

andreichiffa t1_j5uczy3 wrote

*LeCun. And their Galactica was the subject of so much ridicule that after a pompous launch it was un-launched 48 hours later. OPT-175B is a clone of OpenAI’s GPT-3, but performs worse and is essentially a massive pain in the ass for cyber-security and phishing/disinformation.

LeCun was always into ConvNets for machine vision; text-to-text is Hinton, Bengio, and Sutskever.

So far it looks like Baidu and Google have bigger transformer-based models, but only Google’s PaLM is architecturally different enough to potentially perform better.

There are also augmented variants of Transformer-based models that are capable of more factual responses, but they tend to be less conversational.

1

trutheality t1_j5ubis4 wrote

ChatGPT is good at syntax, but it's worse than pretty much any rule-based system at logic or arithmetic. So depending on your task, something like IBM Watson could be considered more advanced because it has dedicated rule-based reasoning. All it takes for MS or Google to make a "more advanced" system is to couple a large language model with a logic engine.
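The "couple an LLM with a logic engine" idea can be sketched as a simple dispatcher: queries that parse as plain arithmetic go to a deterministic evaluator, everything else falls through to the language model. The routing scheme and function names here are my own toy assumptions, and the LLM is stubbed out.

```python
import ast
import operator

# Only plain binary arithmetic is handled deterministically.
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def eval_arith(node):
    """Safely evaluate a parsed arithmetic expression (no eval())."""
    if isinstance(node, ast.Expression):
        return eval_arith(node.body)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](eval_arith(node.left), eval_arith(node.right))
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("not plain arithmetic")

def answer(query, llm=lambda q: "(deferred to language model)"):
    """Route arithmetic to the rule-based evaluator, everything else to the LLM stub."""
    expr = query.strip().rstrip("?")
    try:
        return eval_arith(ast.parse(expr, mode="eval"))
    except (SyntaxError, ValueError):
        return llm(query)
```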

2

melgor89 t1_j5u6pdr wrote

As said in the topic, gradient accumulation is not a solution. However, gradient checkpointing could be: https://paperswithcode.com/method/gradient-checkpointing It recomputes some of the feature maps during the backward pass so that they are not stored in memory, letting you fit a bigger batch size.
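The store-some/recompute-the-rest trade-off behind checkpointing can be illustrated framework-free (in real training you'd use your framework's built-in checkpointing utility; this toy version just shows the memory-vs-compute mechanics with plain functions standing in for layers):

```python
def checkpointed_forward(x, funcs, every=2):
    """Run x through a chain of functions, keeping only every `every`-th activation."""
    saved = {0: x}
    for i, f in enumerate(funcs, start=1):
        x = f(x)
        if i % every == 0:
            saved[i] = x  # checkpoint: the rest are discarded to save memory
    return x, saved

def recompute_activation(saved, funcs, i):
    """Recover activation i during the backward pass by replaying forward
    from the nearest earlier checkpoint (extra compute instead of memory)."""
    start = max(j for j in saved if j <= i)
    x = saved[start]
    for f in funcs[start:i]:
        x = f(x)
    return x

# Toy "network" of 4 layers; only activations 0, 2 and 4 stay in memory.
funcs = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
out, saved = checkpointed_forward(1, funcs)
```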

1

mgwizdala t1_j5u2mgr wrote

It depends on the implementation. Naive gradient accumulation will probably give better results than small batches, but as u/RaptorDotCpp mentioned, if you rely on many negative samples inside one batch, it will still be worse than large-batch training.
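To see why accumulation matches large batches for per-example losses (the easy case, as opposed to losses with in-batch negatives, which couple examples together): for a simple MSE on y_hat = w*x, summing size-weighted micro-batch gradients reproduces the full-batch gradient exactly. This is a toy numerical check, not training code.

```python
def grad_w(w, batch):
    """d/dw of mean squared error for y_hat = w * x, averaged over the batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 5.0), (4.0, 9.0)]
w = 0.5

full = grad_w(w, data)  # one big batch of 4

# Accumulate over micro-batches of 2, weighting each by its size.
micro_batches = [data[:2], data[2:]]
acc = sum(grad_w(w, mb) * len(mb) for mb in micro_batches) / len(data)
```

With an in-batch-negatives loss this identity breaks, because each example's loss term depends on the other examples present in its micro-batch.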

There is also a cool paper about gradient caching, which somehow solves this issue, but again with an additional penalty on training speed. https://arxiv.org/pdf/2101.06983v2.pdf

1