Recent comments in /f/MachineLearning
EmmyNoetherRing t1_j5ulz6t wrote
So, a few things:
ChatGPT doesn't currently have access to the internet, although it's obviously working with data it scraped in the recent past. Searching a 2021 snapshot of Wikipedia is enough to answer a wide array of queries, which is why it feels like it has internet access when you ask it questions.
ChatGPT is effective because it's been trained on an unimaginably large set of data, and because an unknown but large number of human hours have gone into supervised/interactive/online/reinforcement/(whatever) learning, where an army of contractors has taught it how to deal well with arbitrary human prompts. You don't really want an AI trained on just your data set by itself.
But ChatGPT (or just plain GPT-3) is already great at summarizing bodies of text. You should be able to google how to ask GPT-3 nicely to summarize your notes or answer questions about them; a sketch is below.
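For example, a minimal sketch using the openai Python package (the model name and client interface reflect the API at the time of writing and may differ by version; the file path is a placeholder):

```python
import openai  # pip install openai

openai.api_key = "sk-..."  # your API key

notes = open("notes.txt").read()  # placeholder path to your notes
prompt = f"Summarize the following notes in a few bullet points:\n\n{notes}\n\nSummary:"

# text-davinci-003 is the strongest GPT-3 completion model as of this writing.
resp = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=256,
    temperature=0.2,  # low temperature keeps the summary close to the source
)
print(resp.choices[0].text.strip())
```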
FastestLearner t1_j5ul7uu wrote
Reply to comment by K_fortytwo in [R] Best service for scientific paper correction by Meddhouib10
Oh no, there's nothing wrong. I just think it's an inferior tool given the amount of ads they show everyone on the internet. I've met people who are overly enthusiastic about Grammarly (because they've been primed by all the ads they've seen), and I think it's overrated for what it is. People often fall for overrated, over-advertised products and make bad decisions in the process. It reminds me of the Paperlike screen protector ad in every other iPad review video: the product is not at all bad, but it's considerably overhyped. What this kind of unhealthy hype does is create a smoke screen that keeps equally good products from shining through (in this case, Quillbot is arguably better).
That said, if Grammarly works for you, then you should definitely choose it.
LetWrong1932 t1_j5ul63q wrote
Reply to [D] CVPR Reviews are out by banmeyoucoward
Just curious: if a reviewer changes their score, can authors see it immediately, or only when the final decision appears?
qalis t1_j5ukjvr wrote
ChatGPT does NOT retrieve any data from the internet. It merely remembers statistical patterns of which words follow which in typical text. It has no knowledge of facts, and no means to get them whatsoever. It was also trained only on data up to 2021, so there is nothing newer in it. There was an older attempt at web-connected models with WebGPT, but it did not get anywhere AFAIK.
What you need is a semantic search model, which summarizes the semantic information in texts as vectors and then performs vector search based on your query. You can of course use a transformer-based model for text vectorization, which may work reasonably well. For specific searches, however, I am pretty sure that in your use case regexes will be just fine.
If you are sure that you need semantic search, use a domain-specific model like SciBERT for best results, or fine-tune a pretrained model from Huggingface; the basic recipe looks like the sketch below.
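To sketch that recipe with the sentence-transformers package (the checkpoint below is a general-purpose default, not SciBERT; swap in whatever fits your domain):

```python
from sentence_transformers import SentenceTransformer, util

# General-purpose checkpoint as a placeholder; a domain-specific model
# would likely work better for scientific text.
model = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "We propose a contrastive objective for small batch sizes.",
    "Gradient checkpointing trades compute for memory.",
    "Regexes are sufficient for exact pattern matching.",
]
doc_vecs = model.encode(docs, convert_to_tensor=True)

query_vec = model.encode("how to reduce GPU memory usage", convert_to_tensor=True)
hits = util.semantic_search(query_vec, doc_vecs, top_k=2)[0]
for hit in hits:
    print(docs[hit["corpus_id"]], hit["score"])  # nearest docs by cosine similarity
```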
koolaidman123 t1_j5uk2ai wrote
Reply to [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
cache your predictions on each smaller batch (w/ labels) until you get to the target batch size, then run your loss function
so instead of calculating loss per micro-batch and accumulating like gradient accumulation, you only calculate loss once you reach the target batch size. roughly like this:
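A minimal PyTorch sketch of the idea (the encoder, data, and hyperparameters are toy stand-ins; note that naively keeping the computation graphs of the cached embeddings still costs memory, which is what the gradient-caching paper linked elsewhere in this thread avoids):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))
optimizer = torch.optim.SGD(encoder.parameters(), lr=0.1)
accumulation_steps, temperature = 4, 0.1

# Toy loader yielding two augmented "views" per micro-batch of 64.
loader = [(torch.randn(64, 128), torch.randn(64, 128)) for _ in range(8)]

cached_z1, cached_z2 = [], []
for i, (view1, view2) in enumerate(loader):
    # Keep the graphs so gradients can flow through the cached embeddings.
    cached_z1.append(encoder(view1))
    cached_z2.append(encoder(view2))
    if (i + 1) % accumulation_steps == 0:
        z1 = F.normalize(torch.cat(cached_z1), dim=1)
        z2 = F.normalize(torch.cat(cached_z2), dim=1)
        # InfoNCE over the full effective batch of 256: every off-diagonal
        # entry acts as an in-batch negative, unlike gradient accumulation.
        logits = z1 @ z2.T / temperature
        labels = torch.arange(len(logits))
        loss = F.cross_entropy(logits, labels)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        cached_z1, cached_z2 = [], []
```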
koolaidman123 t1_j5ujfpv wrote
Reply to comment by altmly in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
Contrastive methods require in-batch negatives; you can't replicate that with grad accumulation.
starfries t1_j5uhnv9 wrote
Reply to comment by bitchslayer78 in [D]Are there any known AI systems today that are significantly more advanced than chatGPT ? by Xeiristotle
Wait, has someone actually integrated Wolfram with ChatGPT? I thought it was still in the "would be cool" stage.
altmly t1_j5uglpx wrote
Reply to comment by RaptorDotCpp in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
I'm confused. Gradient accumulation is exactly equivalent to batching as long as the data is the same, unless you use things like batch norm (you shouldn't).
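For reference, a generic sketch of the standard accumulation pattern (model, data, and hyperparameters are toy stand-ins; the 1/steps scaling assumes a mean-reduced loss):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 2)
criterion = nn.CrossEntropyLoss()  # mean-reduced, hence the scaling below
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
accumulation_steps = 4

# Toy data: 8 micro-batches of 8 = two effective batches of 32.
loader = [(torch.randn(8, 16), torch.randint(0, 2, (8,))) for _ in range(8)]

for i, (x, y) in enumerate(loader):
    # Scale so the accumulated gradient equals that of one batch of 32.
    loss = criterion(model(x), y) / accumulation_steps
    loss.backward()  # gradients accumulate in .grad across micro-batches
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```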
bitchslayer78 t1_j5ug86c wrote
Reply to comment by trutheality in [D]Are there any known AI systems today that are significantly more advanced than chatGPT ? by Xeiristotle
Integrating Wolfram with the ChatGPT API works pretty well, but your point stands, particularly when it comes to logic. The models might differentiate, integrate, work on combinatorics or graph theory; hell, AlphaTensor even found new linear algebra algorithms. But none of them can do truly logic-based activities like coming up with original proofs.
No_Cryptographer9806 t1_j5ufip4 wrote
Reply to [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
FastSiam: a SimSiam variant that fits on one GPU with small batch sizes (down to around 32). https://dl.acm.org/doi/abs/10.1007/978-3-031-16788-1_4
BinodBoppa t1_j5ue55n wrote
Reply to [D]Are there any known AI systems today that are significantly more advanced than chatGPT ? by Xeiristotle
> Lecum
Dude wth
Warm-Combination5374 t1_j5ud8k0 wrote
Reply to [D] CVPR Reviews are out by banmeyoucoward
3 weak rejects... any chance with a rebuttal?
andreichiffa t1_j5uczy3 wrote
Reply to [D]Are there any known AI systems today that are significantly more advanced than chatGPT ? by Xeiristotle
*LeCun. And their Galactica was the subject of so much ridicule that, after a pompous launch, it was un-launched 48 hours later. OPT-175B is a clone of OpenAI's GPT-3, but it performs worse and is essentially a massive cyber-security pain in the ass on the phishing/disinformation front.
LeCun was always into ConvNets for machine vision; text-to-text is Hinton, Bengio, and Sutskever.
So far it looks like Baidu and Google have bigger transformer-based models, but only Google's PaLM is architecturally different enough to potentially perform better.
There are also augmented variants of transformer-based models that are capable of more factual responses, but they tend to be less conversational.
trutheality t1_j5ubis4 wrote
Reply to [D]Are there any known AI systems today that are significantly more advanced than chatGPT ? by Xeiristotle
ChatGPT is good at syntax, but it's worse than pretty much any rule-based system at logic or arithmetic. So depending on your task, something like IBM Watson could be considered more advanced because it has dedicated rule-based reasoning. All it takes for MS or Google to make a "more advanced" system is to couple a large language model with a logic engine, in the spirit of the toy sketch below.
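A toy illustration of such a coupling (everything here is hypothetical: the llm stub stands in for a real language-model call, and the tiny ast-based evaluator stands in for a real logic engine):

```python
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(node):
    # Tiny exact arithmetic evaluator standing in for a rule-based engine.
    if isinstance(node, ast.Expression):
        return evaluate(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
    raise ValueError("not arithmetic")

def llm(query: str) -> str:
    return "<free-text answer from a language model>"  # hypothetical stub

def answer(query: str) -> str:
    # Route exact arithmetic to the symbolic engine, everything else to the LLM.
    try:
        return str(evaluate(ast.parse(query, mode="eval")))
    except (SyntaxError, ValueError):
        return llm(query)

print(answer("12 * (3 + 4)"))          # exact: 84
print(answer("why is the sky blue?"))  # routed to the LLM stub
```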
Irate_Librarian1503 t1_j5uaqwq wrote
Reply to [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
Barlow Twins, maybe? Easy to implement and effective at smaller batch sizes; the loss is only a few lines (see the sketch below).
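To back up the "easy to implement" claim, a sketch of the loss from my reading of the Barlow Twins paper (lambd=5e-3 is the paper's default; normalization details may differ slightly from the reference code):

```python
import torch

def barlow_twins_loss(z1, z2, lambd=5e-3):
    # z1, z2: (N, D) embeddings of two augmented views of the same batch.
    N, D = z1.shape
    z1 = (z1 - z1.mean(0)) / z1.std(0)  # standardize each embedding dimension
    z2 = (z2 - z2.mean(0)) / z2.std(0)
    c = (z1.T @ z2) / N                 # (D, D) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()              # diagonal -> 1
    off_diag = c.pow(2).sum() - torch.diagonal(c).pow(2).sum()  # decorrelate rest
    return on_diag + lambd * off_diag

loss = barlow_twins_loss(torch.randn(32, 128), torch.randn(32, 128))
```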
NotARedditUser3 t1_j5uabt1 wrote
Reply to [D]Are there any known AI systems today that are significantly more advanced than chatGPT ? by Xeiristotle
This is a joke, but VisualMod on the WallStreetBets sub has been looking like a human for quite a while. It's shockingly good.
LetWrong1932 t1_j5u9j27 wrote
Reply to [D] CVPR Reviews are out by banmeyoucoward
1 weak accept, 2 borderline, and 1 reject... is there any chance for me?
melgor89 t1_j5u766t wrote
Reply to [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
There is a great paper analyzing the correlation between batch size and accuracy. They propose a loss function that can train SimCLR with batch size 256 instead of 4k, so there is active research in this domain: https://arxiv.org/abs/2110.06848
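The core idea there (decoupled contrastive learning), as I read it, is to drop the positive pair from the InfoNCE denominator. A simplified sketch using only cross-view negatives (the paper also uses within-view negatives and a weighted variant):

```python
import torch
import torch.nn.functional as F

def dcl_loss(z1, z2, temperature=0.1):
    # Simplified decoupled contrastive loss: InfoNCE with the positive pair
    # removed from the denominator (cross-view negatives only here).
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / temperature  # (N, N); diagonal entries are positives
    pos = torch.diagonal(logits)
    eye = torch.eye(len(z1), dtype=torch.bool, device=z1.device)
    neg = torch.logsumexp(logits.masked_fill(eye, float("-inf")), dim=1)
    return (neg - pos).mean()

loss = dcl_loss(torch.randn(256, 128), torch.randn(256, 128))
```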
melgor89 t1_j5u6pdr wrote
Reply to comment by mgwizdala in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
As said in the topic, gradient accumulation is not a solution. However, gradient checkpointing could be: https://paperswithcode.com/method/gradient-checkpointing It recomputes some of the feature maps during the backward pass so that they are not stored in memory, which lets you fit a bigger batch size.
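PyTorch has this built in; a minimal sketch with a toy sequential encoder (sizes are arbitrary):

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Toy deep encoder; with checkpointing, activations inside each segment are
# recomputed during the backward pass instead of being kept in memory.
blocks = [nn.Sequential(nn.Linear(512, 512), nn.ReLU()) for _ in range(8)]
model = nn.Sequential(*blocks)

x = torch.randn(4096, 512, requires_grad=True)  # a batch larger than usual
out = checkpoint_sequential(model, 4, x)  # split into 4 checkpointed segments
out.mean().backward()
```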
shingekichan1996 OP t1_j5u40dx wrote
Reply to comment by mgwizdala in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
exactly the paper I need to read! Thanks!
Expensive-Track t1_j5u30ch wrote
Reply to [D] CVPR Reviews are out by banmeyoucoward
What's the scale of reviewer confidence scores?
Why isn't there a clear guide about this on the website or anywhere else? :/
mgwizdala t1_j5u2mgr wrote
Reply to comment by shingekichan1996 in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
It depends on the implementation. Naive gradient accumulation will probably give better results than small batches, but as u/RaptorDotCpp mentioned, if you rely on many negative samples inside one batch, it will still be worse than large-batch training.
There is also a cool paper about gradient caching, which solves exactly this issue, but again with an additional penalty on training speed: https://arxiv.org/pdf/2101.06983v2.pdf
K_fortytwo t1_j5u2kgu wrote
Reply to comment by FastestLearner in [R] Best service for scientific paper correction by Meddhouib10
Just curious, what’s wrong with Grammarly?
shingekichan1996 OP t1_j5u22zn wrote
Reply to comment by mgwizdala in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
Curious about this, as I have not read any related paper. What is its effect on performance (accuracy, etc.)?
Kacper-Lukawski t1_j5um27o wrote
Reply to comment by qalis in [D] Efficient retrieval of research information for graduate research by [deleted]
Moreover, to run semantic search at scale you need a proper vector database that avoids kNN-like full scans for every query. Qdrant (https://qdrant.tech) is one of the options, probably the fastest according to benchmarks.
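A quick sketch assuming a recent qdrant-client Python package (in-memory mode for demonstration; the collection name and vector size are arbitrary):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(":memory:")  # throwaway in-memory instance

client.recreate_collection(
    collection_name="papers",
    vectors_config=models.VectorParams(size=4, distance=models.Distance.COSINE),
)
client.upsert(
    collection_name="papers",
    points=[
        models.PointStruct(id=1, vector=[0.1, 0.9, 0.1, 0.0],
                           payload={"title": "Contrastive learning"}),
        models.PointStruct(id=2, vector=[0.8, 0.1, 0.0, 0.1],
                           payload={"title": "Gradient checkpointing"}),
    ],
)

hits = client.search(collection_name="papers",
                     query_vector=[0.1, 0.8, 0.2, 0.0], limit=1)
print(hits[0].payload["title"])  # nearest neighbour by cosine similarity
```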