Recent comments in /f/MachineLearning

currentscurrents t1_javx4pw wrote

The Winograd Schema is a test of commonsense reasoning. It's hard because it requires not just knowledge of English, but also knowledge of the real world.

But as you found, it's pretty much solved now. By 2019, LLMs could complete it with better than 90% accuracy, which means it was effectively already solved when Tom Scott made his video.
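For anyone who hasn't seen one: a Winograd schema is a sentence pair where changing a single word flips the referent of an ambiguous pronoun, so grammar alone can't resolve it. A minimal sketch of the classic example (the pair LetterRip's thread and the comment below both touch on):

```python
# Classic Winograd schema: swapping one verb flips who "they" refers to,
# so resolving the pronoun requires real-world knowledge, not just grammar.
schema = {
    "template": "The city councilmen refused the demonstrators a permit because they {verb} violence.",
    "answers": {"feared": "councilmen", "advocated": "demonstrators"},
}

for verb, referent in schema["answers"].items():
    sentence = schema["template"].format(verb=verb)
    print(f"{sentence}  ->  'they' = the {referent}")
```

Both sentences are grammatically identical; only the verb's meaning tells you which noun "they" points at.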

15

LetterRip t1_javpxbv wrote

> I mean... why were they not doing this already? They would have to code it but it seems like low hanging fruit

GPT-3 came out in 2020 (it launched at its initial price, with a modest price drop early on).

FlashAttention came out in June 2022.

Quantization we've only recently figured out how to do fairly losslessly (especially int4). Tim Dettmers' LLM.int8() is from August 2022.

https://arxiv.org/abs/2208.07339
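The core trick in that line of work is absmax quantization: scale a vector of floats so the largest magnitude maps to 127, then round to int8. A minimal sketch (function names are mine, not from the paper's code):

```python
def quantize_int8(xs):
    """Absmax quantization: scale floats into the int8 range [-127, 127]."""
    scale = max(abs(x) for x in xs) / 127.0
    return [round(x / scale) for x in xs], scale

def dequantize(qs, scale):
    """Recover approximate floats from int8 values and the stored scale."""
    return [q * scale for q in qs]

weights = [0.3, -1.2, 0.05, 0.9]
qs, scale = quantize_int8(weights)
restored = dequantize(qs, scale)
# Rounding error per value is at most scale/2. A single large outlier
# inflates the scale and crushes the small values -- which is why
# LLM.int8() routes outlier feature dimensions through fp16 separately.
```

That outlier handling is the paper's actual contribution; plain absmax like this was already standard.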

> That seems large, which paper has that?

See

https://github.com/HazyResearch/flash-attention/raw/main/assets/flashattn_memory.jpg

>We show memory savings in this graph (note that memory footprint is the same no matter if you use dropout or masking). Memory savings are proportional to sequence length -- since standard attention has memory quadratic in sequence length, whereas FlashAttention has memory linear in sequence length. We see 10X memory savings at sequence length 2K, and 20X at 4K. As a result, FlashAttention can scale to much longer sequence lengths.

https://github.com/HazyResearch/flash-attention
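The scaling in that graph follows from a back-of-envelope model: standard attention materializes the full N x N score matrix, while FlashAttention only keeps a fixed-size block resident at a time. A rough sketch (the block size here is an illustrative constant, not the repo's measured numbers; the point is that the savings ratio grows linearly with sequence length):

```python
def standard_attn_memory(seq_len):
    # Standard attention stores the full N x N attention matrix.
    return seq_len * seq_len

def flash_attn_memory(seq_len, block=256):
    # FlashAttention streams over fixed-size blocks, so memory is linear in N.
    # (block=256 is an assumed illustrative value)
    return seq_len * block

for n in (2048, 4096, 8192):
    ratio = standard_attn_memory(n) / flash_attn_memory(n)
    print(f"seq_len={n}: ~{ratio:.0f}x memory savings")
```

Doubling the sequence length doubles the savings ratio, which matches the 10x-at-2K / 20x-at-4K pattern in the graph.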

1

DSM-6 t1_javnmz2 wrote

Personally, I think the answer is existing bias in the training data.

I don’t know enough about ChatGPT to state this as fact, but I think it’s safe to assume it doesn’t explicitly encode grammar rules. I.e., nowhere in the code does it state “antecedent pronouns should refer to the subject of a sentence.”

Instead, I assume ChatGPT’s grammar comes from repeated convention in the training data. Enough data in which the antecedent refers to something other than the sentence subject means that “they” can refer to any of the preceding nouns. In that case, “councilmen fear violence” is a far more common pattern in the training data than “protesters fear violence.”

Then again, your example was in the passive voice, so I dunno 🤷‍♀️.

5