Recent comments in /f/MachineLearning

ElectronicCress3132 t1_j629tix wrote

> implement a gradient descent optimization process at inference time

Could you expand on what this means? At inference time, I thought all weights were frozen, so how could the attention layers be somehow performing gradient descent?

Edit: I read the paper in detail and understood it (the math is walked through in Section 3). Basically, the query sentence X produces activations that pass through the attention layer (recall how attention works: the sentence is embedded, then multiplied by the key, value, and query matrices). If you also give it some in-context examples X' to learn from, then of course the attention output picks up contributions from both X and X'. It turns out the contribution from X' is equivalent to taking a gradient descent step.
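
Here's a minimal numpy sketch of the simplest version of that equivalence, the way I understand it: a single *linear* attention head (no softmax), where the in-context examples X' supply the keys and values and the query comes from X, reproduces one gradient descent step of a linear model starting from zero weights. The dimensions, learning rate, and variable names are placeholders I picked, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, lr = 4, 32, 0.1

# In-context examples X' = {(x_i, y_i)} and a query x_q coming from X.
X_ctx = rng.normal(size=(N, d))        # the x_i
y_ctx = X_ctx @ rng.normal(size=d)     # the y_i, generated by some linear rule
x_q = rng.normal(size=d)

# (1) One gradient descent step on a linear model w, starting from w = 0,
#     for the loss L(w) = 1/(2N) * sum_i (w @ x_i - y_i)^2.
#     The gradient at w = 0 is -(1/N) * sum_i y_i * x_i, so after one step:
w_after_step = (lr / N) * (y_ctx @ X_ctx)
pred_gd = w_after_step @ x_q

# (2) One linear attention head (no softmax): keys = x_i, values = y_i,
#     query = x_q, i.e. output = sum_i y_i * (x_i . x_q), scaled the same way.
pred_attn = (lr / N) * np.sum(y_ctx * (X_ctx @ x_q))

print(pred_gd, pred_attn)  # identical up to floating point error
```

The two numbers match, which is the sense in which attending over the in-context examples "is" a gradient descent step on them.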

24

madmax_br5 OP t1_j629re3 wrote

Right, but BPE is designed to compress alphabetic languages (multiple letters per word), whereas logographic languages are already compressed (one or more words per symbol, but more symbols overall). I suppose I don't get the reason for obsessing over efficiency at this step, or why it's necessary. What is the relationship between vocabulary size and model computational requirements? If the model input is ultimately an embedding with a fixed number of dimensions, does the token vocabulary size really make much practical difference?
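
To make the question concrete, here's a rough back-of-the-envelope sketch of where vocabulary size V actually enters the parameter count. This is just my own illustration with GPT-2-small-ish numbers I picked; the transformer blocks in the middle depend only on d_model, while the embedding table and the output projection scale linearly with V:

```python
# Back-of-the-envelope: where does vocabulary size V show up in a decoder-only
# transformer? (d_model / n_layer are GPT-2-small-ish placeholders I picked.)

def param_counts(vocab_size, d_model=768, n_layer=12):
    embedding = vocab_size * d_model     # token embedding table, V x d
    unembedding = vocab_size * d_model   # output projection to V logits
                                         # (free if tied to the embedding weights)
    per_block = 12 * d_model ** 2        # ~4*d^2 attention + ~8*d^2 MLP per layer
    core = n_layer * per_block           # independent of V
    return embedding, unembedding, core

for vocab in (32_000, 100_000, 250_000):
    emb, unemb, core = param_counts(vocab)
    print(f"V={vocab:>7,}: embed {emb/1e6:5.1f}M, unembed {unemb/1e6:5.1f}M, core {core/1e6:5.1f}M")
```

(The other place V shows up at runtime is the softmax over V logits computed at every output position.)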

−3

BeautyInUgly OP t1_j628n7e wrote

My update: heard nothing back yet; I'll keep posting in this thread when/if I hear anything. Note that I'm probably a weak candidate tbh, since I don't have any publications.

10

currentscurrents OP t1_j627rd0 wrote

Meh, transformers have been around for like 5 years and nobody figured this out until now.

I think this mostly speaks to how hard it is to figure out what neural networks are doing. Complexity is irrelevant to the training process (or any other optimization process), so the algorithms they implement can be arbitrarily complex.

(or, in practice, as complex as the model size and dataset size allow)

23

VisceralExperience t1_j61znkf wrote

The amount of blatant anthropomorphism that comes from AI researchers is so disgusting. Laymen's knowledge about the state of the field is already twisted enough from reality, and the researchers are 100% to blame. Seriously, I'd like to see papers getting rejected for this delusional framing of results.

−19

binheap t1_j61v2f2 wrote

If you believe them, model safety is why there isn't a general public release. LLMs (including ChatGPT) tend to be bad at factual accuracy and can easily hallucinate. It's not obvious that you can work LLMs into a product where accuracy matters a lot, and it might hurt brand image in ways that Google can't tolerate but OpenAI can.

4

lucidrage t1_j61u7zt wrote

> that's called a search engine.

like bing? :D

Google isn't known for developing and keeping new products. When that Google engineer leaked that "sentient AI" model, why didn't Google beat the news by releasing a Google-GPT with search engine capabilities?

With their 150k engineers, I doubt they lack the resources to build a user-friendly version of their LLM, so how come they've been sitting on their hands this whole time?

3