Recent comments in /f/MachineLearning

Sirisian t1_j6yja6v wrote

Part of this is also about brand identity. Even if a technology isn't perfect, some companies try to get in early. This is similar to the virtual reality and mixed reality trends. The industry sees an inevitable future and wants to be the name people think of. If one assumes gradual improvements until ~2045, then this is long-term planning. (Or short-term, depending on the improvements expected. It's possible MS has insider information that skews its motives.)

3

LetterRip t1_j6yj4z2 wrote

GPT-3 can be quantized to 4-bit with little loss in quality, letting it run on two Nvidia 3090s/4090s (unpruned; pruned, perhaps a single 3090/4090). At roughly $2 a day for 8 hours of electricity and 21 working days per month, that comes to about $42 per month (plus the amortized cost of the cards and the computer that hosts them).
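
For illustration, a minimal sketch of what 4-bit loading looks like with Hugging Face transformers + bitsandbytes. GPT-3's weights aren't public, so an open model stands in; the model name and hardware split are assumptions, not a claim about how anyone actually deploys it:

```python
# Hypothetical sketch: load a large open model in 4-bit with transformers + bitsandbytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "facebook/opt-66b"  # stand-in for a GPT-3-class model (assumption)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4-bit
    bnb_4bit_compute_dtype=torch.float16,  # do the matmuls in fp16
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # shard layers across whatever 3090s/4090s are available
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Back-of-the-envelope running cost from the comment:
# ~$2 of electricity per 8-hour day * 21 working days ≈ $42 / month.
print(2 * 21)  # 42
```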

3

Acceptable-Cress-374 t1_j6yil6g wrote

> Their resources will always be larger, and they will keep accelerating faster on the exponential curve.

Sure, they'll have more money to throw at a problem, but also more incentive to throw that money into other money-making stuff. Open-source models might not necessarily go the same path, and even if under-trained or less-optimized, they might still be a tremendous help once a community gets to play with them.

1

AristosTotalis t1_j6ye5hn wrote

Yep. $1B in cash, but they have to use Azure as their exclusive cloud compute provider, which Microsoft probably sells to OAI at ~cost.

I think it's safe to assume that 2/3 of that will go toward training and inference. If you also assume Microsoft neither makes nor loses money selling the compute (and in fact gets to strengthen Azure as a cloud infrastructure player), they really only paid ~$300M to invest in OAI, at what seems like a great price in hindsight.

6

Franck_Dernoncourt t1_j6ydkiu wrote

> I was surprised at how much better GPT3 davinci 003 performed compared to AI21's 178B model. AI21's Jurassic 178B seems to be comparable to GPT3 davinci 001.

on which tasks?

> Of course, I didn't expect the smaller models to be on par with GPT-3

You could read Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto. Benchmarking Large Language Models for News Summarization. arXiv:2301.13848:

> we find instruction tuning, and not model size, is the key to the LLM’s zero-shot summarization capability

6

ReginaldIII t1_j6ybiju wrote

This isn't being used for autocomplete or any user text generation purposes though.

They're using it to summarize and make to-do lists from the Whisper-extracted transcripts of video meetings. Users aren't getting a frontend to run arbitrary input through the model. Seems like a pretty legitimate use case.
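
A rough sketch of that transcribe-then-summarize pipeline, using the open-source whisper package and a generic LLM prompt. This is not Microsoft's implementation; the file name, model size, and summarization step are assumptions for illustration:

```python
# Hypothetical pipeline: transcribe a meeting recording, then ask an LLM
# for a summary and to-do list. Not the actual Teams implementation.
import whisper

asr_model = whisper.load_model("base")
result = asr_model.transcribe("meeting_recording.mp4")
transcript = result["text"]

# Hand the transcript to a language model for summarization.
prompt = (
    "Summarize the following meeting transcript and list the action items "
    "as a to-do list:\n\n" + transcript
)
# response = some_llm.generate(prompt)  # e.g. a GPT-3-style completion endpoint
```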

20

LeanderKu t1_j6y7cge wrote

I actually find automatically generated notes to be a smart and useful application. I often have one-on-one remote meetings, and I find it difficult to present and discuss my work while also taking notes. It often happens that I focus on the discussion and forget to take notes, which I only notice a week later when I've forgotten half of the tasks. If it worked reliably, I can imagine it being a very useful addition.

I have never used Teams though; everything's on Zoom.

47

crt09 t1_j6y5x4t wrote

This paper seems very relevant: https://arxiv.org/abs/2205.13636. I haven't read it closely enough to give strong opinions with confidence, but it seems to beat PPO with a token-level loss that works similarly to the Upside-Down Reinforcement Learning paper: you give a target reward between 1 and 5 as an input token before the prompt and train the model to produce a response of the corresponding quality, using the standard LM loss on an existing target output with the given 1-5 reward rank. Then, during inference, you just prepend 1 to the prompt and it outputs a high-quality response.
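
A minimal sketch of that reward-conditioning idea as I'm describing it (my reading, not the paper's exact recipe; the reward-token names, the 1-5 scale, and the GPT-2 stand-in model are assumptions):

```python
# Hypothetical sketch: prepend a reward-rank token to each training example,
# train with the ordinary LM loss, and condition on the best rank at inference.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Special tokens for reward ranks 1 (best) .. 5 (worst).
reward_tokens = [f"<reward_{r}>" for r in range(1, 6)]
tokenizer.add_special_tokens({"additional_special_tokens": reward_tokens})
model.resize_token_embeddings(len(tokenizer))

def training_step(prompt, target, reward_rank, optimizer):
    # Condition the example on its reward rank, then apply the standard LM loss.
    text = f"<reward_{reward_rank}>{prompt}{target}"
    batch = tokenizer(text, return_tensors="pt")
    out = model(**batch, labels=batch["input_ids"])
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()

# At inference, prepend the best rank token to steer generation toward high quality.
inputs = tokenizer("<reward_1>Write a short poem about the sea.", return_tensors="pt")
sample = model.generate(**inputs, max_new_tokens=50, do_sample=True)
print(tokenizer.decode(sample[0], skip_special_tokens=True))
```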

1