Recent comments in /f/MachineLearning

throwaway2676 t1_j6d99fw wrote

So shouldn't this mean we can train transformers using forward passes alone? It seems that it wouldn't be too difficult to derive an algorithm that updates the attention weights based on these results, but I don't believe the authors mention the possibility.
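For context, one forward-only route people have explored is the "forward gradient" trick: estimate the directional derivative of the loss along a random direction using only function evaluations, then step along that direction. A toy sketch of the idea on a simple quadratic (my own illustration, not anything from the paper; all names are made up):

```python
import random

def forward_gradient_step(f, params, lr=0.05, eps=1e-5, rng=random):
    """One update of params using only forward evaluations of f:
    estimate the directional derivative (grad f . v) along a random
    direction v by finite difference, then step along (grad f . v) * v.
    No backward pass anywhere."""
    v = [rng.gauss(0.0, 1.0) for _ in params]
    f0 = f(params)
    shifted = [p + eps * vi for p, vi in zip(params, v)]
    d = (f(shifted) - f0) / eps  # approx grad f . v
    return [p - lr * d * vi for p, vi in zip(params, v)]

# Minimize f(w) = (w0 - 3)^2 + (w1 + 1)^2 with forward passes only.
f = lambda w: (w[0] - 3.0) ** 2 + (w[1] + 1.0) ** 2
w = [0.0, 0.0]
rng = random.Random(0)
for _ in range(500):
    w = forward_gradient_step(f, w, rng=rng)
# w drifts toward the minimum at (3, -1)
```

The estimate `(grad f . v) * v` is unbiased for the true gradient when `v` is standard Gaussian, which is why the noisy updates still head the right way on average.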

1

shaner92 OP t1_j6d5yr7 wrote

>How does information gathering differ between those in Applied ML and AI researchers (or even further, between those in Business Analytics and those in more 'AI' fields)

I had Data Elixir, will check the rest. Maybe it's time to trim some of the other newsletters that were probably 'influencers' trying to get easy news items off of ChatGPT.

Curious though, do you get these newsletters for general ML news, and focus on industry specifics for use cases? Or try to keep up with research papers in your area?

5

trnka t1_j6d5fbk wrote

I think most people split by participant. I don't remember if there's a name for that, sorry! Hopefully someone else will chime in.

If you have data from multiple hospitals or facilities, it's also common to split by that, because there can be hospital-specific patterns in the data, and you really want your evaluation to estimate the quality of the model on patients not in your data, at hospitals not in your data.
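For what it's worth, scikit-learn calls this grouped splitting (GroupShuffleSplit / GroupKFold, with the patient or hospital ID as the `groups` argument). A dependency-free sketch of the idea, with made-up record fields:

```python
import random

def group_split(records, group_key, test_frac=0.2, seed=0):
    """Split records so that no group (e.g. a patient or hospital)
    appears on both sides of the train/test boundary."""
    groups = sorted({group_key(r) for r in records})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, int(len(groups) * test_frac))
    test_groups = set(groups[:n_test])
    train = [r for r in records if group_key(r) not in test_groups]
    test = [r for r in records if group_key(r) in test_groups]
    return train, test

# Toy records: (patient_id, hospital, label)
data = [("p1", "A", 0), ("p1", "A", 1), ("p2", "A", 0),
        ("p3", "B", 1), ("p3", "B", 0), ("p4", "C", 1)]

# Split by patient: all rows for a given patient land on the same side.
train, test = group_split(data, group_key=lambda r: r[0])
assert not {r[0] for r in train} & {r[0] for r in test}
```

Swapping `group_key=lambda r: r[1]` gives the hospital-level split instead.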

1

starstruckmon t1_j6d3lsr wrote

I can guarantee the next paper out of this Google team is going to be a diffusion model (instead of AudioLM) conditioned on MuLan embeddings.

The strength of the Google model is the text understanding which is coming from the MuLan embeddings. While the strength of the work you highlighted is the quality from the diffusion model.

It's the obvious next step, following the same path as DALL-E 1 -> DALL-E 2.

1

Artgor t1_j6d2xjw wrote

First of all, it is important to understand that we can't keep up with everything. There are too many things happening around us to be able to know all of them.

That being said, I'm subscribed to the following newsletters:

  • Data Elixir
  • Data Machina
  • DataScienceWeekly

They cover most of the advances, I think.

18

Redditing-Dutchman t1_j6cteqn wrote

I think copyright is more of an issue here than with artwork. Human brains are so sensitive and well trained on music that you immediately recognise a familiar tune. Plus, the music industry as a whole is much more aggressive about copyright, maybe because there is a lot of money involved in it. Not sure. I can understand why Google keeps theirs away from the public for now.

1

gamerx88 t1_j6cqerx wrote

It's not about data size or number of parameters. OpenAI has not actually revealed details of ChatGPT's architecture and training. What is special is the fine-tuning procedure: alignment through RLHF on the underlying LLM (nicknamed GPT-3.5), which makes it extremely good at giving "useful" responses to prompts/instructions.

Prior to this innovation, zero-shot and in-context few-shot learning with LLMs barely worked. Users had to trial-and-error their way to some obtuse prompt to get the LLM to generate a sensible response, if it worked at all. This is because LLM pre-training is purely about language structure and doesn't account for intent (what the human wishes to obtain via the prompt). Supervised fine-tuning on instruction/output pairs helped, but not by much. With RLHF, however, the process is so effective that a mere 6B-parameter model fine-tuned with RLHF is able to surpass a 175B-parameter one. Check out the InstructGPT paper for details.
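To make the RLHF idea concrete: the policy samples a response, a reward model scores it, and a policy-gradient update shifts probability toward high-reward responses. Here's a toy REINFORCE sketch over three canned responses, with a hard-coded stand-in for the learned reward model; this is nowhere near the actual PPO setup used for InstructGPT/ChatGPT, just the shape of the loop:

```python
import math
import random

# Toy "policy": a softmax over three canned responses to one prompt.
responses = ["[unhelpful rambling]", "[refusal]", "[helpful answer]"]
logits = [0.0, 0.0, 0.0]

def reward(i):
    # Stand-in for a reward model trained on human preference data.
    return [0.0, 0.2, 1.0][i]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

rng = random.Random(0)
lr = 0.5
for step in range(200):
    probs = softmax(logits)
    i = rng.choices(range(3), weights=probs)[0]  # sample a response
    r = reward(i)                                # score it
    # REINFORCE: d log p(i) / d logit_j = 1[i == j] - p(j)
    for j in range(3):
        logits[j] += lr * r * ((1.0 if i == j else 0.0) - probs[j])

# Probability mass shifts toward the "helpful" response.
final = softmax(logits)
```

The real pipeline uses PPO with a KL penalty against the pre-trained model so the policy doesn't collapse onto reward-hacking outputs, but the core signal is the same: responses humans prefer get reinforced.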

2

VirtualHat t1_j6ckblf wrote

I was thinking next frame prediction, perhaps conditioned on the text description or maybe a transcript. The idea is you could then use the model to generate a video from a text prompt.

I suspect this is far too difficult to achieve with current algorithms. It's just interesting that the training data is all there, and would be many, many orders of magnitude larger than GPT-3's training set.

2