Recent comments in /f/MachineLearning

bo_peng OP t1_jb1po7i wrote

Will the 150 lines help? Please read the code first :)

https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_in_150_lines.py

This is ALL you need for RWKV inference.
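For anyone who wants a feel for what's inside that file before opening it: the heart of RWKV inference is the WKV recurrence, a decayed weighted average over past values with a "bonus" for the current token. Here's a hedged NumPy sketch of that recurrence alone (my own variable names and sign conventions, not the repo's exact code; the real file also has time-mixing/channel-mixing projections and layer norms around this):

```python
import numpy as np

def wkv(w, u, k, v):
    """Sketch of the RWKV WKV recurrence.
    w: per-channel decay rate (> 0), shape (C,)
    u: per-channel "bonus" for the current token, shape (C,)
    k, v: key/value sequences, shape (T, C)
    Returns (T, C) outputs. Uses a running-max trick for numerical stability."""
    T, C = k.shape
    out = np.zeros((T, C))
    num = np.zeros(C)           # decayed sum of exp(k_i) * v_i (scaled by exp(p))
    den = np.zeros(C)           # decayed sum of exp(k_i)       (scaled by exp(p))
    p = np.full(C, -1e30)       # running max exponent, for stability
    for t in range(T):
        # Output: past state plus the current token weighted by exp(u + k_t).
        q = np.maximum(p, u + k[t])
        a = np.exp(p - q)
        b = np.exp(u + k[t] - q)
        out[t] = (a * num + b * v[t]) / (a * den + b)
        # State update: decay the past by exp(-w), absorb the current token.
        q = np.maximum(p - w, k[t])
        a = np.exp(p - w - q)
        b = np.exp(k[t] - q)
        num = a * num + b * v[t]
        den = a * den + b
        p = q
    return out
```

Because the state is a fixed-size pair of accumulators per channel, generating each new token is O(1) in sequence length, which is the point of the comparison with transformer attention.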

And you can read https://arxiv.org/abs/2302.13939 (SpikeGPT) which is inspired by RWKV and has plenty of explanations :)

4

tonicinhibition t1_jb1ntqz wrote

I don't think the author of the post took a position on the original argument; rather, they just presented ways to explore the latent space and make reasonable comparisons, so that we might derive better distance metrics.

I see it as a potential way to probe for evidence of mode collapse.

1

ml-research t1_jb1nlad wrote

People said similar things about deep learning a long time ago.

If you can use supervised learning, then you should, because it means you have tons of data with ground-truth labels for each decision. But many real-world problems are not like that. Even humans don't know if each of their decisions is optimal or not.

3

rpnewc t1_jb1k781 wrote

If you are successful in getting noticed, you may get sued. If you are just one guy (not a company), maybe not. But tread carefully. There may be a restrictive licensing arrangement under which you could show your work if you want to, but I am not an expert there.

3

currentscurrents t1_jb1j20n wrote

>Do GANS really model the true data distribution...

I find their argument to be pretty weak. Of course these images look semantically similar; they ran a semantic similarity search to find them.

They are clearly not memorized training examples. The pose, framing, and facial expressions are very different.

5

royalemate357 t1_jb1h7wl wrote

Hmm, I very much doubt it could've run 100x faster for the same parameter count, as you are memory-bandwidth bound (both GPT and RWKV have to load the parameters n times to generate n tokens). I'm also somewhat skeptical that you only need 3GB for 14B parameters *without offloading the model*, as even 4-bit quantization needs 14B/2 = 7GB. And offloading the model is slow to the point of being unusable, since you need to do CPU<->GPU transfers.
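The back-of-the-envelope math here is just bits-per-parameter times parameter count (a rough sketch that ignores activations, KV cache, and runtime overhead):

```python
def model_memory_gb(n_params, bits_per_param):
    """Approximate memory needed to hold just the weights.
    Ignores activations, KV cache, and framework overhead."""
    return n_params * bits_per_param / 8 / 1e9

# 14B parameters at common precisions:
for bits in (16, 8, 4):
    print(f"{bits}-bit: {model_memory_gb(14e9, bits):.1f} GB")
# 16-bit: 28.0 GB, 8-bit: 14.0 GB, 4-bit: 7.0 GB
```

So 3GB for 14B parameters without offloading would require well under 2 bits per parameter, which is why the claim looks implausible.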

1

tonicinhibition t1_jb1fgpe wrote

Reply to comment by tripple13 in To RL or Not to RL? [D] by vidul7498

> people who discount GANs due to their lack of a likelihood

I was going to ask you to expand on this a little, but instead found a post that describes it pretty well for anyone else who is curious:

Do GANS really model the true data distribution...

For further nuance on this topic, Machine Learning Street Talk discussed interpolation vs. extrapolation with Yann LeCun, which Letitia Parcalabescu summarizes here.

1

luxsteele t1_jb1b68d wrote

Totally agree.

I have been following this for some time, but I can't fully understand it or explain it to my collaborators.

I work in ML and have quite a bit of experience with transformers, and I still can't fully get it, let alone convince some of my collaborators that it's worth pursuing.

It is paramount that we have a paper that explains this in more detail if we want the community to consider this seriously.

Please do it!

8

rpnewc t1_jb17dvp wrote

For sure it can be taught. But I don't think the way to teach it is to give it a bunch of sentences from the internet and expect it to figure out advanced reasoning. It has to be explicitly tuned toward that objective. A more interesting question, then, is: how can we do this for all domains of knowledge in a general manner? In other words, what is the master algorithm for learning? There is one (or a collection of them) for sure, but I don't think we are anywhere close to it. ChatGPT is simply pretending to be that system, but it's not.

1