Recent comments in /f/MachineLearning
xXWarMachineRoXx t1_jb1qzsm wrote
- What's [R] and [N] in the title?
- What's dropout?
bo_peng OP t1_jb1qws0 wrote
Reply to comment by Spare_Side_5907 in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
TNN is like convolution, while RWKV can be written as a CNN too (RWKV v1 is a CNN). So there's some similarity, though not much :)
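To illustrate the general point (this is a toy sketch, not RWKV's actual formulation; the decay factor and input are made up), a linear recurrence with exponential decay can be evaluated either step by step like an RNN or all at once as a causal 1-D convolution with a precomputed kernel:

```python
import numpy as np

# Toy illustration: the same exponentially-decaying recurrence,
# computed in the RNN view and in the CNN (causal convolution) view.
T = 8                      # sequence length (arbitrary)
w = 0.9                    # decay factor, made up for the example
x = np.random.randn(T)     # toy input sequence

# RNN view: h_t = w * h_{t-1} + x_t
h_rnn = np.zeros(T)
state = 0.0
for t in range(T):
    state = w * state + x[t]
    h_rnn[t] = state

# CNN view: h_t = sum_{i<=t} w^(t-i) * x_i, i.e. a causal conv with kernel [1, w, w^2, ...]
kernel = w ** np.arange(T)
h_cnn = np.array([np.sum(kernel[:t + 1][::-1] * x[:t + 1]) for t in range(T)])

assert np.allclose(h_rnn, h_cnn)
```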
bo_peng OP t1_jb1q5fu wrote
Reply to comment by luxsteele in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
Yes a paper is coming. Meanwhile you can read https://arxiv.org/abs/2302.13939 (SpikeGPT) which is inspired by RWKV and has plenty of explanations :)
bo_peng OP t1_jb1po7i wrote
Reply to comment by _Arsenie_Boca_ in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
Will the 150 lines help? Please read the code first :)
https://github.com/BlinkDL/ChatRWKV/blob/main/RWKV_in_150_lines.py
This is ALL you need for RWKV inference.
And you can read https://arxiv.org/abs/2302.13939 (SpikeGPT) which is inspired by RWKV and has plenty of explanations :)
HillaryPutin t1_jb1ovl4 wrote
Reply to comment by Zealousideal_Low1287 in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
Bro sounds like the discussion comments in some of my university courses
jobeta t1_jb1otw2 wrote
Neat! What's early s.d. in the tables in the github repo?
tonicinhibition t1_jb1ntqz wrote
Reply to comment by currentscurrents in To RL or Not to RL? [D] by vidul7498
I don't think the author of the post took a position on the original argument; rather they just presented ways to explore the latent space and make comparisons that are reasonable so that we might derive better distance metrics.
I see it as a potential way to probe for evidence of mode collapse.
ml-research t1_jb1nlad wrote
Reply to To RL or Not to RL? [D] by vidul7498
People said similar things about deep learning a long time ago.
If you can use supervised learning, then you should, because it means you have tons of data with ground-truth labels for each decision. But many real-world problems are not like that. Even humans don't know if each of their decisions is optimal or not.
rpnewc t1_jb1k781 wrote
Reply to [D] Ethics of minecraft stable diffusion by NoLifeGamer2
If you are successful in getting noticed, you may get sued. If you are just one guy (not a company), maybe not. But tread carefully. There may be a restricted licensing arrangement under which you could show your work if you want to, but I am not an expert there.
2blazen t1_jb1jk5h wrote
Reply to comment by Quazar_omega in [P] LazyShell - GPT based autocomplete for zsh by rumovoice
And lazy
Like come on at least have a landing page
deekaire t1_jb1jd2i wrote
Reply to comment by PassionatePossum in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
Great comment 👍
currentscurrents t1_jb1j20n wrote
Reply to comment by tonicinhibition in To RL or Not to RL? [D] by vidul7498
>Do GANS really model the true data distribution...
I find their argument to be pretty weak. Of course these images look semantically similar; they ran a semantic similarity search to find them.
They are clearly not memorized training examples. The pose, framing, and facial expressions are very different.
royalemate357 t1_jb1h7wl wrote
Reply to comment by Art10001 in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
Hmm, I very much doubt it could have run 100x faster for the same parameter count, as you are memory-bandwidth bound (both GPT and RWKV have to load the parameters n times to generate n tokens). Also, I'm somewhat skeptical that you only need 3GB for 14B parameters *without offloading the model*, as even 4-bit quantization needs 14B/2 = 7GB. And offloading the model is slow to the point of being unusable, as you need to do CPU<->GPU transfers.
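For reference, a back-of-the-envelope estimate of weight memory for a 14B-parameter model at different precisions (weights only; activations and runtime state excluded), consistent with the 14B/2 = 7GB figure above:

```python
# Rough weight-memory estimate for a 14B-parameter model.
params = 14e9
for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gb = params * bytes_per_param / 1e9
    print(f"{name}: {gb:.0f} GB")
# fp16: 28 GB, int8: 14 GB, int4: 7 GB
```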
tonicinhibition t1_jb1fgpe wrote
Reply to comment by tripple13 in To RL or Not to RL? [D] by vidul7498
> people who discount GANs due to their lack of a likelihood
I was going to ask you to expand on this a little, but instead found a post that describes it pretty well for anyone else who is curious:
Do GANS really model the true data distribution...
For further nuance on this topic, Machine Learning Street Talk discussed interpolation vs. extrapolation with Yann LeCun, which Letitia Parcalabescu summarizes here.
growqx t1_jb1duv8 wrote
Reply to comment by ilyakuzovkin in To RL or Not to RL? [D] by vidul7498
>Same way as one wouldn't use RL to multiply two numbers
Zealousideal_Low1287 t1_jb1ce8q wrote
Reply to comment by szidahou in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
IDK did you read it?
luxsteele t1_jb1b68d wrote
Reply to comment by _Arsenie_Boca_ in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
Totally agree.
I have been following this for some time, but I can't fully understand it or explain it to my collaborators.
I work in ML and have quite some experience with transformers, and I still can't fully get it, let alone convince my collaborators that it is worth pursuing.
It is paramount that we have a paper explaining this in more detail if we want the community to take it seriously.
Please do it!
szidahou t1_jb19y51 wrote
How can authors be confident that this phenomenon is generally true?
[deleted] t1_jb19gqq wrote
Reply to comment by farmingvillein in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
[deleted]
farmingvillein t1_jb18evq wrote
Reply to comment by Toast119 in [R] [N] Dropout Reduces Underfitting - Liu et al. by radi-cho
Yes. In the first two lines of the abstract:
> Introduced by Hinton et al. in 2012, dropout has stood the test of time as a regularizer for preventing overfitting in neural networks. In this study, we demonstrate that dropout can also mitigate underfitting when used at the start of training.
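As a rough illustration of the idea (my own sketch, not the authors' code; the cutoff step and dropout rate below are placeholders, not values from the paper), one could enable dropout only for the first part of training and then switch it off:

```python
import torch.nn as nn

def set_dropout(model: nn.Module, p: float) -> None:
    # Set the drop probability of every nn.Dropout module in the model.
    for module in model.modules():
        if isinstance(module, nn.Dropout):
            module.p = p

early_steps = 2000   # hypothetical cutoff for "early" dropout
drop_rate = 0.1      # hypothetical drop probability

# Inside the training loop, apply dropout only during the early steps:
# for step, batch in enumerate(loader):
#     set_dropout(model, drop_rate if step < early_steps else 0.0)
#     ...forward / backward / update...
```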
rpnewc t1_jb17dvp wrote
Reply to comment by 2blazen in [D] The Sentences Computers Can't Understand, But Humans Can by New_Computer3619
For sure it can be taught. But I don't think the way to teach it is to give it a bunch of sentences from the internet and expect it to figure out advanced reasoning. It has to be explicitly tuned toward that objective. The more interesting question is then: how can we do this for all domains of knowledge in a general manner? In other words, what is the master algorithm for learning? There is one (or a collection of them) for sure, but I don't think we are anywhere close to it. ChatGPT is simply pretending to be that system, but it's not.
yannbouteiller t1_jb17aaw wrote
Reply to To RL or Not to RL? [D] by vidul7498
People will say anything in the hope of drawing attention. Reframing an unexplored MDP as a supervised learning problem makes no sense.
Art10001 t1_jb176r8 wrote
Reply to comment by ThirdMover in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
Indeed.
Art10001 t1_jb172wo wrote
Reply to comment by royalemate357 in [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
It once said 100 times faster and 100 times less (V)RAM here. However, it now says that RWKV-14B can be run with only 3 GB of VRAM, which is still a massive improvement, because a 14B model normally requires roughly 30 GB of VRAM.
CatalyzeX_code_bot t1_jb1r9fn wrote
Reply to [R] RWKV (100% RNN) can genuinely model ctx4k+ documents in Pile, and RWKV model+inference+generation in 150 lines of Python by bo_peng
Found relevant code at https://github.com/ridgerchu/SpikeGPT + all code implementations here