Recent comments in /f/MachineLearning
curiousshortguy t1_j61silr wrote
Reply to comment by currentscurrents in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
This is cool, thanks for sharing
wind_dude t1_j61rnjt wrote
huggingface
flyer2403 t1_j61nnvc wrote
Check out Dagshub!
ReginaldIII t1_j61nlno wrote
Reply to comment by fernandocamargoti in [P] EvoTorch 0.4.0 dropped with GPU-accelerated implementations of CMA-ES, MAP-Elites and NSGA-II. by NaturalGradient
Trying to force these things into a pure hierarchy sounds nothing short of an exercise in pedantry.
And to what end? You make up your own distinctions that no one else agrees with and you lose your ability to communicate ideas to people because you're talking a different language to them.
If you're so caught up on the "is a" part, have you studied any programming languages that support multiple inheritance?
currentscurrents OP t1_j61ndkl wrote
Reply to comment by lucidraisin in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
Thanks for the link!
I think it's interesting that they spent so much time in the 90s trying to make meta-learning work, and now it appears emergently just from throwing scale at the problem.
[deleted] t1_j61n377 wrote
Reply to comment by fernandocamargoti in [P] EvoTorch 0.4.0 dropped with GPU-accelerated implementations of CMA-ES, MAP-Elites and NSGA-II. by NaturalGradient
[deleted]
royalemate357 t1_j61k9vy wrote
Reply to comment by Individual-Cause-616 in [D] score based vs. Diffusion models by Individual-Cause-616
The speed and quality of score-based/diffusion models depend on which sampler you use. If you're using Euler's method to solve the ODE, for example, that might be slower than some of the newer methods developed for diffusion models, like Tero Karras's ODE solvers. AFAIK there isn't consensus on the best sampler to use, though.
I don't think it affects training convergence much, though, since it's more or less the same objective.
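To make the sampler point above concrete, here is a minimal sketch of plain Euler integration of a probability-flow ODE (not one of the Karras solvers) with a toy score function; the linear beta schedule, step count, and the Gaussian toy score are illustrative assumptions, not anyone's actual implementation:

```python
import numpy as np

def euler_ode_sampler(score_fn, x_init, t_start=1.0, t_end=1e-3, n_steps=100):
    """Euler steps on the VP probability-flow ODE
    dx = -0.5 * beta(t) * (x + score) dt, integrated backward in time.
    score_fn(x, t) approximates grad_x log p_t(x)."""
    x = x_init
    ts = np.linspace(t_start, t_end, n_steps)
    dt = ts[1] - ts[0]  # negative: we integrate from t=1 down toward t=0
    for t in ts:
        beta = 0.1 + t * (20.0 - 0.1)  # assumed linear beta schedule
        drift = -0.5 * beta * (x + score_fn(x, t))
        x = x + drift * dt
    return x

# toy score for a standard Gaussian target: score(x) = -x
samples = euler_ode_sampler(lambda x, t: -x, np.random.randn(1000))
```

Fancier samplers (Heun, Karras-style step-size schedules) mostly change how this integration loop takes its steps, which is why sampling speed and quality vary while the trained score model stays the same.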
lucidraisin t1_j61h7lf wrote
Reply to comment by currentscurrents in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
And one more paper along the same lines: https://arxiv.org/abs/2212.07677
[deleted] t1_j61h1lt wrote
Reply to comment by currentscurrents in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
[deleted]
bo_peng OP t1_j61fdtp wrote
Reply to comment by Gody_Godee in [P] RWKV 14B Language Model & ChatRWKV : pure RNN (attention-free), scalable and parallelizable like Transformers by bo_peng
No. It's highly competitive.
youngintegrator t1_j61dfqk wrote
Reply to [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
Is there any reason you'd like a contrastive algorithm? (intra-class discrimination?)
Barlow Twins has been shown to work quite well with smaller batches (32), and HSIC-SSL is a nice variant on this style of learning if you only care about clusters. I'm sure SimSiam is fine too (avoid BYOL for small batches).
In terms of contrastive approaches, methods that avoid the "coupling" of negative terms described in DCL will work with smaller batch sizes (contrastive estimates converge to the MLE assuming many noise samples). This is seen in the spectral algorithm or in align-uniform. These work because they avoid comparing representations of the same augmented sample. SwAV also does this via contrastive prototypes, which are basically free variables whose gradients don't conflict with the alignment goal. I think it's fair to say that algorithms with LSE (log-sum-exp) transforms are less stable at small batch sizes, since the gradients will be biased toward randomly coupled terms. With sufficiently many terms this coupling matters less.
From what I've noticed, methods that avoid comparing the augmented views of the same base sample require slightly more tuning to get things just right (align + weight * diversity).
Notes: NNCLR is nicer than MoCo, IMO. VICReg is good but a mess to fine-tune. I'm assuming you're using a CNN, so I've omitted transformer- and masking-based algorithms.
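To illustrate why Barlow Twins tolerates small batches, here is a hedged NumPy sketch of its objective; the normalization details and the lambda value are assumptions following the paper's general recipe, not a definitive implementation. The key point: the loss lives on the DxD feature cross-correlation matrix, not an NxN matrix of sample-to-sample similarities, so it doesn't need many negatives per batch.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins objective over two augmented views' embeddings (N x D).
    Pushes the cross-correlation matrix toward the identity:
    matched dims correlate, all other pairs decorrelate."""
    n, d = z1.shape
    # batch-normalize each view along the batch dimension
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = z1.T @ z2 / n                                     # D x D cross-correlation
    on_diag = ((np.diag(c) - 1.0) ** 2).sum()             # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()   # redundancy-reduction term
    return on_diag + lam * off_diag

# identical views: matched dims are perfectly correlated, invariance term ~ 0
z = np.random.randn(32, 8)
loss_same = barlow_twins_loss(z, z)
```

Note the batch size (32 here) only affects how noisily the correlation matrix is estimated, which matches the observation above that it degrades gracefully at small batch sizes.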
Zealousideal_Low1287 t1_j6191sq wrote
Reply to comment by HateRedditCantQuitit in [D] Why are GANs worse than (Latent) Diffusion Models for text2img generation? by TheCockatoo
I guess for it to really count as a variational autoencoder you need to be reconstructing the input
curiousshortguy t1_j617zzd wrote
The keyword you want, analogous to DevOps where GitHub plays the role of code storage, is MLOps. Within that, look for data and model management and versioning. Quite a number of companies offer various aspects of this; see for example this random infographic: https://adataanalyst.com/wp-content/uploads/2021/05/Infra-Tooling3.png
pythonpeasant t1_j614dq6 wrote
Reply to [P] EvoTorch 0.4.0 dropped with GPU-accelerated implementations of CMA-ES, MAP-Elites and NSGA-II. by NaturalGradient
THIS IS HUGE!!!!
Please go back to the AttentionNeuron and AttentionAgent papers and retrain them on GPU with big population sizes!
_poisonedrationality t1_j60zrrk wrote
Why is this downvoted? Seems like a decent question.
Expensive-Track t1_j60z9ho wrote
Reply to comment by LetWrong1932 in [D] CVPR Reviews are out by banmeyoucoward
Same 🥲🥲
LetWrong1932 t1_j60z742 wrote
Reply to comment by Expensive-Track in [D] CVPR Reviews are out by banmeyoucoward
Wish they would, or else I'll have to spend the most nervous month of my life lol
Delicious-View-8688 t1_j60ykt5 wrote
Reply to comment by ikkeweer in [Discussion] Github like alternative for ML? by angkhandelwal749
Yes, it's true. I haven't tried using mamba with MLflow, so maybe it integrates, maybe it doesn't. The MLflow docs, as of my reading, indicated it works with conda or Docker only.
Individual-Cause-616 OP t1_j60yi8b wrote
Reply to comment by royalemate357 in [D] score based vs. Diffusion models by Individual-Cause-616
So do you think it makes a difference in practice, i.e. sampling speed and quality, convergence, etc.?
Expensive-Track t1_j60yf7a wrote
Reply to comment by LetWrong1932 in [D] CVPR Reviews are out by banmeyoucoward
Not sure myself but I doubt they'll make anything available before the final decision
ikkeweer t1_j60ych9 wrote
Reply to comment by Delicious-View-8688 in [Discussion] Github like alternative for ML? by angkhandelwal749
Try mamba if you struggle with conda being slow; it's a drop-in replacement.
ObjectManagerManager t1_j60y1rn wrote
OpenAI's LLM is special because it's open to the public. That's it. Other tech companies' internal LLMs are likely better. Google has a database of billions of websites and indexes directly at its disposal; I'm quite confident it could outperform ChatGPT with ease. If Google were really afraid of ChatGPT running it out of business, it would just release a public API for its own, better model. And it has a monopoly over the internet in terms of raw data and R&D; it would be virtually impossible for anyone else to compete.
Besides that, the whole "Google killer" thing is overreactive, IMO. The public API for ChatGPT doesn't retrain on, or even prompt-condition on, new public internet data, so if you ask it about recent news it'll spit out utter garbage. An internal version reportedly does seek out and retrain on new public internet data. But how does it find that data? With a neat tool that constantly crawls the web and builds large, efficient databases and indexes. Oh right: that's called a search engine.
So even if end users start using LLMs as a substitute for search engines (which is generally not happening at the moment, and seems unlikely to become a concern in the age of GPT-3, despite what many people believe), most LLM queries will likely be forwarded to some search engine or other for prompt conditioning. Search engines won't die; they'll just have to adapt to be useful for LLM prompt conditioning in addition to being useful to end users.
royalemate357 t1_j60xuup wrote
There's an implementation of score-based models from the paper that showed score-based models and diffusion models are the same, here: https://github.com/yang-song/score_sde_pytorch
IMO their implementation is more or less the same as a diffusion model, except score-based models use a numerical ODE/SDE solver to generate samples instead of the DDPM sampling method. It may also train in continuous time: rather than choosing t ~ randint(0, 1000), it draws t ~ uniform(0, 1).
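The discrete-vs-continuous timestep distinction can be sketched as below; the small cutoff `eps` is an assumed numerical safeguard (continuous-time implementations typically avoid sampling t exactly at 0), not a detail taken from the linked repo:

```python
import numpy as np

rng = np.random.default_rng(0)
batch = 16

# DDPM-style training: draw discrete timesteps t in {0, ..., 999}
t_discrete = rng.integers(0, 1000, size=batch)

# continuous-time score-SDE training: draw t uniformly on (eps, 1]
eps = 1e-5  # assumed cutoff to avoid numerical issues near t = 0
t_continuous = rng.uniform(eps, 1.0, size=batch)
```

Everything else in the training step (perturb data according to the noise schedule at t, regress the score/noise) is shared, which is the sense in which the two formulations coincide.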
NeoKov t1_j60xeip wrote
Fig. 8.5 mentions a "brown line" for (b), but the line appears to be black.
lucidrage t1_j61so7l wrote
Reply to comment by cdsmith in Few questions about scalability of chatGPT [D] by besabestin
>convert from PyTorch, Tensorflow, or a model in several other common formats into a Groq program
Is any effort being spent on adding a plugin for a high-level framework like Keras to automatically use Groq?