Recent comments in /f/MachineLearning
Mountain_Lab_5857 t1_j65ojup wrote
You can check Damien Benveniste on LinkedIn. I don't remember when it was shared, but there is an article about Model Parallelism for training.
visarga t1_j65iwit wrote
I am using GPT-3 for this kind of stuff, and fine-tuning small models on the data.
Secure-Technology-78 OP t1_j65ifpn wrote
Reply to comment by Sylv__ in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Awwww i’m sorry baby, i promise i’ll work very very hard on my next post for you!
Sylv__ t1_j65ib3y wrote
Reply to [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
already posted a few weeks ago, thank you for your low-effort post that just links to arXiv
RemindMeBot t1_j65eij7 wrote
Reply to comment by JustOneAvailableName in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
I will be messaging you in 3 days on 2023-01-30 20:55:47 UTC to remind you of this link
JustOneAvailableName t1_j65eg5f wrote
Reply to comment by nmfisher in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
!RemindMe 3 days
muchcharles t1_j65b3a6 wrote
Reply to comment by element8 in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
DeepMind put out a paper on adjusting the pruning mask during training (by reviving pruned weights if a transiently stored gradient exceeds some threshold).
The paper is called Rigging the Lottery (referencing the lottery ticket hypothesis) and the method is RigL, I think.
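Roughly, the drop/grow step looks something like this (a toy sketch of my reading of RigL, not DeepMind's code):

```python
import numpy as np

def rigl_update(weights, mask, dense_grad, drop_frac=0.1):
    """One RigL-style drop/grow step on a flat weight array."""
    n_active = int(mask.sum())
    k = max(1, int(drop_frac * n_active))

    # Drop: among active weights, deactivate the k smallest by magnitude.
    active = np.where(mask)[0]
    drop_idx = active[np.argsort(np.abs(weights[active]))[:k]]
    mask[drop_idx] = False
    weights[drop_idx] = 0.0

    # Grow: among inactive weights, revive the k whose dense-gradient
    # magnitude is largest (the transiently stored gradient mentioned above).
    inactive = np.setdiff1d(np.where(~mask)[0], drop_idx)
    grow_idx = inactive[np.argsort(-np.abs(dense_grad[inactive]))[:k]]
    mask[grow_idx] = True
    weights[grow_idx] = 0.0  # revived connections restart from zero

    return weights, mask
```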
InsidiousApe t1_j658w0e wrote
Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator
This was exactly the kind of answer I was hoping for - a great place to start more research. Thanks!
currentscurrents OP t1_j658kmf wrote
Reply to comment by cthorrez in [R] Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers by currentscurrents
Interesting. That probably explains why in-context learning outperformed fine-tuning by so much in their experiments.
trnka t1_j6583q3 wrote
Reply to comment by InsidiousApe in [D] Simple Questions Thread by AutoModerator
If you're ingesting from an API, typically the limiting factor is the number of API calls or network round trips. So if there's a "search" API or anything similar that returns paginated data, that'll speed it up a LOT (see the sketch below).
If you need to traverse the API to crawl data, that'll slow it down a lot. Like say if there's a "game" endpoint, a "player" endpoint, a "map" endpoint, etc.
If you're working with image data, fetching the images is usually a separate step that can be slow.
After that, if you can fit it in RAM you're good. If it fits on one disk, each ML framework has decent libraries to efficiently load from disk in batches, and you can probably optimize the disk loading too.
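Going back to pagination for a second, the basic pattern looks something like this (hypothetical endpoint and parameter names):

```python
import requests

def fetch_all(base_url, page_size=500):
    """Pull every record from a paginated endpoint, one page per request."""
    records, page = [], 1
    while True:
        resp = requests.get(base_url,
                            params={"page": page, "per_page": page_size})
        resp.raise_for_status()
        batch = resp.json()
        if not batch:  # an empty page means we've read everything
            break
        records.extend(batch)
        page += 1
    return records
```

One request per 500 records instead of one per record is the whole trick.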
----
What you're describing is usually called exploratory data analysis, but it depends on the general direction you want to go in. If you're trying to identify people with thyroid cancer earlier, for example, you might want to compare the data of recently-diagnosed people to similar people who have been tested and found not to have thyroid cancer. Personally, in that situation I like to just train a logistic regression model to predict that from various patient properties, then check if it's predictive on a held-out data sample. If it's predictive I'll then look at the coefficients of the features to understand what's going on, then work to improve the features.
Another simple thing you can do, if the data is small enough and tabular rather than text/image/video/audio is to load it up in Pandas and run .corr then check correlations with the column you care about (has_thyroid_cancer).
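As a rough sketch of both steps (file and column names made up):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("patients.csv")                    # hypothetical dataset
X = df.drop(columns=["has_thyroid_cancer"])         # assumes numeric features
y = df["has_thyroid_cancer"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))

# If it's predictive, the coefficients hint at which features matter.
print(pd.Series(model.coef_[0], index=X.columns).sort_values())

# The quick tabular check: correlations with the column you care about.
print(df.corr(numeric_only=True)["has_thyroid_cancer"].sort_values())
```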
Hope this helps! Happy to follow up too.
currentscurrents t1_j657n2z wrote
Reply to comment by blimpyway in [R] The Predictive Forward-Forward Algorithm by radi-cho
>The so called NPUs. Which are simplified GPUs optimized only for inference (forward passes). Such an algorithm would enable them to learn using only forward passes, hence without requiring backpropagation.
More importantly, you could build even simpler chips that physically implement a neural network out of analog circuits instead of emulating one with digital math.
This would use orders of magnitude less power, and also let you fit a larger network on the same amount of die space.
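The forward-only training idea is simple enough to sketch. This is a toy of Hinton's original Forward-Forward layer (not the predictive variant in this paper): each layer is trained locally to produce high "goodness" on positive data and low goodness on negative data, so no gradient ever crosses layer boundaries.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FFLayer(nn.Module):
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.SGD(self.linear.parameters(), lr=lr)

    def forward(self, x):
        # Length-normalize the input so goodness can't leak in from below.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)  # goodness, positive data
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)  # goodness, negative data
        # Push positive goodness above the threshold, negative below it.
        loss = F.softplus(torch.cat([self.threshold - g_pos,
                                     g_neg - self.threshold])).mean()
        self.opt.zero_grad()
        loss.backward()  # purely local: no gradient crosses layers
        self.opt.step()
        # Hand detached activations to the next layer.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()
```

Since each layer only ever needs its own forward pass and a local update, hardware that can't do backprop at all (analog or otherwise) could still train.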
bCollinsHazel t1_j655sz3 wrote
Reply to [D] MusicLM: Generating Music From Text by carlthome
i love this! it's brilliant. do more.
iidealized t1_j655guq wrote
Paper that seems relevant:
Blutorangensaft OP t1_j654qyd wrote
Reply to comment by jackilion in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
Thank you for the reference, it looks very promising. I've heard of ways to smooth the latent space through Lipschitz regularisation, but then got disappointed again when I read "ah well, it's just layer normalisation". So many things in ML come in different guises and turn out to be the same thing once you implement them.
ApprehensiveNature69 t1_j651pux wrote
Reply to comment by nmfisher in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Yep! This is a known technique - if you search for it, lots of papers on sparse fine-tuning show up. It's a perfectly valid approach.
hot_sauce_in_coffee t1_j64v4ea wrote
Not gonna lie, if Cortana and ChatGPT merge, I'd pay for a Cortana subscription.
element8 t1_j64uglo wrote
Reply to [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Is network pruning in this case analogous to discarding specific evidence for more general intuitions, or is that over-anthropomorphizing? How does it affect future training once pruned? Can the pruning mask be applied during training, since the method is operating within a local subset?
r2m2 t1_j64uah5 wrote
Reply to [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Isn’t this a (somewhat) well-known “free lunch” effect w/ naive one-shot magnitude pruning? I feel like this is a folklore fact for many models like ResNet/VGG (& a paper from a few years back validated the same for BERT)
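For reference, naive one-shot global magnitude pruning is just this (a toy illustration, not the SparseGPT method):

```python
import torch

def magnitude_prune(model, sparsity=0.5):
    """Zero out the smallest-magnitude weights globally, with no retraining."""
    # One global threshold across all weight matrices (biases skipped).
    all_w = torch.cat([p.abs().flatten()
                       for p in model.parameters() if p.dim() > 1])
    k = max(1, int(sparsity * all_w.numel()))
    threshold = all_w.kthvalue(k).values

    with torch.no_grad():
        for p in model.parameters():
            if p.dim() > 1:
                p.mul_((p.abs() > threshold).float())
    return model
```

The surprising "folklore" part is how much sparsity many models tolerate before accuracy drops.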
LetMeGuessYourAlts t1_j64t0lv wrote
I'm doing something similar to your task. My plan is to use GPT-3's text-davinci-003, as it can do this in Instruct mode without modification, and then once I have hundreds to thousands of examples, fine-tune GPT-J on Forefront.ai using what GPT-3 generated, to hopefully cut costs by about 75%.
mil24havoc t1_j64s1uy wrote
Reply to comment by red_dragon in [P] Using algorithms or models from papers for commercial use by romantimm25
The weights are part of the model, not the algorithm. Whether these can be copyrighted is (a) unclear and (b) should have no bearing on the status of the algorithm itself.
Edit: The output of an algorithm has been ruled by courts to not be copyrightable. A Transformer is, itself, the "output" of an algorithm (e.g., SGD). Therefore, IMHO (IANAL), a Transformer cannot be copyrighted. We'll see if the judges who start taking these cases are savvy enough to rule correctly. Similarly, recipes cannot be copyrighted and they're quite similar to algorithms.
red_dragon t1_j64qkqz wrote
Reply to comment by mil24havoc in [P] Using algorithms or models from papers for commercial use by romantimm25
Isn't the main issue with the weights? Are the weights proprietary?
Discombobulated_Bar6 t1_j65orgc wrote
Reply to [D] MusicLM: Generating Music From Text by carlthome
ugh, well there goes my music hopes