Recent comments in /f/MachineLearning
LetWrong1932 t1_j66dzte wrote
Reply to comment by Complete-Drag-2694 in [D] CVPR Reviews are out by banmeyoucoward
yes, try hard on 2 and also don't leave out the two 3s. and sometimes 3 3 3 gets accepted too!
maizeq t1_j66b3l5 wrote
Reply to comment by data-drone in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Chinchilla (70B) is trained with 1.4 trillion, so 140B would presumably need at least 2.8 trillion (it scales linearly afaik).
I’m not sure a 2.8 trillion token dataset actually exists
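The linear extrapolation above follows from the Chinchilla rule of thumb of roughly 20 training tokens per parameter — a rough sketch (the exact ratio and function name are my own shorthand, not from the paper):

```python
def chinchilla_optimal_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Rough Chinchilla rule of thumb: ~20 training tokens per parameter."""
    return params * tokens_per_param

# 70B params -> ~1.4 trillion tokens, matching the Chinchilla figure quoted above
print(chinchilla_optimal_tokens(70e9))   # -> 1400000000000.0
# 140B params -> ~2.8 trillion tokens
print(chinchilla_optimal_tokens(140e9))  # -> 2800000000000.0
```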
bubudumbdumb t1_j66a4kw wrote
The way you prompt assumes there is a single entity for "name", so you catch "balmer" but not "bill gates".
Why not BIO tagging each token for each of the entity types?
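For readers unfamiliar with the scheme, here is a minimal sketch of BIO tagging (the sentence, tags, and helper are made up for illustration): each token gets `B-` at the start of an entity span, `I-` inside one, or `O` outside any entity, which is exactly what makes multi-token names like "bill gates" recoverable.

```python
# BIO tagging: B- begins an entity span, I- continues it, O is outside any entity.
# Hypothetical tagged sentence:
tokens = ["bill", "gates", "met", "balmer", "in", "seattle"]
tags   = ["B-PER", "I-PER", "O", "B-PER", "O", "B-LOC"]

def extract_entities(tokens, tags):
    """Collect (entity_text, entity_type) pairs from BIO-tagged tokens."""
    entities, current, etype = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:  # flush the previous span
                entities.append((" ".join(current), etype))
            current, etype = [tok], tag[2:]
        elif tag.startswith("I-") and current:
            current.append(tok)
        else:  # "O", or a stray I- with no preceding B-
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:  # flush a span that runs to the end of the sentence
        entities.append((" ".join(current), etype))
    return entities

print(extract_entities(tokens, tags))
# -> [('bill gates', 'PER'), ('balmer', 'PER'), ('seattle', 'LOC')]
```

Note how the multi-token span "bill gates" comes out as one entity, which a single-slot "name" prompt misses.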
[deleted] t1_j66a0bw wrote
Reply to comment by MyActualUserName99 in [D] ImageNet2012 Advice by MyActualUserName99
[deleted]
MyActualUserName99 OP t1_j669vl1 wrote
Reply to comment by [deleted] in [D] ImageNet2012 Advice by MyActualUserName99
Where at? Everything I can find is like $3. Cheapest I can find is Google Colab, $10 for 7.5 hours, but you’re limited by RAM and your node can drop at any time.
[deleted] t1_j669ixw wrote
Reply to comment by MyActualUserName99 in [D] ImageNet2012 Advice by MyActualUserName99
[deleted]
[deleted] t1_j669ec0 wrote
Reply to comment by MyActualUserName99 in [D] ImageNet2012 Advice by MyActualUserName99
[deleted]
MyActualUserName99 OP t1_j668o4r wrote
Reply to comment by arg_max in [D] ImageNet2012 Advice by MyActualUserName99
I’ll definitely check it out!
MyActualUserName99 OP t1_j668iiq wrote
Reply to comment by [deleted] in [D] ImageNet2012 Advice by MyActualUserName99
Yes, they had 40GB A100 GPUs in 2015
Screye t1_j6688sf wrote
Reply to [D] MusicLM: Generating Music From Text by carlthome
I am done, man. How is someone supposed to keep up with this pace of research?
MrCheeze t1_j661b7r wrote
Reply to [D] MusicLM: Generating Music From Text by carlthome
This seems like a major increase in quality compared to past attempts. And with long term coherency too, check out those 5 minute tracks.
And if that weren't enough, we even got an additional mode that lets you provide a melody of your own and ask for an arrangement. Should be very useful for composition.
Assuming that these results aren't cherrypicked or otherwise misleading, I'd be very excited to try to make music with an open replication of this.
TankAttack OP t1_j660efo wrote
Reply to comment by visarga in [D] Best large language model for Named Entity Extraction? by TankAttack
How many samples do you use for fine-tuning?
TankAttack OP t1_j660bm9 wrote
Reply to comment by thatphotoguy89 in [D] Best large language model for Named Entity Extraction? by TankAttack
Do you mean free text questions? Like zero shot learning? Are there any examples of this?
arg_max t1_j65z4o9 wrote
Reply to [D] ImageNet2012 Advice by MyActualUserName99
You could use ffcv; it improves data-loading speed and removes some bottlenecks without you needing to change your training code much.
flashdude64 t1_j65z2q4 wrote
Reply to comment by CKtalon in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Do you have a citation for this that I could read?
byan19 t1_j65x7kr wrote
Reply to [D] ICLR 2023 results. by East-Beginning9987
Got rejected due to a reviewer changing his score from 8 to 5.
tornado28 t1_j65wtvp wrote
Reply to [D] ImageNet2012 Advice by MyActualUserName99
You might be able to get some free compute from AWS or GCP
feloneouscat t1_j65vzjx wrote
Reply to comment by Acceptable-Cress-374 in [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116
>Make some minor grammar mistakes while writing the post.
Huh. So you told it to do something it wouldn’t ordinarily do.
This seems akin to a salesman who took a sledgehammer to a product and then argued that it breaks in the field (true story). When you leave that instruction off, does the paragraph get caught? Or did you muck about until you found something that was sure to be judged human-generated?
DigiglobalNOW t1_j65u0ip wrote
Reply to [D] Why are GANs worse than (Latent) Diffusion Models for text2img generation? by TheCockatoo
I feel like if you feed it a ton of videos, it should become complex enough to spit back out a decent high-quality image.
Has anyone found a quicker process than the batch image stitching?
[deleted] t1_j65tyr5 wrote
Reply to [D] ImageNet2012 Advice by MyActualUserName99
[deleted]
Boring_Party8508 t1_j65ti3z wrote
Reply to [D] MusicLM: Generating Music From Text by carlthome
Has anyone found access to the code or paper for this MusicLM?
albertzeyer t1_j65rtdq wrote
Reply to [D] Why are there no End2End Speech Recognition models using the same Encoder-Decoder learning process as BART (no CTC) ? by KarmaCut132
What do you mean? There are many such papers where people only use attention-based encoder-decoder (AED) for speech recognition. Some random papers:
See my PhD thesis for an overview of CTC, AED, RNN-T and other approaches: https://www-i6.informatik.rwth-aachen.de/publications/download/1223/Zeyer--2022.pdf
I call this "sequence-to-sequence architecture".
I think most people nowadays use RNN-T.
Some people use CTC just because of its simplicity; it can also be more stable and behave more sanely on long sequences where AED might break, and online streaming is simpler than with AED.
AED is clearly better than CTC. But RNN-T is also better than CTC.
Of course, a combination is better still: AED+CTC beats either AED or CTC alone. And ESPnet, a very popular open-source framework, has this implemented, so many people just use that.
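On the simplicity point: greedy CTC decoding is just "merge consecutive repeats, then drop blanks". A minimal pure-Python sketch (the per-frame labels are invented for illustration):

```python
BLANK = "_"  # the CTC blank symbol

def ctc_greedy_collapse(frame_labels):
    """Greedy CTC decoding: merge consecutive repeated labels, drop blanks."""
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return "".join(out)

# Per-frame argmax labels for a hypothetical utterance "hello".
# Note the blank between the two l's, which keeps them from merging:
frames = ["h", "h", "_", "e", "l", "l", "_", "l", "o", "o"]
print(ctc_greedy_collapse(frames))  # -> hello
```

The blank between repeated labels is what lets CTC emit doubled characters, which is why the frame sequence above decodes to "hello" rather than "helo".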
vaslor t1_j65qh5b wrote
Reply to comment by Sylv__ in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
Sigh. Was that necessary?
[deleted] t1_j65pti6 wrote
Reply to [D] MusicLM: Generating Music From Text by carlthome
[deleted]
thegreatmarker t1_j66igyg wrote
Reply to comment by Screye in [D] MusicLM: Generating Music From Text by carlthome
idk but I'm burnt out too, every day like 3 new ground-breaking papers drop. It never ends.