Recent comments in /f/MachineLearning
darkshenron t1_j63s68w wrote
Reply to comment by flyer2403 in [Discussion] Github like alternative for ML? by angkhandelwal749
+1 for Dagshub
Vivid-Ad6077 t1_j63qahy wrote
Reply to comment by starfries in [Discussion] Github like alternative for ML? by angkhandelwal749
https://docs.wandb.ai/ref/app/features/panels/code#save-library-code. It's turned off by default for security and privacy reasons, but it can be enabled, or you can optionally log code in individual projects or runs.
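For example, a minimal sketch of the opt-in (project name is illustrative; check the linked docs for your wandb version):

```python
import wandb

# Code saving is off by default; opt in when starting a run:
run = wandb.init(project="my-project", save_code=True)

# Or log source files explicitly for an individual run:
run.log_code(".")  # saves .py files under the current directory with the run
```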
bloc97 t1_j63q1nk wrote
Reply to comment by HateRedditCantQuitit in [D] Why are GANs worse than (Latent) Diffusion Models for text2img generation? by TheCockatoo
>It's simpler (which leads to progress)
I wouldn't say current diffusion models are simpler; in fact, they are much more complex than even the most "complex" GAN architectures. However, it's exactly because of all the other points that they have become this complex. A vanilla GAN would never be able to endure this much tweaking without mode collapse. Compare that to even the most basic score-based models, which are always stable.
Sometimes, the "It just works™" proposition is much more appealing than pipeline simplicity or speed.
starfries t1_j63pp6f wrote
Reply to comment by Vivid-Ad6077 in [Discussion] Github like alternative for ML? by angkhandelwal749
wandb is great, but I had no idea it also versioned code; I'm still using git for that.
bhendel t1_j63nv34 wrote
Reply to comment by TFenrir in [D] MusicLM: Generating Music From Text by carlthome
Using the AI to explain the other AI, love it.
TFenrir t1_j63np8c wrote
Reply to comment by bhendel in [D] MusicLM: Generating Music From Text by carlthome
ChatGPT (so take it with many grains of salt):
> The paper is discussing a machine that can create new songs or music. They are testing to see if the machine is able to memorize songs or if it can come up with new ones. They are looking at how well the machine does when given different amounts of information to work with. They found that even when given a lot of information, the machine is not able to create exact copies of songs. However, it can create similar songs. They also found that when the machine is given very little information, the songs it creates are not very diverse. They include examples of the machine's output in the accompanying material.
anony_sci_guy t1_j63nj0u wrote
Reply to comment by nmfisher in [R] SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot by Secure-Technology-78
This was exactly my first thought too: free up all those extra parameters & re-randomize them. The problem could be that the re-randomized parameters will have a big gap in distribution from the pre-tuned weights, so you'd want different step sizes for them. I've played with it before & ran into this problem, but got too lazy to actually implement a solution. (I'm actually a biologist, so I don't really have the bandwidth to dig into the ML side as much.)
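If anyone wants to try it, a rough PyTorch sketch of the "different step sizes" idea using optimizer parameter groups (the toy model and learning rates are just placeholders):

```python
import torch
import torch.nn as nn

# Toy stand-in: `kept` plays the surviving pre-tuned weights,
# `fresh` the freed-up parameters that get re-randomized.
model = nn.Sequential(nn.Linear(16, 16), nn.Linear(16, 2))
kept, fresh = model[0], model[1]
nn.init.normal_(fresh.weight, std=0.02)  # re-randomize
nn.init.zeros_(fresh.bias)

# Separate step sizes per group, so the fresh weights can move faster
# without dragging the tuned ones out of their basin.
optimizer = torch.optim.Adam([
    {"params": kept.parameters(), "lr": 1e-5},
    {"params": fresh.parameters(), "lr": 1e-3},
])
```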
madmax_br5 OP t1_j63mi7f wrote
Reply to comment by suflaj in [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5
Thank you, this is very helpful!
K3tchM t1_j63l7xu wrote
Reply to comment by CodeAllDay1337 in Machine learning and black box numerical solver[D] by Due-Wall-915
I don't know which numerical optimization problem OP is trying to solve, but one major weakness of this paper is that their method requires two solver calls per instance per epoch, so training time might quickly become intractable.
OP should have a look at other methods that aim to solve this problem efficiently, such as https://arxiv.org/abs/2112.03609 or, more recently, https://arxiv.org/abs/2203.16067
bhendel t1_j63jzdy wrote
Reply to comment by tahansa in [D] MusicLM: Generating Music From Text by carlthome
Anyone got a simpler explanation of that?
5death2moderation t1_j63gfol wrote
Reply to [D] MusicLM: Generating Music From Text by carlthome
No code so who cares
tahansa t1_j63fqca wrote
Reply to [D] MusicLM: Generating Music From Text by carlthome
"Is it a memorization machine or can it create new songs?"
​
From the paper:
"Memorization analysis. Figure 3 reports both exact and
approximate matches when the length of the semantic token
prompt is varied between 0 and 10 seconds. We observe
that the fraction of exact matches always remains very
small (< 0.2%), even when using a 10 second prompt to
generate a continuation of 5 seconds. Figure 3 also includes results for approximate matches, using τ = 0.85.
We can see a higher number of matches detected with this
methodology, also when using only MuLan tokens as input
(prompt length T = 0) and the fraction of matching examples increases as the length of the prompt increases. We
inspect these matches more closely and observe that those
with the lowest matching score correspond to sequences
characterized by a low level of token diversity. Namely, the
average empirical entropy of a sample of 125 semantic tokens is 4.6 bits, while it drops to 1.0 bits when considering
sequences detected as approximate matches with matching
score less than 0.5. We include a sample of approximate
matches obtained with T = 0 in the accompanying material.
Note that acoustic modeling carried out by the second stage
introduces further diversity in the generated samples, also
when the semantic tokens match exactly."
tahansa t1_j63f88w wrote
Reply to [D] MusicLM: Generating Music From Text by carlthome
Incredible stuff.
Gotta get those copyright issues solved with the visual NNs before these audio models hit the mainstream.
The progress of these audio models is getting me much more stoked than that of the image models.
Complete-Drag-2694 t1_j63emsz wrote
Reply to comment by LetWrong1932 in [D] CVPR Reviews are out by banmeyoucoward
Hi, do you think 332 has a chance of getting accepted? The reviewer who gave the 2 said they're willing to improve the score if I can address their concerns (which I think I can).
Thanks!
jackilion t1_j63e6ah wrote
Reply to comment by Blutorangensaft in [D] Quantitative measure for smoothness of NLP autoencoder latent space by Blutorangensaft
There is no reason to assume your latent space will be smooth by itself. I remember a paper for image generation that had techniques for smoothing out the latent space that can be applied during training:
https://arxiv.org/abs/2106.09016
It's about GANs, not autoencoders, but maybe you can find some ideas in there.
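Not that paper's exact method, but as one generic flavor of what "smoothing the latent space during training" can look like, here's a rough PyTorch sketch (penalize how much the decoder output moves under a small latent perturbation; all sizes illustrative):

```python
import torch
import torch.nn as nn

def latent_smoothness(decoder, z, eps=1e-2):
    # Ratio of output movement to latent movement under a small random
    # perturbation; adding this to the loss encourages a locally smoother decoder.
    z_p = z + eps * torch.randn_like(z)
    x, x_p = decoder(z), decoder(z_p)
    num = (x - x_p).flatten(1).norm(dim=1)
    den = (z - z_p).flatten(1).norm(dim=1)
    return (num / den).mean()

decoder = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 32))
penalty = latent_smoothness(decoder, torch.randn(16, 8))  # add lambda * penalty to the loss
```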
Thanos_nap OP t1_j63dyc3 wrote
Reply to comment by vwings in [P] Building a LSTM based model for binary classification by Thanos_nap
There is a temporal component: the customer actions are week-wise, so the data is customer ID, week number, action, converted yes/no.
I can get this into 3D shape with the time step as the week and features = actions, but I'm confused about what the batch would be here.
But yes, I agree with you that this is not the best method for my use case!
vwings t1_j63c9z7 wrote
Reply to comment by guava-bandit in [P] Building a LSTM based model for binary classification by Thanos_nap
Yes, good point. I would recommend using Keras for this modeling task. Once you have the data in the right structure, you can solve this with maybe 25 lines of code ...
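A rough sketch of what that could look like (assuming the data has already been pivoted into a (customers, weeks, actions) tensor; the batch dimension is the customers, and all names and sizes are illustrative):

```python
import numpy as np
from tensorflow import keras

n_customers, n_weeks, n_actions = 1000, 52, 20
X = np.random.rand(n_customers, n_weeks, n_actions)  # one row of action features per week
y = np.random.randint(0, 2, size=n_customers)        # converted: yes/no

model = keras.Sequential([
    keras.layers.Input(shape=(n_weeks, n_actions)),
    keras.layers.LSTM(32),
    keras.layers.Dense(1, activation="sigmoid"),     # binary conversion probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, batch_size=32, epochs=5)
```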
vwings t1_j63c3ss wrote
Reply to comment by teenaxta in [P] Building a LSTM based model for binary classification by Thanos_nap
How do you know that the customer is male?
vwings t1_j63c23v wrote
Reply to comment by Thanos_nap in [P] Building a LSTM based model for binary classification by Thanos_nap
Lol, LSTM for the sake of it. If there is no temporal component, then it's just the wrong model. Can you tell them that Transformers are the "new" LSTMs? Transformers (without positional encodings) handle sets instead of sequences, so they would make a lot of sense in your application.
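For the set idea, a rough PyTorch sketch (no positional encoding, mean-pooling over the action embeddings; purely illustrative, not a tuned architecture):

```python
import torch
import torch.nn as nn

class SetClassifier(nn.Module):
    def __init__(self, n_actions=20, d=64):
        super().__init__()
        self.embed = nn.Embedding(n_actions, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, 1)

    def forward(self, actions):                # actions: (batch, set_size) int ids
        h = self.encoder(self.embed(actions))  # no positional encoding -> treated as a set
        return self.head(h.mean(dim=1))        # pool over the set, then a binary logit

logits = SetClassifier()(torch.randint(0, 20, (8, 10)))  # 8 customers, 10 actions each
```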
flukeskywalker t1_j63bylt wrote
Reply to comment by programmerChilli in [P] EvoTorch 0.4.0 dropped with GPU-accelerated implementations of CMA-ES, MAP-Elites and NSGA-II. by NaturalGradient
This is definitely on our todo list!
suflaj t1_j63bf1q wrote
Reply to comment by madmax_br5 in [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5
Well, for starters, it would probably have worse performance due to so many redundant features, and it would be much slower.
Remember that the embedding layer carries loads of overhead, as we're talking V × d matrices. So for a vocabulary of 250k and an embedding vector of 768, e.g., we're talking about 192M parameters just for the embedding layer. Maybe you can save some space by having a sparse embedder, but find me a free implementation of sparse layers that works as well as dense ones. Other than that, the 192M parameters take up, before compression techniques, about 768 MB of memory at fp32. And that's just the weights; the gradient, unless sparsified, will be another 768 MB per batch.
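Back-of-the-envelope, assuming fp32 (4 bytes per parameter):

```python
vocab_size = 250_000
embed_dim = 768
params = vocab_size * embed_dim  # 192,000,000 parameters in the embedding matrix
weight_mb = params * 4 / 1e6     # ~768 MB for the weights alone
grad_mb = weight_mb              # plus a dense gradient of the same size each step
print(params, weight_mb, grad_mb)
```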
This is without mentioning that you would likely need to increase the embedding dim to account for the 8 times bigger vocabulary.
I-am_Sleepy t1_j639x7w wrote
Reply to comment by RealKillering in [D] Simple Questions Thread by AutoModerator
Check the GPU version with "!nvidia-smi". As for the dataset, this is probably not the GPU's fault but a memory bottleneck. See https://stackoverflow.com/questions/49360888/google-colab-is-very-slow-compared-to-my-pc
TankAttack OP t1_j63scac wrote
Reply to [D] Best large language model for Named Entity Extraction? by TankAttack
I also tried pre-trained tools like spaCy, but they only detect a few fixed entity types.
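For illustration, what the fixed label set looks like in spaCy (assuming the small English model is installed):

```python
import spacy

nlp = spacy.load("en_core_web_sm")  # python -m spacy download en_core_web_sm
doc = nlp("Sundar Pichai announced new Google offices in London on Tuesday.")
for ent in doc.ents:
    print(ent.text, ent.label_)     # only built-in types: PERSON, ORG, GPE, DATE, ...

print(nlp.get_pipe("ner").labels)   # the fixed set of entity labels
```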