Recent comments in /f/MachineLearning
Acceptable-Cress-374 t1_j67w859 wrote
Reply to comment by feloneouscat in [D] Do you know of any model capable of detecting generative model(GPT) generated text ? by CaptainDifferent3116
That was my first try. I went with the gut feeling that any training they used for their model would assume bland prompts, so I made mine different and got 97% human-generated on the first try. Someone else mentioned other things you can try, like messing with temperature; those work as well.
PleasantBase6967 OP t1_j67w2s0 wrote
Reply to comment by PleasantBase6967 in [D] Laptop recommendations for ML by PleasantBase6967
What is your opinion on macOS and the M1 chip for running PyTorch?
PleasantBase6967 OP t1_j67vufs wrote
Reply to comment by marcingrzegzhik in [D] Laptop recommendations for ML by PleasantBase6967
Thanks
marcingrzegzhik t1_j67vr5l wrote
Reply to [D] Laptop recommendations for ML by PleasantBase6967
It depends on the scope of your projects. If you're only training small models (GANs, CNNs, etc.), then a decent modern laptop with 8+ GB of RAM and an Intel i7 or Ryzen 7 processor should suffice. GPUs are nice to have, but with a processor like that you can do most of the work without one. As for the OS, Windows and Linux are both fine, but I'd recommend Linux for ML projects for maximum compatibility. Hope this helps!
Meddhouib10 t1_j67v88c wrote
Reply to comment by Meddhouib10 in [D] ImageNet2012 Advice by MyActualUserName99
Also, use mixed precision. And if you have a TPU, use XLA.
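A minimal sketch of the mixed-precision part in PyTorch; the model and data here are just placeholders, not anything from the thread:

```python
import torch
import torch.nn as nn

# Placeholder model and batch; swap in your real network and ImageNet loader.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()  # scales the loss to avoid fp16 underflow

images = torch.randn(8, 3, 224, 224, device="cuda")
labels = torch.randint(0, 1000, (8,), device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():  # forward pass runs in mixed precision
    loss = nn.functional.cross_entropy(model(images), labels)
scaler.scale(loss).backward()    # backward on the scaled loss
scaler.step(optimizer)
scaler.update()
```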
jimmymvp t1_j67uru2 wrote
Diffusion models are effectively score-based: there's a connection in which the reversal of the forward process is Gaussian, and via the noise estimate you're effectively using scores of Gaussians in the reverse process. The time variable is irrelevant in the sense of scale; discrete time and continuous time do essentially the same thing. The difference is that one is tied to a specific discretization of the SDE while the other can be solved to arbitrary precision, and it also matters whether you take steps with respect to variance or with respect to time. The continuous formulation should be the limit of the discrete one, so you can effectively take a discrete sampling method and turn it into a continuous SDE/ODE.
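To make that last point concrete, here's a toy Euler-Maruyama sampler for a reverse-time VP-SDE. It's only a sketch: constant beta, and the analytic score of N(0, I) standing in for a learned score/noise network.

```python
import torch

beta = 1.0  # constant noise schedule for simplicity

def score(x, t):
    # If the data is N(0, I), the VP marginals stay N(0, I) and the score
    # is exactly -x; a trained score model would replace this function.
    return -x

x = torch.randn(16, 2)        # samples from the prior at t = 1
n_steps = 1000
dt = 1.0 / n_steps
for i in range(n_steps):      # integrate backwards from t = 1 to t = 0
    t = 1.0 - i * dt
    drift = -0.5 * beta * x - beta * score(x, t)
    x = x - drift * dt + (beta * dt) ** 0.5 * torch.randn_like(x)

print(x.mean().item(), x.std().item())  # should stay roughly 0 and 1
```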
mr_birrd t1_j67u5x5 wrote
Reply to comment by Boring_Party8508 in [D] MusicLM: Generating Music From Text by carlthome
The paper is linked at the top of the page; per its last lines, they won't release code.
marcingrzegzhik t1_j67tza8 wrote
No, the license does not mean you cannot use the ideas from the paper in a commercial product. It just means that you cannot use the work itself or any derivative works for commercial purposes. However, you can use the ideas from the paper, as long as you don’t directly copy or use any of the code/materials from the paper. To be safe, you should also make sure that you don’t infringe on any patents associated with the paper.
Meddhouib10 t1_j67tmqr wrote
Reply to comment by deepestdescent in [D] ImageNet2012 Advice by MyActualUserName99
+1 but I think he already knows that
Meddhouib10 t1_j67tlnc wrote
Reply to [D] ImageNet2012 Advice by MyActualUserName99
You can train/pretrain only on ImageNet1k and compare with SwinV2 and ConvNeXtV2 trained only on ImageNet1k (both report results in this setting).
visarga t1_j67sivp wrote
Reply to comment by TankAttack in [D] Best large language model for Named Entity Extraction? by TankAttack
My task uses sentence pairs, and I have an efficient prompt that makes many pairs in one go. So in 5 hours I managed to generate 230K pairs. Cost $10. I plan to generate millions to "exfiltrate" more domain knowledge for the small and efficient models I am training downstream.
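Roughly the shape of the trick, if anyone wants it; the prompt, model name, and delimiter below are placeholders, not my actual setup:

```python
import openai

# Hypothetical sketch of "many pairs in one go": ask for N pairs per request
# and parse them from a single completion, instead of one API call per pair.
PROMPT = (
    "Generate 20 sentence pairs about <your domain>, one pair per line, "
    "formatted as: first sentence ||| second sentence"
)

resp = openai.Completion.create(
    model="text-davinci-003", prompt=PROMPT, max_tokens=1500, temperature=0.9
)
lines = resp["choices"][0]["text"].splitlines()
pairs = [line.split(" ||| ") for line in lines if " ||| " in line]
```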
marcingrzegzhik t1_j67s9fp wrote
Reply to [D] Could forward-forward learning enable training large models with distributed computing? by currentscurrents
Forward-forward learning is a very interesting concept, and I think that in some cases it could definitely yield better results than distributed learning with backprop. It really depends on the size of the model, the latency of the connection, and the bandwidth of the slowest machine. I'm sure that in some cases it could be much faster, but I'm curious to know if there are any other advantages to using forward-forward learning over backprop for distributed learning.
visarga t1_j67q45m wrote
Reply to comment by madmax_br5 in [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5
The solution is to put more text in the other languages and re-train the tokeniser; it will adapt to the larger corpus by assigning more tokens to those languages.
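Something like this with the HuggingFace tokenizers library (file names and vocab size are placeholders):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Retrain a BPE tokenizer on a corpus with more non-English text, so the
# merges spend more of the vocabulary on those languages.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=50_000, special_tokens=["[UNK]"])
tokenizer.train(["corpus_en.txt", "corpus_hi.txt", "corpus_zh.txt"], trainer)
tokenizer.save("retrained-bpe.json")
```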
visarga t1_j67pv49 wrote
Reply to comment by HateRedditCantQuitit in [D] Moving away from Unicode for more equal token representation across global languages? by madmax_br5
It's also the fact that content in English dwarfs content in other languages, and languages more similar to English also benefit, but not languages that have different scripts and fewer cognates.
Complete-Drag-2694 t1_j67opme wrote
Reply to comment by LetWrong1932 in [D] CVPR Reviews are out by banmeyoucoward
Got it, thx!
master3243 t1_j67mcbh wrote
Reply to comment by currentscurrents in [D] Could forward-forward learning enable training large models with distributed computing? by currentscurrents
> A variant paper, Predictive Forward-Forward
Interesting, I'll have to read it at a more convenient time.
Do share your results if they are promising/fruitful.
currentscurrents OP t1_j67lie8 wrote
Reply to comment by master3243 in [D] Could forward-forward learning enable training large models with distributed computing? by currentscurrents
I'm messing around with it to try to scale to a non-toy problem, maybe try to adapt it to one of the major architectures like CNNs or transformers. I'm not sitting on a ton of compute though, it's just me and my RTX 3060.
A variant paper, Predictive Forward-Forward, claims performance equal to backprop. They operate the model in a generative mode to create the negative data.
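For anyone curious, the per-layer objective is simple enough to sketch. This is my paraphrase with random placeholder batches, not code from either paper:

```python
import torch
import torch.nn as nn

# Forward-forward in one layer: push the "goodness" (sum of squared
# activations) above a threshold on positive data and below it on negative
# data, so no gradient ever crosses a layer boundary.
layer = nn.Linear(784, 512)
opt = torch.optim.SGD(layer.parameters(), lr=0.03)
theta = 2.0  # goodness threshold

pos = torch.randn(32, 784)  # placeholder positive batch
neg = torch.randn(32, 784)  # placeholder negative batch

g_pos = torch.relu(layer(pos)).pow(2).sum(dim=1)
g_neg = torch.relu(layer(neg)).pow(2).sum(dim=1)
loss = (torch.nn.functional.softplus(-(g_pos - theta)).mean()
        + torch.nn.functional.softplus(g_neg - theta).mean())
opt.zero_grad()
loss.backward()  # gradients stay within this one layer
opt.step()
```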
master3243 t1_j67jwad wrote
Reply to [D] Could forward-forward learning enable training large models with distributed computing? by currentscurrents
Hinton says that it does not generalize as well on the toy problems he investigates. An algorithm not doing well on toy problems is often not a good sign. I predict that unless someone discovers a breakthrough, it will be worse than backprop despite running faster (due to not having the bottlenecks, as you suggested).
deepestdescent t1_j67iajc wrote
Reply to [D] ImageNet2012 Advice by MyActualUserName99
I use PyTorch data loaders to load batches into memory in the background. I believe TensorFlow has similar functionality with tf.data. This should make your data loading speed basically negligible if you have a few CPU cores lying around.
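Something like this (the path and numbers are placeholders to tune for your machine):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.ImageFolder(
    "/data/imagenet/train",
    transform=transforms.Compose(
        [transforms.RandomResizedCrop(224), transforms.ToTensor()]
    ),
)
loader = DataLoader(
    train_set,
    batch_size=256,
    shuffle=True,
    num_workers=8,      # CPU workers decode JPEGs while the GPU trains
    pin_memory=True,    # page-locked memory speeds up host-to-GPU copies
    prefetch_factor=4,  # batches each worker keeps queued ahead of time
)

for images, labels in loader:
    pass  # training step goes here
```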
Dagusiu t1_j67hyzk wrote
I've written papers about methods that I applied for a patent for before the paper was published. Proceed with extreme caution.
yauangon t1_j67hram wrote
Reply to [D] Simple Questions Thread by AutoModerator
I'm trying to improve a CNN encoder used as a feature extractor for an AMT (automatic music transcription) model. As the model must be small and fast (for mobile deployment), we are limited to about 3-6 layers of 1D CNN. I want to improve the encoder with residual blocks (as in ResNet), but I don't know whether residual blocks would benefit such a shallow CNN architecture. Thanks everyone :D
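For reference, this is the kind of block I mean (channel count and kernel size are placeholders):

```python
import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    """Minimal 1D residual block: two convs plus a skip connection."""

    def __init__(self, channels: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2  # "same" padding keeps the frame count fixed
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels),
        )
        self.act = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.body(x) + x)  # skip connection

x = torch.randn(1, 64, 1000)  # (batch, channels, frames)
print(ResBlock1d(64)(x).shape)
```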
Shevizzle t1_j67fv7x wrote
Reply to [D] Could forward-forward learning enable training large models with distributed computing? by currentscurrents
Ray would potentially be a good platform for this.
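Rough sketch of the shape that would take; the task body is hypothetical:

```python
import ray

ray.init()

# Since forward-forward trains each layer with a purely local objective,
# every layer update can be dispatched as an independent Ray task.
@ray.remote
def train_layer(layer_id: int, activations: list) -> str:
    # ...local goodness update for this layer would go here...
    return f"layer {layer_id} updated"

futures = [train_layer.remote(i, []) for i in range(4)]
print(ray.get(futures))  # gathers results once all tasks finish
```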
knestleknox t1_j67ezg0 wrote
Reply to [D] MusicLM: Generating Music From Text by carlthome
As someone who works a lot with both music and ML, I'm really excited to see these multi-modal approaches. The image description -> music generation was really cool to see. But it would be incredible to see a (good/large) multi-modal model that can go from audio -> image. Free album artwork and visualizations for all my songs.
purplebrown_updown t1_j67e5vz wrote
Reply to [D] Meta AI Residency 2023 by BeautyInUgly
How are they hiring? I would be cautious about them rescinding offers. Seriously. I've heard of interns who had their offers rescinded after they'd declined other offers.
Grenouillet t1_j67wexx wrote
Reply to comment by currentscurrents in [D] Could forward-forward learning enable training large models with distributed computing? by currentscurrents
That's very interesting. Is there a way to follow your progress?