Recent comments in /f/MachineLearning
CeFurkan OP t1_j7tob1u wrote
Reply to comment by logsinh in [D] Are there any AI model that I can use to improve very bad quality sound recording? Removing noise and improving overall quality by CeFurkan
>Nvidia RTX voice
Here's an example link; you can download it and extract the audio quickly if you wish: https://youtu.be/2zY1dQDGl3o
And here's a 5-minute example of the speech: https://sndup.net/stjs/
CeFurkan OP t1_j7to9qw wrote
Reply to comment by starstruckmon in [D] Are there any AI model that I can use to improve very bad quality sound recording? Removing noise and improving overall quality by CeFurkan
These are pre-existing recordings. How can I use it to process these recordings quickly?
CeFurkan OP t1_j7to8kq wrote
Reply to comment by Dry-Feature113 in [D] Are there any AI model that I can use to improve very bad quality sound recording? Removing noise and improving overall quality by CeFurkan
Here's an example lecture; I will clean over 100 lectures: https://youtu.be/2zY1dQDGl3o
And here's a 5-minute excerpt of this video: https://sndup.net/stjs/
CeFurkan OP t1_j7tnxp9 wrote
Reply to comment by logsinh in [D] Are there any AI model that I can use to improve very bad quality sound recording? Removing noise and improving overall quality by CeFurkan
Yep, not confidential.
How can I reach you? Here's my email: monstermmorpg@gmail.com
jeanfeydy t1_j7tnqu0 wrote
Reply to [D] Are there any AI model that I can use to improve very bad quality sound recording? Removing noise and improving overall quality by CeFurkan
I used https://audo.ai/noise-removal for my own lectures: it’s more than good enough to make up for a poor microphone and background noise. You can try for free on your own audio samples and see for yourself!
Iunaml t1_j7tn4yr wrote
Reply to comment by JackBlemming in [N] New Book on Synthetic Data: Version 3.0 Just Released by MLRecipes
> There's nothing wrong with relevant self promotion, especially if it's high quality material.
Who is the judge?
Do I really care about the quality if it's a paid book that isn't upfront about its price? What does that tell us about the author and the information inside the book?
master3243 t1_j7tmpsz wrote
Reply to comment by DigThatData in [D] Are there emergent abilities of image models? by These-Assignment-936
Exactly. The CLIP front end of the DALL-E model is trained to map any English text to an embedding space.
It's completely natural (and it would probably be surprising if it didn't happen) that CLIP maps (some) gibberish words to a part of the embedding space that is sufficiently close in L2 distance to the embedding of a real word.
In that case, the diffusion model would decode that gibberish word into an image similar to the one generated by the real word.
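A toy numpy sketch of that idea (hypothetical 8-dimensional embeddings, not real CLIP vectors, which are hundreds of dimensions): a "gibberish" token whose embedding happens to land within a small L2 distance of a real word's embedding would be decoded like that word.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embedding table; real CLIP embeddings are learned, not random.
vocab = {
    "dog": rng.normal(size=8),
    "car": rng.normal(size=8),
    "tree": rng.normal(size=8),
}

# A "gibberish" token whose embedding lands very close to "dog".
gibberish = vocab["dog"] + 0.05 * rng.normal(size=8)

def nearest_word(vec, vocab):
    """Return the vocab word whose embedding is closest in L2 distance."""
    return min(vocab, key=lambda w: np.linalg.norm(vocab[w] - vec))

# The diffusion decoder would effectively "see" this nearby real word.
print(nearest_word(gibberish, vocab))
```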
wittfm t1_j7tl9c1 wrote
codename_failure t1_j7tl6f5 wrote
Reply to comment by VectorSpaceModel in What are the best resources to stay up to date with latest news ? [D] by [deleted]
well done little AI!
pommedeterresautee OP t1_j7tk4fx wrote
Reply to comment by SnooHesitations8849 in [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl by pommedeterresautee
On large DL models like Whisper large, CPU is never on par with GPU, because CPUs are latency-oriented hardware while GPUs are throughput-oriented. The only way large models run on CPUs is by reducing the number of operations to perform, e.g. through sparsification or pruning.
Moreover, PyTorch is mostly C++ with a Python layer over it (for now at least; PyTorch 2.0 may be the start of a change in this architecture). The Python layer accounts for most of PyTorch's latency.
And even a C++ engine launching operations on the GPU cannot match CUDA graphs (most of the time at least), because you still have to send one instruction at a time, and there is still some latency overhead in running things that way, just much less than from Python. With CUDA graphs there is almost none at all. There is a second benefit not discussed here: the graph of instructions itself is optimized.
The main drawback of CUDA graphs is the memory overhead: you need at least double the space for input tensors. On generative models with a K/V cache this matters, as explained in the post. Plus you need to copy input tensors, which offsets a very small part of the gains (at least that's what we saw in our tests on Whisper and BERT/RoBERTa).
That is why TensorRT (a big C++ piece) for instance supports CUDA graphs.
Still, TBH, as you pointed out, the most important thing is that ... it's easier to build and run :-)
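A CPU-only analogy of the dispatch-overhead point above (pure numpy, no CUDA; the timings are illustrative, not a benchmark): issuing many tiny Python-dispatched operations costs far more than one batched call, which is the same class of per-launch overhead that CUDA graphs eliminate on the GPU.

```python
import time
import numpy as np

x = np.ones((1000, 64))

def many_small_calls(x):
    # 1000 separate Python-dispatched ops: per-call overhead dominates.
    return np.stack([row * 2.0 for row in x])

def one_batched_call(x):
    # A single dispatched op over the whole batch: overhead paid once.
    return x * 2.0

t0 = time.perf_counter()
a = many_small_calls(x)
t1 = time.perf_counter()
b = one_batched_call(x)
t2 = time.perf_counter()

assert np.allclose(a, b)  # identical results, very different dispatch cost
print(f"per-row: {t1 - t0:.6f}s, batched: {t2 - t1:.6f}s")
```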
Fit_Schedule5951 t1_j7tk2db wrote
Reply to [D] Are there any AI model that I can use to improve very bad quality sound recording? Removing noise and improving overall quality by CeFurkan
Try the denoiser from Facebook (facebookresearch/denoiser).
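Facebook's denoiser is a neural model with pretrained weights; as a point of reference for what "denoising" means here, below is a minimal classical spectral-subtraction sketch in numpy on a toy signal (a pure tone standing in for speech). It uses an oracle noise spectrum purely for illustration; real systems estimate the noise spectrum from silent segments, and neural denoisers improve substantially on this baseline.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8192
t = np.arange(n) / 8000.0
clean = np.sin(2 * np.pi * 440 * t)        # toy "speech": a 440 Hz tone
noise = 0.3 * rng.normal(size=n)           # broadband noise
noisy = clean + noise

def spectral_subtract(noisy, noise_mag):
    """Classical spectral subtraction: subtract a noise magnitude
    spectrum, clamp at zero, and keep the noisy signal's phase."""
    spec = np.fft.rfft(noisy)
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    return np.fft.irfft(mag * np.exp(1j * np.angle(spec)), n=len(noisy))

# Oracle noise spectrum for illustration only.
denoised = spectral_subtract(noisy, np.abs(np.fft.rfft(noise)))

def snr_db(ref, x):
    """Signal-to-noise ratio of x relative to the reference, in dB."""
    return 10 * np.log10(np.sum(ref**2) / np.sum((ref - x)**2))

print(f"noisy: {snr_db(clean, noisy):.1f} dB, denoised: {snr_db(clean, denoised):.1f} dB")
```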
SnooHesitations8849 t1_j7tjdla wrote
A C++ implementation on CPU would be on par with Python's implementation on GPU. Just mind-blowing how much you can gain from using C++. But for sure, C++ is way harder to code.
[deleted] t1_j7tico3 wrote
Cantmentionthename t1_j7tfbam wrote
Reply to comment by edjez in [D] Are there emergent abilities of image models? by These-Assignment-936
Dayum. That just sounds like generative communication.
starstruckmon t1_j7telrz wrote
Hersmunch t1_j7tc38t wrote
VectorSpaceModel t1_j7tbm5k wrote
Reply to comment by nashtashastpier in What are the best resources to stay up to date with latest news ? [D] by [deleted]
hold on to your papers!
Mescallan t1_j7tblf5 wrote
Reply to comment by amnezzia in [D] Are there emergent abilities of image models? by These-Assignment-936
"Word" might not be the right term, as it implies a consistent alphabet, but semantics aside, yes, I believe that is what is happening.
slashdave t1_j7tbl3o wrote
nashtashastpier t1_j7tbk3x wrote
Reply to comment by VectorSpaceModel in What are the best resources to stay up to date with latest news ? [D] by [deleted]
What a time to be alive!
DigThatData t1_j7tb03a wrote
Reply to comment by edjez in [D] Are there emergent abilities of image models? by These-Assignment-936
i'm not sure that's an emergent ability so much as it is explicitly what the model is being trained to learn. it's not surprising to me that there is a "painting signature" concept it has learned and samples from when it generates gibberish of a particular length and size in the bottom right corner (for example). that sounds like one of the easier "concepts" it would have learned.
ID4gotten t1_j7taz5k wrote
Some 3-dimensional understanding and up/down/gravity seem possible. I think examples of light/shadow/reflection have already been shown. I can't see how it could ever do full ray tracing, but maybe there are heuristics (or overfitting) to be found.
amnezzia t1_j7tav4g wrote
Reply to comment by edjez in [D] Are there emergent abilities of image models? by These-Assignment-936
You mean it takes a mean vector of a cluster and makes up a word for it?
andreichiffa t1_j7t9ul8 wrote
I am pretty sure that was an Anthropic paper first (Predictability and Surprise in Large Generative Models). Makes me truly wonder WTF exactly is going on in Google lately.
As to your question, no one has stacked enough attention layers yet, but there is very high probability that they will. Someone already mentioned the ability to spell, but it could potentially help with things such as hands, number of hands/feet/legs/arms/paws/tails and other things that make a lot of generated images today disturbing.
The issue will most likely be with finding enough data, given that unlike text, most images on the internet are copyrighted (cough Getty cough).
CeFurkan OP t1_j7todbc wrote
Reply to comment by Fit_Schedule5951 in [D] Are there any AI model that I can use to improve very bad quality sound recording? Removing noise and improving overall quality by CeFurkan
>denoiser
I need a post-processor for existing recordings. Would that work for that? Could you give me a link?