Recent comments in /f/MachineLearning

master3243 t1_j7tmpsz wrote

Exactly, the CLIP component at the front of the whole DALL·E model is trained to take any English text and map it into an embedding space.

It's completely natural (and it would arguably be surprising if it didn't happen) that CLIP maps (some) gibberish words to a part of the embedding space that is sufficiently close, in L2 distance, to the embedding of a real word.

In that case, the diffusion model decodes that gibberish word into an image similar to the one the real word would generate.
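A toy sketch of that nearest-real-word idea (plain Python, made-up 4-d vectors standing in for real CLIP embeddings, which are 512-d or larger):

```python
import math

def l2(a, b):
    """Euclidean (L2) distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical embeddings of real words.
embeddings = {
    "bird":  [0.9, 0.1, 0.0, 0.2],
    "plane": [0.1, 0.8, 0.3, 0.0],
}
# Hypothetical embedding that CLIP assigned to a nonsense token.
gibberish = [0.85, 0.15, 0.05, 0.18]

# The decoder doesn't know the token was gibberish: it only sees a point
# in embedding space, which here lands closest to "bird".
nearest = min(embeddings, key=lambda w: l2(gibberish, embeddings[w]))
print(nearest)  # -> bird
```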

2

pommedeterresautee OP t1_j7tk4fx wrote

On large DL models like Whisper large, CPU is never on par with GPU, because a CPU is latency-oriented hardware while a GPU is throughput-oriented. The only way large models run on CPU is by reducing the number of operations to perform, e.g., through sparsification or pruning.

Moreover, PyTorch is mostly C++ with a Python layer on top (for now, at least; PyTorch 2.0 may be the start of a change in that architecture). The Python layer accounts for most of PyTorch's latency.

And then, even a C++ engine launching operations on the GPU cannot be on par with CUDA graphs (most of the time, at least), because you still have to send one instruction at a time, and there is still some latency overhead associated with running things that way, just much less than with Python. With CUDA graphs there is almost none at all. There is a second benefit not discussed here: the graph of instructions itself is optimized.

The main drawback of CUDA graphs is the memory overhead: you need at least double the space for input tensors. On generative models with a K/V cache, this matters, as explained in this post. Plus you need to copy input tensors into the captured buffers, which offsets a (very small) part of the gains (at least that's what we saw in our tests on Whisper and BERT/RoBERTa).
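For anyone who hasn't used them, here is a minimal capture-and-replay sketch with PyTorch's CUDA graph API (`torch.cuda.CUDAGraph`; the model and shapes are made up for illustration, and it falls back gracefully on CPU-only machines). Note the static input buffer: fresh data has to be copied into it before each replay, which is the copy overhead mentioned above.

```python
import torch

def run():
    if not torch.cuda.is_available():
        return "skipped"  # CUDA graphs need a GPU

    model = torch.nn.Linear(64, 64).cuda()
    # Static buffer: lives for the lifetime of the graph (the memory overhead).
    static_input = torch.randn(8, 64, device="cuda")

    # Warm-up on a side stream (required before capture).
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            model(static_input)
    torch.cuda.current_stream().wait_stream(s)

    # Capture: every kernel launch is recorded once into the graph.
    g = torch.cuda.CUDAGraph()
    with torch.cuda.graph(g):
        static_output = model(static_input)

    # Replay: copy fresh data into the static buffer, then launch the whole
    # recorded graph with a single call -- no per-kernel launch overhead.
    static_input.copy_(torch.randn(8, 64, device="cuda"))
    g.replay()
    return "replayed"

print(run())
```

This follows the capture pattern from the PyTorch docs; in a real generative loop you would replay the same graph on every decoding step.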

That is why TensorRT (a big piece of C++), for instance, supports CUDA graphs.

Still, TBH, as you pointed out, the most important thing is that ... it's easier to build and run :-)

14

DigThatData t1_j7tb03a wrote

i'm not sure that's an emergent ability so much as it is explicitly what the model is being trained to learn. it's not surprising to me that there is a "painting signature" concept it has learned and samples from when it generates gibberish of a particular length and size in the bottom right corner (for example). that sounds like one of the easier "concepts" it would have learned.

11

andreichiffa t1_j7t9ul8 wrote

I am pretty sure that was an Anthropic paper first (Predictability and Surprise in Large Generative Models). Makes me truly wonder WTF exactly is going on in Google lately.

As to your question, no one has stacked enough attention layers yet, but there is a very high probability that someone will. Someone already mentioned the ability to spell, but it could potentially help with things such as hands, the number of hands/feet/legs/arms/paws/tails, and other details that make a lot of today's generated images disturbing.

The issue will most likely be finding enough data, given that, unlike text, most images on the internet are copyrighted (cough Getty cough).

6