Recent comments in /f/MachineLearning
MysteryInc152 t1_j6jkmus wrote
Reply to comment by currentscurrents in [N] OpenAI has 1000s of contractors to fine-tune codex by yazriel0
The human brain has trillions of synapses (the closest biological equivalent to parameters), is multimodal, and has been fine-tuned by evolution.
fasttosmile t1_j6jk30j wrote
Reply to comment by jiamengial in [D] What's stopping you from working on speech and voice? by jiamengial
Everyone has been moving on from kaldi so it's a little weird to bring that up now.
If you're interested in a modern format for speech data, look into Lhotse.
TheCoconutTree t1_j6jjb43 wrote
Reply to [D] Simple Questions Thread by AutoModerator
Discrete features as training data:
Say I am using SQL table rows as training data input for a deep neural net classifier. One of the columns contains a number from 1-5 representing a discrete value, say type of computer connection. It could be wifi, mobile-data, LAN, etc. What would be the best way to represent this as input features? Right now I'm thinking of splitting it into a five-dimensional vector, one dimension per possible value, then passing 0 or 1 depending on whether a given value is present. I'm worried that including the raw 1-5 value as a single feature would lead to messed-up learning, since one discrete value doesn't have any meaningful closeness to its nearest discrete neighbor.
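What you're describing is standard one-hot encoding. A minimal sketch in plain Python (the category names here are invented for illustration; libraries like scikit-learn's `OneHotEncoder` do the same thing at scale):

```python
# Hypothetical category labels for the 1-5 "connection type" column.
CONNECTION_TYPES = ["wifi", "mobile-data", "LAN", "bluetooth", "ethernet"]

def one_hot(value: int, num_classes: int = 5) -> list[float]:
    """Map a 1-based discrete value to a one-hot vector."""
    vec = [0.0] * num_classes
    vec[value - 1] = 1.0
    return vec

# A column value of 3 ("LAN") becomes [0, 0, 1, 0, 0], so no category
# is numerically "closer" to another than any other category.
features = [one_hot(row) for row in [1, 3, 5]]
```

This avoids exactly the problem you're worried about: with one-hot vectors, every pair of categories is equidistant, so the network can't read a spurious ordering into the column.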
AquacateEnojado t1_j6jjab9 wrote
Sleep
tennismlandguitar OP t1_j6jiktr wrote
Reply to comment by Omnes_mundum_facimus in [D] ML Researchers/Engineers in Industry: Why don't companies use open source models more often? by tennismlandguitar
LOL thanks for adding this.
psma t1_j6jhwdl wrote
Reply to comment by jiamengial in [D] What's stopping you from working on speech and voice? by jiamengial
Not sure how. If I have, e.g., a PyTorch model, how do I deploy it for streaming data without having to rewrite it in another framework? (e.g., stateful convolutions, the ability to receive an arbitrary number of samples as input, etc.) It's doable, but it mostly amounts to rewriting your model. This should be automated.
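To make the "stateful convolutions" pain concrete, here's a minimal framework-free sketch of the bookkeeping a streaming rewrite forces on you: the layer has to carry the tail of the previous chunk so that filtering chunked audio matches the offline result. The class and names are illustrative, not from any real library:

```python
class StreamingConv1d:
    """Causal 1-D convolution that accepts arbitrary-length chunks.

    Carries the last (kernel_size - 1) samples between calls so that
    chunked output matches running the filter over the whole signal.
    """

    def __init__(self, kernel: list[float]):
        self.kernel = kernel
        self.state = [0.0] * (len(kernel) - 1)  # left-context buffer

    def forward(self, chunk: list[float]) -> list[float]:
        x = self.state + chunk  # prepend carried context
        k = len(self.kernel)
        out = [
            sum(self.kernel[j] * x[i + j] for j in range(k))
            for i in range(len(chunk))
        ]
        self.state = x[len(x) - (k - 1):]  # keep tail for next chunk
        return out

# Feeding the signal in two chunks matches filtering it in one pass:
conv = StreamingConv1d([0.5, 0.5])
chunked = conv.forward([1.0, 2.0]) + conv.forward([3.0, 4.0])
```

Every conv layer in an offline model needs this kind of state threading when you go streaming, which is why it "mostly amounts to rewriting your model".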
theunixman t1_j6jhf69 wrote
Reply to comment by farmingvillein in [R] Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models - Stanford University Eric Zelikman et al - Beats prior code generation sota by over 75%! by Singularian2501
Right, turning it into an actual DSL would be much better, and then you'd have better semantics for the library. But honestly I'm bored talking about aesthetics already, peace.
farmingvillein t1_j6jgv48 wrote
Reply to comment by theunixman in [R] Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models - Stanford University Eric Zelikman et al - Beats prior code generation sota by over 75%! by Singularian2501
And this isn't a good thing, it is a necessary thing--we do it because someone bundled some logic together and you need to interact with it.
None of this addresses whether or why something like Parsel is necessary as an intermediate step. The authors do very little to justify the necessity of an intermediate representation; there is no meaningful analysis of why it apparently performs better, nor an ablation analysis to try to close the gaps.
The key benefits--like enforced test cases--could, hypothetically, very easily be enforced in something like Python, or many other languages.
And given the massive volumes of training data we have for these other languages, there are a lot of good reasons to think that we should be able to see equal or better behavior than with a wholly manufactured pseudocode (effectively) language.
The paper would have been much more convincing and interesting if, e.g., they started with something like python and progressively added the restrictions that apparently helped Parsel provide higher quality results.
mettle OP t1_j6jgkz8 wrote
Reply to comment by currentscurrents in [Discussion] ChatGPT and language understanding benchmarks by mettle
Sure, but the question is how often it happens to get the right answer vs. the wrong answer, and how we would measure that.
theunixman t1_j6jff5n wrote
Reply to comment by farmingvillein in [R] Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models - Stanford University Eric Zelikman et al - Beats prior code generation sota by over 75%! by Singularian2501
We have to learn APIs all the time, and basically they're all DSLs that just don't admit it, so they're even harder.
farmingvillein t1_j6jdazy wrote
Reply to comment by [deleted] in [R] Parsel: A (De-)compositional Framework for Algorithmic Reasoning with Language Models - Stanford University Eric Zelikman et al - Beats prior code generation sota by over 75%! by Singularian2501
This is, at best, a distinction without a difference.
The authors literally describe it as "language".
It gets "compiled".
It generates a "Parsel program".
It holds a distinct learning curve such that a user can be an "expert".
The point here is that it is a unique specification that needs to be separately learned--it asks the user to learn, in essence, a domain-specific language. Or, if you prefer, a domain-specific specification; the point stands either way.
pink-science t1_j6jc8qd wrote
It's not clear whether Tiny Papers will be included in the proceedings. Will they count as ICLR publications?
currentscurrents t1_j6jbokk wrote
I think hallucination occurs because of the next-word-prediction task on which these models were trained. No matter how good a model is, it can never predict the irreducible entropy of the sentence - the 1.5 bits per word or whatever that contains the actual information content. The best it can do is guess.
This is exactly what hallucination looks like; all the sentence structure is right, but the information is wrong. Unfortunately, this is also the most important part of the sentence.
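The "irreducible entropy" point can be made concrete: even a perfectly calibrated model ends up with a next-word distribution that still carries entropy, and any single completion it emits is a guess against that residual uncertainty. A toy sketch (the distribution here is invented, not from any real model):

```python
import math

def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits of a next-word distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Even a perfect model completing "The capital of X is ___" may
# legitimately spread probability over several plausible city names:
next_word = [0.5, 0.25, 0.125, 0.125]
h = entropy_bits(next_word)  # 1.75 bits the model cannot predict away
```

Whatever token gets sampled, the model is resolving those bits by guessing, and when the guess lands on a fluent-but-wrong token, that's exactly the hallucination pattern described above.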
[deleted] t1_j6j9yun wrote
jiamengial OP t1_j6j95fq wrote
Reply to comment by psma in [D] What's stopping you from working on speech and voice? by jiamengial
Presumably this would be through certain protocols like WebSockets and WebRTC? Or more like direct integration with Zoom?
jiamengial OP t1_j6j8c8c wrote
Reply to comment by jiamengial in [D] What's stopping you from working on speech and voice? by jiamengial
To go into your question further, one area that might be really interesting is open standards or formats for speech data; like the MLF formats in HTK and Kaldi but, like, modern. That way (to the point of some others here w.r.t. data storage costs) datasets could be hosted more centrally and people wouldn't have to reformat them into their own data storage structures (which, let's face it, are basically someone's folder structure).
EmmyNoetherRing t1_j6j7zq4 wrote
Reply to comment by mettle in [Discussion] ChatGPT and language understanding benchmarks by mettle
I wouldn’t mind being one of those folks. But you make a good point that the old rubrics may not be capturing it.
If you want to nail down what users are observing as its comparison to human performance, practically speaking you may need to shift to diagnostics that were designed to evaluate human performance. With the added challenge of avoiding tests where the answer sheet would already be in its training data.
jiamengial OP t1_j6j6ruc wrote
Reply to comment by blackkettle in [D] What's stopping you from working on speech and voice? by jiamengial
If anything this is what's motivating me; getting Kaldi (or any of these other repos) to compile and run on your own data is usually painful enough that it puts off anyone who isn't already knowledgeable in the area, and wrappers such as pykaldi and Montreal Forced Aligner try to resolve a lot of those problems but only really add to them.
I've personally had great experiences with repos like NeMo, though that was mainly through nailing myself to a specific commit on the main branch and heavily wrapping the various classes I needed to use (I still have no idea what a manifest file format should look like).
The field is still incredibly recipe-heavy in terms of setting up systems and running them; if you're someone testing the waters with speech processing (especially if you want to go beyond STT or vanilla TTS), there's little to nothing that compares to the likes of HuggingFace on the text side.
psma t1_j6j51oq wrote
Streaming inference support. Deploying ML models to work in real-time with little latency is a pain.
seanrescs OP t1_j6j2fef wrote
Reply to comment by jiamengial in [D] DL university research PC suggestions? by seanrescs
It can be stored in an active lab environment or away if noise is an issue; it's more about which will give more utility for the longest time. It seems the A6000 is the better choice per Tim Dettmers; I'll probably go with that one if I can get it quoted at a good price.
seanrescs OP t1_j6j1txs wrote
Reply to comment by colugo in [D] DL university research PC suggestions? by seanrescs
Great article, thanks!
PropOnTop t1_j6izaws wrote
Reply to comment by piman01 in [D] What's stopping you from working on speech and voice? by jiamengial
No no no, it's always the sun.
HermanCainsGhost t1_j6iy9fn wrote
Reply to comment by Low_Basil9900 in [R] InstructPix2Pix: Learning to Follow Image Editing Instructions by Illustrious_Row_9971
Sounds like an issue you should talk to your psychologist about. I certainly feel no physical sensation when looking at AI art (or any art) beyond "oh this looks good" or "this looks ugly" (if those even count as physical sensations).
It's very weird to have such a visceral feeling of disgust just based on looking at art.
> the composites between different images to produce the final result
Lol, that's not how AI art works. Are you sure you're in the right place? See, that's the problem with being in a space like this: you are very likely talking to someone who actually knows how things work.
AI art works by denoising, it isn't a "composite". It isn't "mixing images". It doesn't have images to mix.
Stable Diffusion, for example, was trained on 240 terabytes of data: 2.3 billion 512x512 images. The models are between 2 and 8 gigabytes, which works out to the equivalent of about 1-4 bytes of model weights per training image (while a raw 512x512 image is a bit bigger than 250 kilobytes).
Suffice it to say, you cannot compress 250,000 bytes of data into 1-4 bytes (mathematically, it is impossible). If that level of compression were possible, it would be a bigger story than AI art, because data transmission would just have gotten a wholllllllllleeeeeee lot faster, by orders of magnitude.
So yeah, get out of here with that "composite" nonsense. There's no composite. It's literally mathematically impossible for there to be a composite.
Maleficent_Cod_1055 t1_j6jkz4b wrote
Reply to comment by fasttosmile in [D] What's stopping you from working on speech and voice? by jiamengial
Tbh if you're still doing anything like word alignment or phone alignment, the first thing people bring up is still Kaldi. Will check out Lhotse!