Recent comments in /f/MachineLearning

farmingvillein t1_j6iwb5v wrote

I like the big idea, and it is almost certainly indicative of one of the key tools to improve automated programming.

That said, I wish they had avoided the urge to build an intermediate programming language. This is likely unnecessary and is the type of semi-convoluted solution that you only come up with in an academic research lab (or out of true, deep product need--but I think that is highly unlikely to be the case here).

My guess is that the same basic result in the paper could have been shown by using Python or Rust or similar as the root language, with a little work (time that could have been freed up by skipping the Harry Potter language development).

They do note:

> We generate 16 Python implementations per high-level plan on 100 randomly sampled problems and find that the performance drops to 6%.

But it isn't well discussed (unless I skimmed too quickly) why a separate language is truly needed. They discuss the advantages of Parsel, but there doesn't appear to be a deep ablation on why it is really necessary, where its supposed performance benefits come from, or how those could be enforced in other languages.

There is a bunch of discussion in the appendix, but IMO none of it is very convincing. E.g., Parsel enforces certain conventions around testing and validation...great, let's do that in Python or Rust or similar. Or--leveraging the value of LLMs--through a more natural language interface.
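To make that concrete, here's a rough sketch (entirely hypothetical, not from the paper) of what "Parsel-style conventions, but in plain Python" could look like: the high-level plan lives in docstrings/stubs and the validation lives in ordinary asserts.

```python
# Hypothetical example: plan-as-docstring plus assert-based validation,
# i.e. the kind of convention you could enforce without a new language.
def collatz_length(n: int) -> int:
    """Plan: repeatedly apply the Collatz step until reaching 1,
    counting the number of steps taken."""
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

# Validation convention: every generated function ships with unit tests,
# and a candidate implementation is only accepted if all of them pass.
assert collatz_length(1) == 0
assert collatz_length(6) == 8
```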

Yes, there is benefit to bridging these gaps in a "universal" manner...but, as per https://xkcd.com/927/, a new programming language is rarely the right solution.

20

8-Bit_Soul t1_j6iv8g8 wrote

Ballpark conceptual number: how long does training take for AI tasks using medical volumetric data (for example, something along the lines of training automated segmentation of an organ using 100 CT studies)? Are we talking hours? Days? Weeks?

I'm new to ML and I will need a better GPU (and a PSU and maybe a bigger case), and the amount I'm willing to invest depends on how much of a difference it would make in practice. I figure I can get a used RTX 3090 installed for about $1000 or a new RTX 4090 for about $2000, and if training time scales with AI benchmark results, then a task that takes 1 day on an A100 would take about 1.1 days on an RTX 4090 and 1.7 days on an RTX 3090. If the extra $1k reduces the time by days or weeks, then it should eventually be worth the cost. If it only saves hours or minutes, then it's probably not worth it.
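For context, here's the back-of-the-envelope math behind that comparison (the numbers are just my rough guesses from published benchmarks, not measurements):

```python
# Back-of-the-envelope using the relative speeds quoted above; all numbers
# are rough benchmark-based guesses.
a100_days = 1.0
rtx4090_days = a100_days * 1.1   # ~10% slower than an A100
rtx3090_days = a100_days * 1.7   # ~70% slower than an A100

extra_cost = 2000 - 1000                              # new 4090 vs used 3090
hours_saved = (rtx3090_days - rtx4090_days) * 24      # per 1-A100-day job
print(f"{hours_saved:.1f} h saved per job")                       # ~14.4 h
print(f"${extra_cost / hours_saved:.0f} per hour saved per job")  # ~$69
```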

Thanks!

1

blackkettle t1_j6itdxo wrote

How familiar are you with the existing frameworks out there for this topic space? There's a lot of active work here; I'm curious about what you are focusing on, and how that reflects against the shortcomings of existing frameworks:

- https://github.com/kaldi-asr/kaldi

- https://github.com/k2-fsa

- https://github.com/espnet/espnet

- https://github.com/speechbrain/speechbrain

- https://github.com/NVIDIA/NeMo

- https://github.com/microsoft/UniSpeech

- https://github.com/topics/wav2vec2 [bajillions of similar]

- https://github.com/BUTSpeechFIT/VBx

This list is of course incomplete, but there is a _lot_ of active work in this space and a lot of open source. Recently you've also got larger and larger public datasets becoming available. The SOTA is really getting close to commoditization as well.

What sort of OSS intersection or area are you focusing on, and why?

19

fakesoicansayshit t1_j6itb00 wrote

Reply to comment by gunshoes in [D] Remote PhD by TheRealMrMatt

I finished half of a PhD online over a decade ago.

Half your PhD is taking advanced classes, the other half is working as a research assistant.

Not sure a PhD means anything nowadays.

−4

qalis t1_j6ir4fh wrote

Yes, you can. Variables in tabular learning are (in general) independent in terms of preprocessing. In fact, in most cases you will apply different preprocessing to different variables, e.g. one-hot + SVD for high-cardinality categorical variables, binary encoding for simple binary choices, and integer encoding for ordinal variables.
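A minimal sketch of what that looks like in practice, assuming scikit-learn (the column names and component counts are made up for illustration):

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Different preprocessing per column, all in one transformer.
preprocess = ColumnTransformer([
    # high-cardinality categorical: one-hot, then compress with SVD
    ("high_card", Pipeline([
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ("svd", TruncatedSVD(n_components=2)),
    ]), ["merchant_id"]),
    # simple binary choice: plain binary/integer encoding
    ("binary", OrdinalEncoder(), ["is_weekend"]),
    # ordinal variable: integer encoding that preserves the order
    ("ordinal", OrdinalEncoder(categories=[["low", "medium", "high"]]),
     ["risk_level"]),
])

X = pd.DataFrame({
    "merchant_id": ["m1", "m2", "m3", "m1"],
    "is_weekend": ["no", "yes", "no", "yes"],
    "risk_level": ["low", "high", "medium", "low"],
})
print(preprocess.fit_transform(X))
```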

2

qalis t1_j6iqvql wrote

Somewhat more limited than your question, but I know two such papers: "Tunability: Importance of Hyperparameters of Machine Learning Algorithms" P. Probst et al., and "Hyperparameters and tuning strategies for random forest" P. Probst et al.

Both are on arXiv. The first one concerns the tunability of multiple ML algorithms, i.e. how sensitive they are in general to hyperparameter choice. The second delves deeper into the same area, specifically for random forests, gathering results from many other works. Using those ideas, I was able to dramatically reduce the computational resources spent on tuning by designing better hyperparameter grids.
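As a rough illustration (scikit-learn assumed; the specific ranges are mine, not from the papers), "designing better grids" mostly means searching only the hyperparameters the tunability results say matter and fixing the rest:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Narrow grid: focus on the few knobs reported as most impactful,
# keep everything else at defaults.
param_grid = {
    "max_features": [0.2, 0.4, 0.6],   # typically the biggest lever
    "min_samples_leaf": [1, 5, 10],
    # n_estimators is largely "more is better"; fix it instead of searching it
}

search = GridSearchCV(
    RandomForestClassifier(n_estimators=500, random_state=0),
    param_grid,
    cv=3,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)
```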

1

gunshoes t1_j6im7go wrote

Atm, my hard drive failed and the SSD doesn't come until Tuesday.

In actuality, I work in the space and the main limitation is hardware. Most small problems still require a ton of storage space, and Google Colab ain't giving me a terabyte for audio until I start paying for tiers.

2

duck_mopsi t1_j6iksqy wrote

Welp, it seems like the problem was that the inputs need to be defined as 2-dimensional, with the sequence length as the first dimension. I thought one would give the RNN only a 1-dimensional latent noise vector and get the sequence by repeatedly feeding it back through the RNN.
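For anyone hitting the same thing, a minimal sketch of the shape in question (assuming PyTorch; the dimensions are arbitrary):

```python
import torch
import torch.nn as nn

seq_len, batch_size, latent_dim, hidden_dim = 10, 4, 16, 32
rnn = nn.GRU(input_size=latent_dim, hidden_size=hidden_dim)

# The RNN expects the whole sequence of per-step inputs at once:
# shape (seq_len, batch_size, latent_dim) with batch_first=False (default),
# not a single 1-D noise vector re-fed in a loop.
z = torch.randn(seq_len, batch_size, latent_dim)
output, h_n = rnn(z)
print(output.shape)  # torch.Size([10, 4, 32])
```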

1

jiamengial t1_j6iiux3 wrote

Where do you plan to put the machine? If it's anywhere near where you (or anyone else) work, I'd recommend getting it liquid-cooled if you want to save your hearing.

The A6000s don't have active cooling of their own and are definitely meant to last a whole lot longer than the 4090s, so they will be the better choice if you plan to use the machine for quite a while or want to retain resale value down the line.

1

RedYican t1_j6ih1ui wrote

Does it make sense to combine a Tsetlin Machine with NNs (for language understanding) via triplets?

If we had some statements S_n about entity X and then some other statement S_{n+1} as a training example, could one use a TM to discover which of the other statements matter for S_{n+1}?

EDIT: found your other paper - https://arxiv.org/pdf/2102.10952.pdf

1