Recent comments in /f/MachineLearning

limb3h t1_j5lzdx6 wrote

Cerebras is pretty well suited for large language models like GPT-3. Their latest generation product can be clustered easily to train huge models. I wouldn't say they're ending AMD and NVDA, though. But for huge language models to be democratized, some disruptive technology has to come along; today, no one other than whales can afford to train GPT-3.

1

EducationalLayer1051 OP t1_j5ltl0x wrote

I found a paper from Lawrence Berkeley National Laboratory referencing the Hough transform a few weeks ago! That led me to the same conclusion about projecting that geometry. But their example was a flat-roof commercial building, so it only outlined the building. Based on what you know, do you think this method would be a good fit for typical residential roofs like the one I illustrated above? Thanks so much!
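
For reference, the kind of line-detection step I was picturing is the standard OpenCV pipeline - a rough sketch, where the file name and all the thresholds are placeholders I made up:

```python
import cv2
import numpy as np

# Rough sketch: detect straight roof edges with a probabilistic Hough transform.
# "roof.png" and every threshold here are placeholders -- tune for your imagery.
img = cv2.imread("roof.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)

# Probabilistic Hough returns finite line segments rather than infinite lines,
# which seems friendlier for hipped/gabled residential roofs with short edges.
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180,
                        threshold=50, minLineLength=30, maxLineGap=10)

if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.imwrite("roof_lines.png", img)
```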

1

kernel_KP t1_j5lpxnn wrote

I have an unlabelled dataset containing a lot of audio files, and for each file I have computed the chromagram. I would need some advice on implementing a reasonably efficient neural network to cluster these audio files based on their chromagrams. Consider the data to be already correctly pre-processed, so the chromagrams all have the same size. Thanks a lot!
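
For context, this is roughly the pipeline I'm imagining (an autoencoder over the fixed-size chromagrams, then k-means on the latent codes; all sizes below are placeholders):

```python
import torch
import torch.nn as nn

# Rough sketch (sizes are placeholders): embed each fixed-size chromagram with a
# small autoencoder, then cluster the latent codes.
N_FRAMES = 256      # assumed fixed chromagram length after pre-processing
LATENT_DIM = 32

class ChromaAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(12 * N_FRAMES, 256), nn.ReLU(),
            nn.Linear(256, LATENT_DIM),
        )
        self.decoder = nn.Sequential(
            nn.Linear(LATENT_DIM, 256), nn.ReLU(),
            nn.Linear(256, 12 * N_FRAMES),
        )

    def forward(self, x):                      # x: (batch, 12, N_FRAMES)
        z = self.encoder(x)
        return self.decoder(z).view(-1, 12, N_FRAMES), z

# Train on reconstruction loss (e.g. MSE), then collect the latent z for every
# file and run sklearn.cluster.KMeans on those vectors to get the clusters.
```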

1

ArnoF7 t1_j5lknua wrote

I have a similar suspicion, that training will be bottlenecked by the slow 1080. But I am wondering if it's possible to treat the 1080 as a pure VRAM extension?

Although it's possible that the time spent transferring data between the two cards' memory makes the gain from having more VRAM pointless.
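
Something like naive model parallelism is what I mean by "VRAM extension" - a rough PyTorch sketch (needs two GPUs, layer sizes are arbitrary), with the transfer I'm worried about marked:

```python
import torch
import torch.nn as nn

# Rough sketch of "using the 1080 as extra VRAM": put some layers on each card
# and pay a device-to-device copy at the boundary.
fast_gpu = torch.device("cuda:0")   # e.g. the newer card
slow_gpu = torch.device("cuda:1")   # e.g. the 1080

block_a = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to(fast_gpu)
block_b = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to(slow_gpu)

def forward(x):
    x = block_a(x.to(fast_gpu))
    # This .to() is the transfer in question: every forward/backward pass
    # crosses PCIe, so the slower card plus the copy can dominate step time.
    x = block_b(x.to(slow_gpu))
    return x

out = forward(torch.randn(8, 4096))
```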

1

WigglyHypersurface OP t1_j5ldsn7 wrote

The reason I'm curious is that fastText embeddings tend to work better on small corpora. I'm wondering whether, if you took one of the small-data-efficient LLMs that you can train yourself on a few A100s (like ELECTRA) and changed the embeddings to a bag of character n-grams, you'd see further gains on small training sets.
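
Concretely, one cheap approximation I've thought about (untested sketch; the model name is just an example, special tokens would need extra care, and the corpus below is a placeholder) is to seed the transformer's embedding matrix with fastText subword vectors trained on the small corpus:

```python
import torch
from gensim.models import FastText
from transformers import ElectraModel, ElectraTokenizerFast

# Untested sketch: initialise ELECTRA's token embeddings from fastText
# character n-gram vectors trained on a small corpus.
my_sentences = [["tiny", "placeholder", "corpus"], ["swap", "in", "your", "own"]]

tokenizer = ElectraTokenizerFast.from_pretrained("google/electra-small-discriminator")
model = ElectraModel.from_pretrained("google/electra-small-discriminator")
emb = model.get_input_embeddings()            # nn.Embedding(vocab_size, emb_dim)

ft = FastText(vector_size=emb.weight.shape[1], min_n=3, max_n=6, min_count=1)
ft.build_vocab(corpus_iterable=my_sentences)
ft.train(corpus_iterable=my_sentences,
         total_examples=ft.corpus_count, epochs=10)

with torch.no_grad():
    for token, idx in tokenizer.get_vocab().items():
        # fastText composes a vector from character n-grams, so rare tokens
        # (and stripped "##" wordpieces) still get something sensible.
        text = token.lstrip("#") or token
        emb.weight[idx] = torch.tensor(ft.wv[text])
```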

1

terath t1_j5l8t4k wrote

Oh, I see what you mean. I remember there were some character-level language models, but they lost out to subwords, as I think the accuracy difference wasn't enough to justify the extra compute required at the character level.

Reviewing the fastText approach, they still end up hashing the character n-grams rather than training an embedding for each. This could introduce the same sorts of inconsistencies you're observing. That said, the final fastText embeddings are already the sum of the character n-gram embeddings, so I'm not clear on how your approach differs from just using the final fastText embeddings.
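
Schematically, the hashing they do looks something like this (not the real fastText code - they use a different hash, far more buckets by default, and also keep a whole-word vector for in-vocabulary words):

```python
import numpy as np

# Schematic of the fastText subword trick: char n-grams are hashed into a fixed
# number of buckets, so different n-grams can collide on the same embedding row.
NUM_BUCKETS = 10_000   # toy value; the real fastText default is 2 million
DIM = 100

bucket_emb = np.random.randn(NUM_BUCKETS, DIM)   # stand-in for learned rows

def char_ngrams(word, n_min=3, n_max=6):
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def word_vector(word):
    # The final word vector is just the sum of its n-gram rows, which is why
    # bucket collisions blur the representation.
    rows = [bucket_emb[hash(g) % NUM_BUCKETS] for g in char_ngrams(word)]
    return np.sum(rows, axis=0)

v = word_vector("refactored")
```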

3

Zyj t1_j5l7oog wrote

When I use two RTX 3090s with an NVLink bridge, each plugged into a PCIe 3.0 x8 slot instead of a PCIe 4.0 x16 slot, what kind of performance hit will I get?
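
One thing I plan to do is just time the transfers under each slot configuration - a rough sketch that only measures host-to-device copies, so it bounds the transfer side rather than overall training speed:

```python
import time
import torch

# Quick-and-dirty bandwidth check for one card; run it under each slot config.
# Compute-bound training may barely notice x8 vs x16, while workloads with
# frequent host<->device copies or gradient syncs will feel it more.
x = torch.empty(1024, 1024, 256, dtype=torch.float32, pin_memory=True)  # 1 GiB
gpu = torch.device("cuda:0")
nbytes = x.element_size() * x.nelement()

torch.cuda.synchronize()
t0 = time.perf_counter()
for _ in range(10):
    y = x.to(gpu, non_blocking=True)
torch.cuda.synchronize()
elapsed = time.perf_counter() - t0
print(f"~{10 * nbytes / elapsed / 1e9:.1f} GB/s host-to-device")
```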

1

WigglyHypersurface OP t1_j5l3vlk wrote

I have - the whole point of my post is that this limits information sharing across tokens, depending on the split.

So, for example, if the tokenizer splits the -ed off the end of a rare verb like "refactored" but not off a common verb like "calmed", it splits the representation of that verbal morphology in two, when really those -ed endings serve the same function.
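
You can see the effect directly with any subword tokenizer (the exact splits depend on the vocabulary, but the pattern is the point):

```python
from transformers import AutoTokenizer

# Common words tend to stay whole while rarer ones get chopped up, so the "-ed"
# sometimes ends up as its own piece and sometimes doesn't.
tok = AutoTokenizer.from_pretrained("bert-base-uncased")
for w in ["calmed", "walked", "refactored", "defenestrated"]:
    print(w, tok.tokenize(w))
```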

5

dojoteef t1_j5l399n wrote

This has been studied quite a bit. You can just follow the citation graph of the fastText paper, "Enriching Word Vectors with Subword Information".

For example, people have investigated sampling different subword tokenizations during training ("Stochastic Tokenization with a Language Model for Neural Text Classification") and character-aware embeddings ("CharBERT: Character-aware Pre-trained Language Model").
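
The stochastic tokenization idea is easy to try with SentencePiece's sampling mode, for example (the model file here is a placeholder for whatever you've trained):

```python
import sentencepiece as spm

# Subword regularization: sample a different segmentation each time instead of
# always taking the single best one.
sp = spm.SentencePieceProcessor(model_file="spm.model")
for _ in range(3):
    print(sp.encode("refactored the parser", out_type=str,
                    enable_sampling=True, alpha=0.1, nbest_size=-1))
```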

4

PulPol_2000 t1_j5kwl2u wrote

I have a project that uses ARCore and Google ML Kit to recognize vehicles in a video feed, and besides recognizing the objects, it should also be able to measure the distance of each object from the camera. I'm lost on how I would integrate the distance measurement with the objects detected by ML Kit. Sorry for my lack of knowledge, as I've only just entered the ML community. Thanks in advance!
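
The rough geometry I was thinking of as a fallback, if I can't get a depth or hit-test result from ARCore, is a pinhole-camera approximation using the detector's bounding box (all numbers below are made up for illustration, not from any real API):

```python
# Pinhole approximation: distance ~= focal_length_px * real_height_m / bbox_height_px.
# A real app would read the focal length from the camera intrinsics and the
# bounding box height from the object detector; these values are placeholders.
FOCAL_LENGTH_PX = 1500.0   # placeholder camera focal length in pixels
CAR_HEIGHT_M = 1.5         # assumed real-world height of a typical car

def estimate_distance_m(bbox_height_px: float) -> float:
    return FOCAL_LENGTH_PX * CAR_HEIGHT_M / bbox_height_px

print(estimate_distance_m(300.0))  # ~7.5 m for a 300-px-tall detection
```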

1