Recent comments in /f/MachineLearning
limb3h t1_j5m01yl wrote
Reply to comment by zepmck in [News] AMD Instinct MI300 APU for AI and HPC announced by samobon
They're trying pretty hard, but Nvidia has spent thousands of man-years on this stuff and built an ecosystem and community around it. It's not easy. Plus it's hard for AMD to hire the best software folks.
limb3h t1_j5lzdx6 wrote
Reply to comment by memberjan6 in [News] AMD Instinct MI300 APU for AI and HPC announced by samobon
Cerebras is pretty well suited for large language models like GPT-3. Their latest-generation product can be clustered easily to train huge models. I wouldn't say they're ending AMD and NVDA, though. But for huge language models to be democratized, some disruptive technology has to happen: today, no one other than whales can afford to train GPT-3.
limb3h t1_j5lyue0 wrote
How many APUs can be connected together via IF? Hopefully they can do 8-16 to challenge DGX.
londons_explorer t1_j5lwy1o wrote
Reply to comment by EducationalLayer1051 in [D] Automated Extraction of Building Geometry by EducationalLayer1051
Oh, and this paper has code published.
londons_explorer t1_j5lwbk1 wrote
Reply to comment by EducationalLayer1051 in [D] Automated Extraction of Building Geometry by EducationalLayer1051
This paper has their algorithm in pseudocode:
taleofbenji t1_j5lw5ie wrote
I love your book and refer to it often. I keep hitting F5 for Chapter 19. :-)
EducationalLayer1051 OP t1_j5lv2sg wrote
Reply to comment by londons_explorer in [D] Automated Extraction of Building Geometry by EducationalLayer1051
Wonderful. I did not think about curved/arched roof surfaces. The last thing I want to do is make this more complicated than it needs to be, so thank you! If you don't mind, where might I find an algorithm to test out?
promiise t1_j5lu8wl wrote
Nice, thanks for sharing your hard work!
EducationalLayer1051 OP t1_j5ltl0x wrote
Reply to comment by Pavarottiy in [D] Automated Extraction of Building Geometry by EducationalLayer1051
I found a paper from the Lawrence Berkeley National Laboratory referencing the Hough Transform a few weeks ago! That led me to the same conclusion about projecting that geometry. But their example was a flat-roofed commercial building, so it only outlined the building. Based on what you know, do you think this method would be a good fit for typical residential roofs like the one I illustrated above? Thanks so much!
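For context, a minimal sketch of what applying the probabilistic Hough Transform to an overhead roof image might look like with OpenCV. The input filename is hypothetical, and the Canny and Hough parameters are rough guesses that would need tuning for real residential rooftops:

```python
import cv2
import numpy as np

# Hypothetical overhead (nadir) image of a residential roof
img = cv2.imread("roof_top_view.png", cv2.IMREAD_GRAYSCALE)

# Edge detection first; thresholds are illustrative and image-dependent
edges = cv2.Canny(img, 50, 150)

# Probabilistic Hough transform to pick up straight roof edges and ridge lines
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=60,
                        minLineLength=40, maxLineGap=10)

# Draw whatever line segments were found back onto the image for inspection
if lines is not None:
    for x1, y1, x2, y2 in lines[:, 0]:
        cv2.line(img, (x1, y1), (x2, y2), 255, 1)
cv2.imwrite("roof_lines.png", img)
```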
Own_Quality_5321 t1_j5lt463 wrote
Nice. I will have a look and possibly recommend it. Thanks for sharing; that must have been a huge amount of work.
Philpax t1_j5lqko9 wrote
Awesome! I'll add it to my reading list :)
kernel_KP t1_j5lpxnn wrote
Reply to [D] Simple Questions Thread by AutoModerator
I have an unlabelled dataset containing a lot of audio files, and for each file I have computed the chromagram. I would appreciate some advice on implementing a reasonably efficient neural network to cluster these audio files based on their chromagrams. Consider the data to be already correctly pre-processed, so the chromagrams all have the same size. Thanks a lot!
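One possible starting point, sketched under the assumption that the chromagrams are fixed-size arrays: compress each one with a small autoencoder and cluster the latent codes with k-means. The dimensions, number of clusters, training schedule, and placeholder data below are all illustrative, not a vetted pipeline:

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

N_BINS, N_FRAMES, LATENT, K = 12, 256, 32, 10  # hypothetical sizes

class ChromaAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(N_BINS * N_FRAMES, 512), nn.ReLU(),
            nn.Linear(512, LATENT),
        )
        self.decoder = nn.Sequential(
            nn.Linear(LATENT, 512), nn.ReLU(),
            nn.Linear(512, N_BINS * N_FRAMES),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z).view(-1, N_BINS, N_FRAMES), z

# Placeholder for the real, preprocessed chromagrams: (num_files, 12, N_FRAMES)
chromagrams = torch.rand(100, N_BINS, N_FRAMES)

model = ChromaAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):  # train the autoencoder on reconstruction error
    recon, _ = model(chromagrams)
    loss = nn.functional.mse_loss(recon, chromagrams)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    _, codes = model(chromagrams)          # one latent code per audio file
labels = KMeans(n_clusters=K, n_init=10).fit_predict(codes.numpy())
```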
ArnoF7 t1_j5lknua wrote
Reply to comment by FastestLearner in [D] Multiple Different GPUs? by Maxerature
I have a similar suspicion that training would be bottlenecked by the slower 1080. But I am wondering if it's possible to treat the 1080 as a pure VRAM extension?
Although it's possible that the time spent transferring between the two cards' memories would cancel out the gain of having more VRAM.
[deleted] t1_j5ldutu wrote
Reply to [D] Simple Questions Thread by AutoModerator
[deleted]
WigglyHypersurface OP t1_j5ldsn7 wrote
Reply to comment by terath in [D] Embedding bags for LLMs by WigglyHypersurface
The reason I'm curious is that fastText embeddings tend to work better on small corpora. I'm wondering whether, if you took one of the small-data-efficient LLMs that you can train yourself on a few A100s (like ELECTRA) and changed the embeddings to a bag of character n-grams, you'd see further gains on small training sets.
terath t1_j5l8t4k wrote
Reply to comment by WigglyHypersurface in [D] Embedding bags for LLMs by WigglyHypersurface
Oh, I see what you mean. I remember that there were some character-level language models, but they fell out of favour to subwords, as I think the accuracy difference wasn't enough to justify the extra compute required at the character level.
Reviewing the fastText approach, they still end up hashing the character n-grams rather than training an embedding for each. This could introduce the same sorts of inconsistencies that you're observing. That said, the final fastText embeddings are already the sum of the character n-gram embeddings, so I'm not clear on how your approach differs from just using the final fastText embeddings.
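A minimal sketch of the fastText-style scheme being discussed, assuming PyTorch: hash each word's character n-grams into a fixed number of buckets and sum the bucket embeddings with an `nn.EmbeddingBag`. The bucket count, n-gram range, embedding dimension, and the use of Python's built-in `hash` are illustrative stand-ins, not fastText's actual implementation:

```python
import torch
import torch.nn as nn

NUM_BUCKETS, DIM = 2 ** 18, 128  # illustrative sizes

def char_ngrams(word, n_min=3, n_max=6):
    padded = f"<{word}>"  # fastText-style word boundary markers
    return [padded[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(padded) - n + 1)]

def bucket_ids(word):
    # Hashing trick: map each n-gram to a bucket (a stand-in for FNV hashing)
    return torch.tensor([hash(g) % NUM_BUCKETS for g in char_ngrams(word)])

bag = nn.EmbeddingBag(NUM_BUCKETS, DIM, mode="sum")

words = ["refactored", "calmed"]
ids = [bucket_ids(w) for w in words]
flat = torch.cat(ids)
offsets = torch.tensor([0] + [len(i) for i in ids[:-1]]).cumsum(0)
word_vectors = bag(flat, offsets)  # one summed vector per word, shape (2, DIM)
```

Trained end-to-end inside a language model, those summed vectors would replace the usual token embedding lookup, which is roughly the setup being proposed upthread.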
Zyj t1_j5l7oog wrote
Reply to [D] Simple Questions Thread by AutoModerator
When I use two RTX 3090s with an NVLink bridge, each plugged into a PCIe 3.0 x8 slot instead of a PCIe 4.0 x16 slot, what kind of performance hit will I get?
boadie t1_j5l6tmh wrote
Reply to comment by knestleknox in [R] New Tsetlin machine learning scheme creates up to 80x smaller logical rules, benefitting hardware efficiency and interpretability. by olegranmo
You might find this paper on the limits of current models very interesting: https://arxiv.org/pdf/2207.02098.pdf
WigglyHypersurface OP t1_j5l49mq wrote
Reply to comment by dojoteef in [D] Embedding bags for LLMs by WigglyHypersurface
Thanks, these are helpful. It seems like "embedding bag" is a term used in ML libraries but not always in papers.
Edit: from a quick look, neither of these is actually just an embedding bag; rather, they are different approaches to incorporating subword information.
WigglyHypersurface OP t1_j5l3vlk wrote
Reply to comment by terath in [D] Embedding bags for LLMs by WigglyHypersurface
I have - the whole point of my post is that this limits information sharing across tokens, depending on the split.
So, for example, if the tokenizer splits the -ed off the end of a rare verb like "refactored" but not off a common verb like "calmed", it splits the representation of the verbal morphology in two, when really those -ed endings serve the same function.
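To see the kind of split being described, one can inspect a pretrained WordPiece tokenizer with the Hugging Face `transformers` library. The exact pieces (shown below only as plausible examples) depend entirely on the vocabulary the tokenizer was trained with:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")

# A common verb may survive as a single token while a rarer one gets split;
# the outputs in the comments are illustrative, not guaranteed.
print(tok.tokenize("calmed"))      # e.g. ['calmed']
print(tok.tokenize("refactored"))  # e.g. ['ref', '##actor', '##ed']
```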
dojoteef t1_j5l399n wrote
Reply to [D] Embedding bags for LLMs by WigglyHypersurface
This has been studied quite a bit. You can just follow the citation graph of the fastText paper: Enriching Word Vectors with Subword Information
For example, people have investigated sampling different subword tokenizations during training (Stochastic Tokenization with a Language Model for Neural Text Classification) and character-aware embeddings (CharBERT: Character-aware Pre-trained Language Model).
terath t1_j5kz6tz wrote
Reply to [D] Embedding bags for LLMs by WigglyHypersurface
Have you not heard of byte pair encoding? There are plenty of subword tokenizers and many language models are built on them.
Here is a quick article on them: https://towardsdatascience.com/byte-pair-encoding-subword-based-tokenization-algorithm-77828a70bee0
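For anyone who wants to see it in code rather than prose, a minimal sketch of training a BPE tokenizer with the Hugging Face `tokenizers` library; the corpus lines and vocabulary size are placeholders:

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# Placeholder corpus; in practice you would stream lines from real training data.
corpus = [
    "the model refactored the code",
    "she calmed the crowd",
    "byte pair encoding merges the most frequent symbol pairs",
]

tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=500, special_tokens=["[UNK]"])
tokenizer.train_from_iterator(corpus, trainer=trainer)

print(tokenizer.encode("refactored").tokens)  # subword pieces learned from the corpus
```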
PulPol_2000 t1_j5kwl2u wrote
Reply to [D] Simple Questions Thread by AutoModerator
I have a project that would use ARCore and Google's ML Kit to recognize vehicles in a video feed, and besides recognizing the objects, it should also be able to measure each object's distance from the camera. I'm lost on how I would integrate the distance measurement with the objects detected by ML Kit. Sorry for the lack of knowledge, as I've only just entered the ML community. Thanks in advance!
[deleted] t1_j5kwjgy wrote
Reply to comment by [deleted] in [D] Is it a time to seriously regulate and restrict AI research? by Baturinsky
[removed]
aristotle137 t1_j5m25xi wrote
Reply to [P] New textbook: Understanding Deep Learning by SimonJDPrince
Btw, I absolutely loved your computer vision textbook: clear, comprehensible and so much fun! Best visualisations in the biz. Also loved your UCL course on the subject (I was there 2010/2011). Will definitely check out the next book.