Recent comments in /f/MachineLearning

knestleknox t1_j5d0852 wrote

Oh wow, this looks super interesting. I had no idea what Tsetlin machines were until today. It's actually something I've basically tried to emulate with standard ML approaches.

I have an unsolved mathematics problem that I've been working on for almost a decade, since my professor showed it to me in undergrad. It's a very specific problem that maybe 10-20 combinatorialists are working on or even aware of, and it's still unsolved to this day. One of the biggest parts of the problem is finding a bijection between two infinite classes of integer partitions. Being able to find a rules-based bijection would prove a large part of the overall problem.

My idea was to model this bijection as a supervised learning problem and feed examples into various ML models. I've tried standard feed-forward networks, autoencoders, CNNs, and many more. But it never worked because of the rules-based nature of the problem. I suspect the rules that govern the bijection are too complicated to be captured by the approximation methods in standard models. But this looks very promising, or at least something to play around with. I'm going to try it out this weekend. Thanks!
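
Roughly what I'm picturing, as a sketch (assuming the pyTsetlinMachine package's API; the partition encoding and the example pairs below are invented for illustration and would need to match the real problem):

```python
# Rough sketch: frame "does partition A map to partition B under the
# conjectured bijection?" as binary classification for a Tsetlin machine.
# Assumes the pyTsetlinMachine package; encoding and pairs are invented.
import numpy as np
from pyTsetlinMachine.tm import MultiClassTsetlinMachine

MAX_PART = 16  # cap on part size/multiplicity, just for this toy encoding

def encode(partition):
    """Thermometer-encode part multiplicities as 0/1 features
    (Tsetlin machines take Boolean input)."""
    bits = np.zeros((MAX_PART, MAX_PART), dtype=np.uint32)
    for p in set(partition):
        m = partition.count(p)
        bits[p - 1, :m] = 1
    return bits.flatten()

def encode_pair(a, b):
    return np.concatenate([encode(a), encode(b)])

# Hypothetical training pairs: label 1 for pairs matched by the known
# part of the bijection, 0 for mismatches. Replace with real data.
X = np.stack([encode_pair([4, 2, 1], [3, 3, 1]),
              encode_pair([4, 2, 1], [5, 1, 1])])
Y = np.array([1, 0], dtype=np.uint32)

tm = MultiClassTsetlinMachine(200, 15, 3.0)  # (clauses, T, s)
tm.fit(X, Y, epochs=50)
print(tm.predict(X))
# The real payoff would be inspecting the learned clauses as candidate
# rules, not the predictions themselves.
```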

8

dancingnightly t1_j5c31u6 wrote

Oh OK, thank you for taking the time to explain. I see that this graph approach isn't for extending beyond the existing context window of RoBERTa/similar transformer models, but rather for enhancing performance within it.

I was hoping graphs could capture relational information (in a way compatible with transformer embeddings) between distant parts of a document, essentially something like: for each entity in doc.ents, connect them all in a fully connected graph. It sounds like that kind of dynamic graph size/structure per input document wouldn't work with the transformer embeddings for now, though.

1

UnderstandingDry1256 t1_j5c0y0o wrote

What training strategies are used for GPT models? Are transformer blocks or layers trained independently? Are they trained on some subset of the data and then fine-tuned?

I would appreciate any references or details :)

2

gunshoes t1_j5bg8j0 wrote

Yeah, I may have come off a bit too negative because of secondhand accounts. You'll still have good opportunities to gain experience and build a starting resume. The glass ceiling just comes down hard on some people who are interested in mainline ML.

1

Tgs91 t1_j5bemlb wrote

Piggybacking on this: there are a lot of data science jobs that hire straight out of undergrad. A lot of the work is actually more data analysis and simple business analytics, but you can get started in a broad data science role, learn on the job, and specialize in ML.

4

gunshoes t1_j5bc1gg wrote

It's basically a master's/PhD. A lot of ML involves research work, and those degrees are strong filtering mechanisms for selecting that skill. You caaaan do ML-adjacent work, but a lot of it is grunt tasks and you'll grow frustrated within about a year.

Source: PhD student; I collaborate with an NLP/CompLing lab. The recurring refrain among the master's students in that lab is that they wanted to kill themselves at their conversational AI jobs.

1

currentscurrents t1_j5b6jf2 wrote

Interesting! I think it's good to remember that the important part of neural networks is the optimization-based learning process: you can run optimization on things other than neural networks. Like how Plenoxels got a 100x speedup over NeRF by running the optimization on a structure more naturally suited to 3D voxel data.
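
To make that concrete, here's a toy illustration of fitting a raw parameter structure by gradient descent with no network involved (PyTorch; the grid and loss are stand-ins, and real Plenoxels is of course much more involved):

```python
# Toy illustration of optimization without a neural network: the "model"
# is just a raw 3D grid of parameters, fit directly by gradient descent.
# (Real Plenoxels optimizes a sparse voxel grid against rendered views;
# this target/loss is a stand-in.)
import torch

grid = torch.zeros(16, 16, 16, requires_grad=True)
target = torch.rand(16, 16, 16)  # stand-in for observations
opt = torch.optim.Adam([grid], lr=0.1)

for step in range(200):
    opt.zero_grad()
    loss = torch.mean((grid - target) ** 2)
    loss.backward()
    opt.step()

print(f"final loss: {loss.item():.6f}")
```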

I do wonder how well TMs scale to less toy-like tasks, though. MNIST is pretty easy in 2023, and I suspect you can solve the BBC Sports dataset just by looking for keywords.
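
For instance, a bag-of-keywords baseline along these lines probably gets you most of the way (a sketch; the keyword lists are invented, and you'd load the real dataset yourself):

```python
# Sketch of the "just look for keywords" baseline for BBC Sports-style
# topic classification. Keyword lists are invented for illustration.
KEYWORDS = {
    "football": {"goal", "striker", "league", "midfielder", "penalty"},
    "cricket": {"wicket", "innings", "batsman", "bowler", "stumps"},
    "tennis": {"serve", "slam", "wimbledon", "ace", "baseline"},
}

def classify(text: str) -> str:
    words = set(text.lower().split())
    # Pick the class whose keyword set overlaps the document the most.
    return max(KEYWORDS, key=lambda c: len(KEYWORDS[c] & words))

print(classify("The batsman was caught behind in the final innings"))
# -> cricket
```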

23

I_will_delete_myself t1_j5b4ccq wrote

It doesn't make any sense to run a neural network on the client side at all. YouTube takes a moment to process your video when you upload it, which is probably when their deep learning algorithms get to work. After that they just save the results and don't run the neural networks again.

This is a plausible guess, because uploading a video to YouTube takes a lot longer than on platforms that do no checks at all.

1

axm92 t1_j5b2ug8 wrote

I’m not sure if I understand you, but you can generate these graphs over long documents, and then run a GNN.

For creating graphs over long documents, one trick I’ve used in my past papers is to create a graph per 3 paragraphs, and then merge these graphs (by fusing similar nodes).
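
Concretely, something like this (a networkx sketch; extract_entities is a hypothetical stand-in for whatever entity extractor you use, and "fusing similar nodes" is reduced here to exact-name matching, which nx.compose does for free):

```python
# Sketch of the per-chunk graph + merge trick with networkx.
# extract_entities is a placeholder for real NER (e.g. spaCy's doc.ents).
import itertools
import networkx as nx

def extract_entities(chunk):
    # Placeholder heuristic: treat capitalized tokens as entities.
    return {w.strip(".,") for w in chunk.split() if w[:1].isupper()}

def chunk_graph(chunk):
    g = nx.Graph()
    ents = extract_entities(chunk)
    g.add_nodes_from(ents)
    g.add_edges_from(itertools.combinations(ents, 2))  # fully connect chunk
    return g

def document_graph(paragraphs, chunk_size=3):
    merged = nx.Graph()
    for i in range(0, len(paragraphs), chunk_size):
        chunk = " ".join(paragraphs[i:i + chunk_size])
        # nx.compose keeps nodes that share a name as one node, so an
        # entity seen in two chunks bridges both subgraphs.
        merged = nx.compose(merged, chunk_graph(chunk))
    return merged
```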

1