Recent comments in /f/MachineLearning

maizeq t1_j69vuec wrote

Nice. How are you converting between dataset size and number of tokens?

Doesn’t Common Crawl get deduplicated, and is that why the number of usable tokens decreases - or is it also curation? How much of that 380 TiB is actually usable?
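For back-of-envelope purposes I usually assume ~4 bytes per token for BPE-tokenized English text (just a rule of thumb, not an exact figure - it varies a lot with tokenizer and language):

```python
def estimate_tokens(size_bytes: float, bytes_per_token: float = 4.0) -> float:
    """Rough token count from raw text size; ~4 bytes/token is a
    common heuristic for BPE tokenizers on English, nothing exact."""
    return size_bytes / bytes_per_token

raw = 380 * 2**40  # the 380 TiB figure mentioned above, in bytes
print(f"~{estimate_tokens(raw):.2e} tokens before any dedup/curation")
```

Dedup and filtering then cut that down by a large (and hard-to-pin-down) factor, which is why I'm asking.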

Given the ostensibly impressive performance of the bilingual GLM-130B (Chinese+English) model that came out of Tsinghua University, that might very well be the case.

1

pandasiloc t1_j69uj8v wrote

The human brain doesn’t work like this. It’s not a question of “being smart” or simply having learned something previously. To implement this on the spot in a stressful situation, the relevant theory needs to be very fresh in your memory.

I highly doubt you would be able to reproduce a proof of the Fundamental Theorem of Algebra on the spot, even though it’s a simple concept that many people learn in middle school.

I would probably fail this question because I haven’t worked with deep learning much since I graduated 4 years ago. I majored in math at an Ivy League school and graduated with a pretty good GPA, so I don’t think my math is ‘weak’, either.

This kind of question does not make sense to ask on a live call unless someone claims to be working with deep learning architectures as part of their daily work.

3

Mechanical_Number t1_j69tl7z wrote

I don't think it is a great question in general, but this also depends on the position and the level of technical aptitude and seniority expected.

It is not something I would expect most people to rock up with in 15 minutes while someone is looking over their shoulder. It is probably a question that will help me distinguish a kick-ass junior from someone who has only a standard grasp of Keras syntax, but for more senior roles it is likely a bad indicator. Far fewer senior engineering tasks fail because the person couldn't code backprop from scratch than because the wrong architecture was chosen, or they didn't know which part of an existing pipeline to optimise, or where to look for potential bugs given a particular unexpected behaviour, etc.

In general, I think it was more a point of "showing your thought process" than actually getting the code right. I would "abstract" things quite a bit first and then "start coding". But as others said, if they absolutely need to "split hairs" that is a way to do it too.

Best of luck with your interview in any case!

2

xorbinant_ranchu t1_j69tasv wrote

Would be interested to know what kind of experience you have?

I think literally none of the ML engineers I work with (myself very much included) could pull a chain rule implementation out in 10 mins.

90% of this job is just finding an existing implementation of something to make work.

2

Lord_of_Many_Memes t1_j69sr57 wrote

my general feeling is that even if it works, it will take more steps to reach the same loss than backprop, which in some sense cancels out the hardware advantage of the forward-forward setting. I tried it on GPT and WikiText; it just doesn’t converge on real problems. Maybe something crucial is still missing.
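For context, each layer in forward-forward gets its own local objective (Hinton's "goodness" = sum of squared activations, pushed above a threshold for positive data and below it for negative data). Here's my own toy numpy sketch of one layer's local update - sizes, the threshold theta, and the learning rate are all illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def goodness(h):
    # Hinton's forward-forward "goodness": sum of squared activations
    return (h ** 2).sum(axis=1)

def ff_layer_step(W, x_pos, x_neg, theta=2.0, lr=0.03):
    """One local update for a single layer: push goodness of positive
    samples above theta and of negative samples below it.
    Logistic loss on sign*(goodness - theta); no gradient flows
    between layers."""
    for x, sign in ((x_pos, +1.0), (x_neg, -1.0)):
        h = np.maximum(x @ W, 0.0)                    # ReLU forward
        p = 1.0 / (1.0 + np.exp(-sign * (goodness(h) - theta)))
        g = (sign * (p - 1.0))[:, None] * h * 2.0     # dLoss/dh, derived locally
        g *= (h > 0)                                  # through the ReLU
        W -= lr * x.T @ g / len(x)
    return W

W = rng.normal(scale=0.1, size=(16, 32))
x_pos = rng.normal(loc=0.5, size=(64, 16))   # stand-in "positive" data
x_neg = rng.normal(loc=-0.5, size=(64, 16))  # stand-in "negative" data
for _ in range(100):
    W = ff_layer_step(W, x_pos, x_neg)
```

The point is that no gradient ever flows between layers - each one only sees its own goodness - which is presumably also why credit assignment across many layers is so slow on real problems.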

1

Featureless_Bug t1_j69nsjq wrote

>It's definitely an easy question if it was a common question and hence featured on leetcode, where candidates would practice it before the interview.

I mean, if it was on leetcode, it wouldn't make sense to ask it in the interview, because then you will get prepared answers.

>Someone with 2 years of experience doesn't remember the nitty-gritty maths to implement a NN from scratch

If you cannot apply the chain rule, your math is very weak. If your math is very weak, you probably won't be a great ML engineer. It's not that you need a lot of math, but quite often you do need a broad general understanding of what can work and what can't.
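To be concrete, the whole exercise is a handful of chain-rule applications. A minimal sketch with a one-hidden-layer net and MSE loss (toy sizes and random data, purely illustrative), checked against a finite-difference estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: x -> ReLU(x @ W1) -> @ W2 -> MSE loss. Sizes are arbitrary.
x = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 1))
W1 = rng.normal(scale=0.5, size=(4, 5))
W2 = rng.normal(scale=0.5, size=(5, 1))

def forward(W1, W2):
    h = np.maximum(x @ W1, 0.0)      # hidden layer, post-ReLU
    y_hat = h @ W2
    loss = ((y_hat - y) ** 2).mean()
    return h, y_hat, loss

# Backward pass: each line is one application of the chain rule.
h, y_hat, loss = forward(W1, W2)
d_yhat = 2.0 * (y_hat - y) / y.size   # dL/dy_hat
dW2 = h.T @ d_yhat                    # dL/dW2
d_h = d_yhat @ W2.T                   # dL/dh
d_h *= (h > 0)                        # through the ReLU
dW1 = x.T @ d_h                       # dL/dW1

# Sanity check one entry against a finite-difference estimate.
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
num = (forward(W1p, W2)[2] - loss) / eps
assert abs(num - dW1[0, 0]) < 1e-4
```

Whether it's reasonable to demand this live in 15 minutes is a separate question, but the math itself is just this.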

−4

OkAssociation8879 OP t1_j69n96y wrote

It's definitely an easy question if it was a common question and hence featured on leetcode, where candidates would practice it before the interview.

Someone with 2 years of experience doesn't remember the nitty-gritty maths to implement a NN from scratch. This question is more suited to someone fresh out of college, in my opinion.

3

Featureless_Bug t1_j69mojw wrote

I mean, it is kind of a very basic question, and it takes 15 minutes at most if you understand what you are doing. It is similar to leetcode-style questions for SE: it is not something that you will do on the job, but if you are smart you will pass easily, and if you are not you will struggle - so it's a great interview task.

−6

SteffenGO t1_j69kd3w wrote

I think oftentimes with these absurdly complicated interview questions, they’re less interested in the final answer and more interested in whether you have the knowledge and problem-solving skills to work through how you would attempt it. With highly competitive positions they’re often splitting hairs for the best candidate, and the most extreme questions can nudge their decision one way or another when candidates are fairly equivalent on paper. Super stressful to be put on the spot like that, nonetheless.

4

marcingrzegzhik t1_j69k6qd wrote

It's definitely a valid interview question, but it's not something you should be asked to do during a live call. It's too much to tackle in the limited time of a call and it's not a fair way to assess your skills. I would suggest asking to review a code sample you've written in the past that demonstrates your knowledge and experience. That would be a better way to assess your skills and it would be much less stressful. Good luck!

6

mocny-chlapik t1_j69h5ud wrote

More and more information is popping up about the huge human annotation efforts going on at OpenAI. It seems that the missing secret ingredient was money, which could buy you lots of relevant data. This has several implications: (1) it might be impossible to replicate some of these models without millions of dollars invested in similar data collection efforts; (2) the range of applications can actually be broader than previously thought, if we are willing to pay people to generate the data; (3) they were not able to find significant improvements from scaling anymore. The scaling era might be nearly over.

34

SaifKhayoon t1_j69e65n wrote

They had a problem sourcing labeled training data of 3D videos - you can tell this tech is still early from the shield in the bottom-right example.

They could generate labeled 3D environments from 2D images using InstantNGP and GET3D, together with LAION's dataset of 5.85 billion CLIP-filtered image-text pairs, to create a useful training dataset. Currently this relies on a workaround of training only on text-image pairs and unlabeled videos, due to the lack of labeled 3D training data.

1

theoryanddata t1_j69dx28 wrote

I remember reading about this type of concept, and iirc it does seem that there is quite a bit of local learning in biological neural networks. But global convergence of the model seems like a challenge with this type of scheme. Maybe there's some way to incorporate a periodic global backprop to address that? Has anyone tried it? Or maybe you don't even need it and the problem will disappear with enough scale
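The simplest version of that hybrid would just interleave the two on a schedule - something like this skeleton, where `local_update` and `global_backprop_step` are hypothetical stand-ins, just to show the shape of the idea:

```python
# Schematic training loop mixing layer-local updates with a periodic
# global backprop pass. Both update functions are hypothetical
# placeholders; the period is illustrative.
GLOBAL_EVERY = 50

def train(layers, batches, local_update, global_backprop_step):
    for step, batch in enumerate(batches):
        x = batch
        for layer in layers:
            # each layer learns from its own local signal
            x = local_update(layer, x)
        if step % GLOBAL_EVERY == GLOBAL_EVERY - 1:
            # occasional end-to-end pass for global credit assignment
            global_backprop_step(layers, batch)
```

No idea if anyone has published exactly this, but it would at least let you measure how much global signal is actually needed on top of the local learning.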

2