Recent comments in /f/MachineLearning

tblume1992 t1_j8y9oti wrote

  1. MLForecast treats it more like a time series - it uses differencing and moving averages as level features to encode the general level of each time series, along with the AR lags. That's not strictly necessary: you can instead scale each series (a standard scaler, or even Box-Cox, at the series level), pass the series 'id' to lightgbm as a categorical variable, and outperform MLForecast - although MLForecast is pretty snappy with how they have it written. (See the sketch after this list.)
  2. I honestly just wouldn't use Prophet in general... But if you have 50 regressors, it (I believe) fits them with a normal prior, which is equivalent to ridge regression: it shrinks the coefficients, but you are stuck with this 'average' effect.
  3. ARIMAX absolutely still has a place, but it really all comes down to your features. If you have good-quality predictive features, it is usually better to do ML and 'featurize' the time pieces: you lose the explicit time component but gain a lot from the features. One new issue is that you may now have to forecast those features as well. If instead your features are bad, you are usually stuck with standard time series methods. So it really is 100% dependent on your data, and on whether there is anything to gain from learning across multiple time series.
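
The sketch promised in point 1 - plain pandas/lightgbm, with invented data and column names (this is illustrative, not MLForecast's code):

import numpy as np
import pandas as pd
import lightgbm as lgb

rng = np.random.default_rng(0)

# Hypothetical long-format panel: two series at very different levels.
df = pd.DataFrame({
    "unique_id": ["a"] * 100 + ["b"] * 100,
    "y": np.r_[rng.normal(100, 5, 100), rng.normal(5, 1, 100)],
})
df["lag_1"] = df.groupby("unique_id")["y"].shift(1)
df = df.dropna()

# Standard-scale the target and lags per series to encode the level,
# instead of differencing / moving-average level features.
stats = df.groupby("unique_id")["y"].agg(["mean", "std"])
df = df.join(stats, on="unique_id")
df["y_scaled"] = (df["y"] - df["mean"]) / df["std"]
df["lag_1"] = (df["lag_1"] - df["mean"]) / df["std"]

# Pass the series id itself as a categorical feature; lightgbm picks up
# pandas category dtype automatically.
df["unique_id"] = df["unique_id"].astype("category")
model = lgb.LGBMRegressor(n_estimators=200)
model.fit(df[["unique_id", "lag_1"]], df["y_scaled"])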

An alternative view is hierarchical forecasting, which sometimes works well to take advantage of higher-level seasonalities and trends that may be harder to see at the lower level; in my experience it outperforms ML a good chunk of the time unless you have good regressors.

As many are saying - SOTA is boosted trees with time features. If the features are bad, then it is TS stuff like ARIMAX. The best way to find out is to test each.
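
For the "boosted trees with time features" recipe, the featurization step is mostly calendar fields plus lags of the target - roughly (made-up column names):

import pandas as pd

# Hypothetical daily series.
df = pd.DataFrame({"ds": pd.date_range("2022-01-01", periods=365, freq="D")})
df["dayofweek"] = df["ds"].dt.dayofweek
df["month"] = df["ds"].dt.month
df["weekofyear"] = df["ds"].dt.isocalendar().week.astype(int)
df["dayofyear"] = df["ds"].dt.dayofyear
# ...then add target lags (e.g. lag_7, lag_28) and rolling means, and feed
# it all to lightgbm as in the sketch above.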

Edit: Regarding M5 - there was a lot of 'trickery' done to maximize the cost function there, so it might not be 100% super useful, at least in my experience.

4

PassionatePossum t1_j8y9oam wrote

Reply to comment by [deleted] in [D] Coauthor Paper? by [deleted]

I assume that you are based in the U.S. I'm not really familiar with the U.S. system of "grad school" so take what I say with a grain of salt.

Publishing a paper is certainly a good way to show your professor that you are capable of doing research, but it's probably not absolutely necessary. Having a reputation as a reliable and capable student should also go a long way toward convincing your professor that you are a good candidate.

Working with one of the PhD students on their research project should also be a good way to earn your professor's trust.

1

dancingnightly t1_j8y81v9 wrote

Do you know of any similar encoding where you vectorise relative time as multiple proportions of completeness, if that makes sense?

Say, completeness within a paragraph, within a chapter, within a book? (Besides sinusoidal embeddings, which push up the number of examples you need.)
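
Roughly what I have in mind - a toy sketch, all names and numbers invented:

def relative_time_features(pos_in_par, par_len, pos_in_chap, chap_len,
                           pos_in_book, book_len):
    # Encode a position as fractions of completeness at several granularities.
    return [
        pos_in_par / par_len,    # completeness within the paragraph
        pos_in_chap / chap_len,  # completeness within the chapter
        pos_in_book / book_len,  # completeness within the book
    ]

print(relative_time_features(3, 10, 40, 200, 500, 80000))
# [0.3, 0.2, 0.00625]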

3

dancingnightly t1_j8y7fny wrote

"If you look at the internals, it's a nightmare. A literal nightmare."

Yes, the copy paste button is heavily rinsed at HF HQ.

But you won't believe how much easier they made it to run, tokenize, and train models in 2018-19 - and, on top of that, to train compatible models.

We probably owe a month of NLP progress just to them coming in with those one-liners and sensible argument API surfaces.
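
The kind of one-liner I mean (the classic transformers pipeline call):

from transformers import pipeline

# Download, tokenize, and run a pretrained model in one line.
classifier = pipeline("sentiment-analysis")
print(classifier("HuggingFace made this absurdly easy."))
# -> [{'label': 'POSITIVE', 'score': ...}]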

Now, yes, it's getting crazy - but if there's a new paradigm, a new complex way to code, then a similar library will simplify it, and we'll mostly jump there except for legacy. It'll become like scikit-learn (which still holds up for most real ML tasks): lots of fine-grained detail and a slightly questionable number of edge cases (looking at the clustering algorithms in particular), but easy as pie to keep going.

I personally couldn't ask for more. I was worried they were going to push auto-switching models to their API at some point, but they've been brilliant. There are bugs, but I've never seen them in inference (besides your classic CUDA OOM), and like Fit_Schedule5951 says, it's all about that with HF.

1

Appropriate_Ant_4629 t1_j8y3koe wrote

> Hi, I lead product for Colab.

Thanks for your responses here!

And thank Google's management chain above you for allowing you to represent the product here.

Your comments here just saved a number of subscriptions that would otherwise have been canceled.

33

cubej333 t1_j8y25e7 wrote

Reply to comment by [deleted] in [D] Coauthor Paper? by [deleted]

I would expect that good recommendations from known people in the field, corroborated by research productivity, would be excellent for getting into graduate school. Maybe not for getting a great job after graduate school, but you would have all of graduate school to get first-author papers.

Arguably, if you have a number of first-author papers out of undergrad, you don't need graduate school.

1

DigThatData t1_j8xxnpp wrote

unrelated to OP: what is the "best practice" method for a notebook to self-test whether it's running in a Colab environment? I think the method I'm currently using is something like

probably_colab = False
try:
    # the google.colab package only exists inside a Colab runtime
    import google.colab
    probably_colab = True
except ImportError:
    pass

which I'm not a fan of, for a variety of reasons. What would you recommend?

5

Mental-Reference8330 t1_j8xup7w wrote

In the early days, researchers considered the architecture itself to be a form of regularization. LeCun didn't invent the idea, but he did popularize the view that a convolutional layer (as in LeNet, in his case) is like a fully-connected layer constrained to only allow solutions where the layer weights can be expressed in terms of a convolution kernel. In the paper that introduced them, ResNets were likewise motivated as being "constrained" to start from better minima, even though you could also convert a resnet model to a fully-connected model without loss of precision.
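
You can see that constraint in a few lines of numpy (a toy sketch, sizes arbitrary): a "valid" 1-D convolution is exactly a dense layer whose weight matrix is forced to be banded/Toeplitz.

import numpy as np

kernel = np.array([1.0, -2.0, 0.5])
x = np.random.randn(8)

# Dense weight matrix constrained to Toeplitz form: each row is the
# kernel, shifted one position to the right ("valid" convolution).
out_len = len(x) - len(kernel) + 1
W = np.zeros((out_len, len(x)))
for i in range(out_len):
    W[i, i:i + len(kernel)] = kernel

dense_out = W @ x
# np.convolve flips the kernel, so flip it back: NN "conv" layers
# actually compute cross-correlation.
conv_out = np.convolve(x, kernel[::-1], mode="valid")
assert np.allclose(dense_out, conv_out)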

1

ckperry t1_j8xtufq wrote

lol please do complain very loudly if we 10x your prices! and thank you!!

In this case it appears only the messaging was affected, and nobody was charged the 94 euros, thankfully. I'll update when we get our page fixed. Thanks again!

48

aCuRiOuSguuy t1_j8xsce1 wrote

I am currently a graduate student in Computer Science, taking a class on the foundations of Machine Learning. The class is very mathematically rigorous.

The textbook that we use is Foundations of Machine Learning by M. Mohri, A. Rostamizadeh, A. Talwalkar.

https://github.com/paullintilhac/Machine-Learning/blob/master/Foundations%20of%20Machine%20Learning%20by%20M.%20Mohri%2C%20A.%20Rostamizadeh%2C%20A.%20Talwalkar.pdf

I am seeking a paid private tutor to help me with the content and homework of the class. Pay is negotiable!

1

FreePenalties OP t1_j8xrm5m wrote

Thank you very much for the response; I will edit the post to be less alarmist. I'd also like to say thank you for making a great platform for data science collaboration, and for finally bringing Pro to Scandinavia :D It is a great-value product and I'm very happy to pay 9 euros for it, but 94 would definitely have been too much.

45

weeeeeewoooooo t1_j8xoy8u wrote

This is a great question. Steve Brunton has some great videos about dynamical systems and their properties that are very accessible. This one, I think, does a good job of showing the relationship between a system's eigenvalues and its behavior: https://youtu.be/XXjoh8L1HkE

Recursive application of a system (model) over a "long" period of time gets rid of transients, so the system falls onto its governing attractors, which are generally dictated by the system's eigenvalues. Recursive application also isolates the system, so you observe the model running autonomously rather than being driven by external inputs. That helps you tease out how expressive your model actually is versus how dependent it is on being fed observations from the target system, which in turn reduces overfitting and bias.
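
A minimal sketch of the idea with a made-up 2x2 linear system: roll out x_{t+1} = A x_t and the transient dies off at a rate set by the eigenvalues.

import numpy as np

A = np.array([[0.9, 0.2],
              [-0.2, 0.9]])           # eigenvalues 0.9 +/- 0.2i, |lambda| ~ 0.92
print(np.abs(np.linalg.eigvals(A)))   # both inside the unit circle -> stable

x = np.array([5.0, -3.0])             # arbitrary initial condition
for t in range(200):                  # long autonomous rollout, no external input
    x = A @ x
print(x)                              # ~[0, 0]: transient gone, attractor (origin) reached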

6

cubej333 t1_j8xo81r wrote

Generally the most important thing is your letters of recommendation, which should be good if the professors are putting you on the paper (the paper then corroborates them). Being first author on an important paper is probably better, but if someone was first author on an important paper yet had lousy letters of recommendation, that would be a red flag.

5