Recent comments in /f/MachineLearning

bridgeton_man t1_j68njfs wrote

Question about goodness of fit.


For regressions, R-squared and adjusted R-squared are typically considered the primary goodness-of-fit measures.


But in many supervised machine-learning models, RMSE is the main measure I keep running across. For example, decision tree models I build in R with rpart report RMSE.


So, my question is: how do I compare the predictive accuracy of OLS regression models that report R-squared against equivalent rpart regression trees that report RMSE?
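One angle on this: when both metrics are computed on the same held-out data, they are interconvertible via R² = 1 − MSE / Var(y), so an RMSE from an rpart tree can be turned into an R² comparable with the OLS one. A minimal sketch (the helper names `rmse_to_r2` and `r2_to_rmse` are made up for illustration, and Var(y) is the population variance of the actual targets on that same set):

```python
import numpy as np

def rmse_to_r2(rmse, y_true):
    """Convert an RMSE on a held-out set to R^2 on the same set.

    R^2 = 1 - MSE / Var(y): Var(y) is the MSE of always predicting
    the mean, so this measures improvement over that baseline.
    """
    return 1.0 - rmse**2 / np.var(y_true)

def r2_to_rmse(r2, y_true):
    """Inverse conversion: recover the RMSE implied by an R^2."""
    return np.sqrt((1.0 - r2) * np.var(y_true))
```

The key caveat is that the conversion only makes the models comparable if both metrics come from the same data split; an in-sample OLS R² against a cross-validated rpart RMSE is still apples to oranges.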

1

DCBAtrader t1_j68h2fr wrote

Basic question on regression/AutoML (pycaret mainly).

When do p-values matter versus error metrics (MAE, MSE, R-squared)?

My previous model-building workflow (multivariate regression) was to first try various combinations of variables in OLS until all the variables were statistically significant, then use AutoML (pycaret) to build models and judge them by MAE, MSE, or R-squared, using proper cross-validation train/test splits of course.

I'm wondering if that first step is needed, or whether I can just run the entire dataset through pycaret and judge the models on those metrics (MAE, MSE, R-squared).

My gut says that the simpler model with statistically significant variables should perform better, but maybe I can just pick the model with the best error metric?
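The comparison described above can be run directly with cross-validated error metrics. A minimal sketch using scikit-learn rather than pycaret (the synthetic data, where only two of five features actually drive the target, and the 5-fold MAE comparison are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 5))
# only the first two columns drive y; the other three are pure noise,
# standing in for the "not statistically significant" variables
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=n)

# cross-validated MAE for the full model vs. the reduced model
full = cross_val_score(LinearRegression(), X, y,
                       cv=5, scoring="neg_mean_absolute_error")
reduced = cross_val_score(LinearRegression(), X[:, :2], y,
                          cv=5, scoring="neg_mean_absolute_error")
print(f"full-model MAE:    {-full.mean():.3f}")
print(f"reduced-model MAE: {-reduced.mean():.3f}")
```

If cross-validation is done properly, the error metric already penalizes the noise that insignificant variables add, which is why many AutoML workflows skip the manual p-value filtering step.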

1

Vegetable-Skill-9700 OP t1_j68g80z wrote

So, you know how it's almost impossible to build 100%-accurate, well-generalised ML models? On top of that, the performance of these models degrades over time. Furthermore, because ML models are black boxes, identifying and fixing their problems is super-hard.

UpTrain solves exactly these issues. It identifies cases where the model is going wrong, collects those problematic data points, and retrains the model on them to improve its accuracy!
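The loop described above — flag problematic points, then retrain on them — can be sketched generically. To be clear, this is not UpTrain's actual API; the low-confidence flagging rule below is a hypothetical stand-in for its error-detection step:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 2))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)

# new production data arrives after deployment
X_new = rng.normal(size=(50, 2))
y_new = (X_new[:, 0] > 0).astype(int)

# flag points where the model is unsure (a hypothetical proxy
# for "cases where the model is going wrong")
conf = np.abs(model.predict_proba(X_new)[:, 1] - 0.5) * 2
hard = conf < 0.3

# retrain on the original data plus only the flagged points
model.fit(np.vstack([X_train, X_new[hard]]),
          np.concatenate([y_train, y_new[hard]]))
```

Retraining on only the flagged points keeps labeling cost down, which is the usual argument for this kind of targeted data collection.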

You can check out the repo here: https://github.com/uptrain-ai/uptrain

14