Recent comments in /f/dataisbeautiful

ar243 t1_jb26smq wrote

The problem is that OP is comparing two different things.

One is the total number of deaths over a whole year; the other is a single event that lasts a few hours, or a few days at most.

The second problem is exposure. Most people drive every day, but most people aren't caught in a natural disaster every day.

It's just a bad way to compare data.
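
A rough sketch of the exposure point, using entirely made-up numbers (they are not WHO or disaster statistics): dividing deaths by the hours people spend exposed to a hazard can reverse the ranking that raw totals suggest. The function name and all figures are hypothetical.

```python
# Made-up figures for illustration only; these are not real WHO or disaster statistics.

def deaths_per_million_exposure_hours(deaths: int, exposure_hours: int) -> float:
    """Hypothetical helper: normalize a death count by person-hours of exposure."""
    return deaths * 1_000_000 / exposure_hours


# Hazard A: low risk per hour, but a huge number of people are exposed every day.
daily_hazard = deaths_per_million_exposure_hours(deaths=1_000, exposure_hours=500_000_000)
# Hazard B: one rare event lasting a few hours, with far fewer people exposed.
rare_event = deaths_per_million_exposure_hours(deaths=100, exposure_hours=2_000_000)

print("raw deaths:  A = 1,000, B = 100")
print(f"rate:        A = {daily_hazard:.1f}, B = {rare_event:.1f} deaths per million exposure-hours")
```

With these numbers A has ten times as many deaths, but B has a 25 times higher death rate per hour of exposure (50.0 vs 2.0).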

Strange_Ad_6206 t1_jb23hcq wrote

It compares data gathered through WHO research (traffic fatalities) with samples drawn from Wikipedia's list articles, which mostly include incidents significant enough to have their own article.

A less manipulative comparison would be with Wikipedia's list of road accidents.

Also, the flood category contains one huge outlier, the 1931 China floods, whose toll includes deaths from the subsequent famine and epidemics, raising it from ~150,000 to 4 million. And that is just one example.
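
A small illustration of how a single reclassified event can dominate a category. Only the ~150,000 and 4 million figures come from the comment; the other five death tolls are made up.

```python
import statistics

# Made-up death tolls for five other flood events (illustrative only).
other_floods = [500, 1_200, 3_000, 8_000, 20_000]

# The 1931 China floods entered two ways, using the approximate figures quoted above.
direct_only = other_floods + [150_000]
with_famine_and_epidemics = other_floods + [4_000_000]

for label, tolls in [("direct deaths only", direct_only),
                     ("incl. famine/epidemics", with_famine_and_epidemics)]:
    print(f"{label:>22}: total={sum(tolls):,}  "
          f"mean={statistics.mean(tolls):,.0f}  "
          f"median={statistics.median(tolls):,.0f}")
```

Here the category total jumps from 182,700 to 4,032,700 and the mean from 30,450 to about 672,117, while the median stays at 5,500: one entry ends up accounting for over 99% of the total.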

VikThorior t1_jb22gkq wrote

Reply to comment by Barra79 in [OC] Wind Speed Vs Wind Power by Barra79

As I said under another post of yours, don't fit a regression if you don't have a model in mind. It may just be hypothetical, but you need a reason for choosing this particular regression other than "it fits pretty well". A 100th-degree polynomial will fit better, and a polynomial of degree N-1, with N the number of points (at distinct wind speeds), will fit them perfectly.
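
A quick sketch of the overfitting point. Here the textbook relation that power in the wind scales with v³ stands in for "a model in mind"; that choice, the constant k = 0.5 and the noise values are all assumptions for illustration, not OP's data.

```python
# Sketch: a degree-(N-1) polynomial fits N points perfectly, yet can be a terrible model.
# Assumptions (not from the original post): the cube law P = k * v**3 as the physical
# model, a made-up constant k = 0.5, and made-up noise values.

def lagrange_eval(xs, ys, x):
    """Evaluate the unique polynomial of degree <= N-1 through the N points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def fit_cube_law(vs, ps):
    """Least-squares estimate of k in the one-parameter model P = k * v**3."""
    return sum(p * v**3 for v, p in zip(vs, ps)) / sum(v**6 for v in vs)


true_k = 0.5                                   # hypothetical constant, arbitrary units
speeds = [float(v) for v in range(1, 11)]      # N = 10 distinct wind speeds
noise = [3, -5, 8, -2, 6, -7, 4, -3, 5, -1]    # made-up measurement noise
power = [true_k * v**3 + e for v, e in zip(speeds, noise)]

# Degree-9 interpolant: zero residual on the data it was fitted to.
max_interp_resid = max(abs(lagrange_eval(speeds, power, v) - p)
                       for v, p in zip(speeds, power))

# One-parameter physical model: small but nonzero residuals.
k_hat = fit_cube_law(speeds, power)
max_cube_resid = max(abs(k_hat * v**3 - p) for v, p in zip(speeds, power))

print(f"degree-9 polynomial: max residual {max_interp_resid:.1e}, "
      f"prediction at v = 12: {lagrange_eval(speeds, power, 12.0):,.0f}")
print(f"cube law (k = {k_hat:.4f}): max residual {max_cube_resid:.1f}, "
      f"prediction at v = 12: {k_hat * 12**3:,.0f}")
```

With these numbers the degree-9 polynomial reproduces every data point exactly, yet predicts about -45,600 at v = 12 (negative power), while the one-parameter cube law predicts about 866.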

Also, the problem here is that you have "positive" outliers but no negative outliers at the lowest values, because energy production can't go below 0. So your regression ends up above the true curve. You should find a way to identify and eliminate these outliers.

And if you can't, that's not a problem! We don't always need a regression. The relationship is clear enough without it; the red line isn't needed and just shows a model that is obviously wrong for many reasons.
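
A sketch of why a floor at zero pushes a fit upward, plus one common way to flag high outliers (a median/MAD rule, swapped in here as an example; it is not the only option). All numbers are made up, and in practice the rule would be applied within narrow bins of wind speed.

```python
import statistics

# Made-up numbers: at a low wind speed the true output is 1.0 (arbitrary units),
# the noise is symmetric around 0, but output is recorded with a floor at 0.
true_output = 1.0
symmetric_noise = [-6, -4, -2, -1, 0, 1, 2, 4, 6]   # mean 0
recorded = [max(0.0, true_output + e) for e in symmetric_noise]

# A least-squares fit tracks the mean, which the floor pushes upward.
print("mean of recorded values:  ", statistics.mean(recorded))    # 2.0, truth is 1.0
print("median of recorded values:", statistics.median(recorded))  # 1.0


def drop_high_outliers(values, k=3.0):
    """Drop points more than k scaled MADs above the median (one common rule, not the only one)."""
    med = statistics.median(values)
    # 1.4826 scales the MAD to roughly one standard deviation for normally distributed data.
    mad = statistics.median(abs(v - med) for v in values) * 1.4826
    return [v for v in values if v <= med + k * mad]


with_spike = recorded + [40.0]   # one made-up positive outlier
cleaned = drop_high_outliers(with_spike)
print("kept:", cleaned)
```

Here the rule drops the 40.0 spike and keeps the nine original readings. Dropping spikes does not remove the floor bias by itself, though: the cleaned mean is still 2.0, while the median stays at the true 1.0.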