Sergen Cansiz
Mar 22, 2021

--

Hi Simone,

The chi-square distribution is used to find the cutoff point beyond which an observation is treated as an outlier. It is the distribution of the sum of squares of k independent standard normal random variables, which is why the squared Mahalanobis distance of multivariate normal data with k variables follows it with k degrees of freedom (approximately, when the mean and covariance are estimated from the sample). That is the basis for detecting outliers in multivariate data with a chi-square cutoff. If the variables were jointly multivariate normal (not just each one normal on its own), there would be no need to detect outliers, because there wouldn't be any. But if your purpose is to find the true distance between two points in multidimensional data based on the covariance between variables, it is better for the data to be multivariate normal. If the data are not multivariate normal, you can find the true distance after removing the outliers.
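To make the cutoff idea concrete, here is a minimal sketch in Python, assuming NumPy and SciPy are available. The function name mahalanobis_outliers, the significance level alpha=0.01, and the simulated data are all assumptions for illustration, not something taken from the original article. It computes the squared Mahalanobis distance of each row and compares it with the chi-square quantile for k degrees of freedom.

```python
import numpy as np
from scipy.stats import chi2


def mahalanobis_outliers(X, alpha=0.01):
    """Flag rows whose squared Mahalanobis distance exceeds the chi-square cutoff."""
    X = np.asarray(X, dtype=float)
    k = X.shape[1]                      # number of variables = degrees of freedom
    diff = X - X.mean(axis=0)           # center each variable on its sample mean
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    # Squared distance per row: diff_i^T @ cov_inv @ diff_i
    d2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    cutoff = chi2.ppf(1 - alpha, df=k)  # about 9.21 for k=2, alpha=0.01
    return d2, cutoff, d2 > cutoff


# Hypothetical data: 200 positively correlated points plus one planted outlier.
rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=200)
X = np.vstack([X, [4.0, -4.0]])         # lies against the direction of correlation

d2, cutoff, is_outlier = mahalanobis_outliers(X)
print(round(cutoff, 2), is_outlier[-1])
```

Note that in this sketch the mean and covariance are estimated from the same data that contains the outlier, so extreme points can inflate the covariance and partly mask themselves. Re-estimating both after dropping the flagged rows is one simple way to get closer to the "true distance" mentioned above.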


Written by Sergen Cansiz

Data Scientist, Statistician, Python and R Developer
