Guide to using kstest in Python
Performs the (one-sample or two-sample) Kolmogorov-Smirnov test for goodness of fit.

The one-sample test compares the underlying distribution F(x) of a sample against a given distribution G(x). The two-sample test compares the underlying distributions of two independent samples. Both tests are valid only for continuous distributions.

Parameters

rvs : str, array_like, or callable
    If an array, it should be a 1-D array of observations of random variables. If a callable, it should be a function to generate random variables; it is required to have a keyword argument size. If a string, it should be the name of a distribution in scipy.stats.

cdf : str, array_like, or callable
    If array_like, it should be a 1-D array of observations of random variables, and the two-sample test is performed (and rvs must be array_like). If a callable, that callable is used to calculate the cdf. If a string, it should be the name of a distribution in scipy.stats.

args : tuple, sequence, optional
    Distribution parameters, used if rvs or cdf are strings or callables.

N : int, optional
    Sample size if rvs is string or callable. Default is 20.

alternative : {'two-sided', 'less', 'greater'}, optional
    Defines the null and alternative hypotheses. Default is 'two-sided'. Please see explanations in the Notes below.

method : {'auto', 'exact', 'approx', 'asymp'}, optional
    Defines the distribution used for calculating the p-value. The following options are available (default is 'auto'):
    - 'auto': selects one of the other options.
    - 'exact': uses the exact distribution of the test statistic.
    - 'approx': approximates the two-sided probability with twice the one-sided probability.
    - 'asymp': uses the asymptotic distribution of the test statistic.

Returns

statistic : float
    KS test statistic, either D, D+ or D-.

pvalue : float
    One-tailed or two-tailed p-value.

Notes

There are three options for the null and corresponding alternative hypothesis that can be selected using the alternative parameter.

- two-sided: The null hypothesis is that the two distributions are identical, F(x) = G(x) for all x; the alternative is that they are not identical.
- less: The null hypothesis is that F(x) >= G(x) for all x; the alternative is that F(x) < G(x) for at least one x.
- greater: The null hypothesis is that F(x) <= G(x) for all x; the alternative is that F(x) > G(x) for at least one x.
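As a rough illustration of how the alternative parameter relates to the statistics D, D+ and D-, the sketch below computes them by hand for a one-sample test against the standard normal and compares them with what kstest reports. The formulas used are the usual one-sample definitions based on the sorted sample; the seed, the sample size and all variable names are arbitrary choices made for this example, not part of the API.

```python
import numpy as np
from scipy import stats

# Illustrative sample; seed and size are arbitrary assumptions.
rng = np.random.default_rng(12345)
x = np.sort(stats.norm.rvs(size=50, random_state=rng))
n = x.size

# Reference CDF G evaluated at the sorted observations.
cdf_vals = stats.norm.cdf(x)

# D+ : largest amount by which the empirical CDF exceeds G.
d_plus = np.max(np.arange(1, n + 1) / n - cdf_vals)
# D- : largest amount by which G exceeds the empirical CDF.
d_minus = np.max(cdf_vals - np.arange(0, n) / n)
# D  : the two-sided statistic is the larger of the two.
d = max(d_plus, d_minus)

res_two = stats.kstest(x, stats.norm.cdf)
res_greater = stats.kstest(x, stats.norm.cdf, alternative='greater')
res_less = stats.kstest(x, stats.norm.cdf, alternative='less')

print(np.isclose(d, res_two.statistic))
print(np.isclose(d_plus, res_greater.statistic))
print(np.isclose(d_minus, res_less.statistic))
```

In this sketch, 'greater' pairs with D+ and 'less' pairs with D-, which matches the reading that each alternative is a statement about the CDFs rather than about the observed values.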
Note that the alternative hypotheses describe the CDFs of the underlying distributions, not the observed values. For example, suppose x1 ~ F and x2 ~ G. If F(x) > G(x) for all x, the values in x1 tend to be less than those in x2.

Examples

Suppose we wish to test the null hypothesis that a sample is distributed according to the standard normal. We choose a confidence level of 95%; that is, we will reject the null hypothesis in favor of the alternative if the p-value is less than 0.05.

When testing uniformly distributed data, we would expect the null hypothesis to be rejected.

>>> import numpy as np
>>> from scipy import stats
>>> rng = np.random.default_rng()
>>> stats.kstest(stats.uniform.rvs(size=100, random_state=rng),
...              stats.norm.cdf)
KstestResult(statistic=0.5001899973268688, pvalue=1.1616392184763533e-23)

Indeed, the p-value is lower than our threshold of 0.05, so we reject the null hypothesis in favor of the default "two-sided" alternative: the data are not distributed according to the standard normal.

When testing random variates from the standard normal distribution, we expect the data to be consistent with the null hypothesis most of the time.

>>> x = stats.norm.rvs(size=100, random_state=rng)
>>> stats.kstest(x, stats.norm.cdf)
KstestResult(statistic=0.05345882212970396, pvalue=0.9227159037744717)

As expected, the p-value of 0.92 is not below our threshold of 0.05, so we cannot reject the null hypothesis.

Suppose, however, that the random variates are distributed according to a normal distribution that is shifted toward greater values. In this case, the cumulative distribution function (CDF) of the underlying distribution tends to be less than the CDF of the standard normal. Therefore, we would expect the null hypothesis to be rejected with alternative='less':

>>> x = stats.norm.rvs(size=100, loc=0.5, random_state=rng)
>>> stats.kstest(x, stats.norm.cdf, alternative='less')
KstestResult(statistic=0.17482387821055168, pvalue=0.001913921057766743)

and indeed, with a p-value smaller than our threshold, we reject the null hypothesis in favor of the alternative.

For convenience, the previous test can be performed using the name of the distribution as the second argument.

>>> stats.kstest(x, "norm", alternative='less')
KstestResult(statistic=0.17482387821055168, pvalue=0.001913921057766743)

The examples above have all been one-sample tests. kstest also performs a two-sample test when the second argument is an array of observations. For example, when two samples are drawn from the same distribution, we expect the data to be consistent with the null hypothesis most of the time.

>>> sample1 = stats.laplace.rvs(size=105, random_state=rng)
>>> sample2 = stats.laplace.rvs(size=95, random_state=rng)
>>> stats.kstest(sample1, sample2)
KstestResult(statistic=0.11779448621553884, pvalue=0.4494256912629795)

As expected, the p-value of 0.45 is not below our threshold of 0.05, so we cannot reject the null hypothesis.
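The examples above always compare against the standard normal. As a small sketch of the args parameter, the code below tests a shifted sample against a normal distribution with matching loc and scale, once by naming the distribution and passing its parameters through args, and once by passing the CDF of a frozen distribution. The seed, the parameter values and the variable names are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats

# Illustrative data: normal variates shifted to loc=0.5 (arbitrary seed).
rng = np.random.default_rng(2024)
x = stats.norm.rvs(size=100, loc=0.5, random_state=rng)

# Reference distribution given by name; args supplies (loc, scale).
by_name = stats.kstest(x, "norm", args=(0.5, 1.0))

# Same reference distribution given as the CDF of a frozen distribution.
by_callable = stats.kstest(x, stats.norm(loc=0.5, scale=1.0).cdf)

# Decision rule used throughout the examples: reject if p-value < 0.05.
alpha = 0.05
reject = by_name.pvalue < alpha

print(by_name.statistic, by_name.pvalue, reject)
```

Both calls describe the same reference distribution, so they should report the same statistic and p-value; which form to use is mostly a matter of readability.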