The empirical distribution function
Written by theoretic   
Since the empirical distribution function of a sample is itself a distribution function, one can calculate its quantile function $F_n^{\leftarrow}$, which we call the empirical quantile function. If the sample has no ties, then it is not difficult to see that
$$F_n(X_{(k)}) = k/n, \quad k = 1,\ldots,n,$$
i.e., $F_n$ jumps by $1/n$ at every order statistic $X_{(k)}$ and is constant on $[X_{(k)}, X_{(k+1)})$ for $k < n$. This means that the empirical quantile function $F_n^{\leftarrow}$ jumps at the values $k/n$, $k = 1,\ldots,n-1$, by $X_{(k+1)} - X_{(k)}$ and remains constant on $((k-1)/n, k/n]$:
$$F_n^{\leftarrow}(t) = \begin{cases} X_{(k)}, & t \in ((k-1)/n,\, k/n], \quad k = 1,\ldots,n-1, \\ X_{(n)}, & t \in ((n-1)/n,\, 1). \end{cases}$$
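The two step functions described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `ecdf` and `empirical_quantile` and the five-point sample are hypothetical, and the sample is assumed to have no ties.

```python
import math
from bisect import bisect_right


def ecdf(sample, x):
    """Empirical distribution function F_n(x): fraction of sample values <= x."""
    ordered = sorted(sample)
    return bisect_right(ordered, x) / len(ordered)


def empirical_quantile(sample, t):
    """Empirical quantile function: returns X_(k) for t in ((k-1)/n, k/n]."""
    if not 0.0 < t < 1.0:
        raise ValueError("t must lie in the open interval (0, 1)")
    ordered = sorted(sample)
    n = len(ordered)
    # Smallest k with k/n >= t. Floating-point rounding can matter when t
    # is numerically very close to one of the jump points k/n.
    k = math.ceil(t * n)
    return ordered[k - 1]


# Hypothetical sample without ties.
sample = [2.7, 0.4, 1.9, 3.1, 1.2]
ordered = sorted(sample)
n = len(sample)

# F_n evaluated at the k-th order statistic equals k/n.
values_at_order_stats = [ecdf(sample, x) for x in ordered]

# The median level t = 0.5 lies in (2/5, 3/5], so the quantile is X_(3).
median_quantile = empirical_quantile(sample, 0.5)
```

Sorting inside each call keeps the sketch short; for repeated evaluation one would sort once and reuse the ordered sample.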
A fundamental result of probability theory, the Glivenko-Cantelli lemma (see, for example, Billingsley [13], p. 275), tells us the following: if $X_1, X_2, \ldots$ is an iid sequence with distribution function $F$, then
$$\sup_{x \in \mathbb{R}} |F_n(x) - F(x)| \stackrel{\text{a.s.}}{\to} 0,$$
implying that $F_n(x) \approx F(x)$ uniformly for all $x$. One can show that the Glivenko-Cantelli lemma implies $F_n^{\leftarrow}(t) \to F^{\leftarrow}(t)$ a.s. as $n \to \infty$ for all continuity points $t$ of $F^{\leftarrow}$; see Resnick [64], p. 5. This observation is the basic idea behind the QQ-plot: if $X_1, \ldots, X_n$ were a sample with known distribution function $F$, we would expect $F_n^{\leftarrow}(t)$ to be close to $F^{\leftarrow}(t)$ for all $t \in (0,1)$, provided $n$ is large. Thus, if we plot $F_n^{\leftarrow}(t)$ against $F^{\leftarrow}(t)$ for $t \in (0,1)$, we should roughly see a straight line. It is common to plot the graph
$$\left\{ \left( X_{(k)},\, F^{\leftarrow}\!\left( \frac{k}{n+1} \right) \right) : k = 1, \ldots, n \right\}$$
for a given distribution function F. Modifications of the plotting positions have been used as well. Chambers [21] gives the following properties of a QQ-plot:
(a) Comparison of distributions. If the data were generated from a random sample of the reference distribution, the plot should look roughly linear. This remains true if the data come from a linear transformation of the distribution.
(b) Outliers. If one or a few of the data values are contaminated by gross error or for any reason are markedly different in value from the remaining values, the latter being more or less distributed like the reference distribution, the outlying points may be easily identified on the plot.
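The uniform convergence of $F_n$ to $F$ and the QQ-plot properties (a) and (b) can be checked numerically. The sketch below uses only the Python standard library and rests on several assumptions that are not from the text: the data are simulated from a normal distribution with mean 5 and standard deviation 2, the reference distribution for the QQ-plot is the standard normal, the plotting positions are $k/(n+1)$, the outlier value 40 is arbitrary, and the helper names `sup_distance`, `qq_points` and `correlation` are hypothetical. The correlation of the QQ-plot points serves here as a rough numerical stand-in for "looks like a straight line".

```python
import math
import random
from statistics import NormalDist


def sup_distance(sample, cdf):
    """sup_x |F_n(x) - F(x)| for continuous F, attained at the jump points."""
    ordered = sorted(sample)
    n = len(ordered)
    return max(
        max(abs(k / n - cdf(x)), abs((k - 1) / n - cdf(x)))
        for k, x in enumerate(ordered, start=1)
    )


def qq_points(sample, quantile_fn):
    """Pairs (X_(k), F^<-(k/(n+1))), k = 1..n (assumed plotting positions)."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(x, quantile_fn(k / (n + 1))) for k, x in enumerate(ordered, start=1)]


def correlation(points):
    """Pearson correlation of the plotted pairs; near 1 means roughly linear."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is reproducible
true_dist = NormalDist(mu=5.0, sigma=2.0)

# Glivenko-Cantelli: the sup distance shrinks as the sample grows.
small = [rng.gauss(5.0, 2.0) for _ in range(100)]
large = [rng.gauss(5.0, 2.0) for _ in range(10_000)]
d_small = sup_distance(small, true_dist.cdf)
d_large = sup_distance(large, true_dist.cdf)

# (a) The data are a linear transformation of the standard normal reference,
# so the QQ-plot points should still lie close to a straight line.
reference = NormalDist()
sample = large[:500]
r_clean = correlation(qq_points(sample, reference.inv_cdf))

# (b) One gross error shows up as the last, far-off point of the plot.
contaminated = sample + [40.0]
points = qq_points(contaminated, reference.inv_cdf)
r_outlier = correlation(points)

print(f"sup distance: n=100 -> {d_small:.4f}, n=10000 -> {d_large:.4f}")
print(f"QQ correlation: clean {r_clean:.4f}, with outlier {r_outlier:.4f}")
```

The list returned by `qq_points` can be passed directly to any plotting library as a scatter plot; the outlier then appears as a single point far from the line formed by the remaining points.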