Normal Q-Q plot: A step-by-step approach

visualization

A normal Q-Q plot allows you to visually assess whether your data follow a normal distribution.

Author

Konstantinos Bougioukas

Published

November 30, 2025

A normal Q-Q plot is a visualization for assessing whether a given sample is consistent with a normal (Gaussian) distribution. It is constructed by plotting the theoretical quantiles of a standard normal distribution on the horizontal axis against the sample quantiles (i.e., the ordered values of the dataset) on the vertical axis.

Sample quantiles

Let Y be a random variable with \(n\) observations \(y_1, y_2,..., y_n\). First, we sort the observations from smallest to largest: \(y_{(1)} \leq y_{(2)} \leq \cdots \leq y_{(n)}\). Then, for each ordered value \(y_{(i)}\) (sample quantile), we assign a cumulative probability—referred to as plotting position—according to Hazen’s formula \(P_i = \frac{i-0.5}{n}\), where \(i = 1, 2, ..., n\) (Table 1). This convention, originally proposed by Hazen (1914) and later discussed by Cunnane (1978), is often applied for sufficiently large sample sizes (e.g. n > 10).

Table 1: Sample quantiles and plotting positions for \(n>10\)

Sample quantile \(y_{(i)}\)	Plotting position \(P_i = \frac{i-0.5}{n}\)
\(y_{(1)}\)	\(\frac{0.5}{n}\)
\(y_{(2)}\)	\(\frac{1.5}{n}\)
\(y_{(3)}\)	\(\frac{2.5}{n}\)
\(\vdots\)	\(\vdots\)
\(y_{(n)}\)	\(\frac{n-0.5}{n}\)

Example: Twenty 9-year-old children with age-appropriate development completed a visual matching task on a computer. A target image appeared on the left, and either an identical copy or a mirror image appeared on the right. Children pressed one key for matches and another for mirror images. The data represent reaction times (RT) in milliseconds for the 20 children (Table 2).

Table 2: Reaction times (RT) in milliseconds for 20 children

2549	1938	1698	1725	1236	2953	1367	837	1843	1369
1400	3696	5179	1584	1908	1415	4417	1500	2265	1687

Applying Hazen’s rule, we compute the plotting positions (cumulative probability estimates) for each sample quantile, as shown in Table 3.

Table 3: Sample quantiles and plotting positions for 20 reaction times

Sample quantile \(y_{(i)}\)	Plotting position \(P_i = (i-0.5)/20\)
837	0.025
1236	0.075
1367	0.125
1369	0.175
1400	0.225
1415	0.275
1500	0.325
1584	0.375
1687	0.425
1698	0.475
1725	0.525
1843	0.575
1908	0.625
1938	0.675
2265	0.725
2549	0.775
2953	0.825
3696	0.875
4417	0.925
5179	0.975

In R

The reaction times are:

y <- c(2549, 1938, 1698, 1725, 1236, 2953, 1367, 837, 1843, 1369, 
       1400, 3696, 5179, 1584, 1908, 1415, 4417, 1500, 2265, 1687)

We sort these observations from smallest to largest:

y_sorted <- sort(y)
y_sorted

 [1]  837 1236 1367 1369 1400 1415 1500 1584 1687 1698 1725 1843 1908 1938 2265
[16] 2549 2953 3696 4417 5179

Then, we compute the plotting position \(P_i\) for each \(i = 1, 2, ..., n\):

n <- length(y)
i <- seq_len(n)
Pi <- (i - 0.5) / n
Pi

 [1] 0.025 0.075 0.125 0.175 0.225 0.275 0.325 0.375 0.425 0.475 0.525 0.575
[13] 0.625 0.675 0.725 0.775 0.825 0.875 0.925 0.975
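As a cross-check, base R's ppoints() function generates plotting positions directly. The following is a minimal sketch, assuming the default settings of ppoints(), which apply Hazen's offset of 0.5 only when n > 10 and switch to an offset of 3/8 for smaller samples. The names Pi_manual and Pi_builtin are introduced here for illustration only.

```r
# Reaction times from Table 2
y <- c(2549, 1938, 1698, 1725, 1236, 2953, 1367, 837, 1843, 1369,
       1400, 3696, 5179, 1584, 1908, 1415, 4417, 1500, 2265, 1687)
n <- length(y)

# Hazen's plotting positions computed by hand
Pi_manual <- (seq_len(n) - 0.5) / n

# ppoints() with its default offset (0.5 here because n > 10)
Pi_builtin <- ppoints(n)

# The two vectors should agree up to floating-point error
all.equal(Pi_manual, Pi_builtin)

# For a small sample (n <= 10) the default no longer matches Hazen's rule
ppoints(5)            # default offset a = 3/8
ppoints(5, a = 0.5)   # forces Hazen's rule
```

For samples of 10 or fewer observations, passing a = 0.5 explicitly keeps the positions consistent with Hazen's formula.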

Theoretical quantiles

The theoretical quantiles are the z-values from the standard normal distribution corresponding to the plotting positions. For each cumulative probability \(P_i\), the theoretical quantile \(z_i\) is the value such that: \(P(Z< z_i) = P_i\), where \(Z \sim N(0,1)\) is a standard normal random variable.

The z-value can be calculated using \(z_i = \Phi^{-1}(P_i)\), where \(\Phi^{-1}\) is the inverse cumulative distribution function of the standard normal distribution. As an example, for the first (\(i=1\)) plotting position, we have \(P_1 = 0.025\). Therefore: \(P(Z < z_1) = 0.025 \Rightarrow z_1 = \Phi^{-1}(0.025) \approx -1.96\) (Table 4). This means that 2.5% of the standard normal distribution falls below \(z_1 = -1.96\).

Table 4: Plotting positions and theoretical quantiles

Plotting position \(P_i\)	Theoretical quantile \(z_i = \Phi^{-1}(P_i)\)
0.025	-1.96
0.075	-1.44
0.125	-1.15
0.175	-0.93
0.225	-0.76
0.275	-0.60
0.325	-0.45
0.375	-0.32
0.425	-0.19
0.475	-0.06
0.525	0.06
0.575	0.19
0.625	0.32
0.675	0.45
0.725	0.60
0.775	0.76
0.825	0.93
0.875	1.15
0.925	1.44
0.975	1.96

In R

The theoretical quantiles can be calculated using the qnorm() function:

z_i <- qnorm(Pi)
z_i

 [1] -1.95996398 -1.43953147 -1.15034938 -0.93458929 -0.75541503 -0.59776013
 [7] -0.45376219 -0.31863936 -0.18911843 -0.06270678  0.06270678  0.18911843
[13]  0.31863936  0.45376219  0.59776013  0.75541503  0.93458929  1.15034938
[19]  1.43953147  1.95996398
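Two properties of these quantiles are easy to verify in code. The sketch below, which recomputes the plotting positions so that it runs on its own, checks that pnorm() undoes qnorm() and that the quantiles are mirror images around zero because the plotting positions are symmetric around 0.5.

```r
# Hazen's plotting positions for n = 20
n <- 20
Pi <- (seq_len(n) - 0.5) / n

# Theoretical quantiles of the standard normal distribution
z_i <- qnorm(Pi)

# pnorm() is the inverse of qnorm(), so it should return the plotting positions
all.equal(pnorm(z_i), Pi)

# The positions are symmetric around 0.5, so the quantiles mirror around 0
all.equal(z_i, -rev(z_i))

# Rounded to two decimals, as reported in Table 4
round(z_i, 2)
```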

The Q-Q plot

The last step is to generate the normal Q-Q plot by plotting the theoretical quantiles (\(z_i\)) on the x-axis against the sample quantiles (\(y_{(i)}\)) on the y-axis using the values from Table 5.

Table 5: Theoretical quantiles and sample quantiles for RT data

Theoretical quantile \(z_i\)	Sample quantile \(y_{(i)}\)
-1.96	837
-1.44	1236
-1.15	1367
-0.93	1369
-0.76	1400
-0.60	1415
-0.45	1500
-0.32	1584
-0.19	1687
-0.06	1698
0.06	1725
0.19	1843
0.32	1908
0.45	1938
0.60	2265
0.76	2549
0.93	2953
1.15	3696
1.44	4417
1.96	5179

# Make QQ plot
plot(z_i, y_sorted,
     xlab = "Theoretical Quantiles", 
     ylab = "Sample Quantiles", 
     main = "Normal Q-Q Plot",
     pch = 1, col = "black")

Adding a reference line (Quartile Method)

The Quartile Method is a robust technique for defining a reference line on a Quantile-Quantile (Q-Q) plot. The line passes through the first quartile (\(Q_1\)) and third quartile (\(Q_3\)) of both the theoretical and sample distributions.

The reference line is \(y = b_0 + b_1 z\), where the slope and intercept are calculated as:

\(b_1 = \frac{y_{0.75} - y_{0.25}}{z_{0.75} - z_{0.25}}\)

\(b_0 = y_{0.25} - b_1 \, z_{0.25}\)

Here, \(z_{0.25}\) and \(z_{0.75}\) are the first and third quartiles of the theoretical standard normal distribution, while \(y_{0.25}\) and \(y_{0.75}\) are the corresponding quartiles of the sample.

In R

# Compute quartiles
z25 <- qnorm(0.25)
z75 <- qnorm(0.75)
y25 <- quantile(y, 0.25)
y75 <- quantile(y, 0.75)

# Compute line parameters
b1 <- (y75 - y25) / (z75 - z25)
b0 <- y25 - b1 * z25

# Make QQ plot
plot(z_i, y_sorted,
     xlab = "Theoretical Quantiles", 
     ylab = "Sample Quantiles", 
     main = "Normal Q-Q Plot",
     pch = 1, col = "black")

# Add the quartile-based line
abline(b0, b1, col = "red")
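Because \(z_{0.25} = -z_{0.75}\), the two formulas simplify: the intercept is the midpoint of the two sample quartiles, and the slope is the sample interquartile range divided by the interquartile range of the standard normal distribution (about 1.349). The sketch below checks both identities for the reaction times, assuming R's default quantile definition (type 7), which quantile() and IQR() share. The argument names = FALSE is an addition here that simply drops the "25%" and "75%" labels.

```r
# Reaction times from Table 2
y <- c(2549, 1938, 1698, 1725, 1236, 2953, 1367, 837, 1843, 1369,
       1400, 3696, 5179, 1584, 1908, 1415, 4417, 1500, 2265, 1687)

# Quartiles of the standard normal and of the sample
z25 <- qnorm(0.25)
z75 <- qnorm(0.75)
y25 <- quantile(y, 0.25, names = FALSE)
y75 <- quantile(y, 0.75, names = FALSE)

# Line parameters from the quartile method
b1 <- (y75 - y25) / (z75 - z25)
b0 <- y25 - b1 * z25

# Slope: sample IQR scaled by the IQR of the standard normal
all.equal(b1, IQR(y) / (z75 - z25))

# Intercept: midpoint of the sample quartiles
all.equal(b0, (y25 + y75) / 2)

c(intercept = b0, slope = b1)
```

If the data were normal, the intercept and slope would estimate the mean and standard deviation; because they depend only on the quartiles, they are less affected by extreme values than mean(y) and sd(y).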

The Q–Q plot shows clear deviations from the reference line, particularly in the upper tail, where the points rise well above the line. This pattern suggests that the data are not normally distributed and are positively skewed. The histogram and boxplot support this conclusion: both show an asymmetric distribution of reaction times, with a long right tail and two outliers.

layout(mat = matrix(c(1,2), ncol = 1), heights = c(2, 8))

# Top: horizontal boxplot
par(mar = c(0, 4, 1, 2))
boxplot(y, horizontal = TRUE, axes = FALSE, 
        col = "lightgray", outline = TRUE)

# Bottom: histogram
par(mar = c(5, 4, 0, 2))
hist(y, main = "")
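The two outliers drawn by the boxplot can also be listed numerically. The following is a minimal sketch using boxplot.stats(), the function that boxplot() relies on for its summary; it flags points lying more than 1.5 times the hinge spread beyond the hinges. The object name stats is chosen here for illustration.

```r
# Reaction times from Table 2
y <- c(2549, 1938, 1698, 1725, 1236, 2953, 1367, 837, 1843, 1369,
       1400, 3696, 5179, 1584, 1908, 1415, 4417, 1500, 2265, 1687)

# Summary used to draw the boxplot
stats <- boxplot.stats(y)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
stats$stats

# Observations plotted as individual points (outliers)
stats$out

# A mean well above the median is consistent with a long right tail
median(y)
mean(y)
```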

The normal Q–Q plot can be generated using the built-in R functions qqnorm() and qqline().

qqnorm(y)
qqline(y, col = "red")
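To confirm that qqnorm() reproduces the step-by-step construction, its coordinates can be extracted without drawing anything. This is a sketch that relies on the plot.it = FALSE argument and on the fact that qqnorm() returns its x and y values in the original order of the data, so both are sorted before comparison. The name z_manual is introduced here for illustration.

```r
# Reaction times from Table 2
y <- c(2549, 1938, 1698, 1725, 1236, 2953, 1367, 837, 1843, 1369,
       1400, 3696, 5179, 1584, 1908, 1415, 4417, 1500, 2265, 1687)

# Coordinates computed by qqnorm(), without producing a plot
qq <- qqnorm(y, plot.it = FALSE)

# Theoretical quantiles from Hazen's plotting positions, computed by hand
z_manual <- qnorm((seq_along(y) - 0.5) / length(y))

# Both comparisons should return TRUE for this sample (n > 10)
all.equal(sort(qq$x), z_manual)
all.equal(sort(qq$y), sort(y))
```

The agreement holds here because n = 20. For samples of 10 or fewer observations, qqnorm() uses different default plotting positions, so its theoretical quantiles would differ slightly from those based on Hazen's formula.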

We can also add a boxplot of the sample quantiles to the right of the Q–Q plot as follows:

# Set up the plotting layout: 2 columns, first column wider
layout(matrix(c(1, 2), nrow = 1), widths = c(3, 0.6))

# Set margins for both plots
par(mar = c(5, 4, 4, 1))

# Create Q-Q plot
qqnorm(y)
qqline(y, col = "red")

# Set margins for boxplot (reduce left margin)
par(mar = c(5, 1, 4, 2))

# Create boxplot on the right
boxplot(y, yaxt = "n", frame = FALSE, ylim = range(y))

References

Cunnane, C. 1978. “Unbiased Plotting Positions — a Review.” Journal of Hydrology 37 (3): 205–22. https://doi.org/10.1016/0022-1694(78)90017-3.

Hazen, A. 1914. “Storage to Be Provided in Impounding Reservoirs for Municipal Water Supply.” Trans. Amer. Soc. Civ. Eng. Pap 1308 (77): 1547–50.