Comparing Birth Weights: Probability Using Normal Distributions
What This Problem Teaches
- Working with differences of independent normal random variables
- Understanding how variances combine when subtracting random variables
- Converting probability questions into standard normal calculations
- Interpreting the sign of a z-score and the corresponding tail probability
- Recognizing when to model comparative scenarios using distribution theory
Visualizing the Problem
Before diving into calculations, let's see what we're comparing. We have two overlapping normal distributions with different means and standard deviations: male birth weights (mean 3622 g, SD 532 g) and female birth weights (mean 3465 g, SD 414 g).
Notice that while males have a higher average birth weight, there's substantial overlap between the distributions. The question asks for the probability that a randomly chosen female weighs more than a randomly chosen male.
Solution: Method 1 — The Difference Distribution Approach
When comparing two normal random variables, the key insight is to work with their difference. This transforms a two-variable problem into a single-variable problem.
Step 1 — Define the random variables
Let M be the weight of a randomly selected male baby: M ~ N(3622, 532²)
Let F be the weight of a randomly selected female baby: F ~ N(3465, 414²)
We want to find P(F > M), which is equivalent to P(F - M > 0).
Step 2 — Create the difference variable
Define D = F - M. Since F and M are independent normal random variables, D is also normally distributed.
For independent normal variables X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²):
X - Y ~ N(μ₁ - μ₂, σ₁² + σ₂²)
Step 3 — Calculate the mean of D
The mean of the difference is:
μ_D = μ_F - μ_M = 3465 - 3622 = -157 grams
Step 4 — Calculate the variance and standard deviation of D
The variance of the difference is (note that variances add, even when subtracting):
σ²_D = σ²_F + σ²_M = 414² + 532² = 171,396 + 283,024 = 454,420
Therefore: σ_D = √454,420 ≈ 674.1 grams
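The mean and variance of D can be checked empirically. The following is a minimal simulation sketch using only the Python standard library; the seed and sample size are arbitrary choices made for illustration and are not part of the original problem.

```python
import random
import statistics

# Draw independent female and male weights using the parameters from Step 1.
# The seed (42) and sample size are arbitrary, chosen only for reproducibility.
rng = random.Random(42)
n = 200_000
diffs = [rng.gauss(3465, 414) - rng.gauss(3622, 532) for _ in range(n)]

sim_mean = statistics.fmean(diffs)
# Population variance computed directly from the simulated differences.
sim_var = sum((d - sim_mean) ** 2 for d in diffs) / n

print(f"simulated mean of F - M:     {sim_mean:.1f}")
print(f"simulated variance of F - M: {sim_var:.0f}")
print(f"sum of the two variances:    {414**2 + 532**2}")
```

The simulated variance should land close to the sum of the two variances (454,420), not their difference.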
Step 5 — Convert to standard normal
We want P(D > 0). Using standardization:
z = (0 - μ_D) / σ_D = (0 - (-157)) / 674.1 = 157 / 674.1 ≈ 0.233
Step 6 — Find the probability
Using the standard normal table:
P(D > 0) = P(Z > 0.233) = 1 - Φ(0.233) = 1 - 0.5920 = 0.4080
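Steps 1 through 6 can also be reproduced without a table. This sketch assumes Python 3.8 or later, where the standard library's statistics.NormalDist supports subtracting two instances as independent normal variables; the variable names are illustrative.

```python
from statistics import NormalDist

# Step 1: the two birth-weight distributions (grams).
F = NormalDist(mu=3465, sigma=414)
M = NormalDist(mu=3622, sigma=532)

# Steps 2-4: subtracting NormalDist objects treats them as independent,
# so the means subtract and the variances add.
D = F - M

# Step 5: standardize the threshold 0.
z = (0 - D.mean) / D.stdev

# Step 6: right-tail probability P(D > 0).
p_female_heavier = 1 - D.cdf(0)

print(f"mean of D = {D.mean:.0f}, sd of D = {D.stdev:.1f}")
print(f"z = {z:.3f}")
print(f"P(F > M) = {p_female_heavier:.4f}")  # about 0.408
```

The result can differ from the table-based value in the fourth decimal place because no intermediate rounding is applied.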
Solution: Method 2 — Direct Integration Approach
We can also solve this using the joint distribution of the two independent variables, though this leads to the same calculation through a different path.
Step 1 — Set up the joint probability
Since F and M are independent, their joint probability density function is:
f(f,m) = f_F(f) × f_M(m)
Step 2 — Define the region of interest
We want P(F > M), which means integrating over the region where f > m:
P(F > M) = ∫∫_{f>m} f_F(f) × f_M(m) df dm
Step 3 — Transform the integration
Making the substitution d = f - m, this double integral becomes:
P(F > M) = P(F - M > 0) = P(D > 0)
Step 4 — Apply the normal difference formula
This brings us back to the same calculation as Method 1:
D ~ N(-157, 674.1²), so P(D > 0) ≈ 0.408
Why this works: The integration approach confirms that when we have independent normal variables, finding P(X > Y) is equivalent to finding P(X - Y > 0), which reduces to a single-variable normal probability.
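As a rough numerical check of the integration route, the inner integral over f can be done in closed form, leaving P(F > M) = ∫ f_M(m) × P(F > m) dm. The sketch below approximates that single integral with a midpoint rule; the step count and the ±8 standard deviation integration range are assumptions picked for illustration, not values from the problem.

```python
from statistics import NormalDist

F = NormalDist(3465, 414)
M = NormalDist(3622, 532)

def prob_f_exceeds_m(steps=4000, width=8.0):
    """Midpoint-rule estimate of the integral of f_M(m) * P(F > m) dm."""
    lo = M.mean - width * M.stdev
    hi = M.mean + width * M.stdev
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        m = lo + (i + 0.5) * h           # midpoint of the i-th slice
        total += M.pdf(m) * (1 - F.cdf(m))
    return total * h

p_integral = prob_f_exceeds_m()
p_difference = 1 - (F - M).cdf(0)        # Method 1, for comparison

print(f"integration:  {p_integral:.4f}")
print(f"difference:   {p_difference:.4f}")
```

Both routes should agree to several decimal places, which is the equivalence Method 2 relies on.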
Verification
Checking our calculation
Mean check: μ_D = 3465 - 3622 = -157 ✓
Variance check: σ²_D = 414² + 532² = 171,396 + 283,024 = 454,420 ✓
Standard deviation: σ_D = √454,420 ≈ 674.11 ≈ 674.1 ✓
Z-score: z = 157 / 674.1 ≈ 0.2329 ≈ 0.233 ✓
Probability: From the standard normal table, P(Z > 0.233) = 0.408 ✓
Sanity check
Our answer of 40.8% makes intuitive sense. Since males have a higher average birth weight (3622 g vs. 3465 g), we'd expect the probability of a female weighing more than a male to be less than 50%. But the 157 g gap in means is small relative to the 674 g standard deviation of the difference, so the distributions overlap heavily and the probability is not far below 50%.
Watch Out For These
❌ Mistake #1: Subtracting variances instead of adding them
Wrong: σ²_D = 414² - 532² = -111,628 (impossible!)
Why it's wrong: For independent random variables, Var(X - Y) = Var(X) + Var(Y), not Var(X) - Var(Y). Writing X - Y as X + (-Y) shows why: Var(-Y) = (-1)² Var(Y) = Var(Y), so the spread of Y adds to the spread of X regardless of the sign. A variance can never be negative, which is why the result above is impossible.
❌ Mistake #2: Using the wrong mean in the z-score calculation
Wrong: z = (0 - 3465) / 674.1 or z = (0 - 3622) / 674.1
Why it's wrong: We're working with the distribution of D = F - M, which has mean μ_D = -157, not the mean of either original distribution.
❌ Mistake #3: Getting the sign of the z-score wrong
Wrong: z = (-157 - 0) / 674.1 ≈ -0.233, so P(Z > -0.233) = 0.592
Why it's wrong: The z-score standardizes the threshold, not the mean: z = (0 - μ_D) / σ_D = +0.233. The threshold 0 lies above the mean μ_D = -157, so we want the area to the right of a positive z-score, which is less than 0.5. The correct answer is 40.8%; it is below 50% because females typically weigh less than males.
The General Pattern
This problem illustrates a fundamental principle in probability: comparing normal random variables using their difference distribution.
General Formula: If X ~ N(μ₁, σ₁²) and Y ~ N(μ₂, σ₂²) are independent, then:
P(X > Y) = P(Z > (μ₂ - μ₁) / √(σ₁² + σ₂²))
where Z is standard normal.
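The general formula translates directly into a small helper. This is a sketch under the stated independence assumption; the function name prob_x_exceeds_y is illustrative and does not come from any library.

```python
from math import hypot
from statistics import NormalDist

def prob_x_exceeds_y(mu_x, sigma_x, mu_y, sigma_y):
    """P(X > Y) for independent X ~ N(mu_x, sigma_x^2), Y ~ N(mu_y, sigma_y^2)."""
    # hypot(a, b) = sqrt(a^2 + b^2), the standard deviation of X - Y.
    z = (mu_y - mu_x) / hypot(sigma_x, sigma_y)
    return 1 - NormalDist().cdf(z)

# Birth-weight problem: X = female weight, Y = male weight.
p = prob_x_exceeds_y(3465, 414, 3622, 532)
print(f"P(F > M) = {p:.4f}")  # about 0.408
```

Swapping the argument order gives the complementary probability, since P(Y > X) = 1 - P(X > Y) for continuous variables.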
This pattern appears throughout statistics and applied probability:
- Quality control: Comparing measurements from two production lines
- A/B testing: Determining if one version significantly outperforms another
- Medical research: Comparing treatment effects between groups
- Finance: Comparing returns from different investment strategies
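Important limitation: The formula above assumes the two variables are independent. If they are correlated (e.g., birth weights of twins), the variance of the difference picks up a covariance term, Var(X - Y) = Var(X) + Var(Y) - 2 Cov(X, Y), and the difference is guaranteed to be normal only when X and Y are jointly normal.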
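To illustrate how correlation changes the answer, here is a sketch that assumes X and Y are jointly normal with correlation rho, so that Var(X - Y) = σ₁² + σ₂² - 2ρσ₁σ₂. The value rho = 0.5 is purely hypothetical and is not taken from any birth-weight data.

```python
from math import sqrt
from statistics import NormalDist

def prob_x_exceeds_y_correlated(mu_x, sigma_x, mu_y, sigma_y, rho):
    """P(X > Y) for jointly normal X and Y with correlation rho."""
    # Covariance term: Cov(X, Y) = rho * sigma_x * sigma_y.
    var_d = sigma_x**2 + sigma_y**2 - 2 * rho * sigma_x * sigma_y
    # Note: var_d is zero when rho = 1 and sigma_x == sigma_y; not handled here.
    z = (mu_y - mu_x) / sqrt(var_d)
    return 1 - NormalDist().cdf(z)

p_independent = prob_x_exceeds_y_correlated(3465, 414, 3622, 532, rho=0.0)
p_correlated = prob_x_exceeds_y_correlated(3465, 414, 3622, 532, rho=0.5)  # hypothetical rho

print(f"rho = 0.0: {p_independent:.4f}")
print(f"rho = 0.5: {p_correlated:.4f}")
```

Positive correlation shrinks the variance of the difference, so the lighter group is even less likely to come out ahead.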
Why This Matters
Beyond birth weight studies, this type of calculation has widespread applications:
Manufacturing: Quality engineers use this to compare products from different production lines. If line A produces items with mean strength 100 (σ = 8) and line B produces items with mean 95 (σ = 12), what's the probability a random item from B is stronger than one from A?
Education: Comparing test scores between different teaching methods or schools. Educational researchers constantly need to determine if observed differences are statistically meaningful.
Clinical trials: When testing a new treatment, researchers compare patient outcomes between treatment and control groups. The statistical framework is identical to our birth weight problem.
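The manufacturing question above can be answered the same way. This sketch assumes the strengths from the two lines are independent and normally distributed, which the example implies but does not state outright.

```python
from statistics import NormalDist

# Strength distributions for the two production lines (units as in the example).
line_a = NormalDist(mu=100, sigma=8)
line_b = NormalDist(mu=95, sigma=12)

# Difference B - A: mean -5, variance 8^2 + 12^2 = 208.
diff = line_b - line_a
p_b_stronger = 1 - diff.cdf(0)

print(f"sd of B - A = {diff.stdev:.2f}")
print(f"P(B > A)    = {p_b_stronger:.4f}")  # about 0.364
```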
Four "What-If?" Problems
What-if #1: The female weighs at least 200 g more than the male
We want P(F - M ≥ 200) instead of P(F - M > 0).
We already know D = F - M ~ N(-157, 674.1²)
z = (200 - (-157)) / 674.1 = 357 / 674.1 = 0.530
P(Z > 0.530) = 1 - 0.7019 = 0.2981
29.8% probability that a female weighs at least 200g more than a male.
What-if #2: Share of females heavier than the 75th-percentile male
For M ~ N(3622, 532²), find x such that P(M < x) = 0.75
z₀.₇₅ = 0.674, so x = 3622 + 0.674 × 532 = 3622 + 358.6 = 3980.6g
For F ~ N(3465, 414²): z = (3980.6 - 3465) / 414 = 515.6 / 414 = 1.245
P(Z > 1.245) = 1 - 0.8934 = 0.1066
The 75th percentile male weight is 3981g. Only 10.7% of females exceed this weight.
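The percentile lookup and the tail probability can both be computed with statistics.NormalDist (Python 3.8+). This is a sketch; small differences from the table-based figures come from using an unrounded z-value.

```python
from statistics import NormalDist

M = NormalDist(3622, 532)
F = NormalDist(3465, 414)

# 75th percentile of male birth weight via the inverse CDF.
male_p75 = M.inv_cdf(0.75)

# Fraction of females heavier than that weight.
share_females_above = 1 - F.cdf(male_p75)

print(f"75th percentile male weight: {male_p75:.1f} g")
print(f"share of females above it:   {share_females_above:.4f}")  # about 0.106
```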
What-if #3: Comparing the averages of 5 females and 5 males
Let F̄ be the mean of 5 females and M̄ be the mean of 5 males.
F̄ ~ N(3465, 414²/5) = N(3465, 185.1²)
M̄ ~ N(3622, 532²/5) = N(3622, 237.9²)
D = F̄ - M̄ ~ N(-157, 414²/5 + 532²/5) = N(-157, 454,420/5) = N(-157, 301.5²)
z = (0 - (-157)) / 301.5 = 0.521
P(Z > 0.521) = 1 - 0.6988 = 0.3012
30.1% probability that the female sample average exceeds the male sample average.
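Because the standard error of each sample mean is σ/√n, this case is easy to get wrong by hand. The sketch below computes the standard errors directly, assuming the two samples of 5 are independent.

```python
from math import sqrt
from statistics import NormalDist

n = 5  # babies in each sample

# Sampling distributions of the two sample means (standard error = sigma / sqrt(n)).
F_bar = NormalDist(3465, 414 / sqrt(n))
M_bar = NormalDist(3622, 532 / sqrt(n))

# Difference of the sample means; the variances of the means add.
D_bar = F_bar - M_bar
p_female_avg_higher = 1 - D_bar.cdf(0)

print(f"se of female mean = {F_bar.stdev:.1f}")
print(f"se of male mean   = {M_bar.stdev:.1f}")
print(f"sd of difference  = {D_bar.stdev:.1f}")
print(f"P(F_bar > M_bar)  = {p_female_avg_higher:.4f}")
```

The probability comes out at about 0.30, lower than for single babies, because averaging narrows both distributions while the 157 g gap in means stays the same.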
What-if #4: Female mean needed for a 45% probability
Holding both standard deviations fixed, we want P(F > M) = 0.45, which means P(Z > z) = 0.45
If P(Z > z) = 0.45, then P(Z ≤ z) = 0.55, so z = 0.126
z = (3622 - μ_F) / 674.1 = 0.126
3622 - μ_F = 0.126 × 674.1 = 84.9
μ_F = 3622 - 84.9 = 3537.1 grams
The female mean would need to be 3537 grams for a 45% probability.
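Solving for the required mean can be sketched with the inverse CDF, again holding both standard deviations fixed. The variable names are illustrative, and the result differs slightly from the hand calculation because z is not rounded to three decimals.

```python
from math import hypot
from statistics import NormalDist

target = 0.45                      # desired P(F > M)
sigma_d = hypot(414, 532)          # sd of F - M, unchanged by shifting the mean

# P(Z > z) = target  <=>  z = inverse CDF at (1 - target).
z = NormalDist().inv_cdf(1 - target)

# z = (3622 - mu_F) / sigma_d, solved for mu_F.
required_mu_f = 3622 - z * sigma_d

# Plug the result back in as a check.
check = 1 - (NormalDist(required_mu_f, 414) - NormalDist(3622, 532)).cdf(0)

print(f"z = {z:.4f}")
print(f"required female mean = {required_mu_f:.1f} g")
print(f"check: P(F > M) = {check:.4f}")
```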
Frequently Asked Questions
How do you find the probability that one normal random variable exceeds another?
Create a new random variable D = X - Y representing their difference. Since both original variables are normal and independent, D is also normal with mean μ_D = μ_X - μ_Y and variance σ²_D = σ²_X + σ²_Y. Then P(X > Y) = P(D > 0), which you find using the standard normal distribution.
Why do variances add when you subtract independent random variables?
For independent random variables, Var(X - Y) = Var(X) + Var(Y), not Var(X) - Var(Y). Variance measures spread, and the difference of two uncertain quantities is more uncertain than either one alone. In this problem, σ²_D = 532² + 414² = 283,024 + 171,396 = 454,420, so σ_D ≈ 674.1 grams.
How do you interpret the sign of the z-score?
A negative z-score means the value is below the distribution's mean, so P(Z > z) is the area to the right of a point left of center and is greater than 0.5. A positive z-score gives a right-tail area below 0.5. Here the threshold 0 lies above the mean μ_D = -157, so z = 0.233 is positive and P(Z > 0.233) = 0.408, or about 40.8%.
2026-06-05