Sunday, October 28, 2012

Probability Distribution Functions and Random Variables

A probability distribution function (p.d.f.) assigns a likelihood to each possible outcome of a random event. So a p.d.f. for the future price of a particular stock would give the likelihood of each possible future stock price, from zero on up.

Probability distribution functions have a couple of qualities:
  1. they must be strictly non-negative; and
  2. the probabilities under a p.d.f. must sum (or, for a continuous distribution, integrate) to one.
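
Both qualities are easy to check numerically. Here is a minimal Python sketch using a fair six-sided die as a stand-in discrete distribution; the die and the variable names are my own illustration, not anything from the discussion above.

    # A hypothetical discrete distribution: a fair six-sided die.
    # Each outcome maps to its probability.
    pmf = {face: 1.0 / 6.0 for face in range(1, 7)}

    # Quality 1: every probability is non-negative.
    assert all(p >= 0 for p in pmf.values())

    # Quality 2: the probabilities sum to one (within floating-point tolerance).
    assert abs(sum(pmf.values()) - 1.0) < 1e-12
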
If a random variable has a known p.d.f., two important values can be determined for it: its expected value, or mean, which is sometimes designated with the Greek letter μ; and its variance, which is designated σ². Variance is the expected value of the square of the difference between a random variable and its expected value.
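
To make those definitions concrete, here is a short Python sketch (numpy, the seed, and the chosen mean and spread are my assumptions) that estimates both values from simulated outcomes, computing the variance directly as the expected value of the squared difference from the mean.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=10.0, scale=2.0, size=100_000)  # simulated outcomes of X

    mu = x.mean()                 # estimate of the expected value
    var = np.mean((x - mu) ** 2)  # expected value of the squared difference from mu

    print(mu)   # close to 10
    print(var)  # close to 4, i.e. sigma^2 for sigma = 2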

There are a lot of interesting things about mean and variance, but for my purposes, only a couple are important.

First, the square root of variance, σ, or standard deviation, can be used to construct confidence intervals for a random variable: for a normally distributed variable, for example, the span within about 1.96 standard deviations of the mean forms a 95% confidence interval.
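
That 1.96 figure is easy to confirm by simulation. This sketch is my own illustration, assuming an arbitrary normal distribution; it just counts how many outcomes land within 1.96 standard deviations of the mean.

    import numpy as np

    rng = np.random.default_rng(1)
    mu, sigma = 5.0, 3.0  # arbitrary mean and standard deviation
    x = rng.normal(mu, sigma, size=1_000_000)

    # Fraction of outcomes within 1.96 standard deviations of the mean.
    inside = np.abs(x - mu) <= 1.96 * sigma
    print(inside.mean())  # roughly 0.95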

Second, both mean and variance are additive, which is to say that if X and Y are random variables, then generally the mean of X+Y is the mean of X plus the mean of Y, and the variance of X+Y is the variance of X plus the variance of Y. (The latter isn't quite true, because if X and Y are not independent -- the outcome of X affects the p.d.f. of Y -- then the variance of X+Y is the variance of X plus the variance of Y plus twice the covariance of X and Y. The covariance of two random variables is the expected value of the product of the differences between the variables and their respective means. I'll almost always be assuming independence between variables, so covariance won't matter to me much.)
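
The additivity claims, covariance correction included, can also be checked numerically. In this sketch I draw correlated X and Y from a multivariate normal (the specific means, variances, and covariance are arbitrary choices of mine) and compare the variance of X+Y against the variance of X plus the variance of Y plus twice their covariance.

    import numpy as np

    rng = np.random.default_rng(2)

    # Correlated X and Y: variances 4 and 9, covariance 3.
    cov = np.array([[4.0, 3.0],
                    [3.0, 9.0]])
    xy = rng.multivariate_normal(mean=[1.0, 2.0], cov=cov, size=500_000)
    x, y = xy[:, 0], xy[:, 1]

    # Means are additive regardless of dependence.
    print((x + y).mean(), x.mean() + y.mean())

    # Variances need the covariance term when X and Y are dependent.
    print((x + y).var())
    print(x.var() + y.var() + 2 * np.cov(x, y, ddof=0)[0, 1])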

In combination, these two factors create an important effect: the expected value for a sum of independent, identically distributed variables grows in proportion to the number of variables included in the sum, while confidence intervals for that sum grow only with the square root of that number. This gives the Law of Large Numbers: the average of a large number of outcomes of independent variables from an identical distribution will approach their common expected value.
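
The effect shows up clearly in simulation. This sketch (my own, with an arbitrary normal distribution) tracks the running average of i.i.d. draws: the spread of the average shrinks like σ divided by the square root of the number of draws, so the average closes in on the mean.

    import numpy as np

    rng = np.random.default_rng(3)
    mu, sigma = 10.0, 2.0  # assumed mean and standard deviation of each draw

    draws = rng.normal(mu, sigma, size=100_000)  # i.i.d. outcomes
    running_avg = np.cumsum(draws) / np.arange(1, draws.size + 1)

    for n in (10, 1_000, 100_000):
        # The sum of n draws has mean n*mu and spread sigma*sqrt(n),
        # so the average has spread sigma/sqrt(n) and homes in on mu.
        print(n, running_avg[n - 1], sigma / np.sqrt(n))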
