User:Jmath666/Conditional probability and expectation

Elementary description

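If $A$ and $B$ are events such that $P(B) > 0$, the conditional probability of the event $A$ given $B$ is defined by

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

If $B$ is fixed, the mapping $A \mapsto P(A \mid B)$ is a conditional probability distribution given the event $B$.

If also $P(A) > 0$, then also

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)},$$

and so

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)},$$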

which is known as the Bayes theorem.
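The definition and the Bayes theorem can be checked by direct counting on a small finite sample space. The following is only a sketch: the two-dice sample space and the events `A` and `B` are assumptions chosen for illustration, not part of the text above.

```python
from fractions import Fraction

# Assumed sample space: ordered outcomes of two fair dice, all equally likely.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]


def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda w: a(w) and b(w)) / p_b


def A(w):
    """Hypothetical event: the sum of the dice is 8."""
    return w[0] + w[1] == 8


def B(w):
    """Hypothetical event: the first die is even."""
    return w[0] % 2 == 0


p_a_given_b = cond_prob(A, B)                     # 1/6
bayes_rhs = cond_prob(B, A) * prob(A) / prob(B)   # P(B|A) P(A) / P(B)
print(p_a_given_b, bayes_rhs)
```

Exact rational arithmetic is used so that the two sides of the Bayes theorem can be compared for equality rather than up to rounding.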

Conditioning of discrete random variables

If $X$ is a discrete real random variable (that is, attaining only values $x_i$, $i = 1, 2, \ldots$), then the conditional probability of an event $A$ given that $X = x_i$ is

$$P(A \mid X = x_i) = \frac{P(A \cap \{X = x_i\})}{P(X = x_i)}.$$

The mapping $A \mapsto P(A \mid X = x_i)$ defines a conditional probability distribution given that $X = x_i$.

Note that $P(A \mid X = x_i)$ is a number, that is, a deterministic quantity. If we allow $x_i$ to be a realization of the random variable $X$, we obtain the conditional probability of the event $A$ given the random variable $X$, denoted by $P(A \mid X)$, which is a random variable itself. The conditional probability $P(A \mid X)$ attains the value $P(A \mid X = x_i)$ with probability $P(X = x_i)$.

Now suppose $X$ and $Y$ are two discrete real random variables with a joint distribution. Then the conditional probability distribution of $Y$ given $X = x_i$ is

$$P(Y = y_j \mid X = x_i) = \frac{P(X = x_i, Y = y_j)}{P(X = x_i)}.$$

If we allow $x_i$ to be a realization of the random variable $X$, we obtain the conditional distribution $P(Y \mid X)$ of the random variable $Y$ given the random variable $X$. Given $X = x_i$, the random variable $Y$ attains the value $y_j$ with probability $P(Y = y_j \mid X = x_i)$.

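The random variables $X$ and $Y$ are independent when the events $\{X = x_i\}$ and $\{Y = y_j\}$ are independent for all $i$ and $j$, that is,

$$P(X = x_i, Y = y_j) = P(X = x_i)\, P(Y = y_j).$$

Clearly, this is equivalent to

$$P(Y = y_j \mid X = x_i) = P(Y = y_j).$$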

The conditional expectation of $Y$ given the value $X = x_i$ is

$$E(Y \mid X = x_i) = \sum_j y_j\, P(Y = y_j \mid X = x_i),$$

which is defined whenever the marginal probability

$$P(X = x_i) = \sum_j P(X = x_i, Y = y_j)$$

is positive. This is a description common in statistics [1]. Note that $E(Y \mid X = x_i)$ is a number, that is, a deterministic quantity, and the particular value of $x_i$ does not matter; only the probabilities $P(Y = y_j \mid X = x_i)$ do.

If we allow $x_i$ to be a realization of the random variable $X$, we obtain the conditional expectation of the random variable $Y$ given the random variable $X$, denoted by $E(Y \mid X)$. This form is closer to the mathematical form favored by probabilists (described in more detail below), and it is a random variable itself. The conditional expectation $E(Y \mid X)$ attains the value $E(Y \mid X = x_i)$ with probability $P(X = x_i)$.
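The distinction between the number $E(Y \mid X = x_i)$ and the random variable $E(Y \mid X)$ can be made concrete on a small joint distribution. The sketch below assumes a hypothetical joint probability table `joint`, invented only for illustration; it computes the conditional expectation for each value of $X$, lists the distribution of $E(Y \mid X)$, and compares its mean with $E(Y)$.

```python
from fractions import Fraction

# Hypothetical joint pmf P(X = x, Y = y); the numbers are illustrative only.
joint = {
    (0, 1): Fraction(1, 8), (0, 2): Fraction(3, 8),
    (1, 1): Fraction(1, 4), (1, 2): Fraction(1, 4),
}


def marginal_x(x):
    """Marginal probability P(X = x)."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def cond_exp_y_given_x(x):
    """E(Y | X = x): a deterministic number, defined only when P(X = x) > 0."""
    p_x = marginal_x(x)
    if p_x == 0:
        raise ValueError("P(X = x) is zero; conditional expectation undefined")
    return sum(y * p for (xi, y), p in joint.items() if xi == x) / p_x


# E(Y | X) as a random variable: it takes the value E(Y | X = x)
# with probability P(X = x).
xs = sorted({x for x, _ in joint})
dist_of_cond_exp = {x: (cond_exp_y_given_x(x), marginal_x(x)) for x in xs}

e_y = sum(y * p for (_, y), p in joint.items())
mean_of_cond_exp = sum(v * p for v, p in dist_of_cond_exp.values())
print(dist_of_cond_exp, e_y, mean_of_cond_exp)
```

In this example the mean of $E(Y \mid X)$ coincides with $E(Y)$, as the law of total expectation requires.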

Conditioning of continuous random variables

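For continuous random variables $X$, $Y$ with joint density $f_{X,Y}(x, y)$, the conditional probability density of $Y$ given that $X = x$ is

$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)},$$

where

$$f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\, dy$$

is the marginal density of $X$. The conventional notation $f_Y(y \mid X = x)$ is often used to mean the same as $f_{Y|X}(y \mid x)$, that is, the function $f_{Y|X}$ of two variables $x$ and $y$. The notation $f(y \mid x)$, often used in practice, is ambiguous, because if $x$ and $y$ are replaced by something else (like specific numbers), the information about what $f$ means is lost.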

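The continuous random variables are independent if, for all $x$ and $y$, the events $\{X \le x\}$ and $\{Y \le y\}$ are independent, which can be proved to be equivalent to

$$f_{X,Y}(x, y) = f_X(x)\, f_Y(y).$$

This is clearly equivalent to

$$f_{Y|X}(y \mid x) = f_Y(y).$$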

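The conditional probability density of $Y$ given $X$ is the random function $f_{Y|X}(y \mid X)$. The conditional expectation of $Y$ given the value $X = x$ is

$$E(Y \mid X = x) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid x)\, dy,$$

and the conditional expectation of $Y$ given $X$ is the random variable

$$E(Y \mid X) = \int_{-\infty}^{\infty} y\, f_{Y|X}(y \mid X)\, dy,$$

dependent on the values of $X$.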
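The formulas for the marginal density, the conditional density, and the conditional expectation can be evaluated numerically. The sketch below assumes a hypothetical joint density $f_{X,Y}(x, y) = x + y$ on the unit square (zero elsewhere) and a simple midpoint rule; both are choices made only for illustration. For this density the closed form is $E(Y \mid X = x) = (x/2 + 1/3)/(x + 1/2)$.

```python
def f_xy(x, y):
    """Hypothetical joint density: f(x, y) = x + y on the unit square, else 0."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def integrate(g, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


def f_x(x):
    """Marginal density of X: the joint density integrated over y."""
    return integrate(lambda y: f_xy(x, y))


def f_y_given_x(y, x):
    """Conditional density of Y given X = x, defined where f_X(x) > 0."""
    fx = f_x(x)
    if fx <= 0.0:
        raise ValueError("marginal density is zero at x")
    return f_xy(x, y) / fx


def cond_exp_y_given_x(x):
    """E(Y | X = x): integral of y * f_{Y|X}(y | x) over y."""
    fx = f_x(x)
    if fx <= 0.0:
        raise ValueError("marginal density is zero at x")
    return integrate(lambda y: y * f_xy(x, y)) / fx


x = 0.5
print(cond_exp_y_given_x(x), (x / 2 + 1 / 3) / (x + 1 / 2))
```

Evaluating `cond_exp_y_given_x` at a random draw of $X$ instead of a fixed number gives a realization of the random variable $E(Y \mid X)$.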

Warning

Unfortunately, in the literature, especially in more elementary statistics texts, authors do not always distinguish properly between conditioning given the value of a random variable (the result is a number) and conditioning given the random variable (the result is a random variable), so, confusingly enough, the words "given the random variable" can mean either.

Mathematical synopsis

This section follows [2]. In probability theory, a conditional expectation (also known as conditional expected value or conditional mean) is the expected value of a random variable with respect to a conditional probability distribution, defined as follows.

If $X$ is a real random variable, and $A$ is an event with positive probability, then the conditional probability distribution of $X$ given $A$ assigns a probability $P(X \in B \mid A)$ to the Borel set $B$. The mean (if it exists) of this conditional probability distribution of $X$ is denoted by $E(X \mid A)$ and called the conditional expectation of $X$ given the event $A$.

If $Y$ is another random variable, then the conditional expectation $E(X \mid Y = y)$ of $X$ given the value $Y = y$ is a function of $y$, let us say $g(y)$. An argument using the Radon-Nikodym theorem is needed to define $g$ properly because the event that $Y = y$ may have probability zero. Also, $g$ is defined only for almost all $y$, with respect to the distribution of $Y$. The conditional expectation of $X$ given the random variable $Y$, denoted by $E(X \mid Y)$, is the random variable $g(Y)$.

It turns out that the conditional expectation $E(X \mid Y)$ is a function only of the sigma-algebra, say $\mathcal{G}$, generated by the events $\{Y \in B\}$ for Borel sets $B$, rather than the particular values of $Y$. For a $\sigma$-algebra $\mathcal{G}$, the conditional expectation $E(X \mid \mathcal{G})$ of $X$ given the $\sigma$-algebra $\mathcal{G}$ is a random variable that is $\mathcal{G}$-measurable and whose integral over any $\mathcal{G}$-measurable set is the same as the integral of $X$ over the same set. The existence of this conditional expectation is proved from the Radon-Nikodym theorem. If $X$ happens to be $\mathcal{G}$-measurable, then $E(X \mid \mathcal{G}) = X$.

If $X$ has an expected value, then the conditional expectation $E(X \mid \mathcal{G})$ also has an expected value, which is the same as that of $X$. This is the law of total expectation.

For simplicity, the presentation here is done for real-valued random variables, but generalization to probability on more general spaces, such as $\mathbb{R}^n$ or normed metric spaces equipped with a probability measure, is immediate.

Mathematical prerequisites

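Recall that a probability space is $(\Omega, \Sigma, P)$, where $\Sigma$ is a $\sigma$-algebra of subsets of $\Omega$, and $P$ is a probability measure whose measurable sets are the elements of $\Sigma$. A random variable on the space $(\Omega, \Sigma, P)$ is a $\Sigma$-measurable function. $\mathcal{B}$ is the $\sigma$-algebra of all Borel sets in $\mathbb{R}$. If $B \subset \mathbb{R}$ is a set and $X$ a random variable, $X \in B$ or $\{X \in B\}$ are common shorthands for the event $X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\}$.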

Probability conditional on the value of a random variable

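Let $(\Omega, \Sigma, P)$ be a probability space, $Y$ a $\Sigma$-measurable random variable with values in $\mathbb{R}$, $A \in \Sigma$ (i.e., an event not necessarily independent of $Y$), and $B \in \mathcal{B}$. For $B \in \mathcal{B}$ with $P(Y \in B) > 0$, the conditional probability of $A$ given $Y \in B$ is by definition

$$P(A \mid Y \in B) = \frac{P(A \cap \{Y \in B\})}{P(Y \in B)}.$$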

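We wish to attach a meaning to the conditional probability of $A$ given $Y = y$ even when $P(Y = y) = 0$. The following argument follows Wilks [3], who attributes it to Kolmogorov [4]. Fix $A$ and define

$$\nu_A(B) = P(A \cap \{Y \in B\}), \quad B \in \mathcal{B}.$$

Since $Y$ is $\Sigma$-measurable, the set function $\nu_A$ is a measure on Borel sets $\mathcal{B}$. Define another measure $\mu_Y$ on $\mathcal{B}$ by

$$\mu_Y(B) = P(Y \in B).$$

Clearly,

$$0 \le \nu_A(B) \le \mu_Y(B),$$

and hence $\mu_Y(B) = 0$ implies $\nu_A(B) = 0$. Thus the measure $\nu_A$ is absolutely continuous with respect to the measure $\mu_Y$ and by the Radon-Nikodym theorem, there exists a real-valued $\mathcal{B}$-measurable function $g$ such that

$$\nu_A(B) = \int_B g(y)\, d\mu_Y(y) \quad \text{for all } B \in \mathcal{B}.$$

We interpret the function $g$ as the conditional probability of $A$ given $Y = y$,

$$P(A \mid Y = y) = g(y).$$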

Once the conditional probability is defined, other concepts of probability follow, such as expectation and density.

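One way to justify the interpretation of $g(y)$ as the conditional probability of $A$ given $Y = y$ is as the limit of the probability conditioned on the value of $Y$ being in a small neighborhood of $y$. Set $B = B_\varepsilon(y)$ (a neighborhood of $y$ with radius $\varepsilon$) to get

$$P(A \cap \{Y \in B_\varepsilon(y)\}) = \int_{B_\varepsilon(y)} g(t)\, d\mu_Y(t),$$

and using the fact that $P(Y \in B_\varepsilon(y)) = \mu_Y(B_\varepsilon(y))$, we have

$$P(A \mid Y \in B_\varepsilon(y)) = \frac{1}{\mu_Y(B_\varepsilon(y))} \int_{B_\varepsilon(y)} g(t)\, d\mu_Y(t),$$

so

$$\lim_{\varepsilon \to 0} P(A \mid Y \in B_\varepsilon(y)) = g(y)$$

for almost all $y$ in the measure $\mu_Y$. (Note: I do not know how to prove this without additional assumptions on $g$, such as continuity. Wilks [3] claims that the limit a.e. "can" be proved, though he does not proceed this way, and neglects to mention that a.e. is in the measure $\mu_Y$.)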

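As another illustration and justification for understanding $g(y)$ as the conditional probability of $A$ given $Y = y$, we now show what happens when the random variable $Y$ is discrete. Suppose $Y$ attains only values $y_i$, $i = 1, 2, \ldots$, with $P(Y = y_i) > 0$. Then

$$\mu_Y(B) = \sum_{i:\, y_i \in B} P(Y = y_i).$$

Choose $i$ and $B$ as a neighborhood $B_\varepsilon(y_i)$ of $y_i$ with radius $\varepsilon$ so small that $B_\varepsilon(y_i)$ does not contain any other $y_j$, $j \ne i$. Then for any $A \in \Sigma$,

$$\nu_A(B_\varepsilon(y_i)) = P(A \cap \{Y \in B_\varepsilon(y_i)\}) = P(A \cap \{Y = y_i\})$$

by the definition of $\nu_A$, and from the definition of $g$ as the Radon-Nikodym derivative,

$$\nu_A(B_\varepsilon(y_i)) = \int_{B_\varepsilon(y_i)} g(y)\, d\mu_Y(y) = g(y_i)\, P(Y = y_i).$$

This gives, for each $i$,

$$g(y_i) = \frac{P(A \cap \{Y = y_i\})}{P(Y = y_i)} = P(A \mid Y = y_i)$$

by definition of conditional probability. The function $g$ is defined only on the set $\{y_1, y_2, \ldots\}$. Because that is where the variable $Y$ is concentrated, this defines $g$ almost everywhere in the measure $\mu_Y$.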

Expectation conditional on the value of a random variable

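Suppose that $X$ and $Y$ are random variables, $X$ integrable. Define again the measure on $\mathbb{R}$ generated by the random variable $Y$,

$$\mu_Y(B) = P(Y \in B), \quad B \in \mathcal{B},$$

and a signed finite measure on $\mathbb{R}$,

$$\nu_X(B) = E(X\, 1_{Y \in B}) = \int_{\{Y \in B\}} X\, dP.$$

Here, $1_{Y \in B}$ is the indicator function of the event $Y \in B$, so $1_{Y \in B}(\omega) = 1$ if $Y(\omega) \in B$ and zero otherwise. Since

$$|\nu_X(B)| \le \int_{\{Y \in B\}} |X|\, dP$$

and $E|X| < \infty$, we have that $\mu_Y(B) = 0$ implies $\nu_X(B) = 0$, so $\nu_X$ is absolutely continuous with respect to $\mu_Y$. Consequently, there exists a Radon-Nikodym derivative $g = d\nu_X / d\mu_Y$ such that

$$\nu_X(B) = \int_B g(y)\, d\mu_Y(y) \quad \text{for all } B \in \mathcal{B}.$$

The value $g(y)$ is the conditional expectation of $X$ given $Y = y$ and is denoted by $E(X \mid Y = y)$. Then the result can be written as

$$E(X \mid Y = y) = \frac{d\nu_X}{d\mu_Y}(y)$$

for almost all $y$ in the measure $\mu_Y$ generated by the random variable $Y$.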

This definition is consistent with that of conditional probability: the conditional probability of $A$ given $Y = y$ is the same as the conditional mean of the indicator function of $A$ given $Y = y$. The proof is also exactly the same. In fact, we did not need to treat conditional probability separately at all; it could simply be regarded as a special case of conditional expectation.

Expectation conditional on a random variable and on a σ-algebra

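Let $g(y) = E(X \mid Y = y)$ be the conditional expectation of the random variable $X$ given that the random variable $Y = y$. Here $y$ is a fixed, deterministic value. Now take $y$ random, namely the value of the random variable $Y$, $y = Y(\omega)$. The result is called the conditional expectation of $X$ given $Y$, which is the random variable

$$E(X \mid Y)(\omega) = g(Y(\omega)).$$

So now we have the conditional expectation given in terms of the sample space $\Omega$ rather than in terms of $\mathbb{R}$, the range space of the random variable $Y$. It will turn out that after the change of the independent variable, the particular values attained by the random variable $Y$ do not matter that much; rather, it is the granularity of $Y$ that is important. The granularity of $Y$ can be expressed in terms of the $\sigma$-algebra generated by the random variable $Y$, which is

$$\sigma(Y) = \{Y^{-1}(B) : B \in \mathcal{B}\}.$$

By substitution, the conditional expectation $E(X \mid Y)$ satisfies

$$\int_{Y^{-1}(B)} E(X \mid Y)\, dP = \int_B g(y)\, d\mu_Y(y) = \int_{Y^{-1}(B)} X\, dP \quad \text{for all } B \in \mathcal{B},$$

which, by writing

$$C = Y^{-1}(B) \in \sigma(Y),$$

is seen to be the same as

$$\int_C E(X \mid Y)\, dP = \int_C X\, dP \quad \text{for all } C \in \sigma(Y).$$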

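It can be proved that for any $\sigma$-algebra $\mathcal{G} \subset \Sigma$, the $\mathcal{G}$-measurable random variable $E(X \mid \mathcal{G})$ exists and is defined by this equation, that is, by $\int_C E(X \mid \mathcal{G})\, dP = \int_C X\, dP$ for all $C \in \mathcal{G}$, uniquely, up to equality a.e. in $P$ [5]. The random variable $E(X \mid \mathcal{G})$ is called the conditional expectation of $X$ given the $\sigma$-algebra $\mathcal{G}$. It can be interpreted as a sort of averaging of the random variable $X$ to the granularity given by the $\sigma$-algebra $\mathcal{G}$ [6].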
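On a finite sample space the averaging interpretation is explicit: the $\sigma$-algebra generated by $Y$ consists of unions of the sets $\{Y = y\}$, and $E(X \mid Y)$ is the probability-weighted average of $X$ over each such set. The sketch below assumes a six-point sample space with uniform probability and hypothetical random variables `X` and `Y`, all chosen only for illustration.

```python
from fractions import Fraction

# Assumed finite sample space with uniform probability (illustration only).
omega = range(6)
P = {w: Fraction(1, 6) for w in omega}
X = {w: w * w for w in omega}   # hypothetical random variable X(w) = w^2
Y = {w: w % 2 for w in omega}   # hypothetical Y; sigma(Y) is generated by evens and odds


def cond_exp(X, Y):
    """E(X | Y) as a random variable on omega: X averaged over each set {Y = y}."""
    result = {}
    for y in set(Y.values()):
        block = [w for w in omega if Y[w] == y]
        p_block = sum(P[w] for w in block)
        avg = sum(X[w] * P[w] for w in block) / p_block
        for w in block:
            result[w] = avg
    return result


def integral(Z, C):
    """Integral of the random variable Z over the event C with respect to P."""
    return sum(Z[w] * P[w] for w in C)


Z = cond_exp(X, Y)

# Relabeling the values of Y leaves sigma(Y), and hence E(X | Y), unchanged.
Y_relabeled = {w: 10 * Y[w] + 3 for w in omega}
Z_relabeled = cond_exp(X, Y_relabeled)
print(Z, Z == Z_relabeled)
```

The integrals of `Z` and `X` agree over every set in $\sigma(Y)$ (here the empty set, the even outcomes, the odd outcomes, and the whole space), which is the defining equation above.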

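The conditional probability $P(A \mid \mathcal{G})$ of an event (that is, a set) $A \in \Sigma$ given the $\sigma$-algebra $\mathcal{G}$ is obtained by substituting $X = 1_A$, which gives

$$P(A \mid \mathcal{G}) = E(1_A \mid \mathcal{G}), \quad \text{that is,} \quad \int_C P(A \mid \mathcal{G})\, dP = P(A \cap C) \quad \text{for all } C \in \mathcal{G}.$$

An event $A$ is defined to be independent of a $\sigma$-algebra $\mathcal{G}$ if $A$ and any $C \in \mathcal{G}$ are independent. It is easy to see that $A$ is independent of the $\sigma$-algebra $\mathcal{G}$ if and only if

$$\int_C P(A \mid \mathcal{G})\, dP = P(A)\, P(C) \quad \text{for all } C \in \mathcal{G},$$

that is, if and only if $P(A \mid \mathcal{G}) = P(A)$ a.s. (which is a particularly obscure way to write independence given how complicated the definitions are).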

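Two random variables $X$, $Y$ are said to be independent if

$$P(X \in A, Y \in B) = P(X \in A)\, P(Y \in B) \quad \text{for all } A, B \in \mathcal{B},$$

which is now seen to be the same as

$$P(X \in A \mid \sigma(Y)) = P(X \in A) \quad \text{a.s. for all } A \in \mathcal{B}.$$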

Properties of conditional expectation

To be done.

Conditional density and likelihood

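Now that we have $P(A \mid X = x)$ for an arbitrary event $A$, we can define the conditional probability $P(Y \in B \mid X = x)$ for a random variable $Y$ and Borel set $B$. Thus we can define the conditional density $f_{Y|X}(y \mid x)$ as the Radon-Nikodym derivative,

$$P(Y \in B \mid X = x) = \int_B f_{Y|X}(y \mid x)\, d\lambda(y) \quad \text{for all } B \in \mathcal{B},$$

where $\lambda$ is the Lebesgue measure. In the conditional density $f_{Y|X}(y \mid x)$, $Y$ and $X$ are random variables that identify the density function, and $y$ and $x$ are the arguments of the density function.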

Note that in general $f_{Y|X}(y \mid x)$ is defined only for almost all $y$ (in Lebesgue measure) and almost all $x$ (in the measure $\mu_X$ generated by the random variable $X$). Under reasonable additional conditions (for example, it is enough to assume that the joint density $f_{X,Y}$ is continuous at $(x, y)$, and $f_X(x) > 0$), the density of $Y$ conditional on $X = x$ satisfies

$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.$$

Note that this density is a deterministic function.

The density of a random variable $Y$ conditional on a random variable $X$ is

$$f_{Y|X}(y \mid X).$$

It is a function-valued random variable obtained from the deterministic function $f_{Y|X}(y \mid x)$ by taking $x$ to be the value of the random variable $X$.

A common shorthand for the conditional density is

$$f(y \mid x) = f_{Y|X}(y \mid x).$$

This abuse of notation identifies a function from the symbols for its arguments, which is incorrect. Imagine that we wish to evaluate the conditional density of $Y$ at a specific number, say $y = 2$, given $X = 1$; then $f(y \mid x)$ becomes $f(2 \mid 1)$, which is nonsense.

When the value of $x$ is constant, the function $y \mapsto f_{Y|X}(y \mid x)$ is a probability density function of $y$. When the value of $y$ is constant, the function $x \mapsto f_{Y|X}(y \mid x)$ is called the likelihood function.
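The difference between the two readings of the same function can be seen numerically. The sketch below assumes the hypothetical joint density $f_{X,Y}(x, y) = x + y$ on the unit square, for which $f_{Y|X}(y \mid x) = (x + y)/(x + 1/2)$; the density and the midpoint rule are choices made only for illustration. With $x$ fixed, the function integrates to one in $y$; with $y$ fixed, the likelihood as a function of $x$ need not integrate to one.

```python
import math


def cond_density(y, x):
    """f_{Y|X}(y | x) for the assumed joint density x + y on the unit square."""
    return (x + y) / (x + 0.5)


def integrate(g, a=0.0, b=1.0, n=2000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))


# x fixed: a probability density in y, so the integral over y is 1.
area_in_y = integrate(lambda y: cond_density(y, 0.3))

# y fixed: the likelihood function of x; its integral over x is not 1 in general.
area_in_x = integrate(lambda x: cond_density(1.0, x))

# Closed form of the second integral for this density: 1 + (1/2) ln 3.
closed_form = 1.0 + 0.5 * math.log(3.0)
print(area_in_y, area_in_x, closed_form)
```

This is why a likelihood function is not, in general, a probability density in its argument.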

References

  1. William Feller. An introduction to probability theory and its applications. Vol. I. Third edition. John Wiley & Sons Inc., New York, 1968.
  2. Wikipedia. Conditional expectation. Version as of 18:29, 28 March 2007 (UTC), 2007.
  3. Samuel S. Wilks. Mathematical statistics. A Wiley Publication in Mathematical Statistics. John Wiley & Sons Inc., New York, 1962.
  4. A. N. Kolmogorov. Foundations of the theory of probability. Chelsea Publishing Co., New York, 1956. Translation edited by Nathan Morrison, with an added bibliography by A. T. Bharucha-Reid.
  5. Claude Dellacherie and Paul-André Meyer. Probabilities and potential, volume 29 of North-Holland Mathematics Studies. North-Holland Publishing Co., Amsterdam, 1978.
  6. S. R. S. Varadhan. Probability theory, volume 7 of Courant Lecture Notes in Mathematics. New York University Courant Institute of Mathematical Sciences, New York, 2001.