EM-IV

Unit 8

Concept of joint probability


Let X and Y be two discrete random variables defined on the sample space S of a random experiment. Then the function (X, Y) defined on the same sample space is called a two-dimensional discrete random variable. In other words, (X, Y) is a two-dimensional random variable if the possible values of (X, Y) are finite or countably infinite. Here, each value of (X, Y) is represented as a point (x, y) in the xy-plane.

Joint probability mass function-

Let (X, Y) be a two-dimensional discrete random variable. With each possible outcome (xi, yj) we associate a number p(xi, yj) representing P[X = xi, Y = yj], which satisfies the following conditions:

  1. p(xi, yj) ≥ 0 for all i, j
  2. Σi Σj p(xi, yj) = 1

The function ‘p’ is called the joint probability mass function of X and Y.

If (X, Y) is a discrete two-dimensional random variable which takes the values (xi, yj), then the probability distribution of X is

p(xi) = P[X = xi] = Σj p(xi, yj)

which is known as the marginal probability mass function of X. Similarly, the probability distribution of Y is

p(yj) = P[Y = yj] = Σi p(xi, yj)

and is known as the marginal probability mass function of Y.

 

Conditional Probability Mass Function-

Let (X, Y) be a discrete two-dimensional random variable. Then the conditional probability mass function of X, given Y = y, is defined as

P[X = x | Y = y] = p(x, y) / p(y),  provided p(y) > 0.

The conditional probability mass function of Y, given X = x, is defined as

P[Y = y | X = x] = p(x, y) / p(x),  provided p(x) > 0.

 

Independence of random variables-

Two discrete random variables X and Y are said to be independent iff

p(x, y) = p(x) · p(y)  for all values (x, y).

 

 

Example: Find the joint distribution of X and Y, which are independent random variables with the following respective distributions:

x          1      2
P[X = x]   0.7    0.3

and

y          -2     5      8
P[Y = y]   0.3    0.5    0.2

Sol.

Since X and Y are independent random variables, p(x, y) = P[X = x] · P[Y = y]. Thus, the entries of the joint distribution are the products of the marginal entries:

X \ Y    -2      5       8
1        0.21    0.35    0.14
2        0.09    0.15    0.06

For instance, p(1, -2) = (0.7)(0.3) = 0.21.
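This outer-product structure is easy to check numerically. Below is a minimal Python sketch (the variable names are my own, not from the text) that builds the joint table from the two marginals and verifies it is a valid pmf:

```python
import numpy as np

# Marginal distributions from the example above
x_vals, px = np.array([1, 2]), np.array([0.7, 0.3])
y_vals, py = np.array([-2, 5, 8]), np.array([0.3, 0.5, 0.2])

# Independence: p(x, y) = P[X = x] * P[Y = y], i.e. an outer product
joint = np.outer(px, py)

print(joint)         # rows indexed by x, columns by y
print(joint.sum())   # 1.0, so the table is a valid joint pmf
```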

 

 

 

Example: The following table represents the joint probability distribution of the discrete random variable (X, Y):

X \ Y    1      2
1        0.1    0.2
2        0.1    0.3
3        0.2    0.1

Then find-

i) The marginal distributions.

ii) The conditional distribution of X given Y = 1.

iii) P[(X + Y) < 4].

 

Sol.

i) The marginal distributions are obtained as the row and column totals of the joint table:

X \ Y           1      2      p(x) [row totals]
1               0.1    0.2    0.3
2               0.1    0.3    0.4
3               0.2    0.1    0.3
p(y) [totals]   0.4    0.6    1

 

The marginal probability distribution of X is-

X      1      2      3
p(x)   0.3    0.4    0.3

 

The marginal probability distribution of Y is-

Y      1      2
p(y)   0.4    0.6

 

ii) The conditional distribution of X given Y = 1.

Here P[X = x | Y = 1] = p(x, 1) / p(y = 1), with p(y = 1) = 0.4, so

P[X = 1 | Y = 1] = 0.1/0.4 = 1/4
P[X = 2 | Y = 1] = 0.1/0.4 = 1/4
P[X = 3 | Y = 1] = 0.2/0.4 = 1/2

The conditional distribution of X given Y = 1 is-

X                  1      2      3
P[X = x | Y = 1]   1/4    1/4    1/2

 

(iii) The values of (X, Y) which satisfy X + Y < 4 are (1, 1), (1, 2) and (2, 1) only, which gives-

P[(X + Y) < 4] = p(1, 1) + p(1, 2) + p(2, 1) = 0.1 + 0.2 + 0.1 = 0.4
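These table manipulations map directly onto array operations. A small sketch (helper names are my own) that reproduces parts (i)-(iii) from the joint matrix:

```python
import numpy as np

# Joint pmf: rows are X = 1, 2, 3; columns are Y = 1, 2
joint = np.array([[0.1, 0.2],
                  [0.1, 0.3],
                  [0.2, 0.1]])

p_x = joint.sum(axis=1)   # marginal of X: [0.3, 0.4, 0.3]
p_y = joint.sum(axis=0)   # marginal of Y: [0.4, 0.6]

# Conditional distribution of X given Y = 1 (first column)
p_x_given_y1 = joint[:, 0] / p_y[0]   # [0.25, 0.25, 0.5]

# P[X + Y < 4]: add the entries whose (x, y) values satisfy the event
x_vals, y_vals = np.array([1, 2, 3]), np.array([1, 2])
mask = (x_vals[:, None] + y_vals[None, :]) < 4
print(joint[mask].sum())   # 0.4
```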

 

Example: Find-

(a) Marginal distributions
(b) E(X) and E(Y)
(c) Cov(X, Y)
(d) σX, σY and
(e) ρ(X, Y)

For the following joint probability distribution-

X \ Y    -4     2      7
1        1/8    1/4    1/8
5        1/4    1/8    1/8

Sol.

(a) Marginal distributions

The marginal distribution of X (row totals):

X      1      5
p(x)   1/2    1/2

The marginal distribution of Y (column totals):

Y      -4     2      7
p(y)   3/8    3/8    1/4

 

(b) E(X) and E(Y)

E(X) = 1 · (1/2) + 5 · (1/2) = 3
E(Y) = (-4)(3/8) + 2 · (3/8) + 7 · (1/4) = 1

(c) Cov(X, Y)

As we know that Cov(X, Y) = E(XY) - E(X)E(Y), where

E(XY) = (1)(-4)(1/8) + (1)(2)(1/4) + (1)(7)(1/8) + (5)(-4)(1/4) + (5)(2)(1/8) + (5)(7)(1/8) = 1.5

So Cov(X, Y) = 1.5 - (3)(1) = -1.5

(d) σX, σY

E(X²) = 1 · (1/2) + 25 · (1/2) = 13, so Var(X) = 13 - 3² = 4 and σX = 2
E(Y²) = 16 · (3/8) + 4 · (3/8) + 49 · (1/4) = 19.75, so Var(Y) = 19.75 - 1² = 18.75 and σY ≈ 4.33

(e) ρ(X, Y)

ρ(X, Y) = Cov(X, Y) / (σX σY) = -1.5 / (2 × 4.33) ≈ -0.17
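The whole chain (a)-(e) can be verified with a few lines of NumPy. A sketch built on the joint table above (the variable names are my own):

```python
import numpy as np

joint = np.array([[1/8, 1/4, 1/8],    # row X = 1
                  [1/4, 1/8, 1/8]])   # row X = 5
x, y = np.array([1, 5]), np.array([-4, 2, 7])

px, py = joint.sum(axis=1), joint.sum(axis=0)
ex, ey = x @ px, y @ py                        # E(X) = 3, E(Y) = 1
exy = (x[:, None] * y[None, :] * joint).sum()  # E(XY) = 1.5
cov = exy - ex * ey                            # -1.5
sx = np.sqrt(x**2 @ px - ex**2)                # 2.0
sy = np.sqrt(y**2 @ py - ey**2)                # ~4.33
print(cov / (sx * sy))                         # ~ -0.17
```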

 

 

Key takeaways

  1. Conditional Probability Mass Function: P[X = x | Y = y] = p(x, y) / p(y).
  2. Independence of random variables: two discrete random variables X and Y are independent iff p(x, y) = p(x) · p(y) for all (x, y).


Expectation

The mean value of the probability distribution of a variate X is commonly known as its expectation and is denoted by E(X). If f(x) is the probability mass or density function of the variate X, then

E(X) = Σ x f(x)                     (discrete distribution)

E(X) = ∫ x f(x) dx                  (continuous distribution)

In general, the expectation of any function g(X) is given by

E[g(X)] = Σ g(x) f(x)               (discrete distribution)

E[g(X)] = ∫ g(x) f(x) dx            (continuous distribution)

(2) The variance of a distribution is given by

σ² = Σ (x - μ)² f(x)                (discrete distribution)

σ² = ∫ (x - μ)² f(x) dx             (continuous distribution)

where σ is the standard deviation and μ the mean of the distribution.

(3) The rth moment about the mean (denoted by μr) is defined by

μr = Σ (x - μ)^r f(x)               (discrete distribution)

μr = ∫ (x - μ)^r f(x) dx            (continuous distribution)

(4) Mean deviation from the mean is given by

Σ |x - μ| f(x)                      (discrete distribution)

∫ |x - μ| f(x) dx                   (continuous distribution)

 

Example. In a lottery, m tickets are drawn at a time out of n tickets numbered from 1 to n. Find the expected value of the sum of the numbers on the tickets drawn.

Solution. Let X1, X2, …, Xm be the variables representing the numbers on the first, second, …, mth ticket. The probability of drawing a ticket bearing any given number out of the n tickets being in each case 1/n, we have

E(Xi) = 1 · (1/n) + 2 · (1/n) + … + n · (1/n) = (n + 1)/2

Therefore, the expected value of the sum of the numbers on the tickets drawn is

E(X1 + X2 + … + Xm) = E(X1) + E(X2) + … + E(Xm) = m(n + 1)/2

 

Example. X is a continuous random variable with probability density function given by

Find k and the mean value of X.

Solution.

Since the total probability is unity, ∫ f(x) dx = 1, which determines k.

Mean of X = ∫ x f(x) dx

 

Example. The frequency distribution of a measurable characteristic varying between 0 and 2 is as under:

Calculate the standard deviation and also the mean deviation about the mean.

Solution.

Total frequency N = ∫ f dx  (taken over 0 to 2)

μ1′ (about the origin) = (1/N) ∫ x f dx

μ2′ (about the origin) = (1/N) ∫ x² f dx

Hence, σ² = μ2′ - (μ1′)²,

i.e., standard deviation σ = √(μ2′ - (μ1′)²)

Mean deviation about the mean = (1/N) ∫ |x - mean| f dx
 

Variance of a sum

One of the applications of covariance is finding the variance of a sum of several random variables. In particular, if Z = X + Y, then

Var(Z) = Cov(Z, Z) = Cov(X + Y, X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

More generally, for a, b ∈ R we conclude

Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)

Variance

Consider two random variables X and Y with the following PMFs:

PX(x) = 1/2 for x = 100; 1/2 for x = -100; 0 otherwise            (3.3)

PY(y) = 1 for y = 0; 0 otherwise                                  (3.4)

Note that EX = EY = 0. Although both random variables have the same mean value, their distributions are completely different. Y is always equal to its mean of 0, while X is either 100 or -100, quite far from its mean value. The variance is a measure of how spread out the distribution of a random variable is. Here the variance of Y is quite small, since its distribution is concentrated at its mean value, while the variance of X is larger, since its distribution is more spread out.

The variance of a random variable X with mean μ = EX is defined as

Var(X) = E[(X - μ)²]

By definition, the variance of X is the average value of (X - μ)². Since (X - μ)² ≥ 0, the variance is always greater than or equal to zero. A large value of the variance means that (X - μ)² is often large, so X often takes values far from its mean. This means that the distribution is very spread out. On the other hand, a low variance means that the distribution is concentrated around its average.

Note that if we did not square the difference between X and its mean, the result would be zero. That is

E[X - μ] = EX - μ = 0

X is sometimes below its average and sometimes above its average; thus X - μ is sometimes negative and sometimes positive, but on average it is zero.

To compute Var(X) = E[(X - μ)²], note that we need to find the expected value of (X - μ)², so we can use LOTUS. In particular, we can write

Var(X) = E[(X - μ)²] = Σ (x - μ)² PX(x)

For example, for X and Y defined in equations 3.3 and 3.4 we have

Var(X) = (100 - 0)²(1/2) + (-100 - 0)²(1/2) = 10,000
Var(Y) = (0 - 0)²(1) = 0

As we expect, X has a very large variance while Var(Y) = 0.

 

Note that Var(X) has a different unit than X. For example, if X is measured in metres then Var(X) is in metres². To solve this issue we define another measure, called the standard deviation, usually shown as σX, which is simply the square root of the variance.

The standard deviation of a random variable X is defined as

SD(X) = σX = √Var(X)

The standard deviation of X has the same unit as X. For X and Y defined in equations 3.3 and 3.4 we have

σX = √10,000 = 100,  σY = 0

Here is a useful formula for computing the variance.

Computational formula for the variance:

Var(X) = E[X²] - (EX)²                    (3.5)

To prove it, note that

Var(X) = E[(X - μ)²] = E[X² - 2μX + μ²]

Note that for a given random variable X, μ is just a constant real number. Thus E[2μX] = 2μ EX = 2μ², so we have

Var(X) = E[X²] - 2μ² + μ² = E[X²] - μ²

Equation 3.5 is usually easier to work with than E[(X - μ)²]. To use this equation, we can find E[X²] = Σ x² PX(x) using LOTUS and then subtract (EX)² to obtain the variance.

 

Example. I roll a fair die and let X be the resulting number. Find E(X), Var(X), and σX.

Solution.

We have PX(k) = 1/6 for k = 1, 2, …, 6. Thus we have

E(X) = (1 + 2 + 3 + 4 + 5 + 6) · (1/6) = 7/2

E(X²) = (1 + 4 + 9 + 16 + 25 + 36) · (1/6) = 91/6

Thus, Var(X) = E(X²) - (EX)² = 91/6 - (7/2)² = 35/12, and σX = √(35/12) ≈ 1.71
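A quick Monte Carlo check of these values (a sketch; the sample size and seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)   # fair die: values 1..6

print(rolls.mean())   # ~3.5    (= 7/2)
print(rolls.var())    # ~2.917  (= 35/12)
print(rolls.std())    # ~1.71
```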

 

Theorem

For a random variable X and real numbers a and b,

Var(aX + b) = a² Var(X)                    (3.6)

Proof.

If Y = aX + b, then EY = a EX + b = aμ + b, so

Var(Y) = E[(aX + b - aμ - b)²] = E[a²(X - μ)²] = a² E[(X - μ)²] = a² Var(X)

From equation 3.6, we conclude that, for the standard deviation, SD(aX + b) = |a| SD(X). We mentioned that variance is NOT a linear operation. But there is a very important case in which variance behaves like a linear operation, and that is when we look at sums of independent random variables.

 

Theorem

If X1, X2, …, Xn are independent random variables and X = X1 + X2 + … + Xn, then

Var(X) = Var(X1) + Var(X2) + … + Var(Xn)

 

Example. If X ~ Binomial(n, p), find Var(X).

Solution.

We know that we can write a Binomial(n, p) random variable as the sum of n independent Bernoulli(p) random variables, i.e.

X = X1 + X2 + … + Xn, where the Xi are independent Bernoulli(p).

If Xi ~ Bernoulli(p), then its variance is

Var(Xi) = E[Xi²] - (EXi)² = p - p² = p(1 - p)

Thus, Var(X) = n p(1 - p)

 

Problem. If X ~ Poisson(λ), find Var(X).

Solution.

We already know EX = λ, thus Var(X) = E[X²] - λ². You can find E[X²] directly using LOTUS; however, it is a little easier to find E[X(X - 1)] first. In particular, using LOTUS we have

E[X(X - 1)] = Σ k(k - 1) e^(-λ) λ^k / k! = λ² e^(-λ) Σ λ^(k-2) / (k - 2)! = λ² e^(-λ) e^λ = λ²

So we have E[X²] - EX = λ². Thus E[X²] = λ² + λ, and we conclude

Var(X) = λ² + λ - λ² = λ
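As a sanity check, the sample variance of Poisson draws should sit close to λ. A minimal sketch (λ = 4 is an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 4.0
samples = rng.poisson(lam, size=1_000_000)

print(samples.mean())   # ~4.0 : E[X] = lambda
print(samples.var())    # ~4.0 : Var(X) = lambda, as derived above
```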

 

Covariance-

We denote the covariance of X and Y by Cov(X, Y), and it is given as

Cov(X, Y) = E[(X - EX)(Y - EY)] = E[XY] - E[X]E[Y]

 

Correlation coefficient

Whenever two variables x and y are so related that an increase in one is accompanied by an increase or decrease in the other, the variables are said to be correlated.

 

For example, the yield of a crop varies with the amount of rainfall.

 

If an increase in one variable corresponds to an increase in the other, the correlation is said to be positive. If an increase in one corresponds to a decrease in the other, the correlation is said to be negative. If there is no relationship between the two variables, they are said to be independent.

 

Perfect Correlation:

If two variables vary in such a way that their ratio is always constant, then the correlation is said to be perfect.

 

KARL PEARSON’S COEFFICIENT OF CORRELATION:

r(x, y) = Σ XY / (n σx σy) = Σ XY / √(Σ X² · Σ Y²)

Here X = x - x̄ and Y = y - ȳ are the deviations of x and y from their respective means x̄ = Σx/n and ȳ = Σy/n.
Note-

1. Correlation coefficient always lies between -1 and +1.

2. Correlation coefficient is independent of change of origin and scale.

3. If the two variables are independent then correlation coefficient between them is zero.

Correlation coefficient   Type of correlation
+1                        Perfect positive correlation
-1                        Perfect negative correlation
0.25                      Weak positive correlation
0.75                      Strong positive correlation
-0.25                     Weak negative correlation
-0.75                     Strong negative correlation
0                         No correlation


 

Example: Find the correlation coefficient between age and weight for the following data-

Age      30    44    45    43    34    44
Weight   56    55    60    64    62    63

 

Sol.

Here x̄ = 240/6 = 40 and ȳ = 360/6 = 60.

x      y      (x - x̄)   (x - x̄)²   (y - ȳ)   (y - ȳ)²   (x - x̄)(y - ȳ)
30     56     -10        100         -4         16          40
44     55      4          16         -5         25         -20
45     60      5          25          0          0           0
43     64      3           9          4         16          12
34     62     -6          36          2          4         -12
44     63      4          16          3          9          12
Sum:
240    360     0         202          0         70          32

 

Karl Pearson’s coefficient of correlation-

r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²) = 32 / √(202 × 70) ≈ 0.27

Here the correlation coefficient is 0.27, which is a weak positive correlation; this indicates that as age increases, the weight also tends to increase.
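NumPy can confirm this directly, since np.corrcoef returns the Pearson correlation matrix of its inputs:

```python
import numpy as np

age    = np.array([30, 44, 45, 43, 34, 44])
weight = np.array([56, 55, 60, 64, 62, 63])

r = np.corrcoef(age, weight)[0, 1]   # off-diagonal entry is r(age, weight)
print(round(r, 2))                   # 0.27
```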

 

Short-cut method to calculate the correlation coefficient-

Here deviations are taken from assumed means A and B: dx = x - A and dy = y - B. Then

r = [n Σ dx dy - (Σ dx)(Σ dy)] / [√(n Σ dx² - (Σ dx)²) · √(n Σ dy² - (Σ dy)²)]

Example: Find the correlation coefficient between the values X and Y of the dataset given below by using the short-cut method-

X    10    20    30    40    50
Y    90    85    80    60    45

 

Sol.

Taking the assumed means A = 30 and B = 70:

X      Y      dx = X - 30   dx²     dy = Y - 70   dy²     dx dy
10     90     -20            400     20            400     -400
20     85     -10            100     15            225     -150
30     80      0               0     10            100        0
40     60      10            100    -10            100     -100
50     45      20            400    -25            625     -500
Sum:
150    360     0            1000     10           1450    -1150

 

By the short-cut method,

r = [5(-1150) - (0)(10)] / [√(5 × 1000 - 0²) · √(5 × 1450 - 10²)] = -5750 / (√5000 · √7150) ≈ -0.96

The variables are strongly negatively correlated.
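The short-cut method is algebraically identical to the direct formula, because the correlation coefficient is invariant to shifting by an assumed mean. This is easy to confirm:

```python
import numpy as np

X = np.array([10, 20, 30, 40, 50])
Y = np.array([90, 85, 80, 60, 45])

# Direct Pearson correlation
print(np.corrcoef(X, Y)[0, 1])   # ~ -0.96

# Short-cut: deviations from assumed means A = 30, B = 70
dx, dy = X - 30, Y - 70
n = len(X)
num = n * (dx @ dy) - dx.sum() * dy.sum()
den = np.sqrt(n * (dx @ dx) - dx.sum()**2) * np.sqrt(n * (dy @ dy) - dy.sum()**2)
print(num / den)                 # same value as above
```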

 

 

Example: Ten students got the following percentage of marks in Economics and Statistics.

Calculate the coefficient of correlation.

Roll No.              1    2    3    4    5    6    7    8    9    10
Marks in Economics    78   36   98   25   75   82   90   62   65   39
Marks in Statistics   84   51   91   60   68   62   86   58   53   47

Solution:

Let the marks in the two subjects be denoted by x and y respectively.

Then the mean of the x marks is x̄ = 650/10 = 65 and the mean of the y marks is ȳ = 660/10 = 66.

If X and Y are the deviations of the x’s and y’s from their respective means, then the data may be arranged in the following form:

x      y      X = x - 65   Y = y - 66   X²      Y²      XY
78     84      13           18           169     324     234
36     51     -29          -15           841     225     435
98     91      33           25          1089     625     825
25     60     -40           -6          1600      36     240
75     68      10            2           100       4      20
82     62      17           -4           289      16     -68
90     86      25           20           625     400     500
62     58      -3           -8             9      64      24
65     53       0          -13             0     169       0
39     47     -26          -19           676     361     494
Sum:
650    660      0            0          5398    2224    2704

 

Here,

r = Σ XY / √(Σ X² · Σ Y²) = 2704 / √(5398 × 2224) ≈ 0.78

so there is a strong positive correlation between marks in Economics and marks in Statistics.

 

Spearman’s Rank Correlation

Let xi and yi be the ranks of the ith individual corresponding to two characteristics.

Assuming no two individuals are tied in either classification, each of x and y takes the values 1, 2, …, n, and hence their arithmetic means are each (n + 1)/2.

Let di = xi - yi, and let X and Y be the deviations of x and y from their (equal) means, so that di = Xi - Yi.

Clearly, Σ X = Σ Y = 0 and Σ X² = Σ Y² = n(n² - 1)/12. Since Σ d² = Σ X² + Σ Y² - 2 Σ XY, substituting into r = Σ XY / √(Σ X² · Σ Y²) gives the formula below.

SPEARMAN’S RANK CORRELATION COEFFICIENT:

ρ = 1 - 6 Σ d² / (n(n² - 1))

where ρ denotes the rank coefficient of correlation and d refers to the difference of ranks between paired items in the two series.

 

Example: Compute Spearman’s rank correlation coefficient ρ for the following data:

Person               A    B    C    D    E    F    G    H    I    J
Rank in Statistics   9    10   6    5    7    2    4    8    1    3
Rank in income       1    2    3    4    5    6    7    8    9    10

 

Solution:

Person   Rank in Statistics   Rank in income   d = difference of ranks   d²
A        9                    1                 8                        64
B        10                   2                 8                        64
C        6                    3                 3                         9
D        5                    4                 1                         1
E        7                    5                 2                         4
F        2                    6                -4                        16
G        4                    7                -3                         9
H        8                    8                 0                         0
I        1                    9                -8                        64
J        3                    10               -7                        49

Here Σ d² = 280 and n = 10, so

ρ = 1 - 6(280) / (10(10² - 1)) = 1 - 1680/990 ≈ -0.70

The two rankings are strongly negatively correlated.
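The formula is a one-liner in code. A sketch computing ρ from the rank differences (scipy.stats.spearmanr would give the same answer):

```python
import numpy as np

stats_rank  = np.array([9, 10, 6, 5, 7, 2, 4, 8, 1, 3])
income_rank = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

d = stats_rank - income_rank
n = len(d)
rho = 1 - 6 * (d @ d) / (n * (n**2 - 1))   # d @ d is the sum of d squared
print(round(rho, 2))                        # -0.7
```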

 

 

Example:

If X and Y are uncorrelated random variables, find the coefficient of correlation between X + Y and X - Y.

Solution:

Let U = X + Y and V = X - Y.

Then E(U) = E(X) + E(Y) and E(V) = E(X) - E(Y).

Now Var(U) = Var(X) + Var(Y) + 2 Cov(X, Y) = σX² + σY²

Similarly Var(V) = Var(X) + Var(Y) - 2 Cov(X, Y) = σX² + σY²

(As X and Y are not correlated, we have Cov(X, Y) = 0.)

Also Cov(U, V) = E(UV) - E(U)E(V) = E[(X + Y)(X - Y)] - [E(X) + E(Y)][E(X) - E(Y)]
            = E(X²) - E(Y²) - [(EX)² - (EY)²] = σX² - σY²

Hence ρ(U, V) = Cov(U, V) / √(Var(U) · Var(V)) = (σX² - σY²) / (σX² + σY²)

 

Key takeaways

  1. KARL PEARSON’S COEFFICIENT OF CORRELATION: r = Σ XY / √(Σ X² · Σ Y²), where X and Y are deviations from the means.
  2. Correlation coefficient always lies between -1 and +1.
  3. Correlation coefficient is independent of change of origin and scale.
  4. If the two variables are independent, then the correlation coefficient between them is zero.
  5. Short-cut method to calculate the correlation coefficient: r = [n Σ dx dy - (Σ dx)(Σ dy)] / [√(n Σ dx² - (Σ dx)²) · √(n Σ dy² - (Σ dy)²)], with deviations taken from assumed means.
  6. Spearman’s Rank Correlation: ρ = 1 - 6 Σ d² / (n(n² - 1)).
  7. For a random variable X and real numbers a and b, Var(aX + b) = a² Var(X).
  8. If X1, …, Xn are independent random variables and X = X1 + … + Xn, then Var(X) = Var(X1) + … + Var(Xn).

 


Probability vector-

A probability vector is a vector

v = (v1, v2, …, vn)

whose components satisfy vi ≥ 0 for every i and v1 + v2 + … + vn = 1.

Stochastic process-

A stochastic process is a family of random variables {X(t) | t ∈ T} defined on a common sample space S and indexed by the parameter t, which varies over an index set T.

The values assumed by the random variables X(t) are called states, and the set of all possible values forms the state space of the process, denoted by I.

If the state space is discrete, the stochastic process is known as a chain.

A stochastic process consists of a sequence of experiments in which each experiment has a finite number of outcomes with given probabilities.

 

Stochastic matrices-

A square matrix P is called a stochastic matrix if all its entries are non-negative and the sum of the entries of each row is one.

A vector v is said to be a fixed vector or a fixed point of a matrix A if vA = v and v ≠ 0.

If v is a fixed vector of A, then so is kv for k ≠ 0, since

(kv)A = k(vA) = kv.

Note-

  1. If v = (v1, v2, v3) is a probability vector and P is a stochastic matrix, then vP is also a probability vector.
  2. If P and Q are stochastic matrices, then their product PQ is also a stochastic matrix. Thus P^n is a stochastic matrix for all positive integer values of n.
  3. The transition matrix P of a Markov chain is a stochastic matrix.

 

Example: Which vectors are probability vectors?

  1. (5/2, 0, 8/3, 1/6, 1/6)
  2. (3, 0, 2, 5, 3)

Sol.

  1. It is not a probability vector, because the components do not add up to 1.
  2. Dividing by 3 + 0 + 2 + 5 + 3 = 13, we get the probability vector (3/13, 0, 2/13, 5/13, 3/13).

 

Example: Show that u = (b, a) is a fixed point of the stochastic matrix

P = [ 1 - a     a   ]
    [   b     1 - b ]

Sol.

uP = ( b(1 - a) + ab,  ba + a(1 - b) ) = ( b - ab + ab,  ab + a - ab ) = (b, a) = u

Hence u is a fixed point of P.

Example: Find the unique fixed probability vector t of the given stochastic matrix P.

Sol.

Suppose t = (x, y, z) is the fixed probability vector. By definition x + y + z = 1, so t = (x, y, 1 - x - y). t is a fixed vector if tP = t.

On solving these equations, we get the required fixed probability vector.
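Since the matrix for this example is not reproduced in the source, here is a sketch on a made-up 3×3 stochastic matrix (my own illustration) showing how tP = t together with x + y + z = 1 is solved numerically:

```python
import numpy as np

# A made-up 3x3 stochastic matrix (each row sums to 1)
P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])

# tP = t is equivalent to (P^T - I) t = 0; stack the normalisation
# constraint t1 + t2 + t3 = 1 as an extra row and solve by least squares.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
t, *_ = np.linalg.lstsq(A, b, rcond=None)

print(t)       # the fixed probability vector
print(t @ P)   # equals t again
```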

 

 

 8.4 Markov chains, Higher transition probabilities, Stationary distribution of regular Markov chains and absorbing states

Markov chain-

A Markov chain is a finite stochastic process consisting of a sequence of trials whose outcomes, say x1, x2, …, satisfy the following two conditions:

  1. Each outcome belongs to the state space I = {a1, a2, …, am}, which is the finite set of outcomes.
  2. The outcome of any trial depends at most upon the outcome of the immediately preceding trial and not upon any other previous outcomes.

 

Higher transition probabilities-

The probability that a Markov chain will move from state ai to state aj in exactly n steps is denoted by pij(n) and defined as

pij(n) = P[X(m + n) = aj | X(m) = ai]

These n-step transition probabilities are exactly the entries of the matrix power P^n.

Stationary distribution-

The stationary distribution of a Markov chain is the unique fixed probability vector t of the regular transition matrix P of the Markov chain, because every sequence of probability distributions approaches t.

 

Absorbing States-

A state ai of a Markov chain is said to be an absorbing state if the system remains in the state ai once it enters there, i.e., a state ai is absorbing if pii = 1. Thus once a Markov chain enters such an absorbing state, it is destined to remain there forever. In other words, the ith row of P has 1 at the main diagonal (i, i) position and zeros everywhere else.
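To tie these ideas together, a small sketch (the two-state matrix is my own illustration) that computes n-step transition probabilities as P^n and shows the rows converging to the stationary distribution:

```python
import numpy as np

# A made-up regular transition matrix for a two-state chain
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Higher transition probabilities: P^n gives pij(n)
P10 = np.linalg.matrix_power(P, 10)
print(P10)    # both rows approach the stationary distribution (0.8, 0.2)

# The stationary distribution t satisfies tP = t; here t = (0.8, 0.2)
t = np.array([0.8, 0.2])
print(t @ P)  # (0.8, 0.2) again
```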

 

