EM-IV

Unit 8

Concept of joint probability


Let X and Y be two discrete random variables defined on the sample space S of a random experiment. Then the function (X, Y) defined on the same sample space is called a two-dimensional discrete random variable. In other words, (X, Y) is a two-dimensional random variable if the possible values of (X, Y) are finite or countably infinite. Here, each value of (X, Y) is represented as a point (x, y) in the xy-plane.

Joint probability mass function-

Let (X, Y) be a two-dimensional discrete random variable. With each possible outcome (xi, yj) we associate a number p(xi, yj) representing P[X = xi, Y = yj], which satisfies the following conditions:

  1. p(xi, yj) ≥ 0 for all i, j
  2. Σi Σj p(xi, yj) = 1

The function ‘p’ is called the joint probability mass function of X and Y.

If (X, Y) is a discrete two-dimensional random variable which takes the values (xi, yj), then the probability distribution of X is

p(xi) = P[X = xi] = Σj p(xi, yj)

which is known as the marginal probability mass function of X. Similarly, the probability distribution of Y is

p(yj) = P[Y = yj] = Σi p(xi, yj)

and is known as the marginal probability mass function of Y.

 

Conditional Probability Mass Function-

Let (X, Y) be a discrete two-dimensional random variable. Then the conditional probability mass function of X, given Y = y, is defined as

P[X = x | Y = y] = p(x, y) / p(y),  provided p(y) > 0.

The conditional probability mass function of Y, given X = x, is defined as

P[Y = y | X = x] = p(x, y) / p(x),  provided p(x) > 0.

 

Independence of random variables-

Two discrete random variables X and Y are said to be independent iff

p(x, y) = p(x) · p(y)  for all values (x, y).

 

 

Example: Find the joint distribution of X and Y, which are independent random variables with the following respective distributions:

x          1      2
P[X = x]   0.7    0.3

and

y          -2     5      8
P[Y = y]   0.3    0.5    0.2

Sol.

Since X and Y are independent random variables, p(x, y) = P[X = x] · P[Y = y]. Thus, the entries of the joint distribution are the products of the marginal entries:

X \ Y    -2      5       8
1        0.21    0.35    0.14
2        0.09    0.15    0.06

For instance, p(1, -2) = (0.7)(0.3) = 0.21.
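This outer-product structure is easy to check numerically. Below is a minimal Python sketch (the variable names are my own, not from the text) that builds the joint table from the two marginals and verifies it is a valid pmf:

```python
import numpy as np

# Marginal distributions from the example above
x_vals, px = np.array([1, 2]), np.array([0.7, 0.3])
y_vals, py = np.array([-2, 5, 8]), np.array([0.3, 0.5, 0.2])

# Independence: p(x, y) = P[X = x] * P[Y = y], i.e. an outer product
joint = np.outer(px, py)

print(joint)         # rows indexed by x, columns by y
print(joint.sum())   # 1.0, so the table is a valid joint pmf
```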

 

 

 

Example: The following table represents the joint probability distribution of the discrete random variable (X, Y):

X \ Y    1      2
1        0.1    0.2
2        0.1    0.3
3        0.2    0.1

Then find-

i) The marginal distributions.

ii) The conditional distribution of X given Y = 1.

iii) P[(X + Y) < 4].

 

Sol.

i) The marginal distributions are obtained as the row and column totals of the joint table:

X \ Y           1      2      p(x) [row totals]
1               0.1    0.2    0.3
2               0.1    0.3    0.4
3               0.2    0.1    0.3
p(y) [totals]   0.4    0.6    1

 

The marginal probability distribution of X is-

X      1      2      3
p(x)   0.3    0.4    0.3

 

The marginal probability distribution of Y is-

Y      1      2
p(y)   0.4    0.6

 

ii) The conditional distribution of X given Y = 1.

Here P[X = x | Y = 1] = p(x, 1) / p(y = 1), with p(y = 1) = 0.4, so

P[X = 1 | Y = 1] = 0.1/0.4 = 1/4
P[X = 2 | Y = 1] = 0.1/0.4 = 1/4
P[X = 3 | Y = 1] = 0.2/0.4 = 1/2

The conditional distribution of X given Y = 1 is-

X                  1      2      3
P[X = x | Y = 1]   1/4    1/4    1/2

 

(iii) The values of (X, Y) which satisfy X + Y < 4 are (1, 1), (1, 2) and (2, 1) only, which gives-

P[(X + Y) < 4] = p(1, 1) + p(1, 2) + p(2, 1) = 0.1 + 0.2 + 0.1 = 0.4
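These table manipulations map directly onto array operations. A small sketch (helper names are my own) that reproduces parts (i)-(iii) from the joint matrix:

```python
import numpy as np

# Joint pmf: rows are X = 1, 2, 3; columns are Y = 1, 2
joint = np.array([[0.1, 0.2],
                  [0.1, 0.3],
                  [0.2, 0.1]])

p_x = joint.sum(axis=1)   # marginal of X: [0.3, 0.4, 0.3]
p_y = joint.sum(axis=0)   # marginal of Y: [0.4, 0.6]

# Conditional distribution of X given Y = 1 (first column)
p_x_given_y1 = joint[:, 0] / p_y[0]   # [0.25, 0.25, 0.5]

# P[X + Y < 4]: add the entries whose (x, y) values satisfy the event
x_vals, y_vals = np.array([1, 2, 3]), np.array([1, 2])
mask = (x_vals[:, None] + y_vals[None, :]) < 4
print(joint[mask].sum())   # 0.4
```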

 

Example: Find-

(a) Marginal distributions
(b) E(X) and E(Y)
(c) Cov(X, Y)
(d) σX, σY and
(e) ρ(X, Y)

For the following joint probability distribution-

X \ Y    -4     2      7
1        1/8    1/4    1/8
5        1/4    1/8    1/8

Sol.

(a) Marginal distributions

The marginal distribution of X (row totals):

X      1      5
p(x)   1/2    1/2

The marginal distribution of Y (column totals):

Y      -4     2      7
p(y)   3/8    3/8    1/4

 

(b) E(X) and E(Y)

E(X) = 1 · (1/2) + 5 · (1/2) = 3
E(Y) = (-4)(3/8) + 2 · (3/8) + 7 · (1/4) = 1

(c) Cov(X, Y)

As we know that Cov(X, Y) = E(XY) - E(X)E(Y), where

E(XY) = (1)(-4)(1/8) + (1)(2)(1/4) + (1)(7)(1/8) + (5)(-4)(1/4) + (5)(2)(1/8) + (5)(7)(1/8) = 1.5

So Cov(X, Y) = 1.5 - (3)(1) = -1.5

(d) σX, σY

E(X²) = 1 · (1/2) + 25 · (1/2) = 13, so Var(X) = 13 - 3² = 4 and σX = 2
E(Y²) = 16 · (3/8) + 4 · (3/8) + 49 · (1/4) = 19.75, so Var(Y) = 19.75 - 1² = 18.75 and σY ≈ 4.33

(e) ρ(X, Y)

ρ(X, Y) = Cov(X, Y) / (σX σY) = -1.5 / (2 × 4.33) ≈ -0.17
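The whole chain (a)-(e) can be verified with a few lines of NumPy. A sketch built on the joint table above (the variable names are my own):

```python
import numpy as np

joint = np.array([[1/8, 1/4, 1/8],    # row X = 1
                  [1/4, 1/8, 1/8]])   # row X = 5
x, y = np.array([1, 5]), np.array([-4, 2, 7])

px, py = joint.sum(axis=1), joint.sum(axis=0)
ex, ey = x @ px, y @ py                        # E(X) = 3, E(Y) = 1
exy = (x[:, None] * y[None, :] * joint).sum()  # E(XY) = 1.5
cov = exy - ex * ey                            # -1.5
sx = np.sqrt(x**2 @ px - ex**2)                # 2.0
sy = np.sqrt(y**2 @ py - ey**2)                # ~4.33
print(cov / (sx * sy))                         # ~ -0.17
```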

 

 

Key takeaways

  1. Conditional Probability Mass Function: P[X = x | Y = y] = p(x, y) / p(y).
  2. Independence of random variables: two discrete random variables X and Y are independent iff p(x, y) = p(x) · p(y) for all (x, y).


Expectation

The mean value of the probability distribution of a variate X is commonly known as its expectation and is denoted by E(X). If f(x) is the probability mass or density function of the variate X, then

E(X) = Σ x f(x)                     (discrete distribution)

E(X) = ∫ x f(x) dx                  (continuous distribution)

In general, the expectation of any function g(X) is given by

E[g(X)] = Σ g(x) f(x)               (discrete distribution)

E[g(X)] = ∫ g(x) f(x) dx            (continuous distribution)

(2) The variance of a distribution is given by

σ² = Σ (x - μ)² f(x)                (discrete distribution)

σ² = ∫ (x - μ)² f(x) dx             (continuous distribution)

where σ is the standard deviation and μ the mean of the distribution.

(3) The rth moment about the mean (denoted by μr) is defined by

μr = Σ (x - μ)^r f(x)               (discrete distribution)

μr = ∫ (x - μ)^r f(x) dx            (continuous distribution)

(4) Mean deviation from the mean is given by

Σ |x - μ| f(x)                      (discrete distribution)

∫ |x - μ| f(x) dx                   (continuous distribution)

 

Example. In a lottery, m tickets are drawn at a time out of n tickets numbered from 1 to n. Find the expected value of the sum of the numbers on the tickets drawn.

Solution. Let X1, X2, …, Xm be the variables representing the numbers on the first, second, …, mth ticket. The probability of drawing a ticket bearing any given number out of the n tickets being in each case 1/n, we have

E(Xi) = 1 · (1/n) + 2 · (1/n) + … + n · (1/n) = (n + 1)/2

Therefore, the expected value of the sum of the numbers on the tickets drawn is

E(X1 + X2 + … + Xm) = E(X1) + E(X2) + … + E(Xm) = m(n + 1)/2

 

Example. X is a continuous random variable with probability density function given by

Find k and the mean value of X.

Solution.

Since the total probability is unity, ∫ f(x) dx = 1, which determines k.

Mean of X = ∫ x f(x) dx

 

Example. The frequency distribution of a measurable characteristic varying between 0 and 2 is as under:

Calculate the standard deviation and also the mean deviation about the mean.

Solution.

Total frequency N = ∫ f dx  (taken over 0 to 2)

μ1′ (about the origin) = (1/N) ∫ x f dx

μ2′ (about the origin) = (1/N) ∫ x² f dx

Hence, σ² = μ2′ - (μ1′)²,

i.e., standard deviation σ = √(μ2′ - (μ1′)²)

Mean deviation about the mean = (1/N) ∫ |x - mean| f dx
 

Variance of a sum

One of the applications of covariance is finding the variance of a sum of several random variables. In particular, if Z = X + Y, then

Var(Z) = Cov(Z, Z) = Cov(X + Y, X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)

More generally, for a, b ∈ R we conclude

Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y)

Variance

Consider two random variables X and Y with the following PMFs:

PX(x) = 1/2 for x = 100; 1/2 for x = -100; 0 otherwise            (3.3)

PY(y) = 1 for y = 0; 0 otherwise                                  (3.4)

Note that EX = EY = 0. Although both random variables have the same mean value, their distributions are completely different. Y is always equal to its mean of 0, while X is either 100 or -100, quite far from its mean value. The variance is a measure of how spread out the distribution of a random variable is. Here the variance of Y is quite small, since its distribution is concentrated at its mean value, while the variance of X is larger, since its distribution is more spread out.

The variance of a random variable X with mean μ = EX is defined as

Var(X) = E[(X - μ)²]

By definition, the variance of X is the average value of (X - μ)². Since (X - μ)² ≥ 0, the variance is always greater than or equal to zero. A large value of the variance means that (X - μ)² is often large, so X often takes values far from its mean. This means that the distribution is very spread out. On the other hand, a low variance means that the distribution is concentrated around its average.

Note that if we did not square the difference between X and its mean, the result would be zero. That is

E[X - μ] = EX - μ = 0

X is sometimes below its average and sometimes above its average; thus X - μ is sometimes negative and sometimes positive, but on average it is zero.

To compute Var(X) = E[(X - μ)²], note that we need to find the expected value of (X - μ)², so we can use LOTUS. In particular, we can write

Var(X) = E[(X - μ)²] = Σ (x - μ)² PX(x)

For example, for X and Y defined in equations 3.3 and 3.4 we have

Var(X) = (100 - 0)²(1/2) + (-100 - 0)²(1/2) = 10,000
Var(Y) = (0 - 0)²(1) = 0

As we expect, X has a very large variance while Var(Y) = 0.

 

Note that Var(X) has a different unit than X. For example, if X is measured in metres then Var(X) is in metres². To solve this issue we define another measure, called the standard deviation, usually shown as σX, which is simply the square root of the variance.

The standard deviation of a random variable X is defined as

SD(X) = σX = √Var(X)

The standard deviation of X has the same unit as X. For X and Y defined in equations 3.3 and 3.4 we have

σX = √10,000 = 100,  σY = 0

Here is a useful formula for computing the variance.

Computational formula for the variance:

Var(X) = E[X²] - (EX)²                    (3.5)

To prove it, note that

Var(X) = E[(X - μ)²] = E[X² - 2μX + μ²]

Note that for a given random variable X, μ is just a constant real number. Thus E[2μX] = 2μ EX = 2μ², so we have

Var(X) = E[X²] - 2μ² + μ² = E[X²] - μ²

Equation 3.5 is usually easier to work with than E[(X - μ)²]. To use this equation, we can find E[X²] = Σ x² PX(x) using LOTUS and then subtract (EX)² to obtain the variance.

 

Example. I roll a fair die and let X be the resulting number. Find E(X), Var(X), and σX.

Solution.

We have PX(k) = 1/6 for k = 1, 2, …, 6. Thus we have

E(X) = (1 + 2 + 3 + 4 + 5 + 6) · (1/6) = 7/2

E(X²) = (1 + 4 + 9 + 16 + 25 + 36) · (1/6) = 91/6

Thus, Var(X) = E(X²) - (EX)² = 91/6 - (7/2)² = 35/12, and σX = √(35/12) ≈ 1.71
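A quick Monte Carlo check of these values (a sketch; the sample size and seed are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=1_000_000)   # fair die: values 1..6

print(rolls.mean())   # ~3.5    (= 7/2)
print(rolls.var())    # ~2.917  (= 35/12)
print(rolls.std())    # ~1.71
```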

 

Theorem

For a random variable X and real numbers a and b,

Var(aX + b) = a² Var(X)                    (3.6)

Proof.

If Y = aX + b, then EY = a EX + b = aμ + b, so

Var(Y) = E[(aX + b - aμ - b)²] = E[a²(X - μ)²] = a² E[(X - μ)²] = a² Var(X)

From equation 3.6, we conclude that, for the standard deviation, SD(aX + b) = |a| SD(X). We mentioned that variance is NOT a linear operation. But there is a very important case in which variance behaves like a linear operation, and that is when we look at sums of independent random variables.

 

Theorem

If X1, X2, …, Xn are independent random variables and X = X1 + X2 + … + Xn, then

Var(X) = Var(X1) + Var(X2) + … + Var(Xn)

 

Example. If X ~ Binomial(n, p), find Var(X).

Solution.

We know that we can write a Binomial(n, p) random variable as the sum of n independent Bernoulli(p) random variables, i.e.

X = X1 + X2 + … + Xn, where the Xi are independent Bernoulli(p).

If Xi ~ Bernoulli(p), then its variance is

Var(Xi) = E[Xi²] - (EXi)² = p - p² = p(1 - p)

Thus, Var(X) = n p(1 - p)

 

Problem. If X ~ Poisson(λ), find Var(X).

Solution.

We already know EX = λ, thus Var(X) = E[X²] - λ². You can find E[X²] directly using LOTUS; however, it is a little easier to find E[X(X - 1)] first. In particular, using LOTUS we have

E[X(X - 1)] = Σ k(k - 1) e^(-λ) λ^k / k! = λ² e^(-λ) Σ λ^(k-2) / (k - 2)! = λ² e^(-λ) e^λ = λ²

So we have E[X²] - EX = λ². Thus E[X²] = λ² + λ, and we conclude

Var(X) = λ² + λ - λ² = λ
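As a sanity check, the sample variance of Poisson draws should sit close to λ. A minimal sketch (λ = 4 is an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 4.0
samples = rng.poisson(lam, size=1_000_000)

print(samples.mean())   # ~4.0 : E[X] = lambda
print(samples.var())    # ~4.0 : Var(X) = lambda, as derived above
```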

 

Covariance-

We denote the covariance of X and Y by Cov(X, Y), and it is given as

Cov(X, Y) = E[(X - EX)(Y - EY)] = E[XY] - E[X]E[Y]

 

Correlation coefficient

Whenever two variables x and y are so related that an increase in one is accompanied by an increase or decrease in the other, the variables are said to be correlated.

 

For example, the yield of a crop varies with the amount of rainfall.

 

If an increase in one variable corresponds to an increase in the other, the correlation is said to be positive. If an increase in one corresponds to a decrease in the other, the correlation is said to be negative. If there is no relationship between the two variables, they are said to be independent.

 

Perfect Correlation:

If two variables vary in such a way that their ratio is always constant, then the correlation is said to be perfect.

 

KARL PEARSON’S COEFFICIENT OF CORRELATION:

r(x, y) = Σ XY / (n σx σy) = Σ XY / √(Σ X² · Σ Y²)

Here X = x - x̄ and Y = y - ȳ are the deviations of x and y from their respective means x̄ = Σx/n and ȳ = Σy/n.
Note-

1. Correlation coefficient always lies between -1 and +1.

2. Correlation coefficient is independent of change of origin and scale.

3. If the two variables are independent then correlation coefficient between them is zero.

Correlation coefficient   Type of correlation
+1                        Perfect positive correlation
-1                        Perfect negative correlation
0.25                      Weak positive correlation
0.75                      Strong positive correlation
-0.25                     Weak negative correlation
-0.75                     Strong negative correlation
0                         No correlation


 

Example: Find the correlation coefficient between age and weight for the following data-

Age      30    44    45    43    34    44
Weight   56    55    60    64    62    63

 

Sol.

Here x̄ = 240/6 = 40 and ȳ = 360/6 = 60.

x      y      (x - x̄)   (x - x̄)²   (y - ȳ)   (y - ȳ)²   (x - x̄)(y - ȳ)
30     56     -10        100         -4         16          40
44     55      4          16         -5         25         -20
45     60      5          25          0          0           0
43     64      3           9          4         16          12
34     62     -6          36          2          4         -12
44     63      4          16          3          9          12
Sum:
240    360     0         202          0         70          32

 

Karl Pearson’s coefficient of correlation-

r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²) = 32 / √(202 × 70) ≈ 0.27

Here the correlation coefficient is 0.27, which is a weak positive correlation; this indicates that as age increases, the weight also tends to increase.
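NumPy can confirm this directly, since np.corrcoef returns the Pearson correlation matrix of its inputs:

```python
import numpy as np

age    = np.array([30, 44, 45, 43, 34, 44])
weight = np.array([56, 55, 60, 64, 62, 63])

r = np.corrcoef(age, weight)[0, 1]   # off-diagonal entry is r(age, weight)
print(round(r, 2))                   # 0.27
```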

 

Short-cut method to calculate the correlation coefficient-

Here deviations are taken from assumed means A and B: dx = x - A and dy = y - B. Then

r = [n Σ dx dy - (Σ dx)(Σ dy)] / [√(n Σ dx² - (Σ dx)²) · √(n Σ dy² - (Σ dy)²)]

Example: Find the correlation coefficient between the values X and Y of the dataset given below by using the short-cut method-

X    10    20    30    40    50
Y    90    85    80    60    45

 

Sol.

Taking the assumed means A = 30 and B = 70:

X      Y      dx = X - 30   dx²     dy = Y - 70   dy²     dx dy
10     90     -20            400     20            400     -400
20     85     -10            100     15            225     -150
30     80      0               0     10            100        0
40     60      10            100    -10            100     -100
50     45      20            400    -25            625     -500
Sum:
150    360     0            1000     10           1450    -1150

 

By the short-cut method,

r = [5(-1150) - (0)(10)] / [√(5 × 1000 - 0²) · √(5 × 1450 - 10²)] = -5750 / (√5000 · √7150) ≈ -0.96

The variables are strongly negatively correlated.
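The short-cut method is algebraically identical to the direct formula, because the correlation coefficient is invariant to shifting by an assumed mean. This is easy to confirm:

```python
import numpy as np

X = np.array([10, 20, 30, 40, 50])
Y = np.array([90, 85, 80, 60, 45])

# Direct Pearson correlation
print(np.corrcoef(X, Y)[0, 1])   # ~ -0.96

# Short-cut: deviations from assumed means A = 30, B = 70
dx, dy = X - 30, Y - 70
n = len(X)
num = n * (dx @ dy) - dx.sum() * dy.sum()
den = np.sqrt(n * (dx @ dx) - dx.sum()**2) * np.sqrt(n * (dy @ dy) - dy.sum()**2)
print(num / den)                 # same value as above
```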

 

 

Example: Ten students got the following percentage of marks in Economics and Statistics.

Calculate the coefficient of correlation.

Roll No.              1    2    3    4    5    6    7    8    9    10
Marks in Economics    78   36   98   25   75   82   90   62   65   39
Marks in Statistics   84   51   91   60   68   62   86   58   53   47

Solution:

Let the marks in the two subjects be denoted by x and y respectively.

Then the mean of the x marks is x̄ = 650/10 = 65 and the mean of the y marks is ȳ = 660/10 = 66.

If X and Y are the deviations of the x’s and y’s from their respective means, then the data may be arranged in the following form:

x      y      X = x - 65   Y = y - 66   X²      Y²      XY
78     84      13           18           169     324     234
36     51     -29          -15           841     225     435
98     91      33           25          1089     625     825
25     60     -40           -6          1600      36     240
75     68      10            2           100       4      20
82     62      17           -4           289      16     -68
90     86      25           20           625     400     500
62     58      -3           -8             9      64      24
65     53       0          -13             0     169       0
39     47     -26          -19           676     361     494
Sum:
650    660      0            0          5398    2224    2704

 

Here,

r = Σ XY / √(Σ X² · Σ Y²) = 2704 / √(5398 × 2224) ≈ 0.78

so there is a strong positive correlation between marks in Economics and marks in Statistics.

 

Spearman’s Rank Correlation

Let xi and yi be the ranks of the ith individual corresponding to two characteristics.

Assuming no two individuals are tied in either classification, each of x and y takes the values 1, 2, …, n, and hence their arithmetic means are each (n + 1)/2.

Let di = xi - yi, and let X and Y be the deviations of x and y from their (equal) means, so that di = Xi - Yi.

Clearly, Σ X = Σ Y = 0 and Σ X² = Σ Y² = n(n² - 1)/12. Since Σ d² = Σ X² + Σ Y² - 2 Σ XY, substituting into r = Σ XY / √(Σ X² · Σ Y²) gives the formula below.

SPEARMAN’S RANK CORRELATION COEFFICIENT:

ρ = 1 - 6 Σ d² / (n(n² - 1))

where ρ denotes the rank coefficient of correlation and d refers to the difference of ranks between paired items in the two series.

 

Example: Compute Spearman’s rank correlation coefficient ρ for the following data:

Person               A    B    C    D    E    F    G    H    I    J
Rank in Statistics   9    10   6    5    7    2    4    8    1    3
Rank in income       1    2    3    4    5    6    7    8    9    10

 

Solution:

Person   Rank in Statistics   Rank in income   d = difference of ranks   d²
A        9                    1                 8                        64
B        10                   2                 8                        64
C        6                    3                 3                         9
D        5                    4                 1                         1
E        7                    5                 2                         4
F        2                    6                -4                        16
G        4                    7                -3                         9
H        8                    8                 0                         0
I        1                    9                -8                        64
J        3                    10               -7                        49

Here Σ d² = 280 and n = 10, so

ρ = 1 - 6(280) / (10(10² - 1)) = 1 - 1680/990 ≈ -0.70

The two rankings are strongly negatively correlated.
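The formula is a one-liner in code. A sketch computing ρ from the rank differences (scipy.stats.spearmanr would give the same answer):

```python
import numpy as np

stats_rank  = np.array([9, 10, 6, 5, 7, 2, 4, 8, 1, 3])
income_rank = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

d = stats_rank - income_rank
n = len(d)
rho = 1 - 6 * (d @ d) / (n * (n**2 - 1))   # d @ d is the sum of d squared
print(round(rho, 2))                        # -0.7
```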

 

 

Example:

If X and Y are uncorrelated random variables, find the coefficient of correlation between X + Y and X - Y.

Solution:

Let U = X + Y and V = X - Y.

Then E(U) = E(X) + E(Y) and E(V) = E(X) - E(Y).

Now Var(U) = Var(X) + Var(Y) + 2 Cov(X, Y) = σX² + σY²

Similarly Var(V) = Var(X) + Var(Y) - 2 Cov(X, Y) = σX² + σY²

(As X and Y are not correlated, we have Cov(X, Y) = 0.)

Also Cov(U, V) = E(UV) - E(U)E(V) = E[(X + Y)(X - Y)] - [E(X) + E(Y)][E(X) - E(Y)]
            = E(X²) - E(Y²) - [(EX)² - (EY)²] = σX² - σY²

Hence ρ(U, V) = Cov(U, V) / √(Var(U) · Var(V)) = (σX² - σY²) / (σX² + σY²)

 

Key takeaways

  1. KARL PEARSON’S COEFFICIENT OF CORRELATION: r = Σ XY / √(Σ X² · Σ Y²), where X and Y are deviations from the means.
  2. Correlation coefficient always lies between -1 and +1.
  3. Correlation coefficient is independent of change of origin and scale.
  4. If the two variables are independent, then the correlation coefficient between them is zero.
  5. Short-cut method to calculate the correlation coefficient: r = [n Σ dx dy - (Σ dx)(Σ dy)] / [√(n Σ dx² - (Σ dx)²) · √(n Σ dy² - (Σ dy)²)], with deviations taken from assumed means.
  6. Spearman’s Rank Correlation: ρ = 1 - 6 Σ d² / (n(n² - 1)).
  7. For a random variable X and real numbers a and b, Var(aX + b) = a² Var(X).
  8. If X1, …, Xn are independent random variables and X = X1 + … + Xn, then Var(X) = Var(X1) + … + Var(Xn).

 


Probability vector-

A probability vector is a vector

v = (v1, v2, …, vn)

whose components satisfy vi ≥ 0 for every i and v1 + v2 + … + vn = 1.

Stochastic process-

A stochastic process is a family of random variables {X(t) | t ∈ T} defined on a common sample space S and indexed by the parameter t, which varies over an index set T.

The values assumed by the random variables X(t) are called states, and the set of all possible values forms the state space of the process, denoted by I.

If the state space is discrete, the stochastic process is known as a chain.

A stochastic process consists of a sequence of experiments in which each experiment has a finite number of outcomes with given probabilities.

 

Stochastic matrices-

A square matrix P is called a stochastic matrix if all its entries are non-negative and the sum of the entries of each row is one.

A vector v is said to be a fixed vector or a fixed point of a matrix A if vA = v and v ≠ 0.

If v is a fixed vector of A, then so is kv for k ≠ 0, since

(kv)A = k(vA) = kv.

Note-

  1. If v = (v1, v2, v3) is a probability vector and P is a stochastic matrix, then vP is also a probability vector.
  2. If P and Q are stochastic matrices, then their product PQ is also a stochastic matrix. Thus P^n is a stochastic matrix for all positive integer values of n.
  3. The transition matrix P of a Markov chain is a stochastic matrix.

 

Example: Which vectors are probability vectors?

  1. (5/2, 0, 8/3, 1/6, 1/6)
  2. (3, 0, 2, 5, 3)

Sol.

  1. It is not a probability vector, because the components do not add up to 1.
  2. Dividing by 3 + 0 + 2 + 5 + 3 = 13, we get the probability vector (3/13, 0, 2/13, 5/13, 3/13).

 

Example: Show that u = (b, a) is a fixed point of the stochastic matrix

P = [ 1 - a     a   ]
    [   b     1 - b ]

Sol.

uP = ( b(1 - a) + ab,  ba + a(1 - b) ) = ( b - ab + ab,  ab + a - ab ) = (b, a) = u

Hence u is a fixed point of P.

Example: Find the unique fixed probability vector t of the given stochastic matrix P.

Sol.

Suppose t = (x, y, z) is the fixed probability vector. By definition x + y + z = 1, so t = (x, y, 1 - x - y). t is a fixed vector if tP = t.

On solving these equations, we get the required fixed probability vector.
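Since the matrix for this example is not reproduced in the source, here is a sketch on a made-up 3×3 stochastic matrix (my own illustration) showing how tP = t together with x + y + z = 1 is solved numerically:

```python
import numpy as np

# A made-up 3x3 stochastic matrix (each row sums to 1)
P = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])

# tP = t is equivalent to (P^T - I) t = 0; stack the normalisation
# constraint t1 + t2 + t3 = 1 as an extra row and solve by least squares.
A = np.vstack([P.T - np.eye(3), np.ones(3)])
b = np.array([0.0, 0.0, 0.0, 1.0])
t, *_ = np.linalg.lstsq(A, b, rcond=None)

print(t)       # the fixed probability vector
print(t @ P)   # equals t again
```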

 

 

 8.4 Markov chains, Higher transition probabilities, Stationary distribution of regular Markov chains and absorbing states

Markov chain-

A Markov chain is a finite stochastic process consisting of a sequence of trials whose outcomes, say x1, x2, …, satisfy the following two conditions:

  1. Each outcome belongs to the state space I = {a1, a2, …, am}, which is the finite set of outcomes.
  2. The outcome of any trial depends at most upon the outcome of the immediately preceding trial and not upon any other previous outcomes.

 

Higher transition probabilities-

The probability that a Markov chain will move from state ai to state aj in exactly n steps is denoted by pij(n) and defined as

pij(n) = P[X(m + n) = aj | X(m) = ai]

These n-step transition probabilities are exactly the entries of the matrix power P^n.

Stationary distribution-

The stationary distribution of a Markov chain is the unique fixed probability vector t of the regular transition matrix P of the Markov chain, because every sequence of probability distributions approaches t.

 

Absorbing States-

A state ai of a Markov chain is said to be an absorbing state if the system remains in the state ai once it enters there, i.e., a state ai is absorbing if pii = 1. Thus once a Markov chain enters such an absorbing state, it is destined to remain there forever. In other words, the ith row of P has 1 at the main diagonal (i, i) position and zeros everywhere else.
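To tie these ideas together, a small sketch (the two-state matrix is my own illustration) that computes n-step transition probabilities as P^n and shows the rows converging to the stationary distribution:

```python
import numpy as np

# A made-up regular transition matrix for a two-state chain
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

# Higher transition probabilities: P^n gives pij(n)
P10 = np.linalg.matrix_power(P, 10)
print(P10)    # both rows approach the stationary distribution (0.8, 0.2)

# The stationary distribution t satisfies tP = t; here t = (0.8, 0.2)
t = np.array([0.8, 0.2])
print(t @ P)  # (0.8, 0.2) again
```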

 

