Unit - 2
Context-free languages and pushdown automata
Q1) Define ‘context free grammar’ and ‘context free language’ ?
A1)
Context-free grammars (CFG)
Context-free grammar (CFG) is a collection of rules (or productions) for recursive rewriting used to produce string patterns.
It consists of:
● a set of terminal symbols,
● a set of non terminal symbols,
● a set of productions,
● a start symbol.
Context-free grammar G can be defined as:
G = (V, T, P, S)
Where
G - the grammar, which is used to generate strings,
V - finite set of non-terminal symbols, which are denoted by capital letters,
T - finite set of terminal symbols, which are denoted by small letters,
P - finite set of production rules; each rule replaces a single non-terminal (the left side) with a string of terminals and non-terminals (the right side),
S - start symbol, from which every derivation of a string begins.
A CFG for Arithmetic Expressions -
An example grammar that generates strings representing arithmetic expressions with the four operators +, -, *, /, and numbers as operands is:
1. <expression> --> number
2. <expression> --> (<expression>)
3. <expression> --> <expression> + <expression>
4. <expression> --> <expression> - <expression>
5. <expression> --> <expression> * <expression>
6. <expression> --> <expression> / <expression>
The only non-terminal symbol in this grammar is <expression>, which is also the start symbol. The terminal symbols are {+, -, *, /, (, ), number}.
● The first rule states that an <expression> can be rewritten as a number.
● The second rule says that an <expression> enclosed in parentheses is also an <expression>
● The remaining rules say that the sum, difference, product, or quotient of two <expression>s is also an <expression>.
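The 4-tuple definition and the expression grammar above can be written down directly as data. The following is a minimal sketch, not a standard library API: the class name `CFG` and the method `is_well_formed` are names chosen here for illustration, and a production is assumed to be stored as a (head, body) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A grammar G = (V, T, P, S); the field names follow the tuple above."""
    V: frozenset  # non-terminal symbols
    T: frozenset  # terminal symbols
    P: tuple      # productions as (head, body) pairs; body is a tuple of symbols
    S: str        # start symbol

    def is_well_formed(self) -> bool:
        # V and T must be disjoint, and the start symbol must be a non-terminal.
        if self.V & self.T or self.S not in self.V:
            return False
        # Every head is a single non-terminal; every body symbol is in V or T.
        return all(
            head in self.V and all(sym in self.V | self.T for sym in body)
            for head, body in self.P
        )


E = "<expression>"
expression_grammar = CFG(
    V=frozenset({E}),
    T=frozenset({"+", "-", "*", "/", "(", ")", "number"}),
    P=(
        (E, ("number",)),    # rule 1
        (E, ("(", E, ")")),  # rule 2
        (E, (E, "+", E)),    # rule 3
        (E, (E, "-", E)),    # rule 4
        (E, (E, "*", E)),    # rule 5
        (E, (E, "/", E)),    # rule 6
    ),
    S=E,
)

print(expression_grammar.is_well_formed())  # True
```

The check only confirms that the tuple has the shape required by the definition; it says nothing about which strings the grammar generates.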
Context free language (CFL)
A context-free language (CFL) is a language generated by a context-free grammar (CFG). Context-free languages have many applications in programming languages; in particular, the syntax of most arithmetic expressions is described by a context-free grammar.
● By comparing different grammars that describe the same language, intrinsic properties of the language can be separated from extrinsic properties of a particular grammar.
● Compilers use context-free grammars in the parsing stage because they describe the syntax of a programming language; several editors use them as well.
A grammatical description of a language has four essential components -
- There is a set of symbols that form the strings of the language being defined. They are called terminal symbols, represented by V_t.
- There is a finite set of variables, called non-terminals. These are represented by V_n.
- One of the variables represents the language being defined; it is called the start symbol. It is represented by S.
- There is a finite set of rules called productions that represent the recursive definition of the language. Each production consists of:
● A variable that is being defined by the production. This variable is also called the production head.
● The production symbol ->
● A string of zero or more terminals and variables.
Q2) Define Chomsky normal form?
A2)
Chomsky normal forms
A context free grammar (CFG) is in Chomsky Normal Form (CNF) if every production rule satisfies one of the following conditions:
● A non-terminal generates a single terminal (e.g., X->x)
● A non-terminal generates exactly two non-terminals (e.g., X->YZ)
● The start symbol generates ε (e.g., S->ε)
Consider the following grammars,
G1 = {S->a, S->AZ, A->a, Z->z}
G2 = {S->a, S->aZ, Z->a}
The grammar G1 is in CNF because every production rule satisfies one of the conditions above. The grammar G2 is not in CNF because the production rule S->aZ has a terminal followed by a non-terminal on its right side, which matches none of the allowed forms.
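The three CNF conditions can be checked mechanically. The sketch below assumes every symbol is a single character, so a right side can be stored as a plain string; the function name `is_cnf` is chosen here for illustration.

```python
def is_cnf(productions, start, nonterminals):
    """Return True if every (head, body) pair is X->x, X->YZ, or start->ε."""
    for head, body in productions:
        if len(body) == 0:
            ok = head == start                         # only the start symbol may give ε
        elif len(body) == 1:
            ok = body not in nonterminals              # X -> x
        elif len(body) == 2:
            ok = all(s in nonterminals for s in body)  # X -> YZ
        else:
            ok = False                                 # more than two symbols
        if not ok:
            return False
    return True


G1 = [("S", "a"), ("S", "AZ"), ("A", "a"), ("Z", "z")]
G2 = [("S", "a"), ("S", "aZ"), ("Z", "a")]

print(is_cnf(G1, "S", {"S", "A", "Z"}))  # True
print(is_cnf(G2, "S", {"S", "Z"}))       # False, because of S -> aZ
```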
Q3) Define Greibach normal form?
A3)
Greibach normal forms
A CFG is in Greibach Normal Form (GNF) if all productions have one of the following forms −
A → b
A → bD_1…D_n
S → ε
where A, D_1, ..., D_n are non-terminals and b is a terminal.
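As a rough illustration of these three forms, the sketch below tests whether a list of productions is in GNF. It assumes single-character symbols, and the two small grammars in it are hypothetical examples made up for this check; they do not come from the text.

```python
def is_gnf(productions, start, nonterminals):
    """Return True if every (head, body) pair is A->b, A->bD_1...D_n, or start->ε."""
    for head, body in productions:
        if body == "":
            if head != start:        # only S -> ε is allowed
                return False
            continue
        first, rest = body[0], body[1:]
        if first in nonterminals:    # the body must begin with a terminal
            return False
        if any(sym not in nonterminals for sym in rest):
            return False             # everything after it must be a non-terminal
    return True


nonterminals = {"S", "A", "B"}
in_gnf = [("S", "aAB"), ("A", "a"), ("B", "b")]
not_in_gnf = [("S", "AB"), ("A", "a"), ("B", "b")]   # S -> AB starts with a non-terminal

print(is_gnf(in_gnf, "S", nonterminals))      # True
print(is_gnf(not_in_gnf, "S", nonterminals))  # False
```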
Algorithm to Convert a CFG into Greibach Normal Form
Step 1 − If the start symbol S occurs on some right side, create a new start symbol S’ and a new production S’ → S.
Step 2 − Remove null productions.
Step 3 − Remove unit productions.
Step 4 − Remove all direct and indirect left recursion.
Step 5 − Substitute productions as needed to bring every rule into GNF form.
Problem
Convert the following CFG into GNF
S → XY | Xo | p
X → mX | m
Y → Xn | o
Solution
Here, S does not appear on the right side of any production and there are no unit or null productions in the production rule set. So, we can skip Step 1 to Step 3.
Step 4
The grammar has no left recursion, so nothing changes in this step.
Step 5
The S and Y productions start with the non-terminal X. Replacing
X in S → XY | Xo | p
with the right side of
X → mX | m
we obtain
S → mXY | mY | mXo | mo | p.
And after replacing
X in Y → Xn | o
with the right side of
X → mX | m
we obtain
Y → mXn | mn | o.
The terminals o and n still appear after the first symbol of some right sides. Two new productions O → o and N → n are added to the production set, and we arrive at the final GNF as the following −
S → mXY | mY | mXO | mO | p
X → mX | m
Y → mXN | mN | o
O → o
N → n
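The substitution used in the solution, replacing a leading X by each right side of X → mX | m, is purely mechanical. The sketch below assumes single-character symbols stored as strings; the helper name `substitute_leading` is chosen here for illustration.

```python
def substitute_leading(alternatives, target, target_alternatives):
    """Replace a leading `target` in each alternative by every alternative of `target`."""
    result = []
    for alt in alternatives:
        if alt and alt[0] == target:
            # One new alternative per right side of the target non-terminal.
            result.extend(t + alt[1:] for t in target_alternatives)
        else:
            result.append(alt)
    return result


X_alternatives = ["mX", "m"]

print(substitute_leading(["XY", "Xo", "p"], "X", X_alternatives))
# ['mXY', 'mY', 'mXo', 'mo', 'p']
print(substitute_leading(["Xn", "o"], "X", X_alternatives))
# ['mXn', 'mn', 'o']
```

Every resulting alternative starts with a terminal, which is what the substitution step of the GNF conversion is meant to achieve.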
Q4) Describe a Parse tree?
A4)
Parse trees
● A parse tree is a structure that represents the derivation of a terminal string from some non-terminal.
● It is a graphical representation in which every node is labelled with a symbol, which may be a terminal or a non-terminal.
● In parsing, the string is derived from the start symbol, so the start symbol is the root of the parse tree.
● The parse tree reflects operator precedence. The deepest sub-tree is evaluated first, so an operator in a parent node has lower precedence than an operator in one of its sub-trees.
Key features to define are the root ∈ V and yield ∈ Σ* of each tree.
● For each σ ∈ Σ, there is a tree with root σ and no children; its yield is σ.
● For each rule A → ε, there is a tree with root A and one child ε; its yield is ε.
● If t_1, t_2, ..., t_n are parse trees with roots r_1, r_2, ..., r_n and respective yields y_1, y_2, ..., y_n, and A → r_1 r_2 ... r_n is a production, then there is a parse tree with root A whose children are t_1, t_2, ..., t_n. Its yield is the concatenation of the yields: y_1 y_2 ... y_n.
Here, parse trees are constructed from bottom up, not top down.
The actual construction of “adding children” should be made more precise, but we intuitively know what’s going on.
As an example, here are all the parse (sub)trees used to build the parse tree for the arithmetic expression 4 + 2 * 3 using the expression grammar
E → E + T | E - T | T
T → T * F | F
F → a | ( E )
where a represents an operand of some type, be it a number or variable. The trees are grouped by height.
Fig : example of parse tree
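The bottom-up construction and the definition of yield can be sketched in a few lines. The class name `Tree` and its methods are names chosen here for illustration. The tree built below is the one for a + a * a, that is, 4 + 2 * 3 with every operand written as a.

```python
from dataclasses import dataclass, field


@dataclass
class Tree:
    root: str
    children: list = field(default_factory=list)

    def tree_yield(self) -> str:
        # A leaf yields its own symbol; an ε leaf yields the empty string.
        if not self.children:
            return "" if self.root == "ε" else self.root
        # An inner node yields the concatenation of its children's yields.
        return "".join(child.tree_yield() for child in self.children)

    def height(self) -> int:
        if not self.children:
            return 0
        return 1 + max(child.height() for child in self.children)


def operand() -> Tree:
    return Tree("F", [Tree("a")])                               # F -> a

# Sub-trees are built first and then combined, i.e. bottom up.
left = Tree("E", [Tree("T", [operand()])])                      # E -> T, T -> F
product = Tree("T", [Tree("T", [operand()]), Tree("*"), operand()])  # T -> T * F
whole = Tree("E", [left, Tree("+"), product])                   # E -> E + T

print(whole.tree_yield())  # a+a*a
print(whole.height())      # 4
```

The multiplication sits in a deeper sub-tree than the addition, which is how the grammar encodes that * binds more tightly than +.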
Parse Trees and Derivations
A derivation is a sequence of strings in V* which starts with a non-terminal in V−Σ and ends with a string in Σ*.
Let’s consider the sample grammar
E → E+E | a
We write:
E ⇒ E+E ⇒ E+E+E ⇒ a+E+E ⇒ a+a+E ⇒ a+a+a
But this is incomplete, because it doesn’t tell us where the replacement rules are applied.
We actually need "marked" strings which indicate which non-terminal is replaced in all but the first and last step:
E ⇒ Ě+E ⇒ Ě+E+E ⇒ a+Ě+E ⇒ a+a+Ě ⇒ a+a+a
In this case, the marking is only necessary in the second step; however it is crucial, because we want to distinguish between this derivation and the following one:
E ⇒ E+Ě ⇒ Ě+E+E ⇒ a+Ě+E ⇒ a+a+Ě ⇒ a+a+a
We want to characterize two derivations as “coming from the same parse tree.”
The first step is to define the relation among derivations as being “more left-oriented at one step”. Assume we have two equal-length derivations of length n > 2:
D: x_1 ⇒ x_2 ⇒ ... ⇒ x_n
D′: x_1′ ⇒ x_2′ ⇒ ... ⇒ x_n′
where x_1 = x_1′ is a non-terminal and
x_n = x_n′ ∈ Σ*
Namely, they start with the same non-terminal, end at the same terminal string, and have at least two intermediate steps.
Let’s say D < D′ if the two derivations differ in only one step in which there are 2 non-terminals, A and B, such that D replaces the left one before the right one and D′ does the opposite. Formally:
D < D′ if there exists k, 1 < k < n, such that
x_i = x_i′ for all i ≠ k (equal strings; same marked position except at i = k-1)
x_{k-1} = uǍvBw, for u, v, w ∈ V*
x_{k-1}′ = uAvB̌w, for u, v, w ∈ V*
x_k = uyvB̌w, for production A → y
x_k′ = uǍvzw, for production B → z
x_{k+1} = x_{k+1}′ = uyvzw (marking not shown)
Two derivations are said to be similar if they belong to the reflexive, symmetric, transitive closure of <.
Q5) What is Ambiguity in CFG?
A5)
Ambiguity in CFG
Suppose we have a context free grammar G with production rules:
S->aSb|bSa|SS|ɛ
Left most derivation (LMD) and Derivation Tree:
Leftmost derivation of a string from the start symbol S is done by replacing the leftmost non-terminal symbol with the RHS of a corresponding production rule at every step.
For example, the leftmost derivation of the string abab from the grammar G above is:
S => aSb => abSab => abab
At each step, the leftmost S is replaced using a production rule.
Derivation tree: It shows how the string is derived from S using the production rules, and is shown in the figure.
Fig : derivation tree
Right most derivation (RMD):
It is done by replacing the rightmost non-terminal symbol with the RHS of a corresponding production rule at every step.
For example, the rightmost derivation of the string abab from the grammar G above is:
S => SS => SaSb => Sab => aSbab => abab
At each step, the rightmost S is replaced using a production rule.
The derivation tree for abab using rightmost derivation is shown in the figure.
Fig : right most derivation
A derivation can be an LMD, an RMD, both, or neither.
For example:
S => aSb => abSab => abab is an LMD as well as an RMD, but
S => SS => SaSb => Sab => aSbab => abab is an RMD but not an LMD.
Ambiguous Context Free Grammar:
● A context free grammar is called ambiguous if there exists more than one LMD, or more than one RMD, for some string generated by the grammar.
● Equivalently, there is more than one derivation tree for that string.
● The grammar described above is ambiguous because the string abab has two derivation trees.
● Correspondingly, there is more than one RMD for the string abab:
S => SS => SaSb => Sab => aSbab => abab
S => aSb => abSab => abab
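The same ambiguity shows up with leftmost derivations. The sketch below replays two different sequences of productions of G, always rewriting the leftmost S, and both end in abab. The function name `leftmost_derivation` is chosen here for illustration, and ε is written as the empty string.

```python
BODIES = {"aSb", "bSa", "SS", ""}   # right sides of S -> aSb | bSa | SS | ε


def leftmost_derivation(bodies):
    """Apply each right side in turn to the leftmost S and return every step."""
    form, steps = "S", ["S"]
    for body in bodies:
        if body not in BODIES:
            raise ValueError(f"S -> {body!r} is not a production of G")
        i = form.index("S")                   # position of the leftmost non-terminal
        form = form[:i] + body + form[i + 1:]
        steps.append(form)
    return steps


first = leftmost_derivation(["aSb", "bSa", ""])
second = leftmost_derivation(["SS", "aSb", "", "aSb", ""])

print(" => ".join(first))   # S => aSb => abSab => abab
print(" => ".join(second))  # S => SS => aSbS => abS => abaSb => abab
```

Two distinct leftmost derivations of one string are enough to show that G is ambiguous.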
Q6) Describe pumping lemma for context-free languages?
A6)
Pumping lemma for context-free languages
Lemma: The language L = {a^n b^n c^n | n ≥ 1} is not context free.
Proof (By contradiction)
Assume that this language is context-free; hence it has a context-free grammar.
Let K be the constant of the Pumping Lemma.
Consider the string a^K b^K c^K, which is in L and has length greater than K.
By the Pumping Lemma this string can be written as uvxyz, with v and y not both empty, such that all strings u v^i x y^i z (i ≥ 0) are also in L. This is not possible, because:
- Neither v nor y can contain more than one kind of letter from {a, b, c}; otherwise the letters of u v^2 x y^2 z are in the wrong order.
- If v and y each consist only of a’s, only of b’s or only of c’s, then at most two of the three letters are pumped, so u v^2 x y^2 z cannot maintain the balance amongst the three letters.
Lemma: The language L = {a^i b^j c^k | i < j and i < k} is not context free.
Proof (By contradiction)
Assume that this language is context-free; hence it has a context-free grammar.
Let K be the constant of the Pumping Lemma.
Consider the string a^K b^(K+1) c^(K+1), which is in L and has length greater than K.
By the Pumping Lemma this string can be written as uvxyz, with v and y not both empty, such that all strings u v^i x y^i z (i ≥ 0) are also in L.
- As mentioned previously, neither v nor y may contain a mixture of symbols.
- Suppose v consists of a’s. Then y cannot contain both b’s and c’s, so pumping cannot generate enough letters to keep both the b’s and the c’s more numerous than the a’s (it can do so for one or the other of them, not both). Similarly, y cannot consist of just a’s.
- So suppose that v and y each contain only b’s or only c’s. Consider the string u v^0 x y^0 z, which must be in L. Since we have dropped both v and y, it has at least one b or one c fewer than uvxyz, which was a^K b^(K+1) c^(K+1). Consequently, this string no longer has enough b’s or enough c’s to be a member of L.
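Both arguments can be sanity-checked by brute force for one small constant. The sketch below assumes the standard side conditions of the Pumping Lemma, |vxy| ≤ K and |vy| ≥ 1, and takes K = 3 as an arbitrary choice. It tries every split of the chosen string and keeps those whose pumped versions for i = 0 and i = 2 both stay in the language. This is only an illustration for a single K, not a proof; the function names are chosen here.

```python
import re


def in_equal(s):
    """Membership in {a^n b^n c^n | n >= 1}."""
    m = re.fullmatch(r"(a+)(b+)(c+)", s)
    return bool(m) and len(m.group(1)) == len(m.group(2)) == len(m.group(3))


def in_less(s):
    """Membership in {a^i b^j c^k | i < j and i < k}."""
    m = re.fullmatch(r"(a*)(b+)(c+)", s)
    return bool(m) and len(m.group(1)) < min(len(m.group(2)), len(m.group(3)))


def surviving_splits(s, K, in_lang, exponents=(0, 2)):
    """Splits s = uvxyz with |vxy| <= K, |vy| >= 1 that stay in the language when pumped."""
    survivors = []
    n = len(s)
    for start in range(n + 1):                               # u = s[:start]
        for end in range(start + 1, min(start + K, n) + 1):  # vxy = s[start:end]
            for p in range(start, end + 1):                  # v = s[start:p]
                for q in range(p, end + 1):                  # x = s[p:q], y = s[q:end]
                    u, v, x, y, z = s[:start], s[start:p], s[p:q], s[q:end], s[end:]
                    if not v and not y:
                        continue                             # need |vy| >= 1
                    if all(in_lang(u + v * i + x + y * i + z) for i in exponents):
                        survivors.append((u, v, x, y, z))
    return survivors


K = 3
print(surviving_splits("a" * K + "b" * K + "c" * K, K, in_equal))             # []
print(surviving_splits("a" * K + "b" * (K + 1) + "c" * (K + 1), K, in_less))  # []
```

An empty list means that no split of the chosen string can be pumped, which is the contradiction each proof relies on.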
Q7) Define ‘deterministic pushdown automata’ ?
A7)
Deterministic pushdown automata
- Machine transitions are based on the current state and input symbol, and also the current topmost symbol of the stack.
- Symbols lower in the stack are not visible and have no immediate effect.
- Machine actions include pushing, popping, or replacing the stack top.
- A deterministic pushdown automaton has at most one legal transition for the same combination of input symbol, state, and top stack symbol.
- This is where it differs from the nondeterministic pushdown automaton.
- The languages accepted by DPDAs form a proper subset of the context free languages.
A deterministic pushdown automaton M can be defined as a 7-tuple -
M = (Q, ∑, T, q0, Z0, A, δ)
Where,
● Q is a finite set of states,
● ∑ is a finite set of input symbols,
● T is a finite set of stack symbols,
● q0 ∈ Q is the start state,
● Z0 ∈ T is the starting stack symbol,
● A ⊆ Q is the set of accepting states,
● δ is a transition function, where:
δ : (Q × (∑ ∪ {ε}) × T) ➝ P(Q × T*)
For a deterministic machine, every set δ(q, a, X) contains at most one element, and if δ(q, ε, X) is non-empty then δ(q, a, X) is empty for every input symbol a.
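To make the 7-tuple concrete, here is a minimal sketch of a DPDA simulator. The machine it runs is a hypothetical example that accepts {a^n b^n | n ≥ 1} by final state; its states q0, q1, q2 and stack symbols Z, A are assumptions made for this illustration. Each transition maps (state, input symbol or "" for ε, stack top) to (new state, string that replaces the top, leftmost symbol on top).

```python
DELTA = {
    ("q0", "a", "Z"): ("q0", "AZ"),   # first a: push A above the bottom marker
    ("q0", "a", "A"): ("q0", "AA"),   # further a's: push another A
    ("q0", "b", "A"): ("q1", ""),     # first b: pop one A
    ("q1", "b", "A"): ("q1", ""),     # further b's: pop one A each
    ("q1", "", "Z"): ("q2", "Z"),     # ε-move once every A has been matched
}


def is_deterministic(delta):
    # An ε-move for (state, top) must not coexist with an input move for the same pair.
    eps_pairs = {(q, top) for (q, sym, top) in delta if sym == ""}
    return not any(sym != "" and (q, top) in eps_pairs for (q, sym, top) in delta)


def accepts(delta, start, bottom, accepting, word):
    state, stack, pos = start, [bottom], 0    # the top of the stack is the end of the list
    while stack:
        top = stack[-1]
        if pos < len(word) and (state, word[pos], top) in delta:
            state, push = delta[(state, word[pos], top)]
            pos += 1
        elif (state, "", top) in delta:
            state, push = delta[(state, "", top)]
        else:
            break                             # no legal transition: the machine halts
        stack.pop()
        stack.extend(reversed(push))          # the leftmost pushed symbol ends up on top
    return pos == len(word) and state in accepting


print(is_deterministic(DELTA))                      # True
print(accepts(DELTA, "q0", "Z", {"q2"}, "aabb"))    # True
print(accepts(DELTA, "q0", "Z", {"q2"}, "aab"))     # False
```

Because at most one transition applies in any configuration, the simulation is a simple loop with no backtracking.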
Q8) What are closure properties of CFL ?
A8)
Context-free languages are closed under −
● Union
● Concatenation
● Kleene Star operation
Union
Let A1 and A2 be two context free languages. Then A1 ∪ A2 is also context free.
Example
Let A1 = { a^n b^n | n ≥ 1}. The corresponding grammar G1 has the productions P: S1 → aS1b | ab
Let A2 = { c^m d^m | m ≥ 0}. The corresponding grammar G2 has the productions P: S2 → cS2d | ε
Union of A1 and A2, A = A1 ∪ A2 = { a^n b^n } ∪ { c^m d^m }
The corresponding grammar G will have the additional production S → S1 | S2
Note - So CFLs are closed under union.
Concatenation
If A1 and A2 are context free languages, then A1 A2 is also context free.
Example
Concatenation of the languages A1 and A2, A = A1A2 = { a^n b^n c^m d^m }
The corresponding grammar G will have the additional production S → S1 S2
Note - So CFLs are closed under concatenation.
Kleene Star
If A is a context free language, then A* is context free as well.
Example
Let A = { a^n b^n | n ≥ 0}. The corresponding grammar G has the productions P: S → aSb | ε
Kleene star L1 = { a^n b^n }*
The corresponding grammar G1 will have the additional productions S1 → SS1 | ε
Note - So CFLs are closed under Kleene closure.
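The three constructions each add a new start symbol to existing grammars. The sketch below stores a grammar as a dictionary from non-terminal to a list of right sides and assumes that the combined grammars use disjoint non-terminal names. The example grammars are S1 → aS1b | ab, S2 → cS2d | ε and S → aSb | ε. All function names, including the small enumerator `strings_up_to`, are chosen here for illustration.

```python
from collections import deque


def union_grammar(g1, s1, g2, s2, new_start="S"):
    return {new_start: [[s1], [s2]], **g1, **g2}          # S -> S1 | S2


def concat_grammar(g1, s1, g2, s2, new_start="S"):
    return {new_start: [[s1, s2]], **g1, **g2}            # S -> S1 S2


def star_grammar(g, s, new_start="S1"):
    return {new_start: [[s, new_start], []], **g}         # S1 -> S S1 | ε


def strings_up_to(grammar, start, max_len):
    """Enumerate terminal strings of length <= max_len by leftmost derivation."""
    seen, found, queue = {(start,)}, set(), deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:                                   # no non-terminal left
            found.add("".join(form))
            continue
        for body in grammar[form[idx]]:
            new = form[:idx] + tuple(body) + form[idx + 1:]
            terminals = sum(1 for s in new if s not in grammar)
            # Prune over-long forms; the second bound keeps the search finite.
            if terminals <= max_len and len(new) <= 2 * max_len + 2 and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


G1 = {"S1": [["a", "S1", "b"], ["a", "b"]]}
G2 = {"S2": [["c", "S2", "d"], []]}
G = {"S": [["a", "S", "b"], []]}

print(sorted(strings_up_to(union_grammar(G1, "S1", G2, "S2"), "S", 4), key=len))
print(sorted(strings_up_to(concat_grammar(G1, "S1", G2, "S2"), "S", 4), key=len))
print(sorted(strings_up_to(star_grammar(G, "S"), "S1", 4), key=len))
```

In each case the new grammar is again context-free, because only productions with a single non-terminal on the left were added.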
Context-free languages are not closed under −
● Intersection − If A1 and A2 are context free languages, then A1 ∩ A2 is not necessarily context free.
● Complement − If A1 is a context free language, then A1’ may not be context free.
They are, however, closed under intersection with a regular language: if A1 is a regular language and A2 is a context free language, then A1 ∩ A2 is a context free language.
Q9) Convert the given CFG to CNF , considering the provided grammar G1
S → a | aA | B
A → aBB | ε
B → Aa | b
A9)
Convert the given CFG to CNF
Step 1: We add a new start symbol S0 with the production S0 → S. This step is required when the start symbol S appears on some RHS; S does not appear on any RHS here, so the step is only a precaution. The grammar becomes:
S0 → S
S → a | aA | B
A → aBB | ε
B → Aa | b
Step 2: Since grammar G1 contains the null production A → ε, its removal from the grammar yields:
S0 → S
S → a | aA | B
A → aBB
B → Aa | b | a
Now, since grammar G1 contains the unit production S → B, its removal yields:
S0 → S
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Removing the unit production S0 → S from the grammar as well yields:
S0 → a | aA | Aa | b
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Step 3: In the production rules S0 → aA | Aa, S → aA | Aa, A → aBB and B → Aa, the terminal a occurs together with non-terminals on the RHS. So we replace the terminal a with a new non-terminal X in these rules and add the production X → a:
S0 → a | XA | AX | b
S → a | XA | AX | b
A → XBB
B → AX | b | a
X → a
Step 4: The production rule A → XBB has more than two symbols on its RHS. Replacing it with A → RB and R → XB yields:
S0 → a | XA | AX | b
S → a | XA | AX | b
A → RB
B → AX | b | a
X → a
R → XB
Hence, this is the required CNF for the given grammar.
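Steps 3 and 4 are mechanical, so they can be sketched in code. The example starts from the grammar obtained after Step 2. The names X and R for the new non-terminals are passed in explicitly to match the worked solution, and the function names `replace_terminals` and `binarize` are chosen here for illustration.

```python
def replace_terminals(grammar, terminal_names):
    """Step 3: in right sides of length >= 2, replace terminals by stand-in non-terminals."""
    result = {}
    for head, bodies in grammar.items():
        result[head] = [
            [terminal_names.get(s, s) for s in body] if len(body) >= 2 else list(body)
            for body in bodies
        ]
    for terminal, name in terminal_names.items():
        result[name] = [[terminal]]               # e.g. X -> a
    return result


def binarize(grammar, fresh_names):
    """Step 4: split right sides longer than two symbols using fresh non-terminals."""
    fresh = iter(fresh_names)
    result = {head: [] for head in grammar}
    for head, bodies in grammar.items():
        for body in bodies:
            body = list(body)
            while len(body) > 2:
                name = next(fresh)
                result[name] = [body[:2]]         # e.g. R -> X B
                body = [name] + body[2:]          # e.g. A -> R B
            result[head].append(body)
    return result


after_step_2 = {
    "S0": [["a"], ["a", "A"], ["A", "a"], ["b"]],
    "S": [["a"], ["a", "A"], ["A", "a"], ["b"]],
    "A": [["a", "B", "B"]],
    "B": [["A", "a"], ["b"], ["a"]],
}

cnf = binarize(replace_terminals(after_step_2, {"a": "X"}), ["R"])
for head, bodies in cnf.items():
    print(head, "→", " | ".join("".join(body) for body in bodies))
```

The printed grammar has the same productions as the result of Step 4 above.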
Q10) Write the steps for converting CFG into GNF ?
A10)
Steps for converting CFG into GNF
Step 1: Convert the grammar into CNF.
If the given grammar is not in CNF, convert it to CNF first.
Step 2: Eliminate left recursion.
If the context free grammar contains left recursion, eliminate it.
Step 3: Convert the production rules into GNF form.
If any production rule in the grammar is not in GNF form, convert it.
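Step 2 can be illustrated for direct left recursion. The sketch below rewrites A → Aα | β as A → β | βA′ and A′ → α | αA′, a variant that introduces no ε-productions. It assumes every α and β is non-empty; the function name and the new non-terminal E' are chosen here for illustration. Indirect left recursion needs additional substitution steps that are not shown.

```python
def remove_direct_left_recursion(head, bodies, new_head):
    """Rewrite head -> head α | β without left recursion and without ε-productions."""
    recursive = [body[1:] for body in bodies if body and body[0] == head]  # the α parts
    others = [body for body in bodies if not body or body[0] != head]      # the β parts
    if not recursive:
        return {head: bodies}                      # nothing to do
    return {
        head: others + [beta + [new_head] for beta in others],            # A  -> β | β A'
        new_head: recursive + [alpha + [new_head] for alpha in recursive],  # A' -> α | α A'
    }


# E -> E + T | T, taken from the expression grammar used for parse trees.
result = remove_direct_left_recursion("E", [["E", "+", "T"], ["T"]], "E'")
for head, bodies in result.items():
    print(head, "→", " | ".join(" ".join(body) for body in bodies))
# E → T | T E'
# E' → + T | + T E'
```

Both grammars generate a T followed by any number of + T groups, but the rewritten one never starts a right side with the non-terminal being defined.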