Unit - 2
Regular Expressions (RE)
● Simple expressions called Regular Expressions can readily define the language adopted by finite automata. This is the most powerful way of expressing any language.
● Regular languages are referred to as the languages recognized by certain regular expressions.
● A regular expression may also be defined as a pattern sequence that defines a series.
● For matching character combinations in strings, regular expressions are used. This pattern is used by the string search algorithm to locate operations on a string.
A Regular Expression can be recursively defined as follows -
● ε is a Regular Expression indicating the language containing an empty string. (L (ε) = {ε})
● φ is a Regular Expression denoting an empty language. (L (φ) = {})
● x is a Regular Expression where L = {x}
● If X is a Regular Expression denoting the language L(X) and Y is a Regular Expression denoting the language L(Y), then
● X + Y is a Regular Expression corresponding to the language L(X) ∪ L(Y) where L(X+Y) = L(X) ∪ L(Y).
● X. Y is a Regular Expression corresponding to the language L(X). L(Y) where L (X.Y) = L(X). L(Y)
● R* is a Regular Expression corresponding to the language L(R*) where L(R*) = (L(R)) *
Example:
Write the regular expression beginning with a but not including consecutive b's for the language.
Sol: The regular expression has to be created for the language:
L = {a, aba, aab, aba, aaa, abab, .....}
Key takeaway
- Regular languages are referred to as the languages recognized by certain regular expressions.
- A regular expression may also be defined as a pattern sequence that defines a series.
Regular expressions are used to produce patterns of strings, much as finite automata are used to recognize patterns of strings. A regular expression is an algebraic formula whose value is a pattern made up of a collection of strings known as the expression's language.
In a regular expression, the operands can be:
● The alphabetic characters that the regular expression is defined over.
● Any pattern defined by a regular expression can be used as a value for variables.
● epsilon, which stands for an empty string with no characters.
● The empty set of strings is denoted by null.
Regular expressions use the following operators:
Union - R1 | R2 (sometimes written as R1 U R2 or R1 + R2) is a regular expression if R1 and R2 are regular expressions.
L(R1|R2) = L(R1) U L(R2).
Concatenation - R1R2 (sometimes written as R1.R2) is a regular expression if R1 and R2 are regular expressions.
L(R1R2) = L(R1) concatenated with L(R2).
Kleene closure - R1* (the Kleene closure of R1) is a regular expression if R1 is a regular expression.
L(R1*) = epsilon U L(R1) U L(R1R1) U L(R1R1R1) U ...
Closure takes precedence over concatenation, which takes precedence over union.
Example
Over the set ∑ = {a}, write the regular expression for the language that accepts all combinations of a's.
A can be zero, single, double, and so on in any combination of a's. If the letter an appears zero times, the string is null. That is, the set of {ε, a, aa, aaa, ....}. As a result, we can write this as a regular expression:
R = a*
That's how a Kleen closure works.
Precedence
The regular-expression operators, like any algebra, have an assumed order of "precedence," which means that operators are associated with operands in a specific order. The following is the sequence of events:
1. The star operator is the most important.
2. The dot or concatenation operator is next.
3. Finally, all the unions (+) with their operands are gathered. (0(1) *) +1 is the grouping for the expression 01* +1.
In set notation, write the language L (a*. (a+b)).
L (a*. (a+b)) = L(a*) L(a+b)
= (L(a)) *(L(a)∪L(b))
= {ε, a, aa, aaa,….} {a, b}
= {a, aa, aaa, …, b, ab, aab, aaab,..}
Create a regular expression for the collection of strings that alternate between 0s and 1s.
Let's start by creating a single string 01. Regular expression for the language. The star operator may then be used to get an expression of all strings of the form 0101....0.
The RE base rule says that 0 and 1 are expressions that denote the languages 0 and 1, respectively. We get a regular expression for the language 01 by concatenating the two phrases; this is the expression 01.
Now we use the regular expression (01) * to get all strings with zero or more instances of 01, and we get L (01) *. However, because it only includes strings beginning with 0, this isn't exactly what we want. To account for strings beginning with 1 and strings ending with 0 or 1:
(01) * +(10) *+ 0 (10) *+ 1(01) *
Key takeaway
Regular expressions are used to produce patterns of strings, much as finite automata are used to recognize patterns of strings. A regular expression is an algebraic formula whose value is a pattern made up of a collection of strings known as the expression's language.
Let's look at the Regular Expression Algebraic Laws.
Associativity law
Let's look at the Regular Expression Associativity Laws.
A + (B + C) = (A + B) + C and A. (B.C) = (A.B). C.
Commutativity for regular expression
Let's look at the Regular Expression Commutativity Laws.
A + B = B + A. However, A.B 6= B.A in general.
Identity for regular expression
Let's look at the Regular Expression Identity.
∅ + A = A + ∅ = A and ε.A = A.ε = A
Distributivity
Let's look at the Regular Expression Distributivity.
Left distributivity: A. (B + C) = A.B + A.C.
Right distributivity: (B + C). A = B.A + C.A.
Idempotent A + A = A.
Closure laws
Let's look at the Regular Expression Closure laws.
(A*) * = A*, ∅* = ε, ε* = ε, A+ = AA* = A*A, and A* = A + + ε.
DeMorgan Type law
Let's look at the Regular Expression DeMorgan laws.
(L + B) * = (L*B*) *
Understand with the examples –
Example 1:
Over the set ∑ = {a}, write the regular expression for the language that accepts all combinations of a's except the null string.
Solution:
The language's regular expression must be created.
L = {a, aa, aaa, ....}
This set denotes the absence of a null string. As a result, regular expression can be defined as:
R = a+
Example 2:
Write the language's regular expression that accepts any string with any number of a's and b's.
Solution:
The regular expression will look like this:
r.e. = (a + b)*
The set will be L = {ε, a, aa, b, bb, ab, ba, aba, bab,.....}, any combination of a and b.
Any combination of a and b, including a null string, is represented by the (a + b)*.
Equivalence of RE
Remember that Regular language is a type of language that is approved by some FAs. The two notions of REs and Regular Language are fundamentally the same, in that (for) any regular language, a RE may be constructed, and (for) every RE, a Regular Language exists. This is surprising, because the RE method to characterize language differs significantly from the FA approach. However, in terms of descriptive power, REs and FA are equal. This fact can be used to prove the following Theorem.
Theorem: If some RE describes a language, it is regular.
This Theorem has two directions, each of which is stated and proved separately below.
Lemma: If L(r) is a language described by the RE r, then it is regular i.e., there is a FA such that L(M) ≌ L(r).
Proof: We use a structured index on the expression r to prove the lemma. First, we demonstrate how to build FA for the basic elements: ø, ∈ and ɑ ∈ ∑ for any. Then we explain how to merge these Finite Automata into Complex Automata that accept the Union, Concatenation, and Kleene Closure of the languages that the smaller automata accepted.
The use of NFAs is advantageous in the case where we generate NFAs for each RE represented solely by a transition diagram.
Constructing an FA from an RE
We begin by showing how to construct an FA for the operands in a regular expression.
If the operand is a character c, then our FA has two states, s0 (the start state)
And sF (the final, accepting state), and a transition from s0 to sF with label c.
If the operand is epsilon, then our FA has two states, s0 (the start state) and sF
(The final, accepting state), and an epsilon transition from s0 to sF.
If the operand is null, then our FA has two states, s0 (the start state) and sF (the
Final, accepting state), and no transitions.
Given FA for R1 and R2, we now show how to build an FA for R1R2, R1|R2, and
R1*.
Let A (with start state a0 and final state aF) be the machine accepting L(R1) and B (with start state b0 and final state bF) be the machine accepting L(R2).
● The machine C accepting L(R1R2) includes A and B, with start state a0, final state bF, and an epsilon transition from aF to b0.
● The machine C accepting L(R1|R2) includes A and B, with a new start state c0, a new final state cF, and epsilon transitions from c0 to a0 and b0, and from aF and bF to cF.
● The machine C accepting L(R1*) includes A, with a new start state c0, a new final state cF, and epsilon transitions from c0 to a0 and cF, and from aF to a0, and from aF to cF.
Arden’s theorem
● The Arden's Theorem is useful both for testing the equivalence of two regular phrases and for converting the DFA to a regular phrase.
● In the conversion of DFA to a regular expression, let us see its use.
● There are some algorithms used to build the regular expression form given in the DFA.
1. Let q1 be the initial state.
2. There are q2, q3, q4 ....qn number of states. The final state may be some qj where j<= n.
3. Let αji represent the transition from qj to qi.
4. Calculate qi such that
qi = αji * qj
If qj is a start state then we have:
qi = αji * qj + ε
5. Similarly, compute the final state which ultimately gives the regular expression 'r'.
Algebraic Method Using Arden’s Theorem
Proof -
R = Q +RP
R = Q + QP* P (Substituting the value of R)
R = Q (ε + P *P)
R = QP * (P * P = P+, P+ + ε = P *)
Sol: With the use of Arden's Theorem, let's solve the given automata
C = Ba
There is a self-loop on input b in state B, a transition from A when input is a, a transition from state C when input is b.
B = Bb + Cb + Aa
In state A, there is a transition from ε (being the start state, transition from ε must be included), a self-loop on input a, a transition from B when input is b.
A = ε + Aa + Bb
Put the value (2) in (1), then
C = Ba
C = (Aa + Bb + Cb) a
C = Aaa + Bba + Cba
Place the value (1) in (2), then
B = Bb + Cb + Aa
B = Aa + Bb + (Ba)b
B = Aa + B (b + ab)
B = Aa (b + ab)* ( using R + QP*)
Place (2) in (3), after that
A = ε +Aa +Bb
A = ε +Aa +Aa (b + ab)*b
A = ε +A (a +a (b + ab)*b)
A = ε + (a +a (b + ab) *b) *
A = (a + a (b + ab) *b) *
Let's combine all the simplified equations with the final state C as a final step.
C = Ba
C = Aa (b + ab) *a
C = (a + a (b + ab) *b) * a (b + ab) * a
Key takeaway
- The Arden's Theorem is useful both for testing the equivalence of two regular phrases and for converting the DFA to a regular phrase.
Theorem -
Let L be a regular language. Then a constant c' exists such that for every string w in L
|w| ≥ c
We are able to split w into three strings, w=xyz, so that
● |y| > 0
● |xy| ≤ c
● For all k ≥ 0, the string xyKz is also in L.
Application of Pumping Lemma
It is important to apply the Pumping Lemma to show that certain languages are not normal. It can never be used to illustrate the regularity of a language.
● If L is regular, it satisfies Pumping Lemma.
● If L does not satisfy Pumping Lemma, it is non-regular.
Method for demonstrating that a language L is not regular
● We have to assume that L is regular.
● The pumping lemma should hold for L.
● Use the pumping lemma to obtain a contradiction −
● Select w such that |w| ≥ c
● Select y such that |y| ≥ 1
● Select x such that |xy| ≤ c
● Assign the remaining string to z.
● Select k such that the resulting string is not in L.
Therefore, L is not regular.
Example:
Prove that L = {aibi | i ≥ 0} is not regular.
Sol:
● At first, we assume that L is regular and n is the number of states.
● Let w = anbn. Thus |w| = 2n ≥ n.
● By pumping lemma, let w = xyz, where |xy| ≤ n.
● Let x = ap, y = aq, and z = arbn, where p + q + r = n, p ≠ 0, q ≠ 0, r ≠ 0. Thus |y| ≠ 0.
● Let k = 2. Then xy2z = apa2qarbn.
● Number of as = (p + 2q + r) = (p + q + r) + q = n + q
● Hence, xy2z = an+q bn. Since q ≠ 0, xy2z is not of the form anbn.
● Thus, xy2z is not in L.
Hence L is not regular.
Regular languages are closed under a wide variety of operations.
Union and intersection -
Pick DFAs recognizing the two languages and use the cross-product construction to build a DFA recognizing their union or intersection.
Set complement -
Pick a DFA recognizing the language, then swap the accept/non-accept markings on its states.
Set difference -
Rewrite set difference using a combination of intersection and set complement
Concatenation and Star -
Pick an NFA recognizing the language and modify it
Homomorphism -
A homomorphism is a function from strings to strings. What makes it a homomorphism is that its output on a multi-character string is just the concatenation of its outputs on each individual character in the string. Or, equivalently, h(xy) = h(x)h(y) for any strings x and y. If S is a set of strings, then h(S) is {w: w = h(x) for some x in S}.
To show that regular languages are closed under homomorphism, choose an arbitrary regular language L and a homomorphism h. It can be represented using a regular expression R. But then h(R) is a regular expression representing h(L). So, h(L) must also be regular.
Notice that regular languages are not closed under the subset/superset relation. For
Example, 0 * 1 * is regular, but its subset {O n 1 n: n > = 0} is not regular, but its subset {01,0011, 000111} is regular again.
Decision properties
If the language L of all yes instances to P is decidable, a decision problem P is said to be decidable (i.e., have an algorithm).
Example -
● (DFA acceptance problem) Provided that a DFA accepts a given problem, Word?
● Given a DFA, does it accept any word? (Emptiness problem for DFA)
● (DFA equivalence issue) Given two DFAs, do they accept the DFA problem? Language of the same?
Problem: Some of regular L1 and L2 languages, the problem is whether or not there is a string 'w' in both L1 and L2, a deciding problem.
Sol: Firstly, two Turing TM1 and TM2 machines simulate the DFAs of the L1 and L2 languages, respectively. We know a DFA always stops, so a Turing machine that simulates a DFA will always stop, too. In TM1 and TM2 we join the string 'w'. The two Turing machines are going to halt and give us an answer.
We may connect the Turing machine outputs to a modified 'AND' gate that will only output 'yes' when both Turing machines respond to 'yes'. Otherwise, 'no' is production.
This problem is a deciding problem, as this system of two Turing machines and a changed AND gate will always stop.
Key takeaway
- Regular languages can be recognized by some DFA.
- Regular language consists of all strings with an equal amount of zeroes and ones, with the ones following the zeroes.
The Myhill-Nerode language non-regularity test is based on the following theorem, which is in the contrapositive form of the theorem used for the non-regularity test. The Myhill-Nerode theorem is also a consequence:
Theorem: A language L over the alphabet is regular if and only if the set of equivalence classes is finite.
Proof of Theorem -
Suppose L is a regular language and two strings, such as x and y, can be separated with respect to L. Then a string z is given so that xz is in L and yz is not in L (or xz is not in L and yz is in L). This means that the DFA enters various states if x and y are read by a DFA that recognizes L. If there are three strings that are separated with respect to L, then after reading those three strings, the DFA enters three distinct states....
Therefore, with respect to L, if there are infinitely many strings to be separated, then the DFA must have infinitely many states, which it cannot since a DFA must have a finite number of states.
Therefore, if there is an infinite set of strings that can be distinguished pairwise with regard to a language, then the language is not natural.
If, on the other hand, the number of string classes that are pairwise indistinguishable with respect to the L language is finite, the L language is normal.
To prove this, let [x] denote a class of strings that, with respect to L, are indistinguishable from string x. Note that 'indistinguishable from L' is an equivalence relationship over the string set (refer to IL) and that [x]'s equivalence classes. We can demonstrate that using these equivalence groups, a DFA that accepts L can be built.
Myhill Nerode Theorem
Let us state the Myhill-Nerode theorem here. Any terminology first.
For every x, y ∈ *, if x R y, then for every z ∈ *, xR y, then for every z ∈ *, xz R yz, an equivalence relationship R on * is said to be the correct invariant.
If the set of its equivalence classes is finite, an equivalence relation is often said to be of a finite index.
With these words, it is now possible to state the Myhill-Nerode Theorem as follows:
The three sentences that follow are equivalent:
- A language is regular.
- L is the union of some of the equivalence classes of a right invariant equivalence relation of finite index.
- Is of finite index.
Key takeaway
- The Myhill-Nerode language non-regularity test is based on the following theorem, which is in the contrapositive form of the theorem used for the non-regularity test.
References:
- Daniel Cohen, “Introduction to Computer Theory”, Wiley & Sons, ISBN 97881265133454
- Kavi Mahesh, “Theory of Computation: A Problem-Solving Approach”, Wiley India, ISBN1081265331106
- Michael Sipser, “Introduction to the Theory of Computation”, Cengage Learning, ISBN- 13: 97811331878137
- Vivek Kulkarni, “Theory of Computation”, Oxford University Press, ISBN 0-19-808458