2.1 DAG for expressions | unit 2 intermediate code generation

Back to Study material

Unit – 2

Intermediate Code Generation

2.1 DAG for expressions

A simple block DAG is a directed acyclic graph containing the following labels on nodes:

Graph leaves are identified by a unique identifier and variable names or constants may be the identifier.

The graph's interior nodes are labelled by an operator symbol.

Nodes are also given a set of label identifiers to store the computed value.

● DAGs are a type of structure for data. It is used on simple blocks to execute transformations.

● A good way to decide a common sub-expression is given by DAG.

● This provides an image of how the value determined by the argument is used in subsequent statements.

Algorithm for construction of DAG

Input: It contains a basic block

Output: It contains the details given below:

● A label is embedded in every node. For leaves, an identifier is the name.

● A list of attached identifiers for keeping the computed values is stored in each node.

Case (i) x: = y OP z

Case (ii) x: = OP y

Case (iii) x: = y

Method

Step 1:

If the operand of y is undefined, create a node (y).

If z operand is undefined, build a node for Case(i) (z).

Step 2:

Construct node (OP) for case(i), whose right child is node(z) and whose left child is node(y).

Check if there is a node (OP) with one child node (y) for case(ii).

Node n will be the node (y) for case(iii).

Output

For node(x), remove x from the ID list. Append x to the list of attached identifiers for the n node listed in step 2. Third, set node(x) to n.

Example:

Find the following three statements of address:

S1: = 4 * i

S2: = a[S1]

S3: = 4 * i

S4: = b[S3]

S5: = s2 * S4

S6: = prod + S5

Prod: = s6

S7: = i+1

i: = S7

if i<= 20 goto (1)

Stages in DAG Construction:

Key takeaway:

DAGs are a type of structure for data.

DAG is a directed acyclic graph.

It is used on simple blocks to execute transformations.

2.2 Three address codes - Quadruples and Triples

An intermediate code is a three-address code. It is used by the compilers that optimize it. The provided expression is broken down into multiple separate instructions in the three-address code. Such instructions can be easily translated into the language of assembly.

Any instruction for the three address codes has three operands at most. It is a combination of a binary operator and a mission

The compiler defines the order of operation generated by three code addresses.

General representation:

x = y op z

Where x, b or c represents temporary generated operations such as names, constants or compiler and op represents the operator.

Example

Given expression:

w: = (-y * x) + (-y * z)

The code with three addresses is as follows:

t1: = -y

t2: = x *t1

t3: = -y

t4: = z * t3

t5: = t2 + t4

w: = t5

It is used as registers in the target programme.

It is possible to represent the three address codes in two forms: quadruples and triples.

Quadruple & triples

Quadruple

For the implementation of the three address codes, the quadruples have four fields. The quadruple field includes the name of the generator, the first operand source, the second operand source, and the output, respectively

Operator

Argument 1

Argument 2

Destination

Fig: quadruple field

Example

w: = -x * y + z

The code with three addresses is as follows:

t1: = -x

t2: = y + z

t3: = t1 * t2

w: = t3

These statements are expressed as follows by quadruples:

	Operator	Source 1	Source 2	Destination
(0)	uminus	x		t1
(1)	+	y	z	t2
(2)	*	t1	t2	t3
(3)	:=	t3		w

Triples

For the implementation of the three-address code, the triples have three fields. The area of triples includes the name of the operator, the first operand of the source and the second operand of the source.

The outcomes of the respective sub-expressions are denoted by the expression location in triples. Although describing expressions, Triple is equal to DAG.

Operator

Argument 1

Argument 2

Fig: Triples field

Example:

w: = -x * y + z

The three code addresses are as follows:

t1: = -x

t2: = y + zM

t3: = t1 * t2

w: = t3

Triples represent these statements as follows:

	Operator	Source 1	Source 2
(0)	uminus	x
(1)	+	y	z
(2)	*	(0)	(1)
(3)	:=	(2)

Key takeaway:

In quadruple, instruction is divided into 4 separate fields.

For the implementation of the three address codes, the triples have three fields.

Triple does not use temporary variables.

2.3 Types and declarations

We need to lay out storage for the declared variables when we encounter declarations.

You can manage declaration with a list of names as follows:

D T id; D | ε

T B C | record ‘{‘D’}’

B int | float

C ε | [ num] C

A declaration sequence is generated by Nonterminal D.

A fundamental, array or record form is generated by nonterminal T.

One of the basic int or float types is generated by nonterminal B.

Nonterminal C produces a string of zero or more integers, each surrounded by brackets for "component".

We create a ST (Symbol Table) entry for every local name in a procedure, containing:

The type of the name

How much storage the name requires

Production:

D → integer, id

D → real, id

D → D1, id

For declarations, an appropriate transformation scheme would be:

Production rule	Semantic action
D → integer, id	ENTER (id.PLACE, integer) D.ATTR = integer
D → real, id	ENTER (id.PLACE, real) D.ATTR = real
D → D1, id	ENTER (id.PLACE, D1.ATTR) D.ATTR = D1.ATTR

ENTER is used to enter the table of symbols and ATTR is used to trace the form of data.

Key takeaway:

We need to lay out storage for the declared variables when we encounter declarations.

2.4 Translation of Expressions

In the syntax directed translation, the assignment statement mainly deals with expressions. The expression can be of type real, integer, array and records.

Consider the grammar

S → id: = E

E → E1 + E2

E → E1 * E2

E → (E1)

E → id

The translation scheme of above grammar is given below:

Production rule	Semantic actions
S → id: =E	{p = look_up(id.name); If p ≠ nil then Emit (p = E.place) Else Error; }
E → E1 + E2	{E.place = newtemp(); Emit (E.place = E1.place '+' E2.place) }
E → E1 * E2	{E.place = newtemp(); Emit (E.place = E1.place '*' E2.place) }
E → (E1)	{E.place = E1.place}
E → id	{p = look_up(id.name); If p ≠ nil then Emit (p = E.place) Else Error; }

● The p returns the entry for id.name in the symbol table.

● The Emit function is used for appending the three-address code to the output file. Otherwise, it will report an error.

● The newtemp () is a function used to generate new temporary variables.

● E.place holds the value of E.

2.5 Array references

● Array elements can be accessed easily if elements are stored in a sequential position block.

● Elements are numbered 0,1,.........,n-1(n elements) for an array.

● If the width of each element of an array is w, the ith element of array A starts at position

Base + i * w

Where base - relative address, define storage allocation.

(Base is relatively address of A [0])

● The array may be one- or two-dimensional.

For one - dimensional array:

A: array [low. high] of the ith elements is at:

base + (i-low) *width → i*width + (base - low*width)

For multi - dimensional array:

Row major or column major forms

● Row major: a [1,1], a [1,2], a [1,3], a [2,1], a [2,2], a [2,3]

● Column major: a [1,1], a [2,1], a [1, 2], a [2, 2], a [1, 3], a [2,3]

● In row major form, the address of a [i1, i2] is

● Base+((i1-low1) *(high2 - low2+1) +i2-low2) *width

Layout for 2D array:

Fig 3: two-dimensional array

Key takeaway:

Array elements can be accessed easily if elements are stored in a sequential position block.

2.6 Type checking and Conversions

A compiler must check that the source program follows both syntactic and semantic conventions of the source language. This checking, called static checking, detects and reports programming errors.

Some examples of static checks:

1. Type checks - A compiler should report an error if an operator is applied to an incompatible operand. Example: If an array variable and function variable are added together.

2. Flow-of-control checks - Statements that cause flow of control to leave a construct must have some place to which to transfer the flow of control. Example: An enclosing statement, such as break, does not exist in switch statements.

Fig: position of type checker

A type checker verifies that the type of a construct matches that expected by its context. For example: arithmetic operator mod in Pascal requires integer operands, so a type checker verifies that the operands of mod have type integer. Type information gathered by a type checker may be needed when code is generated.

Type Systems

The design of a type checker for a language is based on information about the syntactic constructs in the language, the notion of types, and the rules for assigning types to language constructs.

For example: “if both operands of the arithmetic operators of +, - and * are of type integer, then the result is of type integer”

Type Expressions

The type of a language construct will be denoted by a “type expression.” A type expression is either a basic type or is formed by applying an operator called a type constructor to other type expressions. The sets of basic types and constructors depend on the language to be checked. The following are the definitions of type expressions:

1. Basic types such as boolean, char, integer, real are type expressions.

A special basic type, type_error, will signal an error during type checking; void denoting “the absence of a value” allows statements to be checked.

2. Since type expressions may be named, a type name is a type expression.

3. A type constructor applied to type expressions is a type expression.

4. Type expressions may contain variables whose values are type expressions.

Fig: tree representation for char x char → pointer (integer)

Type systems

A type system is a collection of rules for assigning type expressions to the various parts of a program. A type checker implements a type system. It is specified in a syntax-directed manner. Different type systems may be used by different compilers or processors of the same language.

Static and Dynamic Checking of Types

Checking done by a compiler is said to be static, while checking done when the target program runs is termed dynamic. Any check can be done dynamically, if the target code carries the type of an element along with the value of that element.

Conversions

● Expression X + I (The X is real and I is int type)

● First compiler check both of the operand are of same type or not

● Usually convert int type into real type and perform real operand operation

How to Convert

● X + I (int convert into real)

● X I intoreal real +

● Intoreal operation convert I intoreal

● Or real+ operation perform real addition in between both real operands

Coercions

● Convert one type to another automatically called implicit or coercions like ASCII

● It is limited in many languages in case of no information loss

● Integer convert into real but no vice versa because of some memory loss

● When programmer write something to convert called Explicit

Overloading Functions

● The same name is used for several different operations over several different types.

● Type checker is used to detect the error while creating overloading functions

Polymorphic Functions

● A piece of code that can be executed with arguments of different types

Key takeaway:

A compiler must check that the source program follows both syntactic and semantic conventions of the source language.

This checking, called static checking, detects and reports programming errors.

2.7 Translation of Boolean expressions and control flow statements

Boolean expressions have two primary purposes.

1. They are used for computing the logical values.

2. They are also used as conditional expressions that alter the flow of control, such as if-then-else or while-do.

Boolean expressions are composed of the boolean operators (and, or, and not) applied to elements that are boolean variables or relational expressions. Relational expressions are of the form E1 relop E2, where E1 and E2 are arithmetic expressions.

Consider the grammar

E → E or E

E → E and E

E → not E

E → (E)

E → id relop id

E → TRUE

E → id

E → FALSE

The relop is any of <, ≤, =, ≠, >, or ≥. We assume that or and are left associative. or has lowest precedence, then and, then not.

Methods of Translating Boolean Expressions:

There are two methods of representing the value of a Boolean.

1.To encode true or false numerically

• 1 is used to denote true.

• 0 is used to denote false.

2.To evaluate a boolean expression analogously to an arithmetic expression.

• Flow of control –representing the value of a boolean expression by a position reached in a program.

• Used in flow of control statements such as if-then-else or while-do statements.

Numerical Representation of Boolean Expressions:

First consider the implementation of boolean expressions using 1 to denote TRUE and 0 to denote FALSE. Expressions will be evaluated from left to right.

Example 1:

A or B and C Three address sequence:

T1 = B and C

T2 = A or T1

Example 2:

A relational expression A< B is equivalent to the conditional statement. if A < B then 1 else 0

Three address sequence:

(1) if A < B goto (4)

(2) T=0

(3) goto (5)

(4) T=1

(5) ---

Semantic Rules for Boolean Expression:

The Semantic rules for Boolean expression are given as below:

Example: Code for a < b or c < d and e <f

100: if a < b goto 103

101: t1 = 0

102: goto 104

103: t1 = 1

104: if c < d goto 107

105: t2 =0

106: goto 108

107: t2 = 1

108: if e < f goto 111

109: t3 =0

110: goto 112

111: t3 = 1

112: t4 = t2 and t3

113: t5 = t1 or t4

Control Flow Representation:

Short Circuit Evaluation of Boolean Expressions: Flow of control statements with the jump method.

Consider the following: (short circuit code)

S → if E then S | if E then S1 else S2

S → while E do S1

Fig: attribute used for if E then S1 else S2

E is associated with two attributes:

1. E.true: label which controls if E is true

2. E.false : label which controls if E is false

S.next - is a label that is attached to the first three address instruction to be executed after the code for S (S1 or S2)

The Semantic rule for flow of control statements is given by:

Fig: semantic rule for control statements

Key takeaway:

Boolean expressions are composed of the boolean operators (and, or, and not) applied to elements that are boolean variables or relational expressions.

Purpose of boolean expression is - They are also used as conditional expressions that alter the flow of control, such as if-then-else or while-do.

2.8 Back Patching

All labels cannot be recognized in a single pass in the code generation process (primarily, 3 Address Code), so we use a technique called Backpatching.

Backpatching is the task of filling in unspecified label information during the code generation process using acceptable semantic behaviour.

Three functions are used to implement the backpatching technique. The three functions performed in two passes to create code using backpatching are Makelist(), merge() and backpatch().

Three kinds of operations are carried out by Backpatching Algorithms

makelist(i) - Creates a new list that contains only an index in the quadruple array, and returns a pointer to the list that it has generated.

merge (p1, p2) - This function concatenates lists to which p1 and p2 are pointed, returning a pointer to the list of concatenated ones. This is used for assigning more than one address to the same true / false labels.

backpatch (p, i) - This role is used for the insertion of I as the target mark for each of the The statements in the list which p points to. All statements are attached using the data given by these feature labels.

Here ‘p’ is the pointer to any statement say p=> “if a<b goto ____”.

Initially, we don’t know the label to fill just after goto.

After backpatch(p,i)

p=>“if a<b goto i”

Key takeaway:

Backpatching is the task of filling in unspecified label information during the code generation process using acceptable semantic behaviour.

Three functions are used to implement the backpatching technique.

2.9 Intermediate Code Generation for Procedures

In order to convert the source code into the computer code, intermediate code is used. Between the high-level language and the computer language lies the intermediate code.

Fig: position of intermediate code generator

● If the compiler converts the source code directly into the machine code without creating intermediate code, then each new machine needs a complete native compiler.

● For all compilers, the intermediate code holds the research section the same, which is why it does not require a complete compiler for each specific machine.

● Input from its predecessor stage and semantic analyzer stage is obtained by the intermediate code generator. This takes feedback in the form of an annotated tree of syntax.

● The second stage of the compiler synthesis process is modified according to the target machine by using the intermediate code.

Intermediate representation:

It is possible to represent the intermediate code in two ways:

High level intermediate code:

The intermediate code of the high level can be interpreted as the source code. We can easily add code modifications to boost the performance of the source code. Yet there is less preference for optimizing the goal machine.

2. Low level intermediate code:

The low-level intermediate code is similar to the target computer, making it ideal for registering and allocating memory, etc. It is used for optimizations depending on the machine.

Key takeaway:

It lies between the high-level language and the computer language.

The intermediate code of the high level can be interpreted as the source code.

It is used for optimizations depending on the machine.

References:

Compilers – Principles, Techniques and Tools, A. V. Aho, M. S. Lam, R. Sethi, J. D. Ullman, 2nd Ed., Pearson. 2007

Modern Compiler Design, D. Galles, 1st Ed., Pearson Education,2004

Advanced Compiler Design & Implementation, S. S. Muchnick, Morgan Kaufmann, 1997

Sign Up

Index

Notes

Highlighted

Underlined

Browse by Topics

Notes

Highlighted

Underlined