Unit – 3
Code Generation
Q1) What factors are involved?
A1) Factors involved
The final phase in the compiler model is the code generator. It takes as input an intermediate representation of the source program and produces as output an equivalent target program. Designing of code generators should be done in such a way so that it can be easily implemented, tested and maintained.
The different factors affecting target code generation includes:
1. Input to code generator:
The input to the code generation consists of the intermediate representation of the source program produced by the front end, together with information in the symbol table to determine run-time addresses of the data objects denoted by the names in the intermediate representation.
2. Target program:
The target program is the output of the code generator.
The output can be:
a) Assembly language: It allows subprograms to be separately compiled.
b) Relocatable machine language: It makes the process of code generation easier.
c) Absolute machine language: It can be placed in a fixed location in memory and can be executed immediately.
3. Memory management
● During the code generation process the symbol table entries have to be mapped to actual p addresses and levels have to be mapped to instruction addresses.
● Mapping names in the source program to address data is co-operating done by the front end and code generator.
● Local variables are stack allocation in the activation record while global variables are in static areas.
4. Instruction selection
● Nature of the instruction set of the target machine should be complete and uniform.
● When you consider the efficiency of the target machine then the instruction speed and machine idioms are important factors.
● The quality of the generated code can be determined by its speed and size.
5. Register allocation
● Registers can be accessed faster than memory. The instructions involving operands in register are shorter and faster than those involving memory operands.
● The following sub problems arise when we use registers:
● Register allocation: In register allocation, we select the set of variables that will reside in the register.
● Register assignment: In Register assignment, we pick the register that contains variables.
● Certain machines require even-odd pairs of registers for some operands and results.
6. Evolution order
The efficiency of the target code can be affected by the order in which the computations are performed. Some computation orders need fewer registers to hold results than others.
Q2) Explain register allocation?
A2) Register Allocation
The quickest positions in the hierarchy of memory are registers. But this resource is, unfortunately, small. It falls under the target processor's most restricted tools. Allocation of the register is an NP-complete problem. However, to achieve allocation and assignment, this issue can be reduced to graph colouring. An effective approximate solution to a difficult problem is therefore calculated by a good register allocator.
The allocator of the register specifies the values that will reside in the register and the register that will contain each of those values. It takes a programme with an arbitrary number of registers as its input and generates a programme that can fit into the target machine with a finite register set.
Allocation Vs Assignment
Allocation
Map an infinite namespace to the target machine's register set.
● Reg. to reg. Model: Digital records are mapped to actual registers, but excess memory spills out.
● Mem. to mem. Model: Maps any memory location sub-set to a set of names that models the set of physical registers.
Allocation ensures that the code matches the register of the target computer. Put on each instruction.
Assignment
Maps the assigned name set to the target machine physical register set.
● Assumes that the allocation was performed so that the code fits into the physical register set.
● In the registers, no more than 'k' values are specified, where 'k' is the number of physical registers.
Q3) Describe stack allocation?
A3) Stack allocation
Stack allocation is a procedure in which a stack is used to organize the storage. The stack used in stack allocation is known as control stack. In this type of allocation, creation of data objects is performed dynamically.
In stack allocation, activation records are created for the allocation of memory. These activation records are pushed onto the stack using Last In First Out (LIFO) method. Locals are stored in the activation records at run time and memory addressing is done by using pointers and registers.
In static storage allocation, storage is organized as a stack. An activation record is pushed into the stack when activation begins and it is popped when the activation ends.
Activation record contains the locals so that they are bound to fresh storage in each activation record. The value of locals is deleted when the activation ends. It works on the basis of last-in-first-out (LIFO) and this allocation supports the recursion process.
Q4) What is the basic graph?
A4) Basic graph
The Basic Block includes a declaration sequence. The control flow enters at the beginning of the sentence and leaves without any halt at the end (except may be the last instruction of the block).
A fundamental block is generated by the following sequence of three address statements:
Basic block construction:
Algorithm: Partition into basic blocks
Input: It comprises the series of three sentences of address
Output: It comprises a list of basic blocks in exactly one block with three address statements each.
Method: Identify the leader in the code first. The guidelines are as follows for finding leaders:
● The first statement is a leader.
● Statement L is a leader if the goto statement is conditional or unconditional, such as: if....goto L or goto L
● Instruction L is a leader if a goto or conditional goto sentence like: if goto B or goto B is followed immediately.
The basic block for each leader consists of the leader and all statements up to that. The next leader and the end of the curriculum are not included.
For the dot product of two vectors a and b of length 10, consider the following source code:
begin
prod: =0;
i: =1;
do begin
prod: =prod+ a[i] * b[i];
i: =i+1;
end
while i <= 10
end
The three address codes for the source programme above are given below:
B1
B2
The basic block B1 contains the declaration (1) to (2)
The basic block B2 contains a declaration (3) to (12)
Q5) Define flow graph?
A5) Flow graph
A directional graph is a flow graph. It contains the control information stream for the basic block collection.
A control flow graph is used to show how the control of the programme between the blocks is parsed. It is helpful in the optimization of the loop.
The vector dot product's flow graph is given as follows:
Fig: flow graph
● The initial node is block B1. Block B2 follows B1 immediately, so an edge occurs from B2 to B1.
● The jump goal from the last statement of B1 is the first statement of B2, so there is an edge from B1 to B2.
● B2 is a successor to B1 and the precursor to B2 is B1.
Q6) Describe simple code generation?
A6) Simple code generation
A code generator generates target code for a sequence of three- address statements and effectively uses registers to store operands of the statements.
For example:
Consider the three-address statement a: = b+c
It can have the following sequence of codes:
ADD Rj, Ri Cost = 1 // if Ri contains b and Rj contains c
(or)
ADD c, Ri Cost = 2 // if c is in a memory location
(or)
MOV c, Rj Cost = 3 // move c from memory to Rj and
add ADD Rj, Ri
Register and Address Descriptors:
● A register descriptor is used to keep track of what is currently in each register. The register descriptors show that initially all the registers are empty.
● An address descriptor stores the location where the current value of the name can be found at run time.
A code-generation algorithm:
The algorithm takes as input a sequence of three -address statements constituting a basic block. For each three-address statement of the form x: = y op z, perform the following actions:
1. Invoke a function getreg to determine the location L where the result of the computation y op z should be stored.
2. Consult the address descriptor for y to determine y’, the current location of y. Prefer the register for y’ if the value of y is currently both in memory and a register. If the value of y is not already in L, generate the instruction MOV y’, L to place a copy of y in L.
3. Generate the instruction OP z’, L where z’ is a current location of z. Prefer a register to a memory location if z is in both. Update the address descriptor of x to indicate that x is in location L. If x is in L, update its descriptor and remove x from all other descriptors.
4. If the current values of y or z have no next uses, are not live on exit from the block, and are in registers, alter the register descriptor to indicate that, after execution of x: = y op z, those registers will no longer contain y or z.
The algorithmic sequence of getreg function can be,
1. if x value is in the register that register is returned.
2. If (1) fails, the new register is returned.
3. If (2) fails, and the operation needs a special register, that register value is temporarily moved to the memory and the register is returned.
4. If (3) fails, finally memory location is returned
Q7) What is code optimization?
A7) Code optimization
Optimization is a program transformation technique, which tries to improve the code by making it consume less resources (i.e., CPU, Memory) and deliver high speed.
In optimization, high-level general programming constructs are replaced by very efficient low-level programming codes. A code optimizing process must follow the three rules given below:
● The output code must not, in any way, change the meaning of the program.
● Optimization should increase the speed of the program and if possible, the program should demand less resources.
● Optimization should itself be fast and should not delay the overall compiling process.
Efforts for an optimized code can be made at various levels of compiling the process.
● At the beginning, users can change/rearrange the code or use better algorithms to write the code.
● After generating intermediate code, the compiler can modify the intermediate code by address calculations and improving loops.
● While producing the target machine code, the compiler can make use of memory hierarchy and CPU registers.
Optimization can be categorized broadly into two types: machine independent and machine dependent.
Q8) Explain peephole organization?
A8) Peephole organization
A code-generation statement-by-statement strategy also generates target code containing redundant instructions and sub-optimal constructs. It is possible to boost the consistency of such a target code by adding "optimizing" transformations to the target programme.
Peephole optimization, a method for trying to improve the performance of the target programme by analyzing a short series of target instructions (called the peephole) and replacing these instructions, wherever possible, with a shorter or faster sequence, is an easy but successful technique for improving the target code.
A small, moving window on the target programme is the peephole. The code in the peephole does not need to be contiguous, though this is needed by some implementations. It is typical of peephole optimization that additional improvements can be generated by each enhancement.
A bunch of statements are evaluated and the following potential optimization is checked:
Q9) Redundant instruction elimination
A9) The following can be performed by the user at the source code level:
The compiler looks for instructions at the compilation stage that are redundant in nature. Even if some of them are omitted, multiple loading and storing instructions can carry the same significance.
For instance:
● MOV x, R0
● MOV R0, R1
The first instruction can be omitted and the sentence re-written as
MOV x, R1
Unreachable code is a portion of the programme code that, because of programming constructs, is never accessed. Programmers may have written a piece of code unintentionally that can never be reached.
Example:
void add_ten(int x)
{
return x + 10;
printf(“value of x is %d”, x);
}
2. Flow of control optimization
There are occasions in a code where, without performing any significant task, the programme control bounces back and forth. It is possible to delete these hops.
Find the following code chunk:
...
MOV R1, R2
GOTO L1
...
L1: GOTO L2
L2: INC R1
3. Algebraic expression simplification
There are times when it is possible to make algebraic expressions simple. For instance, it is possible to replace the expression a = a + 0 with an expression itself, and the expression a = a + 1 may simply be replaced with INC a.