Unit - 5
Metrics and cost estimation
Q1. State and explain lines of code (LOC) and its measurement methods?
A 1)
1. The phrase “lines of code” (LOC) is a metric generally used to evaluate a software program or codebase according to its size.
2. It is a general identifier taken by adding up the number of lines of code used to write a program. LOC is used in various ways to assess a project, and there is a debate on how effective this measurement is.
3. Source lines of code (SLOC), also known as lines of code (LOC), is a software metric used to measure the size of a computer program by counting the number of lines in the text of the program's source code.
4. SLOC is typically used to predict the amount of effort that will be required to develop a program, as well as to estimate programming productivity or maintainability once the software is produced.
Measurement methods
1. Many useful comparisons involve only the order of magnitude of lines of code in a project. Using lines of code to compare a 10,000-line project to a 100,000-line project is far more useful than comparing a 20,000-line project with a 21,000-line project.
2. While it is debatable exactly how to measure lines of code, discrepancies of an order of magnitude can be clear indicators of software complexity or man-hours.
3. There are two major types of SLOC measures: physical SLOC (LOC) and logical SLOC (LLOC). Specific definitions of these two measures vary, but the most common definition of physical SLOC is a count of lines in the text of the program's source code excluding comment lines.
4. Logical SLOC attempts to measure the number of executable "statements", but their specific definitions are tied to specific computer languages (one simple logical SLOC measure for C-like programming languages is the number of statement-terminating semicolons).
5. It is much easier to create tools that measure physical SLOC, and physical SLOC definitions are easier to explain. However, physical SLOC measures are sensitive to logically irrelevant formatting and style conventions, while logical SLOC is less sensitive to formatting and style conventions.
6. However, SLOC measures are often stated without giving their definition, and logical SLOC can often be significantly different from physical SLOC.
7. Consider this snippet of C code as an example of the ambiguity encountered when determining SLOC:
for (i = 0; i < 100; i++) printf("hello"); /* How many lines of code is this? */
8. In this example we have:
8.1 1 physical line of code (LOC),
8.2 2 logical lines of code (LLOC) (for statement and printf statement),
8.3 1 comment line.
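To illustrate how such counts can be automated, here is a minimal C sketch of a crude SLOC counter; it approximates physical SLOC by counting non-blank lines that are not whole-line // comments, and logical SLOC by counting statement-terminating semicolons, as suggested above for C-like languages. Block comments and string literals are deliberately not handled, and the default file name is only a placeholder.

#include <stdio.h>

/* Crude SLOC counter sketch:
 * - physical SLOC: non-blank lines that are not whole-line "//" comments
 * - logical SLOC:  statement-terminating semicolons
 * Block comments and string literals are NOT handled; this is only a sketch. */
int main(int argc, char *argv[])
{
    FILE *fp = fopen(argc > 1 ? argv[1] : "program.c", "r");  /* placeholder default name */
    char line[1024];
    int physical = 0, logical = 0;

    if (fp == NULL)
        return 1;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *p = line;
        while (*p == ' ' || *p == '\t')      /* skip leading whitespace  */
            p++;
        if (*p == '\0' || *p == '\n')        /* blank line               */
            continue;
        if (p[0] == '/' && p[1] == '/')      /* whole-line "//" comment  */
            continue;
        physical++;
        for (; *p != '\0'; p++)              /* count semicolons as LLOC */
            if (*p == ';')
                logical++;
    }
    fclose(fp);
    printf("physical SLOC = %d, logical SLOC = %d\n", physical, logical);
    return 0;
}

Note that this simple rule would count the for-loop example above as 1 physical line and 3 semicolons, which is exactly the kind of definitional ambiguity discussed in point 7.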
Q2. State and explain McCabe's metrics in detail?
A 2)
1. Cyclomatic complexity is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code. It was developed by Thomas J. McCabe, Sr. in 1976.
2. Cyclomatic complexity is computed using the control flow graph of the program: the nodes of the graph correspond to indivisible groups of commands of a program, and a directed edge connects two nodes if the second command might be executed immediately after the first command.
3. Cyclomatic complexity may also be applied to individual functions, modules, methods or classes within a program.
4. One testing strategy, called basis path testing by McCabe who first proposed it, is to test each linearly independent path through the program; in this case, the number of test cases will equal the cyclomatic complexity of the program.
1. Consider the control flow graph of a simple program. The program begins executing at an entry node, then enters a loop (a group of three nodes immediately below the entry node). On exiting the loop, there is a conditional statement (the group below the loop), and finally the program exits at a single exit node. This graph has 9 edges, 8 nodes, and 1 connected component, so the cyclomatic complexity of the program is 9 - 8 + 2*1 = 3.
2. The cyclomatic complexity of a section of source code is the number of linearly independent paths within it—where "linearly independent" means that each path has at least one edge that is not in one of the other paths.
3. For instance, if the source code contained no control flow statements (conditionals or decision points), the complexity would be 1, since there would be only a single path through the code.
4. If the code had one single-condition IF statement, there would be two paths through the code: one where the IF statement evaluates to TRUE and another one where it evaluates to FALSE, so the complexity would be 2. Two nested single-condition IFs, or one IF with two conditions, would produce a complexity of 3.
5. Mathematically, the cyclomatic complexity of a structured program is defined with reference to the control flow graph of the program, a directed graph containing the basic blocks of the program, with an edge between two basic blocks if control may pass from the first to the second. The complexity M is then defined as:
M = E − N + 2P,
where
E = the number of edges of the graph.
N = the number of nodes of the graph.
P = the number of connected components.
6. An alternative formulation is to use a graph in which each exit point is connected back to the entry point. In this case, the graph is strongly connected, and the cyclomatic complexity of the program is equal to the cyclomatic number of its graph (also known as the first Betti number), which is defined as
M = E − N + P.
7. For the same program as above, represented with each exit point connected back to the entry point, the graph has 10 edges, 8 nodes, and 1 connected component, which also gives a cyclomatic complexity of 3 (10 - 8 + 1 = 3).
8. This may be seen as calculating the number of linearly independent cycles that exist in the graph, i.e. those cycles that do not contain other cycles within themselves. Note that because each exit point loops back to the entry point, there is at least one such cycle for each exit point.
9. For a single program (or subroutine or method), P is always equal to 1. So a simpler formula for a single subroutine (illustrated in the sketch after the applications list below) is
M = E − N + 2.
10. Applications
10.1 Limiting complexity during development
10.2 Measuring the "structuredness" of a program
10.3 Implications for software testing
10.4 Correlation to number of defects
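To make the counting of nodes and edges concrete, the following C sketch shows a function with one single-condition IF together with its hand-computed complexity, and a direct evaluation of the formula M = E - N + 2P for the 9-edge, 8-node graph discussed above. The example function and its node/edge counts are illustrative only.

#include <stdio.h>

/* Example function with one single-condition IF: its control flow graph has
 * 4 nodes (condition block, TRUE branch, FALSE branch, exit/return) and 4 edges,
 * so with P = 1:  M = E - N + 2P = 4 - 4 + 2 = 2,
 * matching the two paths (condition TRUE, condition FALSE). */
int classify(int x)
{
    if (x > 0)
        printf("positive\n");
    else
        printf("non-positive\n");
    return x;
}

/* Direct use of the formula M = E - N + 2P from the text. */
int cyclomatic_complexity(int edges, int nodes, int components)
{
    return edges - nodes + 2 * components;
}

int main(void)
{
    classify(5);
    /* Graph from the worked example in the text: 9 edges, 8 nodes, 1 component. */
    printf("M = %d\n", cyclomatic_complexity(9, 8, 1));   /* prints M = 3 */
    return 0;
}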
Q3. State and explain Halstead's metrics in detail?
A 3)
1. Halstead complexity measures are software metrics introduced by Maurice Howard Halstead in 1977 as part of his treatise on establishing an empirical science of software development.
2. Halstead made the observation that metrics of the software should reflect the implementation or expression of algorithms in different languages, but be independent of their execution on a specific platform. These metrics are therefore computed statically from the code.
3. Halstead's goal was to identify measurable properties of software, and the relations between them. This is similar to the identification of measurable properties of matter (like the volume, mass, and pressure of a gas) and the relationships between them (analogous to the gas equation). Thus his metrics are actually not just complexity metrics.
4. Halstead metrics are:
4.1 Program Volume (V)
The unit of measurement of volume is the standard unit for size "bits." It is the actual size of a program if a uniform binary encoding for the vocabulary is used.
V = N * log2(n)
4.2 Program Level (L)
The value of L ranges between zero and one, with L=1 representing a program written at the highest possible level (i.e., with minimum size).
L = V* / V
4.3 Program Difficulty
The difficulty level or error-proneness (D) of the program is proportional to the number of unique operators in the program.
D = (n1 / 2) * (N2 / n2)
4.4 Programming Effort (E)
The unit of measurement of E is elementary mental discriminations.
E = V / L = D * V
4.5 Estimated Program Length
According to Halstead, the first hypothesis of software science is that the length of a well-structured program is a function only of the number of unique operators and operands.
N = N1 + N2
where N1 is the total number of operators and N2 is the total number of operands.
The estimated program length is denoted by N^ and is given by:
N^ = n1 * log2(n1) + n2 * log2(n2)
The following alternate expressions have been published to estimate program length:
NJ = log2(n1!) + log2(n2!)
NB = n1 * log2(n2) + n2 * log2(n1)
NC = n1 * sqrt(n1) + n2 * sqrt(n2)
NS = (n * log2(n)) / 2
4.6 Potential Minimum Volume
The potential minimum volume V* is defined as the volume of the shortest program in which a problem can be coded.
V* = (2 + n2*) * log2(2 + n2*)
Here, n2* is the count of unique input and output parameters.
4.7 Size of Vocabulary (n)
The size of the vocabulary of a program, which consists of the number of unique tokens used to build a program, is defined as:
n = n1 + n2
where
n = vocabulary of a program
n1 = number of unique operators
n2 = number of unique operands
4.8 Language Level (lambda) - Indicates the level of the programming language used to implement the algorithm. The same algorithm demands additional effort if it is written in a low-level programming language. For example, it is easier to program in Pascal than in Assembler.
lambda = L * V* = L^2 * V
(equivalently, since E = V / L = D * V implies D = 1 / L, lambda = V / (D * D))
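As a worked illustration, the following C sketch plugs a set of purely illustrative counts (n1, n2, N1, N2 and n2*) into the formulas listed above; the counts in main() are not taken from any real program.

#include <stdio.h>
#include <math.h>

/* Halstead measures computed from the basic counts:
 * n1 = unique operators, n2 = unique operands,
 * N1 = total operators,  N2 = total operands,
 * n2s = n2* = unique input/output parameters. */
int main(void)
{
    double n1 = 10, n2 = 8, N1 = 30, N2 = 25, n2s = 3;   /* illustrative counts only */

    double n  = n1 + n2;                                 /* vocabulary                */
    double N  = N1 + N2;                                 /* program length            */
    double V  = N * log2(n);                             /* volume (bits)             */
    double D  = (n1 / 2.0) * (N2 / n2);                  /* difficulty                */
    double E  = D * V;                                   /* effort E = D * V          */
    double Nh = n1 * log2(n1) + n2 * log2(n2);           /* estimated length N^       */
    double Vs = (2.0 + n2s) * log2(2.0 + n2s);           /* potential min volume V*   */
    double L  = Vs / V;                                  /* program level L = V*/V    */
    double lambda = L * L * V;                           /* language level            */

    printf("n=%.0f N=%.0f V=%.2f D=%.2f E=%.2f N^=%.2f V*=%.2f L=%.4f lambda=%.4f\n",
           n, N, V, D, E, Nh, Vs, L, lambda);
    return 0;
}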
Q4. Write a short note on function points in software engineering?
A 4)
1. The function point is a "unit of measurement" to express the amount of business functionality an information system (as a product) provides to a user. Function points are used to compute a functional size measurement (FSM) of software. The cost (in dollars or hours) of a single unit is calculated from past projects.
2. Function points were defined in 1979 in Measuring Application Development Productivity by Allan Albrecht at IBM.
3. The functional user requirements of the software are identified and each one is categorized into one of five types: outputs, inquiries, inputs, internal files, and external interfaces.
4. Once the function is identified and categorized into a type, it is then assessed for complexity and assigned a number of function points. Each of these functional user requirements maps to an end-user business function, such as a data entry for an Input or a user query for an Inquiry.
5. This distinction is important because it tends to make the functions measured in function points map easily into user-oriented requirements, but it also tends to hide internal functions (e.g. algorithms), which also require resources to implement.
6. There is currently no ISO-recognized FSM method that includes algorithmic complexity in the sizing result. Recently, different approaches have been proposed to deal with this perceived weakness, implemented in several commercial software products.
Q5. State and explain different variations in function points in software engineering?
A 5)
The variations of the Albrecht-based IFPUG method, designed to make up for the weakness of hiding internal algorithmic functions (and other weaknesses), include:
1. Early and easy function points – Adjusts for problem and data complexity with two questions that yield a somewhat subjective complexity measurement; simplifies measurement by eliminating the need to count data elements.
2. Engineering function points – Elements (variable names) and operators (e.g., arithmetic, equality/inequality, Boolean) are counted. This variation highlights computational function. The intent is similar to that of the operator/operand-based Halstead complexity measures.
3. Bang measure – Defines a function metric based on twelve primitive (simple) counts that affect or show Bang, defined as "the measure of true function to be delivered as perceived by the user." Bang measure may be helpful in evaluating a software unit's value in terms of how much useful function it provides, although there is little evidence in the literature of such application. The use of Bang measure could apply when re-engineering (either complete or piecewise) is being considered, as discussed in Maintenance of Operational Systems—An Overview.
4. Feature points – Adds changes to improve applicability to systems with significant internal processing (e.g., operating systems, communications systems). This allows accounting for functions not readily perceivable by the user, but essential for proper operation.
5. Weighted Micro Function Points – One of the newer models (2009) which adjusts function points using weights derived from program flow complexity, operand and operator vocabulary, object usage, and algorithm.
6. Fuzzy Function Points – Proposes a fuzzy and gradual transition between low and medium, and between medium and high, complexities.
Q6. Write a short note on feature points and state its calculation table?
A 6)
Feature Points
1. Feature points are a superset of the function point measure that can be applied to systems and engineering software applications.
2. Feature points are used in applications in which the algorithmic complexity is high, such as real-time systems with timing constraints, embedded systems, etc.
3. Feature points are computed by counting the information domain values, and each value is weighted by only a single weight.
4. Feature points include an additional measurement parameter: ALGORITHMS.
The table for the computation of feature points is as follows:
Feature Point Calculations
Measurement parameter                      Count   Weighting factor
1. Number of external inputs (EI)            -        * 4 = -
2. Number of external outputs (EO)           -        * 5 = -
3. Number of external inquiries (EQ)         -        * 4 = -
4. Number of internal files (ILF)            -        * 7 = -
5. Number of external interfaces (EIF)       -        * 7 = -
6. Algorithms used                           -        * 3 = -
Count total →                                              -
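As a small illustration, the following C sketch multiplies each information domain count by the weight from the table above and sums the products to obtain the count total; the counts themselves are purely hypothetical.

#include <stdio.h>

/* Unadjusted feature point count: each information domain count is multiplied
 * by the weight from the table above and the products are summed.
 * The counts used here are purely illustrative. */
int main(void)
{
    int ei = 12, eo = 8, eq = 5, ilf = 4, eif = 2, algorithms = 6;  /* example counts */

    int count_total = ei  * 4          /* external inputs      */
                    + eo  * 5          /* external outputs     */
                    + eq  * 4          /* external inquiries   */
                    + ilf * 7          /* internal files       */
                    + eif * 7          /* external interfaces  */
                    + algorithms * 3;  /* algorithms           */

    printf("feature point count total = %d\n", count_total);
    return 0;
}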
Q7. State and explain metrics for object-oriented software in detail?
A 7)
1 McCabe Cyclomatic Complexity (CC) – This complexity metric is probably the most popular one and conveys essential information about the stability and maintainability of a software system, computed from its source code. It gives insight into the complexity of a method.
2 Weighted Method per Class (WMC) – This metric indicates the complexity of a class. One way of calculating the complexity of a class is by using cyclomatic complexities of its methods. One should aim for a class with lower value of WMC as a higher value indicates that the class is more complex.
3 Depth of Inheritance Tree (DIT) – DIT measures the maximum path length from a node to the root of the inheritance tree. This metric indicates how far down a class is declared in the inheritance hierarchy.
4 Number of Children (NOC) – This metric indicates how many sub-classes are going to inherit the methods of the parent class. For example, if class C2 has two subclasses, C21 and C22, its NOC is 2. The value of NOC indicates the level of reuse in an application: if NOC increases, reuse increases.
5 Coupling between Objects (CBO) – The rationale behind this metric is that an object is coupled to another object if the two objects act upon each other. If a class uses the methods of other classes, then the two are coupled. An increase in CBO indicates an increase in the responsibilities of a class. Hence, the CBO value for classes should be kept as low as possible.
6 Lack of Cohesion in Methods (LCOM) – LCOM can be used to measure the degree of cohesiveness present. It reflects how well a system is designed and how complex a class is. LCOM is calculated as the number of method pairs whose similarity is zero minus the count of method pairs whose similarity is not zero (a sketch of this computation appears after this list).
7 Method Hiding Factor (MHF) – MHF is defined as the ratio of sum of the invisibilities of all methods defined in all classes to the total number of methods defined in the system. The invisibility of a method is the percentage of the total classes from which this method is not visible.
8 Attribute Hiding Factor (AHF) – AHF is calculated as the ratio of the sum of the invisibilities of all attributes defined in all classes to the total number of attributes defined in the system.
9 Method Inheritance Factor (MIF) – MIF measures the ratio of the sum of the inherited methods in all classes of the system to the total number of available method for all classes.
10 Attribute Inheritance Factor (AIF) – AIF measures the ratio of sum of inherited attributes in all classes of the system under consideration to the total number of available attributes.
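As an illustration of how one of these metrics can be computed, here is a minimal C sketch of the LCOM calculation described in item 6, with each method's attribute usage represented as a bitmask. The class layout in main() is hypothetical, and the sketch follows the common convention of clamping the result at zero.

#include <stdio.h>

/* LCOM sketch: each method of a class is represented by a bitmask of the
 * instance attributes it uses. Two methods are "similar" if they share at
 * least one attribute. LCOM = P - Q (clamped at 0), where
 * P = method pairs with no shared attribute, Q = pairs with a shared one. */
int lcom(const unsigned attrs_used[], int num_methods)
{
    int p = 0, q = 0;
    for (int i = 0; i < num_methods; i++)
        for (int j = i + 1; j < num_methods; j++) {
            if (attrs_used[i] & attrs_used[j])
                q++;            /* similarity is not zero */
            else
                p++;            /* similarity is zero     */
        }
    return (p > q) ? p - q : 0;
}

int main(void)
{
    /* Hypothetical class with 4 methods and 3 attributes (bits 0..2):
     * m0 uses attribute 0, m1 uses attributes 0 and 1,
     * m2 uses attribute 2,  m3 uses no attributes at all. */
    unsigned methods[] = { 0x1, 0x3, 0x4, 0x0 };
    printf("LCOM = %d\n", lcom(methods, 4));   /* P = 5, Q = 1, LCOM = 4 */
    return 0;
}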
Q8. Explain what is Fault tolerance in detail?
A 8)
1. Software fault tolerance is the ability of software to detect and recover from a fault that is happening or has already happened, in either the software or the hardware of the system in which the software is running, so that service can still be provided in accordance with the specification.
2. Software fault tolerance is a necessary component in order to construct the next generation of highly available and reliable computing systems from embedded systems to data warehouse systems.
3. Software fault tolerance is not a solution unto itself however, and it is important to realize that software fault tolerance is just one piece necessary to create the next generation of systems.
4. In order to adequately understand software fault tolerance it is important to understand the nature of the problem that software fault tolerance is supposed to solve. Software faults are all design faults.
5. Software manufacturing, the reproduction of software, is considered to be perfect. The fact that the source of the problem is solely design faults makes software very different from almost any other system in which fault tolerance is a desired property.
6. This inherent issue, that software faults are the result of human error in interpreting a specification or correctly implementing an algorithm, creates issues which must be dealt with in the fundamental approach to software fault tolerance.
7. Fault tolerance is defined as how to provide, by redundancy, service complying with the specification in spite of faults having occurred or occurring. (Laprie 1996).
8. There are some important concepts buried within the text of this definition that should be examined.
9. Primarily, Laprie argues that fault tolerance is accomplished using redundancy. This argument is good for errors which are not caused by design faults; however, replicating a design fault in multiple places will not aid in complying with a specification.
10. It is also important to note the emphasis placed on the specification as the final arbiter of what is an error and what is not.
11. Design diversity increases pressure on the specification creators to make multiple variants of the same specification which are equivalent, in order to aid the programmer in creating variations in algorithms for the necessary redundancy.
12. The definition itself may no longer be appropriate for the type of problems that current fault tolerance is trying to solve, both hardware and software.
Q9. State the initial process in Cost estimation using COCOMO method?
A 9)
The necessary steps in this model are:
1. Get an initial estimate of the development effort from an evaluation of thousands of delivered lines of source code (KDLOC).
2. Determine a set of 15 multiplying factors from various attributes of the project.
3. Calculate the effort estimate by multiplying the initial estimate by all of the multiplying factors, i.e., multiply the values obtained in steps 1 and 2.
The initial estimate (also called the nominal estimate) is determined by an equation of the form used in the static single-variable models, using KDLOC as the measure of size. To determine the initial effort Ei in person-months, an equation of the following type is used:
Ei = a * (KDLOC)^b
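A minimal C sketch of these three steps is shown below. The constants a and b and the 15 multiplier values are illustrative placeholders only, since the real values depend on the project type and the attribute ratings.

#include <stdio.h>
#include <math.h>

/* Cost estimation sketch following the steps above:
 * 1. nominal effort Ei = a * (KDLOC)^b
 * 2. determine the 15 multiplying factors
 * 3. multiply the nominal effort by their product.
 * All constants and multiplier values below are illustrative only. */
int main(void)
{
    double a = 3.0, b = 1.12;           /* example constants                     */
    double kdloc = 32.0;                /* estimated size in KDLOC               */

    /* 15 multiplying factors; 1.0 means "nominal" for that attribute. */
    double factors[15] = { 1.15, 1.00, 0.88, 1.00, 1.06, 1.00, 1.00, 0.94,
                           1.00, 1.00, 1.10, 1.00, 0.91, 1.00, 1.00 };

    double ei = a * pow(kdloc, b);      /* step 1: initial (nominal) estimate    */
    double product = 1.0;
    for (int i = 0; i < 15; i++)        /* step 2: combine the multipliers       */
        product *= factors[i];

    double effort = ei * product;       /* step 3: adjusted effort estimate      */
    printf("nominal = %.1f PM, multiplier product = %.2f, adjusted effort = %.1f PM\n",
           ei, product, effort);
    return 0;
}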
Q10. State the types used to categorize values of constants in COCOMO method?
A 10)
The values of the constants a and b depend on the project type.
In COCOMO, projects are categorized into three types:
1. Organic
2. Semidetached
3. Embedded
1. Organic: A development project can be treated as organic if it deals with developing a well-understood application program, the size of the development team is reasonably small, and the team members are experienced in developing similar kinds of projects. Examples of this type of project are simple business systems, simple inventory management systems, and data processing systems.
2. Semidetached: A development project can be treated as semidetached if the development team consists of a mixture of experienced and inexperienced staff. Team members may have limited experience with related systems but may be unfamiliar with some aspects of the system being developed. Examples of semidetached systems include a new operating system (OS), a database management system (DBMS), and a complex inventory management system.
3. Embedded: A development project is treated as embedded if the software being developed is strongly coupled to complex hardware, or if stringent regulations on the operational procedures exist. Examples: ATM software, air traffic control.
Q11. State the stages for software cost estimation according to Boehm's method?
A 11)
According to Boehm, software cost estimation should be done through three stages:
1. Basic Model
2. Intermediate Model
3. Detailed Model
1. Basic COCOMO Model:
The basic COCOMO model gives an approximate estimate of the project parameters. The following expressions give the basic COCOMO estimation model:
Effort = a1 * (KLOC)^a2 PM
Tdev = b1 * (Effort)^b2 Months
Where
KLOC is the estimated size of the software product expressed in kilo lines of code,
a1, a2, b1, b2 are constants for each category of software products,
Tdev is the estimated time to develop the software, expressed in months,
Effort is the total effort required to develop the software product, expressed in person months (PMs).
2. Estimation of development effort
For the three classes of software products, the formulas for estimating the effort based on the code size are shown below:
Organic: Effort = 2.4 * (KLOC)^1.05 PM
Semi-detached: Effort = 3.0 * (KLOC)^1.12 PM
Embedded: Effort = 3.6 * (KLOC)^1.20 PM
3. Estimation of development time
For the three classes of software products, the formulas for estimating the development time based on the effort are given below:
Organic: Tdev = 2.5 * (Effort)^0.38 Months
Semi-detached: Tdev = 2.5 * (Effort)^0.35 Months
Embedded: Tdev = 2.5 * (Effort)^0.32 Months
Some insight into the basic COCOMO model can be obtained by plotting the estimated characteristics for different software sizes. A plot of estimated effort versus product size shows that the effort is somewhat superlinear in the size of the software product. Thus, the effort required to develop a product increases very rapidly with project size.
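The following C sketch simply evaluates the basic COCOMO effort and development-time formulas given above for the three project classes; the 50 KLOC product size used in main() is only an example.

#include <stdio.h>
#include <math.h>

/* Basic COCOMO: Effort = a1 * (KLOC)^a2 (person-months),
 *               Tdev   = b1 * (Effort)^b2 (months),
 * using the constants for the three project classes given above. */
struct cocomo_class { const char *name; double a1, a2, b1, b2; };

int main(void)
{
    struct cocomo_class classes[] = {
        { "Organic",       2.4, 1.05, 2.5, 0.38 },
        { "Semi-detached", 3.0, 1.12, 2.5, 0.35 },
        { "Embedded",      3.6, 1.20, 2.5, 0.32 },
    };
    double kloc = 50.0;   /* example product size in KLOC */

    for (int i = 0; i < 3; i++) {
        double effort = classes[i].a1 * pow(kloc, classes[i].a2);
        double tdev   = classes[i].b1 * pow(effort, classes[i].b2);
        printf("%-14s effort = %6.1f PM, Tdev = %5.1f months\n",
               classes[i].name, effort, tdev);
    }
    return 0;
}

Running this for a range of KLOC values reproduces the superlinear growth of effort with product size described above.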