Unit - 2
Trendiness and Regression Analysis
Data Modelling is the process of analyzing data objects and their relationships to other objects. It is used to analyze the data requirements of business processes. Data models are created for data that will be stored in a database. The main focus of a data model is on what data is needed and how it should be organized, rather than on what operations will be performed on it.
A data model is comparable to an architect's building plan. Data modelling documents a complex software system design as a diagram that can be easily understood. The diagram uses text and symbols to represent how the data will flow. It is also known as the blueprint for constructing new software or re-engineering an existing application.
Types of data model
1. Conceptual Model
The conceptual data model is a view of the data that is required to help business processes. It also keeps track of business events and keeps related performance measures. The conceptual model defines what the system contains. This type of Data Modelling focuses on finding the data used in a business rather than processing flow. The main purpose of this data model is to organize, define business rules and concepts. For example, it helps business people to view any data like market data, customer data, and purchase data.
2. Logical Model
In the logical data model, the map of rules and data structures includes the data required, such as tables, columns, etc. Data architects and business analysts create the logical model. The logical model can be transformed into a database. This type of Data Modelling is always present in the root package object. This data model forms the base for the physical model. In this model, no primary or secondary key is defined.
3. Physical Data Model
In a physical data model, the implementation is described using a specific database system. It defines all the components and services that are required to build a database. It is created by using the database language and queries. The physical data model represents each table, column, constraints like primary key, foreign key, NOT NULL, etc. The main work of the physical data model is to create a database. This model is created by the Database Administrator (DBA) and developers. This type of Data Modelling gives us the abstraction of the databases and helps to create the schema. This model describes the particular implementation of the data model. The physical data model helps us to have database column keys, constraints, and RDBMS features.
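The constraints named above (primary key, foreign key, NOT NULL) can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed design: the customer and purchase tables, their columns, and the helper functions are hypothetical names chosen for the example, and SQLite stands in for whichever database system is actually used.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE purchase (
    purchase_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    amount      REAL NOT NULL
);
"""

def build_database():
    """Create an in-memory database that implements the schema above."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def try_insert(conn, sql, params):
    """Return True if the row satisfies every constraint, False otherwise."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_database()
ok_customer = try_insert(conn, "INSERT INTO customer VALUES (?, ?)", (1, "Asha"))
ok_purchase = try_insert(conn, "INSERT INTO purchase VALUES (?, ?, ?)", (10, 1, 250.0))
# Violates NOT NULL on customer.name.
bad_null = try_insert(conn, "INSERT INTO customer VALUES (?, ?)", (2, None))
# Violates the foreign key: customer 99 does not exist.
bad_fk = try_insert(conn, "INSERT INTO purchase VALUES (?, ?, ?)", (11, 99, 40.0))
print(ok_customer, ok_purchase, bad_null, bad_fk)
```

The two valid rows are accepted and the two rows that break a constraint are rejected, which is the kind of rule a physical model records and a logical model leaves open.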
Data modeling techniques
1. Hierarchical Technique
The hierarchical model is a tree-like structure. There is one root node (the parent node), and the child nodes are sorted in a particular order. The hierarchical model is very rarely used now, although it can be used to model real-world relationships.
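A tree with one root and ordered children can be sketched in a few lines of Python. The Node class, the paths helper, and the organisation names below are assumptions made for illustration; they are not part of any particular database product.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One record in the hierarchy; each node has exactly one parent."""
    name: str
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it so calls can be chained."""
        self.children.append(child)
        return child

def paths(node, prefix=()):
    """Yield the root-to-node path of every node, parents before children."""
    here = prefix + (node.name,)
    yield "/".join(here)
    for child in node.children:
        yield from paths(child, here)

# Hypothetical organisation chart used only for illustration.
root = Node("Company")
sales = root.add(Node("Sales"))
root.add(Node("Finance"))
sales.add(Node("Retail"))

all_paths = list(paths(root))
print(all_paths)
```

Every record is reached by exactly one path from the root, which is what distinguishes this model from the network technique described later.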
2. Object-oriented Model
The object-oriented approach creates objects that contain stored values. The object-oriented model lets these objects communicate while supporting data abstraction, inheritance, and encapsulation.
3. Network Technique
The network model provides us with a flexible way of representing objects and relationships between these entities. It has a feature known as a schema representing the data in the form of a graph. An object is represented inside a node and the relation between them as an edge, enabling them to maintain multiple parent and child records in a generalized manner.
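The graph idea can be sketched as follows: records are nodes, relationships are edges, and a record may have more than one parent. The Network class and the department and course names are hypothetical, chosen only to show a child with two parents.

```python
from collections import defaultdict

class Network:
    """Records as nodes, relationships as directed edges (parent -> child)."""

    def __init__(self):
        self.children = defaultdict(set)
        self.parents = defaultdict(set)

    def link(self, parent, child):
        """Record one parent-child relationship in both directions."""
        self.children[parent].add(child)
        self.parents[child].add(parent)

# Hypothetical records: one course belongs to two departments.
net = Network()
net.link("Statistics Dept", "Regression Course")
net.link("Business Dept", "Regression Course")
net.link("Business Dept", "Marketing Course")

course_parents = sorted(net.parents["Regression Course"])
business_children = sorted(net.children["Business Dept"])
print(course_parents)
print(business_children)
```

Keeping both a parents map and a children map is one simple way to navigate the graph in either direction; other representations (for example, a single edge list) would work as well.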
4. Entity-relationship Model
The ER model (Entity-relationship model) is a high-level conceptual model used to define the data elements and relationships for the entities in a system. This conceptual design provides a clearer view of the data that is easy to understand. In this model, the entire database is represented in a diagram called an entity-relationship diagram, consisting of Entities, Attributes, and Relationships.
5. Relational Technique
The relational technique is used to describe the different relationships between entities. These relationships can be one to one, one to many, many to one, or many to many.
Simple linear regression
Simple linear regression is a model that assesses the relationship between a dependent variable and an independent variable.
Y = a + bX + ϵ
Where:
Y – Dependent variable
X – Independent (explanatory) variable
a – Intercept
b – Slope
ϵ – Residual (error)
With the help of the simple linear regression model we have the following two regression lines, where X̄ and Ȳ are the means of X and Y, and byx and bxy are the regression coefficients of Y on X and of X on Y:
1. Regression line of Y on X: This line gives the probable value of Y (dependent variable) for any given value of X (independent variable).
Regression line of Y on X: Y – Ȳ = byx (X – X̄)
OR: Y = a + bX
2. Regression line of X on Y: This line gives the probable value of X (dependent variable) for any given value of Y (independent variable).
Regression line of X on Y: X – X̄ = bxy (Y – Ȳ)
OR: X = a + bY
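The two regression lines can be computed side by side. The sketch below assumes the usual least-squares definitions, which the notes do not spell out: byx = Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)² and bxy = Σ(X – X̄)(Y – Ȳ) / Σ(Y – Ȳ)². The function names and the five data points are made up for illustration.

```python
def regression_lines(xs, ys):
    """Return the means of X and Y and both regression coefficients."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b_yx = sxy / sxx  # slope of the line of Y on X
    b_xy = sxy / syy  # slope of the line of X on Y
    return x_bar, y_bar, b_yx, b_xy

def predict_y(x, x_bar, y_bar, b_yx):
    """Line of Y on X: Y - y_bar = b_yx * (X - x_bar)."""
    return y_bar + b_yx * (x - x_bar)

def predict_x(y, x_bar, y_bar, b_xy):
    """Line of X on Y: X - x_bar = b_xy * (Y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)

# Illustrative data, not taken from the worked examples in these notes.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
x_bar, y_bar, b_yx, b_xy = regression_lines(xs, ys)
print(x_bar, y_bar, b_yx, b_xy)
print(predict_y(4, x_bar, y_bar, b_yx), predict_x(5, x_bar, y_bar, b_xy))
```

Both lines pass through the point (X̄, Ȳ), but their slopes generally differ, so the line of X on Y is not simply the line of Y on X solved for X.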
Multiple linear regression
Multiple linear regression analysis is essentially similar to the simple linear model, with the exception that multiple independent variables are used in the model.
Y = a + bX1 + cX2 + dX3 + ϵ
Where:
Y – Dependent variable
X1, X2, X3 – Independent (explanatory) variables
a – Intercept
b, c, d – Slopes
ϵ – Residual (error)
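A multiple regression can be fitted by solving the normal equations (XᵀX)β = XᵀY. The sketch below does this in plain Python with Gauss-Jordan elimination; it is one possible approach rather than the method these notes prescribe, and in practice a numerical library would normally be used. The function name fit_multiple and the data are assumptions: the data use two predictors and are generated exactly from Y = 1 + 2·X1 + 3·X2 with no error term, so the expected coefficients are known.

```python
def fit_multiple(rows, ys):
    """Least-squares fit of Y = a + b*X1 + c*X2 + ...; returns [a, b, c, ...]."""
    # Design matrix with a leading 1 in each row for the intercept a.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Augmented matrix of the normal equations: [X^T X | X^T y].
    aug = [
        [sum(r[i] * r[j] for r in design) for j in range(k)]
        + [sum(r[i] * y for r, y in zip(design, ys))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]

# Illustrative data built from Y = 1 + 2*X1 + 3*X2 (no residual).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [9, 8, 19, 18, 26]
a, b, c = fit_multiple(rows, ys)
print(round(a, 6), round(b, 6), round(c, 6))
```

With real data the residual ϵ is not zero, so the fitted coefficients only approximate the underlying relationship.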
Example
How to find a linear regression equation
Subject | X | Y |
1 | 43 | 99 |
2 | 21 | 65 |
3 | 25 | 79 |
4 | 42 | 75 |
5 | 57 | 87 |
6 | 59 | 81 |
Solution
Subject | X | Y | XY | X² | Y² |
1 | 43 | 99 | 4257 | 1849 | 9801 |
2 | 21 | 65 | 1365 | 441 | 4225 |
3 | 25 | 79 | 1975 | 625 | 6241 |
4 | 42 | 75 | 3150 | 1764 | 5625 |
5 | 57 | 87 | 4959 | 3249 | 7569 |
6 | 59 | 81 | 4779 | 3481 | 6561 |
Total | 247 | 486 | 20485 | 11409 | 40022 |
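To find a and b, use the following equations, where n is the number of observations (here n = 6):
a = ((ΣY)(ΣX²) – (ΣX)(ΣXY)) / (n(ΣX²) – (ΣX)²)
b = (n(ΣXY) – (ΣX)(ΣY)) / (n(ΣX²) – (ΣX)²)
Find a:
((486 × 11,409) – (247 × 20,485)) / (6(11,409) – 247²)
= (5,544,774 – 5,059,795) / (68,454 – 61,009)
= 484,979 / 7,445
= 65.14
Find b:
(6(20,485) – (247 × 486)) / (6(11,409) – 247²)
= (122,910 – 120,042) / (68,454 – 61,009)
= 2,868 / 7,445
= 0.385225
y’ = a + bx
y’ = 65.14 + 0.385225x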
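The hand calculation above can be checked with a short script. This is a minimal sketch that applies the same sum-based formulas for a and b to the six (X, Y) pairs of the example; the function name fit_from_sums is an assumption.

```python
def fit_from_sums(xs, ys):
    """Return (a, b) for y' = a + b*x using the column totals of the table."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b

# X and Y columns from the worked example.
xs = [43, 21, 25, 42, 57, 59]
ys = [99, 65, 79, 75, 87, 81]
a, b = fit_from_sums(xs, ys)
print(round(a, 2), round(b, 6))  # 65.14 0.385225
```

The script reproduces the intercept and slope obtained by hand, which is a quick way to catch arithmetic slips in the totals row.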
Example
Calculate linear regression analysis
Students | X | Y |
1 | 95 | 85 |
2 | 85 | 95 |
3 | 80 | 70 |
4 | 70 | 65 |
5 | 60 | 70 |
Solution
Students | X | Y | X² | Y² | XY |
1 | 95 | 85 | 9025 | 7225 | 8075 |
2 | 85 | 95 | 7225 | 9025 | 8075 |
3 | 80 | 70 | 6400 | 4900 | 5600 |
4 | 70 | 65 | 4900 | 4225 | 4550 |
5 | 60 | 70 | 3600 | 4900 | 4200 |
Total | 390 | 385 | 31150 | 30275 | 30500 |
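To find a and b, use the following equations, where n is the number of observations (here n = 5):
a = ((ΣY)(ΣX²) – (ΣX)(ΣXY)) / (n(ΣX²) – (ΣX)²)
b = (n(ΣXY) – (ΣX)(ΣY)) / (n(ΣX²) – (ΣX)²)
Find a:
((385 × 31,150) – (390 × 30,500)) / (5(31,150) – 390²)
= (11,992,750 – 11,895,000) / (155,750 – 152,100)
= 97,750 / 3,650
= 26.78
Find b:
(5(30,500) – (390 × 385)) / (5(31,150) – 390²)
= (152,500 – 150,150) / (155,750 – 152,100)
= 2,350 / 3,650
= 0.64
y’ = a + bx
y’ = 26.78 + 0.64x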
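The same result can be reached through the mean-deviation form of the regression line, Y – Ȳ = byx (X – X̄), which gives b = Σ(X – X̄)(Y – Ȳ) / Σ(X – X̄)² and a = Ȳ – bX̄. The sketch below applies it to the five students of this example; the function name fit_line is an assumption, and the residual check is included only as a sanity test.

```python
def fit_line(xs, ys):
    """Return (a, b) for y' = a + b*x using deviations from the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar  # the fitted line passes through (x_bar, y_bar)
    return a, b

# X and Y columns from the worked example.
xs = [95, 85, 80, 70, 60]
ys = [85, 95, 70, 65, 70]
a, b = fit_line(xs, ys)

# Residuals: observed Y minus the value predicted by the fitted line.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(round(a, 2), round(b, 2))  # 26.78 0.64
```

For a least-squares line with an intercept, the residuals sum to zero (up to rounding), so a noticeably non-zero sum would signal a calculation error.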
Key takeaways –
- Data Modelling is the process of analyzing the data objects and their relationship to the other objects.
- Simple linear regression is a model that assesses the relationship between a dependent variable and an independent variable.
Important resources
Companies are using business analytics (BA) to make data-driven decisions that help them to automate and optimize their business processes. Business analytics–which is defined as “the study of data through statistical and operations analysis, the formation of predictive models, application of optimization techniques, and the communication of these results to customers, business partners, and college executives”–requires quantitative methods and evidence-based data for business modeling and decision making. Companies that implement business analytics best practices gain a competitive advantage, as they are able to use the insights gained through BA to conduct data mining, complete statistical analysis and quantitative analysis to explain why certain results occur, test previous decisions using A/B and multivariate testing, and make use of predictive modeling and predictive analytics to forecast future results.
Business analytics personnel
Business analytics professionals focus on data, statistical analysis and reporting to help investigate and analyze business performance, provide insights, and drive recommendations to improve performance.
They may also work with internal or external clients, but their focus is to improve the product, marketing or customer experience by using insights from data, rather than analyzing processes and functions.
A great business analytics professional could be described as:
A good communicator
Being able to present findings in a clear and concise manner is fundamental to making sure that all players understand insights and can put recommendations into practice. People working in analysis must be able to tell a story with data through strong writing and presentation skills.
Inquisitive
People in this field should have natural curiosity and drive to continue learning and figuring out how things fit together. Even as analysts become managers, it’s important to stay in touch with the industry and its changes.
A problem solver
Professionals in analytics use a combination of logical thinking, predictive analytics and statistics to make recommendations that will solve problems and propel a business forward. In a profession that seeks to turn data into solutions, being a natural problem solver helps connect the dots.
A critical thinker
Business analytics professionals need to think critically about not only the implications of the data they collect, but about what data they should be collecting in the first place. They are expected to analyze and highlight only the data that can be helpful in making decisions.
A visualizer
Disorganized data doesn’t help anyone. To create worth from data, analytics professionals need to be able to translate and visualize data in a concise and accurate way that’s easy to digest.
Both detail-oriented and a big picture thinker
While business analytics professionals have to be able to handle complex data, they also need to understand how their recommendations will affect the bottom line of a business. There’s no point in having access to large quantities of information without knowing how it can be harnessed to analyze and improve tactics, processes and strategies.
Below are some of the top tools for business analytics professionals:
SQL
SQL is the coding language of databases and one of the most important tools in an analytics professional’s toolkit. Professionals write SQL queries to extract and analyze data from the transactions database and develop visualizations to present to stakeholders.
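As an illustration of the kind of query described above, the sketch below runs an aggregate SQL query through Python's built-in sqlite3 module. The transactions table, its columns, and the sample rows are hypothetical; a real transactions database would have its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical transactions table used only for this example.
conn.execute(
    "CREATE TABLE transactions ("
    "id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO transactions (region, amount) VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 60.0),
     ("South", 40.0), ("East", 200.0)],
)

# Count orders and total revenue per region, highest revenue first.
query = """
SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
FROM transactions
GROUP BY region
ORDER BY revenue DESC, region
"""
summary = conn.execute(query).fetchall()
for region, orders, revenue in summary:
    print(region, orders, revenue)
```

A summary table like this is typically the input to a chart or report for stakeholders.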
Statistical languages
The two most common programming languages in analytics are R, for statistical analysis, and Python, for general programming. Knowledge in either of these languages can be beneficial when analyzing big data sets, but is not vital.
Statistical software
While the ability to program is helpful for a career in analytics, being able to write code isn’t necessarily required to work as an analytics professional. Apart from the above languages, statistical software such as SPSS, SAS, Sage, Mathematica, and even Excel can be used when managing and analyzing data.
Data and models of business analytics
Data analysis is a technique to gain insight into an organisation’s data. A data analyst might have the following responsibilities:
- To create and analyse important reports (possibly using a third-party reporting, data warehousing, or business intelligence system) to help the business make better decisions.
- To merge data from multiple data sources together, as part of data mining, so it can be analysed and reported on.
- To run queries on existing data sources to evaluate analytics and analyse trends.
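Merging data from multiple sources usually means matching records on a shared key. The sketch below joins two hypothetical sources (a customer list and a list of invoices) on customer_id in plain Python; the field names, the sample records, and the merge_sources helper are all assumptions made for illustration.

```python
# Two hypothetical data sources that share the key "customer_id".
crm = [
    {"customer_id": 1, "name": "Asha"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Chen"},
]
billing = [
    {"customer_id": 1, "amount": 250.0},
    {"customer_id": 1, "amount": 100.0},
    {"customer_id": 3, "amount": 75.0},
]

def merge_sources(customers, invoices):
    """Attach each customer's invoice total; customers with no invoices get 0.0."""
    totals = {}
    for invoice in invoices:
        key = invoice["customer_id"]
        totals[key] = totals.get(key, 0.0) + invoice["amount"]
    return [
        {**customer, "total_billed": totals.get(customer["customer_id"], 0.0)}
        for customer in customers
    ]

merged = merge_sources(crm, billing)
for row in merged:
    print(row)
```

Keeping customers that have no invoices (a left join) is a design choice; dropping them instead would hide the fact that some customers have not been billed.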
Data analysts will have hands-on access to the organisation’s data repositories and use their technical skills to query and manipulate the data. They may also be skilled in statistical analysis, with a high level of mathematical experience.
Data modeling is a set of tools and techniques used to understand and analyse how an organisation should collect, update, and store data. It is a critical skill for the business analyst who is involved with discovering, analysing, and specifying changes to how software systems create and maintain information.
- They create an entity relationship diagram to visualise relationships between key business concepts.
- They create a conceptual-level data dictionary to communicate data requirements that are important to business stakeholders.
- They create a data map to resolve potential data issues for a data migration or integration project.
A data modeller would not necessarily query or manipulate data or become involved in designing or implementing databases or data repositories.
Data Modeling sometimes needs Data Analysis
BAs often need to analyse data as part of making data modeling decisions, and this means that data modeling can include some amount of data analysis. A lot can be accomplished with very basic technical skills, such as the ability to run simple database queries. This is why you may see a technical skill like SQL in a business analyst job description.
Many BAs succeed without these more technical skills. Instead, they rely on their ability to collaborate with technical professionals and other knowledgeable stakeholders to ensure the data is understood well enough to make the right modelling decisions.
The non-technical BA can also evaluate sample data, interview stakeholders to discover possible data-related issues, review current state database models, and analyse exception reports.
While data analysis skills are valuable for the business analyst, they are not essential. However, data modelling falls squarely within the business analyst’s domain.
Problem solving
As business analysis professionals and change agents, one of our most important skills is problem solving. Problems present an opportunity to bring value to our customers and organization. Without the crucial skill of problem solving, we’re limited in our contribution to the organization as well as in our career growth. Many people solve problems intuitively, simply by “thinking”, without pausing to consider what the underlying process and mechanism for problem solving is. Problem solving is a discipline: a science of applying logical and analytical techniques to identify the underlying cause and recommend solutions that address the root cause.
Matt’s Recommended 6 Stage Problem Solving Approach
The problem solving approach that Matt uses is a simple six stage process. The stages do not need to be completed sequentially; individual stages may repeat and be completed in iterations. The stages consist of:
- Defining the problem statement
- Defining scope
- Eliciting information & resolving ambiguity
- Identifying associations and relationships
- Root cause analysis
- Solution proposal
The Problem Solving Process
Start by creating the problem statement. The problem statement is a well-defined statement or question that frames the context. After you have a clear and unambiguous problem statement, define the scope of the effort. The scope definition is probably the most important stage, since it basically determines whether or not the problem can be solved satisfactorily. Scope is defined to apply constraints to the domain of consideration. When we have scope, we know what to consider and what not to consider. Therefore, all possible solutions are directly dependent on the information within the scope.
Once the scope is defined, you can move on to eliciting information and resolving ambiguity. Perform a stakeholder analysis and elicit information from all known stakeholders and sources as a basis for investigation. You can use workshops, focus groups, interviews, document analysis, and other approaches to elicit information. When we elicit information, we try to remove ambiguity, as ambiguity represents the unknown, liability, and risk. To reduce ambiguity, we need to consider the taxonomy of ambiguity, which provides a frame of reference for how we will resolve it. Ambiguity may be:
- Missing information
- Incorrect information
- Duplicate information
- Conflicting information
- Incomplete information
The above provide a basis for asking questions about all information that is within scope, and for challenging whether this information is reliable and suitable for use. Context diagrams and domain diagrams can help resolve ambiguity.
Next, we identify associations and relationships to organize the information so we can derive meaning from it. Information needs to be structured, aligned, and associated in a way that provides an additional level of meaning. This linking of concepts is the basis for traceability, which is not used solely for requirements.
Once we thoroughly understand the information, we can move on to performing a root cause analysis. A root cause analysis helps you to understand the underlying cause of the problem so you can address it instead of addressing a symptom of a greater issue. There are many techniques for root cause analysis, including 5 Whys and Fishbone diagrams.
Now that we understand the real root cause, we can propose solutions that will address that root cause. When identifying proposed solutions, consider the scope, constraints, and relative cost and value of each option.
Visualizing and Exploring Data
Data visualization is a graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.
In the world of Big Data, data visualization tools and technologies are essential to analyze massive amounts of information and make data-driven decisions.
The uses of data visualization are as follows:
- It is a powerful way to explore data with presentable results.
- Its primary use is in the pre-processing portion of the data mining process.
- It supports the data cleaning process by finding incorrect and missing values.
- It supports variable derivation and selection, that is, determining which variables to include in the analysis and which to discard.
- It also plays a role in combining categories as part of the data reduction process.
Importance
- Helping decision makers understand how the business data is being interpreted to determine business decisions.
- Leading the target audience to focus on business insights to discover areas that require attention.
- Handling large amounts of data in a pictorial format to provide a summary of unseen patterns in the data, revealing insights and the story behind the data to establish a business goal.
- Visualizing business data to manage growth and converting trends into business strategies by making sense of your information.
- Revealing previously unnoticed key points about the data sources to help decision makers compose data analysis reports.
Data exploration is the first step of data analysis used to explore and visualize data to uncover insights from the start or identify areas or patterns to dig into more. Using interactive dashboards and point-and-click data exploration, users can better understand the bigger picture and get to insights faster.
Starting with data exploration helps users to make better decisions on where to dig deeper into the data and to take a broad understanding of the business when asking more detailed questions later. With a user-friendly interface, anyone across an organization can familiarize themselves with the data, discover patterns, and generate thoughtful questions that may spur on deeper, valuable analysis.
Data exploration and visual analytics tools build understanding, empowering users to explore data in any visualization. This approach speeds up time to answers and deepens users’ understanding by covering more ground in less time. Data exploration is important because it democratizes access to data and provides governed self-service analytics. Furthermore, businesses can accelerate data exploration by provisioning and delivering data through visual data marts that are easy to explore and use.
Use cases of data exploration
Data exploration can help businesses explore large amounts of data quickly to better understand next steps in terms of further analysis. This gives the business a more manageable starting point and a way to target areas of interest. In most cases, data exploration involves using data visualizations to examine the data at a high level. By taking this high-level approach, businesses can determine which data is most important and which may distort the analysis and therefore should be removed. Data exploration can also be helpful in decreasing time spent on less valuable analysis by selecting the right path forward from the start.
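A first pass over a dataset often amounts to counting missing values, summarising the rest, and flagging values that might distort the analysis. The sketch below does this with Python's statistics module. The 1.5 × IQR rule for outliers is a common convention assumed here, not something these notes prescribe, and the explore function and the sales figures are made up for illustration.

```python
from statistics import mean, median, quantiles

def explore(values):
    """Summarise one numeric column; None marks a missing value."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
        "median": median(present),
        # Values outside the 1.5 * IQR fences are flagged for review.
        "outliers": [v for v in present if v < low or v > high],
    }

# Hypothetical daily sales with two missing entries and one suspicious value.
sales = [12, 15, 14, None, 13, 16, 15, 14, 95, None, 13]
report = explore(sales)
for key, value in report.items():
    print(key, value)
```

A flagged value is a prompt to investigate, not an automatic deletion: it may be a data entry error or a genuine event worth analysing further.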
Business Analytics Technology
Business Analytics is the process by which businesses use statistical methods and technologies for analyzing historical data in order to gain new insight and improve strategic decision-making.
Business analytics, a data management solution and business intelligence subset, refers to the use of methodologies such as data mining, predictive analytics, and statistical analysis in order to analyze and transform data into useful information, identify and anticipate trends and outcomes, and ultimately make smarter, data-driven business decisions.
The main components of a typical business analytics dashboard include:
- Data Aggregation: prior to analysis, data must first be gathered, organized, and filtered, either through volunteered data or transactional records
- Data Mining: data mining for business analytics sorts through large datasets using databases, statistics, and machine learning to identify trends and establish relationships
- Association and Sequence Identification: the identification of predictable actions that are performed in association with other actions or sequentially
- Text Mining: explores and organizes large, unstructured text datasets for the purpose of qualitative and quantitative analysis
- Forecasting: analyzes historical data from a specific period in order to make informed estimates that are predictive in determining future events or behaviors
- Predictive Analytics: predictive business analytics uses a variety of statistical techniques to create predictive models, which extract information from datasets, identify patterns, and provide a predictive score for an array of organizational outcomes
- Optimization: once trends have been identified and predictions have been made, businesses can engage simulation techniques to test out best-case scenarios
- Data Visualization: provides visual representations such as charts and graphs for easy and quick data analysis
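As a small illustration of the forecasting component listed above, the sketch below estimates the next value of a series as the average of its most recent observations. A moving average is only one simple forecasting technique, chosen here for brevity; the function name, the window size of 3, and the monthly figures are assumptions.

```python
def moving_average_forecast(history, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = history[-window:]
    return sum(recent) / window

# Hypothetical monthly sales figures.
monthly_sales = [100, 110, 120, 130, 140, 150]
forecast = moving_average_forecast(monthly_sales)
print(forecast)  # 140.0
```

Because the series is rising steadily, the moving average lags behind the trend; a regression line fitted to the same history would project a higher value, which shows why the choice of forecasting method matters.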
Business Analytics and Information Technology (BAIT) focuses on three levels of using information which are becoming more strongly intertwined and are essential components of the modern enterprise:
- Information Technology - developing skills to capture, store, organize, and search your data
- Data Analysis - discovering and understanding patterns of data
- Decision Modeling - using data to make better decisions and formulate complex plans of action
Key takeaways –
- Data modeling is a set of tools and techniques used to understand and analyse how an organisation should collect, update, and store data
- Business Analytics is the process by which businesses use statistical methods and technologies for analyzing historical data in order to gain new insight and improve strategic decision-making.