Unit - 6
Advanced topics
Q1) Describe data mining and the types of data it can be applied to.
A1) Data Mining
Data mining, in simple terms, is a process used to extract useful data from a larger collection of raw data. It means analysing patterns in large batches of data using one or more tools. Data mining has applications in many areas, including science and research.
By applying data mining, organisations can learn more about their customers and develop more effective strategies for different business functions, and in turn use resources in a more optimal and informed way. This helps organisations get closer to their objectives and make better decisions. Data mining depends on effective data collection and warehousing as well as computer processing.
Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events. Data mining is also known as Knowledge Discovery in Databases (KDD).
Main data mining features:
● Automatic prediction of patterns based on the analysis of trends and behaviour.
● Prediction based on likely outcomes.
● Creation of decision-oriented information.
● Focus on large data sets and databases for analysis.
● Clustering based on finding and visually documenting groups of facts that were not previously known.
Types of data mining
It is possible to conduct data mining on the following data types:
- Relational database
- Data warehouse
- Data repositories
- Object relational database
- Transactional database
Q2) What do you mean by object oriented DB?
A2) Object oriented database
An object-oriented database combines database capabilities with object-oriented programming. Objects created in object-oriented programming languages such as C++ and Java can be stored in relational databases, but object-oriented databases are better suited to them.
An object-oriented database is organised around objects rather than actions, and around data rather than logic. For example, a multimedia record in such a database can be a definable data object, as opposed to an alphanumeric value.
Advantages
❏ It is possible to store and retrieve complex data sets quickly and easily.
❏ Object IDs are automatically allocated.
❏ It works well with object-oriented programming languages.
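The points above (storing whole objects and assigning object IDs automatically) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `TinyObjectStore` and `MediaRecord` are hypothetical names, the store lives in memory, and a real object-oriented DBMS would add persistence, querying and transactions.

```python
import copy
import itertools
from dataclasses import dataclass


@dataclass
class MediaRecord:
    """A complex object: a title plus raw binary content."""
    title: str
    content: bytes


class TinyObjectStore:
    """Hypothetical in-memory object store, for illustration only."""

    def __init__(self):
        self._next_id = itertools.count(1)  # source of automatic object IDs
        self._objects = {}                  # object ID -> stored object

    def save(self, obj):
        oid = next(self._next_id)
        # Keep a private copy so later changes to obj do not alter the store.
        self._objects[oid] = copy.deepcopy(obj)
        return oid

    def load(self, oid):
        return copy.deepcopy(self._objects[oid])


store = TinyObjectStore()
oid = store.save(MediaRecord("intro clip", b"\x00\x01\x02"))
restored = store.load(oid)  # comes back as a MediaRecord, not as rows and columns
```

The caller never chooses the ID and never maps the object to columns, which is the contrast with a relational table that the advantages list is pointing at.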
Disadvantages
- Object databases are not as popular as RDBMS. Object database developers are hard to find.
- Not many programming languages support object databases.
- RDBMS has SQL as a standard query language. Object databases have no comparable standard.
- For non-programmers, object databases are hard to understand.
Q3) What do you mean by distributed database?
A3) Distributed database
A distributed database is a database that is not restricted to one machine but is spread over different locations, i.e. over several computers or a computer network. A distributed database system is located on separate sites that do not share physical components. This may be needed when users all over the world need to access a particular database. It must be managed in such a way that it looks like a single database to its users.
In a typical distributed database system, as in the accompanying diagram, a communication channel connects the various sites, and each site has its own memory and database.
Goals of a Distributed Database System
The concept of distributed databases was developed with the aim of improving:
Reliability : In a distributed database system, if one system fails or stops running for some time, another system can complete the task.
Availability : The distributed database system can stay available even when serious failures occur. Another system is available to serve the client request.
Performance : Performance can be improved by distributing databases over multiple locations. The data is then available at each location, which also makes it easy to maintain.
Types
1. Homogeneous Database:
In a homogeneous database, all the sites store the database identically. The operating system, the database management system and the data structures used are the same at all sites. They are therefore easy to manage.
Example: Consider three departments that all use Oracle-9i as their DBMS. If changes are made in one department, they are also applied in the other departments.
2. Heterogeneous Database:
In a heterogeneous distributed database, different sites may use different schemas and software, which can lead to problems in query processing and transactions. A particular site may also be completely unaware of the other sites. Different computers may use a different operating system and a different database application. They may even use different data models for the database. Therefore, translations are needed for the different sites to communicate.
Example: In the accompanying diagram, ODBC and JDBC are used to make different DBMS applications accessible to each other.
Distributed data storage
There are 2 ways in which information can be stored on various sites. They are:
- Replication
- Fragmentation
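The two storage approaches can be sketched with plain Python lists, assuming a small hypothetical `customers` relation and two hypothetical sites. Replication keeps a full copy at each site. Fragmentation splits the relation so that each site holds only part of it. The sketch shows horizontal fragmentation (splitting by rows), which is only one of the possible ways to fragment.

```python
# Hypothetical customer rows: (customer_id, name, region)
customers = [
    (1, "Asha", "EU"),
    (2, "Ben", "US"),
    (3, "Chen", "EU"),
    (4, "Dina", "US"),
]

sites = ["site_a", "site_b"]

# Replication: every site keeps a full copy of the relation.
replicated = {site: list(customers) for site in sites}

# Horizontal fragmentation: each site keeps only the rows for one region.
region_of_site = {"site_a": "EU", "site_b": "US"}
fragmented = {
    site: [row for row in customers if row[2] == region]
    for site, region in region_of_site.items()
}

# The original relation is recovered as the union of the fragments.
reconstructed = sorted(row for rows in fragmented.values() for row in rows)
```

Replication favours availability and local reads at the cost of keeping copies in sync, while fragmentation stores each row once, close to where it is used.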
Q4) Explain what a web database is.
A4) Web database
A web database is a database that can be accessed from a local network or the internet, rather than one whose data is stored on a desktop or its attached storage. Used for both professional and personal purposes, web databases are hosted on websites and offered as Software as a Service (SaaS) products, which means that access is provided through a web browser.
One kind of online database that you may be more familiar with is the relational database. Relational databases let you store data in groups (known as tables) and link records in those tables together. A relational database uses indexes and keys, which are added to the data, to locate fields stored in the database, so that information can be retrieved quickly.
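As a minimal sketch of tables, keys and indexes, the following uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names and sample rows are hypothetical and chosen only for illustration. A hosted web database would be reached over the network rather than opened locally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE bookings (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        flight      TEXT NOT NULL
    );
    -- Index to speed up lookups of bookings by customer.
    CREATE INDEX idx_bookings_customer ON bookings(customer_id);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ben")])
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?)",
    [(10, 1, "LH400"), (11, 1, "BA117"), (12, 2, "AF006")],
)

# The key column links each booking back to its customer.
rows = conn.execute(
    """
    SELECT c.name, b.flight
    FROM customers AS c
    JOIN bookings AS b ON b.customer_id = c.id
    WHERE c.id = ?
    ORDER BY b.id
    """,
    (1,),
).fetchall()
```

Here `rows` holds the two bookings that belong to customer 1, found through the `customer_id` key rather than by scanning unrelated records.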
"Web database" is a general term for managing data online. A web database gives you the ability to build your own databases and data storage without being a database guru or even a technical person.
Examples: banks, airline and rental car reservations, university course enrolment, etc.
● The Web is a hypertext-based online information system.
● The bulk of Web documents are HTML-formatted hypertext documents.
● HTML documents contain:
● Text along with font specifications and other formatting instructions.
● Hypertext links to other documents, which can be associated with regions of the text.
Benefits of a web database:
● Save money : One of the benefits of online database software is that it can save your business money. If you do not need to purchase a software programme for your business, the savings can be considerable. In most cases, corporations pay for a software programme and then also pay a licence fee for each computer that uses it.
● Flexible use : Another advantage of an online database programme is the flexibility it gives your business. You pay only for the amount of storage you need. You do not need to worry about buying servers as you grow or getting rid of them when they are no longer needed.
● Technical support : Another benefit of a web-based database programme is that the responsibility for technical support can be transferred to someone else. Technical support is included when you pay an organisation for access to an online database. If there are problems with the database, you simply contact the organisation and its staff deal with them.
● Access : Another big benefit of this type of database is access to the data at all times from different locations. With an online database, you can potentially reach the information from any machine. The data is also available 24 hours a day, seven days a week.
● Typically, web database systems come with their own technical support team so that staff from the IT department can concentrate on more pressing business issues.
● It's easy: web databases allow users to update data, so all you have to do is build simple web forms.
● The data is accessible from nearly any computer. Storing data in the cloud means that it is not tied to one machine. As long as you are given access, you can in principle reach the data from just about any compatible computer.
Q5) What is an object-relational database? Write some of its merits and demerits.
A5) Object relational database
An object-relational database (ORD) is a database management system (DBMS) that combines a relational database (RDBMS) with an object-oriented database (OODBMS). In its schemas and in the query language used, an ORD supports the basic components of any object-oriented database model, such as objects, classes and inheritance.
An object-relational database can also be referred to as an object-relational database management system (ORDBMS).
The object-relational database (ORD) is said to be the middleman between relational and object-oriented databases, since it incorporates features and characteristics of both models.
In an ORD, the basic approach is based on the relational database: the data is stored in a traditional database and is manipulated and accessed using queries written in a query language such as SQL. An ORD also shows an object-oriented aspect, however, in that the database can be treated as an object store, typically for software written in an object-oriented programming language.
One of the objectives of an ORD is to bridge the gap between the conceptual data modelling techniques used for relational and object-oriented databases, such as the entity-relationship diagram (ERD) and object-relational mapping (ORM). It also aims to bridge the gap between relational databases and the object-oriented modelling techniques commonly used in programming languages such as Java, C# and C++.
Advantages of object relational database
The benefits of the Object Relational model are −
- Inheritance
The Object Relational data model allows its users to inherit objects, tables, etc. so that their functionality can be extended. Inherited objects contain new attributes as well as the attributes that were inherited.
- Complex data type
Using existing data types, complex data types can be created. This is useful as complex data types allow better data manipulation in the Object Relational Data Model.
- Extensibility
In the Object Relational data model, the functionality of the system can be extended. This can be accomplished using complex data types as well as advanced object-oriented concepts such as inheritance.
Disadvantages
The object-relational data model can at times become very complex and difficult to manage, since it is a mixture of the object-oriented data model and the relational data model and uses the features of both.
Q6) Write down the data mining techniques.
A6) Data mining techniques :
- Classification : This technique is used to retrieve significant and relevant information about data and metadata. It helps to classify data into different classes.
- Clustering : Cluster analysis is a data mining technique for identifying data items that are similar to each other. It helps to explain the differences and similarities between the data.
- Regression : Regression analysis is the data mining method of identifying and analysing the relationship between variables. It is used to estimate the likelihood of a particular variable, given the presence of other variables.
- Association rules : This data mining technique helps to find the association between two or more items. It discovers a hidden pattern in the data set.
- Outlier detection : This data mining technique refers to the identification of data items in the dataset that do not conform to an expected pattern or expected behaviour. It can be used in a number of domains, such as intrusion detection, fraud detection or fault detection. Outlier detection is also referred to as Outlier Analysis or Outlier Mining.
- Sequential patterns : This data mining technique helps to discover or recognise similar patterns or trends in transaction data over a certain period.
- Prediction : Prediction uses a combination of other data mining techniques such as trends, sequential patterns, clustering and classification. It analyses past events or instances in the right sequence to predict a future event.
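To make one of the techniques above concrete, the sketch below computes the two standard measures used to evaluate an association rule: support and confidence. The shopping-basket data and the function names are hypothetical, and this is a direct calculation on a tiny data set, not a full rule-mining algorithm such as Apriori.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions containing antecedent that also contain consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


s = support({"bread", "milk"}, transactions)       # 2 of 4 baskets
c = confidence({"bread"}, {"milk"}, transactions)  # 2 of the 3 bread baskets
```

For the rule "bread implies milk", the support is 0.5 and the confidence is 2/3: half of all baskets contain both items, and two of the three baskets with bread also contain milk.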
Q7) What is the logical database?
A7) Logical database
A logical database (LDB) is a special ABAP programme that retrieves data from different interrelated tables and offers a read-only view of that data. We use a logical database to read data from database tables. A logical database is a hierarchical structure of tables. Logical databases contain Open SQL statements that read data from the database. Hence, you do not need to use SQL in your own programmes.
The logical database reads the data, stores it in the programme if necessary, and then passes it line by line to the application programme or to the function module LDB_PROCESS.
- To process logical databases, use the GET statement. The LDB consists of logically related tables, grouped together, that are used for reading and processing data.
- The LDB prepares the data records, and the actual report reads them, using the PUT and GET statement pair.
- Structure, Selections and Database Program are the three key elements of an LDB.
In the general database sense, logical database design is the process of deciding how to organise the attributes of the entities in a given business environment into database structures, such as the tables of a relational database.
The aim of logical database design is to create well-structured tables that properly represent the business environment of the organisation. The tables will be able to store data about the entities of the organisation without redundancy, and foreign keys will be placed in the tables so that all the relationships between the entities are supported.
It is an iterative, information-gathering process for constructing a logical data model. It includes the following steps:
- Define the tables you need based on the information needed by your organisation.
- Determine the table relationships.
- Determine the contents of each table (or its columns).
- Normalize the tables to at least the third normal form.
- Determine the primary keys and the column domains. A domain is the set of valid values for a column. For instance, the domain for the customer number could be all positive numbers.
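The design steps above can be sketched with Python's built-in `sqlite3` module. The schema is a hypothetical two-table example, not one taken from the text: the customer number is a primary key whose domain is restricted to positive numbers with a CHECK constraint, and a foreign key in the orders table supports the customer-order relationship.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE customer (
        -- Domain of the customer number: positive integers only.
        customer_number INTEGER PRIMARY KEY CHECK (customer_number > 0),
        name            TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_number    INTEGER PRIMARY KEY,
        customer_number INTEGER NOT NULL REFERENCES customer(customer_number),
        order_date      TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO customer VALUES (1, 'Asha')")
conn.execute("INSERT INTO orders VALUES (100, 1, '2024-01-15')")


def rejected(sql):
    """Return True if the database refuses the statement."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


# A value outside the column domain is refused by the CHECK constraint.
bad_domain = rejected("INSERT INTO customer VALUES (-5, 'Ben')")
# An order for a customer that does not exist is refused by the foreign key.
bad_reference = rejected("INSERT INTO orders VALUES (101, 99, '2024-01-16')")
```

Both invalid inserts are rejected, so the rules decided during logical design are enforced by the database itself rather than by each application.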
Q8) What are the advantages and disadvantages of logical database ?
A8) Advantages of logical database
➢ No need to programme the data retrieval, that is, the data selection, yourself.
➢ An easy-to-use standard user interface, with user input checked for completeness.
➢ It includes an easy-to-use screen for selection.
The pre-generated selection screen can be changed according to your needs. To verify if user input is complete, accurate, and plausible, it provides check functions.
➢ It offers meaningful data selection.
➢ It provides central authorization checks for database access.
➢ Improvements such as better performance apply automatically to all report programmes that use the logical database.
➢ Compared to standard internal tables, less coding is needed to retrieve data.
➢ Check functions that verify the completeness, correctness, and plausibility of user input.
➢ Good read access performance while retaining the hierarchical data view determined by the application logic.
Disadvantages of logical database
➢ It is fast for smaller tables, but if the required table is at the lowest hierarchy level, all the tables above it must be read first, so performance is slower.
➢ If the programme attributes do not specify a logical database, the GET events will never occur.
➢ There is no ENDGET statement, so the code block associated with an event ends at the next event statement (such as another GET or an END-OF-SELECTION).
Q9) Describe data warehouse?
A9) Data Warehouse
Data Warehousing (DW) is a process for collecting and managing data from different sources in order to provide useful business insights. A data warehouse is typically used to connect and analyse business data from heterogeneous sources. The data warehouse is the core of the BI system, which is built for data analysis and reporting.
It is a blend of technologies and components that supports the strategic use of data. It is electronic storage of a large volume of information by an organisation, designed for query and analysis rather than for transaction processing. It is a process of transforming data into information and making it available to users in a timely manner so that it can make a difference.
A data warehouse can be described as a data system with the following attributes:
● It is a database designed for analytical tasks, using data from different applications.
● It serves a relatively small number of users with relatively long interactions.
● It includes current and historical data in order to provide a historical view of information.
● Its usage is read-intensive.
● It contains a few large tables.
"Data Warehouse is a subject-oriented, integrated, and time-variant store of information in support of management's decisions."
Characteristics of data warehouse
- Subject oriented
- Integrated
- Time variant
- Non volatile
Q10) What are the types and components of a data warehouse?
A10) Types of data warehouse
The three main types of data warehouse are as follows:
- Enterprise Data Warehouse (EDW)
- Operational data store
- Data Mart
- Enterprise data Warehouse
The Enterprise Data Warehouse (EDW) is a centralised warehouse. It provides decision support services across the enterprise. It offers a unified approach to organising and representing data. It also makes it possible to classify data by subject and to give access according to those divisions.
- Operational data store
An Operational Data Store, also known as an ODS, is a data store that is needed when neither the data warehouse nor the OLTP systems support the reporting needs of the organisation. In an ODS, the data is refreshed in real time. It is therefore commonly preferred for routine tasks, such as storing employee records.
- Data mart
A data mart is a subset of a data warehouse. It is designed for a particular line of business, such as sales, insurance or finance. In an independent data mart, data can be collected directly from the sources.
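The idea of a data mart as a subset of the warehouse can be sketched with Python's built-in `sqlite3` module. In this hypothetical example, a single fact table stands in for the warehouse and a view restricted to the sales department stands in for a data mart fed from the warehouse. The names and figures are invented, and a real data mart is usually a separate physical store rather than a view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE warehouse_facts (
        department TEXT    NOT NULL,
        year       INTEGER NOT NULL,
        amount     REAL    NOT NULL
    );
    -- The data mart exposes only the rows for one line of business.
    CREATE VIEW sales_mart AS
        SELECT year, amount FROM warehouse_facts WHERE department = 'sales';
""")
conn.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("sales", 2022, 100.0),
        ("sales", 2023, 150.0),
        ("finance", 2023, 80.0),
        ("sales", 2023, 50.0),
    ],
)

# A typical read-only, analytical query: totals per year across history.
yearly_sales = conn.execute(
    "SELECT year, SUM(amount) FROM sales_mart GROUP BY year ORDER BY year"
).fetchall()
```

The query touches only sales data and aggregates over several years, which reflects the read-intensive, historical use described for data warehouses.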
Components of data warehouse
Four components of DW
1. Load manager : The load manager is also called the front component. It performs all the tasks related to the extraction and loading of data into the warehouse. These tasks include transformations that prepare the data for entry into the data warehouse.
2. Warehouse manager : The warehouse manager manages activities related to the data in the warehouse. It performs operations such as analysing data to ensure consistency, creating indexes and views, generating denormalizations and aggregations, transforming and merging source data, and archiving and backing up data.
3. Query manager : The query manager is also known as the backend component. It performs all the operations related to the management of user queries. Its operations are directing queries to the appropriate tables and scheduling the execution of queries.
4. End user access tool : These tools are categorized into five groups: 1. Data reporting 2. Query tools 3. Application development tools 4. EIS tools 5. OLAP tools and data mining tools.
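The load manager's job of extracting, transforming and loading data can be sketched in plain Python. The source records, field names and cleaning rules below are hypothetical, and a list stands in for the warehouse. Real load managers work against actual source systems and warehouse tables.

```python
# Hypothetical records from two source systems, with inconsistent formatting.
source_a = [{"id": "1", "name": " asha ", "amount": "100.50"}]
source_b = [{"id": "2", "name": "BEN", "amount": "75"}]


def extract():
    """Extraction: gather raw records from every source."""
    return source_a + source_b


def transform(record):
    """Transformation: convert types and normalise names before loading."""
    return {
        "id": int(record["id"]),
        "name": record["name"].strip().title(),
        "amount": float(record["amount"]),
    }


def load(records, warehouse):
    """Loading: append the prepared records to the warehouse."""
    warehouse.extend(records)


warehouse = []  # stand-in for the warehouse tables
load([transform(r) for r in extract()], warehouse)
```

After the run, `warehouse` holds both records in one consistent format, which is what allows the warehouse manager and query manager to treat heterogeneous sources as a single data set.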