
ADS

Unit - 5

Databases on the Web and Semi-structured data

Q1) Explain XML with its features?

A1)

XML (Extensible Markup Language) is a markup language.

XML is designed to store and transport data.

XML was released in the late 1990s and was created to provide an easy way to store and use self-describing data.

XML became a W3C Recommendation on February 10, 1998.

XML is not a replacement for HTML.

XML is designed to be self-descriptive.

XML is designed to carry data, not to display data.

XML tags are not predefined. You must define your own tags.

XML is platform independent and language independent.

Need for XML

XML is platform independent and language independent. Its main advantage is therefore that you can take data from a program such as Microsoft SQL Server, convert it into XML, and then share that XML with other programs and platforms. This lets two platforms communicate that would otherwise find it very difficult to do so.

XML is truly powerful because of its international acceptance. Many organizations use XML interfaces for databases, programming, office applications, mobile phones, and other areas because of its platform independence.

Features and Advantages of XML

XML is widely used in web development. It is also used to simplify data storage and data sharing.

The main features or advantages of XML are given below.

1) XML separates data from HTML

If you need to display dynamic data in your HTML document, it takes a lot of work to edit the HTML each time the data changes.

With XML, data can be stored in separate XML files. This way you can concentrate on using HTML/CSS for display and layout, and be sure that changes in the underlying data will not require any changes to the HTML.

With a few lines of JavaScript code, you can read an external XML file and update the data content of your web page.

2) XML simplifies data sharing

In the real world, computer systems and databases contain data in incompatible formats.

XML data is stored in plain text format. This provides a software-independent and hardware-independent way of storing data.

This makes it much easier to create data that can be shared by different applications.

3) XML simplifies data transport

One of the most time-consuming challenges for developers is exchanging data between incompatible systems over the web.

Exchanging data as XML greatly reduces this complexity, since the data can be read by different incompatible applications.

4) XML simplifies Platform change

Upgrading to new systems (hardware or software platforms) is usually time consuming. Large amounts of data must be converted, and incompatible data is often lost.

XML data is stored in text format. This makes it easier to expand or upgrade to new operating systems, new applications, or new browsers without losing data.

5) XML increases data availability

Different applications can access your data not only in HTML pages but also from XML data sources.

Using XML, your data can be made available to all kinds of "reading machines", such as handheld computers, voice machines, and news feeds, which also makes it more accessible to blind people.

6) Create new internet languages using XML

Many new Internet languages have been created with XML. Some examples follow:

XHTML

WSDL for describing available web services

WAP and WML as markup languages for handheld devices

RSS languages for news feeds

RDF and OWL for describing resources and ontologies

SMIL for describing multimedia for the web

XML Example

XML documents form a hierarchical data structure that looks like a tree, so it is referred to as an XML tree: it starts at "the root" and branches to "the leaves".

Example of XML: Books

books.xml

<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
  </book>
  <book>
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
  </book>
  <book>
    <title lang="en">Learning XML</title>
  </book>
</bookstore>
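The tree structure described above can be explored with a few lines of code. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the category values SAMPLE-A and SAMPLE-B and the helper name list_titles are assumptions made for illustration and are not part of the original books.xml.

```python
import xml.etree.ElementTree as ET

# Sample document modeled on the books.xml example. The category values
# SAMPLE-A and SAMPLE-B are placeholders invented for this sketch.
BOOKS_XML = """
<bookstore>
  <book category="COOKING">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
  </book>
  <book category="SAMPLE-A">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
  </book>
  <book category="SAMPLE-B">
    <title lang="en">Learning XML</title>
  </book>
</bookstore>
"""


def list_titles(xml_text):
    """Return (category, title, language) for every book under the root."""
    root = ET.fromstring(xml_text)        # the root is the <bookstore> element
    rows = []
    for book in root.findall("book"):     # direct children named <book>
        title = book.find("title")        # first <title> child of this book
        rows.append((book.get("category"), title.text, title.get("lang")))
    return rows


titles = list_titles(BOOKS_XML)
for category, title, lang in titles:
    print(category, title, lang)
```

Element content is read through the text property, while attributes such as category and lang are read with get(), which mirrors the distinction between subelements and attributes in the document itself.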

Q2) Define structure of XML Data?

A2)

Structure of XML Data

The fundamental construct in an XML document is the element. An element is a pair of matching start-tag and end-tag, together with all the text that appears between them. An XML document must have a single root element that encompasses all other elements in the document. In the bank example, the <bank> element forms the root element. Further, elements in an XML document must nest properly. For instance,

<account> … <balance> … </balance> … </account>

is properly nested, whereas

<account> … <balance> … </account> … </balance>

is not properly nested. While proper nesting is an intuitive property, we may define it more formally. Text is said to appear in the context of an element if it appears between the start-tag and end-tag of that element. Tags are properly nested if every start-tag has a unique matching end-tag that is in the context of the same parent element.

Text can also be mixed with the subelements of an element, as in Figure 5.2(I). Like several other features of XML, this freedom makes more sense in a document-processing context than in a data-processing context, and is not particularly useful for representing more structured data such as database content in XML.

The ability to nest elements within other elements provides an alternative way to represent information. Figure 5.2(II) shows a representation of bank information with account elements nested within customer elements. The nested representation makes it easy to find all accounts of a customer, although it stores account elements redundantly if they are owned by multiple customers.

Nested representations are widely used in XML data interchange applications to avoid joins. For example, a shipping application would store the full address of the sender and receiver redundantly on the shipping document associated with each shipment, whereas a normalized representation may require a join of shipping records with a company-address relation to get address information.

In addition to elements, XML specifies the notion of an attribute. For example, the type of an account can be represented as an attribute, as in Figure 5.2(III). The attributes of an element appear as name=value pairs before the closing “>” of a tag. Attributes are strings, and do not contain markup. Furthermore, an attribute can appear only once in a given tag, unlike subelements, which may be repeated.

. . .
<account>
  This account is seldom used any more.
  <account-number> A-102 </account-number>
  <branch-name> Perryridge </branch-name>
</account>
. . .

Figure 5.2(I) Mixture of text with subelements.

<bank-1>
  <customer>
    <customer-name> Johnson </customer-name>
    <customer-street> Alma </customer-street>
    <customer-city> Palo Alto </customer-city>
    <account>
      <account-number> A-101 </account-number>
      <branch-name> Downtown </branch-name>
    </account>
    <account>
      <account-number> A-201 </account-number>
      <branch-name> Brighton </branch-name>
    </account>
  </customer>
  <customer>
    <customer-name> Hayes </customer-name>
    <customer-street> Main </customer-street>
    <customer-city> Harrison </customer-city>
    <account>
      <account-number> A-102 </account-number>
      <branch-name> Perryridge </branch-name>
    </account>
  </customer>
</bank-1>

Figure 5.2(II) Nested XML representation of bank information.

In a document construction context, the distinction between subelement and attribute is important: an attribute is implicitly text that does not appear in the printed or displayed document. However, in database and data exchange applications of XML, this distinction is less relevant, and the choice of representing data as an attribute or a subelement is often arbitrary.

An element of the form <element></element>, which contains no subelements or text, can be abbreviated as <element/>; abbreviated elements may, however, contain attributes.

. . .
<account acct-type="checking">
  <account-number> A-102 </account-number>
  <branch-name> Perryridge </branch-name>
</account>
. . .

Figure 5.2(III) Use of attributes.

Since XML documents are designed to be exchanged between applications, a namespace mechanism has been introduced to allow organizations to specify globally unique names to be used as element tags in documents. The idea of a namespace is to prepend each tag or attribute with a universal resource identifier. Thus, for instance, if First Bank wanted to make sure that the XML documents it created would not duplicate tags used by any business partner’s XML documents, it can prepend a unique identifier with a colon to every tag name. The bank may use a web URL as the unique identifier. Using long unique identifiers in every tag would be rather inconvenient, so the namespace standard provides a way to define an abbreviation for identifiers.

In Figure 5.2(IV), the root element (bank) has an attribute xmlns:FB, which declares that FB is an abbreviation for the bank's URL. The abbreviation can then be used in various element tags, as illustrated in the figure. A document can have more than one namespace, declared as part of the root element. Different elements can then be associated with different namespaces.

A default namespace can be defined by using the attribute xmlns instead of xmlns:FB in the root element. Elements without an explicit namespace prefix then belong to the default namespace. Sometimes we need to store values containing tags without having the tags interpreted as XML tags. So that we can do so, XML allows this construct:

<![CDATA[<account>···</account>]]>

Because it is enclosed within CDATA, the text <account> is treated as normal text data, not as a tag. The term CDATA stands for character data.

<bank xmlns:FB="http://www.FirstBank.com">
. . .
<FB:branch>
<FB:branchname> Downtown </FB:branchname>
<FB:branchcity> Brooklyn </FB:branchcity>
</FB:branch>
. . .
</bank>

Figure 5.2(IV) unique tag names through the use of namespaces.
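The effect of namespace prefixes and CDATA sections on a parser can be seen in a short experiment. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the <note> element and its contents are hypothetical additions made only to demonstrate CDATA.

```python
import xml.etree.ElementTree as ET

# The <note> element is a hypothetical addition used to demonstrate CDATA.
DOC = """
<bank xmlns:FB="http://www.FirstBank.com">
  <FB:branch>
    <FB:branchname> Downtown </FB:branchname>
    <FB:branchcity> Brooklyn </FB:branchcity>
  </FB:branch>
  <note><![CDATA[<account>not a tag</account>]]></note>
</bank>
"""

root = ET.fromstring(DOC)

# Map the prefix used in our search paths to the namespace URI.
ns = {"FB": "http://www.FirstBank.com"}

# Prefixed names in the path are expanded using the mapping above.
branch_name = root.find("FB:branch/FB:branchname", ns).text.strip()

# Internally the parser stores the full URI, not the FB abbreviation.
expanded_tag = root.find("FB:branch", ns).tag

# The CDATA content comes back as plain text, not as a child element.
note_text = root.find("note").text

print(branch_name)
print(expanded_tag)
print(note_text)
```

The parser replaces the FB prefix with the full identifier, which is why two documents may use different abbreviations for the same namespace and still be treated as using the same tags.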

Q3) Write a short note on DTD?

A3)

Document Type Definition

The document type definition (DTD) is an optional part of an XML document. The purpose of a DTD is much like that of a schema: to constrain and type the information present in the document. However, the DTD does not actually constrain types in the sense of basic types like integer or string. Instead, it only constrains the appearance of subelements and attributes within an element. The DTD is primarily a list of rules for what pattern of subelements may appear within an element. Figure 5.3(I) shows part of an example DTD for a bank information document.

Each declaration is in the form of a regular expression for the subelements of an element. Thus, in the DTD in Figure 5.3(I), a bank element consists of one or more account, customer, or depositor elements; the | operator specifies “or” while the + operator specifies “one or more.” The * operator is used to specify “zero or more,” while the ? operator is used to specify an optional element (that is, “zero or one”).

<!DOCTYPE bank [
<!ELEMENT bank ( ( account | customer | depositor )+ )>
<!ELEMENT account ( account-number, branch-name, balance )>
<!ELEMENT customer ( customer-name, customer-street, customer-city )>
<!ELEMENT depositor ( customer-name, account-number )>
<!ELEMENT account-number ( #PCDATA )>
<!ELEMENT branch-name ( #PCDATA )>
<!ELEMENT balance ( #PCDATA )>
<!ELEMENT customer-name ( #PCDATA )>
<!ELEMENT customer-street ( #PCDATA )>
<!ELEMENT customer-city ( #PCDATA )>
]>

Figure 5.3(I)Example of a DTD

The account element is defined to contain the subelements account-number, branch-name, and balance, in that order. Similarly, customer and depositor have the attributes in their schema defined as subelements. Finally, the elements account-number, branch-name, balance, customer-name, customer-street, and customer-city are all declared to be of type #PCDATA.

The keyword #PCDATA indicates text data; it derives its name from “parsed character data.” Two other special type declarations are EMPTY, which says that the element has no contents, and ANY, which says that there is no constraint on the subelements of the element; that is, any elements, even those not mentioned in the DTD, can occur as subelements of the element. The absence of a declaration for an element is equivalent to explicitly declaring the type as ANY.

The allowable attributes for each element are also declared in the DTD. Unlike subelements, no order is imposed on attributes. Attributes may be specified to be of type CDATA, ID, IDREF, or IDREFS; the type CDATA simply says that the attribute contains character data, while the other three are not so simple. For example, the following line from a DTD specifies that element account has an attribute acct-type, of type CDATA, with default value checking.

<!ATTLIST account acct-type CDATA "checking" >

Attributes must have a type declaration and a default declaration. The default declaration can consist of a default value for the attribute, or #REQUIRED, meaning that a value must be specified for the attribute in each element, or #IMPLIED, meaning that no default value has been provided.

If an attribute has a default value, then for each element that does not specify a value for the attribute, the default value is filled in automatically when the XML document is read. An attribute of type ID provides a unique identifier for the element; a value that occurs in an ID attribute of an element must not occur in any other element in the same document. At most one attribute of an element is permitted to be of type ID.

<!DOCTYPE bank-2 [

<!ELEMENT account ( branch, balance )>

<!ATTLIST account

account-number ID #REQUIRED

owners IDREFS #REQUIRED >

<!ELEMENT customer ( customer-name, customer-street, customer-city )>

<!ATTLIST customer

customer-id ID #REQUIRED

accounts IDREFS #REQUIRED >

··· declarations for branch, balance, customer-name,

customer-street and customer-city ···

] >

Figure 5.3(II) DTD with ID and IDREF attribute types

An attribute of type IDREF is a reference to an element; the attribute must contain a value that appears in the ID attribute of some element in the document. The type IDREFS allows a list of references, separated by spaces. Figure 5.3(II) shows an example DTD in which customer-account relationships are represented by ID and IDREFS attributes, instead of depositor records.

The account elements use account-number as their identifier attribute; to do so, account-number has been made an attribute of account instead of a subelement. The customer elements have a new identifier attribute called customer-id. Additionally, each customer element contains an attribute accounts, of type IDREFS, which is a list of identifiers of accounts that are owned by the customer. Each account element has an attribute owners, of type IDREFS, which is a list of owners of the account.

Figure 5.3(III) uses a different set of accounts and customers from the earlier example, in order to illustrate the IDREFS feature better. The ID and IDREF attributes serve the same role as reference mechanisms in object-oriented and object-relational databases, permitting the construction of complex data relationships.

<bank-2>
  <account account-number="…" owners="…">
    <branch-name> Downtown </branch-name>
  </account>
  <account account-number="…" owners="…">
    <branch-name> Perryridge </branch-name>
  </account>
  <customer customer-id="…" accounts="…">
    <customer-name>Joe</customer-name>
    <customer-street> Monroe </customer-street>
    <customer-city> Madison </customer-city>
  </customer>
  <customer customer-id="…" accounts="…">
    <customer-name>Lisa</customer-name>
    <customer-street> Mountain </customer-street>
    <customer-city> Murray Hill </customer-city>
  </customer>
  <customer customer-id="…" accounts="…">
    <customer-name>Mary</customer-name>
    <customer-street> Erin </customer-street>
    <customer-city> Newark </customer-city>
  </customer>
</bank-2>

Figure 5.3(III) XML data with ID and IDREF attributes.
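How ID and IDREFS attributes link elements can be illustrated by resolving the references by hand. The following is a minimal sketch in Python; the identifier values (A-1, C-1, and so on), the ID_ATTRS table, and the helper names are assumptions made for illustration. The standard xml.etree.ElementTree module does not read DTDs, so the sketch states explicitly which attribute plays the ID role for each element.

```python
import xml.etree.ElementTree as ET

# Identifier values (A-1, C-1, ...) are invented for this sketch.
BANK2 = """
<bank-2>
  <account account-number="A-1" owners="C-1 C-3">
    <branch-name> Downtown </branch-name>
  </account>
  <account account-number="A-2" owners="C-3 C-2">
    <branch-name> Perryridge </branch-name>
  </account>
  <customer customer-id="C-1" accounts="A-1">
    <customer-name>Joe</customer-name>
  </customer>
  <customer customer-id="C-2" accounts="A-2">
    <customer-name>Lisa</customer-name>
  </customer>
  <customer customer-id="C-3" accounts="A-1 A-2">
    <customer-name>Mary</customer-name>
  </customer>
</bank-2>
"""

# Which attribute is the ID attribute of each element (normally from the DTD).
ID_ATTRS = {"account": "account-number", "customer": "customer-id"}


def build_id_index(root):
    """Map every ID value to its element, rejecting duplicate IDs."""
    index = {}
    for elem in root.iter():
        id_attr = ID_ATTRS.get(elem.tag)
        if id_attr is None:
            continue                      # element has no ID attribute
        value = elem.get(id_attr)
        if value in index:
            raise ValueError(f"duplicate ID {value!r}")
        index[value] = elem
    return index


def resolve_idrefs(elem, attr, index):
    """Follow a space-separated IDREFS attribute to the referenced elements."""
    return [index[ref] for ref in elem.get(attr, "").split()]


root = ET.fromstring(BANK2)
index = build_id_index(root)
owners_of_a1 = [c.findtext("customer-name")
                for c in resolve_idrefs(index["A-1"], "owners", index)]
print(owners_of_a1)
```

Note that nothing in the index records whether a reference points to a customer or to an account, which is the lack of typing in IDREFs that the limitations of DTDs are concerned with.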

Document type definitions are strongly connected to the document formatting heritage of XML. Because of this, they are unsuitable in many ways for serving as the type structure of XML for data processing applications. Nevertheless, a large number of data exchange formats are defined in terms of DTDs, since they were part of the original standard. Here are some of the limitations of DTDs as a schema mechanism.

• Individual text elements and attributes cannot be further typed. For example, the element balance cannot be constrained to be a positive number. The lack of such constraints is problematic for data processing and exchange applications, which must then contain code to verify the types of elements and attributes.

• It is difficult to use the DTD mechanism to specify unordered sets of subelements. Order is seldom important for data exchange. While the combination of alternation (the | operation) and the * operation, as in Figure 5.3(I), permits the specification of unordered collections of tags, it is much more difficult to specify that each tag may appear only once.

• There is a lack of typing in IDs and IDREFs. Thus, there is no way to specify the type of element to which an IDREF or IDREFS attribute should refer. As a result, the DTD in Figure 5.3(II) does not prevent the “owners” attribute of an account element from referring to other accounts, even though this makes no sense.

Q4) Define XML Schema?

A4)

XML Schema

An effort to redress many of these DTD deficiencies resulted in a more sophisticated schema language, XMLSchema. We present here an example of XMLSchema, and list some areas in which it improves on DTDs, without giving full details of XMLSchema’s syntax.

The figure below shows how the DTD in Figure 5.3(I) can be represented by XMLSchema. The first element is the root element bank, whose type is declared later. The example then defines the types of elements account, customer, and depositor. Observe the use of types xsd:string and xsd:decimal to constrain the types of data elements. Finally, the example defines the type BankType as containing zero or more occurrences of each of account, customer, and depositor.

The default for both minimum and maximum occurrences is 1, so these must be explicitly specified to allow zero or more accounts, depositors, and customers. Among the benefits that XMLSchema offers over DTDs are these:

• It allows user-defined types to be created.

• It allows the text that appears in elements to be constrained to specific types, such as numeric types in specific formats, or even more complicated types such as lists or unions.

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsd:element name="bank" type="BankType"/>
<xsd:element name="account">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="account-number" type="xsd:string"/>
      <xsd:element name="branch-name" type="xsd:string"/>
      <xsd:element name="balance" type="xsd:decimal"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:element name="customer">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="customer-name" type="xsd:string"/>
      <xsd:element name="customer-street" type="xsd:string"/>
      <xsd:element name="customer-city" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:element name="depositor">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="customer-name" type="xsd:string"/>
      <xsd:element name="account-number" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
<xsd:complexType name="BankType">
  <xsd:sequence>
    <xsd:element ref="account" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element ref="customer" minOccurs="0" maxOccurs="unbounded"/>
    <xsd:element ref="depositor" minOccurs="0" maxOccurs="unbounded"/>
  </xsd:sequence>
</xsd:complexType>
</xsd:schema>

XMLSchema version of DTD

• It allows types to be restricted to create specialized types, for example by specifying minimum and maximum values.

• It allows complex types to be extended by using a form of inheritance.

• It is a superset of DTDs.

• It allows uniqueness and foreign key constraints.

• It is integrated with namespaces to allow different parts of a document to conform to different schemas.

• It is itself specified in XML syntax, as the figure above shows.

However, the price paid for these features is that XMLSchema is significantly more complicated than DTDs.

Q5) Write a note on XPATH and XSLT?

A5)

XPath

XPath addresses parts of an XML document by means of path expressions. The language can be viewed as an extension of the simple path expressions in object-oriented and object-relational databases. A path expression in XPath is a sequence of location steps separated by “/”. The result of a path expression is a set of values. For example, on the document in Figure 5.3(III), the XPath expression /bank-2/customer/customer-name would return these elements:

<customer-name>Joe</customer-name>
<customer-name>Lisa</customer-name>
<customer-name>Mary</customer-name>

The expression /bank-2/customer/customer-name/text() would return the same names, but without the enclosing tags. As in a directory hierarchy, the initial ’/’ indicates the root of the document. Path expressions are evaluated from left to right. As a path expression is evaluated, the result of the path at any point consists of a set of nodes from the document.

When an element name, such as customer, appears before the next ’/’, it refers to all elements of the specified name that are children of elements in the current element set. Since multiple children can have the same name, the number of nodes in the node set can increase or decrease with each step. Attribute values can also be accessed, using the “@” symbol. For example, /bank-2/account/@account-number returns a set of all values of account-number attributes of account elements.

XPath supports a number of other features:

• Selection predicates may follow any step in a path, and are contained in square brackets. For example, /bank-2/account[balance > 400] returns account elements with a balance value greater than 400, while /bank-2/account[balance > 400]/@account-number returns the account numbers of those accounts.

We can test the existence of a subelement by listing it without any comparison operation; for example, if we removed just “> 400” from the above, the expression would return the account numbers of all accounts that have a balance subelement, regardless of its value.

• XPath provides several functions that can be used as part of predicates, including testing the position of the current node in the sibling order and counting the number of nodes matched. For example, the path expression /bank-2/account[count(./customer) > 2] returns accounts with more than 2 customers. The Boolean connectives and and or can be used in predicates, while the function not(...) can be used for negation.

• The function id(“foo”) returns the node (if any) with an attribute of type ID and value “foo”. The function id can even be applied to sets of references, or to strings containing multiple references separated by blanks, such as IDREFS. For example, the path /bank-2/account/id(@owners) returns all customers referred to from the owners attribute of account elements.

• The | operator allows expression results to be unioned. For example, if the DTD of bank-2 also contained elements for loans, with an attribute borrower of type IDREFS identifying loan borrowers, the expression /bank-2/account/id(@owners) | /bank-2/loan/id(@borrower) gives customers with either accounts or loans. However, the | operator cannot be nested inside other operators.

• An XPath expression can skip multiple levels of nodes by using “//”. For example, the expression /bank-2//customer-name finds any customer-name element anywhere under the /bank-2 element, regardless of the element in which it is contained. This example illustrates the ability to find required data without full knowledge of the schema.

• Each step in the path need not select from the children of the nodes in the current node set. In fact, this is just one of several directions along which a step in the path may proceed, such as parents, siblings, ancestors, and descendants. We omit details, but note that “//”, described above, is a short form for specifying “all descendants,” while “..” specifies the parent.
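Several of the path expressions above can be tried out directly. The following is a minimal sketch using Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath: child steps, “//”, and existence predicates work, but numeric comparisons such as [balance > 400] do not and are done in ordinary Python here. The sample data, including the account numbers and balances, is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Sample data invented for this sketch.
BANK2 = """
<bank-2>
  <account account-number="A-1"><balance>500</balance></account>
  <account account-number="A-2"><balance>300</balance></account>
  <account account-number="A-3"/>
  <customer><customer-name>Joe</customer-name></customer>
  <customer><customer-name>Lisa</customer-name></customer>
</bank-2>
"""

# fromstring returns the <bank-2> element itself, so paths start with "./".
root = ET.fromstring(BANK2)

# /bank-2/customer/customer-name/text()
names = [e.text for e in root.findall("./customer/customer-name")]

# /bank-2//customer-name
names_anywhere = [e.text for e in root.findall(".//customer-name")]

# /bank-2/account/@account-number
numbers = [a.get("account-number") for a in root.findall("./account")]

# /bank-2/account[balance]/@account-number  (existence test only)
with_balance = [a.get("account-number")
                for a in root.findall("./account[balance]")]

# /bank-2/account[balance > 400]/@account-number
# The comparison is not supported by ElementTree, so filter in Python.
over_400 = [a.get("account-number")
            for a in root.findall("./account[balance]")
            if float(a.findtext("balance")) > 400]

print(names, numbers, with_balance, over_400)
```

A full XPath implementation would evaluate every expression in the comments as written; the Python filter is only a workaround for the subset supported by the standard library.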

XSLT

A style sheet is a representation of formatting options for a document, usually stored outside the document itself, so that formatting is separate from content. For example, a style sheet for HTML might specify the font to be used on all headers, and thus replace a large number of font declarations in the HTML page.

The XML Stylesheet Language (XSL) was originally designed for generating HTML from XML, and is thus a logical extension of HTML style sheets. The language includes a general-purpose transformation mechanism, called XSL Transformations (XSLT), which can be used to transform one XML document into another XML document, or into other formats such as HTML. XSLT transformations are quite powerful, and in fact XSLT can even act as a query language.

<xsl:template match="/bank-2/customer">
  <customer>
    <xsl:value-of select="customer-name"/>
  </customer>
</xsl:template>
<xsl:template match="."/>

Figure 5.4(I) Using XSLT to wrap results in new XML elements.

XSLT transformations are expressed as a series of recursive rules, called templates. In their basic form, templates allow the selection of nodes in an XML tree by an XPath expression. However, templates can also generate new XML content, so that selection and content generation can be mixed in natural and powerful ways. While XSLT can be used as a query language, its syntax and semantics are quite different from those of SQL.

A simple template for XSLT consists of a match part and a select part. Consider this XSLT code:

<xsl:template match="/bank-2/customer">
  <xsl:value-of select="customer-name"/>
</xsl:template>
<xsl:template match="."/>

The xsl:template match statement contains an XPath expression that selects one or more nodes. The first template matches customer elements that occur as children of the bank-2 root element. The xsl:value-of statement enclosed in the match statement outputs values from the nodes in the result of the XPath expression. The first template outputs the value of the customer-name subelement.

The second template matches all nodes. This is required because the default behavior of XSLT on subtrees of the input document that do not match any template is to copy the subtrees to the output document. XSLT copies any tag that is not in the xsl namespace unchanged to the output. Figure 5.4(I) shows how to use this feature to make each customer name a subelement of a “<customer>” element, by placing the xsl:value-of statement between <customer> and </customer>.

<xsl:template match="/bank">
  <customers>
    <xsl:apply-templates/>
  </customers>
</xsl:template>
<xsl:template match="customer">
  <customer>
    <xsl:value-of select="customer-name"/>
  </customer>
</xsl:template>
<xsl:template match="."/>

Figure 5.4(II)Applying rules recursively.

Structural recursion is a key part of XSLT. Recall that elements and subelements naturally form a tree structure. The idea of structural recursion is this: when a template matches an element in the tree structure, XSLT can use structural recursion to apply template rules recursively on subtrees, instead of just outputting a value. It applies rules recursively by means of the xsl:apply-templates directive, which appears inside other templates.

For example, the results of the previous query can be placed in a surrounding <customers> element by the addition of a rule using xsl:apply-templates, as in Figure 5.4(II). The new rule matches the outer “bank” tag, and constructs a result document by applying all other templates to the subtrees appearing within the bank element, but wrapping the results in the given <customers></customers> element.

Without the recursion forced by the <xsl:apply-templates/> clause, the template would output <customers></customers>, and then apply the other templates on the subelements. In fact, structural recursion is critical to constructing well-formed XML documents, since XML documents must have a single top-level element containing all other elements in the document.
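The control flow of structural recursion can be imitated in ordinary code. The following is a rough emulation in Python of the rules in Figure 5.4(II), not an XSLT processor: the functions transform and apply_templates, and the sample data, are assumptions made for illustration. Each branch of transform plays the role of one template, and apply_templates plays the role of the xsl:apply-templates directive.

```python
import xml.etree.ElementTree as ET

# Sample input invented for this sketch.
BANK = """
<bank>
  <customer><customer-name>Joe</customer-name></customer>
  <account><balance>500</balance></account>
  <customer><customer-name>Lisa</customer-name></customer>
</bank>
"""


def apply_templates(node, out_parent):
    """Emulate <xsl:apply-templates/>: run the matching rule on each child."""
    for child in node:
        transform(child, out_parent)


def transform(node, out_parent):
    """Pick the rule that matches this node and append its output."""
    if node.tag == "bank":
        # Rule 1: wrap whatever the children produce in <customers>.
        wrapper = ET.SubElement(out_parent, "customers")
        apply_templates(node, wrapper)
    elif node.tag == "customer":
        # Rule 2: emit <customer> containing the customer-name value.
        customer = ET.SubElement(out_parent, "customer")
        customer.text = node.findtext("customer-name")
    # Rule 3: any other node matches an empty rule and produces nothing.


def run(xml_text):
    holder = ET.Element("result")        # scratch parent for the output
    transform(ET.fromstring(xml_text), holder)
    return holder[0]                     # the single top-level output element


output = run(BANK)
output_text = ET.tostring(output, encoding="unicode")
print(output_text)
```

Because the wrapper element is created before the recursive call and the children's output is attached to it, the result always has a single top-level element, which is the property the text attributes to structural recursion.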

XSLT supports a feature called keys, which allows lookup of elements by using values of subelements or attributes; the goals are similar to those of the id() function in XPath, but keys permit attributes other than ID attributes to be used. Keys are defined by an xsl:key directive, which has three parts, for example:

<xsl:key name="acctno" match="account" use="account-number"/>

The name attribute is used to distinguish different keys. The match attribute specifies which nodes the key applies to. Finally, the use attribute specifies the expression to be used as the value of the key. Note that the expression need not be unique to an element; that is, more than one element may have the same expression value. In the example, the key named acctno specifies that the account-number subelement of account should be used as a key for that account.

Keys can subsequently be used in templates as part of any pattern through the key function. This function takes the name of the key and a value, and returns the set of nodes that match that value. Thus, the XML node for account “A-401” can be referenced as key(“acctno”, “A-401”).

<xsl:key name="acctno" match="account" use="account-number"/>
<xsl:key name="custno" match="customer" use="customer-name"/>
<xsl:template match="depositor">
  <cust-acct>
    <xsl:value-of select="key('custno', customer-name)"/>
    <xsl:value-of select="key('acctno', account-number)"/>
  </cust-acct>
</xsl:template>
<xsl:template match="."/>

Figure 5.4(III) Joins in XSLT.
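The join performed through keys in Figure 5.4(III) amounts to building lookup tables and probing them once per depositor. The following is a minimal sketch of that idea in Python rather than XSLT; the function build_key and the exact arrangement of the sample data are assumptions made for illustration, with names and account numbers borrowed from the earlier bank examples.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Sample data arranged for this sketch, reusing names from earlier figures.
BANK = """
<bank>
  <account><account-number>A-101</account-number><branch-name>Downtown</branch-name></account>
  <account><account-number>A-102</account-number><branch-name>Perryridge</branch-name></account>
  <customer><customer-name>Johnson</customer-name><customer-city>Palo Alto</customer-city></customer>
  <customer><customer-name>Hayes</customer-name><customer-city>Harrison</customer-city></customer>
  <depositor><customer-name>Johnson</customer-name><account-number>A-101</account-number></depositor>
  <depositor><customer-name>Hayes</customer-name><account-number>A-102</account-number></depositor>
</bank>
"""


def build_key(root, match, use):
    """Like xsl:key: map each value of subelement `use` to the matching nodes."""
    table = defaultdict(list)            # a key value may map to several nodes
    for node in root.iter(match):
        table[node.findtext(use).strip()].append(node)
    return table


root = ET.fromstring(BANK)
acctno = build_key(root, "account", "account-number")
custno = build_key(root, "customer", "customer-name")

# For each depositor, look up the matching customer and account nodes.
pairs = []
for dep in root.iter("depositor"):
    for cust in custno[dep.findtext("customer-name").strip()]:
        for acct in acctno[dep.findtext("account-number").strip()]:
            pairs.append((cust.findtext("customer-city"),
                          acct.findtext("branch-name")))

print(pairs)
```

The tables are lists of nodes rather than single nodes because, as noted for xsl:key, a key value need not be unique to one element.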

With the help of keys we can implement some forms of joins. The code in Figure 5.4(III) can be applied to XML data that conforms to the DTD in Figure 5.3(I): the key function joins the depositor elements with matching customer and account elements. The result of the query consists of pairs of customer and account elements enclosed within cust-acct elements. XSLT also allows nodes to be sorted. A simple example shows how xsl:sort would be used in our style sheet to return customer elements sorted by name:

<xsl:template match="/bank">
  <xsl:apply-templates select="customer">
    <xsl:sort select="customer-name"/>
  </xsl:apply-templates>
</xsl:template>
<xsl:template match="customer">
  <customer>
    <xsl:value-of select="customer-name"/>
    <xsl:value-of select="customer-street"/>
    <xsl:value-of select="customer-city"/>
  </customer>
</xsl:template>
<xsl:template match="."/>

In the above example, xsl:apply-templates has a select attribute, which constrains it to be applied only to customer subelements. The xsl:sort directive within the xsl:apply-templates element causes nodes to be sorted before they are processed by the next set of templates. Options exist to allow sorting on multiple subelements or attributes, by numeric value, and in descending order.

Q6) Explain storage of XML Data?

A6)

Storage of XML Data

Many applications need to store XML data. One way to store XML data is as documents in a file system, and a second is to build a special-purpose database to store XML data. Another approach is to convert the XML data to a relational representation and store it in a relational database.

A. Non-relational Data Stores

There are several alternatives for storing XML data in non-relational data-storage systems:

• Store in flat files. Since XML is primarily a file format, a natural storage mechanism is simply a flat file. This approach has many of the drawbacks of using file systems as the basis for database applications. In particular, it lacks data isolation, atomicity, concurrent access, and security. However, the wide availability of XML tools that work on file data makes it relatively easy to access and query XML data stored in files. Thus, this storage format may be sufficient for some applications.

• Create an XML database. XML databases are databases that use XML as their basic data model. Early XML databases implemented the Document Object Model on a C++ based object-oriented database. This allows much of the object-oriented database infrastructure to be reused, while providing a standard XML interface. The addition of XQuery or other XML query languages provides declarative querying. Other implementations have built the entire XML storage and querying infrastructure on top of a storage manager that provides transactional support.

Although several databases designed specifically to store XML data have been built, building a full-featured database system from ground up is a very complex task. Such a database must support not only XML data storage and querying but also other database features such as transactions, security, support for data access from clients, and a variety of administration facilities. It makes sense to instead use an existing database system to provide these facilities and implement XML data storage and querying either on top of the relational abstraction, or as a layer parallel to the relational abstraction.
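Converting XML data to a relational representation, as mentioned above, can be sketched in a few lines. The following Python example uses the standard sqlite3 and xml.etree.ElementTree modules to load nested customer and account elements, modeled on Figure 5.2(II), into two tables. The table and column names and the function shred are assumptions chosen for illustration, and this is only one of many possible mappings.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Nested data modeled on the bank-1 example of Figure 5.2(II).
BANK1 = """
<bank-1>
  <customer>
    <customer-name> Johnson </customer-name>
    <account><account-number> A-101 </account-number><branch-name> Downtown </branch-name></account>
    <account><account-number> A-201 </account-number><branch-name> Brighton </branch-name></account>
  </customer>
  <customer>
    <customer-name> Hayes </customer-name>
    <account><account-number> A-102 </account-number><branch-name> Perryridge </branch-name></account>
  </customer>
</bank-1>
"""


def shred(xml_text, conn):
    """Store nested customer/account elements as rows in two tables."""
    conn.execute("CREATE TABLE account ("
                 "account_number TEXT PRIMARY KEY, branch_name TEXT)")
    conn.execute("CREATE TABLE depositor ("
                 "customer_name TEXT, account_number TEXT)")
    root = ET.fromstring(xml_text)
    for cust in root.iter("customer"):
        name = cust.findtext("customer-name").strip()
        for acct in cust.iter("account"):
            number = acct.findtext("account-number").strip()
            branch = acct.findtext("branch-name").strip()
            # An account nested under several customers is stored only once.
            conn.execute("INSERT OR IGNORE INTO account VALUES (?, ?)",
                         (number, branch))
            conn.execute("INSERT INTO depositor VALUES (?, ?)",
                         (name, number))


conn = sqlite3.connect(":memory:")
shred(BANK1, conn)

# Once shredded, the data can be queried with ordinary SQL.
rows = conn.execute(
    "SELECT account_number FROM depositor "
    "WHERE customer_name = ? ORDER BY account_number",
    ("Johnson",)).fetchall()
print(rows)
```

The relational form removes the redundancy of the nested representation, at the price of needing a join between depositor and account to reassemble a customer's accounts.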

Q7) List and explain XML Applications?

A7)

There are several applications of XML for storing and communicating (exchanging) data and for accessing Web services (information resources).

1. Storing Data with Complex Structure

Many applications need to store data that are structured but are not easily modeled as relations. An example is the set of user preferences that must be stored by an application such as a browser. There are usually a large number of fields, such as home page, security settings, language settings, and display settings, that must be recorded. Some of the fields are multivalued, for example a list of trusted sites, or even ordered lists, for example a list of bookmarks.

Applications traditionally used some form of textual representation to store such data. Today, the majority of such applications store this configuration information in XML format. The ad hoc textual representations used earlier required effort to design, and further effort to write parsers that read the file and convert the data into a form that a program can use.

The XML representation avoids both these steps. XML-based representations are now widely used for storing documents, spreadsheet data and other data that are part of office application packages. The Open Document Format (ODF), supported by the Open Office software suite as well as other office suites, and the Office Open XML format, supported by the Microsoft Office suite, are document representation standards based on XML.

They are the two most widely used formats for editable document representation.

XML is also used to represent data with complex structure that must be exchanged between different parts of an application. For instance, a database system may represent a query execution plan by using XML. This allows one part of the system to generate the query execution plan and another part to display it, without using a shared data structure. For example, the data may be generated at a server system and sent to a client system where the data are displayed.
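The browser-preferences case described above can be sketched as follows. This is a minimal illustration using Python's standard xml.etree.ElementTree module; the document layout, the element names and the helper load_preferences are hypothetical and do not reflect the format of any real browser. The point is that a generic XML parser replaces the hand-written parser an ad hoc text format would need.

```python
import xml.etree.ElementTree as ET

# Hypothetical preference document; all element names are assumptions.
PREFS_XML = """<preferences>
  <home_page>https://example.org/</home_page>
  <language>en</language>
  <trusted_sites>
    <site>example.org</site>
    <site>example.com</site>
  </trusted_sites>
  <bookmarks>
    <bookmark title="News">https://example.org/news</bookmark>
    <bookmark title="Docs">https://example.org/docs</bookmark>
  </bookmarks>
</preferences>"""


def load_preferences(xml_text):
    """Convert the XML preferences into plain Python values."""
    root = ET.fromstring(xml_text)
    return {
        "home_page": root.findtext("home_page"),
        "language": root.findtext("language"),
        # Multivalued field: a set, since order does not matter here.
        "trusted_sites": {s.text for s in root.findall("trusted_sites/site")},
        # Ordered list: findall returns elements in document order.
        "bookmarks": [
            (b.get("title"), b.text) for b in root.findall("bookmarks/bookmark")
        ],
    }


prefs = load_preferences(PREFS_XML)
print(prefs["bookmarks"])
```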

2. Standardized Data Exchange Formats

XML-based standards for the representation of data have been developed for a range of specialized applications, ranging from business applications such as banking and shipping to scientific applications such as chemistry and biology. Some examples:

• The chemical industry needs information about chemicals, such as their molecular structure, and a range of important properties, such as boiling and melting points, calorific values, and solubility in various solvents. ChemML is a standard for representing such information.

• In shipping, carriers of goods and customs and tax officials need shipment records containing detailed information about the goods being shipped, from whom and from where they were sent, to whom and to where they are being shipped, the value of the goods, and so on.

• An online marketplace in which businesses can buy and sell goods (a so-called business-to-business, or B2B, market) needs information such as product catalogs, including detailed product descriptions and price information, product inventories, quotes for a proposed sale, and purchase orders.

For instance, the RosettaNet standards for e-business applications define XML schemas and semantics for representing data, as well as standards for message exchange. Using normalized relational schemas to model such complex data requirements would result in a large number of relations that do not correspond directly to the objects being modeled. The relations would often have large numbers of attributes; the explicit representation of attribute/element names along with values in XML helps avoid confusion between attributes.

Nested element representations help reduce the number of relations that must be represented, as well as the number of joins required to get the required information, at the possible cost of redundancy. For example, in our university example, listing departments with course elements nested within department elements gives a format that is more natural for some applications than a normalized representation.
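The trade-off between nesting and joins can be illustrated with a small sketch. The department and course data below, the attribute names and the two helper functions are assumptions made for this example only. The nested XML version reads the courses of each department directly from the department element, while the flat version has to match each course row to its department, which is the join that nesting avoids.

```python
import xml.etree.ElementTree as ET

# Hypothetical nested representation: courses sit inside their department.
NESTED_XML = """<university>
  <department dept_name="Comp. Sci.">
    <course course_id="CS-101"><title>Intro. to Computer Science</title></course>
    <course course_id="CS-347"><title>Database System Concepts</title></course>
  </department>
  <department dept_name="Biology">
    <course course_id="BIO-301"><title>Genetics</title></course>
  </department>
</university>"""

# The same (hypothetical) data as two flat relations.
DEPARTMENTS = [("Comp. Sci.",), ("Biology",)]
COURSES = [
    ("CS-101", "Intro. to Computer Science", "Comp. Sci."),
    ("CS-347", "Database System Concepts", "Comp. Sci."),
    ("BIO-301", "Genetics", "Biology"),
]


def courses_nested(xml_text):
    """No join needed: each department element already holds its courses."""
    root = ET.fromstring(xml_text)
    return {
        dept.get("dept_name"): [c.findtext("title") for c in dept.findall("course")]
        for dept in root.findall("department")
    }


def courses_joined(departments, courses):
    """Flat version: match every course row to its department (a join)."""
    return {
        name: [title for _, title, dept in courses if dept == name]
        for (name,) in departments
    }


print(courses_nested(NESTED_XML) == courses_joined(DEPARTMENTS, COURSES))
```

The redundancy mentioned above would appear if a course belonged to more than one department: its element would then have to be repeated under each of them.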

3. Web Services

Applications often require data from outside the organization, or from another department within the same organization that uses a different database. In many such situations, the outside organization or department is not willing to allow direct access to its database using SQL, but is willing to provide limited forms of information through predefined interfaces.

When the information is to be used directly by a person, organizations provide Web-based forms, where users can input values and get back the desired information in HTML form. However, there are many applications where such information must be accessed by software programs rather than by end users. Providing the results of a query in XML form is a clear requirement. In addition, it makes sense to specify the input values to the query in XML format as well. In effect, the provider of the information defines procedures whose input and output are both in XML format.

The HTTP protocol is used to communicate the input and output information, since it is widely used and can pass through the firewalls that institutions use to keep out unwanted traffic from the Internet. The Simple Object Access Protocol (SOAP) defines a standard for invoking procedures, using XML for representing the procedure input and output. SOAP defines a standard XML schema for representing the name of the procedure and result status indicators such as failure/error indicators. The procedure parameters and results are application-dependent XML data embedded within the SOAP XML envelope.
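As a rough sketch of what such a message looks like, the Python code below builds a SOAP 1.1 request and interprets a reply using only the standard library. The envelope namespace is the one defined by SOAP 1.1, where the application data is carried inside the Body element of the Envelope. Everything else is an assumption made for illustration: the application namespace APP_NS, the procedure name getCourses, its parameter and the helper functions are hypothetical, and the code does not send anything over HTTP.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
APP_NS = "http://example.org/university"  # hypothetical application namespace

# Use the conventional "soap" prefix when serializing.
ET.register_namespace("soap", SOAP_NS)


def build_request(procedure, params):
    """Wrap a procedure call and its parameters in a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}{procedure}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{APP_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_response(xml_text):
    """Return ("fault", message) or ("ok", {name: text}) for a SOAP 1.1 reply."""
    body = ET.fromstring(xml_text).find(f"{{{SOAP_NS}}}Body")
    fault = body.find(f"{{{SOAP_NS}}}Fault")
    if fault is not None:
        # In SOAP 1.1 the faultstring child is not namespace-qualified.
        return "fault", fault.findtext("faultstring")
    result = body[0]
    # Strip the namespace part of each tag to get the plain element name.
    return "ok", {child.tag.split("}")[-1]: child.text for child in result}


# Hypothetical call: ask for the courses of one department.
request = build_request("getCourses", {"dept_name": "Biology"})
print(request)
```

In a real deployment the request string would be sent as the body of an HTTP POST, and the reply passed to parse_response.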
