What is a Database?
In an organisation, on an everyday basis, a lot of data is generated. If that data is not managed properly, a lot of relevant data related to the organisation might get lost. Data can be generated internally, such as employee data, or can be collected externally, such as customer information or sales data. Database management helps an organisation in effectively managing its data resources.
For an organisation, private or public, whether it is a banking transaction or a rail/air ticket booking application, a database plays an important role. Until a few years ago, databases were managed in a traditional format, where data was usually stored in textual or numeric form. Now, we can also store audio, video, and pictures in databases.
These databases are generally referred to as multimedia databases. In addition, we can even store maps, weather data, and satellite information in what is called Geographical Information Systems (GISs), which is a combination of cartography, statistical analysis, and database technology.
This chapter starts by introducing the concept of a database and its usage in an organisation. Next, it discusses the types of databases. Thereafter, the chapter explains the various functions and objectives of database management. Further, it discusses the major components of database management. Towards the end, the chapter discusses various database models.
Meaning of Database
Before we study databases and database management, we need to look at the meaning of two terms that are used frequently while discussing database management, namely, data and information.
Data, in the simplest terms, is a collection of raw facts and figures. The term “data” is the plural of the Latin word ‘datum’, which means something given. It is the key ingredient for any database system. In fact, data is necessary for a database system to produce any type of information.
To be more specific, data represents facts, observations, assumptions, and occurrences regarding people, processes, functions, and events related to an organization’s internal and external environment.
Data has to be in a structured form, i.e., in a form from which relevant information can be derived by processing it. The information generated by the processing of data helps in supporting business processes in terms of decision-making and improved efficiency.
Data can be of various types from the perspective of an organization. Mainly, there are five types of data, which are depicted in Figure:
The types of data are discussed as follows:
- Text: It refers to data that is in the form of alphabets and numbers, for example, employee identification (ID) number, which is a unique number of identification for employees working in an organization.
- Graphic: It refers to pictorial or any other graphical form of data, for example, a picture of employees working in an organization.
- Audio: It refers to data that is in the form of sound, for example, a recorded audio message from a CEO for the employees.
- Video: It refers to data that is in the form of a combination of picture and sound, for example, a video of the production floor on a particular day.
- Pre-specified Information: It refers to data in any of the above forms, which have been used for a previous process in an information system, for example, details of employee attendance.
In an organization, data may be generated from multiple sources.
However, it must be classified as one of the types of data and properly validated before it is analyzed for information. Data can come from both internal and external sources.
The following are sources that can be categorized as internal or external:
- Data From Internal Sources: This category includes data for database systems that exist within an organization. Since the sources of such data exist within the organization, the data is comparatively easy to collect.
Internal data can be sourced from the following divisions of an organization:
- Accounting and financial details, for example, financial planning of the current year
- Sales reports, for example, daily sales details
- Organizational policy and procedures, for example, a list of working and non-working days
- Business events, for example, minutes of board meeting
- Information from an intranet, for example, an opinion poll about an anticipated change in an organization
- Previously obtained data, for example, monthly income data while calculating profits
- Research and development reports, for example, training needs
- Data From External Sources: This includes data that is sourced from outside the organization. Various sources in the business environment outside an organization have a significant impact on organizational functioning. These sources provide data that is very significant for the decision-making process of an organization.
External data can be obtained from the following sources outside of an organization:
- Supplier details, which provide data about raw materials supplied by them
- Competitor details, which provide data about the competitive environment prevailing in the market
- Customer details, which provide data about products procured by them and details about consumer behavior
- Market reports, which include data about market conditions
Data is viewed from an organizational point of view in the following two ways, as depicted in the figure:
- Logical View: This represents an external view of data, i.e., the view presented to a user of data, who is not concerned with how the data is stored internally. The logical view of data refers to a data format that is meaningful to the user of the data and also to the software programs that process it.
This view allows the user to understand the data from his/her perspective. There can be a number of logical views depending on the user of the system. Every user can view data differently based on his/her need. Different departments of an organization can read the same data and draw different conclusions.
- Physical View: This represents an internal view of data, i.e., how the data is stored internally in the storage. The physical view of data represents the physical structure of data, which signifies where and how the data has been stored in the system.
This view is used by an internal system of computers and by system experts to make efficient usage of storage in the system. There is only one physical view, and in most cases, it does not change. It is of significance to database administrators who manage the database.
For example, ABC University has a huge database of colleges affiliated to it, including undergraduate colleges. The data relating to all the colleges and students enrolled in the university is stored centrally at one location in the university server.
All the data in the database is organized in such a way that the contents can be accessed, managed, and updated for smooth functioning of the university. This centralized storage of data is referred to as the physical view of data.
From the database, we can track all the undergraduate colleges that conduct examinations under this university, the departments in each college, the students enrolled in each college, etc. The different conclusions drawn from the same database refer to the logical view of data.
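The distinction between one physical store and several logical views can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the table names, view names, and sample rows are hypothetical, and an in-memory SQLite database stands in for the university server.

```python
import sqlite3

# One central store (a stand-in for the university server).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE college (id INTEGER PRIMARY KEY, name TEXT, level TEXT);
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT,
                          college_id INTEGER REFERENCES college(id));
    INSERT INTO college VALUES (1, 'North College', 'undergraduate'),
                               (2, 'South College', 'postgraduate');
    INSERT INTO student VALUES (1, 'Asha', 1), (2, 'Ben', 1), (3, 'Chen', 2);

    -- Logical view 1: an examinations office only needs undergraduate colleges.
    CREATE VIEW undergraduate_colleges AS
        SELECT name FROM college WHERE level = 'undergraduate';

    -- Logical view 2: an admissions office needs enrolment counts per college.
    CREATE VIEW enrolment_by_college AS
        SELECT c.name AS college, COUNT(s.id) AS students
        FROM college c LEFT JOIN student s ON s.college_id = c.id
        GROUP BY c.id ORDER BY c.id;
""")

# Two users read the same stored data through different logical views.
undergraduate = conn.execute("SELECT * FROM undergraduate_colleges").fetchall()
enrolment = conn.execute("SELECT * FROM enrolment_by_college").fetchall()
print(undergraduate)
print(enrolment)
```

Neither view says anything about how or where the rows are stored; that remains the concern of the physical view.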
Processed and interpreted data is called information, i.e., data has been evaluated and worked upon, and some conclusions have been drawn from it. Information is created when data is organized into charts, summaries, averages, and ranked lists, which help an organization make decisions.
Decisions based on this acquired information are referred to as “informed decisions”. Information is organized, structured, and derived by processing data collected from various sources. Information has a specific meaning in the context from which the data has been derived.
Collection of information contributes to knowledge. Information can be directly used for decision-making in an organization, for example, the pattern of business transactions in a day.
Information for one purpose can be used as data for another purpose. For example, when you purchase something from a departmental store, a number of data items are put together, such as the name of items purchased, the number of items purchased, price, tax, and the total amount paid. Separately, these are all data items, but collectively, these items represent information about a business transaction from an organizational point of view.
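The departmental store example can be sketched in a few lines of Python. The item names, prices, and tax rate below are made up for illustration; the point is only that separate data items become information once they are combined into a transaction summary.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    name: str          # data item: name of the item purchased
    quantity: int      # data item: number of units purchased
    unit_price: float  # data item: price per unit


def summarize_transaction(items, tax_rate):
    """Combine raw data items into information about one transaction."""
    subtotal = sum(item.quantity * item.unit_price for item in items)
    tax = round(subtotal * tax_rate, 2)
    return {
        "items_sold": sum(item.quantity for item in items),
        "subtotal": round(subtotal, 2),
        "tax": tax,
        "total": round(subtotal + tax, 2),
    }


# Hypothetical purchase: two notebooks and five pens, taxed at 10%.
basket = [LineItem("notebook", 2, 3.50), LineItem("pen", 5, 1.00)]
receipt = summarize_transaction(basket, tax_rate=0.10)
print(receipt)
```

The `receipt` dictionary could, in turn, serve as one data item in a daily sales report, which is how information for one purpose becomes data for another.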
Now, after studying the basic terms of database management, let us discuss databases in the following section.
A database is an organized collection of data, arranged in a logical and integrated manner. The data in a database is related in a meaningful way. This collection of data forms a basis for data storage, and the data can be accessed for information processing. Thus, the database is organized in such a way that it can be easily accessed, managed, and updated with recently collected data. A database provides data for many business applications as and when required.
Examples of databases are as follows:
- Train booking database
- Employee details database
- Sales database
- Airlines booking database
- Cricket database
The main features of a database are as follows:
- It should be well organized.
- It should be related.
- It should be easily accessible/retrievable.
- It should provide an easy data-processing base.
Figure depicts a centralized database providing data to different computers:
A database can also be defined as a repository of data that is of interest to an enterprise, from which data can be retrieved and in which data can be stored efficiently. It reduces redundancy in terms of storage space, which, in turn, reduces data inconsistency.
It is a store that enforces data integrity, i.e., checks the correctness of data, thereby enhancing the efficiency of the database system. A database system hides the complexity of its internal storage structure from its user by providing a user-friendly interface. It also supports a multiuser environment, wherein multiple users can interact with the interface of a database through a simple query language.
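As a small sketch of what enforcing data integrity can look like in practice, the following uses Python's built-in `sqlite3` module. The `employee` table, its columns, and the sample rows are hypothetical; the constraints shown (primary key, `NOT NULL`, `CHECK`) are standard SQL features that SQLite supports.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,               -- no two rows may share an ID
        name   TEXT NOT NULL,                     -- a name must be supplied
        salary REAL NOT NULL CHECK (salary >= 0)  -- negative salaries are rejected
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 52000)")


def try_insert(row):
    """Attempt an insert and report whether the database accepted it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert((2, "Ben", 48000)),   # valid row
    try_insert((2, "Chen", 51000)),  # duplicate primary key
    try_insert((3, "Dee", -10)),     # violates the CHECK constraint
]

# A simple query retrieves only the rows that passed the integrity checks.
names = [row[0] for row in conn.execute("SELECT name FROM employee ORDER BY emp_id")]
print(results)
print(names)
```

The user only writes short SQL statements; how the rows are laid out on disk or in memory stays hidden behind that interface.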
Types of Databases
In a real-world scenario, different types of databases are used to store organisational data, depending on factors such as the type of data to be stored, the organisational environment, and the cost of implementation. Based on these criteria, there are some emerging databases that are in demand nowadays.
Figure shows the types of databases:
- Distributed Database
- Object-oriented Database
- Temporal Database
- Multimedia Database
- Deductive Database
- Semantic Database
- Mobile Database
Now, let us discuss these databases in the following section:
A distributed database is a database that is stored in different places in a physical network, i.e., it is distributed over a network. It is stored in servers, which are placed at different locations in a network. It can also exist in a distributed manner in different locations of an intranet.
Usually, in this case, databases are replicated and stored at different locations. This leads to the disadvantage that changes in the master database are not automatically reflected in the distributed copies of the database. Therefore, to keep these databases up-to-date, two processes are used, namely, replication and duplication.
In the case of replication, the changes in every database are calculated and, based on the result, the databases are updated without modifying the existing data. In the duplication process, any changes made in the master database are implemented in the other databases in different locations in a timely manner. Replication proves to be more expensive than duplication because manipulating data in every database involves more resources than duplication requires.
A distributed database is widely applicable where business data needs to be shared over the Internet. It helps to keep a huge set of data in a distributed manner, decreasing the cost to an organization.
Figure depicts a distributed database:
In an object-oriented database, data is defined in terms of objects, i.e., as they exist in the real world. The object concept originates from the concept of class, which is an essential feature of object-oriented programming.
Thus, an object-oriented database implements the concepts of object-oriented programming languages such as Java and C++. It implements the features of object-oriented programming, such as encapsulation, inheritance, and polymorphism.
The main objective of using an object-oriented database is to store, as objects, the data that evolves during the execution of a program. An instance of a class, called an object, is created at run time. This object is not a permanent entity, because it is created and exists only during the run time of the program.
To make these objects persistent, an object-oriented database is used, which stores the data in the form of an object during the run time of the program. Thus, an object-oriented database helps to keep track of all the necessary objects that are created during the execution of a program.
Figure shows an object-oriented database:
The main aim of an object-oriented database is to reduce the overhead of converting information representation in the database to an application-specific representation. Unlike a traditional database, an object model allows for data persistence and storage by storing objects in the databases. The relationships between various objects are inherent in the structure of the objects.
This is mainly used for complex data structures, for example, 2D and 3D graphics, which must otherwise be flattened before storage in a relational database.
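The idea of making run-time objects persistent can be sketched with Python's standard `shelve` module. Note the assumption: `shelve` is a simple persistent dictionary built on `pickle`, not an object-oriented DBMS, and the `Shape` class is hypothetical. It is used here only to show an object being stored and restored without first being flattened into rows and columns.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field


@dataclass
class Shape:
    """A hypothetical 2D graphic: a name plus a list of (x, y) vertices."""
    name: str
    points: list = field(default_factory=list)

    def vertex_count(self):
        return len(self.points)


# Store the object in a temporary location so the sketch leaves no files behind
# in the working directory.
store_path = os.path.join(tempfile.mkdtemp(), "shapes")

# The object exists only at run time until it is written to the store.
with shelve.open(store_path) as store:
    store["triangle"] = Shape("triangle", [(0, 0), (1, 0), (0, 1)])

# Reopening the store returns a full object, methods included.
with shelve.open(store_path) as store:
    restored = store["triangle"]

print(restored.name, restored.vertex_count())
```

The restored value is a `Shape` instance again, so the application does not need to rebuild it from separate table rows.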
Temporal databases are used in areas where timing is an important factor. They deal with the time factor while storing data in a database. For example, while managing a database for a hospital management system, a patient’s data is recorded.
Time is an important field in this case because it defines the duration for which the patient has been served. Thus, a database developer can convert a simple database to a temporal database by implementing the time field.
Temporal databases can be developed depending on two timing constraints:
- Valid Time: It describes the time duration for which a fact is true.
- Transaction Time: It defines the time period during which the fact is stored in the database.
A temporal database can be a valid time database, where each table contains a field, valid time, which stores the duration for which the data is valid in a certain context. In these databases, the time field can be viewed as a third dimension, in addition to rows and columns.
Tables can also be categorized into:
- Event Table: Event table contains the instant timestamp of an event.
- State Table: State table contains the state of the data. The state table also contains the duration of the data for which the data is valid.
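A state table and an event table can be sketched with Python's built-in `sqlite3` module, continuing the hospital example. The table names, column names, patient ID, and dates are all hypothetical, and the half-open interval convention (`valid_from` inclusive, `valid_to` exclusive) is an assumption made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- State table: each row is valid from valid_from up to (not including) valid_to.
    CREATE TABLE ward_assignment (
        patient TEXT, ward TEXT, valid_from TEXT, valid_to TEXT
    );
    INSERT INTO ward_assignment VALUES
        ('P001', 'Emergency',  '2024-03-01', '2024-03-03'),
        ('P001', 'Cardiology', '2024-03-03', '2024-03-10');

    -- Event table: each row records a single instant.
    CREATE TABLE admission_event (patient TEXT, event TEXT, occurred_at TEXT);
    INSERT INTO admission_event VALUES
        ('P001', 'admitted',   '2024-03-01T09:15:00'),
        ('P001', 'discharged', '2024-03-10T16:40:00');
""")


def ward_on(patient, day):
    """Return the ward that was valid for the patient on the given ISO date."""
    row = conn.execute(
        "SELECT ward FROM ward_assignment "
        "WHERE patient = ? AND valid_from <= ? AND ? < valid_to",
        (patient, day, day),
    ).fetchone()
    return row[0] if row else None


print(ward_on("P001", "2024-03-02"))
print(ward_on("P001", "2024-03-05"))
print(ward_on("P001", "2024-03-10"))
```

ISO-formatted dates compare correctly as plain text, which is why the valid-time columns can be stored as `TEXT` in this sketch.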
A multimedia database has the ability to store multimedia data, which represents any data other than alphanumeric data. Multimedia data comprises media data such as image files, text files, video files, audio files, etc.
A multimedia database generally stores two types of multimedia data:
- Static Data: Static multimedia data includes text files or image files.
- Dynamic Data: Dynamic multimedia data represents audio or video files.
These databases are generally large in size because multimedia data takes up a large amount of storage space.
The need for separate multimedia databases arose because other databases failed to handle multimedia data due to their volume constraint. These databases prove to be helpful in fields, such as science projects or any library project, where the data is in media form and large.
Data can be stored in this type of database in either a structured format or an unstructured format. The structured format stores the data in a predefined format, whereas in the case of unstructured data, it follows no format.
In the latter, the format allows for storing different types of data, such as raw data, registering data, and descriptive data. Raw data implies data in an unformatted form that is represented in binary form.
Registering data describes data that follows a certain format to identify a media file, such as its extension, “.jpg” or “.png”. The descriptive type of data defines the structure of the multimedia file to make the data retrieval process faster.
There are two types of multimedia databases:
- Linked Multimedia Database: This stores multimedia data on a requirement basis. This type of multimedia database is linked to either the Internet or some storage media such as Compact Disc Read-Only Memory (CD-ROM) or magnetic tape.
It fetches data from these storage spaces on user requirements only. Therefore, the database itself requires a small storage space.
- Embedded Multimedia Database: This stores all the media files in the database itself. Here, data retrieval speed is faster as it can be fetched easily from the database itself, but it requires a huge space to store all the media files.
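The difference between the linked and embedded approaches can be sketched with Python's built-in `sqlite3` module. Everything named here is hypothetical: the two tables, the file name, and the placeholder bytes that stand in for real image content.

```python
import os
import sqlite3
import tempfile

# Placeholder bytes standing in for real image content.
image_bytes = b"\x89PNG placeholder bytes"

# Write the media file to external storage (a temporary directory here).
media_dir = tempfile.mkdtemp()
image_path = os.path.join(media_dir, "logo.png")
with open(image_path, "wb") as fh:
    fh.write(image_bytes)

conn = sqlite3.connect(":memory:")
# Embedded: the media content itself lives inside the database as a BLOB.
conn.execute("CREATE TABLE embedded_media (name TEXT PRIMARY KEY, content BLOB)")
# Linked: the database stores only where the media can be found.
conn.execute("CREATE TABLE linked_media (name TEXT PRIMARY KEY, location TEXT)")
conn.execute("INSERT INTO embedded_media VALUES (?, ?)", ("logo.png", image_bytes))
conn.execute("INSERT INTO linked_media VALUES (?, ?)", ("logo.png", image_path))


def load_embedded(name):
    """Fetch the media bytes directly from the database."""
    row = conn.execute(
        "SELECT content FROM embedded_media WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


def load_linked(name):
    """Look up the location in the database, then fetch from external storage."""
    (location,) = conn.execute(
        "SELECT location FROM linked_media WHERE name = ?", (name,)
    ).fetchone()
    with open(location, "rb") as fh:
        return fh.read()


print(load_embedded("logo.png") == load_linked("logo.png"))
```

The linked table holds only a short path per file, while the embedded table grows with every byte of media, which mirrors the storage trade-off described above.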
Deductive databases are cutting-edge databases that incorporate the idea of artificial intelligence in a traditional database model such as the relational database.
These databases are developed to merge the ideas of logic programming and the relational model, so that they support artificial intelligence techniques and can still work with a large database. Deductive databases support queries, reasoning, and application development on databases.
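The combination of stored facts and logic rules can be sketched in plain Python. This is a toy illustration, not how a production deductive database is implemented: the `parent` facts are made up, and the two rules (written in comments in a Datalog-like notation) are applied repeatedly until no new facts appear.

```python
# Stored facts: parent(X, Y) means X is a parent of Y. The names are made up.
parent = {("ann", "bob"), ("bob", "cara"), ("cara", "dev")}


def derive_ancestors(parent_facts):
    """Derive every ancestor fact implied by the parent facts and two rules."""
    # Rule 1: ancestor(X, Y) :- parent(X, Y)
    ancestor = set(parent_facts)
    while True:
        # Rule 2: ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z)
        new = {
            (x, z)
            for (x, y) in parent_facts
            for (y2, z) in ancestor
            if y == y2
        }
        if new <= ancestor:  # nothing new was derived, so stop
            return ancestor
        ancestor |= new


ancestors = derive_ancestors(parent)
print(sorted(ancestors))
```

Only three facts are stored, yet a query such as "is ann an ancestor of dev?" can be answered because the rules let the system reason beyond the stored rows.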
Semantic databases represent an object-oriented database model that stores information in a natural way and has an information-handling system that is used for the management of information. It is a type of knowledge database that stores the meaning of information in the form of an object.
In this database, each element is related to every other element, depending on the meaning of the user’s information. It captures the meaning of the user’s information and provides a high-level description of that information.
The benefit of using a semantic database is that it offers information about data, which is called metadata. Metadata of data can be beneficial to an organization. In this database, each piece of data is related to every other data, thus, any complex business-related data can be fetched easily.
This database has some extra features, such as business calculation capability, that allow users to generate a report for an organization easily.
A semantic database supports different types of objects, such as concrete objects and abstract objects. A concrete object represents a value such as a string of characters, whereas an abstract object represents a tangible item or an organizational event.
Mobile databases incorporate data from various mobile devices connected through the Internet or a wireless network to a centralized server.
Thus, if the user wants data from different wireless or connected servers on the Internet, he/she can have a mobile database that stores data from these network components, so that this data can be accessed offline.
It is similar to a distributed database system, where the centralized servers are updated on the basis of two processes, namely, the replication process and the synchronization process. In mobile databases too, data from various mobile devices is replicated on the centralized servers.
The synchronization process, on the other hand, matches the centralized servers against the mobile devices and manipulates the data on every side, so that all of them are brought up to date.
Unlike the replication process, synchronization is a two-way process, where updating takes place at both ends.
In mobile databases, mobile computing devices are not fixed at any location. Thus, one needs to provide a cost-effective way to fetch the data from these wireless devices.
Mobile databases consist of four components:
- Hosts whose locations are fixed
- A centralised server whose location is also fixed
- A base station that is stable in a network
- Mobile devices whose location changes randomly
When a user wants to get data from a mobile device, he/she issues a query, which the base station, in turn, forwards to the mobile components in its range. After accumulating the data, the base station stores it in the centralized server and returns the result to the user. Sybase is one of the applications for mobile databases.
Database Models
Database modeling refers to the process of designing a database. As mentioned earlier, a database is a collection of relevant and related data, which can be interpreted to result in information significant for an organization.
Database design is the structure of a database in database management. This structure of a database is defined by data modeling. There are various models of databases, which can be used to design a database.
Data modeling involves defining data elements, the structures of data elements, and the relationships among them. For example, in an organization, every employee has a unique employee code, which is generated in a specific manner.
Thus, employee code is a data element, its format is its structure, and the department the employee works in defines one of the relationships besides others such as designation, personal details, etc.
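The three parts of the employee example (data element, structure, and relationship) can be sketched with Python dataclasses. The code format `ABC-1234` and all class and field names below are assumptions made for illustration; the text does not specify how employee codes are actually generated.

```python
import re
from dataclasses import dataclass

# Assumed structure of the data element: three capital letters, a hyphen,
# and four digits. This format is hypothetical.
EMPLOYEE_CODE_FORMAT = re.compile(r"^[A-Z]{3}-\d{4}$")


@dataclass(frozen=True)
class Department:
    name: str


@dataclass(frozen=True)
class Employee:
    code: str               # data element
    name: str
    department: Department  # relationship: the department the employee works in
    designation: str        # another relationship or attribute of the employee

    def __post_init__(self):
        # Enforce the structure of the data element when a record is created.
        if not EMPLOYEE_CODE_FORMAT.match(self.code):
            raise ValueError(f"invalid employee code: {self.code!r}")


finance = Department("Finance")
asha = Employee("FIN-0042", "Asha", finance, "Analyst")
print(asha.code, asha.department.name)
```

Creating an `Employee` with a code such as `"42"` raises a `ValueError`, which is one way a model can keep badly structured data out of the database.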
The data requirements that must be met to improve the efficiency of an organization are defined and analyzed using data modeling. Wherever data analysis is required in business processes, standard data modeling is suggested.
In other words, data modeling is used in the following cases:
- To help in managing data as a resource
- To support the integration of a database system
- To provide a basis for designing databases
A database model refers to the structure of the database and defines how the data is structured within the database. It also defines the operations that can be performed on the data present in the database.
There are various database models that are prevalent in DBMS; these are depicted in Figure:
In a hierarchical model, data is organized in a tree structure, in which a single root called the parent record has multiple leaves called child records. A record is a collection of related fields, each of which has an individual value. Thus, a hierarchical model uses a tree structure to represent relationships among different records in a database.
Figure shows a hierarchical model:
In a database using the hierarchical model, records are connected to each other using links. A link relates exactly two records. Here, all the records are organized in a hierarchical order having a parent-child relationship.
Every record in the structure can have only one parent record but can be linked to one or more than one child records. Thus, all the records are linked in a 1:N relationship, i.e., 1 represents one parent record and N represents the number of child records.
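The one-parent, many-children rule can be sketched as a small Python class. The class name `Record`, its methods, and the sample university hierarchy are hypothetical; the sketch only demonstrates the 1:N parent-child links described above.

```python
class Record:
    """A record in a hierarchical model: one parent at most, any number of children."""

    def __init__(self, name, **fields):
        self.name = name
        self.fields = fields  # related fields, each with an individual value
        self.parent = None
        self.children = []

    def add_child(self, child):
        # A link relates exactly two records, and a child may have only one parent.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child

    def path(self):
        """Walk the parent links up to the root and return the full path."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " > ".join(reversed(names))


root = Record("ABC University")
science = root.add_child(Record("Science College", city="Northtown"))
arts = root.add_child(Record("Arts College", city="Southtown"))
physics = science.add_child(Record("Physics"))

print(physics.path())
print([child.name for child in root.children])
```

Trying to attach `physics` to `arts` as well raises a `ValueError`, because a second parent would break the tree structure.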
The relational model was introduced by E. F. Codd to replace the tree and the network structure approaches for modeling data in databases, as used by hierarchical and network models, respectively.
This model uses operations from relational algebra and relational calculus, such as projections, unions, and joins, to define the relationships between the data entities.
Figure shows a relational model:
In a relational model, data is organized and represented using tables, called relations, consisting of rows and columns. The columns of the relations are termed ‘attributes’ and individual rows are termed as ‘tuples’. Attributes represent properties of entities represented in a relation.
Attributes can be allowed to take a value from a set of values, called the domain for that attribute. Tuples represent a particular instance of the entity represented in the relation.
All the relations in a relational model must possess a set of properties, which are listed as follows:
- All the rows/tuples in a relation should be distinct, which implies that no two rows can have identical values in all the attributes.
- The items in a particular column are of the same kind, which implies that a particular column can hold only the same type of values for different tuples.
- The ordering of rows and columns in the relation is insignificant.
- Every column in the relation has a distinct name that uniquely identifies the column in the relation.
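These properties, along with the projection and join operations, can be sketched in plain Python by treating a relation as a set of tuples. This is a teaching sketch rather than how a relational DBMS is built: the helper names and the `employee` and `department` sample data are hypothetical, and each tuple is modeled as a `frozenset` of (attribute, value) pairs so that row order and column order carry no meaning.

```python
def make_relation(rows):
    """Build a relation: a set of tuples, so duplicate rows cannot exist."""
    return {frozenset(row.items()) for row in rows}


def project(relation, attributes):
    """Projection: keep only the named attributes; duplicates collapse."""
    return {
        frozenset((a, v) for a, v in t if a in attributes)
        for t in relation
    }


def natural_join(left, right):
    """Natural join: combine tuples that agree on all shared attribute names."""
    joined = set()
    for left_tuple in left:
        for right_tuple in right:
            l_map, r_map = dict(left_tuple), dict(right_tuple)
            shared = l_map.keys() & r_map.keys()
            if all(l_map[a] == r_map[a] for a in shared):
                joined.add(frozenset({**l_map, **r_map}.items()))
    return joined


employee = make_relation([
    {"emp_id": 1, "name": "Asha", "dept_id": 10},
    {"emp_id": 2, "name": "Ben", "dept_id": 10},
    {"emp_id": 3, "name": "Chen", "dept_id": 20},
])
department = make_relation([
    {"dept_id": 10, "dept_name": "Sales"},
    {"dept_id": 20, "dept_name": "Research"},
])

dept_ids = project(employee, {"dept_id"})
staff = natural_join(employee, department)
print(len(dept_ids), len(staff))
```

Projecting three employees onto `dept_id` yields only two tuples, since identical rows cannot appear twice in a relation, and the join links each employee to a department purely through matching attribute values.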
The network model is a database model that is considered a flexible way of representing data and their relationships.
A distinguishing feature of this model is that while in a hierarchical database model, data is represented as a tree of records with each record having one parent record and many children, the network model allows each record to have multiple parent and child records.
The network model, in comparison to the hierarchical model, allows more natural modeling of relationships between entities.
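The key difference from the hierarchical model, namely that a record may have several parents, can be sketched as follows. The class `NetworkRecord`, the `link` helper, and the sample records are hypothetical names chosen for this illustration.

```python
class NetworkRecord:
    """A record in a network model: any number of parents and children."""

    def __init__(self, name):
        self.name = name
        self.parents = []
        self.children = []


def link(parent, child):
    """Connect two records; a child may be linked under several parents."""
    parent.children.append(child)
    child.parents.append(parent)


sales = NetworkRecord("Sales Department")
website = NetworkRecord("Website Project")
asha = NetworkRecord("Asha")
ben = NetworkRecord("Ben")

# Asha belongs to a department and to a project at the same time,
# which a strict tree could not represent directly.
link(sales, asha)
link(website, asha)
link(sales, ben)

print([p.name for p in asha.parents])
print([c.name for c in sales.children])
```

In a hierarchical model, Asha's record would have to be duplicated under each parent; here a single record is simply linked twice.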
Figure shows a network model:
Although the network model was widely implemented and used, it failed to become dominant. Database administrators came to favor the relational model over the network model because of the extra productivity and flexibility it provides. This led to the gradual decline of the network model in corporate usage.