What is Data Mining?

Data mining is the process of sifting through large volumes of data and analyzing them to extract useful meaning. Data mining tools analyze customer behavior patterns and predict future behaviors and trends, which allows organizations to make practical, knowledge-driven decisions.

Data mining tools can answer business questions that would otherwise be too time-consuming to resolve. They search databases for hidden patterns and surface predictive information that business experts may miss because it falls outside their expectations.

For example, one grocery chain used data mining software to analyze buying patterns of local people. They discovered that when men bought bread on Fridays and Saturdays, they also purchased beer.

Further analysis showed that these shoppers typically did their weekly grocery shopping on Saturdays. On Fridays, however, they only bought a few items. This pattern showed that they purchased beer for the upcoming weekend.

The grocery chain could use this newly discovered information in various ways to increase its profit. For instance, they could relocate the beer display closer to the bread display and also ensure that beer and bread were sold at full price on Fridays and Saturdays.

Data mining tools and techniques are used by companies in many sectors, such as retail, finance, health care, manufacturing, transportation, and aerospace, to take advantage of historical data.

Data mining applies pattern recognition technologies, together with statistical and mathematical techniques, to the information held in the data warehouse. This helps analysts recognize important facts, relationships, trends, patterns, exceptions, and irregularities that might otherwise go unnoticed.


Data Mining Parameters

Data mining is used to sort through data to recognize patterns and establish relationships between data. Data mining parameters help in establishing a relationship between data from different sources.

Data mining parameters include the following points as depicted in the Figure:

Classes

Stored data is used to locate records that fall into predetermined groups. For example, a restaurant chain could mine customer purchase data to find out when customers visit and what they usually order. This information can be used to attract more customers, for instance by offering daily specials.
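
The restaurant example can be sketched in a few lines. This is only an illustration: the order data, the meal_period function, and its hour cut-offs are hypothetical and stand in for whatever predetermined classes a real chain would use.

```python
from collections import Counter, defaultdict

def meal_period(hour):
    """Assign an order to a predetermined class by hour of day (hypothetical cut-offs)."""
    if hour < 11:
        return "breakfast"
    if hour < 16:
        return "lunch"
    return "dinner"

# Hypothetical orders: (hour of day, item ordered)
orders = [(8, "coffee"), (9, "coffee"), (9, "bagel"), (12, "soup"), (13, "soup"), (19, "pasta")]

# Count the items ordered within each class.
by_class = defaultdict(Counter)
for hour, item in orders:
    by_class[meal_period(hour)][item] += 1

# The most frequently ordered item in each class is a candidate for a daily special.
top_items = {period: counts.most_common(1)[0][0] for period, counts in by_class.items()}
```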

Clusters

Data items are grouped on the basis of logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
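
As a rough illustration of clustering, the sketch below groups customers by monthly spend using a one-dimensional k-means loop. The technique is one common choice and is not prescribed by the text; the spend figures and starting centroids are assumptions made for the example.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group numbers around the nearest centroid, then move each centroid to its group's mean."""
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for value in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))
            groups[nearest].append(value)
        # Keep the old centroid if a group ends up empty.
        centroids = [sum(g) / len(g) if g else c for g, c in zip(groups, centroids)]
    return centroids, groups

# Hypothetical monthly spend per customer
monthly_spend = [12, 15, 18, 95, 102, 110]
centroids, segments = k_means_1d(monthly_spend, centroids=[0.0, 50.0])
```

Here the customers fall into a low-spend and a high-spend segment, and nothing about those segments was defined in advance. That is the difference from classes.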

Associations

Data can be mined to identify associations between items. For example, if customers usually buy bread together with milk, then bread and milk form an association.
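
One common way to quantify an association is through its support and confidence. The sketch below computes both for the bread and milk example; the baskets are made-up data, and the function names are illustrative rather than taken from any particular tool.

```python
# Hypothetical transactions; each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return hits / len(baskets)

def confidence(antecedent, consequent, baskets):
    """Of the baskets containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)

bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
```

In this sample, three of the five baskets contain both items, and three of the four baskets with bread also contain milk.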

Sequential Patterns

Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood that a backpack will be purchased, based on a consumer's earlier purchases of sleeping bags and hiking shoes.
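
A minimal sketch of the backpack example follows. It assumes purchase histories are available as ordered lists; the histories and the followed_by helper are hypothetical. It estimates how often a backpack purchase follows the two earlier purchases.

```python
# Hypothetical purchase histories, each in chronological order.
histories = [
    ["sleeping bag", "hiking shoes", "backpack"],
    ["hiking shoes", "sleeping bag", "backpack", "tent"],
    ["sleeping bag", "hiking shoes"],
    ["tent", "backpack"],
]

def followed_by(history, prerequisites, target):
    """True if target is bought after all prerequisites have already been bought."""
    seen = set()
    for item in history:
        if item == target and prerequisites <= seen:
            return True
        seen.add(item)
    return False

prerequisites = {"sleeping bag", "hiking shoes"}
# Only customers who bought both prerequisite items are relevant.
eligible = [h for h in histories if prerequisites <= set(h)]
matches = [h for h in eligible if followed_by(h, prerequisites, "backpack")]
rate = len(matches) / len(eligible)
```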


How Does Data Mining Work?

Have you ever wondered how business experts use data mining to predict what is going to happen next? The technique they use is called modeling.

Modeling is simply the activity of building a model from data about situations where the answer is known, and then applying that model to other situations where the answer is not known. Modeling techniques have existed for a very long time, but only recently has the computational power become available to automate them and apply them directly to the data.

Consider an example of building a model. The director of marketing for a telecommunications company wants to focus his marketing and sales efforts on the segments of the population most likely to become heavy users of long-distance services. He has good information about his customers, but with so many variables it is impossible to spot the common features of his best customers by inspection.

From the existing customer database, he can retrieve information such as age, sex, credit history, income, zip code, and occupation. Using data mining tools such as neural networks, he can then identify the characteristics of the customers who make many long-distance calls.

For instance, he might learn that his best customers are young adults between the ages of 19 and 28 who earn around $45,000 per year. This knowledge is his model for high-value customers, and he can budget and direct his marketing efforts accordingly.
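
The build-then-apply idea can be illustrated with a deliberately simple model. Note the substitution: instead of a neural network, this sketch learns only the age and income ranges spanned by known heavy users. The customer records, field layout, and function names are all hypothetical.

```python
# Hypothetical labelled customers: (age, annual_income, is_heavy_long_distance_user)
known_customers = [
    (22, 44000, True),
    (27, 47000, True),
    (19, 43000, True),
    (45, 80000, False),
    (33, 30000, False),
    (61, 52000, False),
]

def build_profile(customers):
    """Learn the age and income ranges spanned by the known heavy users."""
    heavy = [(age, income) for age, income, is_heavy in customers if is_heavy]
    ages = [age for age, _ in heavy]
    incomes = [income for _, income in heavy]
    return {"age": (min(ages), max(ages)), "income": (min(incomes), max(incomes))}

def matches_profile(profile, age, income):
    """Apply the model to a person whose calling behaviour is not yet known."""
    low_age, high_age = profile["age"]
    low_income, high_income = profile["income"]
    return low_age <= age <= high_age and low_income <= income <= high_income

profile = build_profile(known_customers)
prospects = [(24, 45000), (50, 45000)]  # (age, annual_income), answer unknown
targets = [p for p in prospects if matches_profile(profile, *p)]
```

The first step uses cases where the answer is known, and the second applies the result to cases where it is not. A real model would weigh many more variables than these two.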

There are various types of information about the customers that you can retrieve with the help of data mining.

The figure shows the different types of customer information about a particular customer.


Types of Relationships

Data mining provides the link between transaction systems and analytical systems. It analyses relationships and patterns in transaction data based on end-user queries. Several types of data mining tools are available, including statistical, machine learning, and neural network tools.

Generally, one of four types of relationships is sought:

Regression

Regression establishes a relationship between a dependent (outcome) variable and a group of predictors. In other words, it maps a data item to a prediction variable. Regression is a supervised learning technique: the data is partitioned into training and validation sets, so that a relationship fitted on the training data can be checked against the validation data.
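
As a small worked example, the sketch below fits a straight line by ordinary least squares on a training partition and checks it on a validation partition. The data points are invented and follow y = 2x + 1 exactly, so that the fit is easy to verify by hand.

```python
def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Hypothetical (predictor, outcome) pairs
data = [(1, 3), (2, 5), (3, 7), (4, 9), (5, 11), (6, 13)]
training, validation = data[:4], data[4:]

# Fit on the training set only, then measure the error on unseen validation data.
slope, intercept = fit_line(training)
validation_error = max(abs(slope * x + intercept - y) for x, y in validation)
```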

Time Series Analysis

Time series analysis examines the value of an attribute as it changes over time. It comprises methods for analyzing time series data to extract meaningful statistics and other characteristics of the data.
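
One of the simplest such methods is a moving average, which smooths short-term fluctuation so that the underlying trend is easier to see. The sketch below is illustrative only; the sales figures and window size are assumptions.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values in the series."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]

# Hypothetical monthly sales figures
monthly_sales = [10, 12, 14, 13, 17, 21]
smoothed = moving_average(monthly_sales, window=3)
```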

Prediction

Many data mining tools can predict future states of data from historical and current data. Applications of prediction include flood forecasting, speech recognition, machine learning, and pattern recognition.

Summarisation

Summarisation is the abstraction or generalization of data. The data is summarised and abstracted, resulting in a smaller set that gives a general overview of the data. For example, the long-distance calls made by a customer can be summarised as total minutes, total calls, total cost, and so on.
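
The long-distance example might look like this in code. The call records and field names are hypothetical.

```python
# Hypothetical call records for one customer
calls = [
    {"minutes": 12, "cost": 1.20},
    {"minutes": 30, "cost": 3.00},
    {"minutes": 5, "cost": 0.50},
]

# Reduce the detailed records to a small overview.
summary = {
    "total_calls": len(calls),
    "total_minutes": sum(call["minutes"] for call in calls),
    "total_cost": round(sum(call["cost"] for call in calls), 2),
}
```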


Architecture of Data Mining

Data is stored in databases, in data warehouse systems, or in both. A data mining system therefore needs an architecture that defines how closely it is coupled to these database and data warehouse systems.

The figure shows the types of data mining architecture.

Let us discuss these different possible types of data mining architecture in the following section:

No-coupling

In this architecture, a data mining system does not use any functionality of a database or data warehouse system. A no-coupling data mining system accesses data from specific data sources such as file systems. It uses major data mining algorithms to process data and then it stores the results into the file system.

This data mining architecture takes no advantage of the database or data warehouse, which is already very efficient at organizing, storing, accessing, and retrieving data. Flat file processing is an example of this architecture.
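
A minimal sketch of flat file processing is shown below. The file names, column names, and the mine_flat_file function are assumptions made for illustration, and the "mining" step is just a frequency count. No database is involved at any point.

```python
import csv
import os
import tempfile

def mine_flat_file(input_path, output_path):
    """Read transactions from a CSV file, count item frequency, and write the counts to a file."""
    counts = {}
    with open(input_path, newline="") as source:
        for row in csv.DictReader(source):
            counts[row["item"]] = counts.get(row["item"], 0) + 1
    with open(output_path, "w", newline="") as target:
        writer = csv.writer(target)
        writer.writerow(["item", "count"])
        for item, count in sorted(counts.items()):
            writer.writerow([item, count])
    return counts

# Create a small hypothetical input file in a temporary directory.
workdir = tempfile.mkdtemp()
input_path = os.path.join(workdir, "transactions.csv")
output_path = os.path.join(workdir, "item_counts.csv")
with open(input_path, "w", newline="") as f:
    f.write("customer,item\n1,bread\n1,beer\n2,bread\n")

item_counts = mine_flat_file(input_path, output_path)
```

All reading, counting, and sorting happens in the mining program itself, which is the work a database would otherwise do more efficiently.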

Semi-tight Coupling

In this data mining architecture, the data mining system not only connects to the database or data warehouse system but also uses some of its features, such as sorting, indexing, and aggregation, to perform certain data mining tasks.

In this architecture, some intermediate results can be stored in a database or data warehouse system for improving performance.
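
The division of labour can be sketched with Python's built-in sqlite3 module standing in for the database system. The table, column names, and spending threshold are hypothetical.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE purchases (customer INTEGER, item TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?)",
    [(1, "bread", 2.0), (1, "beer", 8.0), (2, "bread", 2.0), (3, "beer", 9.0), (3, "beer", 9.0)],
)
connection.execute("CREATE INDEX idx_customer ON purchases (customer)")

# The database performs the indexing, aggregation, and sorting.
rows = connection.execute(
    "SELECT customer, SUM(amount) FROM purchases GROUP BY customer ORDER BY customer"
).fetchall()

# The mining step itself (a simple threshold rule here) runs outside the database.
high_spenders = [customer for customer, total in rows if total >= 10.0]
```

The aggregated totals are the kind of intermediate result that could also be written back to a table for reuse.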

Tight Coupling

In this data mining architecture, the database or data warehouse is treated as an integrated information retrieval component of the data mining system.

All the features of a database or data warehouse are utilized to perform data mining tasks. This architecture provides system scalability, high performance, and integrated information.


Functionalities of Data Mining

For businesses, data mining is used to discover patterns and relationships in the data in order to help make better business decisions. Data mining can help spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty.

Specific uses of data mining include the following, as depicted in the figure:

  • Market segmentation – It refers to identifying the common traits of customers who purchase the same products from your company.

  • Customer churn – This is predicting which customers are likely to leave your company for a competitor's product.

  • Fraud detection – It is recognizing which transactions are most likely to be fraudulent.

  • Direct marketing – It is recognizing which prospects should be added to a mailing list to gain the highest response rate.

  • Interactive marketing – This is predicting what an individual is interested in seeing on a website.

  • Market basket analysis – It is understanding which products or services are commonly purchased together, for example butter and bread.

  • Trend analysis – This is revealing how a typical customer's purchases in the current month differ from those in the previous month.
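
To make the last item concrete, trend analysis in its simplest form is a month-over-month comparison. The sketch below assumes purchase amounts are available per month; the figures and the monthly_change function are hypothetical.

```python
def monthly_change(last_month, current_month):
    """Return the absolute and percentage change in total spend between two months."""
    last_total = sum(last_month)
    current_total = sum(current_month)
    change = current_total - last_total
    # A percentage is undefined when there was no spend last month.
    percent = change / last_total * 100 if last_total else None
    return change, percent

# Hypothetical purchase amounts for a typical customer
change, percent = monthly_change([20.0, 35.5, 12.0], [25.0, 40.0])
```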

Classification of Data Mining System

Data mining is an interdisciplinary field that draws on database systems, statistics, machine learning, visualization, and information science. Depending on the data mining approach used, other techniques may also be applied, such as neural network methods.

The data mining system may also integrate techniques from spatial data analysis, information retrieval, pattern recognition, image analysis, signal processing, computer graphics, web technology, economics, or psychology, depending on the kinds of data to be mined or on the given data mining application.

Because of the variety of disciplines contributing to data mining, data mining research is likely to produce a large variety of data mining systems. Therefore, a distinct classification of data mining systems is needed. This kind of classification may help potential users in differentiating data mining systems and identifying systems depending on their needs.

Data mining systems can be classified according to various criteria, as follows:

Classification Using the Kinds of Databases Mined

You can classify a data mining system according to the kinds of databases mined. Database systems themselves can be classified according to different criteria, such as data models, types of data, or the applications involved, and each of these may need its own data mining technique. Data mining systems can therefore be classified accordingly.

For instance, if we classify according to data models, we may have a relational, transactional, object-oriented, object-relational, or data warehouse mining system.

If we classify according to the types of data, we may have a spatial, time-series, text, or multimedia data mining system, or a World Wide Web mining system. Other system types comprise heterogeneous data mining systems and legacy data mining systems.

Classification Using the Kinds of Knowledge Mined

We can classify data mining systems according to the kind of knowledge they mine, i.e. on the basis of data mining functionalities, such as characterization, discrimination, association, classification, clustering, trend and evolution analysis, deviation analysis, similarity analysis, etc.

A complete data mining system usually provides numerous integrated data mining functionalities.

Furthermore, we can also categorize data mining systems on the basis of granularity or levels of abstraction of the knowledge mined, including generalized knowledge, primitive-level knowledge or knowledge at multiple levels.

An advanced data mining system should facilitate the discovery of knowledge at various levels of abstraction.

Classification Using the Kinds of Techniques Utilised

We can also classify data mining systems according to the fundamental data mining techniques employed.

We can describe these techniques according to the degree of user interaction involved (for example, autonomous systems, interactive exploratory systems, or query-driven systems) or the methods of data analysis employed (for example, database-oriented or data warehouse-oriented techniques, machine learning, statistics, visualization, pattern recognition, neural networks, and so on).

A sophisticated data mining system usually adopts several data mining techniques, or forms an effective, integrated technique that combines the merits of a few individual approaches.

