Data Mining: A Statistical Tool

Data mining has attracted a great deal of attention in the information industry, and in society as a whole, in recent years. The main reason is the wide availability of huge amounts of data and the pressing need to turn such data into useful information and knowledge. The information and knowledge gained can be used in applications ranging from market analysis to fraud detection and customer retention. Put simply, data mining is the extraction, or “mining”, of knowledge from large amounts of data. This data can be of any kind: relational databases, data warehouses, transactional databases, flat files, or even the World Wide Web.
Let’s take an example that shows the need for data mining today. The amount of data kept in computers and databases is growing at a phenomenal rate, and at the same time the users of that data expect more sophisticated information from it. A marketing manager is no longer satisfied with a simple list of marketing contacts, but wants detailed information about customers’ past purchases as well as predictions of their future purchases. Simple Structured Query Language (SQL) queries are not adequate to support these increased demands for information. Data mining steps in to meet these needs; it is often defined as finding hidden information in a database.
As another example, researchers in molecular biology hope to use the large amounts of genomic data currently being gathered to better understand the structure and function of genes. In the past, traditional methods in molecular biology allowed scientists to study only a few genes at a time in a given experiment. Recent breakthroughs in microarray technology have enabled scientists to compare the behaviour of thousands of genes under various conditions. Such comparisons can help determine the function of each gene and perhaps isolate the genes responsible for certain diseases. However, the noisy and high-dimensional nature of this data requires new types of data analysis. In addition to analysing gene array data, data mining can also be used to address other important biological challenges such as protein structure prediction, multiple sequence alignment, the modelling of biochemical pathways, and phylogenetics.
Traditional database queries access a database using a well-defined query stated in a language such as SQL. The output of such a query consists of the data in the database that satisfies the query, which is usually a subset of the database.
In data mining, by contrast, the database used is usually different from the original one, because the data must be pre-processed before it can be used. Also, the query might not be well-formed, and the output is not a subset of the database; instead, it is the output of some analysis of the database’s contents.
The kinds of patterns that can be discovered depend on the data mining tasks employed. Broadly, there are two types of data mining tasks: descriptive tasks, which characterize the general properties of the existing data, and predictive tasks, which make predictions by performing inference on the available data. Users often do not have a clear idea of the kinds of patterns they can discover, or need to discover, from the data at hand. It is therefore important to have a versatile and inclusive data mining system that allows the discovery of different kinds of knowledge at different levels of abstraction. This also makes interactivity an important attribute of a data mining system.
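To make these distinctions concrete, here is a minimal Python sketch on hypothetical toy data (the transactions, spend figures and function names are all invented for illustration). A query-style filter returns a subset of the stored rows; a frequent-pair count and a summary are descriptive results that are not subsets of the data; and a predictive step extrapolates a future value. The ordinary least-squares trend line used for prediction is just one simple choice picked for this example, not a method prescribed by the text.

```python
from collections import Counter
from itertools import combinations
from statistics import mean

# Hypothetical toy transactions (illustrative only): one customer basket per row.
transactions = [
    {"id": 1, "customer": "alice", "items": {"bread", "milk"}},
    {"id": 2, "customer": "bob", "items": {"bread", "butter", "milk"}},
    {"id": 3, "customer": "alice", "items": {"butter", "jam"}},
    {"id": 4, "customer": "carol", "items": {"bread", "butter", "milk"}},
]

# Hypothetical monthly spend for one customer over months 1..5.
monthly_spend = [100.0, 110.0, 120.0, 130.0, 140.0]


def query_rows(rows, predicate):
    """Traditional query, like SQL SELECT ... WHERE: returns a subset of the stored rows."""
    return [row for row in rows if predicate(row)]


def frequent_pairs(rows, min_support):
    """Descriptive mining step: find item pairs whose support (the fraction of
    baskets containing both items) is at least min_support. The result is a
    derived pattern, not a subset of the stored rows."""
    counts = Counter()
    for row in rows:
        for pair in combinations(sorted(row["items"]), 2):
            counts[pair] += 1
    n = len(rows)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def describe(values):
    """Descriptive task: summarize general properties of the existing data."""
    return {"count": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


def predict_next(values):
    """Predictive task (illustrative choice): fit an ordinary least-squares line
    y = a + b*x over x = 1..n and extrapolate it to x = n + 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a line")
    xs = list(range(1, n + 1))
    x_bar, y_bar = mean(xs), mean(values)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a + b * (n + 1)


if __name__ == "__main__":
    alice = query_rows(transactions, lambda r: r["customer"] == "alice")
    print([r["id"] for r in alice])  # [1, 3]
    print(frequent_pairs(transactions, 0.5))
    print(describe(monthly_spend))
    print(predict_next(monthly_spend))  # 150.0
```

The query returns rows that already exist in the table, while `frequent_pairs` reports that bread and milk appear together in 75% of baskets, a fact that is not stored anywhere as a row. Likewise, `describe` only characterizes the existing spend figures, whereas `predict_next` produces a value for a month that is not yet in the data.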
