What is data mining?
Data has been used to identify sequential patterns, correlations and trends throughout history, but it was only in the 1990s that the term ‘data mining’ came into widespread use. As digital technologies grow and evolve at unprecedented speed, so too do the methods and materials used to collect and interpret data.
As data mining – often used interchangeably with knowledge discovery in databases (KDD) – can predict certain outcomes with a useful degree of accuracy, it is no surprise that more businesses than ever now use it in their day-to-day operations. This business intelligence practice isn’t limited to tech giants like Google and Amazon: data mining is used across many sectors, including retail, banking, manufacturing and telecommunications.
Through automated data analysis, businesses can understand how their customers interact with their services and which products are most popular. Data mining can also be used to keep track of the economy, risk and competition. Applied well, this branch of data science can bring stability, give a clear steer to managerial decision-making, and support financial gains and growth.
Data mining draws on three scientific disciplines:
- statistics – the numeric study of data relationships
- artificial intelligence – software and/or machines which display human-like intelligence
- machine learning – algorithms which can learn to make predictions through data
How data mining is used in business
Teams across a business capture data in different volumes and from different sources. A data warehouse stores all of this data centrally, which is an efficient way to make it available across the organisation. Data mining then uses the information in these centrally stored databases to glean insights into the past, present and future of an organisation’s operations and output.
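To illustrate why central storage helps, here is a minimal sketch using Python’s built-in `sqlite3` module, with an in-memory database standing in for a data warehouse. The `sales` and `marketing` tables, their columns and the figures are hypothetical; the point is only that once both teams’ data sit in one store, a single query can relate them.

```python
import sqlite3

# In-memory SQLite database standing in for a central data warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (customer_id INTEGER, amount REAL);
    CREATE TABLE marketing (customer_id INTEGER, campaign TEXT);
""")

# Hypothetical data captured by two different teams.
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [(1, 120.0), (2, 35.5), (1, 60.0), (3, 80.0)])
conn.executemany("INSERT INTO marketing VALUES (?, ?)",
                 [(1, "email"), (2, "social"), (3, "email")])

# Because both tables live in one store, one query can relate
# campaign exposure to revenue.
revenue_by_campaign = conn.execute("""
    SELECT m.campaign, SUM(s.amount) AS revenue
    FROM sales AS s
    JOIN marketing AS m ON s.customer_id = m.customer_id
    GROUP BY m.campaign
    ORDER BY revenue DESC
""").fetchall()

print(revenue_by_campaign)  # [('email', 260.0), ('social', 35.5)]
```

In a real warehouse the tables would be far larger and fed by scheduled loads, but the join-and-aggregate pattern is the same.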
Data mining can help optimise marketing campaigns by improving segmentation, refining cross-sell offers and targeting customers more directly, thereby increasing return on investment (ROI). Sales teams can also use the data collected in a marketing campaign, and data mining can identify the customers most likely to convert into sales, making the sales process more efficient.
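Customer segmentation is often done with clustering. The sketch below is a minimal k-means implementation in plain Python, assuming each customer is described by two hypothetical features (annual spend and visits per month); `kmeans` and the data are illustrative, not a specific library’s API. It initialises centroids from the first k points so the result is reproducible, and it does not scale features, so spend dominates the distance here; a real project would usually standardise features and use a library implementation.

```python
import math

def kmeans(points, k, iterations=10):
    """Minimal k-means: alternate between assigning points and moving centroids."""
    # Deterministic initialisation: use the first k points as starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Hypothetical customers: (annual spend in £, visits per month).
customers = [(200, 1), (250, 2), (220, 1), (1500, 8), (1600, 9), (1450, 7)]
labels, centroids = kmeans(customers, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The two resulting segments (occasional low spenders and frequent high spenders) could then receive different campaign offers.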
Data mining techniques can also reduce costs across a business’s operational functions by identifying faults and inefficiencies in processes and supporting better-informed decision-making.
Fraud detection is another common application, particularly in banks and other financial institutions, because anomalies in the data can highlight risks quickly.
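As a hedged illustration of anomaly-based screening, this sketch flags transaction amounts with a high modified z-score, a robust outlier test based on the median and the median absolute deviation (MAD). A median-based test is used because, in small samples, a single extreme value can inflate the mean and standard deviation enough to hide itself. The 3.5 threshold is a commonly cited rule of thumb; the `flag_anomalies` helper and the amounts are hypothetical, and real fraud systems combine many more signals than a single amount.

```python
from statistics import median

def flag_anomalies(amounts, threshold=3.5):
    """Return indices of amounts whose modified z-score exceeds the threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that one outlier barely moves.
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # no spread to compare against; a real system would need another rule
    # 0.6745 rescales MAD so scores are comparable to standard z-scores for normal data.
    return [i for i, a in enumerate(amounts)
            if 0.6745 * abs(a - med) / mad > threshold]

# Hypothetical card transactions (in £); one is far outside the usual range.
amounts = [25.0, 40.0, 32.5, 28.0, 35.0, 30.0, 27.5, 2500.0, 33.0, 29.0]
print(flag_anomalies(amounts))  # [7]
```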
How does data mining work?
The first step of effective data mining is data collection. Many businesses have a data warehouse where a large collection of business data is stored and used to help make effective business decisions.
When undertaking data mining projects, many teams follow the Cross-Industry Standard Process for Data Mining (CRISP-DM), a flexible six-phase workflow that is frequently used as a guideline for the data mining process.
The CRISP-DM phases are:
- business understanding – identifying project objectives and scope, and uncovering a question or problem that data mining can answer or solve
- data understanding – collecting the raw data relevant to the question, which often comes from multiple sources and may include both structured and unstructured data, followed by initial exploratory analysis to select the subset of data for analysis and modeling
- data preparation – identifying the dimensions and variables to explore, then cleaning and transforming the data into the final data set for model creation
- modeling – selecting the appropriate modeling technique for the data set and building the model
- evaluation – testing how well the model answers the question or solves the problem outlined in phase one, and revising the model or the question as needed to keep the project on track
- deployment – once the model is accurate and reliable, putting it into real-world use through a well-thought-out roll-out plan
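The data preparation, modeling and evaluation phases can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: the records pair a customer’s monthly visits with whether they converted, preparation simply drops incomplete records, the model is a single visits threshold (a decision stump), and evaluation is accuracy on a held-out subset. A real project would shuffle before splitting and use richer data and models.

```python
def prepare(records):
    """Data preparation: drop records with missing values."""
    return [r for r in records if None not in r]

def split(records, test_every=4):
    """Hold out every test_every-th record for evaluation (deterministic for this sketch)."""
    train = [r for i, r in enumerate(records) if i % test_every != 0]
    test = [r for i, r in enumerate(records) if i % test_every == 0]
    return train, test

def fit_threshold(train):
    """Modeling: choose the visits threshold that best separates converters in training data."""
    def accuracy(t):
        return sum((visits >= t) == converted for visits, converted in train) / len(train)
    return max(sorted({visits for visits, _ in train}), key=accuracy)

def evaluate(threshold, test):
    """Evaluation: share of held-out records the model predicts correctly."""
    return sum((visits >= threshold) == converted for visits, converted in test) / len(test)

# Hypothetical records: (monthly visits, converted?); None marks a missing value.
raw = [(1, False), (8, True), (2, False), (None, True), (7, True), (9, True),
       (3, False), (6, True), (2, False), (4, False), (10, True), (1, False), (5, True)]

clean = prepare(raw)
train, test = split(clean)
threshold = fit_threshold(train)
print(threshold, evaluate(threshold, test))  # 5 1.0
```

Keeping the test records out of `fit_threshold` is what makes the evaluation score meaningful: the model is judged on data it never saw.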
Data mining modeling techniques
There are three main categories of data mining modeling in use today.
Descriptive modeling
This data mining approach uncovers shared similarities or groupings in historical data to answer set questions about successes or failures. Common techniques include:
- clustering – which groups similar records together
- anomaly detection – which identifies any outliers
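- association rule learning – which detects items or attributes that frequently occur together in records, such as products bought in the same transaction
- principal component analysis – which summarises correlated variables as a smaller set of uncorrelated components, revealing relationships between variables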
- affinity grouping – which groups the data of individuals together through their common interests or similar goals
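Association rule learning is commonly described in terms of support (the share of records containing an itemset) and confidence (for a rule A → B, the share of records containing A that also contain B). The sketch below finds rules between single items over hypothetical shopping baskets; full algorithms such as Apriori extend the same idea to larger itemsets. The `pair_rules` helper and both thresholds are illustrative assumptions.

```python
from itertools import combinations

def count(itemset, baskets):
    """Number of baskets containing every item in itemset."""
    return sum(itemset <= basket for basket in baskets)

def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Find rules A -> B between single items that meet both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        together = count({a, b}, baskets)
        support = together / len(baskets)  # share of baskets containing A and B
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / count({lhs}, baskets)  # share of A-baskets that also hold B
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules

# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

Here bread and butter appear together in three of five baskets, so both directions pass; bread and milk meet the support threshold but neither direction is confident enough.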
Predictive modeling
Predictive modeling uses historical data to forecast future events and outcomes, and can help uncover insights in almost any business. Common techniques include:
- regression – which measures the strength of a relationship between one dependent variable and a series of independent variables
- neural networks – which are models that detect patterns in data and learn from them to make predictions (a neural network with three or more layers is generally considered deep learning)
- decision trees – which are tree-shaped models in which each branch represents a decision based on a variable and each leaf represents a possible outcome
- support vector machines – which are supervised learning models that classify data by finding the boundary that separates the classes with the widest possible margin
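Regression can be illustrated with ordinary least squares for a single independent variable; the definition above allows several, but one keeps the arithmetic visible. The sketch computes the slope and intercept of the best-fit line, plus Pearson’s correlation coefficient r as a measure of how strong the linear relationship is. The `fit_line` helper and the advertising-spend and sales figures are hypothetical.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor, plus Pearson's r."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx                   # change in y per unit change in x
    intercept = y_bar - slope * x_bar   # the fitted line passes through the means
    r = sxy / (sxx * syy) ** 0.5        # strength of the linear relationship, -1 to 1
    return slope, intercept, r

# Hypothetical data: advertising spend (£k) and sales (£k).
spend = [1, 2, 3, 4, 5]
sales = [3, 5, 8, 9, 10]
slope, intercept, r = fit_line(spend, sales)
print(f"sales ≈ {intercept:.1f} + {slope:.1f} × spend (r = {r:.2f})")
```

With these figures the fitted line is roughly sales = 1.6 + 1.8 × spend with r ≈ 0.98, so a spend of 6 would predict sales of about 12.4.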
Prescriptive modeling
Prescriptive modeling goes a step beyond prediction by recommending actions. It combines a range of techniques and tools and applies them to input from many small and large data sets, including historical data, real-time data and big data. It can also use text mining to filter and transform unstructured data from the web, social media, comment fields, books, email, PDFs, audio and other sources. Common techniques include:
- prescriptive analysis plus rules – which develops if/then rules from patterns and predicts outcomes
- marketing optimisation – which simulates different media mixes to find the one likely to deliver the highest ROI for a marketing campaign
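One simple way to develop if/then rules from patterns is the OneR (1R) approach: for each attribute, predict the most common outcome for each of its values, then keep the attribute whose rules make the fewest errors. The text above does not name a specific algorithm, so treat this as an illustrative substitute; the `one_r` helper, the campaign records and the field names are all hypothetical.

```python
from collections import Counter, defaultdict

def one_r(rows, features, target):
    """OneR: build if/then rules per feature, keep the feature with fewest errors."""
    best = None
    for feature in features:
        # Tally target outcomes for each value of this feature.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[feature]][row[target]] += 1
        # Rule per value: IF feature = value THEN predict its most common outcome.
        rules = {value: tally.most_common(1)[0][0] for value, tally in counts.items()}
        # Errors: records whose outcome differs from their value's predicted outcome.
        errors = sum(sum(tally.values()) - tally[rules[value]]
                     for value, tally in counts.items())
        if best is None or errors < best[2]:
            best = (feature, rules, errors)
    return best

# Hypothetical campaign records.
rows = [
    {"channel": "email", "region": "north", "responded": "yes"},
    {"channel": "email", "region": "south", "responded": "yes"},
    {"channel": "email", "region": "north", "responded": "yes"},
    {"channel": "social", "region": "south", "responded": "no"},
    {"channel": "social", "region": "north", "responded": "no"},
    {"channel": "print", "region": "south", "responded": "no"},
    {"channel": "print", "region": "north", "responded": "yes"},
    {"channel": "print", "region": "south", "responded": "no"},
]
feature, rules, errors = one_r(rows, ["channel", "region"], "responded")
for value, outcome in rules.items():
    print(f"IF {feature} = {value} THEN responded = {outcome}")
```

Here the channel rules make one error across eight records against two for region, so the sketch keeps the channel rules (email → yes; social and print → no), which could in turn inform a recommendation such as favouring email in the next campaign.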
Does data mining require coding?
There is a wide range of data mining software and tools available, from open source programming languages such as R and Python to familiar tools like Excel.
As programming languages are a key part of manipulating, analysing and visualising data, data miners need an understanding of them to use data mining tools efficiently and discover knowledge in databases.
Become an integral part of the modern workplace
Data scientists are in high demand across businesses in a range of sectors, so by studying for a specialised degree you will be setting yourself up for a successful future in this ever-evolving field.
Learn how to solve business problems, develop key skills in data management and database systems, and further your understanding of data mining algorithms on the University of York’s 100% online MSc Computer Science with Data Analytics.
Studying part-time and around your own commitments, you can continue to earn as you learn and apply your learning to your current role.