What is data mining?

Data has been used to identify sequential patterns, correlations and trends throughout history, but it was only in the 1990s that the term ‘data mining’ was coined. As digital technologies grow and evolve at unprecedented speed, so too do the methods and materials used to collect and interpret data.

Because data mining – also known as knowledge discovery in data (KDD) – can be used to predict certain outcomes accurately, it is no surprise that more businesses than ever now use it in their day-to-day operations. This business intelligence practice isn’t limited to tech giants like Google and Amazon. Data mining is used across all sectors, by retailers, banks, manufacturers and telecommunications providers, to name a few.

By using automated data analysis, businesses build an understanding of how their customers interact with their services or the popularity of products. It can also be used to keep track of the economy, risk and competition. Using this branch of data science in business can create stability and give a clear steer in managerial decision-making, as well as generating large financial gains and growth.

Data mining is underpinned by three scientific disciplines:

  • statistics – the numeric study of data relationships
  • artificial intelligence – software and/or machines which display human-like intelligence
  • machine learning – algorithms which can learn to make predictions through data

How data mining is used in business

Different teams within a business capture data in different volumes. Warehousing data across an organisation is an efficient way to store it all centrally. Data mining uses the information in centrally stored databases to glean insights on the past, present and future of an organisation’s operations and output.

By using data, marketing campaigns can be optimised to improve segmentation, cross-sell offers, and target customers more directly, thereby increasing return on investment (ROI). Data collected in a marketing campaign can also be used by sales teams, and data mining can provide useful information on the customers most likely to convert into sales for a more efficient process.

Data mining techniques can also be used to reduce costs across a business’s operational functions by identifying glitches in processes and aiding more thoughtful decision making.

Data mining can also be used for fraud detection, primarily in banking and financial institutions, as data anomalies can highlight risks quickly.

How does data mining work?

The first step of effective data mining is data collection. Many businesses have a data warehouse where a large collection of business data is stored and used to help make effective business decisions.

The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a flexible, six-phase workflow which is frequently used as a guideline for data mining projects.

The CRISP-DM phases are:

  1. business understanding – identifying project objectives and scope, and uncovering a question or problem that data mining can answer or solve
  2. data understanding – collecting the raw data relevant to the question, which often comes from multiple sources and may include both structured and unstructured data, and initial exploratory analysis to select the subset of data for analysis and modeling
  3. data preparation – identifying the dimensions and variables to explore, and preparing the final data set for model creation
  4. modeling – selecting the appropriate modeling technique for the data set
  5. evaluation – testing and measuring the model on its success at answering the question or solving the problem outlined in phase one, and editing the model or the question whilst assessing the progress to ensure it’s on the right track
  6. deployment – deploying the model into the real world, through a well-thought-out roll-out plan, once it is accurate and reliable

Data mining modeling techniques

There are three main data mining modeling techniques in use today. 

Descriptive modeling

This data mining technique uncovers shared similarities or groupings in historical data to determine answers to set questions on successes or failures. Methods within this technique include:

  • clustering – which groups similar records together as part of data mining applications
  • anomaly detection – which identifies any outliers
  • association rule learning – which detects relationships between records within the data set
  • principal component analysis – which detects relationships between variables
  • affinity grouping – which groups the data of individuals together through their common interests or similar goals
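As a rough illustration of anomaly detection from the list above, the sketch below flags values that sit unusually far from the mean of a data set. It is a minimal example rather than a production method: the function name `find_outliers`, the z-score threshold of 2.0 and the sample spending figures are all assumptions made for illustration.

```python
from statistics import mean, stdev


def find_outliers(values, threshold=2.0):
    """Return the values whose z-score magnitude exceeds the threshold."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        # Every value is identical, so nothing can be an outlier.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Hypothetical daily card spend for one customer (illustrative data only).
daily_spend = [42.0, 39.5, 41.2, 40.8, 43.1, 38.9, 40.0, 41.7, 39.9, 400.0]

outliers = find_outliers(daily_spend)
print(outliers)  # [400.0]
```

A z-score is only one of many ways to define an outlier, and it assumes the data is roughly bell-shaped. Real fraud detection systems typically combine several signals.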

Predictive modeling

Predictive analytics can be used to predict events and outcomes in the future and this technique can help uncover insights relating to any business. Methods within this technique include:

  • regression – which measures the strength of a relationship between one dependent variable and a series of independent variables
  • neural networks – which are computer programs that learn from data in order to detect patterns and make predictions (a neural network with three or more layers is generally considered deep learning)
  • decision trees – which are tree-shaped diagrams with each branch representing a probable outcome
  • support vector machines – which are supervised learning models, with associated learning algorithms, used for classification and regression
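To make the idea of regression more concrete, here is a minimal sketch of ordinary least squares with a single independent variable (the description above allows for several, so this is a deliberately simplified case). The function name `fit_line` and the advertising and sales figures are hypothetical and chosen so the arithmetic is easy to follow.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-style and variance-style sums (the common 1/n factor cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: advertising spend (in thousands) against units sold (in thousands).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.0, 5.0, 7.0, 9.0, 11.0]

slope, intercept = fit_line(ad_spend, sales)
forecast = slope * 6.0 + intercept  # predicted sales for a spend of 6.0
print(slope, intercept, forecast)
```

In practice a data miner would more likely reach for a library such as scikit-learn or statsmodels, but the underlying calculation is the same.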

Prescriptive modeling

This model uses a combination of techniques and tools and applies them against input from many different small and large data sets including historical data, real-time data, big data, and text mining, which filters and transforms unstructured data from the web, social media, comment fields, books, email, PDFs, audio and other text sources. Methods within this technique include:

  • prescriptive analysis plus rules – which develops if/then rules from patterns and predicts outcomes
  • marketing optimisation – which simulates the most successful media mix for marketing campaigns to create the highest ROI

Does data mining require coding?

There is a wide range of data mining software and tools available, from open source programming languages such as R and Python to familiar tools like Excel.

As programming languages are a key part of manipulating, analysing and visualising data, data miners need an understanding of these languages to be able to use data mining tools efficiently and uncover knowledge in databases.

Become an integral part of the modern workplace

Data scientists are in high demand across businesses in a range of sectors, so by studying a specialised degree you will be setting yourself up for a successful future in this ever-evolving field.

Learn how to solve business problems, develop key skills in data management and database systems, and further your understanding of data mining algorithms on the University of York’s 100% online MSc Computer Science with Data Analytics.

Studying part-time and around your own commitments, you can continue to earn as you learn and apply your learning to your current role.

What you need to know about blockchain

Blockchain technology is best known for its role in fintech and making cryptocurrency a reality, but what is it? 

Blockchain is a database that stores information in a string of blocks rather than in tables, and which can be decentralised by being made public. Bitcoin, one of the most talked about and unpredictable cryptocurrencies, uses blockchain as does Ether, the currency of Ethereum. 

Although cryptocurrencies have been linked with criminal activity, blockchain’s mechanism of storing data with timestamps offers transparency and traceability. Although central banks and financial institutions have been wary of the lack of regulation, retailers are increasingly accepting Bitcoin transactions. It’s said that Bitcoin founder, Satoshi Nakamoto, created the cryptocurrency as a response to the 2008 financial crash. It was a way of circumventing financial institutions by saving and transferring digital currency in a peer-to-peer network without the involvement of a central authority.

Ethereum is a blockchain network that helped shift the focus away from cryptocurrencies when it launched in 2015 by offering a general-purpose blockchain that can be used in different ways. In a white paper written in 2013, the founder of Ethereum, Vitalik Buterin, wrote about the need for application development beyond the blockchain technology of Bitcoin, which would allow blockchains to be attached to real-world assets such as stocks and property. The Ethereum blockchain has also provided the ability to create and exchange non-fungible tokens (NFTs). NFTs are mainly known as digital artworks but can also be digital assets, such as viral video clips, gifs, music, or avatars. They’re attractive because once bought, the owner has exclusive rights to the content. They also protect the intellectual property of the artist by being tamper-proof.

There has recently been a lot of hype around NFTs because the piece Everydays: The First 5000 Days by digital artist Beeple (Mike Winkelmann) sold for a record-breaking $69,346,250 at auction. That’s the equivalent of 42,329 Ether, which was what Vignesh Sundaresan, owner of Metapurse, used to purchase the piece that combines 5,000 images created and collated over 13 years. NFTs may seem like a new technology but they’ve actually been around since 2014.

IOTA is the first cryptocurrency to make possible free micro-transactions between Internet of Things (IoT) objects. While Ethereum moved the focus away from cryptocurrency, IOTA is looking to move cryptocurrency beyond blockchain. By using a Directed Acyclic Graph called the Tangle, IOTA removes the need for miners, allows for near-infinite scaling, and removes fees entirely.

How blockchain works

Blockchain applications are many and varied including the decentralisation of financial services, healthcare, internet browsing, real estate, government, voting, music, art, and video games. Blockchain solutions are increasingly utilised across industries, for example, to provide transparency in the supply chain, or in lowering administrative overheads with smart contracts.  

But how does it actually work? Blockchain uses several technologies including distributed ledger technology, digital signatures, distributed networks and encryption methods to link the blocks of the ledger for record-keeping. Information is collected in groups which make up the blocks. Each block has a certain capacity and, once filled, is chained to the previously filled block. This creates a timeline, because each block is given a timestamp which cannot be overwritten.
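The chaining described above can be sketched in a few lines of Python. This is a simplified, single-machine illustration under stated assumptions, not how Bitcoin or Ethereum is actually implemented: the block fields, the function names `block_hash` and `add_block`, and the use of SHA-256 over a JSON payload are all choices made for this example, and there is no network, consensus or digital signature involved.

```python
import hashlib
import json
import time


def block_hash(block):
    """Hash the block's contents, including the previous block's hash."""
    payload = json.dumps(
        {key: block[key] for key in ("index", "timestamp", "data", "prev_hash")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def add_block(chain, data):
    """Create a timestamped block and chain it to the previous one."""
    # The first block has no predecessor, so it points at a placeholder hash.
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {
        "index": len(chain),
        "timestamp": time.time(),
        "data": data,
        "prev_hash": prev_hash,
    }
    block["hash"] = block_hash(block)
    chain.append(block)
    return block


chain = []
add_block(chain, ["Alice pays Bob 5"])
add_block(chain, ["Bob pays Carol 2", "Carol pays Dan 1"])

for block in chain:
    print(block["index"], block["prev_hash"][:8], block["hash"][:8])
```

Because each block's hash is calculated from its own contents plus the previous block's hash, the blocks form a chain in which every link depends on all the links before it.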

The benefits of blockchain are seen not just in cryptocurrencies but in legal contracts and stock inventories as well as in the sourcing of products such as coffee beans. There are notoriously many steps between coffee leaving the farm where it was grown and reaching your coffee cup. Because of the complexity of the coffee market, coffee farmers often only receive a fraction of what the end-product is worth. Consumers also increasingly want to know where their coffee has come from and that the farmer received a fair price. Initially used as an effective way to cut out the various middlemen and streamline operations, blockchain is now being used as an added reassurance for supermarket customers. In 2020, Farmer Connect partnered with Folgers coffee in using the IBM blockchain platform to connect producers with customers. A simple QR code helps consumers see how the coffee they hold in their hand was brought to the shelf. Walmart is another big name providing one of many case studies for offering transparency with blockchain by using distributed ledger software called Hyperledger Fabric.

Are blockchains hackable?

In theory, blockchains are hackable; however, the time and resources needed to achieve a successful hack – including a vast network of computers – are beyond the average hacker. Even if a hacker did manage to control and alter 51% of the copies of the blockchain simultaneously, in order to gain control of the ledger and make their own copy the majority copy, each altered block would then have a different timestamp and hash (the output of a cryptographic hash function). The deliberate design of blockchain – using decentralisation, consensus, and cryptography – makes it practically impossible to alter the chain without the change being noticed by others, because altering one block changes the hashes along the rest of the chain.
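The way a change to one block becomes visible along the chain can be shown with a small, self-contained sketch. It assumes a toy ledger on a single machine, and the helper names `digest`, `build_chain` and `first_invalid_block` are hypothetical; real networks also rely on consensus between many nodes, which is not modelled here.

```python
import hashlib


def digest(index, data, prev_hash):
    """SHA-256 hash of a block's contents and its predecessor's hash."""
    return hashlib.sha256(f"{index}|{data}|{prev_hash}".encode("utf-8")).hexdigest()


def build_chain(records):
    """Build a list of blocks, each storing the hash of the one before it."""
    chain, prev_hash = [], "0" * 64
    for index, data in enumerate(records):
        current = digest(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return block["index"]  # link to the previous block is broken
        if digest(block["index"], block["data"], block["prev_hash"]) != block["hash"]:
            return block["index"]  # contents no longer match the stored hash
        prev_hash = block["hash"]
    return None


ledger = build_chain(["Alice pays Bob 5", "Bob pays Carol 2", "Carol pays Dan 1"])
untouched = first_invalid_block(ledger)  # None: the chain verifies

# An attacker rewrites the second block's data.
ledger[1]["data"] = "Bob pays Mallory 200"
tampered_at = first_invalid_block(ledger)  # 1: stored hash no longer matches

# Recalculating that block's hash only moves the problem to the next block.
ledger[1]["hash"] = digest(1, ledger[1]["data"], ledger[1]["prev_hash"])
after_rehash = first_invalid_block(ledger)  # 2: its prev_hash is now stale

print(untouched, tampered_at, after_rehash)
```

To hide the change, the attacker would have to recalculate every later block as well, and do so on enough copies of the ledger to outvote the honest ones.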

Blockchain is not invulnerable to cybersecurity attacks such as phishing and ransomware, but it is currently one of the most secure forms of data storage. Permissioned blockchains add an access control layer, so that only identifiable users can perform certain actions. These blockchains are different to both public blockchains and private blockchains.

Are blockchains good investments?

Currencies like Bitcoin and Ether have delivered strong returns for some investors over both the short term and the long term, although their prices are highly volatile; NFTs are slightly different though. A good way to think about NFTs is as collector’s items in digital form. Like anything that’s collectable, it’s best to buy something because you truly admire it rather than because it’s valuable, especially in the volatile cryptocurrency ecosystem. It’s also worth bearing in mind that the values of NFTs are based entirely on what someone is prepared to pay rather than any history of worth – demand drives price.

Anyone can start investing but, as most digital assets like NFTs can only be bought with cryptocurrency, you’ll need to purchase some, which you can easily do with a credit card on any of the crypto platforms. You will also need a digital wallet in which to store your cryptocurrency and assets. You’ll be issued with a public key, which works like an email address when sending and receiving funds, and a private key, which is like a password that unlocks your virtual vault. Your public key is derived from your private key, which makes them a pair and adds to the security of your wallet. Some digital wallets like Coinbase also serve as crypto bank accounts for savings. Although banks occasionally freeze accounts in relation to Bitcoin transactions, they are becoming more accustomed to cryptocurrencies. Investment banks such as JP Morgan and Barclays even show interest in the asset class despite the New York attorney general declaring “Play by the rules or we will shut you down” in March 2021.

Are blockchain transactions traceable?

In a blockchain, each node (a computer, or bank of computers, on the network) has a complete record of the data that has been stored on the blockchain since it began. So for example, the data held on Bitcoin’s blockchain is the entire history of its transactions. If one node presents an error in its data, the thousands of other nodes provide a reference point against which the error can be corrected. This architecture means that no single node in the network has the power to alter the information held on the blockchain. It also means that the record of transactions in each block that makes up Bitcoin’s blockchain is irreversible, and that any Bitcoins extracted by a hacker can be traced through the transactions that appear in the wake of the hack.

Blockchain explorers allow anyone to see transactions happening in real time.
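As a rough sketch of how funds can be followed through a public ledger, the example below walks a list of transactions in order and collects every payment that can be linked back to a starting address. It is a deliberate simplification: the account-style ledger, the address names and the function `trace_funds` are assumptions made for illustration, and Bitcoin’s real transaction model (based on unspent outputs) is more involved.

```python
def trace_funds(transactions, start_address):
    """Return the payments, in ledger order, that flow on from start_address."""
    tainted = {start_address}  # addresses known to have received traced funds
    trail = []
    for tx in transactions:
        if tx["sender"] in tainted:
            tainted.add(tx["receiver"])
            trail.append(tx)
    return trail


# Hypothetical ledger entries, oldest first (illustrative data only).
ledger = [
    {"sender": "exchange", "receiver": "alice", "amount": 3.0},
    {"sender": "hacker", "receiver": "wallet_a", "amount": 10.0},
    {"sender": "wallet_a", "receiver": "wallet_b", "amount": 6.0},
    {"sender": "alice", "receiver": "bob", "amount": 1.0},
    {"sender": "wallet_b", "receiver": "cash_out", "amount": 6.0},
]

trail = trace_funds(ledger, "hacker")
for tx in trail:
    print(tx["sender"], "->", tx["receiver"], tx["amount"])
```

Because every node holds the full history, anyone with a copy of the ledger could run this kind of trace. Linking an address to a real-world identity is a separate problem.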

Learn more about cryptocurrencies and blockchain

Whether you’re interested in improving cybersecurity or becoming a blockchain developer, looking for enhanced expertise in data science or artificial intelligence, specialist online Master’s degrees from the University of York cover some of the hottest topics in these areas.

Discover more and get a step ahead with the MSc Computer Science with Data Analytics or the MSc Computer Science with Artificial Intelligence.