Data analysis is the process of evaluating data using analytical and statistical tools to discover useful information and aid in business decision making. There are several data analysis methods, including data mining, text analytics, business intelligence and data visualization.
How Is Data Analysis Performed?
Data analysis is a part of a larger process of deriving business intelligence. The process includes one or more of the following steps:
- Defining Objectives: Any study must begin with a set of clearly defined business objectives. Many of the decisions made in the rest of the process depend on how clearly the objectives of the study have been stated.
- Posing Questions: An attempt is made to ask a question in the problem domain. For example, do red sports cars get into accidents more often than others?
- Data Collection: Data relevant to the question must be collected from the appropriate sources. In the example above, data might be collected from a variety of sources, including DMV or police accident reports, insurance claims and hospitalization details. When data is collected using surveys, a questionnaire must be prepared for the subjects. The questions should be appropriately modeled for the statistical method being used.
- Data Wrangling: Raw data may be collected in several different formats. The collected data must be cleaned and converted so that data analysis tools can import it. For our example, we may receive DMV accident reports as text files, insurance claims from a relational database and hospitalization details through an API. The data analyst must aggregate these different forms of data and convert them into a form suitable for the analysis tools.
- Data Analysis: This is the step where the cleaned and aggregated data is imported into analysis tools. These tools allow you to explore the data, find patterns in it, and ask and answer what-if questions. This is the process by which sense is made of data gathered in research by proper application of statistical methods.
- Drawing Conclusions and Making Predictions: This is the step where, after sufficient analysis, conclusions can be drawn from the data and appropriate predictions can be made. These conclusions and predictions may then be summarized in a report delivered to end-users.
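The data wrangling step above can be illustrated with a minimal sketch. It assumes three tiny, made-up inputs standing in for the DMV text file, the insurance database rows and the hospital API response; the field names and the normalize helper are hypothetical and chosen only for illustration.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical sample inputs, one per source format in the running example.
dmv_text = "report_id,color,body_type\nD1,red,sports\nD2,blue,sedan\n"
insurance_rows = [("C7", "RED", "Sports"), ("C8", "white", "suv")]  # as if read from a database
hospital_json = '[{"case": "H3", "vehicle": {"color": "Red", "type": "sports"}}]'  # as if returned by an API

def normalize(source, record_id, color, body_type):
    """Convert one record to a shared schema with trimmed, lower-cased values."""
    return {
        "source": source,
        "id": record_id,
        "color": color.strip().lower(),
        "body_type": body_type.strip().lower(),
    }

records = []
# Text file: parse as CSV.
for row in csv.DictReader(io.StringIO(dmv_text)):
    records.append(normalize("dmv", row["report_id"], row["color"], row["body_type"]))
# Database rows: already tuples, only the values need cleaning.
for record_id, color, body_type in insurance_rows:
    records.append(normalize("insurance", record_id, color, body_type))
# API response: decode JSON and flatten the nested vehicle object.
for item in json.loads(hospital_json):
    vehicle = item["vehicle"]
    records.append(normalize("hospital", item["case"], vehicle["color"], vehicle["type"]))

# Count incidents per (color, body type) across all sources.
counts = Counter((r["color"], r["body_type"]) for r in records)
print(counts[("red", "sports")])
```

Note that raw counts alone do not answer the question posed earlier. A real study would also need the number of vehicles in each category to compute accident rates, and would have to detect the same accident appearing in more than one source.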
Let us now look in some detail at the methods of data analysis in particular.
Data mining is a method of data analysis for discovering patterns in large data sets using the methods of statistics, artificial intelligence, machine learning and databases. The goal is to transform raw data into understandable business information. This might include identifying groups of data records (also known as cluster analysis), or identifying anomalies and dependencies between data groups.
Applications of data mining:
- Anomaly detection can process huge amounts of data (“big data”) and automatically identify outlier cases, possibly for exclusion from decision making or detection of fraud (e.g. bank fraud).
- Learning customer purchase habits. Machine learning techniques can be used to model customer purchase habits and determine frequently bought items.
- Clustering can identify previously unknown groups within the data.
- Classification is used to automatically classify data entries into pre-specified bins. A common example is classifying email messages as “spam” or “not-spam” and having the system learn from the user.
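As one possible illustration of the anomaly detection item above, the sketch below flags outliers using the modified z-score, a common robust statistic based on the median. This is only one of many techniques. The transaction amounts are made up, and the 3.5 cutoff is a conventional default rather than a rule.

```python
import statistics

def find_outliers(values, threshold=3.5):
    """Return the values whose modified z-score exceeds the threshold."""
    median = statistics.median(values)
    # Median absolute deviation: the typical distance from the median.
    mad = statistics.median(abs(v - median) for v in values)
    if mad == 0:
        return []  # at least half the values are identical; the score is undefined
    return [v for v in values if 0.6745 * abs(v - median) / mad > threshold]

# Hypothetical transaction amounts with one suspicious entry.
amounts = [42.0, 38.5, 45.1, 40.0, 39.9, 41.3, 2500.0, 43.7]
print(find_outliers(amounts))
```

The median is used instead of the mean because a single extreme value can shift the mean and standard deviation enough to hide itself.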
Text analytics is the process of deriving useful information from text. It is accomplished by processing unstructured textual information, extracting meaningful numerical indices from it and making the information available to statistical and machine learning algorithms for further processing.
The text mining process includes one or more of the following steps:
- Collecting information from various sources including web, file system, database, etc.
- Linguistic analysis including natural language processing.
- Pattern recognition (e.g. recognizing phone numbers, email addresses, etc.)
- Extracting summary information from the text, such as relative frequencies of the words, determining similarities between documents, etc.
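Two of the steps above, pattern recognition and extracting relative word frequencies, can be sketched with regular expressions. The sample text is invented, and the patterns are deliberately simple assumptions: they match common email shapes and one US-style phone format, not every valid address or number.

```python
import re
from collections import Counter

# Hypothetical input text.
text = (
    "Contact sales at sales@example.com or call 555-123-4567. "
    "Support is at support@example.com. Sales hours: sales team replies within a day."
)

# Simplified patterns, for illustration only.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

# Pattern recognition: pull out structured items.
emails = EMAIL_PATTERN.findall(text)
phones = PHONE_PATTERN.findall(text)

# Remove the recognized items so they do not distort the word counts.
cleaned = PHONE_PATTERN.sub(" ", EMAIL_PATTERN.sub(" ", text))
words = re.findall(r"[a-z]+", cleaned.lower())

# Summary information: relative frequency of each word.
counts = Counter(words)
frequencies = {word: n / len(words) for word, n in counts.items()}

print(emails)
print(phones)
print(counts.most_common(2))
```

The resulting frequencies are the kind of numerical indices that can be passed on to statistical or machine learning algorithms.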
Examples of text analytics applications:
- Analyzing open-ended survey responses. These surveys are of an exploratory nature and include open-ended questions related to the topic in question. The respondents can then express their views without being constrained to a particular response format.
- Analyzing emails, documents, etc. to filter out “junk”. This also includes automatic classification of messages into pre-defined bins for routing to different departments.
- Investigating competitors by crawling their websites. This could be used to derive information about competitors’ activities.
- Security applications which can process log files for intrusion detection.
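The log-processing application above could look roughly like the following sketch, which counts failed logins per IP address. The log format, the LOGIN_FAILED marker and the threshold of three failures are all assumptions made for this example. Real log formats differ between systems, and real intrusion detection uses far richer signals.

```python
import re
from collections import Counter

# Hypothetical log lines in an invented format.
log_lines = [
    "10:00:01 LOGIN_FAILED user=admin ip=203.0.113.9",
    "10:00:03 LOGIN_FAILED user=admin ip=203.0.113.9",
    "10:00:04 LOGIN_OK user=alice ip=198.51.100.4",
    "10:00:06 LOGIN_FAILED user=root ip=203.0.113.9",
    "10:00:09 LOGIN_FAILED user=bob ip=198.51.100.4",
]

FAILED_LOGIN = re.compile(r"LOGIN_FAILED .*\bip=(\S+)")

def suspicious_ips(lines, max_failures=3):
    """Return IPs with at least max_failures failed logins, sorted."""
    failures = Counter()
    for line in lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return sorted(ip for ip, n in failures.items() if n >= max_failures)

print(suspicious_ips(log_lines))
```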
Business intelligence transforms data into actionable intelligence for business purposes and may be used in an organization’s strategic and tactical business decision making. It offers a way for people to examine trends from collected data and derive insights from it.
Some examples of business intelligence in use today:
- An organization’s operating decisions such as product placement and pricing.
- Identifying new markets, assessing the demand and suitability of products for different market segments.
- Budgeting and rolling forecasts.
- Using visual tools such as heat maps, pivot tables and geographical mapping.
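To make the pivot table in the list above concrete, here is a minimal sketch that summarizes sales by region and quarter. The sales rows are invented, and pivot_sum is a hypothetical helper; spreadsheet and BI tools provide the same operation built in.

```python
from collections import defaultdict

# Hypothetical sales rows: (region, quarter, amount).
sales = [
    ("North", "Q1", 120), ("North", "Q2", 150), ("South", "Q1", 90),
    ("South", "Q2", 110), ("North", "Q1", 30),
]

def pivot_sum(rows):
    """Sum amounts into a region-by-quarter table."""
    table = defaultdict(lambda: defaultdict(int))
    for region, quarter, amount in rows:
        table[region][quarter] += amount
    return {region: dict(columns) for region, columns in table.items()}

pivot = pivot_sum(sales)

# Print the table with one row per region and one column per quarter.
quarters = sorted({q for columns in pivot.values() for q in columns})
print("Region " + " ".join(f"{q:>5}" for q in quarters))
for region in sorted(pivot):
    print(f"{region:<7}" + " ".join(f"{pivot[region].get(q, 0):>5}" for q in quarters))
```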
Data visualization refers very simply to the visual representation of data. In the context of data analysis, it means using the tools of statistics, probability, pivot tables and other artifacts to present data visually. It makes complex data more understandable and usable.
Increasing amounts of data are being generated by sensors in the environment (referred to as the “Internet of Things” or “IoT”). This data (referred to as “big data”) is hard to understand in raw form, a challenge that data visualization tools can ease. Data visualization is used in the following applications:
- Extracting summary data from the raw data of IoT.
- Using a bar chart to represent sales performance over several quarters.
- Using a histogram to show the distribution of a variable such as income by dividing the range into bins.
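The histogram item above can be sketched as a small text-based example. The income figures and the bin width of 20,000 are made up, and a plotting library would normally draw the bars; the binning logic is the part being illustrated.

```python
def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width.

    Bins are keyed by their lower bound; empty bins are omitted.
    """
    counts = {}
    for value in values:
        start = (value // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical annual incomes.
incomes = [18000, 22000, 25000, 31000, 34000, 36000, 39000, 47000, 52000, 95000]
bins = histogram(incomes, 20000)

# Draw one row of '#' marks per bin.
for start, count in bins.items():
    print(f"{start:>6}-{start + 19999:<6} {'#' * count}")
```

Choosing the bin width is a judgment call: bins that are too wide hide the shape of the distribution, while bins that are too narrow leave mostly noise.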
Data Analysis in Review
Data analysis is used to evaluate data with statistical tools to discover useful information. A variety of methods are used for this purpose, including data mining, text analytics, business intelligence, and data visualization.
Have you used data analysis in your organization to model anything? How was your experience? Do you have any useful insights to offer? Please let us know in the comments below.