How to Import Excel Data Into Python Scripts Using Pandas
Whatsapp Pinterest
Advertisement

Microsoft Excel is the most widely-used spreadsheet software in the world, and for good reason: the user-friendly interface and powerful built-in tools make it simple to work with data.

Unlock the FREE "Essential Excel Formulas" cheat sheet now!

This will sign you up to our newsletter

Enter your Email

But if you want to do more advanced data processing, you’ll need to go beyond Excel’s capabilities and start using a scripting/programming language like Python. Rather than manually copying your data into databases, here’s a quick tutorial on how to load your Excel data into Python using Pandas.

Note: If you’ve never used Python before, this tutorial may be a tad difficult. We recommend starting with these websites for learning Python The 5 Best Websites to Learn Python Programming The 5 Best Websites to Learn Python Programming Want to learn Python programming? Here are the best ways to learn Python online, many of which are entirely free. Read More and these basic Python examples to get you started 10 Basic Python Examples That Will Help You Learn Fast 10 Basic Python Examples That Will Help You Learn Fast This article of basic python examples is for those who already have some programming experience and simply want to transition to Python as quickly as possible. Read More .

What Is Pandas?

Python Data Analysis Library (“Pandas”) is an open-source library for the Python programming language that’s used for data analysis and data manipulation.

Pandas loads data into Python objects known as Dataframes, which store data in rows and columns just like a traditional database. Once a Dataframe is created it can be manipulated using Python, opening up a world of possibilities.

Installing Pandas

Note: You must have Python 2.7 or later to install Pandas.

To begin working with Pandas on your machine you will need to import the Pandas library. If you’re in search of a heavyweight solution you can download the Anaconda Python Distribution, which has Pandas built-in. If you don’t have a use for Anaconda, Pandas is simple to install in your terminal.

Pandas is a PyPI package, which means you can install using PIP for Python via the command line. Modern Mac systems come with PIP. For other Windows, Linux, and older systems it’s easy to learn how to install PIP for Python How to Install Python PIP on Windows, Mac, and Linux How to Install Python PIP on Windows, Mac, and Linux Many Python developers rely on a tool called PIP for Python to make everything easier and faster. Here's how to install Python PIP. Read More .

Once you’ve opened your terminal, the latest version of Pandas can be installed using the command:

>> pip install pandas

Pandas also requires the NumPy library, let’s also install this on the command line:

>> pip install numpy

You now have Pandas installed and ready to create your first DataFrame!

Prepping the Excel Data

For this example, let’s use a sample data set: an Excel workbook titled Cars.xlsx.

Excel Table For Python Pandas

This data set displays the make, model, color, and year of cars entered into the table. The table is displayed as an Excel range. Pandas is smart enough to read the data appropriately.

This workbook is saved to the Desktop directory, here is the file path used:

/Users/grant/Desktop/Cars.xlsx

You will need to know the file path of the workbook to utilize Pandas. Let’s begin by opening up Visual Studio Code to write the script. If you don’t have a text editor, we recommend either Visual Studio Code or Atom Editor Visual Studio Code vs. Atom: Which Text Editor Is Right for You? Visual Studio Code vs. Atom: Which Text Editor Is Right for You? Looking for a free and open-source code editor? Visual Studio Code and Atom are the two strongest candidates. Read More .

Writing the Python Script

Now that you have your text editor of choice, the real fun begins. We’re going to bring together Python and our Cars workbook to create a Pandas DataFrame.

Importing the Python Libraries

Open your text editor and create a new Python file. Let’s call it Script.py.

In order to work with Pandas in your script, you will need to import it into your code. This is done with one line of code:

import pandas as pd

Here we are loading the Pandas library and attaching it to a variable “pd”. You can use any name you would like, we are using “pd” as short for Pandas.

To work with Excel using Pandas, you need an additional object named ExcelFile. ExcelFile is built into the Pandas ecosystem, so you import directly from Pandas:

from pandas import ExcelFile

Working With the File Path

In order to give Pandas access to your workbook, you need to direct your script to the location of the file. The easiest way to do this is by providing your script with the full path to the workbook.

Recall our path in this example: /Users/grant/Desktop/Cars.xlsx

You will need this file path referenced in your script to extract the data. Rather than referencing the path inside of the Read_Excel function, keep code clean by storing the path in a variable:

Cars_Path = '/Users/grant/Desktop/Cars.xlsx'

You are now ready to extract the data using a Pandas function!

Extract Excel Data Using Pandas.Read_Excel()

With Pandas imported and your path variable set, you can now utilize functions in the Pandas object to accomplish our task.

The function you will need to use is appropriately named Read_Excel. The Read_Excel function takes the file path of an Excel Workbook and returns a DataFrame object with the contents of the Workbook. Pandas codes this function as:

pandas.read_excel(path)

The “path” argument is going to be the path to our Cars.xlsx workbook, and we have already set the path string to the variable Cars_Path.

You’re ready to create the DataFrame object! Let’s put it all together and set the DataFrame object to a variable named “DF”:

DF = pd.read_excel(Cars_Path)

Lastly, you want to view the DataFrame so let’s print the result. Add a print statement to the end of your script, using the DataFrame variable as the argument:

print(DF)

Time to run the script in your terminal!

Running the Python Script

Open your terminal or command line, and navigate to the directory which houses your script. In this case, I have “Script.py” located on the desktop. To execute the script, use the python command followed by the script file:

Python Script In Terminal

Python will pull the data from “Cars.xlsx” into your new DataFrame, and print the DataFrame to the terminal!

Python DataFrame In Terminal

A Closer Look at the DataFrame Object

At first glance, the DataFrame looks very similar to a regular Excel table. Pandas DataFrames are easy to interpret as a result.

Your headers are labeled at the top of the data set, and Python has filled in the rows with all your information read from the “Cars.xlsx” workbook.

Notice the leftmost column, an index starting at 0 and numbering the columns. Pandas will apply this index to your DataFrame by default, which can be useful in some cases. If you do not want this index generated, you can add an additional argument into your code:

DF = pd.read_excel(Cars_Path, index=False)

Setting the argument “index” to False will remove the index column, leaving you with just your Excel data.

Doing More With Python

Now that you have the ability to read data from Excel worksheets, you can apply Python programming any way you choose. Working with Pandas is a simple way for experienced Python programmers to work with data stored in Excel Workbooks.

The ease with which Python can be used to analyze and manipulate data is one of the many reasons why Python is the programming language of the future 6 Reasons Why Python Is the Programming Language of the Future 6 Reasons Why Python Is the Programming Language of the Future Want to learn or expand your programming skills? Here's why Python is the best programming language to learn this year. Read More .

Image Credit: Rawpixel/Depositphotos

Explore more about: Data Analysis, Microsoft Excel, Python, Scripting.

Enjoyed this article? Stay informed by joining our newsletter!

Enter your Email

Leave a Reply

Your email address will not be published. Required fields are marked *

  1. venkat
    November 18, 2019 at 3:18 am

    Index=False is not working
    Still we can see the
    0
    1
    2
    .
    .
    7 in results