Pivot tables continue to be among the most revered and widely used tools within MS Excel. Whether you’re a data analyst, data engineer, or simply a regular user, chances are you already have a soft spot for MS Excel.

Nonetheless, there is an increasing scope for replicating MS Excel’s tools and utilities, especially in Python. Did you know you can create extensive pivot tables in Python’s DataFrames with a few lines of code itself?

Yes, that's correct; if you are intrigued, here’s how you can do it.

Pre-Requisites for Creating Pivot Tables

Like any other programming language, even Python needs you to fulfill a few pre-requisites before you can get to coding.

To get the most optimized experience while creating your very first pivot table in Python, here’s what you will need:

  • Python IDE: Most Python codes have an integrated development environment (IDE) pre-installed on their system. There are several Python compatible IDEs in the market, including Jupyter Notebook, Spyder, PyCharm, and many others.
  • Sample Data: For illustration, here's a sample dataset for you to work on. Alternatively, feel free to tweak these codes directly on your live data.

Data sample link: Sample Superstore

Importing the Essential Libraries

Since Python works on the concept of third-party libraries, you need to import the Pandas library for creating pivots.

You can use Pandas to import an Excel file into Python and store the data in a DataFrame. To import Pandas, use the import command in the following manner:

        import pandas as pd
    

How to Create Pivots in Python

Since the library is now available, you need to import the Excel file into Python, which is the base for creating and testing pivots in Python. Store the imported data in a DataFrame with the following code:

        # Create a new DataFrame

# replace with your own path here
path = "C://Users//user/OneDrive//Desktop//"

# you can define the filename here
file = "Sample - Superstore.xls"

df = pd.read_excel(path + file)

df.head()

Where:

  • df: Variable name to store the DataFrame data
  • pd: Alias for Pandas library
  • read_excel(): Pandas function to read an Excel file into Python
  • path: The location where the Excel file is stored (Sample Superstore)
  • file: File name to import
  • head(): Displays the first five rows of the DataFrame, by default

The above code imports the Excel file into Python and stores the data in a DataFrame. Finally, the head function displays the first five rows of data.

Jupyter Notebook interface showing Python code

This function is handy to ensure the data is imported correctly into Python.

Which Pivot Table Fields Exist in Python?

Like its Excel counterpart, a pivot table has a similar set of fields in Python. Here are a few fields you need to know about:

  • Data: The data field refers to the data stored within a Python DataFrame
  • Values: Columnar data used within a pivot
  • Index: An index column(s) for grouping the data
  • Columns: Columns help in aggregating the existing data within a DataFrame

Purpose Behind Using the Index Function

Since the index function is the primary element of a pivot table, it returns the data’s basic layout. In other words, you can group your data with the index function.

Suppose you want to see some aggregated values for the products listed within the Segment column. You can calculate a pre-defined aggregate (mean value) in Python by defining the designated column as an index value.

        df.pivot_table(index = "Segment")
    

Where:

  • df: DataFrame containing the data
  • pivot_table: Pivot table function in Python
  • index: In-built function for defining a column as an index
  • Segment: Column to use as an index value

Python's variable names are case-sensitive, so avoid transitioning away from the pre-defined variable names listed in this guide.

Jupyter Notebook interface showing Python code

How to Use Multi-Index Values

When you want to use multiple index columns, you can define the column names in a list within the index function. All you have to do is specify the column names within a set of square brackets ([ ]), as shown below:

        df.pivot_table(index = ["Category", "Sub-Category"])
    
Jupyter Notebook interface showing Python code

The pivot function indents the index column in the output. Python displays the mean of all the numerical values against each index value.

Learn to Restrict the Values in the Output

Since Python picks all the numerical columns by default, you can restrict the values to tweak the results shown in the final output. Use the values function to define the columns you wish to see.

        df.pivot_table(index = ["Region", "Category", "Sub-Category"], values = "Sales")
    

In the final output, there will be three index columns, and the mean values for the Sales column pitted against each element.

Jupyter Notebook interface showing Python code

Defining Aggregate Functions in Pivot Table

What happens when you don't want to calculate the mean values by default? The pivot table has a lot of other functionalities, which extend beyond calculating a simple mean.

Here's how to write the code:

        df.pivot_table(index = ["Category"], values = "Sales", aggfunc = [sum, max, min, len])
    

Where:

  • sum: Calculates the sum of values
  • max: Calculates the maximum value
  • min: Calculates the maximum value
  • len: Calculates the count of values
Jupyter Notebook interface showing Python code

You can also define each of these functions in separate lines of code.

How to Add Grand Totals to the Pivot Table

No data asset is complete without the grand totals. To calculate and display the grand totals per data column, use the margins and margins_name function.

        df.pivot_table(index = ["Category"], values = "Sales", aggfunc = [sum, max, min, len], margins=True, margins_name='Grand Totals')
    

Where:

  • margins: Function for calculating the grand total
  • margins_name: Specify the name of the category in the index column (for example, Grand Totals)

Modify and Use the Final Code

Here's the final code brief:

        import pandas as pd

# replace with your own path here
path = "C://Users//user/OneDrive//Desktop//"

# you can define the filename here
file = "Sample - Superstore.xls"

df = pd.read_excel(path + file)

df.pivot_table(index = ["Region", "Category", "Sub-Category"], values = "Sales",
              aggfunc = [sum, max, min, len],
              margins=True,
              margins_name='Grand Totals')

Creating Pivot Tables in Python

When you are using Pivot tables, the options are simply endless. Python lets you easily handle vast data arrays without worrying about data discrepancies and system lags.

Since Python's functionalities are not restricted to just condensing data into pivots, you can combine multiple Excel workbooks and sheets, while performing a series of related functions with Python.

With Python, there is something new on the horizon always.