Python, as a language, is valuable beyond measure, especially when you want to work with structured data. Since people store a lot of data in Excel files, it is imperative to consolidate multiple files to save time and effort.

Python lets you do exactly that; no matter how many Excel files you want to combine, you can do it with relative ease. Given its range of libraries and third-party resources, you can import and use Python's multi-faceted tools to do your bidding.

In this guide, you will need to install and use the Pandas libraries to import data into Python before consolidating it.

Install Pandas Libraries in Python

Pandas is a third-party library that you can install in Python. Some IDEs already have Pandas installed in them.

If you are using an IDE version that doesn't come with pre-installed Pandas, rest assured, you can install it directly in Python.

Here's how to install Pandas:

        pip install pandas
    

If you're using Jupyter Notebook, you can install Pandas directly with the PIP command. Mostly, when you have installed Jupyter with Anaconda, there are high chances of already having Pandas available for direct use.

If you cannot call Pandas, you can use the above command to install them directly.

Combining Excel Files With Python

First, you need to create a folder in your preferred location with all the Excel files. Once the folder is ready, you can start writing the code to import the libraries.

You will use two variables in this code:

  1. Pandas: The Pandas library provides the data frames to store Excel files.
  2. OS: The library is beneficial for reading data from your machine's folder

To import these libraries, use these commands:

        Import Pandas as pd

Import OS
  • Import: Python syntax used to import the libraries in Python
  • Pandas: Name of the library
  • pd: Alias given to the library
  • OS: A library to access the system folder

Once you have imported the libraries, create two variables to store the input and output file path. The input file path is needed to access the files' folder. The output file path is necessary as the combined file will be exported there.

If you are using Python, make sure you change the backslash to forward-slash (\ to /)

        input_file_path = "C:/Users/gaurav/OneDrive/Desktop/Excel files/"

output_file_path = "C:/Users/gaurav/OneDrive/Desktop/"

Append the / at the end as well to complete the paths.

The folder's files are available in a list. Create a list to store all the file references of the input folder using the listdir function from the OS library.

If you are unsure of the functions available within a library, you can use the dir function with the library name. For example, to check the exact version of the listdir function, you can use the command as follows:

        dir(OS)
    

The output will consist of all the associated functions available within the OS library. The listdir function is one of the many functions available within this library.

Create a new variable to store the input files from the folder.

        excel_file_list = os.listdir(input_file_path)
    

Print this variable to see the names of the files stored within the folder. All files stored within the folder are displayed once you use the print function.

        print (excel_file_list)
    

Next, you need to add a new data frame to store each Excel file. Imagine a data frame as a container for storing data. Here's the command for creating a data frame.

        df = pd.DataFrame()
    
  • df: Variable to store the value of the DataFrame
  • pd: Alias for the Pandas library
  • DataFrame: Default syntax for adding a data frame

The input folder has three .xlsx files in this example. The file names are:

        File1_excel.xlsx

File2_excel.xlsx

File3_excel.xlsx
Files folder

To open each file from this folder, you need to run a loop. The loop will run for each of the files in the list created above.

Here's how you can do it:

        for excel_files in excel_file_list:
    

Next, it's necessary to check the extensions of the files since the code will open XLSX files only. To check these files, you can use an If statement.

Use the endswith function for this purpose, as follows:

        for excel_files in excel_file_list:

   

   if excel_files.endswith(".xlsx"):
  • excel_files: List with all the file values
  • endswith: Function to check the extension of the files
  • (".xlsx"): This string value can change, depending on what you want to search for

Now that you've identified the Excel files, you can create a new data frame to read and store the files individually.

        for excel_files in excel_file_list:



   if excel_files.endswith(".xlsx"):

       

       df1 = pd.read_excel(input_file_path+excel_files)
  • df1: New data frame
  • pd: Pandas library
  • read_excel: Function to read Excel files within the Pandas library
  • input_file_path: Path of the folder where the files are stored
  • excel_files: Any variable which is used in the for loop

To start appending the files, you need to use the append function.

        for excel_files in excel_file_list:



   if excel_files.endswith(".xlsx"):

       

       df1 = pd.read_excel(input_file_path+excel_files)

       df = df.append(df1)

Finally, now that the consolidated data frame is ready, you can export it to the output location. In this case, you are exporting the data frame to an XLSX file.

        df.to_excel(output_file_path+"Consolidated_file.xlsx")
    
  • df: Dataframe to export
  • to_excel: Command used to export the data
  • output_file_path: Path defined for storing the output
  • Consolidated_file.xlsx: Name of the consolidated file

Now, let's look at the final code:

        #Pandas is used as a dataframe to handle Excel files

import pandas as pd

import os



# change the slash from “\” to “/”, if you are using Windows devices



input_file_path = "C:/Users/gaurav/OneDrive/Desktop/Excel files/"

output_file_path = "C:/Users/gaurav/OneDrive/Desktop/"



#create a list to store all the file references of the input folder using the listdir function from the os library.

#To see the contents of a library (like the listdir function, you can use the dir function on the library name).

#Use dir(library_name) to list contents



excel_file_list = os.listdir(input_file_path)



#print all the files stored in the folder, after defining the list

excel_file_list





#Once each file opens, use the append function to start consolidating the data stored in multiple files



#create a new, blank dataframe, to handle the excel file imports

df = pd.DataFrame()



#Run a for loop to loop through each file in the list

for excel_files in excel_file_list:

   #check for .xlsx suffix files only

   if excel_files.endswith(".xlsx"):

       #create a new dataframe to read/open each Excel file from the list of files created above

       df1 = pd.read_excel(input_file_path+excel_files)

       #append each file into the original empty dataframe

       df = df.append(df1)



#transfer final output to an Excel (xlsx) file on the output path

df.to_excel(output_file_path+"Consolidated_file.xlsx")
Jupyter Notebook code snippet

Using Python to Combine Multiple Excel Workbooks

Python's Pandas is an excellent tool for beginners and advanced users alike. The library is used extensively by developers who want to master Python.

Even if you are a beginner, you can benefit immensely by learning the nuances of Pandas and how the library is used within Python.