Excel spreadsheets can be powerful, but some tasks may call for more programming power. Enter Python and the Openpyxl module.

Openpyxl provides functions that allow you to work with an Excel file from Python. You can use it to process data, create new spreadsheets, and even manipulate formulae.

The library is handy if you need to automate the same repetitive calculations over many Excel files. This may be the case if you need to deal with big data or perform data analysis.

How to Install the Openpyxl Module

First, install Openpyxl onto your computer with these steps:

  1. Open a command prompt and run the following:
            pip install openpyxl
        
  2. If the above does not work, you can download the module from Openpyxl’s Download Files page:
    Browser open to page where you can download Openpyxl
    1. Download the openpyxl-version.tar.gz file.
    2. Extract the file’s contents.
    3. Open a command prompt. Navigate to the folder of the extracted files and run the following command:
              py setup.py install
          

Create Your Excel Files and Test Data

Create many Excel files and add some test data to them. If you are new to Excel, there are lots of tools that can help you learn it.

  1. Create several Excel files in the same folder. In this scenario, there are five Excel files. Each file stores population data for three different countries.
    Windows File Explorer open showing many Excel files
  2. In each Excel file, add some test data. For example:
    Excel File opened with table of population data

This example includes data relating to the population of many countries.

How to Create the Python Script

Create the Python Script as follows:

  1. Create a new file called dataAnalysisScript.py. Open it using any good text editor, like Visual Studio Code or Atom.
  2. Add the following three imports at the top of the file. "os" and "sys" relate to accessing files on your computer or closing the program. "openpyxl" is the module downloaded in the previous step that allows you to read and write Excel files.
            import openpyxl
    import os
    import sys
  3. Below the imports, add the basic structure of the Python script below. Keep in mind that the code structure in Python depends on correct indentation.
            # Main, start of the program
    if __name__ == "__main__":
        while True:
            # Write code here
            # When the code finishes, close the program.
            sys.exit()
  4. Inside the while loop, add a prompt for the user to enter the path to the folder that contains the Excel files.
            if __name__ == "__main__":
      while True:
        # Asks the user to enter the filepath of the excel file.
        filePath = input('Please enter the path of the folder where the excel files are stored: ')
        # Goes inside that folder.
        os.chdir(filePath)
        # Gets the list of excel files inside the folder.
        excelFiles = os.listdir('.')

How to Read and Write Data From the Excel Files

Use a for loop to open, read, manipulate, and close each of the Excel files.

  1. Add a for loop for each Excel file. Inside the loop, open each file.
            # For each Excel file
    for i in range(0, len(excelFiles)):
        # This is using the openpyxl module to open the Excel file.
        wb = openpyxl.load_workbook(excelFiles[i])
        sheet = wb.active
  2. The following code reads a specific value from a particular cell:
            cellValue = sheet[f'B3'].value
  3. The following code writes data to the cell "A10":
            sheet<strong>[</strong>f'A10'<strong>].</strong>value <strong>=</strong> 56
        
    You can also write to the cell by specifying the row and column number. The following code adds the number "2" into cell "F1".
            sheet.cell(row=1, column=6).value = 2

How to Perform Built-In Excel Functions Using the Python Script

For each Excel file, calculate the sum, average, and standard deviation for each "country" column. The following code calculates the sum of all numbers for each country:

        sheet[f'B11'].value = '=SUM(B4:B9)' 
sheet[f'C11'].value = '=SUM(C4:C9)'
sheet[f'D11'].value = '=SUM(D4:D9)'
  • '=SUM(B4:B9)' is the Excel calculation that will add all numbers in the B column between cells B4 and B9.
  • sheet[f'B11'].value is the value in cell B11. The program will store the final result in this cell.

You can also calculate the average and standard deviation Excel functions in the same way shown above.

        sheet[f'B12'].value = '=AVERAGE(B4:B9)'
sheet[f'C12'].value = '=AVERAGE(C4:C9)'
sheet[f'D12'].value = '=AVERAGE(D4:D9)'
        sheet[f'B13'].value = '=STDEV(B4:B9)' 
sheet[f'C13'].value = '=STDEV(C4:C9)'
sheet[f'D13'].value = '=STDEV(D4:D9)'

Excel File opened with Sum, Average, and Standard Deviation Filled

How to Write to the File and Close the Workbook

The remaining code saves and closes each workbook, then closes the program.

  1. Inside the for loop, after performing the required Excel functions, save the changes made to the Excel file.
            wb.save(excelFiles[i])
    print(excelFiles[i] + ' completed.')
  2. Outside the for loop, close the Python Script.
            sys.exit()
        

The code altogether:

        # Main, start of the program
if __name__ == "__main__":
  while True:
    filePath = input('Please enter the path of the folder where the excel files are stored: ')
    os.chdir(filePath)
    excelFiles = os.listdir('.')
    
    # For each excel file
    for i in range(0, len(excelFiles)):
    
        wb = openpyxl.load_workbook(excelFiles[i])
        sheet = wb.active
        
        # Perform all required Excel functions as shown above here
        
        # Save and close the workbook
        wb.save(excelFiles[i])
        print(excelFiles[i] + ' completed.')
    
    # When the code finishes, close the program.
    sys.exit()

How to Run the Python Script

  1. Open the command prompt. Navigate to where you stored your script. If you stored your script on the Desktop, the command would look something like this:
            cd C:\Users\Sharl\Desktop
        
  2. Enter the following to run the script:
            python dataAnalysisScript.py
        
  3. The script will start by asking you to enter the folder that stores all your Excel files. As an example, if you stored this folder on the desktop, the file path would be:
            C:\Users\Sharl\Desktop\CountryData
        
  4. Open one of the Excel files in the folder to view the changes made.

Make sure none of the Excel files are open when you run the script.

Automating More With Python

Now you have the basics of how to perform Excel functions inside a Python Script. You can now learn how to perform many more Excel functions, as well as many other ways to automate Excel files. This includes using a library like Pandas to import data, or even using Visual Basic to perform repetitive tasks in Excel.