Large data assets are messy, especially when you have to pull them from websites, servers, or other data sources.

UI-based applications like MS Excel are good for dealing with simple data sets, but can struggle when the data gets bigger. This is a good reason for you to move to Python to carry out more complex data-based operations.

Python’s third-party library, Pandas, goes a long way in helping you sort your existing datasets quickly. If you’re looking to sort your data in Python, this article looks at a few ways to achieve this task.

Prerequisites for Using Python to Sort Data

Before sorting your data in Python, you need to take care of a few prerequisites:

  • Download a Python IDE. You can use a Python-compatible IDE, such as Jupyter Notebook, PyCharm, and Spyder, amongst others. Each of these is compatible with all Python versions.
  • Install pandas. You’ll need the pandas package which you can install using PIP or your preferred method.
  • Sample dataset. Download a sample dataset to practice the listed codes. Alternatively, you can use these procedures on your exclusive data.

Importing the Pandas Library in Python

Pandas is a third-party Python library that you can use to handle Excel, CSV, and other data formats.

To work with a sample Excel file, start by importing the pandas library. After that, you’ll use the import procedure to read the Excel data into Python.

To Import the Library

        import pandas as pd
    

Create a New DataFrame to Load the Excel Data

        file = "Sample - Superstore.xls"
df = pd.read_excel(file)
df.head()

Where:

  • df is a DataFrame object which stores the imported data.
  • pd is an alias for the Pandas library.
  • read_excel is a method to read the Excel file into Python.
  • file is a path to the Excel file.
  • head is a method that returns the first five rows from the DataFrame.
Jupyter Notebook interface showing Python code

Once your program has loaded the data, you can use the many available DataFrame methods to sort it in various ways.

1. Sorting by a Single Column in a DataFrame

Since your data will have a lot of rows and columns, you will often want to sort the data based on a specific column or columns.

Python sorts the data in ascending order by default. If you want to change the sorting order, you must explicitly mention it in your code.

Sort By a Single Column (Ascending Order)

        df.sort_values(by = "Customer ID")
    
Jupyter Notebook interface showing Python code

Sort By a Single Column (Descending Order)

Set the ascending parameter to False to sort your column in descending order.

        df.sort_values(by = "Customer ID", ascending=False)
    
Jupyter Notebook interface showing Python code

Where:

  • df is a DataFrame object containing the data.
  • sort_values is a method to sort by data values.
  • by is a parameter to define the column name.
  • ascending is a parameter to define the sorting order.

2. Sorting Multiple Columns in a DataFrame

If your requirements call for it, you can also sort your DataFrame(s) based on multiple columns at once. In such a scenario, you must define the column references in a list.

Sort by Multiple Columns Ascending

        df.sort_values(by = ["Customer ID", "City"])
    
Jupyter Notebook interface showing Python code

Sort by Multiple Columns Descending

Use the function ascending = False to sort your columns in descending order. Remember, you need to specify the names of the columns within a list to sort them simultaneously.

        df.sort_values(by = ["Customer ID", "City"], ascending = False)
    
Jupyter Notebook interface showing Python code

Sorting by Multiple Columns in Different Sort Orders

With the basics of sorting out of the way, what happens when you want to sort one column in descending order and another in ascending order? You need to tweak your code slightly to incorporate these requirements.

For example, to sort the Region and City columns in descending and ascending order, respectively:

        df.sort_values(by = ["Region", "City"], ascending = [False, True])
    
 Jupyter Notebook interface showing Python code

The explanation of this code is simple; you define the DataFrame name and pass the sort_values function along with the column names in a list. You should use Boolean values to specify the sort order.

Calling the function like this means that Python will sort by the DataFrame’s Region column in descending order, first. Then, rows with an identical Region will be further sorted by the City column, in ascending order.

3. How to Sort Columns in a DataFrame by Index

The index variable is the default value assigned to each row within a Python Dataframe. You can define the index values or let Python set an index value on its own.

To sort the data by its index value, you can use the sort_index function. This function sorts based on the index rather than on any values contained within the original dataset.

        df.sort_index()
    
Jupyter Notebook interface showing Python code

As with sort_values, you can pass an ascending parameter to specify the direction of the sort. For example, pass a value of False to sort data in descending order:

        df.sort_index(ascending = False)
    
Jupyter Notebook interface showing Python code

4. Sorting Columns in a DataFrame Instead of Rows

Instead of sorting the rows in a DataFrame, you can sort its columns. You can do so by calling the sort_index method and passing it an axis parameter with a value of 1:

        df.sort_index(axis=1)
    
Jupyter Notebook interface showing Python code

This step sorts the DataFrame, by its columns, in ascending order. To sort the DataFrame's columns in descending order, you can specify the sorting order in your sort step.

        df.sort_index(axis=1, ascending = False)
    
Jupyter Notebook interface showing Python code

5. Modifying the DataFrame While Sorting It

The two sort methods work by returning a copy of the original data, in its newly sorted state. To save storage space, or simply to write more concise code, you can modify the original DataFrame data instead. Each method accepts an inplace boolean parameter which modifies the data rather than returning a modified copy.

        df.sort_values(by = ["Customer ID", "City"], ascending = False, inplace = True)
    
Jupyter Notebook interface showing Python code

Learning to Sort Data in Python

Python replicates many of Excel’s built-in functions with a few lines of code. From sorting procedures to creating elaborate Pivot tables on your data, you name it, and you can do it in Python.

If you’re still new to Python and learning the ropes, these steps will enhance your coding skills relatively easily.