If you use Python, even for the simplest tasks, you’re probably aware of the importance of its third-party libraries. The Pandas library, with its excellent support for DataFrames, is one such library.

You can import many types of files into DataFrames and create separate DataFrames to hold different data sets. Once your data is in DataFrames, you can merge them to perform detailed analysis.

Tackling the Basics

Before you start merging, you need some DataFrames to merge. For development purposes, you can create dummy data to experiment with.

Create the DataFrames in Python

As a first step, import the Pandas library into your Python file. Pandas is a third-party library for working with DataFrames in Python. Use the import statement to load it:

        import pandas as pd
    

The as pd clause assigns an alias to the library name, which shortens your code references.

Next, create two dictionaries, dict1 and dict2, which you will convert into DataFrames. Each one stores a different piece of user information:

        dict1 = {"user_id": ["001", "002", "003", "004", "005"],
                 "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
                 "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]}

        dict2 = {"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]}

Note that both dictionaries contain a common element, user_id. It acts as the key for combining your DataFrames later.
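As a quick check, a set intersection shows which key the two dictionaries share and which user_id values appear in both. This minimal sketch uses only built-in Python; the variable names shared_keys and shared_ids are illustrative.

```python
# Same sample data as dict1 and dict2 above
dict1 = {"user_id": ["001", "002", "003", "004", "005"],
         "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
         "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]}
dict2 = {"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]}

# Dictionary key views support set operations; this yields keys present in both
shared_keys = dict1.keys() & dict2.keys()
print(shared_keys)  # {'user_id'}

# user_id values found in both dictionaries ("005" exists only in dict1)
shared_ids = set(dict1["user_id"]) & set(dict2["user_id"])
print(sorted(shared_ids))  # ['001', '002', '003', '004']
```

Because user 005 has no entry in dict2, the different merge types treat that row differently.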

Convert Your Dictionaries Into DataFrames

To convert the dictionaries into DataFrames, pass each one to the pd.DataFrame constructor:

        df1 = pd.DataFrame(dict1)
        df2 = pd.DataFrame(dict2)

In interactive environments such as Jupyter Notebook, you can check the values within a DataFrame by typing its name in a cell and pressing Run/Execute. There are many Python-compatible IDEs, so pick the one you find easiest to learn.
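If you run your code as a plain script rather than in a notebook, nothing is displayed automatically. A minimal sketch, assuming Pandas is installed, is to print each DataFrame and check its shape attribute, which reports (rows, columns):

```python
import pandas as pd

dict1 = {"user_id": ["001", "002", "003", "004", "005"],
         "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
         "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]}
dict2 = {"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]}

df1 = pd.DataFrame(dict1)
df2 = pd.DataFrame(dict2)

# Outside a notebook, print the DataFrames explicitly
print(df1)
print(df2)

# shape is a (rows, columns) tuple
print(df1.shape)  # (5, 3)
print(df2.shape)  # (4, 2)
```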


Once you’re satisfied with the contents of your DataFrames, you can move on to the merging step.

Combining Frames With the Merge Function

The first function you can use to combine two DataFrames is merge. Its basic syntax is:

        pd.merge(DataFrame1, DataFrame2, how="inner")

Where:

  • pd is an alias for the Pandas library.
  • merge is the function that merges DataFrames.
  • DataFrame1 and DataFrame2 are the two DataFrames to merge.
  • how defines the merge type, such as "left", "right", "outer", or "inner" (the default).

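Optional arguments, such as on, left_on, and right_on, let you specify which columns to join on when your data structure is more complex. If you omit on, Pandas joins on the columns the two DataFrames have in common.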
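The sketch below passes two optional arguments: on, which names the key column explicitly, and indicator, which adds a _merge column recording whether each row matched in both DataFrames ("both") or in only one ("left_only" or "right_only"). It rebuilds df1 and df2 from the sample data; the name merged is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"user_id": ["001", "002", "003", "004", "005"],
                    "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
                    "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]})
df2 = pd.DataFrame({"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]})

# Join explicitly on user_id and record where each row came from
merged = pd.merge(df1, df2, how="left", on="user_id", indicator=True)
print(merged[["user_id", "_merge"]])
```

With this data, users 001 to 004 are marked "both" and user 005 is marked "left_only".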

You can use different values for the how parameter to define the type of merge to carry out. These types of merge will be familiar if you’ve used SQL to join database tables.

Left Merge

The left merge type keeps every row of the first DataFrame and pulls in the matching values from the second DataFrame. Where a row has no match, Pandas fills the missing values with NaN.
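Here is a minimal left merge on the sample data, assuming Pandas is installed. Because the DataFrames share only the user_id column, Pandas uses it as the key without an on argument; the name left_merged is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"user_id": ["001", "002", "003", "004", "005"],
                    "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
                    "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]})
df2 = pd.DataFrame({"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]})

# Keep all five rows of df1; user 005 has no Age in df2, so it becomes NaN
left_merged = pd.merge(df1, df2, how="left")
print(left_merged)
```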


Right Merge

The right merge type keeps every row of the second DataFrame and pulls in the matching values from the first DataFrame.
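A minimal right merge on the same sample data looks like this; the name right_merged is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"user_id": ["001", "002", "003", "004", "005"],
                    "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
                    "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]})
df2 = pd.DataFrame({"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]})

# Keep all four rows of df2 and attach the matching names from df1
right_merged = pd.merge(df1, df2, how="right")
print(right_merged)
```

With this sample data, every row in df2 has a match in df1, so the right merge returns the same four rows an inner merge would. User 005 is dropped because it exists only in df1.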


Inner Merge

The inner merge type keeps only the rows whose key values appear in both DataFrames and drops the rest.
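A minimal inner merge on the sample data follows. Since "inner" is the default value of how, passing it explicitly is optional; the name inner_merged is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"user_id": ["001", "002", "003", "004", "005"],
                    "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
                    "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]})
df2 = pd.DataFrame({"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]})

# Only user_id values present in both DataFrames survive, so 005 is dropped
inner_merged = pd.merge(df1, df2, how="inner")
print(inner_merged)
```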


Outer Merge

The outer merge type keeps all rows from both DataFrames, matching or not, and fills any gaps with NaN.
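A minimal outer merge on the sample data follows; the name outer_merged is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"user_id": ["001", "002", "003", "004", "005"],
                    "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
                    "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]})
df2 = pd.DataFrame({"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]})

# Keep every user_id from either DataFrame; missing values become NaN
outer_merged = pd.merge(df1, df2, how="outer")
print(outer_merged)
```

Here the result matches the left merge, because df2 contains no user_id that is missing from df1. With a user present only in df2, that row would appear with NaN in FName and LName.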


How to Use the Concat Function

The concat function is more flexible than merge in one respect: you can use it to combine DataFrames both vertically and horizontally.

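However, concat does not match rows on the values of a key column the way merge does. Instead, it aligns DataFrames on their labels: column labels when stacking rows, and index labels when placing DataFrames side by side. By default, it uses an outer join, which keeps every label and fills any gaps with NaN. The function accepts many arguments, but only a few are essential for a basic concatenation: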

        pd.concat(dataframes, axis=0, join="outer")

Where:

  • concat is the function that joins DataFrames.
  • dataframes is a sequence of DataFrames to concatenate.
  • axis represents the direction of concatenation: 0 (the default) stacks rows vertically, and 1 places columns side by side horizontally.
  • join specifies either an "outer" join (the default) or an "inner" join.
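To see what axis does in practice, the following sketch concatenates the sample DataFrames along both axes with the default outer join and compares the resulting shapes. The variable names stacked and side_by_side are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"user_id": ["001", "002", "003", "004", "005"],
                    "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
                    "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]})
df2 = pd.DataFrame({"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]})

# axis=0: df2's rows go below df1's; columns are the union of both sets
stacked = pd.concat([df1, df2], axis=0)
print(stacked.shape)       # (9, 4)

# axis=1: df2's columns go beside df1's; rows are aligned on index labels
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.shape)  # (5, 5)
```

side_by_side has five columns because axis=1 simply places df2's columns next to df1's, so user_id appears twice.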

Using the above two DataFrames, you can try out the concat function as follows:

        # pass the DataFrames to concat as a list
        df_merged_concat = pd.concat([df1, df2])

        # print the result of the concat function
        print(df_merged_concat)

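Because the code omits the axis and join arguments, concat uses its defaults (axis=0 and join="outer") and stacks the rows of df2 below those of df1. The output contains all nine entries, irrespective of whether they match, with NaN wherever a row's original DataFrame lacks a column (Age for the rows from df1, FName and LName for the rows from df2).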
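The stacked output keeps each DataFrame's original index labels, so 0 to 3 appear twice. Passing ignore_index=True, a standard concat argument that the example above does not use, renumbers the rows instead. This is a small sketch on the same sample data:

```python
import pandas as pd

df1 = pd.DataFrame({"user_id": ["001", "002", "003", "004", "005"],
                    "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
                    "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]})
df2 = pd.DataFrame({"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]})

# Default: original index labels are kept, so duplicates appear
default_concat = pd.concat([df1, df2])
print(default_concat.index.tolist())  # [0, 1, 2, 3, 4, 0, 1, 2, 3]

# ignore_index=True discards the old labels and builds a fresh 0..n-1 index
renumbered = pd.concat([df1, df2], ignore_index=True)
print(renumbered.index.tolist())      # [0, 1, 2, 3, 4, 5, 6, 7, 8]
```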

You can pass additional arguments to control the direction and output of the concat function.

For example, to place the DataFrames side by side and keep only the rows whose index labels appear in both:

        # Place the DataFrames side by side (axis=1), keeping only index labels present in both
        df_merged_concat = pd.concat([df1, df2], axis=1, join="inner")

        print(df_merged_concat)

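The result contains only the four rows whose index labels (0 to 3) appear in both DataFrames. Because concat aligns on the index rather than on user_id, the output also contains two user_id columns.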
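If you want concat to line rows up by user_id rather than by index position, one option is to make user_id the index of both DataFrames first with set_index. This is a sketch of one approach, not the only one; merge on user_id achieves a similar result. The name by_id is illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"user_id": ["001", "002", "003", "004", "005"],
                    "FName": ["John", "Brad", "Ron", "Roald", "Chris"],
                    "LName": ["Harley", "Cohen", "Dahl", "Harrington", "Kerr-Hislop"]})
df2 = pd.DataFrame({"user_id": ["001", "002", "003", "004"], "Age": [15, 28, 34, 24]})

# Use user_id as the index so concat aligns rows on it
by_id = pd.concat([df1.set_index("user_id"), df2.set_index("user_id")],
                  axis=1, join="inner")
print(by_id)
```

Because user_id is now the index, it no longer appears as a duplicated column, and user 005 is dropped by the inner join.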


Merging DataFrames With Python

DataFrames are an integral part of data analysis in Python, thanks to their flexibility and functionality. Once you know how to combine them with merge and concat, you can use them for a wide variety of tasks.

If you’re still learning about Pandas DataFrames, try importing some Excel files and combining them with the different approaches.