The Internet Movie Database (IMDb) is one of the largest online databases of information about films, television series, home videos, video games, and streaming content. It holds millions of records that you can use for data analysis.

Cinemagoer (formerly known as IMDbPY) is a Python library for retrieving and managing data from IMDb. It gives you access to data about movies, people, and companies, which you can then use for analysis.

Installing Required Libraries

You need to install the cinemagoer Python library to access the IMDb database. Run the following command in the command prompt to install the library:

        pip install cinemagoer
    

You must have pip installed on your system to install external Python libraries.
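
To confirm the installation worked before going further, you can check whether Python can find the package. The snippet below is a minimal sketch that uses only the standard library; `is_installed` is a hypothetical helper name, and it relies on the fact that Cinemagoer is imported under the module name `imdb`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


# Cinemagoer is imported under the module name "imdb"
if is_installed("imdb"):
    print("cinemagoer is installed")
else:
    print("cinemagoer is missing; run: pip install cinemagoer")
```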

The code used in this project is available in a GitHub repository and is free for you to use under the MIT license.

Extracting IMDb Data Using Python

You need to import the cinemagoer library before using it in your code.

        from imdb import Cinemagoer

        ia = Cinemagoer()

The above code imports the Cinemagoer class from the imdb module and creates an instance of it.

Searching Movies

You can search for movies with a given (or similar) title using the search_movie() method. For example, to search for movies with "rock" in the title, run the following code:

        from imdb import Cinemagoer

        # Creating an instance of the Cinemagoer class
        ia = Cinemagoer()

        # Searching movies that have rock in their name
        movies = ia.search_movie('rock')
        print(movies[0])

This prints the title of the first movie the search finds.
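
A search usually returns several matches, so it can help to list a few of them with their IDs before picking one. The following is a rough sketch: `list_results` is a hypothetical helper, and `FakeMovie` is a stand-in used only so the example runs offline. It assumes each result exposes a `movieID` attribute and a dict-style `get()` method, as Cinemagoer's movie objects do.

```python
def list_results(results, limit=5):
    """Return (movieID, title, year) tuples for the first `limit` results."""
    rows = []
    for movie in results[:limit]:
        # movieID is an attribute on the result; the year may be absent
        movie_id = getattr(movie, "movieID", None)
        rows.append((movie_id, movie.get("title"), movie.get("year")))
    return rows


class FakeMovie(dict):
    """Hypothetical stand-in for a search result (not part of Cinemagoer)."""

    def __init__(self, movie_id, data):
        super().__init__(data)
        self.movieID = movie_id


# Made-up sample data, used in place of ia.search_movie('rock')
sample = [
    FakeMovie("0000001", {"title": "Example Rock", "year": 1999}),
    FakeMovie("0000002", {"title": "Another Rock"}),
]

rows = list_results(sample)
for row in rows:
    print(row)
```

With the library installed, you would pass the real results instead, for example list_results(ia.search_movie('rock')).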

You can also fetch a movie by its IMDb ID using the get_movie() method, then extract further details such as director names and genres. Both values are lists, so you need to loop through them to get the individual entries.

        from imdb import Cinemagoer

        # Creating an instance of the Cinemagoer class
        ia = Cinemagoer()

        # Getting movie by IMDb ID
        movie = ia.get_movie('0468569')
        print(movie)

        # Printing the names of the directors of the movie
        print('Directors:')

        for director in movie['directors']:
            print(director['name'])

        # Printing the genres of the movie
        print('Genres:')

        for genre in movie['genres']:
            print(genre)

The output shows the title of the movie, followed by its director(s) and genre(s).
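
Not every record carries every key, so indexing with movie['directors'] can raise a KeyError for some titles. As a hedged sketch, the hypothetical helper below collects the same fields with get() and falls back to empty lists. The sample record is a plain dict with made-up values that stands in for the object returned by get_movie().

```python
def summarize_movie(movie):
    """Collect title, director names, and genres, tolerating missing keys."""
    directors = [person["name"] for person in movie.get("directors", [])]
    return {
        "title": movie.get("title"),
        "directors": directors,
        "genres": list(movie.get("genres", [])),
    }


# Made-up record standing in for ia.get_movie(...)
sample_movie = {
    "title": "Example Movie",
    "directors": [{"name": "Example Director"}],
    "genres": ["Action", "Drama"],
}

summary = summarize_movie(sample_movie)
print(summary["title"])
print("Directors:", ", ".join(summary["directors"]))
print("Genres:", ", ".join(summary["genres"]))
```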

Searching for a Person

You can search for people using the search_person() method. For example, to search for "Heath", run the following code:

        from imdb import Cinemagoer

        # Creating an instance of the Cinemagoer class
        ia = Cinemagoer()

        # Searching for people having Heath in their names
        persons = ia.search_person('Heath')
        print(persons[0])

This prints the name of the first matching person the search finds.

Searching Companies

You can search for companies using the search_company() method. For example, to search for "Universal", run the following code:

        from imdb import Cinemagoer

        # Creating an instance of the Cinemagoer class
        ia = Cinemagoer()

        # Searching for companies having Universal in their names
        companies = ia.search_company('Universal')
        print(companies)

You'll get a list of companies whose names match "Universal".

You can also retrieve person and company data by ID, using the get_person() and get_company() methods:

        from imdb import Cinemagoer

        # Creating an instance of the Cinemagoer class
        ia = Cinemagoer()

        # Getting person data by ID
        person = ia.get_person('0005132')
        print(person['name'])
        print(person['birth date'])

        # Getting company data by ID
        company = ia.get_company('0005073')
        print(company['name'])

The output shows the person's name and birth date, followed by the name of the company.
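
Searching and fetching by ID are often combined: search results carry only basic fields, and fetching by ID loads the full record. The sketch below chains the two steps for a person. `fetch_first_person` is a hypothetical helper, and `FakePerson` and `FakeClient` are made-up stand-ins so the example runs without network access. It assumes search results expose a `personID` attribute, as Cinemagoer's person objects do.

```python
def fetch_first_person(ia, name):
    """Search for `name` and return the full record of the first match, or None."""
    matches = ia.search_person(name)
    if not matches:
        return None
    # Use the ID from the search result to load the complete record
    return ia.get_person(matches[0].personID)


class FakePerson(dict):
    """Hypothetical stand-in for a person record (not part of Cinemagoer)."""

    def __init__(self, person_id, data):
        super().__init__(data)
        self.personID = person_id


class FakeClient:
    """Hypothetical stand-in for a Cinemagoer instance, returning made-up data."""

    def search_person(self, name):
        return [FakePerson("0000001", {"name": "Example Person"})]

    def get_person(self, person_id):
        data = {"name": "Example Person", "birth date": "1970-01-01"}
        return FakePerson(person_id, data)


person = fetch_first_person(FakeClient(), "Example")
print(person["name"], person.get("birth date", "unknown"))
```

In real use, you would pass the ia instance created earlier in place of FakeClient().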

Finding Top and Bottom Movies

You can retrieve the top 250 and bottom 100 movies using the get_top250_movies() and get_bottom100_movies() methods, respectively:

        from imdb import Cinemagoer

        # Creating an instance of the Cinemagoer class
        ia = Cinemagoer()

        # Finding the top 250 movies
        top = ia.get_top250_movies()
        print(top[0])

        # Finding the bottom 100 movies
        bottom = ia.get_bottom100_movies()
        print(bottom[0])

This prints the title of the top-ranked movie, followed by the title of the bottom-ranked one.

Cinemagoer also provides other chart methods, such as get_top250_tv(), get_popular100_movies(), and get_top250_indian_movies().
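
For analysis, a chart is easier to work with as a list of plain rows. The following sketch assumes the chart is returned in rank order and that entries may carry 'title', 'year', and 'rating' keys. Those key names are assumptions here, so the code reads them with get() and leaves None where a value is missing. `chart_rows` is a hypothetical helper, and the sample chart is made-up data standing in for a call like get_top250_movies(). An empty chart simply yields an empty list, which is worth checking before indexing into the result.

```python
def chart_rows(chart, limit=10):
    """Turn the first `limit` chart entries into plain dict rows."""
    rows = []
    for position, movie in enumerate(chart[:limit], start=1):
        rows.append({
            "rank": position,  # position in the returned list, starting at 1
            "title": movie.get("title"),
            "year": movie.get("year"),
            "rating": movie.get("rating"),
        })
    return rows


# Made-up chart standing in for ia.get_top250_movies()
sample_chart = [
    {"title": "Example One", "year": 1994, "rating": 9.2},
    {"title": "Example Two", "year": 1972},
]

for row in chart_rows(sample_chart):
    print(row["rank"], row["title"], row["year"], row["rating"])
```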

Learn to Use Data Analytics Software Tools

Data analysis is the evaluation of data using analytical or statistical tools to extract useful information. It's growing in popularity and is now used by businesses, marketing companies, and sports teams. The complete data analytics process includes defining objectives, posing questions, collecting data, scrubbing data, analyzing it, and drawing conclusions.

You can get datasets for your projects using Python libraries like Cinemagoer or via online platforms like Kaggle. Alongside programming languages like Python and R, you can use tools like Microsoft Excel, Tableau, and Stata to perform data analysis.
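
As a small illustration of the scrubbing and analysis steps, the sketch below counts how often each genre appears across a set of movie records. It uses only the standard library. `genre_counts` is a hypothetical helper, and the records are made-up dicts standing in for data collected with Cinemagoer. Records without genres are skipped, and genre names are normalized before counting.

```python
from collections import Counter


def genre_counts(movies):
    """Count genre occurrences across records, skipping missing genre lists."""
    counts = Counter()
    for movie in movies:
        # Scrubbing: treat a missing or empty genre list as no genres
        for genre in movie.get("genres") or []:
            # Normalize whitespace and capitalization before counting
            counts[genre.strip().title()] += 1
    return counts


# Made-up records standing in for collected movie data
records = [
    {"title": "Example One", "genres": ["Action", "drama "]},
    {"title": "Example Two", "genres": ["Drama"]},
    {"title": "Example Three"},
]

counts = genre_counts(records)
for genre, total in counts.most_common():
    print(genre, total)
```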