
How To Generate Dummy Data in Ruby, Perl and Python

Matthew Hughes 10-04-2014

You’re building a web application, and you need some realistic information to shove into it. You need to check that your validation functions work perfectly, and see that your product actually works.


The only problem is, you can’t really use real-world data. There are just far too many legal and ethical considerations you need to make. Indeed, in some jurisdictions, there are specific legal obstacles to using real-world data in development environments. Take, for instance, the UK.

Here, there’s something called the Data Protection Act 1998. It’s quite unambiguous about how companies are allowed to handle the data they retain:

Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes.

Or, in other words, data can only be used within a context agreed with the person who provided it, albeit with a handful of exceptions. As a result, it’s often not possible to use personal data in a testing or development environment. So, how do we get around this?

Easy. We generate fake data. But what if you need to generate huge amounts of realistic data? Thankfully, there are a number of libraries called Faker which programmatically create dummy personal information, including names, email addresses and phone numbers.
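To make the idea concrete, here is a minimal sketch of how such a generator could work, using only Python’s standard library. It is not how any Faker port is actually implemented. The word lists and the function names make_person and make_people are made up for illustration, and real libraries ship far larger data sets.

```python
import random

# Tiny, hypothetical word lists (assumption: real libraries ship far larger ones).
FIRST_NAMES = ["Alice", "Bob", "Carol", "Dave"]
LAST_NAMES = ["Smith", "Jones", "Taylor", "Brown"]
DOMAINS = ["example.com", "example.org", "example.net"]


def make_person(rng):
    """Build one fake record by picking random entries from the lists."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    email = "{}.{}@{}".format(first.lower(), last.lower(), rng.choice(DOMAINS))
    return {"name": "{} {}".format(first, last), "email": email}


def make_people(count, seed=None):
    """Return `count` fake records; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    return [make_person(rng) for _ in range(count)]


if __name__ == "__main__":
    for person in make_people(3, seed=42):
        print(person["name"], person["email"])
```

Passing a seed is a design choice for this sketch: it gives you the same records on every run, which can be handy when a test fails and you want to reproduce it.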

In this article, I’m going to show you how to use these libraries within a number of popular languages: Ruby, Perl and Python.



I’m a big Ruby fan. There’s a lot to love about this language, including one of the best package managers out there, a friendly and welcoming developer community and a healthy ecosystem of third-party libraries. It’s also ludicrously easy to learn.

To get your hands on the Faker library for Ruby, you will first need to make sure you have RubyGems installed. You can grab a binary for your development platform of choice on the official RubyGems website.

Then, install Faker from the command line:

gem install faker

You may need to install it as root. If so, run:

sudo gem install faker

And then fire up your favorite text editor. We’re now going to create some fake names!

require 'faker'
puts Faker::Name.name

So, we import the faker module, and then print out a name. When you run this, you should see a randomly generated full name.


Okay, let’s add some other stuff. We’re going to generate some (algorithmically valid) credit card numbers, an email address and a street address. Add the following lines.

puts Faker::Address.street_address
puts Faker::Business.credit_card_number
puts Faker::Internet.email

Run that again. You’ll now see a street address, a credit card number and an email address printed below the name.
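As for what “algorithmically valid” means: payment card numbers usually end in a check digit computed with the Luhn algorithm, and I’m assuming that is the check the phrase refers to. The sketch below is a minimal Python illustration of that check, not code taken from Faker. The function name luhn_valid is hypothetical.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum.

    Spaces and dashes are ignored, so "4111-1111-1111-1111" is accepted.
    """
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk from the rightmost digit; double every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


if __name__ == "__main__":
    print(luhn_valid("4111-1111-1111-1111"))
```

A number that passes this check is only well-formed. It says nothing about whether a real account exists behind it, which is exactly what you want from dummy data.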



Perl ain’t dead. No, sir-e. Whilst it’s hardly the hippest, trendiest language on the block right now, it still has its fans. Unsurprisingly, there’s a port of Faker for Perl. But how do you use it?

Well, first you need to install it. I’m assuming you have Perl and CPAN installed. If not, install them. If you are using Windows, may I recommend you install Strawberry Perl, which is a mature, community-supported implementation of Perl for Windows XP to 8.1.


In a command prompt, run:

cpan Data::Faker

You may be prompted for your root password, so don’t walk away. Then, open up your favorite text editor and create a file called ‘data.pl’. Inside, add the following lines.

use Data::Faker;
my $faker = Data::Faker->new();
print $faker->name."\n";
print $faker->street_address."\n";
print $faker->email."\n";

This should make a fair bit of sense. We import the Data::Faker libraries, instantiate the Faker object and then print out a name, street address and email. You might notice we’re not creating credit card numbers here, however. That’s because the Perl port is significantly more limited than the Ruby port.

When you run it, you should see a name, a street address and an email address, each on its own line.



Let’s move on to Python. I write about Python a lot, and it’s without a doubt my favorite language to code in. If you’re tempted to give it a try, my colleague Joel Lee has written about sites where you can learn to program in Python. It also turns out that Faker has been ported to this awesome language. The Python port of Faker is unique with respect to how it allows you to create fake information specific to a locale. Here’s how you can use it.

Firstly, install Faker. On Python, it goes by the name of ‘fake-factory’. I’m assuming that you have current versions of Python and pip installed. If not, install them.

pip install fake-factory

And then open up a text editor and add the following lines.

from faker import Factory
fake = Factory.create()
print(fake.name())
print(fake.street_address())

Run it, and you’ll see a randomly generated name and street address.


Okay, but what about those other locales we discussed? Suppose we want to generate fake information that is specific to France? That’s easy. We just pass Factory.create() the corresponding locale code string, which combines a language code and a country code. So, for French as used in France, we write:

fake = Factory.create('fr_FR')

Which, when executed, produces French names and addresses.


Cool, right?
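If you’re curious how a locale string can steer the output, the following is a toy sketch using only Python’s standard library. It makes no claim about how the Python port of Faker is built. The data, the function name create_name_generator and the fallback to a default locale are all assumptions made for this example.

```python
import random

# Hypothetical per-locale data, keyed by language_COUNTRY codes.
NAMES_BY_LOCALE = {
    "en_US": ["John Smith", "Mary Johnson"],
    "fr_FR": ["Camille Martin", "Luc Bernard"],
}
DEFAULT_LOCALE = "en_US"


def create_name_generator(locale=DEFAULT_LOCALE, seed=None):
    """Return a function that yields random names for the given locale.

    Unknown locales fall back to DEFAULT_LOCALE (a choice made for this sketch).
    """
    names = NAMES_BY_LOCALE.get(locale, NAMES_BY_LOCALE[DEFAULT_LOCALE])
    rng = random.Random(seed)
    return lambda: rng.choice(names)


if __name__ == "__main__":
    fake_name = create_name_generator("fr_FR")
    print(fake_name())
```

Keeping the data in one dictionary per locale means adding a new locale is a matter of adding data rather than code.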


Faker is a powerful tool for anyone building software that needs realistic information, without breaking any data protection rules. Whilst support isn’t consistent (or complete) across all languages, it remains a pretty useful tool.

Whilst we discussed Faker within the context of Perl, Python and Ruby, it is also available for PHP and JavaScript, although the JavaScript port isn’t actually all that usable. The code for this article is available on my GitHub profile.

As always, let me know your thoughts on this post and drop me a comment.

Related topics: App Development, Programming, Python.


  1. Brandt
    January 30, 2015 at 10:55 pm

    Did you figure out how to remove phone extensions from phone_number() ?

  2. Monte Milanuk
    April 12, 2014 at 8:46 pm

    After removing via 'pip uninstall fake-factory', I did a clean reinstall:

    monte@machin-shin:~$ sudo pip install fake-factory
    Downloading/unpacking fake-factory
    Downloading fake-factory-0.4.0.tar.gz (244kB): 244kB downloaded
    Running setup.py egg_info for package fake-factory

    Installing collected packages: fake-factory
    Running setup.py install for fake-factory
    changing mode of build/scripts-3.3/faker from 644 to 755

    changing mode of /usr/local/bin/faker to 755
    Successfully installed fake-factory
    Cleaning up...
    monte@machin-shin:~$ python data.py
    Traceback (most recent call last):
    File "data.py", line 1, in <module>
    from faker import Factory
    ImportError: No module named faker

    ...which is weird because it *was* installed and pip didn't throw any errors.

    monte@machin-shin:~$ pip show fake-factory
    Name: fake-factory
    Version: 0.4.0
    Location: /usr/local/lib/python3.3/dist-packages

    ...okay, again, weird because I installed it using the 'system' python, which is 2.7.5 on Ubuntu 13.10, last I checked:

    monte@machin-shin:~$ python
    Python 2.7.5+ (default, Feb 27 2014, 19:37:08)
    [GCC 4.8.1] on linux2
    Type "help", "copyright", "credits" or "license" for more information.

    So... tried running the program using python3 as the interpreter instead of 2.x:

    monte@machin-shin:~$ python3 data.py
    Virginie de la Moreau
    7, boulevard Claude Baron

    I think the reason it didn't work at *all* yesterday was that when I wrote my 'test' script using the template you provided in the article, I included '#!/usr/bin/env python' at the beginning of the file without really thinking much about it, and was running it as an executable.

    I am kind of confused as to why it installed under python 3.3 when I ran pip from python 2.7?

    • Matthew H
      April 19, 2014 at 1:29 pm

      That's really, really, really weird. I have no idea what brought that on. Thanks for your thorough and detailed comment though. Our readers might well benefit from it.


  3. Matthew H
    April 12, 2014 at 1:14 pm

    Weird. Did Pip throw any errors?

    Do me a favor. Run the example from my Github and tell me what you see. https://github.com/matthewhughes/Data-Faking/blob/master/python/data.py

  4. Monte Milanuk
    April 11, 2014 at 3:15 am

    Did the 'pip install fake-factory' bit... but can't import faker:

    monte@machin-shin:~$ ./fake-data.py
    Traceback (most recent call last):
    File "./fake-data.py", line 3, in <module>
    from faker import Factory
    ImportError: No module named faker

  5. Brian Wisti
    April 10, 2014 at 5:26 pm

    generatedata is a useful resource, and it's available to install on your own. Those are good things.

    A library written in your primary development language, directly accessible and configurable, is preferable for somebody like me. I would expect it to also have a lower overhead than making a few thousand HTTP requests - whether local or remote - but I could be wrong about that.

    • Matthew H
      April 10, 2014 at 6:08 pm

      Exactly. Lower overhead, more scalable and generally just better.

      Thanks for the comment man!

  6. steve
    April 10, 2014 at 5:11 pm

    Or you could simply use http://www.generatedata.com/

    • Matthew H
      April 10, 2014 at 5:21 pm

      You can. But damn, that's one long-winded process.

      Also, sometimes you need to generate dummy data as part of a fixtures script. It's just way easier to use a faker library.