How To Generate Dummy Data in Ruby, Perl and Python
You’re building a web application, and you need some realistic information to shove into it. You need to check that your validation functions work perfectly, and see that your product actually works.
The only problem is, you can’t really use real-world data. There are just far too many legal and ethical considerations you need to make. Indeed, in some jurisdictions, there are specific legal obstacles to using real-world data in development environments. Take, for instance, the UK.
Here, there’s something called the Data Protection Act 1998. It’s quite unambiguous about how companies are allowed to handle the data they retain:
Personal data shall be obtained only for one or more specified and lawful purposes, and shall not be further processed in any manner incompatible with that purpose or those purposes.
Or, in other words, data can only be used within a context agreed with the person who provided it, albeit with a handful of exceptions. As a result, it’s often not possible to use personal data in a testing or development environment. So, how do we get around this?
Easy. We generate fake data. But what if you need to generate huge amounts of realistic data? Thankfully, there is a family of libraries called Faker that programmatically create dummy personal information, including names, email addresses and phone numbers.
I’m a big Ruby fan. There’s a lot to love about this language, including one of the best package managers out there, a friendly and welcoming developer community and a healthy ecosystem of third-party libraries. It’s also ludicrously easy to learn.
To get your hands on the Faker library for Ruby, you will first need to make sure you have RubyGems installed. You can grab a binary for your development platform of choice on the official RubyGems website.
Then, install Faker from the command line:
gem install faker
You may need to install it as root. If so, run:
sudo gem install faker
And then fire up your favorite text editor. We’re now going to create some fake names!
require 'faker'
puts Faker::Name.name
So, we load the faker library, and then print out a name. When you run this, you should see a randomly generated full name, which will differ from run to run.
Okay, let’s add some other stuff. We’re going to generate some (algorithmically valid) credit card numbers, an email address and a street address. Add the following lines.
puts Faker::Address.street_address
puts Faker::Business.credit_card_number
puts Faker::Internet.email
Run that again. Alongside the name, you’ll now see a street address, a credit card number and an email address.
Perl ain’t dead. No, siree. Whilst it’s hardly the hippest, trendiest language on the block right now, it still has its fans. Unsurprisingly, there’s a Perl port of Faker, called Data::Faker. But how do you use it?
Well, first you need to install it. I’m assuming you have Perl and CPAN installed. If not, install them. If you are using Windows, may I recommend you install Strawberry Perl, which is a mature, community-supported implementation of Perl for Windows XP to 8.1.
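In a command prompt, install the Data::Faker module from CPAN. With the standard cpan client, that looks like:
cpan Data::Faker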
You may be prompted for your root password, so don’t walk away. Then, open up your favorite text editor and create a file called ‘data.pl’. Inside, add the following lines.
use Data::Faker;
my $faker = Data::Faker->new();
print $faker->name."\n";
print $faker->street_address."\n";
print $faker->email."\n";
This should make a fair bit of sense. We import the Data::Faker module, instantiate a Faker object and then print out a name, street address and email. You might notice we’re not creating credit card numbers here, however. That’s because the Perl port is significantly more limited than the Ruby port.
When you run it, you should see a random name, street address and email address, each on its own line.
Let’s move on to Python. I write about Python a lot, and it’s without a doubt my favorite language to code in. If you’re tempted to give it a try, check out this article from my colleague Joel Lee about sites where you can learn to program in Python. It also turns out that Faker has been ported to this awesome language. The Python port of Faker stands out because it lets you create fake information specific to a locale. Here’s how you can use it.
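Firstly, install Faker. On Python, it goes by the name of ‘fake-factory’. I’m assuming that you have current installs of Python and pip. If not, install them. Note that newer releases are published on PyPI under the name ‘Faker’, so if the command below fails, try pip install Faker instead.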
pip install fake-factory
And then open up a text editor and add the following lines.
from faker import Factory
fake = Factory.create()
print(fake.name())
print(fake.street_address())
Run it, and you’ll see a random name followed by a street address.
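If you need a lot of records rather than just one, the same calls can be wrapped in a loop. What follows is a minimal sketch, not something that ships with Faker: fake_users and users_to_csv are hypothetical helper names, and I’m assuming fake.email() is available in the Python port alongside name() and street_address().

```python
import csv
import io


def fake_users(fake, count):
    """Yield `count` dummy user records from a Faker-style generator.

    `fake` is any object with name(), email() and street_address()
    methods, such as the one returned by Factory.create().
    """
    for _ in range(count):
        yield {
            "name": fake.name(),
            "email": fake.email(),
            "address": fake.street_address(),
        }


def users_to_csv(fake, count):
    """Return `count` dummy users as CSV text, with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["name", "email", "address"])
    writer.writeheader()
    writer.writerows(fake_users(fake, count))
    return buffer.getvalue()


# Example (requires the library to be installed):
#   from faker import Factory
#   print(users_to_csv(Factory.create(), 1000))
```

Passing the generator in as an argument, rather than creating it inside the function, means you can swap in a differently configured generator without touching the loop.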
Okay, but what about those other locales we discussed? Suppose we want to generate fake information that is specific to France? That’s easy. We just pass Factory.create() the corresponding locale string, which is an ISO language code followed by a country code. So, for French as used in France, we write:
fake = Factory.create('fr_FR')
Which, when executed, produces French-style names and street addresses.
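To compare several locales side by side, you can create one generator per locale string. This is only a sketch under a couple of assumptions: names_by_locale is a hypothetical helper, and the locales you pass in (for example 'en_GB' or 'de_DE' alongside 'fr_FR') need to be supported by the version of the library you have installed.

```python
def names_by_locale(create, locales, per_locale=3):
    """Map each locale string to a list of fake names generated for it.

    `create` is a factory function such as Factory.create; it is called
    once per locale and must return an object with a name() method.
    """
    samples = {}
    for locale in locales:
        fake = create(locale)
        samples[locale] = [fake.name() for _ in range(per_locale)]
    return samples


# Example (requires the library to be installed):
#   from faker import Factory
#   for locale, names in names_by_locale(Factory.create, ["fr_FR", "en_GB"]).items():
#       print(locale, names)
```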
Faker is a powerful tool for anyone building software that needs realistic information, without breaking any data protection rules. Whilst support isn’t consistent (or complete) across all languages, it remains a pretty useful tool.
As always, let me know your thoughts on this post and drop me a comment.