How to Back Up Your Linux Device to Amazon S3
Code42, the company behind CrashPlan, has decided to abandon home users completely. Its highly competitive pricing made its backup solution a real temptation for people with large backup needs. While the company’s failure to keep its promises may have sown seeds of distrust, there are other cloud providers. But which provider do you trust with your archive of memes?
Currently the world leader when it comes to cloud computing is Amazon Web Services (AWS). The learning curve for AWS can seem steep, but in reality, it’s simple. Let’s find out how to take advantage of the world’s leading cloud platform.
Simple Storage Service
The Simple Storage Service, commonly referred to as S3, is Amazon’s behemoth of a storage solution. Some notable companies that use S3 include Tumblr, Netflix, SmugMug and, of course, Amazon.com. If your jaw is still attached to your face, S3 is designed for 99.999999999 percent (eleven nines) durability on its standard option, and a single file can be up to five terabytes (5 TB) in size. S3 is an object store, which means it isn’t designed for installing and running an operating system, but it is perfectly geared for backups.
The Tiers and Pricing
By and large, this is the most complicated part of S3. Pricing varies from region to region, and our example uses current pricing for the US East (N. Virginia) region. Take a look at this table:
S3 is made up of four storage classes. Standard is, unsurprisingly, the default option. Infrequent Access is cheaper for storing your data, but more expensive for getting your data in and out. Reduced Redundancy is generally used for data you can regenerate if lost, such as image thumbnails. Glacier is used for archival storage as it is the cheapest to store on. However, it takes between three and five hours before you can retrieve a file from Glacier. With Glacier, or cold storage, you get reduced per-gigabyte costs but increased usage costs. That makes cold storage better suited to archival and disaster recovery. Businesses generally combine all the classes to reduce costs further.
The best in each category is marked in blue. Durability is a measure of how unlikely your file is to be lost. With the exception of Reduced Redundancy, Amazon would have to suffer catastrophic loss in two data centres before your data is lost. In other words, AWS stores your data redundantly across multiple facilities, and Reduced Redundancy keeps fewer copies than the other classes. Availability is a measure of how unlikely downtime is. The rest is more easily demonstrated with an example.
Our use case is as follows.
I want to store ten files on S3 Standard with a total size of one gigabyte (1 GB). Uploading the files (PUT requests) incurs a request cost of $0.005, and storing them costs $0.039 a month. That means in the first month you will be charged a total of about 4.4 cents ($0.044), and just under 4 cents ($0.039) a month for parking your data thereafter.
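The arithmetic in this example can be reproduced with a few lines of shell. This is only a sketch: the two figures are the ones quoted in the example above, and the function name monthly_cost is made up for illustration. Swap in the rates for your own region.

```shell
# Figures taken from the example above (USD); adjust them for your region.
put_cost=0.005       # one-off request charge for uploading the ten files
storage_cost=0.039   # recurring monthly charge for storing 1 GB

# monthly_cost REQUESTS STORAGE: add the two charges and print three decimals.
monthly_cost() {
    awk -v r="$1" -v s="$2" 'BEGIN { printf "%.3f\n", r + s }'
}

echo "First month:  \$$(monthly_cost "$put_cost" "$storage_cost")"
echo "Later months: \$$(monthly_cost 0 "$storage_cost")"
```

The first line printed is the upload charge plus storage; later months only include storage because nothing new is uploaded.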
Why is there such a complicated pricing structure? Because it is pay-for-what-you-use. You never pay for anything you don’t use. For a large-scale company, this offers all the advantages of a world-class storage solution while keeping costs to an absolute minimum. Amazon also provides a Simple Monthly Calculator, which you can find here, so you can project your monthly spend. Fortunately, they also offer a free tier, which you can sign up for here, so you can trial their services for up to 12 months. As with anything new, the more you use it, the more comfortable and understandable it becomes.
The free tier from AWS allows you to try out all their services, to an extent, for a full year. Within the free tier, S3 gives you 5 GB of storage, 20,000 GET requests and 2,000 PUT requests. This should allow ample breathing room to trial AWS and decide if it fits your requirements. The sign-up process for AWS takes you through a few steps. You will need a valid credit or debit card, and a phone for verification purposes. Once you launch the management console you will be welcomed into the AWS dashboard.
At a quick glance, it may feel like there is an awful lot to take in, and that’s simply because there is. The main items you will be accessing, which are annotated in the screenshot, are:
- Services: Surprise, surprise, this is where you will find all the AWS services.
- Account: To access your profile and billing.
- Region: This is the AWS region in which you are working.
Because you want the lowest latency between your computer(s) and AWS, pick the region that is closest to you. Some regions don’t have all of the AWS services, but these do get rolled out on an ongoing basis. Fortunately for us, S3 is available in all regions!
Before continuing, the first job is securing your account. Click on Services > Security, Identity & Compliance > IAM. In the process, we’re also going to grant the necessary permissions to your computer, so that you can back up and restore securely.
This is a simple five-step process. You’ll note from the screenshot that MFA can be activated on your account. Although multi-factor authentication (MFA), also known as two-factor authentication (2FA), is not required, it is highly recommended. In a nutshell, it requires a combination of your username and password, along with a code from your mobile device. You can either get a compatible physical MFA device or use an app like Google Authenticator. Head over to either the App Store or the Play Store to download the Google Authenticator app.
Using the Optional Multi-Factor Authentication
Expand Activate MFA on your root account and click on Manage MFA. Make sure A virtual MFA device is selected and click Next Step.
Open Google Authenticator on your device and scan the barcode that is on screen. Type the code shown in Google Authenticator into the Authentication code 1 box, then wait for the code to refresh. It takes around 30 seconds for the next code to be displayed. Type the new code into the Authentication code 2 box. Now click on the Activate Virtual MFA button. Once you refresh your screen, Activate MFA will have a green tick.
You should now have MFA activated on your account, and have linked the Google Authenticator to AWS. The next time you sign in to the AWS Console, you will type your username and password as normal. AWS will then prompt you for an MFA code. This will be obtained from the Google Authenticator app just as you’ve done in the previous step.
Groups and Permissions
It’s time to decide the level of access your computer will have to AWS. The easiest and most secure way of doing so is to create a group and a user for the computer you wish to back up. Then grant access, or add a permission, for that group to access only S3. There are numerous advantages to this approach. The credentials given to this group are limited to S3 and cannot be used to access any other AWS services. Additionally, in the unfortunate event that your credentials are leaked, you just need to delete the group’s access and your AWS account will be safe.
It actually makes more sense to create the group first. To do this, expand Create individual IAM users and click Manage Users. Click on Groups from the panel on the left, followed by Create New Group. Choose a name for your group and click Next Step. Now we’re going to attach the permission, or policy, for this group. As you only want this group to have access to S3, filter the list by typing S3 in the filter. Ensure that AmazonS3FullAccess is selected and click Next Step, followed by Create Group.
Create a User
All you need to do now is create a user and add it to the group you created. Select Users from the panel on the left, and click Add User. Choose any username you like. Under access type, make sure Programmatic Access is selected, then click Next: Permissions. On the next page select the group you created and click on Next: Review. AWS will confirm that you’re adding this user to the selected group and confirm the permissions being granted. Click on Create User to move on to the next page.
You’ll now see an Access key ID and a Secret access key. These are generated automatically and only displayed once. You can either copy and paste them to a secure location, or click Download .csv, which will download a spreadsheet containing these details. These are the equivalent of the username and password that your computer will use to access S3.
It’s worth noting that you should treat these with the highest level of security. If you lose your Secret access key there is no way of retrieving it. You will need to return to the AWS console and generate a new one.
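One small, practical precaution is to make sure the downloaded key file can only be read by your own Linux user. The sketch below assumes nothing about AWS itself; the function name protect_credentials and the example file name are hypothetical, so substitute whatever path you saved the keys to.

```shell
# protect_credentials FILE: allow only the file's owner to read or write it,
# then show the resulting permissions.
protect_credentials() {
    chmod 600 "$1" && ls -l "$1"
}

# Example call (hypothetical file name, left commented out on purpose):
# protect_credentials "$HOME/Downloads/credentials.csv"
```

Mode 600 means the owner can read and write the file, while other users on the same machine cannot open it.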
Your First Bucket
The time has come to create a place for your data. S3 stores data in containers called buckets. Each bucket name has to be globally unique, meaning that when you create a bucket you will be the only one on the planet with that bucket name. Each bucket can have its own set of configuration rules. You can enable versioning on a bucket so that it keeps copies of files you update, letting you revert to previous versions. There are also options for cross-region replication, so that you can further back up your data to another region in another country.
You can get to S3 by navigating to Services > Storage > S3. Creating a bucket is as easy as clicking the Create bucket button. After you’ve chosen a globally unique name (lowercase only), pick a region that you would like your bucket to live in. Clicking the Create button will finally give you your first bucket.
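If you want to sanity-check a candidate bucket name before typing it into the console, a rough check can be sketched in shell. It only covers the basic shape of a name (3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit). It cannot tell you whether the name is already taken, and the function name is_valid_bucket_name is made up for this example.

```shell
# is_valid_bucket_name NAME: succeed if NAME has the basic shape of an S3
# bucket name. This is a rough check, not the complete set of AWS rules.
is_valid_bucket_name() {
    printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$'
}

if is_valid_bucket_name "makeuseof-backup"; then
    echo "makeuseof-backup looks fine"
fi

if ! is_valid_bucket_name "MakeUseOf-Backup"; then
    echo "MakeUseOf-Backup is rejected (uppercase letters)"
fi
```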
Command Line Is Life
If the command line is your weapon of choice, you can access your newly created S3 bucket using s3cmd, which you can download from here. After you have chosen the latest release, download the zip archive into a folder of your choosing. The current latest release is 2.0.0, which we’ll be using in our example. To unzip and install s3cmd, open a terminal window and type:
sudo apt install python-setuptools
unzip s3cmd-2.0.0
cd s3cmd-2.0.0
sudo python setup.py install
s3cmd is now installed on your system, and is ready to be configured and linked to your AWS account. Be sure to have your Access key ID and Secret access key to hand from when you created your user. Start by typing:
s3cmd --configure
You will now be prompted to enter a few details. Firstly, you will be prompted to enter your Access key ID followed by your Secret access key. All other settings can be left at their defaults by hitting the Enter key, except the Encryption password. The password you choose here is used to encrypt files with GPG before they are uploaded, when you ask s3cmd to encrypt them. Keep the HTTPS setting enabled as well, so that data sent in and out of S3 is encrypted in transit. This helps prevent a man-in-the-middle attack, where someone intercepts your internet traffic.
At the end of the configuration process, s3cmd will perform a test to ensure all settings are working and you can successfully connect to your AWS account. When this is done you will be able to type a few commands like:
s3cmd ls
This will list all the buckets within your S3 account. As the screenshot below shows, the bucket you created is visible!
Sync Using Command Line
The sync command for s3cmd is extremely versatile. It’s very similar to how you would normally copy a file in Linux, and looks a little something like this:
s3cmd sync [LOCAL PATH] [REMOTE PATH] [PARAMETERS]
Test its usage with a simple sync. First, create two text files using the touch command, then use the sync command to send the files you’ve just created to the bucket created earlier. Be sure to replace the local path with the path on your computer, and the remote path with your bucket name. Once the command has run, refresh the S3 bucket and you’ll see that the files have indeed been sent to S3! To accomplish this, type:
touch file-1.txt
touch file-2.txt
s3cmd sync ~/Backup s3://makeuseof-backup
The sync command first checks and compares both directories. If a file doesn’t exist within S3, it will upload it. Moreover, if a file does exist, it will check whether it has been updated before copying it to S3. If you would like it to also delete the files that you have deleted locally, you can run the command with the --delete-removed parameter. Test this by first deleting one of the text files that we’ve created, then running the sync command with the additional parameter. If you then refresh your S3 bucket, the deleted file will have been removed from S3! To try this, type:
rm file-1.txt
s3cmd sync ~/Backup s3://makeuseof-backup --delete-removed
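Because --delete-removed deletes objects from your bucket, it can be worth previewing a sync before running it for real. s3cmd has a --dry-run option that lists what would be uploaded or deleted without changing anything. The wrapper below is only a sketch: the function name preview_sync and the S3CMD variable are made up for this example, and the variable exists purely so the command can be swapped out (for instance with echo) to see what would be run.

```shell
# Command used to talk to S3; override with S3CMD=echo to just print the call.
S3CMD="${S3CMD:-s3cmd}"

# preview_sync LOCAL_DIR BUCKET: show what a sync with --delete-removed
# would upload and delete, without transferring or removing anything.
preview_sync() {
    $S3CMD sync --dry-run --delete-removed "$1" "s3://$2"
}

# Example call (left commented out so nothing runs by accident):
# preview_sync ~/Backup makeuseof-backup
```

If the preview looks right, run the same command without --dry-run.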
At a glance, you can see how compelling this method is. If you wanted to back up something to your AWS account, you could add the sync command to a cron job and back your computer up automatically to S3.
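As a rough illustration of the cron idea, the snippet below prints a crontab entry that runs the sync every night at 02:00. The schedule, the function name cron_entry, the folder and the bucket name are all assumptions chosen for the example, so adjust them to suit your setup.

```shell
# cron_entry LOCAL_DIR BUCKET: print a crontab line that syncs LOCAL_DIR
# to the bucket at 02:00 every day.
cron_entry() {
    printf '0 2 * * * s3cmd sync %s s3://%s --delete-removed\n' "$1" "$2"
}

cron_entry "$HOME/Backup" makeuseof-backup

# To add the entry alongside any existing jobs, review the output first, then:
# ( crontab -l 2>/dev/null; cron_entry "$HOME/Backup" makeuseof-backup ) | crontab -
```

Cron runs jobs with a minimal PATH, so if the job cannot find s3cmd you may need to replace it with the full path reported by `command -v s3cmd`.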
The GUI Alternative
If the command line isn’t your thing, there is a graphical user interface (GUI) alternative to s3cmd: Cloud Explorer. While it doesn’t have a very modern interface, it does have some interesting features. Ironically, the easiest way to get your hands on the latest version is via the command line. Open a terminal window in the folder where you would like to install it, and type:
sudo apt -y install openjdk-8-headless ant git
git clone https://github.com/rusher81572/cloudExplorer.git
cd cloudExplorer
ant
cd dist
java -jar CloudExplorer.jar
When the interface launches, some of the required fields should already seem familiar. To load your AWS account, enter your Access Key and Secret Key, provide a name for your account, and click Save.
You can now click on your saved profile and get access to your bucket.
Exploring the Explorer
Taking a quick look at the interface, you’ll see the following:
- Explore and search
- Upload files
- Text editor
- A panel for a list of your buckets
- A panel for navigating a selected bucket
Setting up the sync capabilities of Cloud Explorer is similar to s3cmd. Firstly, create a file that doesn’t exist within the S3 bucket. Then, click on the Sync button within Cloud Explorer and browse to the folder you would like to sync with S3. Clicking on To S3 will compare the folder on your local computer with the folder in S3 and upload any differences it finds.
When you refresh the S3 bucket in the browser, you’ll notice that the new file has been sent to S3. Unfortunately, the sync function of Cloud Explorer doesn’t cater for any files that you have deleted on your local machine. So if you remove a file locally, it will still remain in S3. This is something to bear in mind.
Home Users Can Use Business-Focused Cloud Storage
While AWS is a solution engineered for businesses to take advantage of the cloud, there is no reason why home users shouldn’t get in on the action. Using the world’s leading cloud platform comes with many benefits. You never have to worry about upgrading the hardware or paying for anything you don’t use. Another interesting fact is that AWS has more market share than the next 10 vendors combined, an indication of how far ahead it is. Setting up AWS as a backup solution requires:
- Creating an account.
- Securing your account with MFA.
- Creating a group and assigning permissions to the group.
- Adding a user to the group.
- Creating your first bucket.
- Using the command line to sync to S3.
- Optionally, using a GUI alternative for S3.
Do you currently use AWS for anything? Which cloud backup vendor do you use currently? What features do you look for when choosing a backup vendor? Let us know in the comments below!