
Two of the most criminally under-appreciated Linux utilities are Sed and Awk. Although admittedly they can seem a bit arcane, if you ever have to make repetitive changes to large pieces of code or text, or if you ever have to analyze some text, Sed and Awk are invaluable.

So, what are they? How are they used? And how, when combined together, do they make it easier to process text?

What Is Sed?

Sed was developed between 1973 and 1974 at Bell Labs, by legendary computing pioneer Lee E. McMahon.

The name stands for stream editor, and that’s kinda what it does. It allows you to edit bodies or streams of text programmatically, through a compact and simple, yet Turing-complete programming language.

The way it works is simple: it reads text, line-by-line into a buffer. For each line, it’ll perform the predefined instructions, where applicable.

For example, if someone was to write a Sed script that replaced the word “beer” with “soda”, and then passed in a text-file that contained the entire lyrics to “99 Bottles of Beer on the Wall”, it would go through that file on a line by line basis, and print out “99 Bottles of Soda on the Wall”, and so on.
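To make that concrete, here’s a minimal sketch. The two lines of lyrics are just sample input typed in for illustration, and the g flag at the end of the instruction is an addition that tells Sed to replace every match on a line rather than only the first.

```shell
# Two sample lines standing in for the full lyrics file (illustrative input only).
lyrics='99 bottles of beer on the wall, 99 bottles of beer.
Take one down and pass it around, 98 bottles of beer on the wall.'

# s/beer/soda/g replaces every "beer" on each line.
# Without the trailing g, only the first match on each line would change.
result=$(printf '%s\n' "$lyrics" | sed 's/beer/soda/g')
printf '%s\n' "$result"
```

Matching is case-sensitive, so a capitalized “Beer” would be left alone by this instruction.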

The most basic Sed script is a Hello World one. Here, we use the Unix Echo utility, which merely outputs strings, to print “Hello World”. But we pipe this to Sed, and tell it to replace “World” with “Dave”. Self-explanatory stuff.

echo "Hello World" | sed 's/World/Dave/'

[Screenshot: output of the Hello World command]

You can also combine Sed instructions into files, if you need to do some more complicated editing. Inspired by this hilarious Reddit thread, I’m going to take the lyrics to A-Ha’s Take On Me, and replace each instance of “I”, “Me”, and “My”, with Greg.

First, I’ll put the lyrics to the song in a text file called tom.txt. Then I’ll open up my preferred text editor (my favorite is Vim, but Nano and Gedit are both excellent choices), and add the following lines. Save the file with a name ending in .sed; I’ve called mine greg.sed.

[Screenshot: the contents of greg.sed]

You might notice that in the example above, I’ve repeated myself (e.g. s/me/Greg/ and s/Me/Greg/). That’s because some versions of Sed, like the one that ships with Mac OS X, do not support case-insensitive matching. As a result, we have to write two Sed instructions for each word, so it recognizes both the capitalized and uncapitalized versions.

This won’t work as perfectly as if you’d replaced each instance of “I”, “Me”, and “My” by hand. For one thing, a pattern like “me” also matches inside longer words, such as “time”. Remember, we’re just using this as an exercise to demonstrate how you can group Sed instructions into one script, and then execute them with a single command.

Then, we need to invoke the file. To do that, we run this command.

cat tom.txt | sed -f greg.sed

Let’s slow down and look at what this does. Eagle-eyed readers will have noticed that we’re not using Echo here. We’re using Cat. That’s because while Cat will print out the entire contents of the file, Echo will only print out the file name. You’ll have also noticed that we’re running Sed with the “-f” flag. This tells it to read its instructions from a script file.
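If the difference between Echo and Cat isn’t obvious, a quick sketch like the one below might help. It works in a throwaway directory, and the one-line tom.txt it creates is a stand-in for the real lyrics file.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Hello World' > tom.txt

# echo hands sed the literal text "tom.txt", so there is no "World" to replace.
from_echo=$(echo tom.txt | sed 's/World/Dave/')

# cat hands sed the contents of the file, so the substitution applies.
from_cat=$(cat tom.txt | sed 's/World/Dave/')

# sed can also open the file itself when the name is given as an argument.
from_arg=$(sed 's/World/Dave/' tom.txt)

printf '%s\n' "$from_echo" "$from_cat" "$from_arg"
```

The last form does the same job as the Cat pipeline with one fewer process, though piping from Cat is perfectly fine for small files like this.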

The end result is this.

[Screenshot: the lyrics after running greg.sed]
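Putting the pieces together, the whole exercise might look something like the sketch below. The contents of greg.sed here are a reconstruction based on the description above rather than the exact script, and the two lines in tom.txt are invented stand-ins for the lyrics.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample lines standing in for the real lyrics.
printf '%s\n' 'Take on me' 'I lost my keys' > tom.txt

# A guess at greg.sed: one substitution per spelling, because we are not
# relying on case-insensitive matching.
cat > greg.sed <<'EOF'
s/I/Greg/
s/me/Greg/
s/Me/Greg/
s/my/Greg/
s/My/Greg/
EOF

# Apply every instruction in greg.sed, in order, to each line of tom.txt.
result=$(cat tom.txt | sed -f greg.sed)
printf '%s\n' "$result"
```

With this input, the two lines come out as “Take on Greg” and “Greg lost Greg keys”. Each instruction only replaces the first match on a line; adding a g flag to each one would catch repeats.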

It’s also worth noting that Sed supports regular expressions (regex). These allow you to define patterns in text, using a special and fairly complicated syntax.

Here’s an example of how that might work. We’re going to take the aforementioned song lyrics, but use regex to print out every line that doesn’t start with “Take”.

cat tom.txt | sed '/^Take/d'

[Screenshot: the lyrics with every line starting with “Take” removed]
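Here’s a small sketch of how that pattern behaves, using four invented lines instead of the lyrics. The ^ anchors the match to the start of the line, and the second command (with -n and p, which aren’t used elsewhere in this article) shows the opposite selection for comparison.

```shell
# Invented sample input; two lines start with "Take" and two do not.
sample='Take the bus
Walk to work
Take a break
Stay home'

# /^Take/d deletes lines beginning with "Take" and prints everything else.
kept=$(printf '%s\n' "$sample" | sed '/^Take/d')

# -n turns off automatic printing; /^Take/p then prints only the matching lines.
matched=$(printf '%s\n' "$sample" | sed -n '/^Take/p')

printf 'kept:\n%s\nmatched:\n%s\n' "$kept" "$matched"
```

Quoting the instruction in single quotes keeps the shell from interpreting any of the special characters before Sed sees them.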

Sed is, of course, incredibly useful. But it’s even more powerful when combined with Awk.

What Is Awk?

Awk, like Sed, is a programming language designed for dealing with large bodies of text. But while Sed is used to process and modify text, Awk is mostly used as a tool for analysis and reporting.

Like Sed, Awk was first developed at Bell Labs in the 1970s. Its name doesn’t come from what the program does, but rather the surnames of each of the authors – Alfred Aho, Peter Weinberger, and Brian Kernighan.

Awk works by reading a text file or input stream one line at a time. Each line is scanned to see if it matches a predefined pattern. If a match is found, an action is performed.
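That pattern-then-action model can be sketched in a single line. The three input lines below are invented; NR and $0 are built-in Awk variables holding the current line number and the full text of the current line.

```shell
# Invented sample input.
sample='Take the bus
Walk to work
Take a break'

# Pattern: /^Take/ selects lines that start with "Take".
# Action:  print the line number (NR), a colon, and the line itself ($0).
result=$(printf '%s\n' "$sample" | awk '/^Take/ { print NR ": " $0 }')
printf '%s\n' "$result"
```

Lines that don’t match the pattern are simply skipped, so only lines 1 and 3 are printed here.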

But while Sed and Awk may share similar purposes, they’re two completely different languages, with two completely different design philosophies. Awk more closely resembles some general purpose languages, like C, Python and Bash. It has things like functions, and a more C-like approach to things like iteration and variables (James Bruce explained how iteration works). Put simply, it feels more like a programming language.

So, let’s try it out. Using the lyrics to Take On Me, we’re going to print all the lines that are longer than 20 characters.

awk 'length($0) > 20' tom.txt

[Screenshot: the lines of the lyrics longer than 20 characters]

The next example I’ve shamelessly cribbed from the official Awk documentation. But it’s a great example of the potential of this powerful, yet tiny language. It’s also a great demonstration of how things like iteration and variables work in it. First, create a file called “WordCount.awk”, and add the following lines.

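# Runs for every line: count each whitespace-separated word.
{
    for (i = 1; i <= NF; i++)
        freq[$i]++
}
# Runs once after the last line: print each word and its count.
END {
    for (word in freq)
        printf "%s\t%d\n", word, freq[word]
}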

Save it, and then run it with the following command.

awk -f WordCount.awk tom.txt

[Screenshot: each word in the lyrics alongside its count]

Cool, right? You’ll probably notice that the words aren’t in any kind of order. You can sort the results using the Unix sort utility. But we’ll leave that for another day. We’re going to keep it simple.
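If you’re curious, though, a minimal sketch of the sorting step might look like this. The word-count program is inlined so the example stands alone, the input sentence is invented, and setting LC_ALL=C is only there to make the tie-breaking order predictable.

```shell
# A literal tab character, used to tell sort where the fields are split.
tab=$(printf '\t')

# Count the words, then sort by count (field 2, numeric, descending)
# and break ties alphabetically by word (field 1).
result=$(printf '%s\n' 'the cat and the dog and the bird' |
  awk '{ for (i = 1; i <= NF; i++) freq[$i]++ }
       END { for (word in freq) printf "%s\t%d\n", word, freq[word] }' |
  LC_ALL=C sort -t "$tab" -k2,2nr -k1,1)
printf '%s\n' "$result"
```

Here “the” comes first with 3, then “and” with 2, followed by the words that appear once in alphabetical order.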

Combining The Two

Awk and Sed are even more powerful when combined. You can do this by using Unix pipes. Those are the “|” bits between commands, which send the output of one command to the next as its input.

Let’s try this: We’re going to list all the lines in Take On Me that have more than 20 characters, using Awk. Then, we’re going to strip all the lines that begin with “Take”. Together, it all looks like this:

awk 'length($0) > 20' tom.txt | sed '/^Take/d'

And produces this:

[Screenshot: the long lines of the lyrics, minus those starting with “Take”]

Now let’s flip that around. We’re going to start by removing all the lines that start with “Take”, and then pipe what’s left to Awk, where we’ll count how many times each word appears. It looks a bit like this:

cat tom.txt | sed '/^Take/d' | awk -f WordCount.awk

[Screenshot: word counts for the lyrics with the “Take” lines removed]

The Power Of Sed and Awk

There’s only so much you can explain in a single article. But I hope I’ve illustrated how immeasurably powerful Sed and Awk are. Simply put, they’re a text-processing powerhouse.

So, why should you care? Well, besides the fact that you never know when you’ll need to make predictable, repetitive changes to a text document, Sed and Awk are great for parsing log files. This is especially handy when you’re trying to debug a problem in your LAMP server, or looking at your access logs to see whether your server has been hacked.
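As a rough idea of what that log parsing could look like, the sketch below counts “404 Not Found” responses per client address. The log lines are invented, they assume the common log format that many web servers write by default, and the addresses come from ranges reserved for documentation.

```shell
# Invented access-log lines in the common log format.
log='192.0.2.10 - - [01/Nov/2015:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 1043
198.51.100.7 - - [01/Nov/2015:10:00:02 +0000] "GET /wp-login.php HTTP/1.1" 404 512
198.51.100.7 - - [01/Nov/2015:10:00:03 +0000] "GET /admin.php HTTP/1.1" 404 512
192.0.2.10 - - [01/Nov/2015:10:00:04 +0000] "GET /missing.png HTTP/1.1" 404 512'

# Splitting on whitespace, field 1 is the client address and field 9 the status code.
# Tally the 404s per address, then list the busiest addresses first.
result=$(printf '%s\n' "$log" |
  awk '$9 == 404 { misses[$1]++ }
       END { for (ip in misses) print misses[ip], ip }' |
  sort -rn)
printf '%s\n' "$result"
```

On a real server you’d point Awk at the log file itself, and the field numbers would need adjusting if your log format differs.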

Have you found an interesting use for Sed and Awk? Are there any other Linux utilities you feel are under-appreciated? Let me know in the comments below, and we’ll chat.

  1. Samba
    March 17, 2016 at 10:42 pm

    Can I use it to parse a spreadsheet?

    • Casey Primozic
      May 2, 2016 at 5:08 pm

      Sure; just convert it to a CSV first.

  2. Luc
    February 15, 2016 at 3:25 am

    So basically, Sed offers pretty much what grep offers with regular expressions, with something extra embedded in a Turing-complete programming language?

    awk would seem useful, if regex tools also didn't have pre and post processing filtering.

    so, sure sed and awk are probably more powerful than some grep wrapper or gui with extra processing options, but is it really worth it learning something extra that you might have to consult the manual for every few months when you need it? since regex is present in editors and therefore is what we already use weekly..

    no mystery why it is criminally under-appreciated

    if you reeeally need some monitoring/reporting scheme where cron/grep + whatever isn't enough, might as well just embed grep in the scripting language of your choice and do things easier than with sed+awk.

    so cron+script+grep (that is, if your scripting lang doesn't do regex)

    • Bruce Epper
      March 21, 2016 at 6:07 am

      Something that wasn't mentioned is that awk still processes the file line-by-line, but it automatically breaks each record (line) into fields using spaces/tabs by default, but it can be modified by the user by specifying a field separator (FS). This makes it dead simple for processing text files such as logs, the passwd file, configuration files, directory listings, and much more.

      An awk script can contain any number of pattern matching rules that will be run against each input line as it is read and perform the specified actions for the rule.

      An awk script can utilize expressions so you could write a script to balance your checkbook or count blank lines in a file which is what the following does:
      /^$/ { ++x }
      END { print x }

      It has a set of system variables you can use to control how awk works (default input and output separators, current record, number of fields in the current record, etc.).

      It supports boolean and relational operations, conditionals, loops, arrays, and so much more.

      In general, if you play with it a little bit, you can end up with useful scripts for repetitive editing and reporting jobs that end up being easier to maintain than their shell equivalents.

  3. Daniel Toebe
    November 2, 2015 at 5:26 pm

    Mix those with tails, top, and a cron job... and you can get a great reporting system

  4. Colonel Angus
    November 1, 2015 at 6:02 pm

    I have used Sed, but it's been quite some time ago. Like years. Never had the occasion to need Awk.

  5. Danny
    October 31, 2015 at 12:38 am

    I have been using Linux exclusively at home for a year now, and I managed never to find a need for these two. I hope I never will.
