
Pipes and FIFOs

June 8, 2011

Overview

The basic design of the Unix command line world, the same design that makes it so powerful and that many people swear by, is the pipes and FIFOs design. The basic idea is that you have many simple commands, each taking an input and producing an output, and you string them together into something that achieves the desired effect.

Standard IO Streams or Pipes

To start out, let me give a rough explanation of the standard IO streams. These are Standard Input, Standard Output and Standard Error. Standard Output and Standard Error are both output streams, writing out whatever the application puts into them. Standard Input is an input stream, giving to the application whatever is written into it. Each program, when run, has all 3 of these streams available to it by default.

From this point forward I’ll refer to the 3 streams as STDOUT for Standard Output, STDERR for Standard Error and STDIN for Standard Input. These are the general short form or constant names for these streams.

Take for example the echo and cat commands. The echo command takes all text supplied as arguments on its command line and writes it out to STDOUT. For example, the following command will print the text “Hi There” to the STDOUT stream, which by default is linked to the terminal’s output.

echo Hi There

Then, in its simplest form, the cat command takes all data it reads from its STDIN stream and writes it back out to STDOUT exactly as it was received. You can also instruct cat to read in the contents of one or more files and write them back out to STDOUT. For example, to read in the contents of a file named namelist and write it to STDOUT (the terminal), you can do:

cat namelist

To see cat in its purest form, simply run it without arguments, as:

cat

Each line of input typed in will be duplicated. This is because the input you type is sent to STDIN. This input is then received by cat, which writes it back to STDOUT. The end of your input can be indicated by pressing Ctrl+D, the EOF or End of File key. Pressing Ctrl+D closes the STDIN stream, and the program handles this the same as if it had been reading a file and came to the end of that file.

Pipes and Redirects

Now, all command line terminals allow you to do some powerful things with these IO streams. Each type of shell has its own syntax, so I will be explaining these using the syntax of the Bash shell.

You could for instance redirect the output from a command into a file using the greater than or > operator. For example, to redirect the STDOUT of the echo command into a file called message, you would do:

echo Hi There > message

You could also read this file back into a command using the less than or < operator. This will take the contents of the file and write it to the command’s STDIN stream. For example, reading the above file into the cat program, would have it written back to STDOUT. So this has the same effect as supplying the filename as an argument to cat, but instead uses the IO pipes to supply the data.

cat < message

Where things really get powerful is when you start stringing together commands. You can take the STDOUT of one command and pipe it into the STDIN of another command, with as many commands as you want. For example, the following command pipes the message “Pipes are very useful” into the cut command, instructing it to give us the 4th word of the line. This will result in the text “useful” being printed to the terminal.

echo Pipes are very useful | cut -f 4 -d " "

As you can see, commands are strung together with the pipe or | operator. The pipe operator by itself makes many powerful things possible.

Using the pipe (|) and redirect (>) operators, let’s give a more complex example. Let’s say we want to get the PID and user name of all running processes, sorted by PID and separated by a comma. We can do something like this:

ps -ef | tail -n+2 | awk '{print $2 " " $1}' | sort -n | sed "s/ /,/"

To give an idea of what happens here, let me explain the purpose of each of these commands with the output each one produces (which becomes the input of the command that follows it).

ps -ef
Gives us a list of processes with many columns of data; of these, the 1st column is the user and the 2nd column is the PID.

Output:

UID        PID  PPID  C STIME TTY          TIME CMD
root      4222   443  0 20:14 ?        00:00:00 udevd
quintin   3922  2488  0 20:14 pts/2    00:00:00 /bin/bash
quintin   4107  2496  0 20:18 pts/0    00:00:00 vi TODO

tail -n+2
Takes the output of ps and gives us all the lines from line 2 onwards, effectively stripping the header.

Output:

root      4222   443  0 20:14 ?        00:00:00 udevd
quintin   3922  2488  0 20:14 pts/2    00:00:00 /bin/bash
quintin   4107  2496  0 20:18 pts/0    00:00:00 vi TODO

awk '{print $2 " " $1}'
Takes the output of tail and prints the PID first, then a space, then the user name. The rest of the data is discarded here.

Output:

4222 root
3922 quintin
4107 quintin

sort -n
Sorts the lines received from awk numerically.

Output:

3922 quintin
4107 quintin
4222 root

sed "s/ /,/"
Replaces the space separating the PID and user name with a comma.

Output:

3922,quintin
4107,quintin
4222,root

Some Example Useful Commands

The above should give you a basic idea of what it’s all about. If you feel like experimenting, here are a bunch of useful commands to mess around with.

I’ll be describing the commands from the perspective of the standard IO streams. So even though I don’t mention it, some of these commands also support reading input from files specified as command line arguments.

To get more details about the usage of these commands, see the manual page for the given command by running:

man [command]


Command Description
echo Writes to STDOUT the text supplied as command line arguments.
cat Writes to STDOUT the input from STDIN.
sort Sorts all lines of input from STDIN.
uniq Strips duplicate adjacent lines. The input needs to be sorted first, so the same basic effect can be achieved with just sort -u.
cut Splits a string on a specified character and returns the requested parts.
grep Searches for a specified pattern or string in the data supplied via STDIN.
gzip Compresses the input from STDIN and writes the result to STDOUT. Uses gzip compression.
gunzip Uncompresses the gzip input from STDIN and writes the result to STDOUT. Basically the reverse of gzip.
sed Stream editor applying basic processing and filtering operations to STDIN, writing the result to STDOUT.
awk Pattern scanning and processing language. Powerful script-like processing of lines/words from input.
column Takes the input from STDIN and formats it into columns, writing the result to STDOUT. Useful for displaying data.
md5sum Takes the input from STDIN and produces an MD5 sum of the data.
sha1sum Takes the input from STDIN and produces a SHA-1 sum of the data.
base64 Takes the input from STDIN and Base64 encodes or decodes it.
xargs Takes input from STDIN and uses it as arguments to a specified command.
wc Counts the number of lines, words or characters read from input.
tee Reads input and writes it to both STDOUT and a specified file.
tr Translates or deletes characters read from input.
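
To give a sketch of how a few of these compose (tr, sort and uniq are all from the table above; the input sentence is just an arbitrary example), here is a classic word-frequency pipeline:

```shell
# Word-frequency count built purely from pipes: tr puts each word on
# its own line, sort groups duplicates together, uniq -c counts each
# group, and the final sort -rn ranks the counts highest first.
echo "to be or not to be" | tr ' ' '\n' | sort | uniq -c | sort -rn
```

Both “to” and “be” appear twice, so those two lines end up at the top of the output with a count of 2.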

Conclusion

I would recommend that anyone get comfortable with these aspects of the Linux terminal, as well as with Bash scripting. Without this knowledge, you might not even realize how many of your common tasks could be automated or simplified. Also remember that automation not only completes your tasks quicker, but also reduces the chance of the errors and mistakes that come from doing repetitive tasks by hand.

So Why Love Linux? Because the pipes and FIFOs pattern gives you a lot of power for building complex instructions.

All Configuration Files are Plain Text

May 29, 2011

Plain Text Configs?

Almost all of the configuration files in a Linux system are text files under /etc. You get the odd application that stores its config in a binary format, like a SQL database or some other specialized or proprietary format.

The benefit of text, however, is the ease of accessing it. You don’t need to implement an API, which, if it’s your first implementation, can set you back quite some time, and if you only need 1 or 2 entries could end up a huge expense. With the configuration available as plain text, you can immediately access it with the myriad of stream processors at your disposal, like grep, sed, awk, cut, cat and sort to name a few, or, inside programming languages, any of the standard file operations and methods.

To give an example, let’s say I wanted to know the UIDs of all the users on the system. I don’t need their names, just the UIDs. It’s as simple as this (assuming you don’t use something like NIS):

awk -F ':' '{print $3}' /etc/passwd

Or if you wanted to get a list of all hostnames a specified IP is associated with in /etc/hosts:

grep "127.0.0.1" /etc/hosts | cut -f 2- -d ' '

A very useful one is where your configuration file’s entries take the format of

RUN_DAEMON="true"
INTERFACE="127.0.0.1"
PORT="7634"

In this case you can easily extract the values from the file. For example, if the configuration file is named /etc/default/hddtemp, and you want to extract the PORT option, you simply need to do:

value=$( . "/etc/default/hddtemp"; echo $PORT )
echo "PORT has value: $value"

Or for the same file, if you want to extract those options into your current environment, it’s as simple as doing:

. "/etc/default/hddtemp"
echo "PORT has value: $PORT"
echo "INTERFACE has value: $INTERFACE"

What the above does is evaluate the configuration file as a Bash script, which then sets those configuration options as variables (since they use the same syntax Bash uses for setting variables). The difference between the first and the second method is that the second method could potentially override variables you’re already using in your environment/script (imagine already using a variable named PORT at the time of loading the configuration file). The first method loads the configuration in its own subshell environment and only returns the value you requested.

Note that when using these 2 methods, had there been commands in the configuration file, they would have been executed. So be aware of this if you decide to use this technique to read your configuration files. It’s very useful, but unless you’re careful you could end up with security problems.
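
If you only need one value and would rather not execute anything from the file at all, a sed one-liner can pull it out instead of sourcing it. A sketch, assuming the simple KEY="value" lines shown above:

```shell
# Extract PORT's value without sourcing the file, so nothing in it
# can ever be executed. Matches lines of the exact form PORT="...".
port=$(sed -n 's/^PORT="\(.*\)"$/\1/p' /etc/default/hddtemp)
echo "PORT has value: $port"
```

This only handles the plain quoted form; it won’t expand variables or follow any other shell syntax, which is exactly the point.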

Conclusion

The mere fact that these files are text means you immediately have access to them using methods available in even the most primitive scripting/programming languages, and all existing text-capable commands and programs are able to view/edit your configuration files.

Something like the Windows registry surely has its benefits. But in the world of Unix pipes/FIFOs, having it all text based is a definite plus.

So Why Love Linux? Because it has mostly plain text configuration files.

Almost Everything is Represented by a File

May 25, 2011

Device Files

Other than your standard files for documents, executables, music, databases, configuration and whatnot, Unix-based operating systems have file representations of many other things. For example, all devices/drivers appear as standard files.

In a system with a single SATA hard drive, the drive might be represented by the file /dev/sda. All its partitions will be enumerations of this same file, with the number of the partition added as a suffix; for example, the first three primary partitions will be /dev/sda1, /dev/sda2 and /dev/sda3.

When accessing these files you are effectively accessing the raw data on the drive.

If you have a CDROM drive, there will be a device file for it as well. Most of the time a symlink to this device file is created at /dev/cdrom, to make access to the drive more generic. The same goes for /dev/dvd, /dev/dvdrw and /dev/cdrw, the last two being for writing to DVDs or CDs. In my case the actual CD/DVD-ROM device is /dev/sr0, which is where all of these links point. If I wanted to create an ISO image of the CDROM in the drive, I would simply need to mirror whatever data is available in the “file” at /dev/cdrom, since that is all an ISO image really is (a mirror of the data on the CDROM disc).

So to create this ISO, I can run the following command:

dd if=/dev/cdrom of=mycd.iso

This command will read every byte of data from the CDROM disc via the device file at /dev/cdrom, and write it into the filesystem file mycd.iso in the current directory.

When it’s done I can mount the ISO image as follows:

mkdir /tmp/isomount ; sudo mount -o loop mycd.iso /tmp/isomount
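
The same dd technique can be sanity-checked on ordinary files before pointing it at real hardware. A small self-contained sketch (the /tmp paths are hypothetical stand-ins for the device and the image):

```shell
# Stand-in "device" with known contents.
printf 'hello disc\n' > /tmp/source.img
# Copy it byte for byte, exactly as with if=/dev/cdrom of=mycd.iso.
dd if=/tmp/source.img of=/tmp/clone.iso 2>/dev/null
# cmp exits successfully only if the two are byte-identical.
cmp /tmp/source.img /tmp/clone.iso && echo "identical"
# prints "identical"
```

The same cmp (or md5sum) check works just as well between /dev/cdrom and mycd.iso after a real copy.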

Proc Filesystems

On all Linux distributions you will find a directory /proc, which is a virtual filesystem created and managed by the procfs filesystem driver. Some kernel modules and drivers also expose some virtual files via the proc filesystem. The purpose of it all is to create an interface into parts of these modules/drivers without having to use a complicated API. This allows scripts and programs to access it with little effort.

For instance, every running process has a directory named after its PID in /proc; these are all the purely numeric entries in the /proc directory. To see this in action, we execute the ps command and select an arbitrary process. For this example we’ll pick the process with PID 16623, which looks like:

16623 ?        Sl     1:55 /usr/lib/chromium-browser/chromium-browser

So, when listing the contents of the directory at /proc/16623, we see many virtual files:

quintin@quintin-VAIO:~$ cd /proc/16623
quintin@quintin-VAIO:/proc/16623$ ls -l
total 0
dr-xr-xr-x 2 quintin quintin 0 2011-05-21 19:48 attr
-r-------- 1 quintin quintin 0 2011-05-21 19:48 auxv
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 cgroup
--w------- 1 quintin quintin 0 2011-05-21 19:48 clear_refs
-r--r--r-- 1 quintin quintin 0 2011-05-21 18:05 cmdline
-rw-r--r-- 1 quintin quintin 0 2011-05-21 19:48 coredump_filter
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 cpuset
lrwxrwxrwx 1 quintin quintin 0 2011-05-21 19:48 cwd -> /home/quintin
-r-------- 1 quintin quintin 0 2011-05-21 19:48 environ
lrwxrwxrwx 1 quintin quintin 0 2011-05-21 19:48 exe -> /usr/lib/chromium-browser/chromium-browser
dr-x------ 2 quintin quintin 0 2011-05-21 16:30 fd
dr-x------ 2 quintin quintin 0 2011-05-21 19:48 fdinfo
-r--r--r-- 1 quintin quintin 0 2011-05-21 16:36 io
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 latency
-r-------- 1 quintin quintin 0 2011-05-21 19:48 limits
-rw-r--r-- 1 quintin quintin 0 2011-05-21 19:48 loginuid
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 maps
-rw------- 1 quintin quintin 0 2011-05-21 19:48 mem
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 mountinfo
-r--r--r-- 1 quintin quintin 0 2011-05-21 16:30 mounts
-r-------- 1 quintin quintin 0 2011-05-21 19:48 mountstats
dr-xr-xr-x 6 quintin quintin 0 2011-05-21 19:48 net
-rw-r--r-- 1 quintin quintin 0 2011-05-21 19:48 oom_adj
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 oom_score
-r-------- 1 quintin quintin 0 2011-05-21 19:48 pagemap
-r-------- 1 quintin quintin 0 2011-05-21 19:48 personality
lrwxrwxrwx 1 quintin quintin 0 2011-05-21 19:48 root -> /
-rw-r--r-- 1 quintin quintin 0 2011-05-21 19:48 sched
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 schedstat
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 sessionid
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 smaps
-r-------- 1 quintin quintin 0 2011-05-21 19:48 stack
-r--r--r-- 1 quintin quintin 0 2011-05-21 16:30 stat
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 statm
-r--r--r-- 1 quintin quintin 0 2011-05-21 18:05 status
-r-------- 1 quintin quintin 0 2011-05-21 19:48 syscall
dr-xr-xr-x 22 quintin quintin 0 2011-05-21 19:48 task
-r--r--r-- 1 quintin quintin 0 2011-05-21 19:48 wchan

From this file listing I can immediately see the owner of the process is the user quintin, since all the files are owned by this user.

I can also determine that the executable being run is /usr/lib/chromium-browser/chromium-browser since that is what the exe symlink points to.

If I wanted to see the command line, I can view the contents of the cmdline file, for example:

quintin@quintin-VAIO:/proc/16623$ cat cmdline
/usr/lib/chromium-browser/chromium-browser --enable-extension-timeline-api
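
One quirk worth knowing: the arguments in cmdline are separated by NUL bytes, so for processes with many arguments cat prints them glued together. A small sketch, using the shell’s own PID ($$) so it works on any machine without assuming a particular process exists:

```shell
# Arguments in /proc/PID/cmdline are NUL-separated; translating the
# NULs to spaces makes the full command line readable. $$ expands to
# the current shell's own PID.
tr '\0' ' ' < /proc/$$/cmdline; echo
```

The trailing echo just adds the newline that cmdline itself never contains.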

More Proc Magic

To give another example of /proc files, see the netstat command. If I wanted to see all open IPv4 TCP sockets, I would run:

netstat -antp4

Though, if I am writing a program that needs this information, I can read the raw data from the file at /proc/net/tcp.

What if I need to get details on the system’s CPU and its capabilities? I can read /proc/cpuinfo.

Or if you need the load average information in a script/program you can read it from /proc/loadavg. This same file also contains the PID counter’s value.

Those who have messed around with Linux networking, would probably recognize the sysctl command. This allows you to view and set some kernel parameters. All of these parameters are also accessible via the proc filesystem. For example, if you want to view the state of IPv4 forwarding, you can do it with the sysctl as follows:

sysctl net.ipv4.ip_forward

Alternatively, you can read the contents of the file at /proc/sys/net/ipv4/ip_forward:

cat /proc/sys/net/ipv4/ip_forward
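
Because these parameters are just files, changing one is also just a write. As a read-only sketch (the write shown in the comment needs root, so it is left commented out):

```shell
# Read the current IPv4 forwarding state straight from the file;
# "1" means enabled, "0" disabled. To change it you would write to
# the very same file as root, e.g.:
#   echo 1 > /proc/sys/net/ipv4/ip_forward
cat /proc/sys/net/ipv4/ip_forward
```

This is exactly what sysctl does under the hood: net.ipv4.ip_forward maps to the path /proc/sys/net/ipv4/ip_forward.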

Sys Filesystem

Similar to the proc filesystem is the sys filesystem. It is much more organized and, as I understand it, intended to supersede /proc and hopefully one day replace it completely. For the time being, though, we have both.

So being very much the same, I’ll just give some interesting examples found in /sys.

To read the MAC address of your eth0 network device, see /sys/class/net/eth0/address.

To read the size of the second partition of your /dev/sda block device, see /sys/class/block/sda2/size.

All devices plugged into the system have a directory somewhere in /sys/class, for example my Logitech wireless mouse is at /sys/class/input/mouse1/device. As can be seen with this command:

quintin@quintin-VAIO:~$ cat /sys/class/input/mouse1/device/name
Logitech USB Receiver

Network Socket Files

These are “files” for network connections, though strictly speaking the paths are interpreted by Bash itself rather than existing on disk, and some distributions have compiled the feature out of their Bash builds. It remains a very cool virtual file example.

You can for instance pipe or redirect some data into /dev/tcp/10.0.0.1/80, which would then establish a connection to 10.0.0.1 on port 80, and transmit via this socket the data written to the file. This could be used to give basic networking capabilities to languages that don’t have it, like Bash scripts.

The same goes for UDP sockets via /dev/udp.
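
As a sketch of what this looks like in practice (assuming a Bash build with network redirections enabled; example.com and port 80 are placeholder choices), here is a minimal HTTP request with no curl or netcat:

```shell
# Open a TCP connection to example.com:80 on file descriptor 3.
# Bash itself interprets the /dev/tcp path and creates the socket.
exec 3<>/dev/tcp/example.com/80
# Send a minimal HTTP request through the "file".
printf 'HEAD / HTTP/1.0\r\nHost: example.com\r\n\r\n' >&3
# Read back the status line, e.g. "HTTP/1.0 200 OK".
head -n 1 <&3
# Close the descriptor.
exec 3>&-
```

Since the socket is just a file descriptor, all the usual redirection plumbing applies to it.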

Standard IN, OUT and ERROR

Even the widely known STDIN, STDOUT and STDERR streams, probably available in every operating system and programming language there ever was, are represented by files in Linux. If you, for instance, wanted to write data to STDERR, you can simply open the file /dev/stderr and write to it. Here is an example:

echo I is error 2>/tmp/err.out >/dev/stderr

After running this you will see the file /tmp/err.out containing “I is error”, proving that writing the message to /dev/stderr resulted in it going to the STDERR stream.

Same goes for reading from /dev/stdin or writing to /dev/stdout.
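
A small sketch of that equivalence: cat is given /dev/stdin as a filename argument, yet still ends up reading the piped input.

```shell
# cat thinks it is reading a file, but /dev/stdin is the process's
# own standard input, so the piped text comes straight through.
echo "hello" | cat /dev/stdin
# prints "hello"
```

This is handy for commands that insist on a filename argument but that you want to feed from a pipe.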

Conclusion

So Why Love Linux? Because the file representation for almost everything makes interacting with many parts of the system much easier. The alternative for many of these would have been to implement some complicated API.

Pipe a Hard Drive Through SSH

May 24, 2011

Introduction

So, assume you’re trying to clone a hard drive, byte for byte. Maybe you just want to back up a drive before it fails, or you want to do some filesystem data recovery on it. The point is you need to mirror the whole drive. But what if you can’t install a second, destination drive into the machine, as is the case with most laptops? Or maybe you just want to do something funky?

What you can do is install the destination drive into another machine and mirror it over the network. If both these machines have Linux installed, then you don’t need any extra software. Otherwise you can just boot from a Live CD Linux distribution to do the following.

Setup

We’ll assume the source hard drive on the client machine is /dev/sda, and the destination hard drive on the server machine is /dev/sdb. We’ll be mirroring from /dev/sda on the client to /dev/sdb on the server.

An SSH server instance is also installed and running on the server machine and we’ll assume you have root access on both machines.

Finally, for simplicity in these examples we’ll name the server machine server-host.

The Command

So once everything is setup, all you need to do is run the following on the client PC:

dd if=/dev/sda | ssh server-host "dd of=/dev/sdb"

And that’s it. All data will be

  1. read from /dev/sda,
  2. piped into SSH, which will in turn
  3. pipe it into the dd command on the other end, which will
  4. write it into /dev/sdb.

You can tweak the block sizes and copy rates with dd parameters to improve performance, though this is the minimum you need to get it done.
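
For example, a sketch of the same transfer with an explicit block size (bs=4M is an assumption here, a common starting point rather than a tuned value):

```shell
# Larger blocks mean fewer read/write calls on both ends, which
# usually improves throughput over dd's small default block size.
dd if=/dev/sda bs=4M | ssh server-host "dd of=/dev/sdb bs=4M"
```

It is worth timing a small test copy with a few different block sizes, since the best value depends on the drives and the network.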

Conclusion

So Why Love Linux? Because its Pipes and FIFOs design is very powerful.