1. UNIX and Perl

1 – Preparing for the exercises

Exercise

The goal of this exercise is to create a directory on the Desktop with nested subdirectories in it. All commands should be run in the terminal. Unless stated otherwise, simply press Enter to execute the program or command.

Listing content and creating directories

  1. Launch the Terminal. The terminal starts the shell program BASH, a command processor that presents the command line and interprets your commands. We use it to run Perl programs, for instance. Execute the command pwd.
    It tells you where you are currently located in the filesystem. It should be something like /Users/<username>/ where <username> is your particular user on the machine. This is your home directory, where all of your files are stored in various directories.
  2. To list directories and files that reside inside your home directory, execute the command ls.
  3. To list directories and files that reside inside your home directory, including hidden ones (whose names start with a dot), execute the command ls -a.
    Some of these hidden files can be used to tweak the shell environment by editing settings for BASH. You may benefit from doing so on your own office computer, but we will not need to edit any such files during the course. Many text editors auto-save the files you work with as hidden backup files, so knowing how to list hidden content is useful for that reason too.
  4. To get a summary of the disk usage of all your data in your home directory, execute:
  5. Change directory to /Users/<username>/Desktop by executing:
    This directory corresponds to the normal desktop. Should you need it, you can change back to the directory you were in before the current one by running cd -.
    The course exercises are arranged according to topics that correspond to one or several chapters in LP. First create a directory called perl_course with the command mkdir perl_course
    and then create subdirectories, one for each topic:

    Repeat that for all of these:

    Such a repetitive task can be automated in BASH, by running a for loop:

    Perl syntax and BASH syntax are similar enough to be confusing, so we will not go into further details about BASH here. We suggest you use these directories for the exercises throughout the course. Your teacher will tell you if this does not apply to a particular exercise.
    It is generally recommended to avoid whitespace in file and directory names when you work from the command line, because such names can be tricky to specify and handle correctly later on when you need to access the data in them. Moreover, it is usually a good idea to stick to lowercase letters, which are easier to type without needing to engage extra brain cells that are off doing something else.
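Typed in order, the steps of this exercise might look roughly like the sketch below. It is not the course's exact command list: the du options and the use of $HOME are assumptions, and the loop only contains the two topic names that appear elsewhere in these exercises (1_unix and 5_input_output), so the remaining names from your course list still need to be added.

```shell
# Where am I, and what is in here (including hidden dot-files)?
pwd
ls
ls -a

# Summarized (-s), human-readable (-h) disk usage of the home directory.
# Assumption: the course may have used different options.
du -sh "$HOME"

# Go to the Desktop and create the course directory.
# mkdir -p does not complain if a directory already exists.
desktop="$HOME/Desktop"
mkdir -p "$desktop/perl_course"
cd "$desktop/perl_course"

# Create one subdirectory per topic with a for loop.
# Only the two topic names known from the text are listed here.
for topic in 1_unix 5_input_output; do
    mkdir -p "$topic"
done

# Check the result.
ls
```

Using mkdir -p makes the sketch safe to run more than once; a plain mkdir would print an error the second time because the directories already exist.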

2 – Filtering and redirecting output

One of the most powerful aspects of the command-line environment is the ability to chain commands together, with the output from one command serving as the input stream for the next. This functionality often relies on the transferred data being clear text, but it is also frequently used for custom binary data (such as many formats of next-generation sequence data) that many command-line programs cannot easily read. There are three standard Unix I/O streams: STDIN (for reading information), STDOUT (for printing information) and STDERR (for printing error messages).
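A small sketch of how the three streams behave in BASH is shown below. The file names and the scratch directory are made up for illustration and are not part of the course material.

```shell
# Work in a throwaway directory so nothing on the Desktop is touched.
scratch=$(mktemp -d)

# STDOUT of printf is piped into STDIN of sort;
# STDOUT of sort is then redirected into a file.
printf 'cherry\napple\nbanana\n' | sort > "$scratch/sorted.txt"
cat "$scratch/sorted.txt"

# Listing a file that does not exist produces an error message.
# '>' captures STDOUT and '2>' captures STDERR, each in its own file.
ls "$scratch/no_such_file" > "$scratch/stdout.txt" 2> "$scratch/stderr.txt"

# The error message ended up in stderr.txt; stdout.txt stays empty.
cat "$scratch/stderr.txt"
```

Keeping STDERR separate from STDOUT is what lets error messages stay visible (or be logged elsewhere) while the regular output flows on to the next command in a chain.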

Exercise

Here we will launch Perl without any actual script and instead make it print the version. We will then filter and manipulate the output.

Piping information

  1. Move to the 1_unix directory and execute Perl like this: perl -v
    It should print about twelve lines of output.
  2. To see only the line with the version number, you can chain perl -v together with the command grep by executing:
    Or alternatively:

    Can you come up with a different text pattern to match only that line?
    The vertical bar (|) is the pipe character: it makes the output from perl the input for grep.
  3. The same thing can of course be done with the line mentioning Larry Wall, the creator of Perl, by executing:
  4. You can continue piping that output and manipulate it using other programs, such as awk, which lets you pick out specific words. Execute this and try to figure out what it does:
  5. You can redirect that exciting piece of information into a text file by executing:
    You can display the contents of the new file with cat, by executing:
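The piping steps above could be sketched as follows. The grep patterns, the awk program and the output file name larry.txt are assumptions chosen for illustration, since the exact commands are not preserved in the text; the scratch directory only keeps the example from writing into your course directories.

```shell
# Throwaway working directory for the example.
scratch=$(mktemp -d)
cd "$scratch"

# Full version banner.
perl -v

# Keep only the line that reports the version (assumed pattern).
perl -v | grep 'This is perl'

# Keep only the line that mentions Larry Wall.
perl -v | grep 'Larry'

# awk splits each line on whitespace; this prints the
# third and fourth fields of the matching line.
perl -v | grep 'Larry' | awk '{print $3, $4}'

# Same pipeline, but STDOUT goes into a file instead of the terminal.
perl -v | grep 'Larry' | awk '{print $3, $4}' > larry.txt

# Show what was written.
cat larry.txt
```

Each | hands the STDOUT of the command on its left to the STDIN of the command on its right, and the final > sends whatever is left to a file.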

3 – Downloading and displaying contents of text files

Exercise

Here we will use command-line tools to download and inspect a data file that summarizes genetic differences among individuals at some positions along a chromosome. The actual contents of the file are not important at this point; they are the subject of a completely different course.

Retrieving and displaying data

  1. Double check that you are still in the 1_unix directory and use the program curl to download a text file from the course web page, by executing:
  2. Display its contents with cat. Rather noisy, no?
  3. less is a program that displays the contents of files by paging through the text one screenful at a time. Open snp_data.vcf with the less program by executing less snp_data.vcf.
    You can move up and down the text file using the arrow or PgUp/PgDown keys. You quit from less by pressing the ‘q’ key.
  4. By default, less wraps all lines so that a complete line is shown before it moves on to the next line. Contents of complex files with long lines like this one are not very readable when wrapped. You can change the display behaviour in less so that it does not wrap lines, by executing:
    That output is much more readable. By the end of the course you should be able to parse a file like this but a million times larger and extract only specific data from it. You can search for and highlight pieces of text in less by first pressing the ‘/’ key and then typing a search phrase (or regular expression), for instance ‘DP=’ or ‘Chr1\t1135’ (without the quotation marks), followed by Enter.
  5. Make a copy of the file and put that copy in the directory for the Input/Output exercises (we will need it for that exercise too):
    The position of the new file (which will use the same name) is here specified relative to where you are currently located in the filesystem. The .. symbol tells cp to back out of the current directory and then into 5_input_output. The final . tells cp to use the same name for the copy, but is not strictly needed. This works as well:
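The inspection and copy steps can be tried out with the sketch below. Because the download address is not given in the text, no curl call is made; instead a tiny two-line stand-in file with made-up content is created under the name snp_data.vcf. The scratch directory layout is also hypothetical and only mimics perl_course.

```shell
# Hypothetical stand-in for the perl_course directory tree.
course=$(mktemp -d)
mkdir -p "$course/1_unix" "$course/5_input_output"
cd "$course/1_unix" || exit 1

# Made-up placeholder data, NOT the real course file.
printf '#CHROM\tPOS\tID\tREF\tALT\nChr1\t1135\t.\tA\tG\n' > snp_data.vcf

# Interactively you would page through the file without line wrapping:
#   less -S snp_data.vcf
# Non-interactive approximation: cut every line at 60 characters.
cut -c 1-60 snp_data.vcf | head -n 5

# Non-interactive counterpart of searching with '/' inside less.
grep -n 'Chr1' snp_data.vcf

# Copy into the sibling directory, keeping the same file name.
cp snp_data.vcf ../5_input_output/.

# Equivalent form without the trailing dot.
cp snp_data.vcf ../5_input_output/

ls ../5_input_output
```

Both cp forms name a directory as the destination, so cp keeps the original file name for the copy in either case.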

4 – Hello world!

Exercise

The usual suspect

  1. Copy the little piece of code above, paste it into your text editor and save it as hello_world.pl in 1_unix. You can use the little copy document symbol at the top of the code box. This is how you will get the starting code for many of the exercise programs you will be writing, although most of the time you will have to come up with names for the files yourself ;-).
    The first line in the code is the “shebang” or “hashbang” character sequence that tells the system what kind of file this is and where to look for the program or interpreter (in this case Perl) that acts on it.
  2. Make it executable by running:
    in the same directory.
  3. Execute it like this: ./hello_world.pl
    and marvel at the output. Because this directory is not in the PATH, the list of directories where BASH automatically looks for commands and programs, you tell BASH to execute this specific file by prefixing its name with ./.
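The whole exercise can be rehearsed from the shell with the sketch below. The Perl program written by the here-document is an assumed minimal version, not necessarily the code from the course's code box, and the shebang line and chmod mode are likewise assumptions. A scratch directory is used so that an existing hello_world.pl in 1_unix is not overwritten.

```shell
# Throwaway directory standing in for 1_unix.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Write an assumed minimal Perl program to hello_world.pl.
# The quoted 'EOF' keeps the shell from expanding anything inside.
cat > hello_world.pl <<'EOF'
#!/usr/bin/env perl
use strict;
use warnings;

print "Hello world!\n";
EOF

# Add execute permission so the file can be run as a program.
chmod +x hello_world.pl

# Run it by path; the shebang line tells the system to hand it to perl.
./hello_world.pl

# Without ./ the shell searches only the directories in PATH,
# so this normally fails unless the current directory is listed there.
hello_world.pl 2>/dev/null || echo "hello_world.pl was not found via PATH"
```

The last command illustrates why the ./ prefix is needed: the file is executable, but BASH does not look in the current directory unless told to.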