Shell exercises 1
=================

Exercises: Login and file transfer
----------------------------------

1.  Copy the file
    <http://mbdata.science.ru.nl/courses/ghe/ghe_files_practical.tgz> to
    your home directory on the server.

Exercises: Files and directories
--------------------------------

1.  Unpack `ghe_files_practical.tgz`. The `.tgz` file is a *compressed
    tarball*, a file packaged with `tar` and compressed with `gzip`. You
    unpack such a file using the following command:

``` {.bash}
$ tar xzfv ghe_files_practical.tgz
```

The `tar` command is followed by four options, written together as
`xzfv`: `x` to extract, `z` to uncompress using `gzip`, `f` to indicate
that the name of the archive file follows, and `v` for verbose output.
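
The same option letters can be tried out safely on a small archive first.
The following is a minimal sketch, not part of the practical itself: the
file names `a.txt`, `b.txt` and `demo.tgz` and the `unpacked` directory
are made up for illustration, and everything happens in a temporary
directory created with `mktemp`.

```bash
# Work in a throwaway directory so nothing in your home directory is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create two small example files (hypothetical names) and pack them.
# c = create, z = compress with gzip, f = the archive name follows.
echo "first" > a.txt
echo "second" > b.txt
tar czf demo.tgz a.txt b.txt

# t = list the contents of the archive without extracting anything.
tar tzf demo.tgz

# x = extract; -C chooses the directory to extract into.
mkdir unpacked
tar xzf demo.tgz -C unpacked
ls unpacked
```

Listing an archive with `t` before extracting it is a useful habit, since
it shows which files will appear in the current directory.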

How many files do you now have in your home directory?

2.  Let's clean up a bit. Delete `ghe_files_practical.tgz`. We don't
    need it anymore, as it is unpacked.

3.  Create a directory called `experiments` and a `data` directory.

4.  Move all `.txt` files to `experiments` and all `.bam` files to
    `data`. How many files are now in each directory?

There's one file, `exercises.txt`, that doesn't belong in the
`experiments` directory. Let's move it back to your home directory.
We're now going to use this file to record the answers to these
questions.

5.  Move the file `exercises.txt` back to your home directory.

6.  Make a copy of this file, called `answers.txt`.

7.  Edit the file `answers.txt` using `nano` and include the answers and
    all the commands you used. *Hint: use the up arrow key to see
    previous commands.*
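
The commands needed for these exercises can be rehearsed on dummy files
first. The sketch below uses invented file and directory names
(`notes1.log`, `table.csv`, `logs`, `tables`) in a temporary directory,
so it does not give away the answers for the practical files.

```bash
# Work in a throwaway directory with a few empty example files.
scratch=$(mktemp -d)
cd "$scratch"
touch notes1.log notes2.log table.csv

# Create two directories in one command.
mkdir logs tables

# The shell expands *.log to every file name ending in .log.
mv *.log logs/
mv *.csv tables/

# Count the files in a directory: ls prints one name per line to the pipe,
# and wc -l counts the lines.
ls logs | wc -l

# Copy a file under a new name, move a file back up one level,
# and delete a file that is no longer needed.
cp tables/table.csv tables/table_copy.csv
mv logs/notes1.log .
rm logs/notes2.log
```

Note that `rm` deletes files immediately, without a recycle bin, so it is
worth checking the file name before pressing enter.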

Exercises: Redirection and pipes
--------------------------------

Go to the `experiments` directory, located in your home directory, and
answer the following questions. Include the command you used with each
answer.

1.  What are the first three lines of `E-GEUV-2.idf.txt`?

2.  What is the last line of `E-GEUV-2.idf.txt`?

3.  How many samples are present in `E-GEUV-1.sdrf.txt`?

4.  The Geuvadis project sequenced RNA from different populations. Which
    column contains the population information in `E-GEUV-1.sdrf.txt`?
    How many different populations were sequenced? How many samples from
    each population are present in `E-GEUV-1.sdrf.txt`? How many samples
    are present from each population in all `*.sdrf.txt` files? (*Hint:
    use the `cat` command to concatenate multiple files.*)

5.  How long were the reads that were sequenced in this project?

6.  Create a file called `source_names.txt` with all unique 'Source
    Names' from the `sdrf.txt` files.
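
Most of these questions come down to chaining a few small tools with
pipes. The sketch below shows the general pattern on a tiny made-up
tab-separated table; the file names `part1.tsv`, `part2.tsv` and
`samples.txt` and the column names `Sample` and `Group` are assumptions
for illustration and do not correspond to the Geuvadis files.

```bash
# Build two small tab-separated files, each with a header line.
demo=$(mktemp -d)
cd "$demo"
printf 'Sample\tGroup\n' > part1.tsv
printf 's1\tA\ns2\tB\n' >> part1.tsv
printf 'Sample\tGroup\n' > part2.tsv
printf 's3\tA\ns4\tA\n' >> part2.tsv

# First and last line of a file.
head -n 1 part1.tsv
tail -n 1 part2.tsv

# Concatenate both files, drop the header lines, keep column 2,
# then count how often each value occurs (uniq -c needs sorted input).
cat part1.tsv part2.tsv | grep -v '^Sample' | cut -f 2 | sort | uniq -c

# Redirect the unique values of column 1 into a new file with >.
cat part1.tsv part2.tsv | grep -v '^Sample' | cut -f 1 | sort -u > samples.txt
```

In this example the counting step reports three rows for group `A` and
one for group `B`, and `samples.txt` ends up with four lines. Keep in
mind that `>` overwrites an existing file, whereas `>>` appends to it.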


