The Unix Shell

This course will have two parts.

Part I will discuss basic concepts such as

  • Introduction to the shell

  • Navigating the file system (pwd, ls, cd)

  • Moving files and directories (mv)

  • Copying files and directories (cp)

  • Deleting files and directories (rm)

Part II will have more advanced topics such as

  • Working with files and searches (wc)

  • Redirecting output (>,>>)

  • Pipes (|, sort, tail)

  • Finding Things (grep, find)

Much of this course is based on material from the Software Carpentry lesson The Unix Shell. This tutorial is meant to be walked through by typing the commands in your terminal. We will also be working with a dataset called shell-lesson-data. Download the zip file from the link and unzip it in the Documents directory.

Part I

1) Introduction to Unix Shell

a) Shell Basics

Unix is a powerful operating system (OS) that is used around the world. Many operating systems are based on Unix, including BSD, Mac OS and Linux. These operating systems offer a Graphical User Interface (GUI), which lets you give instructions to the OS visually through menu-driven interactions with a computer mouse.

The GUI offers an intuitive way to use a computer and can help with many tasks. However, when you need to perform many repetitive, analysis-based tasks, the GUI does not work as well as the Unix Shell. The Unix Shell is a Command-Line Interface (CLI) and scripting language that can be accessed through a terminal. There are different types of shells

  • Bash - Default for most Linux systems and what we will use for this tutorial

  • Zsh - Default for Mac OS (similar to Bash)

  • Csh

  • Bourne Shell

The CLI is powerful as you can customize the interface as well as run various programs either locally or on remote servers through Secure SHell (SSH). The below image is a typical example of a terminal

[Figure: a typical terminal window]

In our examples we have created an account called sio on a computer named localhost. The $ is the prompt and users can begin entering commands after the prompt. The features of the terminal can be customized through your bash profile. As mentioned previously, we will be using the Bash shell for our examples. To determine which shell you are using you can run the following command.

sio@localhost $ echo $0

For Mac OS users, the default CLI is zsh. If you are using a zsh shell then that is fine and there are no additional changes needed as zsh is very similar to bash. We will also later go into the echo program as well as what the $ is used for (not the prompt).

[ ]:
echo $0

# output
-bash

2) Navigating through Files and Directories

Every Unix-based OS has a hierarchical structure to organize files and folders. This is known as the file system. The root directory is the top-level folder and is represented by a forward slash, /. Every folder inside another folder is known as a child or subdirectory of its parent folder.

[Figure: the file system hierarchy]

When you start the CLI, your working directory will be /Users/username (on Mac OS; on Linux it is usually /home/username). For the examples used in this tutorial it will be /Users/sio. This is referred to as the home directory. You can see which directory you are currently in by typing pwd (Print Working Directory).

[ ]:
# input
sio@localhost $ pwd

# output
/Users/sio

To begin, we will start with changing directories. You can change directories with the cd (change directory) command. Let’s change the directory to where we stored shell-lesson-data

[ ]:
cd /Users/sio/Documents/shell-lesson-data
/Users/sio/Documents/shell-lesson-data
[21]:
pwd
[21]:
'/Users/sio/Documents/shell-lesson-data'

With the cd command we changed the directory, and the pwd command shows which location we are in. We can also list the files in the directory with the ls command.

[23]:
ls
exercise-data/      north-pacific-gyre/

There are several built in shortcuts that can be used with various commands. The tilde, ~ is a shortcut to your home directory. If you wanted to change your directory back to home you could type in

cd ~

To go back to the directory you were just in you could use the dash, -

cd -

You can also move up to the parent directory by using the .. shortcut

cd ..  # moves up one directory
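The shortcuts above can be tried together in one short session. This is only a sketch: it assumes a Bash-like shell and uses a throwaway directory from mktemp (the names demo, project and data are made up for illustration) so that none of your real files are touched.

```shell
# Work in a scratch directory (hypothetical names, for illustration only)
demo=$(mktemp -d)
mkdir -p "$demo/project/data"

cd "$demo/project/data"
start=$(pwd -P)        # remember where we are (-P resolves symbolic links)

cd ..                  # up one level, into project
parent=$(pwd -P)

cd - > /dev/null       # back to the previous directory (cd - also prints it)
back=$(pwd -P)

cd ~                   # jump to the home directory
home_now=$(pwd)

echo "$parent"
echo "$back"
echo "$home_now"

rm -r "$demo"          # clean up the scratch directory
```

After cd - the shell is back in the data directory, and after cd ~ it is in your home directory.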

Often times we would like to see what commands we had previously run in a session. This could be done by using the history command.

Lastly, to clear the screen use the clear command

[25]:
clear

When referring to directories and paths we often talk about the absolute path or the relative path. The absolute path is the entire path of a directory, for example

/Users/sio/Documents

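would be the absolute path to the Documents directory. A relative path is written starting from the directory you are currently in. If we were in the Documents directory and wanted to access shell-lesson-data, the relative path would simply be shell-lesson-data. If we were in the home directory, /Users/sio, the relative path would be

Documents/shell-lesson-data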
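To see that an absolute and a relative path can lead to the same place, here is a minimal sketch. It assumes a Bash-like shell and builds a scratch directory with mktemp that stands in for the home directory; the variable names are made up for the example.

```shell
demo=$(mktemp -d)                        # hypothetical stand-in for /Users/sio
mkdir -p "$demo/Documents/shell-lesson-data"

cd "$demo/Documents/shell-lesson-data"   # absolute path: starts at /
via_absolute=$(pwd -P)

cd "$demo"
cd Documents/shell-lesson-data           # relative path: starts from the current directory
via_relative=$(pwd -P)

echo "$via_absolute"
echo "$via_relative"                     # both lines show the same directory

cd / && rm -r "$demo"                    # clean up
```

The relative form only works because of the preceding cd "$demo"; from any other directory the same relative path would point somewhere else or fail.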

Besides listing the objects in a directory, the ls command accepts options that show whether each item is a file or folder, the size of the object, and when the object was last modified

ls -lsG

[Figure: output of ls -lsG]

There will be times, especially on a remote server, where you cannot copy or remove files in a specific directory. This is because of user or group permissions

When using ls with the -l option you will see the permission structure for the Owner, Group and Other. In the below screenshot we see the following permissions

drwxr-xr-x 7 tgstone staff

[Figure: ls -l output showing the permissions of the docs directory]

There will always be ten characters in this first column. The first character indicates whether the object is a file (-) or a directory (d). In the above example we see that the docs object is a directory.

The next three characters are the permissions for the Owner.

  • r - read

  • w - write

  • x - execute

The owner, tgstone, has read, write and execute permissions for the docs folder.

The next three characters determine the permissions for the group. The Group is staff and the permissions are read and execute. The dash in place of the w indicates that the group does not have write permission.

Lastly, if the user is not the owner or in the group then they are placed in the Other category. The permissions for other are read and execute. The chmod command can be used to change an object’s permissions but this is outside the scope of this tutorial.
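The ten-character permission string can be taken apart on the command line. The following is only an illustrative sketch: the docs directory is created in a scratch location, chmod 755 is used solely to make the output predictable, and cut (a standard tool not otherwise covered in this tutorial) is an assumption of this example.

```shell
demo=$(mktemp -d)
mkdir "$demo/docs"
chmod 755 "$demo/docs"      # set known permissions so the result is predictable

# -d lists the directory itself rather than its contents
perms=$(ls -ld "$demo/docs" | cut -c1-10)
echo "$perms"               # drwxr-xr-x

type_char=$(echo "$perms" | cut -c1)     # d   (directory)
owner=$(echo "$perms" | cut -c2-4)       # rwx (read, write, execute)
group=$(echo "$perms" | cut -c5-7)       # r-x (read, execute)
other=$(echo "$perms" | cut -c8-10)      # r-x (read, execute)
echo "type=$type_char owner=$owner group=$group other=$other"

rm -r "$demo"
```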

Using ls with the -a option lists all files, including hidden ones. Hidden files are files or folders whose names begin with a . (the -h option, by contrast, prints file sizes in human-readable units).

[Figure: directory listing that includes hidden files]
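A quick way to check how hidden files behave is to create one. In this sketch the file names are hypothetical and a scratch directory is used. With both GNU and BSD ls, the option that reveals hidden entries is -a; -h only changes how sizes are printed.

```shell
demo=$(mktemp -d)
touch "$demo/visible.txt" "$demo/.hidden.txt"   # hypothetical file names

plain=$(ls "$demo")       # names starting with . are left out
all=$(ls -a "$demo")      # -a also shows ., .. and .hidden.txt

echo "$plain"
echo "$all"

rm -r "$demo"
```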

a) Syntax and Commands

We will now go over examples of command syntax

sio@localhost $ ls -lGh .

[Figure: output of ls -lGh .]

Generally with the CLI we write the command, then any options, and then the arguments. Some commands do not follow this form (such as find), but it is the most common pattern. The options specify additional behavior we want from the command, and the argument . (a shortcut for the current directory) determines which object we are working with. The -lGh options show the directory in long format (-l), color code files and directories (-G on Mac OS; GNU ls on Linux uses --color instead) and print file sizes in human-readable units (-h). To also list hidden files, add the -a option.

Often times you will see . or .. used in a command

  • . refers to the current directory

  • .. refers to the directory above

b) Finding more information

If we want to find what options are available or how a command works, we can use the help command (for commands built into Bash, such as pwd or cd) or look at the man (manual) pages.

help pwd

[Figure: output of help pwd]

man pwd

[Figure: man page for pwd]

Lastly, thanks to the advances of AI, we could also just use an LLM to get additional information on a command or how to use a command

[Figure: asking an LLM about a command]

3) Creating, deleting, copying and moving files and folders

Now that we have the ability to view information in the file system we want to be able to manipulate objects. Some of these basic tasks can be done with a few commands

  • mkdir - Create directories

  • touch or a text editor (nano, vim, emacs) - To create files

  • mv - To move files and directories

  • cp - To copy files and directories

  • rm - To remove files and directories (need additional options to remove a directory)

  • rmdir - To remove a directory

a) Creating directories

Currently you should be in the shell-lesson-data/exercise-data directory. Confirm this information with the pwd command. We can create a new folder called thesis

[14]:
mkdir thesis

You can also create nested directories in one step using the -p option, which creates any missing parent directories along the way

[15]:
mkdir -p ../project/data ../project/result
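The difference that -p makes is easiest to see when the parent directory does not exist yet. A minimal sketch, run in a scratch directory so it does not interfere with the lesson data:

```shell
demo=$(mktemp -d)
cd "$demo"

# Without -p, mkdir fails because the parent directory project is missing
mkdir project/data 2>/dev/null || echo "plain mkdir fails: project does not exist yet"

mkdir -p project/data project/result   # creates project, then both subdirectories
mkdir -p project/data                  # repeating the command is not an error

created=$(ls project)
echo "$created"

cd / && rm -r "$demo"
```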

b) Creating files

There are multiple options when creating a file

  1. Create a new file with the touch command. The touch command is often used when a file only needs to exist, for example for initialization. In earlier versions of Python, an __init__.py file was required to mark a directory as a package. The file could be empty; it only had to exist

[ ]:
touch thesis/draft.txt

  2. Use an editor such as nano, vim or emacs to create a file

[ ]:
nano thesis/draft.txt

c) Moving files and directories

To move a file or folder to another location, or to rename it, we use the mv command. For example, mv can rename the draft.txt file we just created to quotes.txt.

WARNING: If a file with the destination name already exists, mv overwrites it without asking. The -i option makes mv ask for confirmation first.

[ ]:

mv thesis/draft.txt thesis/quotes.txt

If we wanted to move the quotes.txt file to our current directory we could specify the file and write out the destination

[ ]:
mv thesis/quotes.txt .

Let’s say that we created the files references.tex, introduction.tex and methods.txt in the wrong directory.

[ ]:
touch references.tex introduction.tex methods.txt

To move these files to the thesis directory we have two options

  1. Manually move the files by listing each file

mv introduction.tex references.tex methods.txt thesis

  2. Use special characters that describe patterns to move the files

[ ]:
# methods.txt is named explicitly because *.txt would also match quotes.txt
mv *.tex methods.txt thesis

The * Character

The * character is called the wildcard (patterns that use it are also known as globs). It matches zero or more characters in a file or directory name. Using the .tex files as an example, the wildcard states that any file whose name ends in .tex will be moved to thesis. As another example, if we only wanted to move files that start with an r and end with .tex we could write r*.tex
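Because the shell expands a wildcard before the command runs, echo is a safe way to preview what a pattern matches. The sketch below recreates the files from this section in a scratch directory; it assumes a Bash-like shell.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir thesis
touch references.tex introduction.tex methods.txt quotes.txt

echo *.tex       # introduction.tex references.tex
echo r*.tex      # references.tex
echo m*          # methods.txt

mv *.tex thesis  # the shell expands this to: mv introduction.tex references.tex thesis

moved=$(ls thesis)
left=$(ls)
echo "$moved"
echo "$left"

cd / && rm -r "$demo"
```

Previewing with echo first is a cheap habit that helps avoid moving or deleting more files than intended.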

d) Copying files and directories

Similar to moving files, we can also copy files and folders into new locations. This is done with the cp command

[ ]:
cp quotes.txt thesis/quotations.txt

Notice that, in addition to copying the file, we have also given the copy a new name, quotations.txt. To copy a directory together with everything inside it, we need to include the recursive option, -r. The below command will create a backup of the thesis folder and name it thesis_backup

[22]:
cp -r thesis thesis_backup
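The effect of -r can be checked in a scratch directory. This is a small sketch with made-up file contents; it only assumes the standard behavior of cp.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir thesis
touch thesis/quotations.txt thesis/methods.txt

# Without -r, cp refuses to copy a directory
cp thesis thesis_backup 2>/dev/null || echo "cp without -r will not copy a directory"

cp -r thesis thesis_backup   # copies the directory and everything inside it

backup=$(ls thesis_backup)
echo "$backup"

cd / && rm -r "$demo"
```

One thing to watch for: if thesis_backup already exists as a directory, cp -r thesis thesis_backup places the copy inside it, as thesis_backup/thesis.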

e) Removing files and directories

We can now delete the quotes.txt file since it was in the wrong directory

[ ]:
rm quotes.txt

WARNING: The rm command does not move the file to the trash bin. It ‘unlinks’ the file, which marks its space on the hard drive as unused so it can be used for other items. This means that the file generally cannot be recovered. There are special programs that can attempt this recovery but it is often very difficult to do.

If we want to remove an empty directory we can use the rmdir command. If we need to remove a directory that contains items we use the rm command with the -r option (recursive).

[ ]:
rm -ir thesis

The -i option is included above because it is easy to accidentally delete files in a folder that we were not aware of. The -i option prompts for confirmation before deleting each file. Now that we have covered the basics of the Unix shell we can move on to the fun stuff
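The difference between rmdir and rm -r can be sketched in a scratch directory. The directory and file names are hypothetical, and -i is left out here only because it waits for keyboard input.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir empty_dir full_dir
touch full_dir/notes.txt

rmdir empty_dir                          # works: the directory is empty
rmdir full_dir 2>/dev/null || echo "rmdir refuses: full_dir is not empty"
rm -r full_dir                           # removes the directory and its contents

remaining=$(ls)
echo "remaining entries: '$remaining'"

cd / && rm -r "$demo"
```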

Part II - The Fun Stuff

1) Working with files and searches

Navigate to the shell-lesson-data/exercise-data/writing directory. Inside the writing directory is a file called abstract.txt. Using the wc command we can list

  1. The number of lines in the file

  2. The number of words

  3. How many bytes the file uses.

For the wc command, the -l option displays only the number of lines, the -w option only the number of words, and the -c option only the size of the file in bytes.

[ ]:
cd writing
wc abstract.txt

[Figure: output of wc abstract.txt]

If we wanted to know the number of lines, words and sizes of each file as well as the sum for the lines, words and sizes we could use the wildcard character

[ ]:
wc *.txt

[Figure: output of wc *.txt]
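To check how wc counts, it helps to use a file whose contents are known. The sketch below writes a small hypothetical file, sample.txt, with printf and the > redirect (covered in the next section), then counts it in several ways.

```shell
demo=$(mktemp -d)
cd "$demo"

# Two lines, five words, 24 bytes (each newline counts as one byte)
printf 'one two three\nfour five\n' > sample.txt

wc sample.txt        # lines, words and bytes, followed by the file name
wc -l sample.txt     # lines only
wc -w sample.txt     # words only
wc -c sample.txt     # bytes only

counts=$(wc sample.txt)

cd / && rm -r "$demo"
```

The amount of padding in front of the numbers differs between systems, but the order (lines, words, bytes) is the same.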

2) Redirecting output

In the above example only three files are listed. If we had thousands of files, we would want a better way to handle the output. In the CLI we can use the > character to redirect standard output (which normally goes to the terminal) into a file

[ ]:
wc *.txt > lengths.txt

Notice that nothing is displayed in the CLI. This is because we have decided to write the information to the lengths.txt file.

Redirect Option

  • > will create the file if it doesn’t exist or, if the file does exist, it will overwrite the file

  • >> will create the file if it doesn’t exist or append the new information to the file

We can confirm that the data was written to the lengths.txt file by using the cat command

[ ]:
cat lengths.txt
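The difference between > and >> is easy to verify with a few echo commands. This is a minimal sketch in a scratch directory; log.txt is a hypothetical file name.

```shell
demo=$(mktemp -d)
cd "$demo"

echo "first"  > log.txt    # creates log.txt
echo "second" > log.txt    # overwrites it: "first" is gone
echo "third" >> log.txt    # appends after "second"

contents=$(cat log.txt)
echo "$contents"

cd / && rm -r "$demo"
```

The file ends up with two lines, second and third, because the second > replaced the earlier contents.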

3) Using the Pipe |

The reason why the Unix shell is so powerful is that it has the ability to chain together multiple commands. This is done using the pipe |. If we wanted to sort the output of the data we could use the sort command

[ ]:
wc *.txt | sort -n

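The -n option sorts numerically instead of alphabetically. Continuing our example, if we wanted to know which files in our directory are the largest, we could use the tail command to view the bottom three entries. Note that the last of these is the total line that wc prints when it is given several files.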

[ ]:
wc *.txt | sort -n | tail -n 3
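The pipeline can be tried on files with known line counts. In this sketch the three file names are made up, and the files are created with printf in a scratch directory.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'a\n'       > short.txt     # 1 line
printf 'a\nb\n'    > medium.txt    # 2 lines
printf 'a\nb\nc\n' > long.txt      # 3 lines

wc -l *.txt | sort -n              # smallest first, the total line last

# The last two lines are the largest file and the total
last_two=$(wc -l *.txt | sort -n | tail -n 2)
echo "$last_two"

cd / && rm -r "$demo"
```

Each command only sees the text produced by the one before it, which is why the total line is sorted along with the per-file counts.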

4) Finding Things

Often times it is difficult to remember the location of files or directories on servers or computers. This is especially true when dealing with multiple iterations of a program or script. There are two powerful tools that can be used to find items within directories and text documents.

  • grep A utility that searches any given input file, selecting lines that match one or more patterns

  • find A utility that recursively descends the directory tree for each path listed, evaluating the expression in terms of each file in the tree

In the current writing directory there is a file called haiku.txt. If we wanted to find every line that contains the string “not” we could use grep

[ ]:
grep not haiku.txt

[Figure: output of grep not haiku.txt]

We see that grep has returned all the lines that contain not. Let’s try again using The as the pattern to search for

[ ]:
grep The haiku.txt

[Figure: output of grep The haiku.txt]

First, we see that grep also included words that have the sequence “The” in them. Additionally, we see from the below image, produced by running cat haiku.txt, that all lower case versions of the word the were missed.

[Figure: output of cat haiku.txt]

To fix this we can include additional options to perform a better search

[ ]:
grep -win The haiku.txt

[Figure: output of grep -win The haiku.txt]

From the above image, grep is now returning what we had initially intended. This shows that

  • grep by default is case-sensitive. Using the -i option allows grep to be case-insensitive

  • grep matches the pattern anywhere in a line, even inside a longer word. Using -w matches whole words only

  • We can also return line numbers with grep by using the -n option.
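The effect of each option can be reproduced on a small file with known contents. The four lines below are made up for this sketch and are not taken from haiku.txt.

```shell
demo=$(mktemp -d)
cd "$demo"
printf 'The quick fox\nThen it rained\nover the hill\nnothing else\n' > sample.txt

grep The sample.txt       # lines 1 and 2: "Then" also contains "The"
grep -i The sample.txt    # lines 1, 2 and 3: case is ignored
grep -w The sample.txt    # line 1 only: whole-word match, case-sensitive

# Whole word, ignoring case, with line numbers
result=$(grep -win The sample.txt)
echo "$result"

cd / && rm -r "$demo"
```

The combined form reports lines 1 and 3, each prefixed with its line number and a colon.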

If we wanted to search multiple text files in a directory as well as the subdirectories we could use the recursive option, -r with grep

[ ]:
grep -iwnr "Yesterday" .

From the above command we see that, in addition to the options and pattern, we must now also include the directory we want to search. It is important to note that grep is designed for text files. When it finds a match in a binary file, it typically reports only that the file matches rather than printing the matching lines.

5) grep and the pipe

Just as with any other program, we can also pipe things to grep so that it can search. As an example, when users want to find a command that had previously been executed in the terminal they can use the history command with grep

[ ]:
history | grep -i "cd"

The above command will list the entire history and return only the lines that have “cd” in the string.

6) find: a tool for searching for directories and files

grep is used to search for information within a file. The find command searches the file system for files and directories that match given criteria, such as a name pattern.

[ ]:
find .

The above command will recursively search for all files and directories in the current working directory and display them on standard output.

To search only for directories use the -type d option

find . -type d

To search for files use -type f instead

find . -type f

To find text files you can use patterns associated with file names

find . -type f -name "*.txt"

Notice that find doesn’t match our typical syntax for Unix shell commands. Find can also be used to display the path to a file or directory and used with other commands. As an example, if we wanted to find all text files that match the pattern NE*.txt and determine how many lines the files have

[ ]:
wc -l $(find . -type f -name "NE*.txt")

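In the above command the shell first runs the command inside $( ), then substitutes its output into the command line, so wc receives the file names that find printed. This is known as command substitution. It extends the capabilities of wc, which so far we had only used with files in the current working directory. Note that this form splits the output on whitespace, so it breaks on file names that contain spaces. We could also perform the same step by using the -exec option with find, which runs wc once for each file found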

[ ]:
find . -type f -name "NE*.txt" -exec wc -l {} \;
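Both approaches can be compared on a small directory tree. The sketch below builds one in a scratch directory with hypothetical file names. It also uses the `{} +` ending for -exec, a standard variant that passes all matching files to a single wc call, so wc still prints a total.

```shell
demo=$(mktemp -d)
cd "$demo"
mkdir -p data/run1 data/run2
printf '1\n2\n'    > data/run1/NE01.txt    # 2 lines
printf '1\n2\n3\n' > data/run2/NE02.txt    # 3 lines
printf 'x\n'       > data/run2/other.txt   # does not match NE*.txt

found=$(find . -type f -name "NE*.txt" | sort)
echo "$found"

# Command substitution: the output of find becomes the argument list of wc
wc -l $(find . -type f -name "NE*.txt")

# -exec with + : one wc call for all matches, safe for names with spaces
total=$(find . -type f -name "NE*.txt" -exec wc -l {} + | tail -n 1)
echo "$total"

cd / && rm -r "$demo"
```

With `\;` instead of `+`, wc runs separately for every file and no total line is printed.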

We have now given a basic overview of the Unix shell and a few of the powerful features. In further lessons we will go into scripting and working with the shell using remote sessions.