30 June 2001
Copyright, Information Technology Services
training@cc.utexas.edu
The University of Texas at Austin
This course explains how to use several UNIX tools to search for text strings and files, sort files, set file permissions, and perform other operations. After completing this course you should be able to:
· Execute a shell, change your default login shell, and perform I/O redirection.
· Create regular expressions and search for text using grep.
· Sort files and other input.
· Count items with the wc command.
· View and set file permissions with ls and chmod.
An operating system manages the use of hardware resources, handles the storage and recall of data, controls the flow of data, and provides an environment for application programs. The UNIX operating system offers an interactive, multi-user, multitasking environment for users and provides for the creation and management of processes, the file system, and communications.
UNIX can be installed and used on many different types of computers. The UNIX operating system can be used on a CRAY supercomputer or on a microcomputer in your home. The beauty of UNIX is its portability and consistency across many different platforms.
Filters are tools that allow you to manipulate text. A filter reads a file, performs a task on its contents, and writes the result to its output, which can be passed on to another file. Filters do not alter the original file.
The sort command sorts the lines in a file alphabetically or numerically. The default sort is alphabetical. Options are used to determine the type of sort that is used. The valid options are listed below.
-n | Sort by arithmetic value (ignore blanks and tabs) |
-r | Reverse the sort order |
-f | Sort regardless of upper or lower case |
+x | Limit sort to field x |
To use the sort command, type sort, space, an option if desired, space and the filename.
% sort -f students.doc
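As a minimal sketch of the options above, the following Bourne-shell commands build a small sample file and sort it several ways. The file scores.txt and its contents are invented for illustration, and the C locale is forced so that uppercase letters sort before lowercase ones. The +x field syntax is historical; this sketch assumes a POSIX sort, where a field is selected with -k instead.

```shell
# Hypothetical sample data: scores.txt is not one of the course files.
workdir=$(mktemp -d)
printf '10 clinton\n9 Bush\n100 allen\n' > "$workdir/scores.txt"

# Force the C locale so uppercase letters sort before lowercase ones.
LC_ALL=C
export LC_ALL

alphabetical=$(sort "$workdir/scores.txt")        # default: compare lines as text
numeric=$(sort -n "$workdir/scores.txt")          # -n: compare by arithmetic value
reversed=$(sort -rn "$workdir/scores.txt")        # -r: reverse the numeric order
by_name=$(sort -f -k 2,2 "$workdir/scores.txt")   # -k 2,2: second field only; -f: ignore case

printf '%s\n\n' "$alphabetical" "$numeric" "$reversed" "$by_name"
rm -rf "$workdir"
```

Note that the default sort places "100 allen" before "9 Bush" because it compares the lines as text; only -n treats the leading numbers as values.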
1. Use the sort command to sort the contents of the file students.doc. Which line is listed last? Why is this line listed last?
2. Using the cat command, display the contents of the file students.doc. Has the file been modified?
3. Sort the file students.doc in reverse alphabetical order.
4. Sort the file students.doc, telling UNIX to ignore the case.
5. Sort the file students.doc in reverse order and ignoring case.
The grep command (the name comes from the ed editor command g/re/p, "globally search for a regular expression and print") is used to search for patterns in a file. To use the grep command, type grep, space, the string of characters you wish to search for, space, and the filename.
% grep nd test.doc
Since grep distinguishes between upper and lower case, the -i option can be used to ignore case.
% grep -i nd test.doc
The -n option is used to display the line number for each string located.
% grep -in bush students.doc
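The grep options above can be tried on a small sample file. This is only a sketch: the three names below are invented stand-ins for the contents of students.doc, and the last command adds a simple regular expression (^B, meaning "lines that start with B") as an illustration.

```shell
# Hypothetical sample data standing in for students.doc.
workdir=$(mktemp -d)
printf 'George Bush\nbill clinton\nBarbara BUSH\n' > "$workdir/students.doc"

exact=$(grep Bush "$workdir/students.doc")         # case-sensitive search
anycase=$(grep -i bush "$workdir/students.doc")    # -i: ignore case
numbered=$(grep -in bush "$workdir/students.doc")  # -n: prefix each match with its line number
anchored=$(grep '^B' "$workdir/students.doc")      # regular expression: lines starting with B

printf '%s\n' "$exact" "$anycase" "$numbered" "$anchored"
rm -rf "$workdir"
```

The case-sensitive search finds only "George Bush", while the -i search also finds "Barbara BUSH".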
Exercise:
Using the grep command on the file students.doc, determine if the following students took the class.
Boris Yeltsin
Steve Allen
Bill Clinton
The wc command reads a file and displays the number of lines, words and characters contained in the file, followed by the filename. A "word" is defined as a string of characters surrounded by white space.
% wc test.doc
55 400 2369 test.doc
lines words characters filename
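The three counts can also be requested one at a time with the -l, -w and -c options. The sketch below uses an invented file, sample.txt. Reading the file through standard input makes wc omit the filename, and tr removes the padding that some versions of wc print. Note that -c counts bytes, which is the same as characters for plain ASCII text.

```shell
# Hypothetical sample file with 3 lines and 6 words.
workdir=$(mktemp -d)
printf 'one two\nthree\nfour five six\n' > "$workdir/sample.txt"

lines=$(wc -l < "$workdir/sample.txt" | tr -d ' ')   # number of lines
words=$(wc -w < "$workdir/sample.txt" | tr -d ' ')   # number of words
chars=$(wc -c < "$workdir/sample.txt" | tr -d ' ')   # number of bytes (newlines included)

echo "$lines lines, $words words, $chars characters"
rm -rf "$workdir"
```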
Exercise:
1. Using the wc command, determine how many lines are in the students.doc file.
2. Does this give you an accurate count of the number of students in the class? Why or why not?
3. How many words are in this file?
The lpr command (lpr is an abbreviation for line printer) is used to send a copy of your file to a printer. The -P option is used to name a print site, which follows the option. To use the lpr command, type lpr, space, -P, the print site, space and the filename.
% lpr -Pfacsmf_lw test.doc
For a list of commonly used print sites available, type man sites at the prompt.
% man sites
If your file is a PostScript file (produced by troff, TeX or LaTeX text formatters), it will not print accurately on a lineprinter. A PostScript file has %! as its first two characters and contains formatting instructions that cannot be interpreted by a lineprinter. PostScript files should be sent to a laser printer (output sites for Laser Writers end with an lw). For more information about printing PostScript output files, see the man pages (man lpr).
At each output location, output is filed according to the last three digits of the ITS user number.
File and directory protections are displayed using the ls command with the -l option. When executed the "long" form of the ls command displays output similar to the sample below.
File Protections | No. of file links | User | Group | Size | Date last modified | Time last modified | Filename |
-rw-r--r-- | 1 | username | | 204 | Mar 16 | 13:59 | README |
drwxr-xr-x | 2 | username | | 2369 | Jan 12 | 21:57 | bin |
The first character in the long file description signifies whether the file is an ordinary file (a dash "-" character signifies an ordinary file) or a directory file (a "d" signifies a directory file). The next 9 characters define the type of permissions set for the file. If permission is not granted, a dash "-" character is displayed.
Character #1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
File/Directory | User | User | User | Group | Group | Group | Others | Others | Others |
(-/d) | read (r/-) | write (w/-) | execute (x/-) | read (r/-) | write (w/-) | execute (x/-) | read (r/-) | write (w/-) | execute (x/-) |
There are three levels of file protection - user, group and others. They are defined as follows:
User (u) The user or owner of the file or directory.
Group (g) A defined set of users that share access to the file or directory.
Others (o) The remainder of authorized users on this system.
Every file or directory in a UNIX file system has three types of permissions (or protections) that define whether certain actions are permissible for the user, a group and others. These permissions are:
read (r) a user who has read permission to a file may look at its contents. For a directory, read permission enables a user to list the files in that directory.
write (w) a user who has write permission to a file can modify the contents of that file. For a directory, the user can create and delete files in that directory.
execute (x) a user who has execute permission for a file can execute the file (providing that it is a valid executable file). For a directory, execute permission allows a user to change to that directory, search the directory for a specified file and include the directory name in a path.
The number of file links is usually 1. A directory has at least 2 file links. It is possible to have a number of directory entries all pointing to one file.
The next two fields display the owner's name and the group name designations for the file. The group field is left blank unless the -g option is specified.
The group is followed by the size of the file in bytes, date last modified, time last modified and the filename.
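To see these fields on files of your own, you can create a file and a directory and look only at the first ten characters of the long listing. This is a sketch: the names README and bin mirror the sample output, the scratch directory is invented, and the chmod command (described in the next section) is used only to make the result predictable.

```shell
# Hypothetical scratch directory holding one file and one directory.
workdir=$(mktemp -d)
touch "$workdir/README"
mkdir "$workdir/bin"
chmod 644 "$workdir/README"   # rw-r--r--
chmod 755 "$workdir/bin"      # rwxr-xr-x

# Characters 1-10 of a long listing are the file type plus the nine permission characters.
file_mode=$(ls -l "$workdir/README" | cut -c1-10)
dir_mode=$(ls -ld "$workdir/bin" | cut -c1-10)

echo "$file_mode README"
echo "$dir_mode bin"
rm -rf "$workdir"
```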
The chmod command (abbreviated from change mode) is used to change the permissions for a file. The chmod command is listed first on the command line and is followed by a space. Next, the set of users whose permissions you wish to change must be designated. Any or all sets may be changed. Following the set of users, the action is specified. An action can add, remove, or set the specified permissions. The action is followed by the permissions of read, write and execute. A single permission, multiple permissions or all permissions may be changed. The last item on the command line is the filename.
possible levels:
u - user only
g - group only
o - others only
a - all
ug - user and group
go - group and others only
ugo - user, group and others
possible actions:
+ add
- remove
= sets the specified permissions and removes previous settings
possible permissions:
r read
w write
x execute
% chmod a+rw neon
adds read and write permissions for user, group and others to the file named neon
% chmod o-rwx neon
removes read, write and execute permissions for others from the file named neon
% chmod u=r neon
sets the owner's permissions on the file named neon to read only, removing any other permissions the owner had
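The three commands above can be followed step by step on a scratch file. In this sketch the helper show_mode is a hypothetical name introduced for illustration; it prints only the nine permission characters from ls -l. The file starts with read and write permission for the user only.

```shell
# show_mode is a hypothetical helper: print the nine permission characters of a file.
show_mode() {
  ls -l "$1" | cut -c2-10
}

workdir=$(mktemp -d)
touch "$workdir/neon"
chmod u=rw,go= "$workdir/neon"     # starting point: rw-------

chmod a+rw "$workdir/neon"         # add read and write for user, group and others
after_add=$(show_mode "$workdir/neon")

chmod o-rwx "$workdir/neon"        # remove everything for others
after_remove=$(show_mode "$workdir/neon")

chmod u=r "$workdir/neon"          # user gets read only; group and others are untouched
after_set=$(show_mode "$workdir/neon")

echo "$after_add $after_remove $after_set"
rm -rf "$workdir"
```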
Exercise:
1. Set the file permissions for the file chap1.1 to read and write for members of your group. Use the ls -l command to confirm your changes.
2. Set the file permissions for the file victor to read only for user, group and others. Use the ls -l command to review your changes.
3. Now change the file permissions for the file victor to read and write for the user only (removing all other permissions). Use the ls -l command to review your changes.
Another method used to specify permissions for the chmod command uses a 3 digit octal number, with each octal digit representing the permissions for a particular set of users (user, group, other). The table below provides the octal numbers for each possible combination of symbolic permissions.
Symbolic Code | Octal Number |
r-- | 4 |
-w- | 2 |
--x | 1 |
rw- | 4 + 2 = 6 |
r-x | 4 + 1 = 5 |
-wx | 2 + 1 = 3 |
rwx | 4 + 2 + 1 = 7 |
--- | 0 |
A specific octal number designates each of the possible permission combinations. A single octal digit can be used to describe the permissions for user, group or others. A three-digit octal number can be used to describe the permissions for user, group and others together. If the permissions you wish to specify are rw-r--r--, then the octal number can be determined as shown in the diagram below.
 | user | group | others |
symbolic | rw- | r-- | r-- |
 | 4 + 2 = 6 | 4 + 0 = 4 | 4 + 0 = 4 |
octal | 6 | 4 | 4 |
Using the octal numbers in the diagram above, the chmod command would be issued as follows:
% chmod 644 test.doc
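The arithmetic in the table and diagram can be written as a small shell function. This is only a sketch: sym_to_octal is a hypothetical name, not a UNIX command, and it expects exactly the nine permission characters (without the leading file type character).

```shell
# sym_to_octal is a hypothetical helper: convert nine permission characters to octal.
sym_to_octal() {
  perms=$1
  result=""
  # Positions 1-3 are user, 4-6 are group, 7-9 are others.
  for start in 1 4 7; do
    triple=$(printf '%s' "$perms" | cut -c"$start"-"$((start + 2))")
    digit=0
    case $triple in r??) digit=$((digit + 4)) ;; esac   # read    = 4
    case $triple in ?w?) digit=$((digit + 2)) ;; esac   # write   = 2
    case $triple in ??x) digit=$((digit + 1)) ;; esac   # execute = 1
    result="$result$digit"
  done
  printf '%s\n' "$result"
}

sym_to_octal rw-r--r--
sym_to_octal rwxr-x---
```

The first call prints 644, the value used in the chmod command above, and the second prints 750.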
As with files, directories also have permissions. To list a directory itself, rather than its contents, along with its permissions, issue the following command:
% ls -ld
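To set the permissions such that the group can read, write and execute the directory named research, type:
% chmod g=rwx research
Alternatively, using the octal notation, you could type the command below. Note that the octal form sets all three levels at once: 770 also gives the user read, write and execute permission and removes all permissions for others.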
% chmod 770 research
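The symbolic and octal forms are not always interchangeable, because the symbolic form changes only the levels you name while the octal form sets all three. The sketch below shows this on a scratch directory named research, starting from rwxr-xr-x; the starting permissions and the scratch location are assumptions made for the example.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/research"
chmod 755 "$workdir/research"      # assumed starting point: rwxr-xr-x

chmod g=rwx "$workdir/research"    # only the group permissions change
symbolic_result=$(ls -ld "$workdir/research" | cut -c1-10)

chmod 770 "$workdir/research"      # user, group and others are all set
octal_result=$(ls -ld "$workdir/research" | cut -c1-10)

echo "after g=rwx: $symbolic_result"
echo "after 770:   $octal_result"
rm -rf "$workdir"
```

After g=rwx, others can still read and search the directory; after 770 they cannot.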
To change the permissions for your current directory to only user read, write and execute, type:
% chmod 700 .
To change the permissions for your parent directory to only user read, write and execute, type:
% chmod 700 ..
To limit access to your directory for group and others, use the following permissions:
drwx may do anything to this directory
d--- may do nothing to this directory
d--x may look at files (only files with read permission) if you know the filenames
dr-x may list the contents of the directory and look at the files, but not create or remove files in the directory
d-wx may create, remove or access files in the directory if you know the filenames (cannot list filenames)
1. Use the ls -ld command to review the directory permissions for your home directory.
2. Set the file permissions on your home directory so that all users in your group can execute your directory, but not see the contents of your directory. Now, use the ls -ld command to review your changes.
3. Using the mkdir command, create a directory called documents and set the file permissions so that members of your group can review information about files in this directory (read the list of files in the directory).
When you create a file, the system gives it a default set of permissions. The default permissions are controlled by the system administrator and will vary from installation to installation. You can change the default with the umask command. A changed umask only affects newly created files. To change the permissions for previously existing files, use the chmod command.
The umask command uses octal numbers to indicate permissions that have been turned OFF. (Please note that this is exactly the opposite of the chmod command, where the octal numbers indicate permissions that are turned on.) For example, a umask command using the octal number of 027 can be converted into the following symbolic expression for file permissions:
rwxr-x---
To see what your current umask settings are, issue the umask command without any arguments. The system will display the octal number.
% umask
22
Leading zeros are not displayed, so the 22 designation above can be interpreted as 022. No permissions for the user are turned off; write permission for group and others is turned off. The symbolic expression is rwxr-xr-x.
The easiest way to compute the umask is to subtract the permissions that you want from 777. For example if you wish to have permissions of 644, then subtract 644 from 777 to obtain the umask setting.
777
- 644
133
To change the permission defaults, type umask, space and the appropriate three digit octal number.
% umask 133
Note: The umask command only affects the current login session. To permanently change your umask setting, the umask command must be added to the .cshrc file.
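The effect of a umask can be checked by creating a new file and a new directory and listing them. The sketch below uses the 027 value from the earlier example inside a subshell (the parentheses), so the setting does not carry over to the rest of the session. The names report and project are invented. New files are normally created without execute permission, so the same umask leaves rw-r----- on a file and rwxr-x--- on a directory.

```shell
workdir=$(mktemp -d)

# The parentheses start a subshell; the umask set inside it ends with the subshell.
(
  umask 027
  touch "$workdir/report"     # new file
  mkdir "$workdir/project"    # new directory
)

file_mode=$(ls -l "$workdir/report" | cut -c1-10)
dir_mode=$(ls -ld "$workdir/project" | cut -c1-10)

echo "$file_mode report"
echo "$dir_mode project"
rm -rf "$workdir"
```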
The shell acts as a user interface that communicates both with you, the user and with the operating system or kernel. The shell is known as a command interpreter because the shell interprets the commands that you type, starts the programs that you request and works as a buffer between the user and the operating system. One of the most outstanding characteristics of the UNIX operating system is that the shell is just a program. You may choose to run any of several shell programs. The diagram below represents the relationship between the shell, the kernel and the hardware.
The shell can also be used as a programming language. You can use a shell to execute a set of shell commands placed in a file, called a shell script or shell "program". That shell script can then be invoked at any time and will perform the listed commands.
The most common shells are described below. The ITS machines use the C-shell as the default interactive shell. The Bourne shell, by convention, is the shell used for writing shell programs.
sh The "Bourne shell" was named for the shell's author, Stephen Bourne of Bell Labs. The Bourne shell is found on every UNIX system. The usual convention is to use the Bourne shell to write shell programs.
csh The "C-shell" was written by a group of people at Berkeley. Many of the programming language constructs resemble the C language, hence the name C-shell. The C-shell is commonly used for interactive use. The C-shell is known for the three major features described below.
Job Control allows the user to switch between multiple processes.
History keeps a list of previously executed commands which can be recalled and executed.
Aliases allow the user to abbreviate commands.
ksh The Korn shell, written by David Korn, is available from the AT&T Toolchest. The Korn shell adds job control, history, command line editing, aliases and subroutines to the Bourne shell. The Korn shell is compatible with Bourne shell scripts and is commonly available on System V Release 4 UNIX machines. At ITS, the Korn shell is only available on emx.
tcsh The tcsh shell is freely available software. Tcsh is basically the C-shell with command completion (allows the user to type partial commands which are completed by the system) and command line editing.
bash The bash shell (Bourne Again SHell) is offered free by the GNU Project. The bash shell is Bourne shell compatible and offers command line editing, command aliases and history.
To identify the shell you are currently using, type printenv SHELL at the prompt. In the example below, the shell in use is the C-shell.
% printenv SHELL
/bin/csh
Locates commands; passes arguments and control to the command
Handles the sequential and concurrent execution of commands
Performs I/O redirection
Provides wildcards for filenames
Maintains environment variables
Handles pipes (chains the output of one program into another program)
Provides job control
Provides command aliases
Provides a programming language
Each program that you run while logging in or working in the shell is called a process. After the login process is complete, the C-shell process is running and provides the prompt that you see. At this point, additional processes may be run. Examples of other processes one might choose to run are the mail program, vi, troff, a C compiler or another C-shell.
In the diagram above, the C-shell represents the login shell. A child process is a program that was started under the current shell. The shell and its children are arranged in a hierarchical structure, much like the file system. The additional processes running under the login C-shell are known as children. During a UNIX session, you may start as many processes as you need.
An environment variable can store information that is available to the current shell and to its children. Examples of some environment variables are shown below.
The printenv command is used to display the current environment variables. To use the printenv command, type printenv at the command prompt. The shell displays the current environment variables. An example of the output displayed from the printenv command with an explanation for each variable is listed below.
% printenv
HOME=/home/path/u0/cc/userdirectory your login or home directory pathname
SHELL=/bin/csh your default shell
TERM=vt100 terminal type, used by editors
USER=username your login name
PATH=/usr/local/bin:/usr/local:/usr/ucb:/bin:/usr/bin: the list of directories searched for commands
EDITOR=vi name of your default editor
The variables listed in the previous example are known as global variables. This set of variables is available to all processes. The setenv command is used to set global environment variables. The unsetenv command is used to unset global environment variables. To change the EDITOR environment variable to ed, type setenv, space, EDITOR (in caps), space and ed.
% setenv EDITOR ed
To change the EDITOR environment variable back to vi, type the command again substituting vi for the ed value.
In the C-shell you have the option to create and set additional "local" variables which are used in the current process. The set command is used to create and set a local variable. The UNIX convention for variables is that UPPERCASE LETTERS are used for GLOBAL VARIABLES and lowercase letters are used for local variables. In the example below, the set command is used to set a local variable g to the value /home/games.
% set g=/home/games
To display a list of local variables, type set at the prompt. To delete a local variable, type unset, space and the variable name.
% unset g
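The difference between global and local variables shows up when a child process is started. The sketch below is written for the Bourne shell rather than the C-shell, so it uses export and a plain assignment as counterparts of setenv and set; that substitution is an assumption of this example, not part of the C-shell syntax described above.

```shell
unset g                # make sure g is not already in the environment

EDITOR=ed
export EDITOR          # Bourne-shell counterpart of: setenv EDITOR ed
g=/home/games          # Bourne-shell counterpart of: set g=/home/games

# A child shell inherits the exported (global) variable but not the local one.
child_editor=$(sh -c 'printf "%s" "$EDITOR"')
child_g=$(sh -c 'printf "%s" "${g:-not set}"')

echo "child sees EDITOR=$child_editor"
echo "child sees g: $child_g"
```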
The environment variables that you will use frequently are HOME, PATH and TERM. A discussion of each is provided.
The PATH environment variable is a list of directories separated by colons that the shell searches when it attempts to execute a command. The order of the directories listed in the PATH variable is significant since the shell will search for a command using the order specified.
For example, if a program is stored both in the /bin directory and in your home directory, when the program is executed it will search the PATH for the first directory listed that contains the program. If the /bin directory is listed first in the PATH, then it will be searched first and the program will be executed from the /bin directory. If your home directory is listed in the PATH first, the program will execute from your home directory.
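The search order can be observed without running anything. In this sketch, two scratch directories each hold a program with the same invented name, greet_demo, and command -v reports which copy the shell would pick for a given PATH. The commands are Bourne-shell syntax; the directory and program names are hypothetical.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/first" "$workdir/second"

# Put a program with the same name in both directories.
for dir in first second; do
  printf '#!/bin/sh\necho "run from %s"\n' "$dir" > "$workdir/$dir/greet_demo"
  chmod u+x "$workdir/$dir/greet_demo"
done

# command -v prints the pathname the shell finds; each PATH is set in a subshell only.
found_a=$(PATH="$workdir/first:$workdir/second:$PATH"; command -v greet_demo)
found_b=$(PATH="$workdir/second:$workdir/first:$PATH"; command -v greet_demo)

echo "$found_a"
echo "$found_b"
rm -rf "$workdir"
```

Only the order of the two directories differs between the two lookups, and each time the copy in the directory listed first is the one found.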
To see the current PATH environment variable setting, type printenv at the prompt. The shell will display the current environment variables including PATH.
When the shell starts up, it builds a "hash" table listing the commands found in each directory in the PATH. The rehash command is used to rebuild this table. If you change the PATH, use the rehash command to update the table and to use the new PATH in your current login session. If you do not use the rehash command, the changes will not take effect until the next time you log in.
Exercise:
1. Using the pwd command, print out the path to your current working directory.
The place from which a program reads input is called Standard Input. Standard input by default is the keyboard. For some commands, if no file argument is specified, the input will be read from standard input (the keyboard).
The place to which a program writes its output is called Standard Output. Standard output by default is the terminal screen (or monitor).
Standard Error is used to notify the user about errors. Standard error by default is displayed on your screen.
As long as the standard input, standard output, and standard error defaults are not changed, the operating system reads input from your keyboard, writes output to your screen and displays standard error on the screen.
Both standard input and standard output can be changed. For example, standard input can come from a file and standard output can be written or appended to a file. The characters <, >, and >> are used to redirect standard input and output. The I/O (short for input/output) redirection characters are used in a command line between file or process names. All I/O redirection is handled by the shell.
The command "who" generates a list of users currently using the system. The standard output for this command is the terminal screen. To redirect the output to a new file named users, type who, the > character and the filename.
% who > users
The table below displays the various I/O redirection characters that are available.
> filename | Redirects standard output to a file |
>> filename | Appends standard output to an existing file, or creates a new file if the file does not exist |
< filename | Redirects standard input from a file |
Examples of standard I/O redirection:
who > users | output from who is redirected into a file named users (a new file) |
who >> test.doc | output from who is appended to the file named test.doc |
cat test.doc > Saved.text.mss | output from the catenation of test.doc is written to Saved.text.mss (Saved.text.mss will be overwritten) |
sort < student.doc > student.sort | The file student.doc is input for the sort command. The output from the sort command creates a file named student.sort |
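The three redirection characters can be combined in a short session. This sketch uses printf with invented user names in place of who, so that the result is the same on any system.

```shell
workdir=$(mktemp -d)

printf 'carol\nalice\n' > "$workdir/users"           # > creates (or overwrites) the file
printf 'bob\n' >> "$workdir/users"                   # >> appends to the existing file
sort < "$workdir/users" > "$workdir/users.sorted"    # < supplies input, > receives output

line_count=$(wc -l < "$workdir/users" | tr -d ' ')
first_user=$(head -n 1 "$workdir/users.sorted")
last_user=$(tail -n 1 "$workdir/users.sorted")

echo "$line_count users, from $first_user to $last_user"
rm -rf "$workdir"
```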
Exercise:
1. Use the who command to determine the users currently logged in.
2. Issue the who command again, but this time redirect the output to a file called users.
3. Sort this file and redirect the output to a file called users.sorted.
A Pipe connects the standard output of one program to the standard input of another program. The "|" character is used to signify a pipe. More than two commands can be connected with a pipe. A series of commands can be connected together, the standard output from each becoming the standard input to the next command in the series.
For example, if you want to display a list of the first ten entries of a sorted list of users on the system, you could issue the following command:
% who | sort | head
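The same kind of pipeline can be tried with fixed input. In this sketch, printf with three invented names stands in for who, and head -n 2 keeps only the first two lines of the sorted list.

```shell
# printf supplies the input, sort orders it, head keeps the first two lines.
first_two=$(printf 'carol\nalice\nbob\n' | sort | head -n 2)
echo "$first_two"
```

Each command reads the output of the command to its left, so no intermediate files are needed.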
Exercise:
1. Use a pipe (|) with ls -l to see a list of your files one screen at a time.
2. We want to see who is currently logged on, but we want the information sorted and displayed one screen at a time. Use the pipe (|) in conjunction with the who, sort and more commands to achieve this goal.
A series of commands can be listed on one command line using the semicolon (;) character to separate each command. The commands will be executed sequentially. If one command fails, the other commands are still executed.
% who ; cat students.doc ; more test.doc
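To see that a failing command does not stop the ones that follow it, the sketch below runs two commands in a separate sh process. The first command lists a directory that is assumed not to exist (the path is invented), and its error message is discarded.

```shell
# The ls command fails, but the echo after the semicolon still runs.
output=$(sh -c 'ls /no/such/directory 2>/dev/null ; echo "second command still ran"')
echo "$output"
```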
The UNIX operating system is multitasking. The ampersand (&) character placed at the end of a command line is used to run a process or job in the background. The process in the foreground continues to accept standard input and is displayed on your screen. To start a process and run it in the background, type the command followed by the ampersand character.
% cc program.c &
After you enter the command, the job number and the process id are displayed and the command continues to run in the background.
% cc program.c &
[4] 17738
The ps (process status) command displays the status of all processes that are running. To use the ps command, type ps at the prompt. A list of your current processes is displayed.
% ps
PID TT STAT TIME COMMAND
5218 2f T 0:01 vi
7937 2f T 0:00 vi
16462 2f E 0:01 csh
The kill command is used to send a signal to a process. There are two common options used with the kill command. The first option, -HUP sends a hangup signal to a process. The HUP option allows the process to clean up after itself before it dies. For example, the vi text editor will save a copy of your file before exiting the program when it receives a hangup signal. To use the kill command with the -HUP option, type kill, space, -HUP, space and the PID number (Process Identification Number). The word "Hangup" is displayed as a confirmation that the process has been terminated.
% kill -HUP 5218
Hangup
The second option, -9, sends a kill signal to a process. The kill signal cannot be caught or ignored and will always kill the process. The process will not have an opportunity to clean up after itself. To use the kill command with the -9 option, type kill, space, -9, space and the PID number.
% kill -9 10433
The kill command given with the C-shell id will kill all processes within that shell.
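A background process and the kill command can be combined in a short Bourne-shell sketch. Here sleep 30 stands in for a long-running program, the special variable $! holds the process id of the most recent background command, and wait collects the exit status. In shells such as bash, a process ended by a signal typically reports a status of 128 plus the signal number, which would be 129 for a hangup; treat that value as an assumption about your shell.

```shell
sleep 30 &             # start a long-running process in the background
pid=$!                 # process id of the background process

kill -HUP "$pid"       # send the hangup signal

status=0
wait "$pid" || status=$?   # collect the exit status without stopping on failure
echo "process $pid ended with status $status"
```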
The C-shell provides additional commands that are used to control jobs or processes. You may suspend a job, display a list of currently executing jobs, send a job to the background or call a job to the foreground.
To suspend a job, type <Control-z>.
The jobs command displays a list of jobs currently running. The information displayed includes job number and the status and name of the job.
% jobs
[1] + Stopped vi neon
After a job has been stopped, you are free to do another task, for example reading your mail. After you finish this task, you can resume the suspended job exactly where the job left off. To resume a job, you can use the foreground command. A job can be moved to the foreground by typing only fg or fg, space and the job number. Any job output or screen display is immediately seen on your screen.
% fg %1
Before you log out, you must resume and finish or kill suspended jobs.
Often, you may start a time consuming job and wish to continue working on other projects at the same time. In the example below, you will format a long document using the troff text formatting program.
% troff -man csh.man > csh.dvi
After you start a job, suspend it with <Control-z>.
[1] + Stopped troff -man csh.man
Then, the job can be sent to the background using the bg command. The background command is used by typing bg, space and the job number.
% bg %1
[1] troff -man csh.man &
The C-shell keeps a list of commands that have been executed during this login session. The list is called history and is stored in the memory of the system. History is listed in the order of execution (with your last command listed last). The number of commands stored and displayed in history is set using variables. To see the current history, type history at the prompt.
% history
Each event listed in the history is numbered. A command from history may be substituted at your command prompt by using an event selector. An event selector always begins with an exclamation point (! also known as "bang"). A list of the valid event selectors is provided in the following table.
Event Selector | Detail |
!31 | Selects command number 31 |
!ec | The last command beginning with the letters “ec” |
!?xy? | The last command containing “xy” |
!-3 | Three commands ago |
To use an event selector, type the appropriate selector at the prompt.
% !15
Exercise:
1. Use the history command to get a list of your previous commands.
2. Use each of the event selectors listed in the table above to execute a command from history.
An alias is used to create shorthand commands. To display the current aliases that have been defined, type alias.
% alias
h history
l ls -FC
To create a new alias, type alias, space, the new alias, space and the name of the command.
% alias d date
To delete an alias, type unalias, space and the name of the alias you wish to remove.
% unalias d
1. Create an alias for the date command.
2. Create an alias for the who command.
3. Delete both aliases.
As mentioned previously, the shell, our command interpreter, implements a programming language. You can place these language constructs or programs in a file. These files are called shell scripts.
The shell programming language implements variables, control structures (if statement, while loops, for statements), parameter passing and interrupt handling. Examples of shell scripts are the .login, .cshrc and .logout files.
To create your own command follow the steps listed below:
Create a shell script
Make the file executable (chmod command)
Place the file in a directory, usually the $HOME/bin directory is used
Alter the PATH variable to include that directory
Type rehash to rebuild the C-shell's internal table (csh only)
To execute the new command, type the command name.
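The steps above can be sketched in Bourne-shell syntax as follows. Everything named here is hypothetical: countlines is an invented command, and a scratch directory stands in for $HOME so that your real home directory is not touched. Since rehash is a C-shell command, the sketch uses hash -r, the Bourne-shell counterpart.

```shell
demo_home=$(mktemp -d)       # stand-in for $HOME in this sketch
mkdir "$demo_home/bin"

# Steps 1 and 3: create the shell script directly in the bin directory.
cat > "$demo_home/bin/countlines" <<'EOF'
#!/bin/sh
# countlines: print the number of lines in the file named by the first argument
wc -l < "$1" | tr -d ' '
EOF

# Step 2: make the file executable.
chmod u+x "$demo_home/bin/countlines"

# Step 4: alter the PATH variable to include that directory.
PATH="$demo_home/bin:$PATH"
export PATH

# Step 5: rebuild the shell's table of command locations.
hash -r

# Execute the new command by typing its name.
printf 'a\nb\nc\n' > "$demo_home/sample.txt"
count=$(countlines "$demo_home/sample.txt")
echo "$count"
```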
There are many different types of terminals that can be used to log in to a Computation Center system. It is important to correctly specify the terminal type so that programs may properly control your terminal. This is most important for full screen editors and menu systems. This information is passed to programs via the global shell variable TERM. You can use the setenv command to set this variable.
% setenv TERM vt100
See the list below for some of the common terminal types at UT Austin.
Terminal | Type |
DEC VT 100 | vt100 |
Kermit | vt100 |
Micro-Term Ergo Series* | vt100 |
Micro-Term MIME-2A emulating enhanced VT52 | vt52 |
Sun Workstation | sun |
Televideo 950 | tvi950 |
*These terminals can emulate a variety of terminals. The most common terminal setting used at UT is vt100.
If your terminal is not listed, you can leave your terminal unidentified without serious problems.
Note: You will not be able to use a screen editor without specifying your terminal type.
The stty command is used to display the keyboard settings for your terminal. To use the stty command to display your current terminal options, type stty all at the prompt.
To change a terminal option, type stty, space, the name of the option, space, the "^" character and the letter you have chosen to assign. Avoid using letters that are already in use. The best choices are control characters. You can either type the control character directly or represent it by typing "^" followed by a capital letter.
% stty erase ^P
The erase character is the character that allows you to backspace and correct typing mistakes. The kill character allows you to erase an entire line. The interrupt character aborts execution of the currently executing program.