Welcome
- These notes are part of a course on software development skills for scientists and engineers being prepared by Greg Wilson for the Python Software Foundation
- Please use the links included with the slides to provide comments and feedback
Course Outline
Acknowledgments
- The Python Software Foundation, for the grant that made this work possible
- The University of Toronto, for letting me test this version of this course on its students
- Its Department of Computer Science, for giving me a home, and making me feel welcome
- Brent Gorda, who helped create the first version of this course
- Heather Mayer, for the artwork
- YesLogic, for generously donating licenses for Prince (their XML-to-PDF converter)
- The many people who commented on this material, and suggested ways to improve it:
| Hossein Bidhendi | Stephane Bortzmeyer | Michelle Craig | Simon Duane | Paul Dubois | Hans Fangohr | Brent Gorda |
| Adam Goucher | Perry Greenfield | Paul Gries | Brandon King | Catherine Letondal | Michelle Levesque | Andy Lumsdaine |
| Laurie MacDougall | Keir Mierle | Kit-Sun Ng | Dirkjan Ochtman | Victor Putz | Irving Reid | Karen Reid |
| Paul Salvini | Diomidis Spinellis | Bill Spotz | Tom Van Vleck | Jim Vickroy | | |
- Frank Willison—I'm sorry this one was finished too late for you to tune up
- John Scopes, and everyone else with the courage to fight for the idea that the truth is more important than doctrine
Introduction
Motivation
- Computers are as important to scientists as telescopes and test tubes
- Analyze problems that are too complex for traditional means
- Simulate things that can't be studied in laboratories
- Many scientists now spend much of their professional lives writing and maintaining software
- A quarter of graduate students in science and engineering spend 25-50% of their time programming
- But most scientists have never been taught how to do this efficiently
- It's a long way from the loops and arrays of first year to simulating bone development in foetal marsupials…
- Like being shown how to differentiate polynomials, then expected to invent the rest of calculus
- This course will teach you how to design, build, maintain, and share programs more efficiently
- Focus: tools and techniques appropriate for half a dozen people working together for a year
- Everything you do at that scale will also make you more productive when you're working on your own for a week
- Will not turn you into a computer scientist
- Far too many of them around anyway
- Instead, goal is to teach you the equivalent of good laboratory technique for computational science
- The 20% of ideas that account for 80% of real world use
- Software carpentry, rather than software engineering
Meeting Standards
- Experimental results are only publishable if they are believed to be correct and reproducible
- Equipment calibrated, samples uncontaminated, relevant steps recorded
- In practice, almost always rely on the professionalism of the people doing the work
- How well do computational scientists meet these standards?
- Correctness of code rarely questioned
- We all know programs are buggy, but when was the last time you saw a paper rejected because of concerns over the quality of the software used to produce the results?
- Reproducibility often nonexistent
- How many people can reproduce, much less trace, each computational result in their thesis?
- The times they are a-changing
- Standards for computational work can only go up
- Change can happen almost overnight
- Like the American car market when German and Japanese imports appeared in the 1970s
The Most Important Idea in This Course
- Improving quality improves productivity
- I.e., the tools and techniques you must adopt to produce better code also help you write more code, faster
Who You Are
- User stories
- Important part of designing user interfaces for mass-distribution software
- Helps make discussion of features and usability more concrete
- Harry
- 27; B.Sc. in zoology
- Did an introductory Fortran course nine years ago, and attended a workshop on web-based bioinformatics tools when he started his job
- Now developing fuzzy pattern-matching algorithms for Genes'R'Us, a biotech firm with labs in four countries
- Ron
- 24; B.Eng in mechanical engineering, now doing an M.Sc. part time
- Did C in first year; has been using MATLAB ever since
- Modeling thermal degradation (a.k.a. “melting”) of firefighters' helmets
- Hermione
- 34; Ph.D. in physics
- Took two courses on C and two on numerical analysis as an undergrad, and a computer graphics course as a graduate student
- Now in charge of the 5-person flywheel braking group at Yoyodyne Inc.
- Ginny
- 22; finished a B.Eng. in chemical engineering last year, now doing an M.Sc. in biochemistry
- Did C in first year, and has built a personal web site (static HTML only)
- Thesis topic is improving the yield of organic fullerene production
- Albus
- 47; Ph.D. in mathematics; studies large random graphs using both analytic and computational techniques
- Undergraduate degree in Mathematics and Computer Science in the 1970s; programs mainly in C++
- Professor at Euphoric State University; former chair of graduate studies
A Quick Self-Test
- Adapted from [Spolsky 2004] and [McConnell 1997]
- 1 for “yes”, 0 for “no”, and -1 if you don't understand the question
- So:
- Do you use version control?
- Can you rebuild everything in one step?
- Do you have an automated test suite?
- Do you build the software, and run the test suite, daily?
- Do you have a bug database?
- Do you use a symbolic debugger?
- Do you use a style checker to ensure that your software is written in a uniform, readable way?
- Can you trace everything you release back to the software that produced it?
- Do you document as you program, and keep your documentation in your source files?
- Can you set up a development environment (including any libraries you need) on a fresh machine without heroic effort?
- Do you have a schedule with small binary milestones?
- Do you estimate how long tasks will take before you start, and compare that with how long they actually took?
- If you're working with others, or developing software that other people may use:
- Do you test your interfaces using paper prototypes before implementing them?
- Is there a searchable archive of discussions about the project?
- Do team members write and share small tools for automating common tasks?
- Does your schedule allow for infrastructure development, training, sick time, etc.?
- And your score is?
Learn by Building
- So why are we where we are?
- It's difficult to learn these things from academic computer scientists
- CS research is more concerned with rapid prototyping than with reliability
- People are naturally sceptical of innovation
- Particularly after they've seen a few bandwagons roll through
- Glass's Law [Glass 2002]: any new way of doing things initially slows you down
- You only have to be as good as the competition
- American auto makers in the 1970s
- This course's approach:
- Introduce some basic tools
- Students immediately see benefit of taking the course
- Tools can be used to manage the course itself
- Show students how to build tools like these
- Where “how” includes both what goes into the software, and how to create it
- Solidifies understanding of tools' capabilities and limitations
- Makes discussion of technique more concrete
- Show students what else they can do with their new skills
- The right way to tackle issues that come up over and over again
- Key point: avoid overload
- People who already know these things tend to underestimate how hard they are to learn
- No point preaching to the converted
- Try instead to move the middle of the bell curve to the right
- Why is the course called software “carpentry”?
- Because it focuses on the details of the craft
- If software engineering is about building an electronic version of the Channel Tunnel, this stuff is the equivalent of putting an extension on the house
Topics
- Shell programming
- Version control
- Automating builds
- Python (4 lectures)
- Systematic debugging
- Higher-level programming (2 lectures)
- Testing (2 lectures)
- Coding style and reading code
- Data crunching (4 lectures)
- Web programming and security (3 lectures)
- Software development project tools and processes (4 lectures)
Setting Up
- Some previous programming experience
- for loops, if/then/else
- Function calls
- Arrays
- File I/O
- Compilation
- Individual setup
- Python (version 2.4 or higher)
- Cygwin (on Windows)
- An editor
- We'll look at smart ones later in the course
- Subversion
- We'll spend a lecture on this next week
- Time
- Expect to spend 2-3 hours outside class for each lecture
Recommended Reading
- “If you only have time to read one book, make time to read two”
- [Glass 2002] summarizes what we actually know about programmers' productivity
- [Hunt & Thomas 1999] and [Gunderloy 2004] are about the things that distinguish good programmers from bad ones
- [Lutz & Ascher 2003] is the standard introduction to Python
- [Langtangen 2004] is a comprehensive introduction to Python aimed squarely at scientists and engineers
- Goes into much more detail than this course will
- But doesn't address broader issues, such as programming practices
- See the Bibliography for others
- Check out some of the Online Resources as well
Typographic Conventions
Version Control
Problem #1: Synchronizing Files
- Want to work on one set of files on three different machines
- Option 1: use a shared file system
- Difficult to set up
- And even more difficult to make secure
- Inflexible: what if you're on the road?
- Option 2: Carry around a floppy or USB dongle
- Have to remember to copy files onto it and then off again
- What if your project is too big?
- Option 3: mail, FTP, SCP, etc.
- Still have to remember to push and pull exactly the right files at exactly the right time
- Option 4: get the computer to do the work
- Keep a master copy in one place
- Use a program to synchronize working copies when and as needed
Problem #2: Undoing Changes
- Often want to undo changes to a file
- Start work, realize it's the wrong approach, want to get back to starting point
- Like “undo” in an editor…
- …but longer-lived—keep the whole history of every file, forever
- Similarly, often want to see who changed what, and when
- When working in teams, want to see what your partners did
- Bugs are more likely to be in fresh code than in code that's been running for a long time
Solution: Version Control
- Solve both problems at once by using a version control system (VCS)
- Mechanics:
- Keep the master copy of every file in a central repository
- Actually, keep all old versions of every file
- Everyone on the development team does their work in a working copy
- When you're ready to share your changes, you commit them to the repository
- The VCS saves the old version of the file, then writes your changes on top
- Also records the time of the change, who made it, and a comment
- We'll look in a moment at what happens if two or more people try to make changes at the same time
CVS and Subversion
- Two open source version control systems in widespread use
- Many others available commercially
- If you can afford it, use Perforce
- CVS (Concurrent Versions System)
- Invented in the 1980s
- Very popular, but showing its age
- Flaw #1: it keeps track of each file separately
- But authors often change several files in tandem
- Since CVS has no notion of a “batch submit”, there's no reliable way to say, “What other changes were made in conjunction with this one?”
- Flaw #2: you can create new directories, but can't ever delete old ones
- Subversion
- Designed as a replacement for CVS, with a deliberately similar set of commands, starting in 2000
- Fixes both of the major flaws in CVS
- Many open source projects have switched, or are switching…
- …so we'll use it in this course
- See the Subversion site for details
Basic Use
- Assume for a moment that a repository has been created, and that Ron and Hermione already have working copies
- Ron wants to make changes to the rotor spin simulation
- Runs svn update to bring his copy up to date with the repository
- Edits spin.c and spin.h
- Runs svn commit to save those changes in the repository
- Note: the whole repository gets a new version number
- Ron realizes that he forgot to make one change
- Runs svn update again, just in case Hermione has also been making changes (she hasn't)
- Edits spin.c again
- Runs svn commit a second time
- Several hours later, Hermione runs svn update on her working copy
- Subversion copies Ron's changes into her directory
![[Basic Use]](img/version/basic_use.png)
Figure 3.1: Basic Use
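The update, edit, commit cycle described above can be tried on a single machine. What follows is a minimal sketch, assuming Subversion is installed; the scratch directory, the `ron` and `hermione` working-copy names, and the file contents are made up for illustration, and a local `file://` repository stands in for a shared one.

```shell
# Scratch repository and two working copies standing in for Ron and Hermione
# (all paths and file contents here are assumptions made for this sketch).
scratch=$(mktemp -d)
svnadmin create "$scratch/repo"
url="file://$scratch/repo"
svn checkout -q "$url" "$scratch/ron"
svn checkout -q "$url" "$scratch/hermione"

# Ron: update, edit, then commit. A brand-new file must be added first.
(
  cd "$scratch/ron" &&
  svn update -q &&
  echo "/* rotor spin simulation */" > spin.c &&
  svn add -q spin.c &&
  svn commit -q -m "Add first version of the spin simulation"
)

# Hermione: update to copy Ron's change into her working copy.
(cd "$scratch/hermione" && svn update)
```

After the last command, `spin.c` should exist in Hermione's working copy even though she never created it herself.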
How To Do It
- One way to use Subversion is to type commands in a shell
- Guaranteed to work everywhere without anything else being installed
- But there are several good graphical interfaces for Subversion too
- RapidSVN runs on Windows, Linux, and Mac
- Well, maybe “walks” is a better description: as of Version 0.9, it's not the fastest thing in the world
![[RapidSVN]](img/version/rapidsvn.png)
Figure 3.2: RapidSVN
- TortoiseSVN is a Windows shell extension
- Which means that it integrates with the Windows file browser, rather than running separately
- But that also means that it doesn't work on Linux or Mac
![[TortoiseSVN]](img/version/tortoisesvn.png)
Figure 3.3: TortoiseSVN
- And if you're on a Macintosh, there's SmartSVN
Working Together
- What if two (or more) people want to edit the same file at the same time?
- Option 1: prevent it
- Only allow one person to have a writeable copy of the file at once
- Pessimistic concurrency
- Microsoft Visual SourceSafe
- Option 2: patch up afterwards
![[Merging Conflicts]](img/version/conflict_merge.png)
Figure 3.4: Merging Conflicts
- Ron and Hermione both have copies of spin.c version 15
    int maxRotateSetting(int * available, int length)
    {
        int i, maxFound = 0;
        if (length == 0) {
            return 0;
        }
        for (i=0; i<length; i+=1) {
            if (available[i] > maxFound) {
                maxFound = available[i];
            }
        }
        return maxFound;
    }
- Ron commits his changes
- Creates spin.c version 16
    // Find maximum rotation setting from those available,
    // or 0 if none are available.
    int maxRotateSetting(int * available, int length)
    {
        int i, maxFound = 0;
        if (length == 0) {
            return 0;
        }
        for (i=0; i<length; i+=1) {
            if (available[i] > maxFound) {
                maxFound = available[i];
            }
        }
        return maxFound;
    }
- Meanwhile, Hermione is editing her copy, and produces this:
    // Find maximum rotation setting, or 0.
    int maxRotateSetting(int * available, int length)
    {
        int i, maxFound = 0;
        for (i=0; i<length; i+=1) {
            if (available[i] > maxFound) {
                maxFound = available[i];
            }
        }
        return maxFound;
    }
- Tries to submit her changes: conflict!
- Subversion puts both Ron's and Hermione's changes in Hermione's working copy, with markers
    <<<<<<< .mine
    // Find maximum rotation setting, or 0.
    =======
    // Find maximum rotation setting from those available,
    // or 0 if none are available.
    >>>>>>> .r16
    int maxRotateSetting(int * available, int length)
    {
        int i, maxFound = 0;
        for (i=0; i<length; i+=1) {
            if (available[i] > maxFound) {
                maxFound = available[i];
            }
        }
        return maxFound;
    }
- Also creates spin.c.mine, spin.c.r15, and spin.c.r16 for reference
- Hermione must decide what to keep, and what to throw away
- Subversion won't let her commit her changes until she has removed the conflict markers and told it the conflict is resolved
- Once Hermione is done editing, she:
- Runs svn resolved spin.c to tell Subversion the conflict has been fixed
- Runs svn commit spin.c to create spin.c version 17
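The conflict-and-resolve sequence above can be reproduced in a scratch repository. This is only a sketch under stated assumptions: Subversion is installed, the directory names and the one-line file contents are invented, and `--accept postpone` is used so that `svn update` leaves the conflict markers in place instead of prompting.

```shell
# Scratch repository plus a working copy for Ron (paths are assumptions).
scratch=$(mktemp -d)
svnadmin create "$scratch/repo"
url="file://$scratch/repo"
svn checkout -q "$url" "$scratch/ron"

# Revision 1: a shared starting point; Hermione checks out after it exists.
(cd "$scratch/ron" && echo "int maxRotateSetting();" > spin.c &&
 svn add -q spin.c && svn commit -q -m "Add spin.c")
svn checkout -q "$url" "$scratch/hermione"

# Ron commits a comment on line 1; Hermione writes a different one locally.
(cd "$scratch/ron" &&
 printf '// Comment from Ron\nint maxRotateSetting();\n' > spin.c &&
 svn commit -q -m "Document maxRotateSetting")
(cd "$scratch/hermione" &&
 printf '// Comment from Hermione\nint maxRotateSetting();\n' > spin.c)

# Hermione updates: Subversion marks the conflict rather than guessing.
(cd "$scratch/hermione" && svn update --accept postpone && cat spin.c)

# She replaces the marked-up text with what she wants, then says she is done.
(cd "$scratch/hermione" &&
 printf '// Agreed comment\nint maxRotateSetting();\n' > spin.c &&
 svn resolved spin.c &&
 svn commit -q -m "Merge comments on maxRotateSetting")
```

Recent Subversion releases describe `svn resolved` as deprecated in favour of `svn resolve`, but it is the command the text uses, so the sketch keeps it.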
What Versions Actually Mean
- The discussion above referred to “version 16 of spin.c”, but in fact there is no such thing
- Instead, there's version 16 (or 17, or 18…) of the repository
- Users are supposed to try to keep the files in the repository in a consistent state
- I.e., don't submit things that are half-done
- Since the next person to do an update would then be in the same half-done state you are
- Subversion therefore updates the version number on the whole repository every time a set of changes is submitted
- Each change set can affect any number of files (including adding or deleting files)
- The phrase “version 229” therefore uniquely identifies an entire set of files
- Unlike CVS and other systems, where version 319 of one file might correspond to version 107 of another, and version 794 of a third
Warning: Binary Files
- Subversion can only mark conflicts this way in text files
- I.e., files that store lines of human-readable characters
- Source code, HTML—basically, anything you can edit with Notepad, Vi, or Emacs
- Images, video clips, Microsoft Word, and many other formats aren't
- When there's a conflict, Subversion saves your copy and the master copy side by side in your working directory
- Up to you to resolve the differences
Rolling Back Changes
- Suppose Ron decides that he doesn't like his recent changes
- svn diff will show him which files he has changed, and what those changes are
- He hasn't committed anything yet, so he can use svn revert to re-synchronize with the master copy
- If you find yourself doing this repeatedly, you should probably go and do something else for a while…
- Now suppose that Ron decides he doesn't like the changes Hermione just made to spin.c
- Wants to do the equivalent of “undo” on several files
- svn log shows recent history
- He decides he wants to revert to version 16 of the repository
- svn merge -r 17:16 spin.c means “merge changes, going from version 17 to version 16” (i.e., backwards)
- Can obviously go back to even earlier versions to undo more changes
![[Undoing Changes]](img/version/merge_undo.png)
Figure 3.5: Undoing Changes
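Both kinds of undo can be tried in a scratch repository. The sketch below assumes Subversion is installed; the paths and file contents are invented, and because the repository is new the revisions being undone are 2 and 1 rather than 17 and 16.

```shell
# Scratch repository and working copy (paths are assumptions for this sketch).
scratch=$(mktemp -d)
svnadmin create "$scratch/repo"
svn checkout -q "file://$scratch/repo" "$scratch/wc"

(
  cd "$scratch/wc" || exit 1
  echo "version one" > spin.c
  svn add -q spin.c
  svn commit -q -m "Revision 1: first version"

  # An uncommitted edit: inspect it with diff, then throw it away.
  echo "scratch work" >> spin.c
  svn diff spin.c
  svn revert spin.c

  # A committed edit, which becomes revision 2.
  echo "unwanted change" >> spin.c
  svn commit -q -m "Revision 2: change to be undone"

  # Merge backwards from revision 2 to revision 1, then commit the result.
  svn update -q
  svn merge -r 2:1 spin.c
  svn commit -q -m "Undo revision 2"
)
```

Note that the undo is itself a new revision: nothing is erased from the history, so revision 2 can still be recovered later.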
And Finally, Getting Started
- To create a repository:
- Decide where to put it (e.g., /rotor/repo)
- Go into the containing directory and create it:
    cd /rotor
    svnadmin create repo
- Can then interact with repository in two ways
- Directly through the file system: file:///rotor/repo
- Use this if you're working on the same machine the repository is on
- Through a web server: https://your.host.name/rotor/repo
- Use this if the repository is on a remote machine
- Note: requires your system administrator to configure the web server properly
- https (instead of http) means “use a secure connection”
- To get a working copy (assuming you're using a web server):
    svn checkout https://your.host.name/rotor/repo
- Creates a new directory repo
- Common to give it a more informative name using svn checkout https://your.host.name/rotor/repo rotorproject
- Important: only use svn checkout once, to initialize your working copy
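To try these steps without a web server or write access to /rotor, the same commands can be pointed at a throwaway directory. This is a sketch that assumes Subversion is installed; the scratch location is an assumption, while the names `repo` and `rotorproject` follow the text.

```shell
# Work in a throwaway directory so nothing real is touched.
scratch=$(mktemp -d)

# Create an empty repository called "repo" inside the scratch directory.
svnadmin create "$scratch/repo"

# Check it out once, giving the working copy a more informative name.
# "file://" followed by an absolute path yields a file:///... URL.
svn checkout "file://$scratch/repo" "$scratch/rotorproject"

# The working copy now contains Subversion's administrative directory.
ls -a "$scratch/rotorproject"
```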
Subversion Command Reference
| Name | Purpose | Example |
|---|---|---|
| svn add | Add files and/or directories to version control. | svn add newfile.c newdir |
| svn checkout | Get a fresh working copy of a repository. | svn checkout https://your.host.name/rotor/repo rotorproject |
| svn commit | Send changes from working copy to repository (inverse of update). | svn commit -m "Comment on the changes" |
| svn delete | Delete files and/or directories from version control. | svn delete oldfile.c |
| svn help | Get help (in general, or for a particular command). | svn help update |
| svn log | Show history of recent changes. | svn log --verbose *.c |
| svn merge | Merge two different versions of a file into one. | svn merge -r 18:16 spin.c |
| svn mkdir | Create a new directory and put it under version control. | svn mkdir newmodule |
| svn rename | Rename a file or directory, keeping track of history. | svn rename temp.txt release_notes.txt |
| svn revert | Undo changes to working copy (i.e., resynchronize with repository). | svn revert spin.h |
| svn status | Show the status of files and directories in the working copy. | svn status |
| svn update | Bring changes from repository into working copy (inverse of commit). | svn update |
Table 3.1: Common Subversion Commands
How to Read Subversion Output
- svn status compares your working copy with the repository, printing one line for each file that's worth talking about
    $ svn status
    M spin.c
    MC readme.txt
- spin.c has been modified
- readme.txt has been modified, and has conflicts
- svn update prints one line for each file or directory it does something to
    $ svn update
    A newspin.c
    U spin.c
    C spin.h
- newspin.c has been added
- spin.c has been updated (i.e., someone else modified it)
- There's a conflict in spin.h
- Which you'll have to resolve before you can commit your changes
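Because this output is one line per file, it is easy to filter with standard tools. The sketch below works on the sample status lines shown above rather than on a live working copy, and it treats the first whitespace-separated field as the status letters; that is a simplification, since real output uses fixed columns and file names may contain spaces.

```shell
# Sample output copied from the text; a real script would read from
# `svn status` instead of this here-document.
status_output=$(cat <<'EOF'
M spin.c
MC readme.txt
EOF
)

# Print the name of every file whose status letters include C (conflict).
conflicted=$(printf '%s\n' "$status_output" | awk '$1 ~ /C/ { print $2 }')
echo "Files needing attention: $conflicted"
```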
Branching and Merging
- Sometimes want to work on several different versions of software at once
- Example: need to do bug fixes on Version 3 while making incompatible changes toward Version 4
- Or want two sets of developers to be able to write and test large changes independently, then put things back together
- All modern version control systems allow you to branch a repository
- Create a “parallel universe” which is initially the same as the original, but which evolves independently
![[Branching and Merging]](img/version/branch_and_merge.png)
Figure 3.6: Branching and Merging
- Much better than just copying all the source files: the version control system remembers where the branch came from, and can trace its history back
- Can later merge changes from one branch to another
- Example: fix a bug on one branch, merge the changes into other branches that have the same bug
- Again, much better than copying by hand, since the version control system can keep track of where things came from, and where they went
- Warning: many people become over-excited about branching when they first start to use it
- Keeping track of what's going on where can be a considerable management overhead
- On a small project, very rare to need more than two active branches
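The text does not say how a branch is made in Subversion. One common approach, shown here as a sketch and not as the course's prescribed method, is to copy a directory inside the repository with `svn copy`. The `trunk` and `branches` layout and the branch name `v3-fixes` are conventions and names assumed for this example, and Subversion is assumed to be installed.

```shell
# Scratch repository (path is an assumption for this sketch).
scratch=$(mktemp -d)
svnadmin create "$scratch/repo"
url="file://$scratch/repo"

# Conventional (but optional) layout: main line in trunk, branches alongside.
svn mkdir -q -m "Create layout" "$url/trunk" "$url/branches"

# Branch: a copy inside the repository that remembers where it came from.
svn copy -q -m "Branch for Version 3 bug fixes" "$url/trunk" "$url/branches/v3-fixes"

# The verbose log of the branch shows that it was copied from trunk.
svn log --verbose "$url/branches/v3-fixes"
```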
Exercises
Exercise 3.1:
Follow the instructions given to you by your instructor to
check out a copy of the Subversion repository you'll be using in
this course. Unless otherwise noted, the exercises below
assume that you have done this, and that your working copy is in
a directory called course. You will submit all of your
exercises in this course by checking files into your
repository.
Exercise 3.2:
Create a file course/ex01/bio.txt (where
course is the root of your working copy of your
Subversion repository), and write a short biography of yourself
(100 words or so) of the kind used in academic journals,
conference proceedings, etc. Commit this file to your
repository. Remember to provide a meaningful comment when
committing the file!
Exercise 3.3:
What's the difference between mv and svn
mv? Put the answer in a file called
course/ex01/mv.txt and commit your changes.
Once you have committed your changes, type svn
log in your course directory. If you didn't know
what you'd just done, would you be able to figure it out from
the log messages? If not, why not?
Exercise 3.4:
In this exercise, you'll simulate the actions of two
people editing a single file. To do that, you'll need to check
out a second copy of your repository. One way to do this is to
use a separate computer (e.g., your laptop, your home computer,
or a machine in the lab). Another is to make a temporary
directory, and check out a second copy of your repository there.
Please make sure that the second copy isn't inside the first, or
vice versa—Subversion will become very confused.
Let's call the two working copies Blue and Green. Do the
following:
a) Create Blue/ex01/planets.txt, and add the
following lines:
Mercury
Venus
Earth
Mars
Jupiter
Saturn
Commit the file.
b) Update the Green working copy. (You should get a copy of
planets.txt.)
c) Change Blue/ex01/planets.txt so that it reads:
1. Mercury
2. Venus
3. Earth
4. Mars
5. Jupiter
6. Saturn
Commit the changes.
d) Edit Green/ex01/planets.txt so that its contents
are as shown below. Do not do svn update
before editing this file, as that will spoil the
exercise.
Mercury 0
Venus 0
Earth 1
Mars 2
Jupiter 16 (and counting)
Saturn 14 (and counting)
e) Now, in Green, do svn update. Subversion
should tell you that there are conflicts in planets.txt.
Resolve the conflicts so that the file contains:
1. Mercury 0
2. Venus 0
3. Earth 1
4. Mars 2
5. Jupiter 16
6. Saturn 14
Commit the changes.
f) Update the Blue working copy, and check that
planets.txt now has the same content as it has in the
Green working copy.
Exercise 3.5:
Add another line or two to course/ex01/bio.txt and
commit those changes. Then, use svn merge to restore
the original contents of your biography
(course/ex01/bio.txt), and commit the result. When you
are done, bio.txt should look the way it did at the end
of Exercise 3.2. Note: the purpose
of this exercise is to teach you how to go back in time to get
old versions of files. While it would be simpler in this
case just to edit bio.txt, you can't (reliably) do that
when you've made larger changes, to multiple files, over a
longer period of time.
Shell Basics
Introduction
- Most modern tools have a graphical user interface (GUI)
- Because they're easier to use
- But command-line user interfaces (CLUIs) still have their place
- Easier (faster) to build new CLUI tools
- Building a GUI takes time
- Building a good GUI takes a lot of time
- Higher action-to-keystroke ratio
- Once you're over the (steeper) learning curve
- Easier to see and understand what the computer is doing on your behalf
- Which is part of what this course is about
- Most important: it's easier to combine CLUI tools than GUI tools
- Small tools, combined in many ways, can be very powerful
- [Ray & Ray 2003] is a good introduction for newcomers
- How to tell if you can skip this lecture
- Do you know what a shell is?
- Do you know the difference between an absolute path and a relative path?
- Do you know what a process is?
- Do you know what a pipe is?
- Do you know what $PATH is?
- Do you know what rwxr-xr-x means?
The Shell vs. the Operating System
- The most important command-line tool is the command shell (often just called “the shell”)
- Manages a user's interactions with the operating system by:
- Reading commands from the keyboard
- Figuring out what programs the user wants to run
- Running those programs
- Displaying their output on the screen
- Looks (and works) like an interactive terminal circa 1980
![[A Shell in Action]](img/shell01/shell_screenshot.png)
Figure 4.1: A Shell in Action
- The shell is just one program among many
- Many different ones have been written
- sh was the first for Unix
- Most others extend its capabilities in various ways
- Which means that it's the lowest common denominator you can always rely on
- We'll use bash (the Bourne again shell) in this course
- Available just about everywhere
- Even on Windows (thanks to Cygwin)
- In contrast, the operating system is not just another program
- Automatically loaded when the computer boots up
- The only program that can talk directly to the computer's hardware
- I.e., read characters from the keyboard, or send drawing commands to the screen
- Manages files and directories on the disk
- Keeps track of who you are, and what you're allowed to do
- You can run many instances of the shell on a computer at once, but it can only run one operating system at a time
![[Operating System and Shell]](img/shell01/os_shell.png)
Figure 4.2: Operating System and Shell
The File System
- The file system is the set of files and directories the computer can access
- “Everything that stays put when you turn the computer off and restart it”
- Data is stored in files
- By convention, files have two-part names, like notes.txt or home.html
- Most operating systems allow you to associate a filename extension with an application
- E.g., .txt is associated with an editor, and .html with a web browser
- But this is all just convention: you can call files (almost) anything you want
- Files are stored in directories (often called folders)
- Directories can contain other directories, too
- Results in the familiar directory tree
![[A Directory Tree]](img/shell01/directory_tree.png)
Figure 4.3: A Directory Tree
- Everything in a particular directory must have a unique name
- Otherwise, how would you identify it?
- But items in different directories can have the same name
- On Unix, the file system has a unique root directory called /
- Every other directory is a child of it, or a child of a child, etc.
- On Windows, every drive has its own root directory
- So C:\home\gvwilson\notes.txt is different from J:\home\gvwilson\notes.txt
- When you're using Cygwin, you can also write C:\home\gvwilson as c:/home/gvwilson
- Or as /cygdrive/c/home/gvwilson
- Some Unix programs give ":" a special meaning, so Cygwin needed a way to write paths without it…
- A path is a description of how to find something in a file system
- An absolute path describes a location from the root directory down
- Equivalent to a street address
- Always starts with "/"
- E.g., /home/gvwilson is my home directory, and /courses/swc/lec/shell.swc is this file
- A relative path describes how to find something from some other location
- Equivalent to saying, “Four blocks north, and seven east”
- E.g., from /courses/swc, the relative path to this file is lec/shell.swc
- Every program (including the shell) has a current working directory
- “Where am I?”
- Relative paths are deciphered relative to this location
- It can change while a program is running
- Finally, two special names:
- "." means “the current directory”
- ".." means “the directory immediately above this one”
- Also called the parent directory
- In /courses/swc/data, .. is /courses/swc
- In /courses/swc/data/elements, .. is /courses/swc/data
![[Parent Directories]](img/shell01/parent_directory.png)
Figure 4.4: Parent Directories
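Absolute paths, relative paths, and the two special names can be explored safely in a scratch tree. In this sketch the directory names echo the examples above, but the tree is built under a temporary directory (an assumption, so that nothing real is touched), and the variable names are illustrative.

```shell
# Build a small tree in a scratch directory.
root=$(mktemp -d)
mkdir -p "$root/courses/swc/data/elements"

# Make "data" the current working directory.
cd "$root/courses/swc/data" || exit 1

here=$(pwd -P)                     # absolute path of the current directory
parent=$(cd .. && pwd -P)          # ".." is the parent directory
child=$(cd ./elements && pwd -P)   # "." is the current directory

echo "here:   $here"
echo "parent: $parent"
echo "child:  $child"
```

The paths `..` and `./elements` are relative, so they only make sense once the current working directory is known; the values printed are absolute and always begin with `/`.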
A Few Simple Commands
- Easiest way to learn basic Unix commands is to see them in action
- First, I type pwd (short for “print working directory”) to find out where I am
- Unfortunately, most Unix commands have equally cryptic names
- I then type ls (for “listing”) to see what's in the current directory
    ls
    LICENSE.txt admin data graphics lec pdf scraps util
    Makefile cgi-bin etc img mp3 publ src web
- What actually happens when I type ls is:
- The operating system reads characters from the keyboard
- Passes them to the shell (because it's the currently active window on my desktop)
- The shell breaks the line of text it receives into words
- Looks for a program with the same name as the first word (i.e., the command to run)
- Describe in a moment how the shell knows where to look
- Runs that program
- Reads the program's output and sends it back to the operating system for display
![[Running a Program]](img/shell01/shell_running_program.png)
Figure 4.5: Running a Program
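Two of the steps above, splitting the line into words and finding the program named by the first word, can be imitated from inside the shell itself. This is a rough sketch: the variable names are illustrative, the sample line is invented, and real shells also deal with quoting and other syntax that simple whitespace splitting ignores.

```shell
line="ls -F /tmp"

# Break the line into words, roughly as the shell does before running anything.
# (Unquoted $line is split on whitespace.)
set -- $line
command_name=$1
shift
arg_count=$#

# Ask the shell which program it would run for that first word.
program=$(command -v "$command_name")

echo "command:   $command_name"
echo "program:   $program"
echo "arguments: $*"
```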
- I can tell ls to produce more informative output by giving it some flags
- By convention, flags start with "-", as in "-c" or "-l"
- Show directories with trailing slash
    ls -F
    LICENSE.txt admin/ data/ exer/ lec/ publ/ soln/ util/
    Makefile cgi-bin/ etc/ img/ pdf/ scraps/ src/ web/
- Show all files and directories, including those whose names begin with .
    ls -a
    . .. .svn admin data
    exer img lec publ soln
    src util tmpl README.txt license.txt
    todo.txt
- By default, ls doesn't show anything whose name begins with .
- Note the .svn directory: this is where Subversion keeps administrative information
- Do not edit anything in this directory yourself, or Subversion will become very confused
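The effect of these flags is easiest to see in a directory whose contents are known in advance. The sketch below builds one under a temporary directory; the file and directory names are invented for illustration.

```shell
# A scratch directory with one ordinary file, one ordinary directory,
# and one hidden file and directory (names begin with ".").
demo=$(mktemp -d)
mkdir "$demo/data" "$demo/.settings"
touch "$demo/Makefile" "$demo/.notes"

ls "$demo"       # names only; nothing beginning with "."
ls -F "$demo"    # directories get a trailing slash
ls -a "$demo"    # also shows ".", "..", and the other dot-names
```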
Creating Files and Directories
- Rather than messing with the course files, let's create a temporary directory and play around in there
- Note: no output
- The
-v (“verbose”) flag tells mkdir to print a confirmation message
- Now go into that directory
- Changes the shell's notion of our current working directory
pwd
/home/gvwilson/swc/temp
- No files there yet:
- Use the editor of your choice to create a file called
earth.txt with the following contents:
Name: Earth
Period: 365.26 days
Inclination: 0.00
Eccentricity: 0.02
- Notepad (on Windows) runs in a window of its own
- Pico (on Unix) takes over the shell window temporarily
- We'll look at more advanced editing tools for programming in a few lectures
- Easiest way to create a similar file
venus.txt is to copy the one we have
ls -t
venus.txt earth.txt
- Note: the
-t option tells ls to list newest first
- Check the contents of the file using
cat (short for “concatenate”)
- Just prints the contents of a file to the screen
cat venus.txt
Name: Earth
Period: 365.26 days
Inclination: 0.00
Eccentricity: 0.02
- Edit the file so that it looks like this:
Name: Venus
Period: 224.70 days
Inclination: 3.39
Eccentricity: 0.01
- Compare the sizes of the two files using
wc (for “word count”)
wc earth.txt venus.txt
4 9 69 earth.txt
4 9 69 venus.txt
8 18 138 total
- Columns show lines, words, and characters
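To make the three columns concrete, here is a small sketch in Python (written for Python 3) that computes the same counts for the contents of earth.txt. The function name `wc` is just borrowed for illustration, and the rules used (count newlines, split on whitespace, count every character) are an assumption about how the real tool behaves on plain ASCII text.

```python
def wc(text):
    """Return (lines, words, characters) for a piece of text, like wc."""
    lines = text.count("\n")       # one line per newline character
    words = len(text.split())      # words are separated by whitespace
    chars = len(text)              # every character, including newlines
    return lines, words, chars

earth = (
    "Name: Earth\n"
    "Period: 365.26 days\n"
    "Inclination: 0.00\n"
    "Eccentricity: 0.02\n"
)
print(wc(earth))    # (4, 9, 69)
```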
Wildcards
- Some characters (called wildcards) mean special things to the shell
* matches zero or more characters
- So
ls *.f77 lists all the Fortran-77 files in a directory
wc *.txt
4 9 69 earth.txt
4 9 69 venus.txt
8 18 138 total
? matches any single character
- So
ls ??.txt lists all the text files with two-letter prefixes
- And
ls ??.* lists all the files with two-letter prefixes, and any extension
~ on its own means “my home directory”
- I.e., the one I'm in when I first log in
~harry means “Harry's home directory”
- Note: the shell expands wildcards before running commands
- There's no way for
ls to know whether it was invoked as ls *.txt or ls earth.txt venus.txt
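Wildcard matching can be tried out without touching any files. The sketch below uses Python's standard `fnmatch` module, whose `*` and `?` behave much like the shell's (one difference: it does not treat names beginning with `.` specially). The list of file names and the helper `expand` are invented for the example, which is written for Python 3.

```python
import fnmatch

# Hypothetical directory contents, used only for this illustration.
names = ["ab.txt", "earth.txt", "venus.txt", "notes.f77", "xy.dat"]

def expand(pattern, candidates):
    """Return the names that a shell-style pattern matches."""
    return [n for n in candidates if fnmatch.fnmatchcase(n, pattern)]

print(expand("*.txt", names))    # ['ab.txt', 'earth.txt', 'venus.txt']
print(expand("??.txt", names))   # ['ab.txt']
print(expand("??.*", names))     # ['ab.txt', 'xy.dat']
```

The program that finally runs only ever sees the expanded list, never the pattern itself.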
Exercises
Exercise 4.1:
Suppose you are in your home directory, and ls shows
you this:
Makefile biography.txt data
enrolment.txt programs thesis
What argument(s) do you have to give to ls to get it
to put a trailing slash after the names of subdirectories, like
this:
Makefile biography.txt data/
enrolment.txt programs/ thesis/
If you run ls data, it shows:
earth.txt jupiter.txt mars.txt
mercury.txt saturn.txt venus.txt
What command should you run to get the following output:
data/earth.txt data/jupiter.txt data/mars.txt
data/mercury.txt data/saturn.txt data/venus.txt
What if you want this (note that an extra entry is being
displayed):
total 7
drwxr-xr-x 7 someone 0 May 6 08:27 .svn
-rw-r--r-- 1 someone 2396 May 6 08:38 earth.txt
-rw-r--r-- 1 someone 1263 May 6 08:38 jupiter.txt
-rw-r--r-- 1 someone 1015 May 6 08:43 mars.txt
-rw-r--r-- 1 someone 946 May 6 08:41 mercury.txt
-rw-r--r-- 1 someone 1714 May 6 08:40 saturn.txt
-rw-r--r-- 1 someone 881 May 6 08:40 venus.txt
Note: the command will display your user ID, rather than
someone. On some machines, the command will also display a
group ID. Ignore these differences for the purpose of this
question.
Exercise 4.2:
According to the listing of the data directory above, who
can read the file mercury.txt? Who can write it (i.e., change
its contents or delete it)? When was mercury.txt last
changed? What command would you run to allow everyone to edit or
delete the file?
Exercise 4.3:
Suppose you want to remove all files whose names (not including
their extensions) are of length 3, start with the letter a, and
have .txt as extension. What command would you use? For
example, if the directory contains three files a.txt,
abc.txt, and abcd.txt, the command should remove
abc.txt, but not the other two files.
Exercise 4.4:
What does the command cd ~ do? What about cd
~gvwilson?
Exercise 4.5:
What's the difference between the commands cd HOME
and cd $HOME?
Exercise 4.6:
Suppose you want to list the names of all the text files in the
data directory that contain the word "carpentry". What
command or commands could you use?
Exercise 4.7:
Suppose you have written a program called analyze. What
command or commands could you use to display the first ten lines of
its output? What would you use to display lines 50-100? To send
lines 50-100 to a file called tmp.txt?
Exercise 4.8:
The command ls data > tmp.txt writes a listing of
the data directory's contents into tmp.txt. Anything
that was in the file before the command was run is overwritten. What
command could you use to append the listing to tmp.txt
instead?
Exercise 4.9:
What command(s) would you use to find out how many
subdirectories there are in the lectures directory?
Exercise 4.10:
What does rm *.ch do? What about rm
*.[ch]?
Exercise 4.11:
What command(s) could you use to find out how many instances of
a program are running on your computer at once? For example, if you
are on Windows, what would you do to find out how many instances of
svchost.exe are running? On Unix, what would you do to
find out how many instances of bash are running?
Exercise 4.12:
What do the commands pushd, popd,
and dirs do? Where do their names come from?
Exercise 4.13:
How would you send the file earth.txt to the
default printer? How would you check it made it (other than
wandering over to the printer and standing there)?
Exercise 4.14:
A colleague asks for your data files. How would you
archive them to send as one file? How could you compress them?
Exercise 4.15:
The instructor wants you to use a hitherto unknown command
for manipulating files. How would you get help on this command?
Exercise 4.16:
You have changed a text file on your home PC, and mailed
it to the university terminal. What steps can you take to see
what changes you may have made, compared with a master copy in
your home directory?
Exercise 4.17:
How would you change your password?
Exercise 4.18:
grep is one of the more useful tools in the
toolbox. It finds lines in files that match a pattern and
prints them out. For example, assume I have files
earth.txt and venus.txt containing lines like
this:
Name: Earth
Period: 365.26 days
Inclination: 0.00
Eccentricity: 0.02
If I type grep Period *.txt in that
directory, I get:
earth.txt:Period: 365.26 days
venus.txt:Period: 224.70 days
Search strings can use regular expressions, which will be
discussed in a later lecture.
grep takes many options as well; for example,
grep -c /bin/bash /etc/passwd reports how many lines
in /etc/passwd (the Unix password file) contain the
string /bin/bash, which in turn tells me how many users
are using bash as their shell.
Suppose all you wanted was a list of the files that
contained lines matching a pattern, rather than the matches
themselves—what flag or flags would you give to
grep? What if you wanted the line numbers of
matching lines?
Exercise 4.19:
diff finds and displays the differences between
two files. It works best if both files are plain text (i.e.,
not images or Excel spreadsheets). By default, it shows the
differences in groups, like this:
3c3,4
< Inclination: 0.00
---
> Inclination: 0.00 degrees
> Satellites: 1
(The rather cryptic header "3c3,4" means that line 3
of the first file must be changed to get lines 3-4 of the
second.)
What flag(s) should you give diff to tell it to
ignore changes that just insert or delete blank lines? What if
you want to ignore changes in case (i.e., treat lowercase and
uppercase letters as the same)?
Exercise 4.20:
Suppose you wanted ls to sort its output by
filename extension, i.e., to list all .cmd files before
all .exe files, and all .exe's before all
.txt files. What command or commands would you
use?
More Shell
Redirecting Input and Output
- A running program is called a process
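- By default, every process has three connections to the outside world:
- Standard input (stdin): where it reads data from, normally the keyboard
- Standard output (stdout): where it writes normal output, normally the screen
- Standard error (stderr): where it writes error messages, also normally the screen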
- You can tell the shell to connect standard input and standard output to files instead
command < inputFile reads from inputFile instead of from the keyboard
- Don't need to use this very often, because most Unix commands let you specify the input file (or files) as command-line arguments
command > outputFile writes to outputFile instead of to the screen
- Only “normal” output goes to the file, not error messages
command < inputFile > outputFile does both
![[Redirecting Standard Input and Output]](img/shell02/redirecting_stdio.png)
Figure 5.2: Redirecting Standard Input and Output
- Example: save the line, word, and character counts of all text files to
words.len
$ wc *.txt > words.len
- Nothing appears, because output is being sent to the file
words.len
$ ls -t
words.len venus.txt earth.txt
$ cat words.len
4 9 69 earth.txt
4 9 69 venus.txt
8 18 138 total
- Try typing
cat > junk.txt
- No input file specified, so
cat reads from the keyboard
- Output sent to a file
- Voila: the world's dumbest text editor
- When you're done, use
rm junk.txt to get rid of the file
- Don't type
rm * unless you're really, really sure that's what you want to do…
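The point that only normal output is redirected, while error messages still reach the screen, can be seen from inside a program too. The following sketch (Python 3; the function `report` and its data are made up for the example) writes results to standard output and a warning to standard error, then captures standard output the way `command > outputFile` would.

```python
import contextlib
import io
import sys

def report(values):
    """Write results to standard output and problems to standard error."""
    for v in values:
        if v < 0:
            print("warning: negative value", v, file=sys.stderr)
        else:
            print(v)

# Capture standard output only; standard error is left alone.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    report([1, -2, 3])

print(repr(captured.getvalue()))   # '1\n3\n'
```

The warning about -2 is not in the captured text, because it travelled down a different connection.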
Pipes
- Suppose you want to use the output of one program as the input of another
- E.g., use
wc -w *.* to count the words in some files, then sort -n to sort numerically
- Option 1: send output of first command to a temporary file, then read from that file
wc *.txt > temp
sort -n < temp
- Option 2: use a pipe to connect the two programs
- Written as
"|"
- Tells the operating system to send what the first program writes to its stdout to the second program's stdin
wc -w *.* | sort -n
9 earth.txt
9 venus.txt
12 words.len
30 total
![[Pipes]](img/shell02/pipes.png)
Figure 5.3: Pipes |
- More convenient (and much less error prone) than temporary files
- Can chain any number of commands together
- And combine with input and output redirection
wc *.txt | sort -n | head -5 > shortest.files
- Any program that reads from standard input and writes to standard output can use redirection and pipes
- Programs that do this are often called filters
- If you make your programs work like filters, you can easily combine them with others
- A combinatorial explosion of goodness
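As a rough illustration of what a filter looks like from the inside, here is a minimal sketch in Python 3. The function `number_lines` is invented for this example; it reads lines from one stream and writes them, numbered, to another, and never needs to know whether those streams are the keyboard, a file, or a pipe.

```python
import io

def number_lines(source, sink):
    """A tiny filter: copy each line from source to sink with a line number."""
    for number, line in enumerate(source, start=1):
        sink.write("%d: %s" % (number, line))

# In-memory streams stand in for standard input and output here; a real
# filter would import sys and call number_lines(sys.stdin, sys.stdout).
demo_in = io.StringIO("earth.txt\nvenus.txt\n")
demo_out = io.StringIO()
number_lines(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Saved in a file and wired to `sys.stdin` and `sys.stdout`, a program like this could sit anywhere in a pipeline.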
Environment Variables
- Like any other program, the shell has variables
- Since they define a user's environment, they are usually called environment variables
- Type
set at the command prompt to get a listing:
$ set
ANT_HOME=C:/apache-ant-1.6.2
BASH=/usr/bin/bash
COLUMNS=80
COMPUTERNAME=ISHAD
HISTFILESIZE=500
- Get a particular variable's value by putting a
"$" in front of its name
- E.g., the shell replaces
"$HOME" with the current user's home directory
- Often use the
echo command to print this out
$ echo $HOME
/home/gvwilson
- Question: why must you type
echo $HOME, and not just $HOME?
- To set or reset a variable's value temporarily, use this:
- Only affects the current shell (and programs run from it)
- To set a variable's value automatically when you log in, set it in
~/.bashrc
- Remember,
"~" is a shortcut meaning “your home directory”
- For me, right now,
~/.bashrc is /home/gvwilson/.bashrc
- Important environment variables
| Name | Typical Value | Notes |
|---|---|---|
| HOME | /home/gvwilson | The current user's home directory |
| HOMEDRIVE | C: | The current user's home drive (Windows only) |
| HOSTNAME | "ishad" | This computer's name |
| HOSTTYPE | "i686" | What kind of computer this is |
| OS | "Windows_NT" | What operating system is running |
| PATH | "/home/gvwilson/bin:/usr/local/bin:/usr/bin:/bin:/Python24/" | Where to look for programs |
| PWD | /home/gvwilson/swc/lec | Present working directory (sometimes CWD, for current working directory) |
| SHELL | /bin/bash | What shell is being run |
| TEMP | /tmp | Where to store temporary files |
| USER | "gvwilson" | The current user's ID |
Table 5.1: Important Environment Variables
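Programs started from the shell inherit its environment variables, so they can read them as well. The sketch below (Python 3) looks up `HOME` and then sets and removes a variable; `SWC_DEMO` is a made-up name used only for illustration. As with a temporary setting in the shell, the change affects only this process and anything it starts.

```python
import os

# Look up a variable; .get() returns None instead of failing if it is unset.
home = os.environ.get("HOME")
print("HOME is", home)

# SWC_DEMO is a hypothetical variable name chosen for this example.
os.environ["SWC_DEMO"] = "temporary value"
print(os.environ["SWC_DEMO"])

# Removing it again leaves this process's environment as it was.
del os.environ["SWC_DEMO"]
print("SWC_DEMO" in os.environ)
```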
How the Shell Finds Programs
- The most important of these variables is
PATH
- The search path that tells the shell where to look for programs
- When you type a command like
tabulate, the shell:
- Splits
$PATH on colons to get a list of directories - Looks for the program in each directory, in left-to-right order
- Runs the first one that it finds
- Example
PATH is /home/gvwilson/bin:/usr/local/bin:/usr/bin:/bin:/Python24
- Both
/usr/local/bin/tabulate and /home/gvwilson/bin/tabulate exist
/home/gvwilson/bin/tabulate will be run when you type tabulate at the command prompt
- Can run the other one by specifying the path, instead of just the command name
- Warning: it is common to include
. in your path
- This allows you to run a program in the current directory just by typing
whatever, instead of ./whatever
- But it also means you can never be quite sure what program a command will invoke
- Though you can use the command
which program_name, which will tell you
- Common entries in
PATH include:
/bin, /usr/bin: core tools like ls
- Note: the word “bin” comes from “binary”, which is geekspeak for “a compiled program”
/usr/local/bin: optional (but common) tools, like the gcc compiler
$HOME/bin: tools you have built for yourself
- Remember,
$HOME means “the user's home directory”
- So this is equivalent to
~/bin
- Cygwin does things a little differently
- Uses the notation
/cygdrive/c/somewhere instead of Windows' c:/somewhere
- The colon in
c:/somewhere would clash with the colons in the PATH variable
- By default, Cygwin treats
c:/cygwin as the root of its file system
- So
/home/aturing is a synonym for c:/cygwin/home/aturing
- Yes, it can be confusing, but remember: we're trying to run one operating system's tools on top of another
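The search the shell performs (split the path into directories, look in each from left to right, run the first match) might be sketched as follows in Python 3. The function `find_program` and the directory names are invented for the example, and the test for "is a program" (a regular file with at least one execute bit set) is a simplifying assumption for Unix-like systems, not a description of what any particular shell checks.

```python
import os
import tempfile

def find_program(name, search_path):
    """Return the first match for name on search_path, or None."""
    for directory in search_path.split(os.pathsep):   # ':' on Unix
        candidate = os.path.join(directory, name)
        # Assumption: a program is a regular file with an execute bit set.
        if os.path.isfile(candidate) and os.stat(candidate).st_mode & 0o111:
            return candidate
    return None

with tempfile.TemporaryDirectory() as root:
    # Two throwaway directories that both contain a program called tabulate.
    first = os.path.join(root, "home_bin")
    second = os.path.join(root, "usr_local_bin")
    for directory in (first, second):
        os.mkdir(directory)
        program = os.path.join(directory, "tabulate")
        with open(program, "w") as handle:
            handle.write("#!/bin/sh\necho tabulate\n")
        os.chmod(program, 0o755)

    demo_path = os.pathsep.join([first, second])
    found = find_program("tabulate", demo_path)
    found_in_first = (found == os.path.join(first, "tabulate"))
    missing = find_program("no_such_program", demo_path)

print(found_in_first)   # True
print(missing)          # None
```

Swapping the order of the two directories in `demo_path` would make the other copy win, which is exactly why the order of entries in `PATH` matters.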
Basic Tools
| Command | Purpose |
|---|---|
| man | Documentation for commands. |
| cat | Concatenate and display text files. |
| cd | Change working directory. |
| chmod | Change file and directory permissions. |
| clear | Clear the screen. |
| cp | Copy files and directories. |
| date | Display the current date and time. |
| diff | Show differences between two text files. |
| echo | Print arguments. |
| env | Show environment variables. |
| head | Display the first few lines of a file. |
| ls | List files and directories. |
| mkdir | Make directories. |
| more | Page through a text file. |
| mv | Move (rename) files and directories. |
| od | Display the bytes in a file. |
| passwd | Change your password. |
| pwd | Print current working directory. |
| rm | Remove files. |
| rmdir | Remove directories. |
| sort | Sort lines. |
| tail | Display the last few lines of a file. |
| uniq | Remove duplicate lines. |
| wc | Count lines, words, and characters in a file. |
| which | Locate a command. |
Table 5.2: Basic Command-Line Tools
Ownership and Permission: Unix
- On Unix, every user belongs to one or more groups
- The
groups command will show you which ones you are in
- Every file is owned by a particular user and a particular group
- Owner can assign different read, write, and execute permissions to user, group, and others
- Read: can look at contents, but not modify them
- Write: can modify contents
- Execute: can run the file (e.g., it's a program)
ls -l will show all of this information
- (Along with the file's size and a few other things)
- Permissions displayed as three
rwx triples
- “Missing” permissions shown by
"-"
- Example:
rw-rw-r-- means “user and group can read and write; everyone else can read; no one can execute”
- Change permissions using
chmod
- Example:
chmod u+x something.exe gives the user execute permission to something.exe
- Example:
chmod o-r notes.txt takes away the world's read permission for notes.txt
- Permissions mean something a little different for directories
- Execute permission means you can “go into” a directory, but does not mean you can read its contents
- So if a directory called
tools has permission rwx--x--x (i.e., owner can do anything, but everyone else only has execute permission), then:
- If someone other than the owner does
ls tools, permission is denied
- But if there's a useful program called
tools/findanswers, other users can still run it
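Underneath, the nine permission flags are just bits in a number, and `chmod` turns individual bits on or off. The sketch below (Python 3) uses the standard `stat` module to display a few modes in the same form as `ls -l`; the particular mode values are chosen to match the examples above, and the extra leading character in each result is the file type (`-` for a file, `d` for a directory).

```python
import stat

mode = 0o100664                                # a regular file with rw-rw-r--
print(stat.filemode(mode))                     # -rw-rw-r--

with_user_execute = mode | stat.S_IXUSR        # like chmod u+x
print(stat.filemode(with_user_execute))        # -rwxrw-r--

without_other_read = mode & ~stat.S_IROTH      # like chmod o-r
print(stat.filemode(without_other_read))       # -rw-rw----

directory_mode = stat.S_IFDIR | 0o711          # a directory with rwx--x--x
print(stat.filemode(directory_mode))           # drwx--x--x
```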
Ownership and Permission: Windows
- Of course, it all works differently on Windows
- Not better or worse, just differently
- Windows XP uses access control lists (ACLs)
- Instead of describing users as “file owner, group member, or something else”, ACLs let you specify exactly what any particular user, or set of users, can do to a file, directory, device, etc.
- Older versions of Windows (such as Windows 95 and Windows 2000) are fundamentally insecure, and shouldn't be used
- Cygwin does its best to make the Windows model look like Unix's
- If you trip over the differences, please consult a system administrator
More Advanced Tools
| Command | Purpose |
|---|---|
| du | Print the disk space used by files and directories. |
| find | Find files that match a pattern. |
| grep | Print lines matching a pattern. |
| gunzip | Uncompress a file. |
| gzip | Compress a file. |
| lpr | Send a file to a printer. |
| lprm | Remove a print job from a printer's queue. |
| lpq | Check the status of a printer's queue. |
| ps | Display running processes. |
| tar | Archive files. |
| which | Find the path to a program. |
| who | See who is logged in. |
| xargs | Execute a command for each line of input. |
Table 5.3: Advanced Command-Line Tools
Exercises
Exercise 5.1:
You're worried your data files can be read by your
nemesis, Dr. Evil. How would you check whether or not he can,
and if necessary change permissions so only you can read or
write the files?
Basic Scripting
Why Python?
- Two factors determine time to solution:
- How long it takes to write a program (human time)
- How long it takes that program to run (machine time)
- Different languages make different tradeoffs between programming speed and execution speed
- Programmers write roughly the same number of lines of code per day no matter what language they're using, so a high-level language that does more per line gets more done
- But the more abstract the language, the more work it is for the computer to figure out what you want it to do
![[Nimble vs. Sturdy Languages]](img/py01/nimble_vs_sturdy.png)
Figure 6.1: Nimble vs. Sturdy Languages |
Python is:
- Like the shell, only better
- Freely available for many platforms
- Widely used
- Well documented
- (Much) easier to read than Perl
- Material that took three days to teach in Perl took only two to teach in Python
- Follow-up surveys showed significantly higher retention rates
- (Much) slower than C/C++ or Fortran
- 10-100 times slower than compiled, optimized code
- But it's relatively easy to call C/C++ and Fortran libraries from Python
- Doesn't have all of MATLAB's numerical tools
- But its
Numeric package isn't bad
- And it has a lot of things MATLAB doesn't
- This course isn't really about Python
- It's about solving simple problems with the least effort
- But we have to write the examples in something
- So we might as well choose something simple and useful
Running Python Interactively
- Running a program in a sturdy language is a two-step process:
- Compiler translates source code into something that can run
- That “something” then runs
- Directly on the hardware (C, C++, Fortran)
- On a virtual machine (Java)
- Some combination of the two (C#)
- Nimble languages typically combine the compiler and the virtual machine
- A single program reads the user's code, translates it into something executable, and executes it right away
![[Sturdy vs. Nimble Execution]](img/py01/sturdy_vs_nimble_execution.png)
Figure 6.2: Sturdy vs. Nimble Execution |
- This means that most nimble languages can run interactively, like a shell
$ python
>>> 3 + 5
8
>>> x = 5 * 3 ** 2 # assign 5 times 3 squared to x
>>> print x
45
>>> print 'some' + 'thing' # concatenate strings
something
Running Saved Programs
- Obviously don't have to retype program every time you want to run it
- Option 1: save program in a file with a
.py extension, and type python filename.py
- Python reads and executes the commands in the file exactly as if they'd been typed in interactively
- Option 2 (Unix only): make the following the first line of the
.py file
- This tells Unix to look up a program called
python, and run it with the rest of the file as its input
- Option 3 (Windows only): associate
.py files with Python
- Double-clicking on anything ending in
.py will then run it
- Happens automatically when you run the Python Windows installer
- Example
- Using an editor, put the following in
powers.py
print 2, 2**2, 2**3, 2**4
print 3, 3**2, 3**3, 3**4
print 4, 4**2, 4**3, 4**4
- Run using
python powers.py
- Should see the following
2 4 8 16
3 9 27 81
4 16 64 256
Variables
- Variables are names for values
- No declarations: variables are created when something is assigned to them
![[Variables Refer to Values]](img/py01/vars_refer_to_values.png)
Figure 6.3: Variables Refer to Values |
- No types: a variable is just a name, and can refer to different types of values at different times
- Although your code will be easier to understand if you don't abuse this
- Must give a variable a value before using it
- Unlike some languages, Python doesn't try to guess a “sensible” value
print something
Traceback (most recent call last):
File "undefined.py", line 1, in ?
print something
NameError: name 'something' is not defined
- While variables don't have types, values do
- If you try to operate on incompatible values, Python will complain
x = 'two' # 'two' is a string
y = 2 # 2 is an integer
print x * y # multiplying a string concatenates it repeatedly
print x + y # but you can't add an integer and a string
twotwo
Traceback (most recent call last):
File "add_int_str.py", line 4, in ?
print x + y # but you can't add an integer and a string
TypeError: cannot concatenate 'str' and 'int' objects
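The two kinds of failure shown above can be explored safely by catching the errors instead of letting them stop the program. This is a sketch for Python 3 (where `print` is a function, unlike in the Python 2 examples in these notes); the list `errors` and the name `undefined_name` exist only for the demonstration, and `try`/`except` itself is a feature these notes have not introduced yet.

```python
errors = []

x = 'two'            # x refers to a string...
x = 2                # ...and now the same name refers to an integer

try:
    print(undefined_name)        # never assigned, so this fails
except NameError as error:
    errors.append('NameError')
    print('NameError:', error)

try:
    print('two' + 2)             # a string and an integer cannot be added
except TypeError as error:
    errors.append('TypeError')
    print('TypeError:', error)

print('two' * 2)                 # repetition is allowed: twotwo
```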
Printing and Quoting
- The
print statement prints zero or more values
- Separated by spaces
- Automatically puts a newline at the end
- So
print on its own just prints a blank line
- Use either single or double quotes to create strings
- Must be consistent: if a string starts with one kind of quote, it must end with that kind of quote
- But different strings in the same program can use different kinds of quotes
- This was not necessarily a good design decision…
print 'a', "b", '"c"', "'d'"
a b "c" 'd'
- The built-in function
str converts things to strings
print 'carbon-' + str(14)
carbon-14
- Use similar functions called
int, float, etc. to convert values to other types
- Use escape sequences to put special characters in strings
- Single quotes in a single-quoted string:
'This isn\'t unusual.'
- Double quotes in a double-quoted string:
"She said, \"You can quote me on that!\""
- Tab and newline characters:
'\tIndented line\n'
Numbers and Arithmetic
- The usual numeric types
14 is an integer (32 bits long on most machines)
14.0 is floating point (double precision, i.e., 64 bits long)
- Two unusual numeric types
1+4j is a complex number (two 64-bit values)
123456789L is a long integer
- Arbitrary length: uses as much memory as it needs to
- Operations are several times slower
- Python automatically switches to long-integer mode when it needs to
- Python borrows C's numeric operators
| Name | Operator | Use | Value | Notes |
|---|---|---|---|---|
| Addition | + | 35 + 22 | 57 | |
| | | 'Py' + 'thon' | 'Python' | |
| Subtraction | - | 35 - 22 | 13 | |
| Multiplication | * | 3 * 2 | 6 | |
| | | 'Py' * 2 | 'PyPy' | 2 * 'Py' gives the same result |
| Division | / | 3.0 / 2 | 1.5 | |
| | | 3 / 2 | 1 | Integer division rounds down: -3 / 2 is -2, not -1 |
| Exponentiation | ** | 2 ** 0.5 | 1.4142135623730951 | |
| Remainder | % | 13 % 5 | 3 | |
Table 6.1: Numeric Operators in Python
- Python also supports C's in-place operators
x += 3 does the same thing as x = x + 3
5 += 3 is an error, since you can't assign a new value to 5…
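A few of these operators are worth trying by hand. The sketch below is written for Python 3, where `/` between two integers produces a floating-point result, so it uses `//` (floor division) to show the rounding-down behaviour that the table describes for integer division.

```python
print(3.0 / 2)     # 1.5
print(3 // 2)      # 1: floor division
print(-3 // 2)     # -2: rounds down, not toward zero
print(13 % 5)      # 3
print(2 ** 0.5)    # 1.4142135623730951
print(2 * 'Py')    # PyPy: repetition works with the number on either side

x = 5
x += 3             # same as x = x + 3
print(x)           # 8
```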
Booleans
True and False are true and false (d'oh)
- Empty string, 0, and
None are equivalent to false
- Just as 3 is equivalent to 3.0
- (Almost) everything else is true
- Combine Booleans using
and, or, not
and and or are short-circuit operators
- Evaluate expressions left to right, and stop as soon as they know the answer
- Result is the last thing evaluated, rather than
True or False
| Expression | Result | Notes |
|---|---|---|
| True or False | True | |
| True and False | False | |
| 'a' or 'b' | 'a' | or is true if either side is true, so it stops after evaluating 'a' |
| 'a' and 'b' | 'b' | and is only true if both sides are true, so it doesn't stop until it has evaluated 'b' |
| 0 or 'b' | 'b' | 0 is false, but 'b' is true |
| 0 and 'b' | 0 | Since 0 is false, Python can stop evaluating there |
| 0 and (1/0) | 0 | 1/0 would be an error, but Python never gets that far |
| (x and 'set') or 'not set' | It depends | If x is true, this expression's value is 'set'; if x is false, it is 'not set' |
Table 6.2: Boolean Operators in Python
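The entries in the table can be checked directly. This sketch (Python 3) evaluates each expression and prints the result with `repr` so that strings and numbers are easy to tell apart; the choice of an empty string for `x` is arbitrary, made only to show the "false" branch of the last expression.

```python
x = ''      # an empty string counts as false

results = [
    'a' or 'b',                      # 'a'
    'a' and 'b',                     # 'b'
    0 or 'b',                        # 'b'
    0 and 'b',                       # 0
    0 and (1 / 0),                   # 0: the division is never evaluated
    (x and 'set') or 'not set',      # 'not set', because x is false
]
for value in results:
    print(repr(value))
```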
Comparisons
- Python borrows C's comparison operators, too
- But allows you to chain comparisons together, just as in mathematics
| Expression | Value |
|---|---|
| 3 < 5 | True |
| 3.0 < 5 | True |
| 3 != 5 | True |
| 3 == 5 | False |
| 3 < 5 <= 7 | True |
| 3 < 5 >= 2 | True (but please don't write this—it's hard to read) |
| 3+2j < 5 | Error: can only use == and != on complex numbers |
Table 6.3: Comparison Operators in Python
- Note the difference between assignment and testing for equality
- Use a single equals sign
= for assignment
- Use a double equals sign
== to test if two things have equal values
- String comparison may not do what you expect
- Characters are encoded as numbers: digits come before uppercase letters, all of which come before lowercase letters
- Punctuation is mixed in between, just to make matters difficult
- Strings are compared character by character from first to last until:
- One character is less than another
- One string runs out of characters
| Expression | Value |
|---|---|
| 'abc' < 'def' | True |
| 'abc' < 'Abc' | False |
| 'ab' < 'abc' | True |
| '0' < '9' | True |
| '100' < '2' | True |
Table 6.4: Python String Comparisons
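The ordering becomes less mysterious once the underlying character codes are visible. The sketch below (Python 3) uses the built-in `ord` to show the codes, then contrasts comparing digits as strings with comparing them as numbers; the sample list passed to `sorted` is made up for the example.

```python
print(ord('0'), ord('A'), ord('a'))         # 48 65 97
print('abc' < 'Abc')                        # False: 'a' (97) is not less than 'A' (65)
print('100' < '2')                          # True: '1' comes before '2'
print(int('100') < int('2'))                # False: compared as numbers
print(sorted(['b', 'A', '10', 'a', '9']))   # ['10', '9', 'A', 'a', 'b']
```

If strings hold numbers, converting them with `int` or `float` before comparing avoids the surprise.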
Conditionals
- Python uses
if, elif (not else if), and else
- Use a colon and indentation to show nesting
a = 3
if a < 0:
    print 'less'
elif a == 0:
    print 'equal'
else:
    print 'greater'
- Why indentation?
- Based on studies from the 1980s, it's what everyone looks at anyway
- Just count the number of warnings in C/Java books about misleading indentation
- Doesn't matter how much you use, but:
- Everything in the block must be indented the same amount
- Tabs are expanded so that they're equivalent to (up to) eight characters
- So don't ever indent with tabs, since your editor may interpret them differently
- Many editors understand Python indentation, and will help you get it right
- If you're fond of legacy character-oriented editors,
Emacs and Vim both work well
- If you want a free integrated development environment (IDE), try
SPE or PyDev
- If you'd prefer something with commercial support,
Komodo and WingIDE are my favorites
While Loops, Break, and Continue
- Do something repeatedly as long as some condition is true
- Again, use colon and indentation to show nesting
a = 3
while a > 0:
    print a
    a -= 1
3
2
1
- Do the “something” zero times if the condition is false the first time it is tested
print 'before'
a = -1
while a > 0:
    print a
    a -= 1
print 'after'
before
after
- If the condition is always true, the loop never ends
a = 3
while a > 0:
    print a
    # oops --- forgot to subtract one from a
3
3
3
- Can break out of the middle of a loop using
break
a = 30
while True:
    print a
    a -= 1
    if (a % 5) == 0: # if a is divisible by 5
        break
30
29
28
27
26
- Don't abuse this: put the test at the top of the loop unless there's a really good reason not to
- Can skip to the next iteration using
continue
a = 5
while a > 0:
    print 'top of loop', a
    a -= 1
    if (a % 2) == 0:
        continue
    print '...bottom of loop', a
top of loop 5
top of loop 4
...bottom of loop 3
top of loop 3
top of loop 2
...bottom of loop 1
top of loop 1
- Again, don't abuse this by writing loops that are hard for other people to figure out
- If only because “other people” includes you three months from now
Strings, Lists, and Files
Where We Just Were
- Python is a nimble language with:
- Numbers, strings, and Booleans
- Conditionals (if, elif, and else)
- while loops (with break and continue)
- This lecture describes:
- Lists, for storing collections of values
- Functions, which reduce redundancy, and make programs easier to read
- Reading and writing files
But First, Strings
- A string is an immutable sequence of characters
- Immutable means that it cannot be modified once it has been created
- I.e., you cannot change individual characters in place
str = 'abc'
print 'str is', str
str[0] = 'x'
print 'str is now', str
str is abc
Traceback (most recent call last):
File "immutable_err.py", line 3, in ?
str[0] = 'x'
TypeError: object does not support item assignment
- Though you can of course assign a new string value to a variable
str = 'abc'
print 'str is', str
str = 'xyz'
print 'str is now', str
str is abc
str is now xyz
- Sequence means that it can be indexed
- Indices start at 0 (as they do in C)…
- …so
text[0] is the first character of the string text
- The built-in function
len returns the length of a string…
- …so the index of the last character of
text is len(text)-1
- Note: there is no separate data type for characters
- A character is simply a string of length 1
element = "boron"
i = 0
while i < len(element):
    print element[i]
    i += 1
b
o
r
o
n
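Since a string cannot be changed in place, the usual workaround is to build a new string from pieces of the old one. The sketch below (Python 3) shows one way to do that; the variable names are chosen only for illustration, and it borrows the `text[1:]` slicing notation that these notes explain in the section on slicing.

```python
text = 'abc'

try:
    text[0] = 'x'                  # strings cannot be changed in place
except TypeError as error:
    print('TypeError:', error)

changed = 'x' + text[1:]           # build a new string instead
print(text, changed)               # abc xbc
```

The original string is untouched; `changed` is a separate value that happens to share most of its characters.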
Slicing, Bounds, and Negative Indices
text[start:end] takes a slice out of text
- Creates a new string containing the characters of
text from start up to (but not including) end
val = "helium"
print val[1:3], val[:2], val[4:]
el he um
- Sometimes helps to think of indices as being between elements
![[Visualizing Indices]](img/py02/index_between.png)
Figure 7.1: Visualizing Indices |
- A few logical consequences:
text[1:2] is either the second character in text, or the empty string (if text doesn't have a second character)
- So
text[1:1] is always the empty string
- From index 1, up to but not including index 1
- And
text[2:1] is always the empty string
- Negative indices count backward from the end of the string
x[-1] is the last character
- A lot easier to read than
x[len(x)-1]
x[-2] is the second-to-last character
![[Visualizing Negative Indices]](img/py02/index_between_negative.png)
Figure 7.2: Visualizing Negative Indices
val = "carbon"
print val[-2], val[-4], val[-6]
o r c
- Bounds checking rules:
- Python always does an out-of-bounds check when you index a single item
- But it truncates out-of-range indices when you take a slice
val = "helium"
print val[1:22]
x = val[22]
elium
Traceback (most recent call last):
File "bounds.py", line 3, in ?
x = val[22]
IndexError: string index out of range
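The difference between indexing and slicing at the edges of a string can be explored without crashing by catching the error. This is a sketch for Python 3; the flag `index_failed` is introduced only so the outcome can be inspected afterwards.

```python
val = 'helium'

print(val[1:3], val[:2], val[4:])    # el he um
print(val[-1], val[len(val) - 1])    # m m
print(repr(val[2:1]))                # '': start is past end, so nothing is selected
print(val[1:22])                     # elium: the slice is truncated quietly

index_failed = False
try:
    val[22]                          # a single out-of-range index is an error
except IndexError as error:
    index_failed = True
    print('IndexError:', error)
```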
String Methods
- A method is a function that's tied to a particular object
- Invented to help programmers organize their code
- To call method
M of object X, type X.M()
- We'll see how to create objects and methods of our own later
- Almost everything in Python has methods
- Numbers are the only important exception
- String methods
| Method | Purpose | Example | Result |
|---|
capitalize |