:-D* Person laughing so hard that he or she does not notice that a 5-legged
spider is hanging from his or her lip. -- one of Dave Barry's emoticons.

"Unix. The world's first computer virus." -- title of Chapter 1 of
The Unix Haters Handbook, ISBN: 1-56884-203-1
Executive Summary
Web surfing has become a common daily experience. These notes
and the lab experiments help you surf the Web more
productively. We also explain how a home PC can be set up to
access WSU resources on the Web.
Educational Objectives
After studying the lecture materials, and performing these
experiments, students should be able to:
- Surf and search the Web more productively
- Understand what a URL is
- Understand the role of search engines
- Be aware of privacy and security issues on the Web
- Log in to Unix
- Send and receive e-mail
- Set up a Windows-based PC to connect to WSU
Suggested Preparation
Prior to performing the experiments of this Lab, visit the
sites listed in Appendix B: Further Reading Links
and explore.
Background Information
Please refer to Appendix A if you would
like to find out what an acronym used below stands for.
The Web and the Internet are two different things. The Internet is the
world-wide collection of network-connected computers. The World Wide Web
is a collection of interlinked documents that work together using
protocols such as HTTP running on computers connected to the Internet.
Browsing
The Web uses a metaphor of individual pages, usually combined to make up
sites. Web pages are written in HTML (Hyper-Text Markup Language), which
tells the Web browser how to display the page and its elements. The
enabling feature of the Web is its ability to use hyperlinks to connect
text pages to one another, as well as to audio, video, and image files.
Most Web browsers allow the user to specify a URL and connect to that
document or service. When selecting hypertext in an HTML document, the
user is actually sending a request to open a URL. In this way,
hyperlinks can be made not only to other texts and media, but also to
other network services. Web browsers are not simply Web clients, but are
also full-featured FTP, Gopher, and telnet clients.
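To make the idea of a hyperlink concrete, here is a minimal sketch in Python that pulls the link targets out of a small piece of HTML. The sample page and the class name LinkCollector are made up for illustration; a real browser does far more than this.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the target URL of every <a href="..."> hyperlink in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page text, written only for this example.
page = """
<html><body>
<p>See the <a href="http://www.w3.org">W3C site</a> or
download <a href="ftp://www.xerox.com/pub/file.txt">a text file</a>.</p>
</body></html>
"""

collector = LinkCollector()
collector.feed(page)
for url in collector.links:
    print(url)
```

Each address collected this way is a URL that the browser would request when the user selects the corresponding hypertext.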
URLs
URLs (Uniform Resource Locators) name documents and services on the Internet.
The first part of the URL (before the two slashes) specifies the
method of access. The second is typically the address of the computer
where the data or service is located. Further parts may specify the names of
files, the port to connect to, or the text to search for in a database.
A URL is always a single unbroken line with no spaces. Here are some
examples of URLs (set to be inactive here).
- file://www.eff.org/directory/ - Displays a directory's
contents.
- http://www.hcc.hawaii.edu/directory/book.html - Connects
to an HTTP server and retrieves an HTML file.
- ftp://www.xerox.com/pub/file.txt - Opens an FTP
connection to Xerox and downloads a text file.
- gopher://www.hcc.hawaii.edu - Connects to the Gopher server at the
University of Hawaii.
- telnet://www.hcc.hawaii.edu:23 - Telnets to
www.hcc.hawaii.edu at port 23.
- http://mc5.go2net.com/crawler?general=what+is+a+router&method=2&region=0&rpp=20&timeout=5&hpe=10&format=regular&sort=0
This complicated URL is the result of a search request made to
MetaCrawler.
- news:wright.egr.199 - Invokes a news reader
program. Reads the latest Usenet news by connecting to a
user-specified news server and lets you read the articles in the wright.egr.199
newsgroup in hypermedia format.
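The parts of a URL described above can be picked apart with a few lines of Python. This is only a sketch using the standard urllib.parse module; the search URL is a shortened form of the MetaCrawler example, cut down for readability.

```python
from urllib.parse import urlsplit, parse_qs

examples = [
    "http://www.hcc.hawaii.edu/directory/book.html",
    "ftp://www.xerox.com/pub/file.txt",
    "telnet://www.hcc.hawaii.edu:23",
]

for url in examples:
    parts = urlsplit(url)
    # scheme = method of access, hostname = computer, port and path = further parts
    print(parts.scheme, parts.hostname, parts.port, parts.path)

# The text after "?" holds name=value pairs; "+" stands for a space.
search = urlsplit("http://mc5.go2net.com/crawler?general=what+is+a+router&method=2&rpp=20")
fields = parse_qs(search.query)
print(fields["general"][0])
```

A port is reported as None when the URL does not name one, in which case the client uses the default port for that method of access.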
Search Engines
There are three primary types of search sites on the Web: search
engines, Web directories, and parallel and metasearch sites.
Search engines such as Excite and HotBot use automated software
called Web crawlers or spiders. These programs travel from Web site to
Web site, logging each site's title, URL, and some of its text content.
The crawlers reach millions of Web sites each day to stay as current
with them as possible. The result is a long list of Web sites placed in
a database, which users search by typing in a keyword or phrase.
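The crawl-then-search idea can be sketched in Python as follows. The three page records are invented for illustration, and the functions build_index and keyword_search are hypothetical names; real search engines use far more elaborate databases and ranking.

```python
from collections import defaultdict

# Hypothetical crawl results: (URL, title, text) records a crawler might log.
pages = [
    ("http://example.edu/dinos", "Dinosaur Digs", "Cretaceous fossils found in Mongolia"),
    ("http://example.edu/routers", "Networking", "What is a router and how it forwards packets"),
    ("http://example.edu/maps", "Maps", "Travel maps of Mongolia and China"),
]


def build_index(records):
    """Map each lowercase word to the set of URLs whose title or text contains it."""
    index = defaultdict(set)
    for url, title, text in records:
        for word in (title + " " + text).lower().split():
            index[word].add(url)
    return index


def keyword_search(index, phrase):
    """Return, sorted, the URLs that contain every word of the phrase."""
    words = phrase.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)


index = build_index(pages)
print(keyword_search(index, "Mongolia"))
print(keyword_search(index, "Cretaceous Mongolia"))
```

The search never touches the Web itself; it only consults the index built earlier, which is why results can lag behind the current state of a site.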
Web directories such as Yahoo offer an editorially selected,
topically organized list of Web sites. These sites employ human editors
to examine new Web sites and work with programmers to categorize them
and build their links into the site's index.
Since both approaches make sense, all the major search engine sites
now have built-in topical search indexes, and most Web directories have
added a keyword search.
Parallel and metasearch sites ride piggyback on the crawler sites.
Parallel search programs, such as WebFerret, launch simultaneous
searches on all the popular search engine sites, returning all the
results in a single window. Metasearch sites go a step further. One of
the problems with searching on the Web is that the searching vocabulary
varies from search site to search site. For example, when you search for
"Cretaceous Mongolia" on Yahoo, the search term should look
just like that. But the same search performed at Infoseek would be more
effective if you entered Cretaceous +Mongolia; at Galaxy, it
should be Cretaceous AND Mongolia. Metasearch sites, such as metacrawler.com
and metasearch.com, take care of this for you. They let you
enter a term in a single field and then automatically account for all
the particulars for half a dozen or more popular search sites.
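What a metasearch site does with your search terms can be sketched like this. The three syntax rules are the ones quoted in the text for Yahoo, Infoseek, and Galaxy; the table JOINERS and the function names are assumptions made for the example, not how any actual metasearch site is programmed.

```python
# Text placed between search terms, per site, following the examples above.
JOINERS = {
    "yahoo": " ",         # Cretaceous Mongolia
    "infoseek": " +",     # Cretaceous +Mongolia
    "galaxy": " AND ",    # Cretaceous AND Mongolia
}


def format_query(terms, site):
    """Render a list of search terms in the syntax expected by one site."""
    return JOINERS[site].join(terms)


def metasearch_queries(terms):
    """Produce one correctly formatted query per known site."""
    return {site: format_query(terms, site) for site in JOINERS}


for site, query in metasearch_queries(["Cretaceous", "Mongolia"]).items():
    print(site, "->", query)
```

The user types the terms once; the per-site differences are kept in one table and applied automatically.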
Newsgroups
Newsgroups are electronic discussion forums. In spite of their name,
there is hardly any "news" in them. The messages are opinions, facts,
and debates for people with shared interests. Usenet is the world's
largest collection of public newsgroups. The newsgroups go by a complex
set of abbreviated names, with the first set of letters of a newsgroup's
name indicating its primary subject, such as rec (recreation), soc
(society), or comp (computers).
For example, wright.egr.199 is a local group for the
students and faculty of EGR199. The news server for WSU news is news.wright.edu.
The messages in newsgroups are stored on news servers owned by ISPs,
universities, and companies all over the world. Most news servers keep only
the more recent posts; they'd soon run out of storage space
otherwise. There are also such things as local and private
newsgroups. A discussion group created on a corporate intranet is an
example of a private newsgroup. Most ISPs offer a handful of local
newsgroups where they make tech support announcements that no one but
their customers would want to see.
Your browser has a companion program, such as Netscape's Collabra or
Microsoft's Outlook Express, that will help you read the newsgroups.
Your newsreader lets you check newsgroups the way your browser lets you
surf Web sites. The messages are presented in a list, known as a
thread, that shows the original message, the responses to the message,
and the responses to the responses, so that you can follow an entire
discussion or just the parts you're interested in.
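A newsreader can build the threaded view from nothing more than a record of which message each message replies to. The following Python sketch shows one way to do it; the sample messages and the function name thread_lines are made up for illustration.

```python
# Hypothetical messages: (id, id of the message it replies to or None, subject).
messages = [
    (1, None, "Lab 1 question"),
    (2, 1, "Re: Lab 1 question"),
    (3, 2, "Re: Re: Lab 1 question"),
    (4, 1, "Re: Lab 1 question"),
]


def thread_lines(msgs, parent=None, depth=0):
    """List subjects with each reply indented under the message it answers."""
    lines = []
    for msg_id, reply_to, subject in msgs:
        if reply_to == parent:
            lines.append("  " * depth + subject)
            # Recurse to place the responses to this message right below it.
            lines.extend(thread_lines(msgs, msg_id, depth + 1))
    return lines


for line in thread_lines(messages):
    print(line)
```

The indentation shows at a glance which message is a response to which.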
Privacy and Security
There are two types of trouble on the Net: threats to security and
threats to privacy. Potential security bogies include viruses contained
in file downloads, rogue ActiveX controls that can crash your computer,
malicious email attachments, and a host of other weaknesses in your
TCP/IP software.
Despite all the press reports, the odds are against your becoming the
random victim of a hacker. You're much more likely to run into a virus.
Avoid opening email attachments from people you don't know, and use good
judgment about paying with credit cards on the Web.
Threats to your privacy are more subtle.
Whenever you enter your name, address, and phone number in a form on
the Web, that information could be going to people you don't know, so
think twice before revealing personal info.
Make your email safe from prying eyes by using an encryption program
such as Pretty Good Privacy (PGP). Encryption software translates your
message into a secret code so that it can be read only by the person who
has the correct decryption key--that is, the person you're sending it
to.
Many Web sites are programmed to harvest information about
any visitor who comes through. They send small files called "cookies"
to your hard drive. When you revisit the site, your Web browser sends the
cookie back at the request of the site's server, so the site can
figure out what you were doing last time. Among other things, cookies permit
Web sites to track your name, your email address, your ISP's name, the
last site you visited, your operating system, and your browser's
specific make and version number. Of course, they can also help you out
by storing passwords so that you can get into subscription-only sites
without having to type the password every time.
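The cookie exchange can be sketched with Python's standard http.cookies module. The cookie names and values below are hypothetical; the point is only that a cookie is a set of name=value pairs that the site hands to the browser and later gets back.

```python
from http.cookies import SimpleCookie

# First visit: the site builds the headers that ask the browser to store cookies.
outgoing = SimpleCookie()
outgoing["last_page"] = "lab1"
outgoing["visits"] = "3"
print(outgoing.output())

# Later visit: the browser returns the stored pairs in a single Cookie header,
# and the site reads them back.
returned = SimpleCookie()
returned.load("last_page=lab1; visits=3")
print(returned["visits"].value)
```

Note that a cookie holds only what the site chose to store in it, which is why the information you type into forms matters so much.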
Information collected online can be used in more sinister ways, such
as stealing your identity, or sending you obscene emails.
We are assuming that your home computer is either a PC running
Windows 9x/NT, or a Mac. For home PCs running Linux, or for other
computers, contact your instructor.
Set up your dial-up networking to use WSU as your ISP. WSU
maintains a bank of high-speed modems at the phone number (937)
879-8720. For full instructions, read the WSU PPP document.
Install all the components of a comprehensive Web software package
such as Internet Explorer 5.0 or Netscape Communicator 4.6x.
Both are freely downloadable, but are in the 40 MB range. If you
explore the free CDs from AOL, Prodigy, ..., you will find the browser
packages, and other useful utilities, in one of the subdirectories.
E-mail has been the most popular use of the Internet. In the U.S., more
e-mail is now exchanged than postal mail. Because e-mail
"letters" travel at electronic speed, people sometimes refer
to the traditional postal system as "snail mail."
E-Mail Terminology
- E-mail is short for "electronic mail."
- Header is a few lines of text about who sent the message,
when it was sent, the title of the message (in its subject line),
who else received the message, and how it got to your mailbox.
- Recipient is the person to whom you're sending the message.
- Subject line is a brief description of what the message is
about.
- Body is the content of the letter, generally text but sometimes
a graphic.
- cc field: a copy of the message -- a "carbon
copy" -- is sent to everyone named in the cc field, whose names
will appear in the received message's header.
- bcc field: a copy of the message -- a "blind carbon
copy" -- is sent to everyone named in the bcc field. Unlike a
standard carbon copy, however, the bcc recipient's name will not appear
in the received message's header. This allows you to send a copy of
a message to someone without letting its other recipients know.
- Signature is a standardized message closing that you can
design.
- HTML e-mail allows you to create messages with stylized
text and pictures so that your note looks like a Web page. The
recipient should have a mailer that can display such mail though.
- Actions (sometimes called Rules or Scripts) are specified
operations that are automatically performed by the mail program upon
received mail.
- Address book (sometimes called Contacts List) stores the
e-mail addresses.
- Importing addresses: If you've previously used another
e-mail program, you can import your old addresses into your new
program. First, export the addresses from your old program. Then,
import them into the new program.
- Download Folder is a directory for the storage of received
attachments, allowing you to find the attachments and to scan the
selected location for viruses.
- Compression shrinks attachments to make them quicker to
send. Enable it.
- Encoding converts attachments into a sendable format.
Consult a local guru before you select a type.
- Helper applications are other programs that can handle attachments.
- "Emoticons" are symbols at the end of sentences that
underscore the sentence's meaning. Examples: happy :-) sad
:-( wink ;-).
- Abbreviations provide a shorthand for commonly used phrases
and expressions.
- ASCII art is an illustration composed entirely of letters
and typographic symbols.
- > The greater-than symbol indicates that the text that
follows has been copied, or "quoted," from another
message.
- Attachment: A message may arrive with an accompanying file
called an attachment -- a word processing document or a picture, for
example.
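Several of these terms (header, recipient, cc, subject line, body, signature, attachment) can be seen together in one small Python sketch built with the standard email.message module. The example.edu addresses and the message text are placeholders invented for illustration; nothing is actually sent.

```python
from email.message import EmailMessage

msg = EmailMessage()

# Header fields: who sent it, who receives it, who gets a carbon copy.
msg["From"] = "student@example.edu"       # placeholder address
msg["To"] = "ta@example.edu"              # recipient (placeholder)
msg["Cc"] = "labpartner@example.edu"      # carbon copy (placeholder)
msg["Subject"] = "EGR199-webIntro"        # subject line

# Body, ending with a short signature.
msg.set_content("Feedback goes in the body.\n\n-- \nA. Student")

# Attachment: an accompanying file that travels with the message.
msg.add_attachment("My notes on each site.\n", filename="webExplore.txt")

print(msg["Subject"])
for part in msg.iter_attachments():
    print(part.get_filename(), part.get_content_type())
```

Printing str(msg) shows the complete message as it would travel, including the encoding that the mail program applies to the attachment.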
E-mail on Unix: pine
Unix has many mail programs. One of the easiest to use, though archaic,
is pine. Run it by typing its name, pine,
at the shell prompt. It presents a help screen the
first time you use it. For a detailed introduction to pine, click
here.
E-mail Programs on your PC or Mac
If you are on a PC or Mac connected to the Internet either via a modem
or a LAN, you need to provide your e-mail program (such as Outlook
Express, Netscape Messenger, or pine) the following information.
Your e-mail address identifies your unique address on the Internet
for receiving e-mail. At WSU, it's in the form of: yourLastName.2@wright.edu.
Your Post Office Protocol (POP3) account name is generally the name of
your Unix (e-mail) account. Your POP server name, at WSU, is mailhost.wright.edu.
The Simple Mail Transfer Protocol (SMTP) server handles the mail you send to
other people. At WSU, it is mailhost.wright.edu. In
general, the POP and SMTP servers may have different names.
Once you have this information set up, you don't need to change it
again.
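The pieces of information an e-mail program asks for can be pictured as one small record. The Python sketch below is only an illustration: the class MailSettings, the function count_waiting_messages, and the account name are hypothetical, and the port numbers are the standard defaults for POP3 and SMTP rather than anything stated by WSU.

```python
import poplib
from dataclasses import dataclass


@dataclass
class MailSettings:
    address: str          # your e-mail address
    pop_account: str      # usually your Unix (e-mail) account name
    pop_server: str       # server that holds your incoming mail
    smtp_server: str      # server that handles the mail you send
    pop_port: int = 110   # standard POP3 port (assumed, not WSU-specific)
    smtp_port: int = 25   # standard SMTP port (assumed, not WSU-specific)


settings = MailSettings(
    address="yourLastName.2@wright.edu",
    pop_account="yourUnixAccount",     # placeholder, not a real account
    pop_server="mailhost.wright.edu",
    smtp_server="mailhost.wright.edu",
)


def count_waiting_messages(cfg, password):
    """Log in to the POP3 server and return how many messages are waiting.

    Not called in this sketch: it needs a network connection and a real password.
    """
    box = poplib.POP3(cfg.pop_server, cfg.pop_port)
    try:
        box.user(cfg.pop_account)
        box.pass_(password)
        count, _size = box.stat()
        return count
    finally:
        box.quit()


print(settings)
```

A graphical e-mail program stores the same values in its preferences dialog and performs the equivalent of count_waiting_messages each time you check for mail.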
Receive Messages
To receive your e-mail, you must be currently connected to the Internet.
Some e-mail programs automatically attempt to retrieve your messages
when the program is launched. If you're offline, the program can attempt
to connect your computer to the Internet. If your e-mail program is
already active, it will provide you a menu item or button that activates
the receiving of e-mail. Most e-mail programs provide icons or windows
for an Inbox for incoming (and hence unread) mail, and an Outbox for
mail messages that are ready to be sent.
Reply to, Write and Send a Message
In composing a message, use standard capitalization practices. A message
in ALL UPPER CASE LIKE THIS is considered "shouting" and is
regarded as a sign of inexperience. You should also feel free to
incorporate "emoticons" and e-mail abbreviations. You should
spell-check. All e-mail programs offer a Reply feature as a menu item or
button that you select while reading the message to which you want to
reply. When you do this, the program will create a new message addressed
to the original sender, and can copy the original message -- or any
currently selected text. You can then type in your responses wherever
you wish and send the message. When you've written a reply or a new
message, you can queue it for sending at a later time, or send it
immediately. When you do this, the program will store the message in its
Outbox. To actually send a message you must be connected to the
Internet. If you attempt to send an e-mail message when offline, many
e-mail programs will automatically attempt to connect you to the
Internet.
The Unix operating system has been in use at universities for
three decades, long enough that it has become a culture. It has matured
nicely, and even so is extremely agile compared to Windoze. Because it
typically ran, until recently, on expensive machines, the number
of Unix installations was small compared to Windows. In 1998, a
version of Unix, called Linux, represented 17 percent of
new-license shipments of server operating systems.
| bash | Bourne-Again Shell |
| cat | concatenates files |
| cd | changes directories |
| chmod | changes the permission on a file |
| cmp | compares two files |
| cp | copies files |
| date | returns the date and time |
| diff | display line-by-line differences between two text files |
| echo | echoes arguments to stdout |
| emacs | the all-powerful text/binary editor; try xemacs also |
| env | lists the current environment variables |
| find | finds a file |
| grep | searches for a pattern within a file; see also fgrep |
| kill | stops a running process |
| ln | creates a link between two files |
| ls | lists the files in a directory; try ls -lisa |
| lynx | WWW/News/Mail browser; try lynx news:wright.egr.199 |
| man | show reference manual pages; try man -k |
| mkdir | makes directory |
| more | displays a data file to the screen |
| mv | used to move or rename files |
| yppasswd | changes your password |
| ps | Lists the current processes running |
| pwd | displays the name of the working directory |
| rm | removes files |
| rmdir | removes directories |
| script | Makes a transcript of terminal session |
| set | lists all the variables in the current shell |
| sort | sorts files |
| spell | checks for spelling errors in a file |
| tail | displays the end of a file |
| tar | copies all specified files into one |
| umask | specify a new creation mask |
| vi | screen-oriented (visual) display editor |
| wc | word count, also line and char count |
| who | info on other people online |
| w | who is on the system, and what they are doing |
You can easily spend a whole day visiting the sites mentioned here!
So, watch out! But do visit as many of the sites as you can. Edit a file
called webExplore.txt to journal your visits to them, and to
include your thoughts. Include at the top and also at the bottom your
full name and SSN. Insert your answer to Items 1, 2, and 3 below. Insert
any feedback you wish to give us. Send the file to your TA via email,
with EGR199-webExplore as the subject.
This lab also aims to expose you to Unix machines. In most
universities, science and engineering education is carried out on Unix.
Currently, Unix machines are the backbone of the Internet.
Web Surfing
Visit the following sites and write a one-paragraph summary of what is
available there. Use the browsers Netscape, Internet Explorer and
the ASCII-text based browser lynx at least once. Edit a
file, on the PC, called webExplore.txt to journal your visits
to them, and to include your thoughts. Include at the top and also at
the bottom your full name and SSN.
- http://www.cats.wright.edu/
- http://www.w3.org
- http://www.whatis.com
- http://wdvl.internet.com/Vlib/
- http://www.freebiedirectory.com/
- http://internet.com/
Search the Web
Prepare your answers to the items below as a text file called webSearch.txt
on the PC. Use any editor that you are comfortable with.
- How many high schools are there in Ohio?
- Collect at least five URLs of online computer tutorials that you
might use to further your self-study of computers.
- Discover at least two names of browsers not developed by Netscape
or Microsoft.
- Find out what "dog-food" means. Developers at Microsoft are
quoted as saying, "We have to dog-food this architecture before
we release it." At Rational, a software development company, a
developer is quoted as saying "We have to dog-food this
puppy."
- What is VoxML?
- Find a site that is offering a free personal web page. Write a
speculation as to why they are doing that.
Get Started with Unix
- Open a telnet session from your PC to paladin.
Log into your account. Read all of the documentation that comes up
on the screen. Change your password using the yppasswd
command.
- The commands script, ls, more, mv and man were
explained to you. Start a script of your session. Copy all the
files, including subdirectories, from the source directory /public/pmateti/EGR199/
to your home directory. Recursively list all files in your
directory, including hidden "dot"-files. End the script
session. Rename the script file as script0.txt.
- Learn to use as many of the commands listed, on the Short List of
Unix Commands page, as you can. Edit and save the file script0.txt
so that it now has at the top and also at the bottom your full name
and SSN. Just before these two last lines, insert a paragraph
describing which of the commands you have learned and used today,
and any feedback you wish to give us.
E-Mailing
See the Achievement Test section
below.
| FTP | File Transfer Protocol |
| HTML | Hyper Text Markup Language |
| HTTP | Hypertext Transfer Protocol |
| IP | Internet Protocol |
| ISP | Internet Service Provider |
| LAN | Local Area Network |
| NNTP | Network News Transfer Protocol |
| PGP | Pretty Good Privacy |
| POP | Post Office Protocol |
| PPP | Point to Point Protocol |
| SMTP | Simple Mail Transfer Protocol |
| URL | Uniform Resource Locator |
| WWW | World Wide Web |
| X11 | GUI server that originated in Unix, now on Windows also |
A few acronyms and their expansions are collected in
the table here.
You may be searching, but are you finding?
- Boolean search: A search allowing the inclusion or
exclusion of documents containing certain words through the use of
operators such as and, not and or.
- Concept search: A search for documents related conceptually
to a word, rather than specifically containing the word itself.
- Full-text index: An index containing every word of every
document cataloged, including stop words.
- Fuzzy search: A search that will find matches even when
words are only partially spelled or misspelled.
- Index: The searchable catalog of documents created by
search engine software. Also called "catalog."
- Keyword search: A search for documents containing one or
more words that are specified by a user.
- Precision: The degree to which the documents a search engine lists
actually match a query. The larger the share of listed documents that
match, the higher the precision. For example, if a search engine
lists 80 documents found to match a query but only 20 of them
contain the search words, then the precision would be 25%.
- Proximity search: A search where users specify that
documents returned should have the words near each other.
- Query-By-Example: A search where a user instructs an engine
to find more documents that are similar to a particular document.
Also called "find similar."
- Recall: Related to precision, this is the degree to which a
search engine is returning all the matching documents in a
collection. There may be 100 matching documents, but a search engine
may only find 80 of them. It then has a recall of 80%.
- Relevancy: How well a document provides the information a
user is looking for, as measured by the user.
- Search Engine: The software that searches an index and
returns matches. Search engine is often used synonymously with
spider and index, although these are separate components that work
with the engine.
- Spider: The software that scans documents and adds them to
an index by following links. Spider is often used as a synonym for
search engine.
- Stemming: The ability for a search to include the
"stem" of words. For example, stemming allows a user to
enter "swimming" and get back results also for the stem
word "swim."
- Stop words are conjunctions, prepositions, articles (such
as "and", "to" and "a") and other
words that appear often in documents yet alone may contain
little meaning.
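The arithmetic behind the precision and recall examples in this glossary can be checked with a short Python sketch. The function and parameter names are chosen here for illustration only.

```python
def precision(listed, matching_listed):
    """Share of the listed documents that really match the query."""
    return matching_listed / listed


def recall(matching_in_collection, matching_found):
    """Share of all matching documents in the collection that were found."""
    return matching_found / matching_in_collection


# 80 documents listed, but only 20 of them contain the search words.
print(f"precision: {precision(80, 20):.0%}")

# 100 matching documents exist, and the engine finds 80 of them.
print(f"recall: {recall(100, 80):.0%}")
```

The two measures pull in different directions: listing more documents tends to raise recall, but it lowers precision if the extra documents do not match.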
Achievement Test
- Send the file script0.txt to your lab TA via email using pine,
with EGR199-UnixLab as the subject.
- Send the files webExplore.txt and webSearch.txt
to your lab TA via email as attachments, with EGR199-webIntro
as the subject. Insert any feedback you wish to give us as the body.
Send comments to pmateti@cs.wright.edu
09/20/99