Imagine you are a florist and you can ship anywhere in the world. Why not advertise your flowers on the Internet? This is what Jennings Florists in Victoria, British Columbia, is doing, with full-color pictures of its most popular gift baskets and flower arrangements. The only link you need to reach the Jennings Florists catalog on the World-Wide Web (see Figure 1-1 Jennings Florists Web Site) is called a URL, or Uniform Resource Locator: <http://www.islandnet.com/JenningsFlorists/>. Links such as this one appear throughout the book. They'll appear in italics after I've mentioned a particular site or resource. You'll learn how to read them later in this chapter.
Or suppose you love tennis, and you want to collect and make available tennis news, equipment tips, and player information and set up an Internet tennis specialty shop. This is exactly what Tenagra Corporation decided to do; their WWW Tennis Server includes articles and links to tennis information throughout the Internet. <http://www.tennisserver.com/>
Or suppose your company sells computer software or hardware, and your phone lines are constantly busy with help questions and requests for the latest updates or pricing schedules. Many top high-tech companies, including Sun Microsystems, IBM, Microsoft, and Novell, post technical notes, price lists, and even software upgrades on the Internet for their users to download immediately (see Figure 1-2 Sun Microsystems Web Site and Figure 1-3 IBM Web Site). Imagine the public relations benefit, if nothing else.
Or say you're a real estate broker, and you'd like to advertise your properties more widely. A company called Coolware has set up a site at which it is soliciting realtors and homeowners to list real estate for sale. See Figure 1-4 Palo Alto Real Estate Web Site for a sample of the listings it has posted for the City of Palo Alto, California. <http://none.coolware.com/real/realestate.html>
And the Encyclopædia Britannica is starting to provide the full text of all its volumes across the Internet for a fee (see Figure 1-5 Britannica Online Web Site). The company is marketing the service to colleges and universities but plans to make it inexpensive enough that individuals will subscribe. <http://www.eb.com/>
As you can see, it's a new world in publishing. Publishing is defined here as making information publicly available for personal, educational, organizational, or commercial purposes. No longer do you have to spend millions of dollars to reach millions of people. You don't even have to have a computer if you want to hire someone to publish on the Internet for you. And many services offer space for WWW pages for as little as $15 per month.
Publishing on the Internet simply means putting information on one computer where it can be seen by others on the Internet. You might put stories, articles, poems, pictures, or music on the Internet. You can sell what you publish or give it away. Although the commercial and marketing possibilities are creating enormous interest, the bulk of what is published today on the Internet is available for free. Companies are finding that traditional advertising techniques don't transfer well to the Internet; instead they are supplying detailed information and participating in technical forums in which their products are discussed.
Publishing on the Internet basically consists of making computer files available on one computer, usually called a server, and allowing others to view or download them via other computers, usually called clients. Finding and looking at these files always involves several steps: locating a computer that has the file you want, connecting to it, finding the file, and then downloading or viewing it.
Programmers and computer technical administrators have used this procedure for many years. Now Gopher, WWW, and WAIS make this process incredibly easy and much more efficient for everyone else. In fact, Gopher, WWW, and WAIS take very similar steps behind the scenes as you browse the Internet, blissfully oblivious to the details.
File Transfer Protocol, or FTP, is the original method of using the Internet to transfer files between different computer systems. FTP is available on many systems and allows you to reach out from one computer to another and retrieve files. FTP requires that you know the name of the computer to which you wish to connect and have a login ID and password for that computer. Friends or colleagues who have the passwords to each other's accounts can use FTP. Or you can set a computer up for anonymous use.
For years anonymous FTP was the method of choice for publishing on the Internet. Anonymous FTP (a term that's used as a noun and a verb on the Internet) does not require the client to have an ID for the publishing computer in order to connect and download files. Users log in with the ID anonymous and are usually asked to give their e-mail address as a password. Even this is sometimes not required. Figure 1-6 is a sample FTP session that shows how complicated it can get.
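Behind the friendly commands a user types, an FTP client sends short text commands to the server. The Python sketch below lists, in order, the commands an anonymous download would typically send. The command names (USER, PASS, CWD, TYPE, PASV, RETR, QUIT) come from the FTP specification, RFC 959; the function name, the e-mail address, the /pub/example directory, and the use of list.com as the file are hypothetical placeholders, and a real session also reads a numbered reply from the server after every command.

```python
def anonymous_ftp_commands(email, directory, filename):
    """Return the FTP control commands for one anonymous download.

    Command names come from the FTP specification (RFC 959). The
    arguments are hypothetical placeholders supplied by the caller.
    """
    return [
        "USER anonymous",    # log in with the special ID "anonymous"
        f"PASS {email}",     # by convention, the password is your e-mail address
        f"CWD {directory}",  # the user's "cd" command is sent as CWD
        "TYPE I",            # binary ("image") mode, so programs arrive intact
        "PASV",              # ask the server to set up a separate data connection
        f"RETR {filename}",  # the user's "get" command is sent as RETR
        "QUIT",              # end the session
    ]


for command in anonymous_ftp_commands("you@example.com", "/pub/example", "list.com"):
    print(command)
```

In practice you rarely type these yourself. Python's standard ftplib module, for instance, logs in as anonymous when you call `FTP(host).login()` with no arguments, and `retrbinary("RETR list.com", callback)` switches to binary mode, opens the data connection, and fetches the file.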
Anonymous FTP allowed true publishing to occur on the Internet, because once you made your information available via anonymous FTP, anyone on the Internet could download it. This method proved so popular among the programmers and technical people (most Internet users in those days) that thousands of FTP sites (computers with files available for anonymous FTP downloads) sprang up over the years. A wealth of software, text files, and other information was scattered among thousands of computers. The problem was finding the site that had what you wanted and then locating the information in the thousands of files that might be stored on that same computer.
In 1990 Peter Deutsch and Alan Emtage, grad students at McGill University in Canada, came up with Archie, an interesting approach to solving this problem. (The name Archie actually derives from the word archives. When similar programs were created later, their authors used characters in the Archie comic books to name them. Veronica is actually an acronym for Very Easy Rodent-Oriented Net-wide Index to Computer Archives, rodent being a reference to the Gopher servers that Veronica indexes. Jughead, another character, is short for Jonzy's Universal Gopher Hierarchy Excavation and Display.) Deutsch and Emtage set up one computer to connect automatically to a certain number of FTP servers every night and download their directory structures and indexes. They added these indexes to a database and then allowed anyone on the Internet to connect to their machine and run a search program. The search program let the user search by file name and returned all the occurrences of a particular file name, complete with date, directory path, and FTP site address. The Archie program could also e-mail the results to you. Clearly, this was a tremendous service to the Internet community and, as great ideas often do, it spread quickly. Refer to Figure 1-7 for a sample Archie search for the file list.com (an excellent DOS file lister from shareware author Vernon Buerg).
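The Archie idea, gathering many sites' file listings into one searchable database, fits in a few lines of Python. This is a simplified sketch, not Archie's actual code: the nightly downloads are replaced by a hypothetical dictionary of listings, and the site names, paths, and dates are made up. A search returns every site, path, and date for an exact file name, newest copy first.

```python
from collections import defaultdict


def build_index(site_listings):
    """Merge per-site directory listings into one file-name index.

    site_listings maps an FTP site name to a list of (path, date) pairs,
    a stand-in for the listings Archie downloaded each night.
    """
    index = defaultdict(list)
    for site, entries in site_listings.items():
        for path, date in entries:
            filename = path.rsplit("/", 1)[-1]  # the part after the last "/"
            index[filename].append((site, path, date))
    return index


def archie_search(index, filename):
    """Return every (site, path, date) for this exact file name, newest first."""
    # ISO-style dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(index.get(filename, []), key=lambda hit: hit[2], reverse=True)


# Hypothetical listings from two hypothetical FTP sites.
listings = {
    "ftp.example.edu": [("/pub/msdos/list.com", "1994-03-02"),
                        ("/pub/msdos/readme.txt", "1993-11-20")],
    "ftp.example.org": [("/mirrors/utils/list.com", "1994-06-15")],
}
index = build_index(listings)

for site, path, date in archie_search(index, "list.com"):
    print(site, path, date)
```

Matching only whole file names keeps the sketch close to the search the text describes; a name you don't know exactly simply returns nothing.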
Although Archie allowed users to quickly find the FTP sites with the most recent copies of particular files, it wasn't especially user-friendly. While connected to an FTP site, you could make your way through the various directories by typing the appropriate change directory (or cd) command, but you couldn't view a file to determine whether it contained what you were looking for. For that, you first needed to download the file with the FTP get command and then look at it on your own machine. <http://services.bunyip.com:8000/products/archie/archie.html>
In 1991 the University of Minnesota made all this easier. In attempting to make information available campuswide, a small team of programmers led by Mark McCahill designed the Gopher protocol. This major innovation in Internet usability embodied several simple ideas:
This approach resulted in a menulike system that lets users either see the contents of a file or follow a link to other menus or files on another system. Thus was born the ability to "browse" the Internet, because Gopher allows you to read the text files you come across and, depending on your computer, to view pictures or hear sounds stored in the files you select. Gopher makes moving through directories and downloading programs much simpler than FTP does. See Figure 1-8 for a sample Gopher screen.
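A Gopher menu travels over the network as plain text, one line per item, as described in the Gopher specification (RFC 1436): a one-character item type, the title the user sees, then a selector string, host name, and port, separated by tabs, with a line holding a single period marking the end. The Python sketch below parses such a menu. The menu contents and the host gopher.example.edu are hypothetical. To follow an item, a client connects to its host and port and sends the selector followed by a carriage return and line feed.

```python
def parse_gopher_menu(text):
    """Parse a Gopher menu (RFC 1436 layout) into a list of item dictionaries."""
    items = []
    for line in text.split("\r\n"):
        if line == ".":          # a lone period ends the menu
            break
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4:      # skip malformed lines
            continue
        # Gopher+ servers may add a fifth field; only the first four are used here.
        display, selector, host, port = fields[:4]
        items.append({
            "type": line[0],     # e.g. "0" = text file, "1" = menu, "I" = image
            "display": display,  # the title shown to the user
            "selector": selector,
            "host": host,
            "port": int(port),
        })
    return items


# A hypothetical two-item menu as it might arrive from a server.
menu = (
    "0About this server\t/about.txt\tgopher.example.edu\t70\r\n"
    "1Campus news\t/news\tgopher.example.edu\t70\r\n"
    ".\r\n"
)
items = parse_gopher_menu(menu)
for item in items:
    print(item["type"], item["display"], item["host"], item["port"])
```

Because every item carries its own host and port, one menu can point to files and menus on entirely different computers, which is what made browsing across Gopher servers seamless.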
The success of Gopher's simple interface led to explosive growth in the number of Gopher servers in the United States and then around the world. And it led to extensions of the original Gopher protocol to allow for such things as electronic forms, abstracts, alternate data formats, or views, and the ability to store meta-information (or behind-the-scenes information) about the files (such as modification date, file size, language, and administrator of the file). These changes are incorporated in the Gopher+ protocol, which has increased the capabilities of the Gopher system. Unfortunately, not all Gopher client programs have been updated to work with Gopher+. See Chapter 2 for more information on Gopher and Gopher+.
The hundreds, then thousands, of Gopher servers springing up created the need to be able to find specific items among all the Gopher sites. Now the University of Nevada at Reno (UNR) got its chance to contribute to the quest for easy Net access. There, Steven Foster of UNR applied the Archie model to Gopherspace, the collection of all items in all Gopher servers in the world, and called it Veronica. That is, he set up a computer to connect to all the registered Gopher servers and directed it to follow their menus, collecting menu and file names as it went. The compilation was searchable, and what was especially nice was that the results showed up as one or more Gopher menus, which the user could follow directly to the files or Gopher servers themselves. As the usefulness of this system became obvious, universities around the world began running Veronica servers, so Veronica soon had to open with a screen that lets the user select a server and a type of search. See Figure 1-9 for an example of the selection screen and Figure 1-10 for the results of a search on the word Gambia.
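Veronica's approach, following menus from server to server while recording every title, can be sketched as a breadth-first walk. In this Python sketch the network is replaced by a hypothetical dictionary standing in for Gopherspace: each (host, selector) pair maps to the menu items that a real Gopher server would return. All hosts and titles are made up; only menus (type "1") are followed, and each menu is visited once.

```python
from collections import deque


def crawl_titles(gopherspace, start):
    """Follow Gopher menus breadth-first and collect every item title.

    gopherspace maps (host, selector) to a list of (type, title, host, selector)
    items; it stands in for fetching menus over the network.
    """
    seen = {start}
    queue = deque([start])
    titles = []
    while queue:
        for item_type, title, host, selector in gopherspace.get(queue.popleft(), []):
            titles.append((title, item_type, host, selector))
            link = (host, selector)
            if item_type == "1" and link not in seen:  # follow submenus only once
                seen.add(link)
                queue.append(link)
    return titles


def veronica_search(titles, word):
    """Return catalogued items whose titles contain the word (case-insensitive)."""
    word = word.lower()
    return [entry for entry in titles if word in entry[0].lower()]


# Hypothetical Gopherspace: one menu on each of two servers.
space = {
    ("gopher.example.edu", ""): [
        ("1", "World News", "gopher.example.org", "/news"),
        ("0", "About This Server", "gopher.example.edu", "/about"),
    ],
    ("gopher.example.org", "/news"): [
        ("0", "Gambia Country Profile", "gopher.example.org", "/news/gambia"),
        ("0", "Weather", "gopher.example.org", "/news/weather"),
    ],
}
titles = crawl_titles(space, ("gopher.example.edu", ""))
print(veronica_search(titles, "gambia"))
```

Each result keeps the host and selector it came from, which is what allowed Veronica to present its answers as ordinary Gopher menus the user could follow directly. Note that only titles are searched, never file contents.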
The dynamic nature of this database meant that you could do a Veronica search one day and find three matches and then do the same search the next day and find 10 or 12 more, depending on what had been added. Suddenly, Gopher became much more powerful: instead of working your way through menu after menu, you could connect to a Veronica server and search the titles of all the menus and files for a specific word or phrase. Searching their contents was another matter. For that you needed something like WAIS.
WAIS (pronounced ways), started in 1988, was an experimental project designed to come up with easier ways to search Internet files for content. The team was headed by Brewster Kahle, then of Thinking Machines Corporation (a producer of parallel-processing computers), in cooperation with Apple Computer, Dow Jones, and KPMG Peat Marwick. The goal was to create a way to easily search large amounts of text, images, or other files scattered among different computers.
Based on the ANSI standard search-and-retrieval protocol Z39.50, then under development, WAIS allows users to type in a question in their own natural language; it does not force the user to learn and use a particular computer language or syntax. The WAIS software then does the dirty work of translating the question into the WAIS query format and sending it off to various WAIS servers on the Internet. The servers in turn search their full-text indexes and return to the user a list of hits, or matches, ranked by how well they match the original query. When the user selects a hit, the WAIS server retrieves the matching document.
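The idea of returning hits ranked by how well they match can be illustrated with a very small full-text search. The Python sketch below is not WAIS's actual scoring method: it substitutes a simple word count. Common words are dropped from the question, each document is scored by how often the remaining words appear in its text, and the documents are returned best first. The stop-word list, document names, and texts are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical list of common words ignored in a natural-language question.
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "for", "is", "are",
              "what", "which", "how", "do", "does"}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_documents(question, documents):
    """Rank documents by how often the question's key words occur in them.

    documents maps a document name to its full text. Documents sharing no
    key words with the question are omitted; the rest come back best first.
    """
    key_words = set(tokenize(question)) - STOP_WORDS
    hits = []
    for name, text in documents.items():
        counts = Counter(tokenize(text))
        score = sum(counts[word] for word in key_words)
        if score > 0:
            hits.append((name, score))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))  # highest score first
    return hits


docs = {
    "ferns.txt": "Ferns grow well in deep shade and moist soil.",
    "roses.txt": "Roses need full sun; they do poorly in shade.",
    "cacti.txt": "Cacti need sun and very little water.",
}
print(rank_documents("Which plants grow well in shade?", docs))
```

Real full-text engines refine this in many ways, for instance by giving rare words more weight than common ones, but the shape of the answer is the same: a ranked list of hits from which the user picks a document to retrieve.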
The companies that had formed the development team made the details and program source code for WAIS publicly available because they realized that WAIS would be much more useful--and have a shot at becoming an industry standard--if it were commonly accepted and used by others across the Internet.
Although WAIS as a front end, or client, hasn't caught on nearly as well as Gopher or WWW, it has become the search mechanism of choice, running in the background of other front-end systems.
The World-Wide Web was invented by Tim Berners-Lee in 1989 in an attempt to efficiently store research data at CERN, the European Particle Physics Laboratory in Geneva, Switzerland. Berners-Lee, a consultant with a background in text-processing software development, wanted a system that would make it easy for various researchers to build up separate bodies of information and then link them electronically, following the real connections in the information (such as going from a file about horses to more specific files about thoroughbreds, quarter horses, or Olympic three-day eventing). He based the system on the concept of hypertext, or text with links that can be followed electronically to other documents, files, sounds, images, or even programs. The WWW system allows hypertext to link files on different computer systems. The main power of hypertext lies in its ability to link different pieces of information in simple ways, at exactly the spot at which you thought of the connection. In a way, hypertext links are like footnotes, except that they are easier to follow and can be of any length. For example, if this page were hypertext, it could have links to the history of hypertext, examples of hypertext, and even a video of someone discussing hypertext. This can be distracting if you just want to read from beginning to end with no detours. But the beauty of hypertext is that no one forces you to follow all the links. You follow what interests you and ignore the rest.
The World-Wide Web system is known by various names. WWW, W3, and the Web are all common, but because the system uses the HyperText Transfer Protocol (HTTP) and HyperText Markup Language (HTML), its servers are technically called HTTP servers. Although we'll refer to them as Web or WWW servers, you will often see the term HTTP server in technical documentation.
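HTTP itself is a plain-text conversation. A browser sends a short request naming the file it wants; the server answers with a status line (such as HTTP/1.0 200 OK), some headers, a blank line, and then the file. The Python sketch below builds a minimal HTTP/1.0 request and splits a status line into its parts. The function names are hypothetical, the request reuses the Jennings Florists address from the start of this chapter, and a real browser sends more headers than this.

```python
def http_get_request(host, path):
    """Build a minimal HTTP/1.0 request asking a Web server for one file."""
    return (
        f"GET {path} HTTP/1.0\r\n"  # method, path on the server, protocol version
        f"Host: {host}\r\n"         # required by HTTP/1.1; commonly sent with 1.0 too
        "\r\n"                      # a blank line ends the request
    )


def parse_status_line(line):
    """Split a response's first line, e.g. 'HTTP/1.0 200 OK', into its parts."""
    parts = line.split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase is optional
    return version, code, reason


request = http_get_request("www.islandnet.com", "/JenningsFlorists/")
print(repr(request))
print(parse_status_line("HTTP/1.0 404 Not Found"))
```

The numeric code is what software acts on (200 means the file follows, 404 means it was not found); the reason phrase is for human readers.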
One important contribution of WWW was the Uniform Resource Locator (URL). This address system allows you to declare the name and port number (a port is like a doorway, or loading dock, of a computer) of the host computer, the protocol (type of connection, such as FTP, Gopher, and so on), and the directory path and file name, all on one line. URLs have become a convenient method of passing a link on to some data source on the Internet. Net users made it common practice to include a URL for their home page (a personal spot on the Web) at the end of their e-mail messages. In fact, the URL system has been incorporated in the Gopher server software put out by the University of Minnesota.
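Reading a URL comes down to splitting it into the pieces just listed. The Python sketch below does this with the standard library's `urllib.parse.urlsplit` and fills in the usual default port when the URL does not name one (80 for HTTP, 21 for FTP, 70 for Gopher). The `describe_url` helper and the gopher.example.edu address are hypothetical; the other two URLs appear earlier in this chapter.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when a URL does not name one.
DEFAULT_PORTS = {"http": 80, "ftp": 21, "gopher": 70}


def describe_url(url):
    """Split a URL into protocol, host, port, and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,  # type of connection: http, ftp, gopher...
        "host": parts.hostname,    # name of the host computer
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",  # directory path and file name
    }


print(describe_url("http://services.bunyip.com:8000/products/archie/archie.html"))
print(describe_url("http://www.islandnet.com/JenningsFlorists/"))
print(describe_url("gopher://gopher.example.edu"))
```

The Archie URL names port 8000 explicitly; the Jennings Florists URL leaves it out, so a browser assumes the standard Web port, 80.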
The other main innovation of WWW was HTML, or HyperText Markup Language. Briefly (Chapter 4 goes into more depth), HTML is a relatively simple set of codes that turns ordinary text into hypertext when viewed by a WWW browser. The beauty of the system is that a file need not be viewed on the same kind of computer that created it. In other words, an HTML file created on a Macintosh will look pretty much the same when viewed on a PC or a UNIX workstation. And, more important, all the links function in exactly the same way. The ability to move a single file between different types of computer systems and have it work the same way in all of them is called portability, an essential ingredient when you want the whole world to use something.
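To make the idea concrete, here is a tiny HTML page and a short Python sketch that picks out its links, roughly as a browser must before it can make them clickable. The page is hypothetical except for the Jennings Florists URL from the start of this chapter; `LinkCollector` and roses.html are made up for illustration. The sketch uses only the standard library's `html.parser` module.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


page = """<html><head><title>Sample Page</title></head>
<body>
<h1>Flowers</h1>
<p>See our <a href="http://www.islandnet.com/JenningsFlorists/">gift baskets</a>
or read about <a href="roses.html">roses</a>.</p>
</body></html>"""

collector = LinkCollector()
collector.feed(page)
print(collector.links)  # ['http://www.islandnet.com/JenningsFlorists/', 'roses.html']
```

Because the page is ordinary text with tags, the same file yields the same two links on any computer, which is the portability just described. The first link is a full URL; the second is relative, so a browser looks for roses.html in the same directory as the page itself.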
Although WWW was seeing some success--the number of servers was increasing steadily--the creation of a WWW browser called Mosaic by the National Center for Supercomputing Applications (NCSA) in Champaign-Urbana, Illinois, led to an explosion of interest in WWW. This program, which was being given away free, was created first for X-Windows on UNIX and soon after for Macintosh and PCs running Microsoft Windows. Often called the killer application of the Internet, Mosaic made the Internet sexy by adding support for graphics. Gopher was nice and easy to use, but Mosaic was fun and much more impressive in what it could display, mostly because publishers could interweave text and images in documents to create the equivalent of a glossy brochure distributed on the Internet. Among the more popular early demonstrations of WWW was a tour of the Krannert Art Museum at the University of Illinois. Gopher could provide the same information but as menus, and the user could look at one element at a time but not the whole thing. That is, with Gopher you keep coming back to a menu from which you choose a text file, an image file, or a sound file. With WWW you can look at text and images together--perhaps several images side by side and surrounded by text.
This difference has a big effect on how a document looks when browsed on the Internet. Imagine a colorful WWW page with buttons or links that play sound files or run movies as soon as you select them. This capability wows the user but puts a heavier load on the network and requires more sophisticated equipment. Gopher was designed for use by slow, relatively old computers, with no large drain on network resources. But WWW and Mosaic were designed in high-intensity computing environments, in which all participants had fast Internet connections and sophisticated workstations (although they will work on a 386 PC running Windows with a 14,400-bps modem).
When you start publishing on the Internet, it is good to remember that many people who want access to your information may in fact have slow Internet connections and not very powerful computers. If your company's home page takes 10 minutes to download and can only be appreciated on a top-of-the-line workstation, you may not get the audience you want. The reality of WWW is that graphic presentations are more attractive and exciting than plain text, but people won't see them if it takes five minutes or more for those images to appear on their screens. Your images won't slow down text-only Web browsers like Lynx (because Lynx simply ignores them), but those using text-mode browsers won't see and enjoy your images.
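A little arithmetic shows how quickly images add up on a slow connection. The Python sketch below computes a best-case transfer time: file size in bits divided by the modem's speed in bits per second. It ignores protocol overhead, compression, and line noise, so real downloads are slower; the 100 KB image size is a hypothetical example.

```python
def download_seconds(size_kilobytes, bits_per_second):
    """Best-case transfer time: ignores protocol overhead, compression, and line noise."""
    bits = size_kilobytes * 1024 * 8  # 1 KB = 1,024 bytes = 8,192 bits
    return bits / bits_per_second


one_image = download_seconds(100, 14_400)  # one hypothetical 100 KB picture
print(round(one_image))                    # 57 (seconds)
print(round(5 * one_image / 60, 1))        # 4.7 (minutes for five such pictures)
```

Even under these ideal assumptions, a page with five such pictures keeps a 14,400-bps modem user waiting nearly five minutes, which is exactly the audience problem described above.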
Internet publishing consists of putting material on the Internet where it can be viewed or downloaded from other computers. This process has evolved both in its user interface and in the tools built to help people find that material.
We've covered a lot of ground in this chapter, so let's review the terms--you're going to need them to understand the rest of this book:
Three of the most popular publishing protocols are Gopher, World-Wide Web (WWW), and Wide Area Information Server (WAIS). They are three different protocols, or techniques for doing basically the same thing--locating a computer that has something you want, connecting to it, finding the file you're interested in, and then downloading or viewing it. Each consists of computers that act as servers, which wait for requests from other computers (clients), and then send the files you request back over the Internet. Internet publishers build servers. Internet browsers and surfers use clients.