Internet Publishing Handbook - Copyright © 1995 by Mike Franks

CHAPTER 2: Groundwork

This chapter discusses the groundwork you'll need to do and some of the steps involved in publishing on the Internet. Take a look at the list in the sidebar on page 20, and then we'll go through the steps one by one. Later, in Chapters 3, 4, and 5 on Gopher, WWW, and WAIS, we'll go through these steps in detail for each type of Internet server and use examples. You need not go through all these steps now, and they are not meant to scare you off or make the process seem intimidating. Instead, they are meant as a guideline, a sort of checklist for keeping track of all the various elements involved.

Joining the Internet Community

Think of the Internet as a place to which you are moving and that you want to fit into. It is a community with its own set of traditions and values. And as in any community (in this case a tremendously large community), smaller groups gather to discuss different issues. Your first job is to join the Internet community in the sense of learning how to make your way around and how to fit in. On the Internet you'll find a community of individuals, companies, and organizations that are discussing issues, modifying software, and creating new tools and techniques for publishing on the Internet, as well as publishing the most amazing variety of information. Keeping abreast of their activities through Usenet newsgroups and e-mail mailing lists and tracking the relevant subjects through Gopher and WWW indexes are essential parts of your publishing process because, like it or not, you are joining a community, and you need to be aware of what's going on out there.

The first step is to start getting to know the people, organizations, and issues that shape what you do on the Internet. You can simply watch and read without posting messages, if you want. Lurking, the Net term for eavesdropping, is an accepted and common practice on the Internet. When you feel more comfortable, you can start to offer your experiences, opinions, and questions.

Now we'll go through the kinds of resources you'll come across and explain them in a little more detail. The resources mentioned here have to do with the Internet or Internet publishing in general. Later on, in Chapters 3, 4, and 5 on Gopher, WWW, and WAIS, you'll find tables of resources for each kind of server. Chapters 7 and 9, on Internet commerce and copyright issues, also offer resource tables.

Usenet Newsgroups

Usenet News is an extremely large, international, cooperatively run system for exchanging messages. Usenet News is not the same as newspaper news, although there's a company called Clarinet <http://www.clarinet.com> that provides Associated Press and Reuters newsfeeds through Usenet News for a fee. Most of Usenet News consists of messages from individuals, which are like e-mail, except that they are divided by topics. Usenet News is organized into thousands of hierarchically arranged newsgroups, each of which focuses on a specific subject. Anyone can send and receive these messages. Depending on your site, you may get a full newsfeed or only partial information--certain newsgroups might not be carried. Disk space often is the limiting factor because a full Usenet newsfeed, consisting of all messages from all newsgroups all over the Internet, generates more than 5,400 megabytes per month.
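To get a feel for what that volume means for disk planning, the arithmetic can be sketched as follows. Only the 5,400-megabyte monthly figure comes from the text; the 30-day month, the two-week retention period, and the function names are assumptions made for illustration.

```python
# Rough disk-space arithmetic for a full Usenet newsfeed.
# MONTHLY_FEED_MB comes from the text; everything else is an assumption.
MONTHLY_FEED_MB = 5400
DAYS_PER_MONTH = 30  # assumed average month length


def daily_feed_mb(monthly_mb=MONTHLY_FEED_MB, days=DAYS_PER_MONTH):
    """Average megabytes of news arriving per day."""
    return monthly_mb / days


def disk_needed_mb(retention_days, monthly_mb=MONTHLY_FEED_MB):
    """Disk space needed to keep `retention_days` of articles before expiring them."""
    return daily_feed_mb(monthly_mb) * retention_days


if __name__ == "__main__":
    print(f"Average daily volume: {daily_feed_mb():.0f} MB")
    print(f"Two weeks of retention (assumed): {disk_needed_mb(14):.0f} MB")
```

A site that carries only some newsgroups can rerun the same calculation with a smaller monthly figure.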

A newsgroup is basically a subject area that is created so that those who are interested in that subject may exchange messages. They are hierarchically arranged, starting from some major categories:

Other categories are often specific to a country or a university. And by no means are all newsgroups in English. New newsgroups are created by vote, except in the alt newsgroups, where the process is less formal.
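The hierarchical naming can be pictured as a tree in which each dot-separated part of a newsgroup name is one level. The following is a minimal sketch of that idea, not a description of how any particular news server stores its groups; the function names and the short sample list are assumptions chosen for illustration.

```python
def build_hierarchy(newsgroups):
    """Nest dotted newsgroup names into a tree of dictionaries."""
    tree = {}
    for name in newsgroups:
        node = tree
        for part in name.split("."):
            # Create the level if it is new, then descend into it.
            node = node.setdefault(part, {})
    return tree


def print_tree(tree, indent=0):
    """Print the tree with two spaces of indentation per level."""
    for part in sorted(tree):
        print(" " * indent + part)
        print_tree(tree[part], indent + 2)


if __name__ == "__main__":
    sample = [
        "comp.infosystems.gopher",
        "comp.infosystems.wais",
        "news.answers",
        "alt.answers",
    ]
    print_tree(build_hierarchy(sample))
```

News readers typically present groups in a similar nested fashion, which is why related groups appear next to each other in the list.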

News-reading programs exist for almost any computer platform conceivable--from mainframes to plain vanilla DOS PCs. Usually, the first step is to verify that you have access to a newsfeed. Then download or buy news-reading software and point it at the Usenet News server you'll be using. You'll soon see a long list of newsgroups. (I saw more than 8,000 newsgroups when I did this last at the University of California at Los Angeles, but UCLA has a Clarinet feed, which adds several thousand newsgroups.) You then select those newsgroups to which you wish to "subscribe." (There's no charge unless you license a commercial service such as Clarinet.) This isn't a permanent decision--it just helps narrow the focus a little.

Once you've subscribed to some newsgroups, you'll see (depending on your news reader) a number of "threads" or discussion chains. Reading the messages is very much like reading e-mail. An excellent starting point is Usenet Info Center, run by Kevin Atkinson, a 17-year-old student from Maryland. <http://sunsite.unc.edu/usenet-b/home.html>

Usenet newsgroups are an extremely dynamic source of information on the Internet. The messages look like e-mail but won't pile up in your mailbox. People pick the newsgroups that interest them and periodically monitor their threads (discussions) to see what is happening. Asking questions is fine after you've monitored the group for a little while and looked at any FAQ (Frequently Asked Questions) files available.

Alternative newsgroups are the easiest to create, and they are often formed as an immediate reaction to something, either a new piece of software or a new protocol or just an idea. Later, if a subject gains wider interest, it might be voted in as an official newsgroup in comp or one of the other domains. The new newsgroup might even split off into subgroups, focusing on specific aspects of a subject. This usually happens because the experts don't want to be bothered by lots of beginner questions.

Here are some newsgroups to check out. More are listed in each chapter's resource table:

FAQ (Frequently Asked Questions) Files

FAQ files are often extremely useful repositories of beginner and intermediate questions in different areas. They are usually composed by the regular readers of a newsgroup, so that habitués aren't constantly distracted by questions from newcomers. By making all the usual questions and answers available, the newsgroup can concentrate on newer, more interesting issues.

FAQs pop up on the most amazing subjects, and they are periodically reposted to the Usenet newsgroup news.answers. Check out Figure 2-1 for the first of 28 screens of the FAQ archive at the Massachusetts Institute of Technology (MIT). This directory of FAQs is located at <ftp://rtfm.mit.edu/pub/usenet/news.answers/>.

RFCs

RFCs (Requests for Comments) are a valuable resource for the more technically inclined; I mention them here because they demonstrate an important part of Internet culture: the development of new technology through open and widespread discussion. The RFC process works as follows. First, someone develops an idea (which could be a new protocol, standard, or even a new type of service or specific tool); when they start to get serious about it, they write up an RFC document, which describes the idea in detail. At that point people start using the material and suggest changes, improvements, and alternatives. This discussion can result in amendments to the original RFCs or in new RFCs. The titles may sound computer nerdish, but there's nothing like source documents. And that's what RFCs are.

Eventually, either the idea dies because of lack of interest, or it becomes a recognized guidepost (standard has a more official meaning) for Internet programmers, developers, and users. In this way the Internet grows in sophistication by encouraging the open development and sharing of ideas. See Figure 2-2 for a short list of recent RFCs.

RFCs are available online via Gopher, FTP, and e-mail. For retrieval information see <gopher://ds1.internic.net:70/00/rfc/rfc-retrieval.txt> or <ftp://ds.internic.net/rfc/>. Paper copies of all RFCs are available from InterNIC Information Services. For more information send e-mail to info@is.internic.net or call 800-444-4345 (choose prompt 3 from the InterNIC voicemail menu).

Mailing Lists

Mailing lists are e-mail-based systems for carrying on discussion groups. (Note: You'll hear these referred to as list servers or listservs as well, but list servers are actually the hardware/software for mailing list administration.) Basically, a computer is set up to accept mail at a certain address and redirect it to all members of the list. The list members do not even need to know each other, so long as they have "subscribed" to the list. Although many mailing lists are mirrored, or duplicated, on Usenet newsgroups, many are available only as mailing lists. One advantage of mailing lists is that you will get the messages whether you look for them or not, usually within a few hours of when they are sent. So if you're the type who needs reminding, you'll like them. Mailing lists are also easy to create and so are useful for timely or focused topics with a limited audience. One-way mailing lists are often used for announcements of product upgrades.
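The redirect step at the heart of a mailing list can be sketched in a few lines. This is only an illustration of the idea, not how any particular list server is implemented: the function name, the addresses, and the choice to point replies back at the list are all assumptions, and nothing is actually sent.

```python
from email.message import EmailMessage


def redistribute(incoming, list_address, subscribers):
    """Return one copy of `incoming` for each subscriber of the list."""
    copies = []
    for subscriber in subscribers:
        copy = EmailMessage()
        copy["From"] = incoming["From"]
        copy["To"] = subscriber
        # Assumed policy: replies go back to the whole list.
        copy["Reply-To"] = list_address
        copy["Subject"] = incoming["Subject"]
        copy.set_content(incoming.get_content())
        copies.append(copy)
    return copies


if __name__ == "__main__":
    # Hypothetical post and membership, for illustration only.
    post = EmailMessage()
    post["From"] = "reader@example.com"
    post["To"] = "net-publishers@example.org"
    post["Subject"] = "Gopher or WWW for a small catalog?"
    post.set_content("Which server type did you start with?")

    members = ["ann@example.com", "raj@example.net"]
    for copy in redistribute(post, "net-publishers@example.org", members):
        print(copy["To"], "<-", copy["Subject"])
```

A one-way announcement list would work the same way, except that only the list owner would be allowed to post.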

Additionally, most mailing lists have archives, in which all their correspondence is saved. You should learn how to search these archives; they are an excellent resource. When you join a mailing list, you usually will receive a message that describes all the commands available for that list. If the list offers search features, they'll be mentioned there. If it doesn't, or if you don't want to subscribe to the list, try searching some Internet indexers to see if someone has put together an archive for that particular mailing list. I've found several this way. Mailing list archives can be a useful source of information and a record of discussions of issues of interest. Knowing how to retrieve all messages containing specific key words can be quite helpful. The following are some mailing lists that focus on Internet issues:

Magazines and Journals (Print)

Newsstand and postal subscription magazines and journals that focus entirely on the Internet are starting to arrive on the scene, and established magazines are paying more attention to the Internet and the subject of cyberspace. Even Newsweek has a regular section called Cyberscope, which often includes URLs for interesting sites. Here are some examples of print magazines, available by subscription or at the newsstand, about the Internet:

Internet Business Advantage
$99/year (12 issues). This magazine is published by Wentworth Worldwide Media. 800-638-1639; fax 717-393-5752; e-mail success@wentworth.com.
Internet Business Journal
$149/year (12 issues). This journal offers Internet business advice and includes a column called Internet Advertising Review <http://www.phoenix.ca/sie/iar-home.html>. It is published by Michael Strangelove, author of How to Advertise on the Internet. <http://www.phoenix.ca/sie/ibj-home.html>; 613-565-0982; fax 613-565-4433; e-mail subscriptions@strangelove.com.
Internet World
$29/year (12 issues). This magazine is put out by MecklerMedia and offers interviews, columns, and advice for both new and experienced Internet users. It is available at newsstands. 815-734-1261; 800-573-3062; fax 800-896-1666; e-mail info@mecklermedia.com; <http://www.mecklerweb.com/>
NetGuide
$23/year (12 issues). This magazine, published by CMP Publications, Inc., bills itself as the "Guide to Online Services and the Internet." It is available at newsstands. 800-829-0421; fax 516-562-7406; e-mail netmail@netguide.cmp.com; <http://techweb.cmp.com/net>.
Wired
$40/year (12 issues). This magazine has good interviews and prides itself on reporting on the latest and boldest in computer technology. It is available at newsstands. Back issues are available through its e-mail mailing list. Send e-mail to info@wired.com for details. An online version called HotWired is available at <http://www.hotwired.com/>. 800-SOWIRED; 415-222-6200 outside U.S.; fax 415-222-6399; e-mail talkzsubs@wired.com.

Online Magazines and Journals

Online magazines are growing rapidly. In the list that follows I've included both print magazines that are starting to offer some or all of their text online, as well as true online magazines and journals that are published entirely on the Internet. You should check them out for both content and style. Some are specifically oriented to Internet publishing issues, whereas others are computer magazines that talk about Internet issues, among other things. As with any online service you come across, elements of their layout style and design may be worth emulating. Hundreds of other online journals and "zines" focus on noncomputer topics. Following are just a few of the magazines and journals available online.

Computer-Mediated Communications Magazine
CMC Magazine is a free, entirely online publication that does not accept advertising. It is edited by John December, co-author of World Wide Web Unleashed, and reports on people, events, technology, public policy, culture, practices, research, and applications of computer-mediated communication. Computer-mediated communication is the academic term for what goes on when people communicate with each other via computer. WWW, Gopher, and Internet servers are a subset of computer-mediated communication. E-mail, Usenet News, and mailing lists are other examples of computer-mediated communication. CMC Magazine also publishes opinions and essays about issues related to computer-mediated communication. <http://sunsite.unc.edu/cmc/mag/current/toc.html>
GNN (Global Network Navigator)
This WWW site developed by O'Reilly and Associates (publishers of Ed Krol's Whole Internet User's Guide and Catalog) is one of the best and most popular examples of Internet publishing. It combines an online journal with links to Internet services and the O'Reilly book catalog. A sign of their success and the importance of the Internet: America Online purchased GNN in June 1995. <http://www.digital.com/gnn/GNNhome.html>
HotWired
This is the online version of Wired magazine, and it is considerably different from the print version. HotWired has five channels that the editors call "doorways into the digital revolution." These include technology, "way new" journalism, the arts, commerce, and electronic conversation. Unfortunately, using them requires registration, which is free so far. Be aware that you have to remember your member name and password to get back in. I wonder how many people end up re-registering because they forget their password. <http://www.hotwired.com/>
Public-Access Computer Systems Review
This free journal published by the University of Houston is primarily aimed at librarians and others who maintain publicly accessible computers. <gopher://info.lib.uh.edu:70/11/articles/e-journals/uhlibrary/pacsreview>
San Jose Mercury News
This daily newspaper in California's Silicon Valley has tons of computer industry and Internet news and is available by online subscription. The stories online are updated hourly. <http://www.sjmercury.com/>
St. Petersburg Times Interactive Media
This is an experimental site for the Interactive Media Department of the St. Petersburg Times in Florida. <http://www.times.stpete.fl.us/default.html>
TechWeb
This site by CMP Publications, Inc., offers links to many of the CMP magazines, including Communications Week, Comm Week International, Computer Reseller News, Computer Retail Week, Electronic Buyers News, Electronic Engineering Times, Home PC, Information Week, Interactive Age, Internet Business Report, NetGuide, Network Computing, OEM Magazine, Open Systems Today, VAR Business, and Windows Magazine. As of May 1995 access to the last three months of its articles was free and entirely searchable. The search feature is WAIS based, and you can choose all magazines or just specific ones and then perform the search. <http://www.techweb.com>
Ziff-Davis Publishing
This site offers links to many of the Ziff-Davis magazines, including PC Magazine, PCWeek, PC/Computing, MacWeek, MacUser, Computer Shopper, and Windows Sources. Unfortunately, it does not provide full text, but only a sampling of what's in the print version. One convenience is the ability to download shareware mentioned in the magazines. <http://www.ziff.com/>

Also see this index of online journals: <http://www.w3.org/hypertext/DataSources/bySubject/Electronic_Journals.html>.

Internet Sites and Services of Interest to Internet Publishers

Look around the Internet for relevant standards and operating procedures. Sometimes these will come out in FAQ files or in comments on mailing lists or in newsgroups. Again, the point here is to know what the boundaries are, whether they are legal, institutional, or just "good taste." The point is to be a good Internet citizen. To do that you need to know the rules, written as well as unwritten.

The following sites offer Internet guidelines and reference information that you might find useful:

InterNIC
This site offers detailed information on the operation and resources of the Internet itself. The InterNIC is really three different services, maintained by separate companies under contract with the National Science Foundation. The service areas are Information Services (run by General Atomics), Directory and Database Services (run by AT&T), and Registration Services (run by Network Solutions, Inc., or NSI). <gopher://gopher.internic.net> or <http://www.internic.net> One particularly useful site provided by InterNIC is its archive, Internet Documentation (RFCs, FYIs [For Your Information files], and so on). <gopher://ds1.internic.net:70/11/.ds/.internetdocs>
Quality, Guidelines, and Standards for Internet Information Resources
This is a collaborative gathering of thoughts and ideas on the subject of improving information servers of all kinds. Among other lists and documents to which it provides links is "Top Ten Things Not to Do on a Web Page." <http://coombs.anu.edu.au/SpecialProj/QLTY/QltyHome.html>
The Directory of Electronic Journals, Newsletters, and Academic Discussion Lists
The Association of Research Libraries helps to maintain this site, which gathers links to online publications. <gopher://arl.cni.org:70/11/scomm/edir/>

Organizations and Associations

You may not be a joiner, but you should be aware of these organizations, because they will be helping to shape the Internet in the years to come:

The Internet Society
This international nonprofit society's principal purpose is to "maintain and extend the development and availability of the Internet and its associated technologies and applications--both as an end in itself, and as a means of enabling organizations, professions, and individuals worldwide to more effectively collaborate, cooperate, and innovate in their respective fields and interests." It offers both individual and organization memberships. Voice: 800-468-9507 (United States only) or 703-648-9888. <http://info.isoc.org/> <gopher://gopher.isoc.org/> <ftp://ftp.isoc.org/isoc>
The Electronic Frontier Foundation (EFF)
The EFF is a nonprofit civil liberties organization working to protect freedom of expression, privacy, and access to online resources and information. It was founded in July 1990 by John Barlow, Mitch Kapor (founder of Lotus), Steve Wozniak (cofounder of Apple), and others to ensure that the principles embodied in the U.S. Constitution and the U.S. Bill of Rights are protected as new communications technologies emerge. The Web site includes archives of articles and court cases pertaining to most major social, political, and legal online issues. <http://www.eff.org/>. 202-861-7700; fax 202-861-1258; e-mail ask@eff.org.
Internet Engineering Task Force (IETF)
The IETF is a large open international community of network designers, operators, vendors, and researchers concerned with the evolution of Internet architecture and the smooth operation of the Internet. It is open to any interested individual. The IETF organizes working groups for the technical work of designing and developing Internet protocols. <http://www.ietf.cnri.reston.va.us/>
The World-Wide Web Consortium (W3C)
W3C operates under the leadership of Tim Berners-Lee, the author of WWW, and was formed to document and encourage the development of a common set of tools and basic programs for continued WWW development. It is the closest thing to the "owner" of the Web. <http://www.w3.org/>
Electronic Privacy Information Center (EPIC)
EPIC, based in Washington, D.C., is a public interest organization established in 1994 to focus public attention on emerging privacy issues relating to the National Information Infrastructure (better known as the Information Superhighway), such as the Clipper Chip, the Digital Telephony proposal, medical records privacy, national identification systems, and the sale of consumer data. EPIC is sponsored by the Fund for Constitutional Government and Computer Professionals for Social Responsibility. EPIC publishes online newsletters and reports, pursues litigation under the Freedom of Information Act, and conducts policy research on emerging privacy issues. <http://epic.digicash.com/epic/> or send e-mail to info@epic.org.

Internet Indexers and Directories

Many excellent books and online guides are available if you need help finding what's out there. Indexes, catalogs, and resource directories abound on the Internet; some that you may find useful appear in Table 2-1.

Make notes as you search through this material, because later you will want to be sure that your service is listed in these indexes, directories, newsgroups, and mailing lists. Plus, the problems you have now will be similar to the challenges facing your potential users. Keep notes and learn from this experience. You'll never be less familiar with the Internet than you are today.

Also, learn from your competitors. Pay attention to how servers are set up and how documents are arranged, as well as what special features they provide to their users. For example, many sites allow you to search by key term, but one equipment manufacturer allows you to search by picture. You start with general categories and narrow down to the specific part. The company also has built and patented its own searching system.

Pay attention to feedback, searching, and charging mechanisms to see how easy and intuitive they are to use and which charging systems they are using.

Netiquette

Netiquette, short for network etiquette, is important. It is far easier to offend someone or tread on someone's toes when you can't see their face. Likewise, it is far easier to take offense because online interaction excludes nuances of rhythm, mood, and context and all body language. Here are some rules to live by:

  1. Never respond rashly to provocation on the Internet. Although "flame wars" (angry messages are called flames) have often broken out on unmoderated mailing lists and Usenet newsgroups, no one encourages the practice. If you must respond, do it offline (that is, in private e-mail, not to the group).
  2. Criticize ideas, not people. Try to be as constructive in your criticism as possible.
  3. Watch or lurk for a while in any mailing list or newsgroup to get a feel for the tone and to avoid asking "newbie" (newcomer) questions.
  4. Do your homework before asking questions in a mailing list or newsgroup. Look for the relevant FAQ files and archives to avoid asking questions that have already been answered millions of times.
  5. Think carefully before sending a message. Remember that you are making your reputation internationally through the messages you send out. And messages last far longer than you'd expect.
  6. When you ask for responses and say you'll summarize for the list, do it.
  7. Answer questions when you can, but check whether others have already supplied the information.
  8. Stay on the topic of the newsgroup or mailing list. No matter how strongly you feel about a subject, it is not appropriate to send information or opinions about it to unrelated groups.

Most books about exploring the Internet talk about Internet manners, but you might want to look at Netiquette by Virginia Shea, published by Albion Books. Both online and print versions are available. <http://www.bookport.com/Albion/catNetiquette.html>

Identifying Your Competition

A word about competition might be appropriate here. Remember that the Internet was built, and continues to expand, on the basis of cooperation. You should think about this if you find other services offer similar information or competing products on the Internet. Perhaps you can offer to link to their service if they'll link to yours. Or you might want to join forces with your competitor to form a more comprehensive subject-oriented server. Remember that this is a new medium and the old rules don't always apply.

Competition may not be quite the right word in your case, depending on the nature of the server and service you want to establish. But it is important to be aware of what similar servers already are doing on the Internet. Ideally, your server can work in combination with others to provide broader, more detailed resources to the Internet community instead of striving to kill someone off. Remember, the Internet has reached today's interesting state only because of the incredible spirit of cooperation that it has somehow inspired.

Finding what's out there in your area of expertise is essential. For example, suppose for some reason you decided that your goal in life was to put up a WWW site dedicated to the comic strip "Calvin and Hobbes" by Bill Watterson. Once you were finished, wouldn't you be surprised to learn that five other sites (one each in the United States, France, Holland, Sweden, and Norway) already are dedicated to "Calvin and Hobbes"? This is where the Internet gets cooperative: they all link to each other. <http://www.eng.hawaii.edu/Contribs/justin/Archive/Index.html>

Defining Your Goals

What can you sensibly hope to attain by publishing on the Internet? Why do you want to publish on the Internet? Do you really know what you hope to gain? Here are a few services that you might be able to offer:

Or you might want to publish on the Internet because doing so decreases the amount of time your staff spends searching for or providing information, because you want to experiment with new technology that may benefit you or your company, or because you want to sell things and make money.

One advantage of defining your goals early is that it enables you to look at how others are achieving similar goals. You can become more analytical as you scour the Internet, noting the techniques each site uses and adding them to your repertoire.

And I should say something about experimenting and keeping your goals loosely defined. The Internet is in its infancy as a method of communication, and all the implications are not yet clear. More than one company has found that a side benefit of publishing on the Internet became more important and potentially more profitable than its original goal. AMP, Inc., a builder of electrical connectors, started with the idea of putting its catalog of 80,000 parts on CD-ROMs. Then its customers (mostly high-tech companies) urged AMP to go to the Internet. In the process the company developed a unique scheme for finding exactly the right part by picture, name, or part number. Among other things, AMP found that its sales personnel started selling a wider range of parts than before. AMP has patented the interface it developed and is starting to sell it. You never can tell what you're going to learn and how useful it will be for others.

Identifying Your Audience

What do you know about your audience? If you want to publish information about Macintosh computers, you'll be able to store files in Macintosh formats and ignore other file formats. On the other hand, if you want all those interested in political science to read your work--no matter whether they're coming from a Macintosh, Sun Microsystems workstation, PC, or mainframe--you should plan to put all your documents in a portable format such as plain text or HTML or in a variety of formats.

Who is going to be happy to see your information on the Internet? Or whom do you want to come to your site? The time you spend thinking about your audience will pay off. If the members of your audience tend to flock to certain newsgroups or e-mail lists, you'll want to be sure to announce your server in those places. And you might want to consider offering to archive those newsgroups or mailing lists as a service (and good advertising). If someone is already doing this, you might consider offering to mirror, or duplicate, their service as a way to decrease the load on their server or staff. Here are some questions to ask yourself, broken into categories:

LOCATION

EQUIPMENT

CHARACTERISTICS

SECURITY AND ACCESS

What If I Don't Know?

If you don't know what kind of audience to expect, you should consider adding a poll to your server. That way you can learn more about the people who connect to your server and decide later whether to focus more narrowly or broadly. Depending on the audience, you might post some questions to a newsgroup or mailing list. Or you might persuade someone already running a server to let you post your questions. The responses could be e-mailed to you. Don't forget the possibility of contacting other, more experienced Web, Gopher, or WAIS providers to find out what they think you might expect.

Identifying Your Data

You need to identify and analyze the types of information files you want to make available on the Internet. This decision has implications for your server type and size and for the sort of user interface you design. You will need to be able to answer the following questions for each type of information you plan to publish:

In addition, think through what happens after your server is set up. If your data are timely, you should plan to update regularly, and be sure that the date you last updated the file is a prominent feature. You will also have to think about what this means in terms of staff support.

Another part of identifying your data consists of finding and recruiting information providers, or those people or organizations that will provide you with what you plan to publish. For example, you might provide company press releases and arrange for the public relations departments to supply them to you with dates for posting and removal. You might start a literary magazine and collect short stories from high school students in your area. Or you might have a companywide telephone and e-mail directory that might be kept current by the department administrators. Be careful to verify the ownership of the files and data you publish and check for any copyright issues. Don't assume that because something is already on the Internet it's freely available for your use. See Chapter 9 for a discussion of copyright and the Internet.
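The press-release arrangement described above, in which each item arrives with dates for posting and removal, can be sketched as a simple filter. The record layout, field names, and sample releases below are hypothetical and serve only to show the idea; a real server would read this information from wherever your providers deliver it.

```python
from datetime import date


def live_items(items, today):
    """Return the titles of items whose posting window includes `today`.

    An item is live from its `post_on` date up to, but not including,
    its `remove_on` date. A `remove_on` of None means no removal date.
    """
    return [
        item["title"]
        for item in items
        if item["post_on"] <= today
        and (item["remove_on"] is None or today < item["remove_on"])
    ]


if __name__ == "__main__":
    # Hypothetical press releases supplied by a public relations department.
    releases = [
        {"title": "Spring catalog", "post_on": date(1995, 3, 1), "remove_on": date(1995, 6, 1)},
        {"title": "Summer price list", "post_on": date(1995, 6, 1), "remove_on": date(1995, 9, 1)},
        {"title": "Company overview", "post_on": date(1995, 1, 15), "remove_on": None},
    ]
    for title in live_items(releases, date(1995, 6, 15)):
        print(title)
```

Running a check like this on a schedule keeps stale material from lingering on the server without anyone having to remember each removal date.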

Identifying Your Needs

Internet publishing can be done with little cost and time, or it can be a massive million-dollar effort.

Hardware

Given the kind of information you want to publish and the size of the audience you think you might expect, what kind of computer system is available to you to meet those needs? You can set up a server with low-powered computers, and initially that may be the quickest approach. However, if your service becomes popular, it will demand much more computing power. In either case, consider the cost as well as your familiarity with each type of hardware.

For example, you might start out with an inexpensive PC running Windows or with a Macintosh. These systems are familiar to many and are easy to set up to run Gopher or Web server software (or both in combination). This will let you get your feet wet and explore the possibilities without investing a lot of money. Or you might make space on a PC running Microsoft Windows NT. Apple sells a version of its Macintosh that is optimized to run as a WWW server. If usage exceeds the capabilities of your machine, you can move into more powerful hardware.

Note that larger, more powerful UNIX-based machines are getting easier to use. Sun Microsystems and Silicon Graphics are starting to bundle some models with Web server software and management tools to simplify server administration. See Chapter 11 for a discussion of future trends, but expect that you may end up using a combination of servers of all shapes and sizes.

Network Connection

Do you have a dedicated Internet connection, or do you dial in for Internet access? In the latter case, you wouldn't want your home machine to be your server, because it wouldn't be available most of the time. So you'll need to determine whether Internet publishing is a service your Internet provider offers. Or you can negotiate with a service that, for a fee, will run your server on its equipment. Be sure to clarify what access you'll have for updating your information. See Chapter 8 for a discussion of finding and selecting an Internet service provider.

Software

It may be obvious to you which server (Gopher, WWW, or WAIS) or combination you want to use, but review your decision before you take the plunge. Look through Chapters 3 (Gopher), 4 (WWW), 5 (WAIS), and 6 (Other Tools)--including the servers you don't intend to use--and take a particularly close look at the examples in Chapter 10 for what others did in similar situations. And after you decide which type of server software you'll use, you'll have to decide which version is for you.

Human Resources

Do you have the expertise and time in-house to comfortably run your server? Depending on the size and complexity of the computer system you'll use for your server, you may need to recruit the assistance of a system administrator. The system administrator is the person in charge of all technical aspects of running the computer system. Some of the Gopher, Web, and WAIS software discussed in this book requires a system administrator to get it set up. But once it's running, the data librarian steps in. This is the person or persons who do most of the work involved in collecting, organizing, and preparing the data and documents that go on your server. They don't need much system experience at all, because the process of loading the data into the correct directories can be made quite simple. You may indeed want everyone in your organization to add and keep track of their own files on your Gopher, WWW, or WAIS server. However, you should plan to entrust someone with the overall structure and organization of your server, and that person would be the data librarian.

Make sure you have the technical support you'll need. If you plan to run a server on a system with which you are not familiar, and your system administrator is too busy to help, you should think carefully about what support you'll need and where you'll get it. Check the documentation and the newsgroup or mailing list for the type of server you plan to use to get an estimate of the technical expertise it requires. Internet server software doesn't always require a system administrator, although it's often helpful. It is possible, even preferable, to have your system administrator set up and run the server, then give control of various sections to the data owners or maintainers. In many cases, new files can be published by simply moving the files to a particular directory on the server. Remember, one goal of each of the three main server protocols, Gopher, WWW, and WAIS, was to make Internet publishing easier and more accessible.

What level of staff effort will be required to keep your information current? Often this isn't an issue, because you'll simply put information on your server and replace it every few months or so when it gets updated. But if your company's reputation depends on the timeliness and accuracy of the information your server provides, it would be a good idea to make sure that the burden won't be overwhelming.

Will you need the services of a graphic designer? If you're planning a WWW site, hire or consult with a graphic designer. Many Web sites show the lack of design expertise. Others use graphic design to grab your attention.

Will you need management to change policies or procedures in order to get the information and updates your server will need? Will other departments need to be involved, either in providing or using the information on your server? Will your server change some systems that are in place? Start early in recruiting management support.

Identifying Available Resources

When identifying hardware for use as a server, think about a backup. What happens if the machine breaks down? Once your server becomes established, you won't want to have any down time, planned or otherwise. The Internet is international, so people will be using your server at all hours of the day and night.

Don't underestimate the time and learning involved. The kind of people you recruit should be excited and eager. This will be a learning experience for your whole crew, so be sure that they (and you) are up for it.

Determining How to Make Up the Difference

If you don't have the right computer equipment available, it's often possible to piggyback on another server. The operations can easily be separated from the user's point of view. Users often don't pay attention to which computer they are connected to. Another alternative is renting time and space on a commercial service (see Chapter 8).

One advantage of Internet publishing is that, for the most part, it requires no specialized skill to format and prepare the information. For that reason you might include a broader range of your staff and personnel as data suppliers than you might first have expected. Ideally, the author of each document can be responsible for putting it on your server and keeping it current.

Identifying Charging Mechanisms

How to charge for information--if at all--is an area that is changing rapidly. First, I'd like to make a case for providing some real content or information on your server for free. You don't have to give away your product to be part of the Internet community, but try to offer more than just advertising and order forms. You may well find that by putting up detailed and useful information, you increase your business in other ways. This is the strategy that GE Plastics takes in providing hard technical information about its products and its design guides on its site, which has no ordering or charge-account mechanisms. <http://www.ge.com/gep/homepage.html>

Microsoft provides access to much of its technical support Knowledge Base via the Web although it also sells the information on CD-ROM. The CD-ROM version does have some extras, but an enormous amount of information is available for free online. <http://www.microsoft.com/pages/kb/kb.htm> Putting the information on the Internet also provides customers with an alternative to Microsoft's phone and fax support.

Or you might use your company's server to sponsor a community organization or just provide information about your area as a gesture of community support. Some companies have offered specialized services, such as mortgage calculation or an index to 800 numbers.

But if you absolutely have to charge, some techniques are in use and others are in the advanced planning or testing stages. This is an area that is changing rapidly, so pay close attention to the sites and discussion groups you find that deal with this. Chapter 7 describes and summarizes the different approaches available.

Identifying Security Risks

Methods exist for limiting access to all or part of your server from certain Internet subnets. Be aware of this possibility when laying out your server, because you might want some sections to be available only to those on your system and others to be available to everyone. However, whenever you screen out IP addresses or subnets, you should be aware of the gateway chaining problem. Basically, this means that any gateways (for example, proxy, caching, and e-mail) to your Gopher, Web, or WAIS server should block out the same IP addresses that your server restricts. Otherwise you're leaving a back door open.
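To make the idea of screening by subnet concrete, here is a minimal Python sketch of the check itself. The subnet values and the function name `is_allowed` are made up for illustration; each Gopher, Web, or WAIS server package has its own configuration syntax for access restrictions, so treat this as a picture of the logic rather than a recipe for any particular server.

```python
import ipaddress

# Hypothetical subnets that may see the restricted section of the server.
ALLOWED_SUBNETS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/25"),
]

def is_allowed(client_ip, allowed=ALLOWED_SUBNETS):
    """Return True if client_ip falls inside any permitted subnet."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowed)
```

To avoid the gateway chaining problem, any gateway in front of the server would have to apply the same test to the address of the original requester, not just the server itself.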

The extent to which security is an issue for you depends on the nature of your organization and computer system. For high security you could put up a firewall to protect the rest of your network from unwanted intrusion. A firewall is a method of isolating your company's or organization's computers behind a computer that acts as a gatekeeper, or firewall. All outgoing requests for information or services go to that one machine, which hides the sender's machine address but passes on the request. Any incoming information is sent to the firewall computer, which passes it on to the appropriate internal machine. See Firewalls and Internet Security by William R. Cheswick and Steven M. Bellovin, published by Addison-Wesley in 1994, for an in-depth discussion of this technique.

For lower security you still must ensure against unauthorized alteration of your server files and operating system. Several software developers for Gopher, WWW, and WAIS have already started posting detailed analyses of their server software's security holes and risks, as well as their recommendations. Those links, where available, are described in Chapters 3, 4, and 5 on Gopher, WWW, and WAIS.

Estimating Your Costs

Like anything else, what you will spend depends on what you want to do and, in this case, how many people will come to see what you've done (i.e., browse your Gopher, Web, or WAIS server). Many people mistakenly assume that because dial-up Internet-browsing accounts can cost as little as $15 per month, Internet-publishing accounts are just as inexpensive. The type of Internet access account needed to do what's described in this book is much more expensive, sometimes running hundreds or even thousands of dollars per month, with additional hardware and line installations adding to the cost. That higher price has to do with the types of service available in your area and the Internet throughput (size of the pipeline) you'll need. See Chapter 8 for more information about finding, selecting, and working with an Internet service provider.

The Gopher, WWW, WAIS, or other server software you may use varies widely in price as well. Some of it is available for free, and other programs cost $5,000 to $15,000 plus annual maintenance charges of $1,000 or more. The free software may not be such a bargain when you factor in programming and debugging time.

The type of computer you use will also be an expense, but thankfully there is a movement toward using less expensive PCs and Macintoshes in the place of the larger, more support-intensive UNIX machines. Staff support costs are often underestimated. In the case of WWW servers, you'll want to consider hiring a graphic designer. And editorial staff is usually much more important in Internet publishing than programmers are.

Don't let these costs scare you off. In its June 19, 1995, issue InfoWorld published a poll that showed cost expectations vary widely. Nineteen percent of respondents expected to spend less than $5,000, 13 percent said more than $45,000, and 35 percent didn't know.

Planning the Presentation of Your Data

A dedicated individual, or a team of editorial consultants, graphic artists, and programmers, can develop a sophisticated, well-organized information server. Christine Quinn, Stanford University's director of electrical engineering for computer and network services, describes Stanford's attempts to put a consistent "Stanford" face on its collection of Web servers in an article titled "From Grass Roots to Corporate Image--The Maturation of the Web," which she presented at the 1994 WWW conference in Chicago. Because Stanford was looking for a unifying theme, the university hired a graphic artist to insert pictures of Stanford's famous architecture in its home pages. Quinn says that if you're ready to spend thousands of dollars on a brochure that will be seen by hundreds, why not spend an equal amount to do it right on the Internet, where it will be seen by thousands? <http://www.ncsa.uiuc.edu/SDG/IT94/Proceedings/Campus.Infosys/quinn/quinn.html>

Dale Dougherty of O'Reilly and Associates, publishers of UNIX administration books, describes Internet publishing as a collaborative art. At the 1994 WWW conference in Chicago he recalled how his company's extremely popular Global Network Navigator (GNN) site needed more editorial and production people than technical staffers. Having a graphic artist on board was essential. According to Dougherty, a publisher "organizes an audience," and publishing on the Internet has the same requirements of any publishing enterprise:

Even the presentation of your menus and the design of your hypertext require consideration. Are your users quickly finding the information they want from your server, or are they getting sidetracked by confusing menus?

How the information is presented--known on the Net as the metaphor that is used--can make a big difference in how easily and quickly newcomers learn their way around your information server. A metaphor is a way to describe the information server or how it is arranged by comparing it with something commonly understood. For example, a directory structure is often described as treelike, with a root and many branches. A book is another metaphor; although a book can be emulated in hypertext and has certain advantages, the flexibility of Internet servers may lead to even easier ways to organize information. The Open Software Foundation Research Institute, a nonprofit research and development organization based in Cambridge, Massachusetts, has collected ideas, analogies, and metaphors that might be useful in describing data and its relationships. <http://riwww.osf.org:8001/www/InfoPresForm/> Some of the ways of organizing information that researchers at the institute have come across on the World-Wide Web are shown at <http://riwww.osf.org:8001/www/InfoPresForm/results.html>.

The structure of your organization might provide an obvious structure for your server, but you may prefer a more subject-oriented model based on the main areas your server covers. Here are some examples:

Extensive testing is one of the best ways to make your server and data presentation user friendly. By testing I mean trying out all the links and menus yourself from a variety of browser programs. And keep testing once your server is in use. Links can die or change on you--that link you thought was fascinating and pertinent two months ago may have changed focus or moved to another site. Also test your server's presentation by asking novice users to explore it, and then ask them to summarize what your server contains. You might be surprised by what they miss.

Always keep in mind that users are a resource. They can provide helpful information. On the Internet the bias is toward criticism and feedback (with the assumption that you'll fix what's wrong), so take advantage of that, as well as the ideas your users may provide. In their travels through the Internet they may have come across a design or presentation solution that could be useful to you. Don't forget that they may have seen and used many more servers than you have. Actually, it's common to borrow techniques from others on the Internet.

Another method is to try different approaches to presenting the same data and then use your system's logs to determine which route browsers use most.
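As a rough sketch of that comparison, the Python function below counts how many requests in a log went through each of two alternative routes. It assumes log lines in the Common Log Format that many Web servers write, where the request appears as the first quoted field. The function name and the route prefixes `/by-dept/` and `/by-subject/` are hypothetical, and your own server's log layout may differ.

```python
from collections import Counter

def count_routes(log_lines, routes):
    """Count requests whose path starts with each route prefix."""
    counts = Counter({route: 0 for route in routes})
    for line in log_lines:
        try:
            # The request is the first quoted field: "GET /path HTTP/1.0"
            request = line.split('"')[1]
            path = request.split()[1]
        except IndexError:
            continue  # skip lines that do not look like a request
        for route in routes:
            if path.startswith(route):
                counts[route] += 1
                break
    return counts
```

Calling `count_routes(lines, ["/by-dept/", "/by-subject/"])` on a few weeks of log lines would give a first impression of which arrangement your users prefer.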

Different people tend to learn in different ways. Psychology tells us that people have four basic learning styles:

Keep these styles in mind when designing your server. It might behoove you to present your crucial information in several different ways. For example, you could present a single concept with a written explanation, a diagram, a step-by-step example, and a discussion of the benefits and implications. Cater to the different types of learners in your audience.

Preparing Your Data

Gopher servers generally need less data preparation than WWW and WAIS servers. Usually, you just pop the file in place and it's instantly available on your server. But even with Gopher servers many questions come up about the preparation of material.

Acquiring the Material

You should analyze the need for

Formatting the Material

Formatting concerns include

Access to the Material

In determining which image file formats to use, you should ask whether

Future

You also need to consider your needs in the future:

Indexing Your Data

You might consider indexing the contents of your Gopher and WWW servers (WAIS servers have built-in indexing), although it's not required or even always necessary. If you have just a few documents and your menus aren't very deep, users are unlikely to get lost or miss items on your server. However, if you have an extensive menu structure, a large collection of documents, or large documents, you should consider an index. The usual practice is to place a search link at the beginning of your opening screen that offers users the chance to search the contents of your Gopher or WWW server. Users then can search quickly through the index for what they want, instead of working their way through various submenus or hypertext links.

If you've ever been unable to find something on a server that you knew was there, you'll appreciate what a convenience the ability to search the contents of a server can be. Jughead, developed by Rhett "Jonzy" Jones, a programmer at the University of Utah, is one tool available for indexing UNIX Gopher server menus. As discussed in Chapter 1, Veronica also does this kind of indexing, although for thousands of Gopher servers. We'll talk about Jughead and Veronica and other such tools and techniques in Chapters 3 and 4.

You also can index the full text of documents on your Gopher and WWW servers. Full-text indexing is different from indexing by menu item or file name. Full-text indexing means that every word in each document is indexed, not just the menu description line (for Gopher servers). For example, if you have a document summarizing the history of the conflict in the Middle East and it mentions Syria, the menu line might say only "Middle Eastern Conflict--History." Someone searching for everything she can find about Syria won't turn up that document. But with full-text indexing a search on Syria would retrieve it. Some Gopher and WWW servers, such as GN, WN, TurboGopher, and MacHTTPD, offer the ability to do full-text indexing. Others allow links to WAIS to accomplish this.
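The difference between the two kinds of search can be sketched in a few lines of Python. This is only a toy illustration, not how WAIS or any of the servers named above build their indexes; the function names and the sample document text are invented for the example.

```python
import re
from collections import defaultdict

# Hypothetical documents: menu line (title) mapped to document text.
documents = {
    "Middle Eastern Conflict--History":
        "The summary covers Egypt, Israel, Jordan, and Syria.",
    "Company Press Releases":
        "New product announcements for this quarter.",
}

def build_index(documents):
    """Map each lowercased word to the set of titles whose text contains it."""
    index = defaultdict(set)
    for title, text in documents.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(title)
    return index

def search_titles(documents, term):
    """Menu-style search: look only at the menu line."""
    return [title for title in documents if term.lower() in title.lower()]

def search_full_text(index, term):
    """Full-text search: look the word up in the index."""
    return sorted(index.get(term.lower(), set()))
```

With these definitions, `search_titles(documents, "Syria")` finds nothing, while `search_full_text(build_index(documents), "Syria")` returns the Middle East document.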

You'll want to think about what kinds of searches your users might make and whether you'll need sophisticated indexing features such as structured field searching, which means that users can search fields or certain parts of the text, including title, author, and date. FreeWAIS-sf adds structured field searching to the freeWAIS indexing software, which is available for UNIX systems. Libraries and large databases will want this level of sophistication. Whenever the text information you are publishing is so big that searching on a simple word or words fails to narrow the field, you need to consider whether fields are available and field searching is necessary.

Menu and full-text indexing are not the only kinds of indexing. If you plan to provide a large number of nontext files, such as programs, pictures, sound, and video, you will want to index them. You might index only their file names (or menu items), but if you have explanatory blurbs about each one, you will want to index the explanations so that indexes retrieve the nontext file as well.

External indexers are usually programs that attempt to go out and collect information from all over the world and then make that index available from one server. Veronica (for Gopher sites) and Archie (for FTP and Gopher sites) are the most famous examples of external indexers. External indexers are a crucial element in indexing Gopher and Webspace (Webspace means the content of all the servers browsable with WWW browsers). They are analogous to listing a book in an international card catalog system. WWW Worm (WWWW), ALIWEB (Archie-like Web), and assorted other search programs traverse the Web to develop similar indexes. Learning the techniques and philosophies behind each of these external indexers is quite useful. The best way to do this is to look at the descriptions in Table 2-1, follow the links to each one, and then read more about them. Sometimes you'll find theoretical papers discussing search strategies, but usually you can learn their approach within a few minutes. Then you should register your server with all those indexes that you deem appropriate if you want your server to be truly internationally available.

Indexing has many advantages for both you and your users. It can lessen the load on your server and ease the frustration and complaints users have about getting lost in large servers. But indexing can be difficult to install and complicates server administration, one responsibility of which is ensuring that the indexes are always up to date.

Registering or Announcing Your Server

Gopher, WWW, and WAIS all have free central registries that attempt to keep track of all the servers in the world. You should register with them and apprise them of any server name changes you make. You also should be aware of the subject indexes and automatic and voluntary indexing servers. Register with them as appropriate. See the indexing sections of the Gopher, WWW, and WAIS chapters (3, 4, and 5) for a list of these services and an explanation of their philosophies and indexing techniques.

If your server performs a service or acts as a resource for a certain population on the Internet, post information about it to the relevant Usenet newsgroup and e-mail the appropriate mailing list. Keep your announcement short and to the point, and try to avoid boasting. Asking for feedback and additional links is both good netiquette and good business--it will help you improve your server. Finally, contact other similar Web, Gopher, and WAIS sites and invite them to link into your server. The easiest way to find related sites is to search the Internet directories and indexes using terms that describe your own server.

Maintaining Reliability

A reachable site is a well-maintained site. You should plan to budget for maintenance and hardware sufficient to handle the load. If your Gopher, WWW, or WAIS server is often down, unavailable, or filled with links that don't work or data that are out of date, your company's reputation could be hurt. Servers go down because of power failures, machine malfunctions, software bugs, human error (someone turns it off), and overload. Although nothing is fail-safe, you need to plan for monitoring and maintenance of the server hardware and software. Some sites set up automatic programs that let them know if their server is down. I've even heard of someone who set up a machine to do this monitoring and page him if something is down.
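An automatic check of this kind can be quite small. The Python sketch below simply tries to open a TCP connection to the server's port and reports whether it succeeded. The function name is an assumption for this example, and a real monitor would also want to confirm that the server answers a request sensibly, not just that the port is open.

```python
import socket

def server_is_up(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False
```

Run from a scheduler every few minutes against port 70 for Gopher, 80 for WWW, or 210 for WAIS, a check like this could trigger an e-mail message or a page whenever it returns False.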

Links can be incorrect if they were entered incorrectly and never checked, or if the items they point to changed without notification. A server has no way of knowing all the sites that link to it, so posting notices of changes to everyone who needs them is impossible, although some server administrators attempt to notify all sites they are aware of. Obviously, this requires a certain amount of personal attention. Plan to regularly verify and update your server's data and links. Some free UNIX programs for Gopher and WWW servers that automatically attempt to verify that all your links are working include:

Go4check--<gopher://tjgopher.tju.edu/00/networks/internet/tools/gopher/go4check>

Anchor Checker--<http://www.ugrad.cs.ubc.ca/spider/q7f192/branch/checker.html>

Link Verifier--<http://wsk.eit.com/wsk/dist/doc/admin/webtest/verify_links.html>
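To give a feel for what such programs do, here is a minimal Python sketch of a link check for a single HTML page. It is not based on any of the tools listed above. The class and function names are invented for this example, and the default check uses Python's standard library, which handles http URLs but not gopher URLs.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def url_responds(url, timeout=10):
    """Default check: True if the URL can be opened without an error."""
    try:
        with urlopen(url, timeout=timeout):
            return True
    except (OSError, ValueError):
        return False

def find_dead_links(page_url, html, check=url_responds):
    """Return the absolute URLs linked from html that fail the check."""
    collector = LinkCollector()
    collector.feed(html)
    absolute = [urljoin(page_url, link) for link in collector.links]
    return [url for url in absolute if not check(url)]
```

The `check` parameter is there so that you can substitute a different test, for example one that understands Gopher links, without changing the rest of the code.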

You should consider mirroring to another site or backup server if you expect to have a popular server. There are actually several kinds of mirroring. One type negotiates with another site to maintain a duplicate of its site or service. These sites are often on different continents to ease the load on international links. LeWeb Louvre does this with the original site in France and the mirror in North Carolina. The Internet Movie Database started out in Cardiff, Wales, and now has mirrors in Australia, the United States, Germany, and Japan. All sites offer the same data.

In another mirroring arrangement one site maintains several machines in parallel, each acting as the Gopher or WWW server. This is what the University of Minnesota Gopher crew had to do to handle the extreme load on its Mother Gopher server (because Gopher was invented there, the University of Minnesota's site is called the Mother Gopher). The university runs 10 Mac IIc's (running the A/UX operating system) in parallel, each with an 80MB hard disk just to support the top level of its Gopher server. Below that the university has a variety of machines, including Sun Microsystems and NeXT workstations and more A/UX machines.

Evaluating Server Performance and Audience Response

Evaluation is an important but often ignored component of any process. In this case most of the servers come with some sort of log file, which is easily activated if not so easily analyzed. Chapters 3, 4, and 5 on Gopher, WWW, and WAIS servers detail what is possible with those servers, but check to see whether this information can be gleaned from the log file:

Turning log files into useful reports is sometimes more difficult than you'd expect, and some utilities have been written to simplify this process. Those currently available are described in Chapters 3, 4, and 5 on Gopher, WWW, and WAIS. Again, it's important to check with the Usenet newsgroup or e-mail mailing list for your particular brand of server to stay current with what these utilities can do. They are often written by administrators facing problems similar to yours.

Because Gopher, Web, and WAIS servers routinely log the host name (and IP address) of each user's computer, along with the items retrieved, you do need to be aware of the potential for invading users' privacy. To protect users' privacy many log analysis programs automatically strip off the lowest level of IP address in their reports, which removes this link to individuals. You need to know who your audience is, but you don't need to know who each user is. You might also consider modifying the server software to remove the link between users and items retrieved.
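Stripping the lowest level of an IP address might look something like the following Python sketch. It assumes that the host is the first space-separated field of each log line, as in the Common Log Format, and the function names are made up for this example. Host names are passed through untouched here, so a real report generator would need to decide how to treat them as well.

```python
import ipaddress

def mask_last_octet(ip_text):
    """Zero the final octet of an IPv4 address for use in reports."""
    network = ipaddress.IPv4Network(ip_text + "/24", strict=False)
    return str(network.network_address)

def anonymize_log_line(line):
    """Mask the host field of a log line if it is an IPv4 address."""
    host, sep, rest = line.partition(" ")
    try:
        host = mask_last_octet(host)
    except ValueError:
        pass  # not an IPv4 address; left unchanged in this sketch
    return host + sep + rest
```

Masking the address this way still tells you which network a reader came from, which is usually enough to describe your audience, without pointing at an individual machine.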

Nothing prevents you from building your own evaluation tools into your server. For example, some servers allow users to fill out a questionnaire. See Figure 2-3 for an example of an online comment form from a Web server. The answers can be stored in a private file, e-mailed directly to you, or posted right back on your server. One WWW site has a "sign-in sheet" that allows users to leave a comment as well as their name and e-mail address, data then added automatically to what is displayed on the server. Be aware that not all browsers support online forms, so always give your users an alternative, such as a way to e-mail their comments.

Another option is to provide users with a voice telephone number, fax number, e-mail address, and/or postal address from which they may obtain further information. Although this somewhat defeats the purpose of publishing on the Internet, responses can be an objective measure of the interest generated by your server.

Or perhaps you've posted a catalog, and users place their orders through an existing telephone or fax order system. In that case make sure that the order form or telephone operator asks for the source of users' information, so that you can accurately evaluate your server's contribution.

Using What You've Learned

You may or may not have planned to implement your Gopher, WWW, or WAIS server in stages, but almost certainly you will learn from the process and find better ways to accomplish your goals. Given that this is a most malleable medium, you should plan to continually check, revise, update, and perhaps redesign your server. It is an artificial restriction to say that once it's up it can't be changed. That logic may apply to print versions, but with the Internet there's no such thing as final.

However, if you change information or remove menus that may have been linked by other servers, you should post explanatory notes. In such cases it is helpful to leave a link in place that describes the change and gives the new link information. You may also find it helpful to keep an administrator's log, describing the changes you make to software and hardware and particularly any changes in philosophy on menu arrangement or hypertext design. Also record any addition of other resources, such as a searchable index or major bodies of information, such as new departments. You may find the log useful in the future, or it may give another administrator an insight into your server's design. Much of this is of interest to your users as well, so you should consider adding a "What's New" section to your server for providing this information online. If you do, you should still keep a separate and private administrator's log in which you record changes or upgrades in your system or server software as well as problems you encounter and their solutions.

Summary

This chapter is about the groundwork you should do before you start publishing on the Internet. This includes exploring Gopher, Web, and WAIS sites both in your subject area and in others to get ideas and techniques you can use in your server. And keep notes of the problems you have and what helped and didn't help in your searches. Become familiar with FAQs, mailing lists (and their archives), and Usenet News, as well as the various Internet indexers and directories listed in Table 2-1. These are your tools on the Internet and you should know them well.

You will need to answer many questions--about your audience, your competition, charging policies, and how the information you publish will be gathered as well as maintained. You may need to think about a graphic designer. Are there any special security issues?

The presentation and organization of your data can be an important factor in your site's success. Do you have a design team or will you need one? Think about indexing the contents of your server wherever you have a large amount of material. Both internal and external indexes are important, so be sure to register your server. Also plan for continual maintenance and testing of your server's links. They can go out of date quickly, but tools are available to help you do the testing automatically.

Evaluate your server log files, particularly the error logs. Ask for feedback from your users (by e-mail, online forms, or even phone or fax). Users are a resource; don't waste them. This is a growing medium, so assume that your server will need to be redesigned regularly, that it is a perpetual work in progress.

Finally, remember that the Internet is not just millions of potential customers; it's also a network of people with varied interests, enthusiasms, and talents. Contribute something to the Internet to help it grow.

