Making Search-Engine-Friendly Sites

A white paper by Avi Rappoport, Search Tools Consulting

Search engines are the guides to the World Wide Web -- when people want to find information, that's where they turn first. Whether it's a webwide search engine such as HotBot, MSN Search, or America Online, or a topical portal search engine, your pages should be present and attractive. There are several ways to improve your site's interaction with these search engines, and this article will give you some tips and hints about how to make your site search-engine-friendly.

As an extra bonus, the principles here apply to any local site search engine.

The examples in this article are based on a mythical web site all about the characters in Shakespeare's plays, especially Banquo's Ghost from Macbeth.

Design for Searching from the Beginning

When you are designing your site and the individual pages, remember that you want people to find the parts that are relevant to them. Creating search-engine-friendly sites will encourage visitors to find your pages, and to bookmark and link to them. There are several elements to search-friendly design:

Provide Navigation and Context Information

Beyond all else, you must show your visitors what site they are on, and how to get to the home page. Although you may design the site in a hierarchical tree, visitors who come from a search engine can land on any page, with no knowledge of how that page fits into the site. Be kind to them. If you have a complex or large site, you should also find a way to show the context of each page, so visitors can see the category the page is in, and the related pages. Again using our Shakespeare example, your page on Banquo's Ghost could be in two categories: Shakespeare's Characters and Macbeth, and could have related pages such as Hamlet's Father's Ghost.

When a search engine index robot (also known as a spider or a crawler) encounters your site, it will start from the home page or any other page that you have submitted, and follow all links from that page. Therefore, you should make sure you have at least one link to all important pages, so that the indexing robot can follow the links.
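To see which pages a robot would never reach, you can walk your own link structure the same way. The following is only a rough sketch: the page paths and the site_links table are made up for the Shakespeare example, and a real robot has limits and rules that are not modeled here.

```python
from collections import deque


def reachable_pages(links, start):
    """Follow links breadth-first from the start page, as a simple robot might."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: each page maps to the pages it links to.
site_links = {
    "/index.html": ["/macbeth/index.html", "/hamlet/index.html"],
    "/macbeth/index.html": ["/macbeth/banquo.html"],
    "/hamlet/index.html": [],
    "/macbeth/banquo.html": ["/index.html"],
    "/macbeth/lady-macbeth.html": ["/index.html"],  # no page links to this one
}

orphans = sorted(set(site_links) - reachable_pages(site_links, "/index.html"))
print(orphans)  # ['/macbeth/lady-macbeth.html']
```

Any page that shows up in the orphans list needs at least one plain link from a page the robot can reach.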

Make Sure Your Content Is Special

Before going to great lengths to organize and categorize your site, you should search for other sites on your topic. If there are hundreds of other sites describing Shakespeare's characters, selling Wonder Widgets, or adoring your favorite movie star, you'll have a hard time getting attention. Perhaps you should consider changing your focus to a less common subject, rather than competing in such a crowded field.

Focus Your Pages

Most search engines use a "relevance" ranking to match pages to search queries. The ranking is based on complex rules designed to make sure that the most pertinent pages are presented first. The more focused your pages, the more likely that someone searching for the topic will find that page.

For example, if you have a site about Shakespeare's characters, and you have detailed information about both Macbeth and Banquo's Ghost, you should put that information into two separate pages. That makes it more likely that the pages will rank high in a search for those topics. If someone searches for Banquo, you want to make sure they find your page.

Several short pages on various aspects of a topic will often fare better with search engines than one long page. Many of the relevance systems assign importance to a word on a percentage basis, so each word on a short page is considered more important. Make sure you divide the pages up carefully, though, or you'll make your readers uncomfortable.
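The percentage idea is easy to try out for yourself. This is a toy illustration only: the search engines do not publish their real formulas, and the term_share function and sample text are invented for this example.

```python
import re


def term_share(text, term):
    """Return the fraction of the words on a page that match the term."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return words.count(term.lower()) / len(words)


short_page = "Banquo returns as a ghost to haunt Macbeth"
# The same sentence buried in a much longer page.
long_page = short_page + " " + "and the play continues with many other scenes " * 10

print(term_share(short_page, "Banquo"))  # 0.125
print(term_share(long_page, "Banquo") < term_share(short_page, "Banquo"))  # True
```

The word appears once in both pages, but it makes up a far larger share of the short one.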

Consider Your Vocabulary

You should use both very common words, and very precise ones -- this sounds like a contradiction, but think about it from the search engine point of view. People will search for both, and they should find your pages when they do those searches.

Living With Ambiguity

If you do a search for Banquo's ghost, you may see some pages about quantum chemistry, and wonder why. It turns out that there's a "ghost atom" Bq which is named after Banquo, so the search engines just find those pages along with the rest. The Web is so huge and search engines index so many pages that any search is likely to come up with some strange and mysterious matches. Don't let that worry you -- people encounter it all the time, and are learning to ignore it.

Don't Even Consider "Spamming" the Search Engines

Some people claim that they have ways to make sure that your pages are always presented first in the search results, and they will sell you the secret. If you believe that, I have a bridge to sell you too! This is called "search engine spamming" and it's just as bad as email spam. In fact, search engines have various ways to detect tricks, and will punish people who use them by removing the sites from their search index.

It's much better to have a good site with interesting information or unusual products. That way, people will want to come to your site, and you won't need to play games. Do follow the instructions in the section Submit Your Site To Search Engines, to make sure your pages are in the search engine indexes.

Looking Good in Search Results Listings

When a search engine finds your page as the result of a search, it will show it within a list of similar pages. You want to make sure that the listing represents your page properly, and that it is easy to read and understand.

Good Page Titles are Vital

Make sure that you have a descriptive title for all your pages. I see so many pages that are simply called Products & Services or About Us -- that's no help at all! The page title can encourage people to read your page, so make it as interesting as you can. For example, the title Banquo's Ghost is accurate but doesn't convey much detail. Changing it to Banquo's Ghost: Supernatural Revenge provides an interesting and clear representation, so that visitors are not surprised when they come to it.

Tip: If you have framed pages, be sure each frameset and subpage has its own title tag, so people can identify it in a list of search results.

The title is sometimes cut off, so keep your title as short as you can, and make sure that the most helpful words are at the front. Some people like to show the site name and something of the organization in the page title -- if you want to do that, go from specific to general. That way, a search which finds many of your pages will be easy to browse, for example:

<title>Banquo's Ghost : Macbeth : Shakespeare's Characters</title>

<title>Lady Macbeth : Macbeth : Shakespeare's Characters</title>

Tip: Remember that long titles make bookmark menus unwieldy, so do be careful about the length.

Page Descriptions: A Chance to Market Your Pages

The Meta Description tag lets you summarize your page and your site in general, and show how your page can answer a searcher's questions. All the major search engines will include that in their listings. This should be a description of the contents of that page, like one of those capsule movie reviews in the newspaper. For example:

<meta name="description" content="In-depth analysis of the character of Banquo's Ghost, from Shakespeare's Macbeth, concentrating on the theme of supernatural revenge, and contrasting with Hamlet's Father, Timon of Athens, and other theatrical ghosts.">

Tip: don't use double-quote marks in description tags, because the search engines will treat that as the end-of-text marker. Single quotes should be fine.

It's hard to write descriptions for lots of pages, so you can make a generic description and then customize it for each page, for example:

<meta name="description" content="Analysis of the characters in Shakespeare's Macbeth, including historical sources and psychological aspects. ">

You could replace Macbeth with Hamlet or Henry V for pages about characters in those plays.
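If you generate pages with a script, the generic description can be filled in automatically. Here is a minimal sketch, assuming a hypothetical description_tag helper; it also swaps any double quotes for single quotes, following the tip above.

```python
from string import Template

# Generic description from the example above, with the play name left open.
GENERIC = Template(
    "Analysis of the characters in Shakespeare's $play, "
    "including historical sources and psychological aspects."
)


def description_tag(play):
    """Build a Meta Description tag for the page about one play."""
    text = GENERIC.substitute(play=play)
    # A double quote inside the text would end the content attribute early.
    text = text.replace('"', "'")
    return '<meta name="description" content="%s">' % text


print(description_tag("Hamlet"))
print(description_tag("Henry V"))
```

You would still want to hand-edit the descriptions for your most important pages.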

Living Without Meta Descriptions

If you have hundreds of pages and can't describe each one, the search engines will generally display the first few lines of those pages. Think about page design and try to get some helpful text above any left-side navigation bars.

Make Sure You Rank High

When there are hundreds of matches for a search, the search engines must sort the pages by relevance. Each engine has its own algorithm (procedure) for determining relevance, often based on the number and position of the matching terms within a page. There are several legitimate ways to improve your search engine result rankings. These are generally commonsense procedures, rather than tricks or scams.

Use Key Words in Descriptive Titles

Many people try to be clever or cute in their titles, or forget them altogether, but search engines give a lot of weight to the words in the title. As mentioned above, choose a descriptive title with the most important words and phrases. Search engines assume that the title describes the contents of the page, and that anyone searching on that topic will want to see pages with the words in the title.

Meta Keywords

The HTML specification allows you to add hidden keywords, phrases and concepts that are relevant to the page, but may not be sufficiently emphasized, as well as provide alternate spellings and capitalization. Place these into the Meta Keywords tag in the header section of the HTML page like this:

<meta name="keywords" content="Banquo, Ghost, supernatural, theater, entertainment, Drama, Tragedy, MacBeth, McBeth, Shakespeare, Shakspere, Shakespearean Theatre, Queen Elizabeth I, Renaissance, England, Britain, Elizabethan Theater">

Some search engines may ignore or discount these tags, because unethical people have used them to misdirect searchers to unrelated pages (spam). But search engine policies change over time, so you should create keywords for all your important pages. Just be sure that you don't repeat a word over and over: search engines will ignore that, assuming it is spam.

Tip: meta description tags are treated much the same as keywords for results ranking.

Frames

All major search engines can recognize frames and follow frame source links, so the information on the framed pages will be properly indexed. But watch out -- when someone finds your great page they will click on it and bring up just the page, without the associated frames. Therefore, each page should have a little bit of context, at least the site name and links to the main page of the site. You could also use a JavaScript or server-side function which can automatically bring up the frames when a sub-page is displayed.

Content, Content, Content

By creating focused pages with interesting content, you are doing the most important thing for your ranking in search results. As long as you use the same vocabulary as the people doing the searching, you're likely to do very well.

Make Pages Search-Friendly

Search engines read the text of a page and store it in their index for searching -- if they can't read a page, your site will be ignored. You can pretend to be a search engine indexer by using a text browser or by turning off the graphics, JavaScript, Java and Plug-Ins in your browser: what you see is what gets indexed.

Plain Text is Best

First of all, make sure you provide HTML text, rather than graphics or animation, for your most important information. Some people make graphics so they can control the font and size and layout of their pages; other people think it looks cooler. However you got there, the search engine indexing robots simply can't read GIF, JPEG, Flash, or any other kind of graphical text, so you should always make sure there is plain text as well. Most search engines will ignore PDF (Adobe Acrobat) documents, so it's best to convert them to HTML.

Use ALT text

If you have text in graphical format, or pictures that might be interesting to searchers, use the ALT attribute of the IMG tag. This is indexed by many search engines (and read by speaking browsers used by blind people).

<img src="../art/ladymacbeth-sm.jpg" width=206 height=400 border=0
alt="Picture: Sargent - Miss Ellen Terry as Lady Macbeth">

Be Careful with JavaScript-Generated Pages

Some sites use JavaScript write and writeln to generate text on pages. Search engine indexers don't interpret JavaScript, so they'll ignore this text. To get around this, you can make a generic version of the text and put it into a <noscript> section on the page.
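You can approximate what an indexer sees with a few lines of Python. This sketch uses the standard html.parser module; the IndexableText class and the sample page are invented for illustration, and real indexers will differ in the details.

```python
from html.parser import HTMLParser


class IndexableText(HTMLParser):
    """Keeps the visible text and skips anything inside script or style tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Text inside <noscript> arrives here like any other text.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def indexable_text(html):
    parser = IndexableText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """<h1>Banquo's Ghost</h1>
<script>document.write("Tonight's featured ghost: Banquo");</script>
<noscript>Featured ghost: Banquo</noscript>"""

print(indexable_text(page))  # Banquo's Ghost Featured ghost: Banquo
```

The text written by the script never appears, but the generic version in the noscript section does.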

Write Valid HTML

Search engines must read through the HTML to find the words for the index, and they can be less forgiving than modern browsers. For example, if you forget to close an image tag, the indexer could ignore the whole rest of the page. If you have bad habits or an HTML generator that doesn't create very clean HTML, be sure to run a validation check on your most important pages, and fix any serious problems.

Make Links Easy to Follow

Search engines use programs known as spiders (or robots, wanderers, or crawlers) to follow links and locate pages for indexing. These programs can follow simple links such as HREF and IMG links, and even frameset links, but there must be something there or the search engine will never find the other pages. If you have some private pages that should not be indexed, you can use the Robot Exclusion Standard to tell well-behaved search engines to avoid these pages. However, the only way to be absolutely sure that they will not index the pages is to protect them using server security.

JavaScript, Java and Flash

Links generated by JavaScript menus, Java navigation and Flash movies are invisible to search engine spiders -- unlike browsers, the spiders can't interpret them at all. To help them, make <noscript> sections of your pages, or create a site map page with simple HTML links to the pages.

Dynamic Data: Question Marks and Ampersands in URLs

If you have pages generated from a database or content-management system, and the URLs include characters such as ? or &, most search engines will ignore those links. They assume that the information generated will be changing too fast to be properly indexed, but that may not be true in your case. You can have the web server automatically rewrite these URLs, so they don't have those red flags, for example:

OLD: www.example.com/name?id=4&require=true

NEW: www.example.com/name/id.4/require-true/index.html

To do this, use mod_rewrite for Apache, XQASP for Windows ASP sites, or Pardeikes Welcome for Mac OS web servers.
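The mapping itself is simple, whichever tool does the work. The sketch below only illustrates the idea in Python and is not a configuration for any of the tools named above. The static_style_path function is hypothetical, and for simplicity it joins every name and value with a hyphen, which differs slightly from the sample URL shown earlier.

```python
from urllib.parse import parse_qsl, urlsplit


def static_style_path(url):
    """Turn /name?key=value&... into /name/key-value/.../index.html."""
    parts = urlsplit(url)
    segments = [parts.path.rstrip("/")]
    for key, value in parse_qsl(parts.query):
        # One path segment per query parameter, with no ? or & left over.
        segments.append("%s-%s" % (key, value))
    segments.append("index.html")
    return "/".join(segments)


print(static_style_path("/name?id=4&require=true"))
# /name/id-4/require-true/index.html
```

The server-side tool must also translate the friendly form back to the original query when a request comes in, so keep the scheme reversible.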

Otherwise, you can periodically create a static version of the site, and let the search engines index that version. Just be sure to create simple links to the temporary pages.

File Name Issues

If your file names contain only letters and numbers, and end in .htm, .html, or .txt, all the search engines will be happy. You're probably OK if they have additional periods, underscores (_), or hyphens (-), and if the file names end in .asp, .php3 or .shtml.

Some search engines don't like pages with names containing other punctuation, such as commas, parentheses, colons and equals signs, or which end in .cfm, .jsp, .php, .pl or .ssi. To make sure you get indexed, you can use the rewriting modules described above.
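If you have many files, a short script can sort the names into those three groups. This is a sketch of the rules as stated above, not of any particular search engine's behavior; the classify_file_name function and its labels are made up for this example.

```python
import re

# Letters and numbers only, ending in .htm, .html, or .txt.
SAFE = re.compile(r"^[A-Za-z0-9]+\.(htm|html|txt)$")
# Extra periods, underscores, and hyphens, plus a few more extensions.
PROBABLY_OK = re.compile(r"^[A-Za-z0-9._-]+\.(htm|html|txt|asp|php3|shtml)$")


def classify_file_name(name):
    if SAFE.match(name):
        return "safe"
    if PROBABLY_OK.match(name):
        return "probably ok"
    return "risky"


for name in ["banquo.html", "lady-macbeth.html", "cast_list.shtml",
             "ghost(1).html", "cast.jsp"]:
    print(name, "->", classify_file_name(name))
```

Anything reported as risky is a candidate for renaming or for the rewriting approach.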

The Magic File: robots.txt

Many search engine crawlers (also known as spiders or robots) require the presence of the robots.txt file. If you don't have it on your site, they will not index the site at all. For a full explanation of the purpose and format of this file, see the Robot Exclusion Instructions or the SearchTools Robots.txt page.

To allow the search engines to index your entire site, create a text file (not HTML) named robots.txt in the main web server folder. If you have a hosted site on a web presence provider's server, they almost certainly have a robots.txt file already. You can see your robots.txt file by simply typing in the URL of the site with the file name robots.txt, like this:

http://www.example.com/robots.txt

Your site may have disallowed indexing, so check to see if it looks like this:

User-agent: *
Disallow: /

All well-behaved search engine indexing robots will obey this instruction and skip your whole site. Change the file yourself, or ask your server administrator to change it, by removing the slash from the Disallow statement.

Here is the content that indicates all robots can index the site:

User-agent: *
Disallow:

If you have some private pages that should not be indexed, you can use the robots.txt file to tell well-behaved search engines to avoid these pages. The standard will explain how to do it in detail, and again, hosted sites should discuss this with their presence provider. However, the only way to be absolutely sure that no robots see the pages is to protect them with passwords.
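You can test a robots.txt file before you publish it. Python's standard library includes a parser for this format, and the sketch below feeds it the two files shown above plus a third that blocks one folder. The allowed helper, the ExampleBot name, and the /private/ folder are assumptions made for this example.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="ExampleBot"):
    """Report whether a well-behaved robot may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


block_all = "User-agent: *\nDisallow: /"
allow_all = "User-agent: *\nDisallow:"
block_private = "User-agent: *\nDisallow: /private/"

page = "http://www.example.com/macbeth/banquo.html"
notes = "http://www.example.com/private/notes.html"

print(allowed(block_all, page))       # False
print(allowed(allow_all, page))       # True
print(allowed(block_private, page))   # True
print(allowed(block_private, notes))  # False
```

Remember that this only predicts what well-behaved robots will do; it is not a substitute for password protection.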

Check Local Links Regularly

Search engine crawlers follow links from your main page (or other pages you submit), so if those links are bad, they will miss pages. Be sure to run a link checker on your site on a regular basis. If you mistype or misplace a link, the search engine assumes the page is gone and removes it from the search index, and it will never be found by a search. You should double and triple-check your links when you make major changes to your site -- an ounce of prevention in this case is worth a pound of cure.
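A basic link checker is not hard to picture. The sketch below works on a small set of pages held in memory; the SITE address, the page contents, and the broken_local_links function are all invented for illustration, and a real checker would read files or fetch pages instead.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

SITE = "http://www.example.com"  # hypothetical site address


class HrefCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def broken_local_links(pages):
    """Return (page, href) pairs whose target is not a known local page."""
    broken = []
    for path, html in pages.items():
        collector = HrefCollector()
        collector.feed(html)
        for href in collector.hrefs:
            target = urlsplit(urljoin(SITE + path, href))
            if target.netloc != urlsplit(SITE).netloc:
                continue  # link to another site, not checked here
            if target.path not in pages:
                broken.append((path, href))
    return broken


pages = {
    "/index.html": '<a href="macbeth/banquo.html">Banquo</a> '
                   '<a href="macbeth/ladymacbeth.html">Lady Macbeth</a>',
    "/macbeth/banquo.html": '<a href="../index.html">Home</a> '
                            '<a href="http://www.example.org/">Elsewhere</a>',
    "/macbeth/lady-macbeth.html": '<a href="/index.html">Home</a>',
}

print(broken_local_links(pages))
# [('/index.html', 'macbeth/ladymacbeth.html')]
```

Here the home page links to ladymacbeth.html, but the file is really named lady-macbeth.html, so a robot would never find it.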

Submit Your Site To Search Engines

Search engines will only index your site if they know it exists. The quickest way to notify them of your site is to follow the instructions on the search engine site to submit your URL. If you go through the process yourself, you may even learn a little more about the search engines and improve your presentation.

Web search engines will index almost every site that is submitted -- they really want the bragging rights to the largest index. Other search engines are more selective: they may only want sites in a specific area, language or topic. You should definitely submit to the appropriate search engines, but you may have to wait a while before they index you.

Search engines will index your entire site by having the indexing robot follow links from the home page you submit. However, you can also submit other important pages, to be sure they are covered right away. Each engine has its own schedule for re-crawling sites, but most will update their index every couple of months. In addition, if you change your site substantially, you should resubmit the most significant changed pages so that the indexing robots will revisit your site. Do not submit every page on your site -- search engines may consider this "spam" and remove your entire site from their index.

Search engines are now charging submission fees to be included more promptly or indexed more often. If your site is a commercial site, it's a small expense for a good marketing opportunity. None of the major web search engines charges for results rankings, although several of them will allow you to buy a banner ad or text ad in results pages for particular searches.

You can also use a web site promotion service to submit your pages automatically. Prices range from free to shareware to hundreds of dollars. Many of these products and services are helpful and ethical, but some are scams. Be sure to read the information carefully, look for awards, reviews, or other indications that they are helpful and responsive, and always pay with your credit card so that you can withhold payment if necessary.

In any case, watch your server logs to see what happens when the search engine indexing robot comes through your site. You may notice that some pages are missed because they are not properly linked or are too deep within a site. Some log file analysis programs will provide special reports for robot activity.
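If your log analysis program has no robot report, a small script can give you a rough one. This sketch assumes your server writes the common "combined" log format with the user agent as the last quoted field; the sample lines, the ExampleBot name, and the list of robot hints are made up, so adjust them to what you actually see in your logs.

```python
import re
from collections import Counter

# Request path and user agent from a combined-format log line (assumed format).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+)[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)
ROBOT_HINTS = ("bot", "crawler", "spider", "slurp")  # rough guesses only


def robot_hits(lines):
    """Count how many times robots requested each page."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and any(h in match.group("agent").lower() for h in ROBOT_HINTS):
            hits[match.group("path")] += 1
    return hits


sample_log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "ExampleBot/1.0 (crawler)"',
    '192.0.2.1 - - [10/Oct/2000:13:55:40 -0700] "GET /macbeth/banquo.html HTTP/1.0" '
    '200 5120 "-" "ExampleBot/1.0 (crawler)"',
    '198.51.100.7 - - [10/Oct/2000:14:02:11 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Mozilla/4.0 (compatible)"',
]

known_pages = {"/index.html", "/macbeth/banquo.html", "/macbeth/lady-macbeth.html"}
hits = robot_hits(sample_log)
missed = sorted(known_pages - set(hits))
print(missed)  # ['/macbeth/lady-macbeth.html']
```

Pages that never show up in the robot hits are the ones to check for missing or broken links.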

Iterative Improvement

Don't rest on your laurels -- keep track of how you're doing and learn from your visitors.

Conclusion

Web-wide search engines are matchmakers, bringing people with questions together with the sites that can answer them. It's worth your while to make sure your site is easy to find and presents itself well. If you follow the checklist in this article, you can be confident that your site will appear in its best light when someone asks a question you can answer.