7-2: Searchable

You have built a really cool site, with great content, excellent interactive usability, and a beautiful design. But how does anyone know? One way they might know is by searching for you on a search engine. You need to do some basic things to help search engines find you, to figure out who is coming to your site, and to encourage more people to come. This lesson covers some of the basic ways to do that.

Objectives

  • Understand how search engines work.
  • Be able to get people to come to your site.

Robots and Site Maps

To set up your site for search engines, you need to know a little bit about how search engines work. The first stage of indexing the web involves sending out robots (or “crawlers,” or “spiders”) to harvest content from the web. They do this by going from one page to another, following links. So, if nothing links to your site, it is unlikely to be indexed. Moreover, the more links the better, for lots of reasons. In any case, it’s important to make your site “friendly” to these robotic visitors.
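To make the link-following idea concrete, here is a minimal sketch of a crawler working over a tiny made-up "web" held in memory. The page paths and the LINKS table are invented for illustration, and a real crawler would fetch pages over the network and do a great deal more. The point is only that a page nothing links to is never reached.

```python
from collections import deque

# Hypothetical miniature "web": each page maps to the pages it links to.
LINKS = {
    "/": ["/about/", "/blog/"],
    "/about/": ["/"],
    "/blog/": ["/blog/first-post/"],
    "/blog/first-post/": ["/"],
    "/orphan/": ["/"],  # links out, but no other page links to it
}

def crawl(start, links):
    """Return the pages reachable from start by following links, in visit order."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen

found = crawl("/", LINKS)
print(found)
print("/orphan/" in found)  # the unlinked page is never discovered
```

Starting from the home page, the sketch visits every page that can be reached by links and never sees "/orphan/", even though that page exists.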

Some of the elements of a page that are frequently overlooked by designers are the pieces that do not show up on the page at all. The title is important enough that it is required in order to validate, but many designers don’t think very much about what title they use. Meta tags indicating the author or keywords are likewise often ignored. This is unfortunate, because search engines rely heavily on this material to make your page available to searchers on the web. It is smart to make your page as accessible to search engines as possible, because this is probably the best way to make sure the site you have worked so hard on does not fade into the obscurity of the web.
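If you want to see what a robot sees in these invisible parts of a page, a short script can pull them out. The following is a rough sketch using Python's standard html.parser module. The sample page, its title, and its author and keyword values are all made up for the example.

```python
from html.parser import HTMLParser

class HeadInfo(HTMLParser):
    """Collect the <title> text and any <meta name=... content=...> pairs."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical page, invented for illustration.
PAGE = """<html><head>
<title>Wrist GPS Reviews</title>
<meta name="author" content="A. Writer">
<meta name="keywords" content="gps, wrist, running">
</head><body><p>Hello</p></body></html>"""

info = HeadInfo()
info.feed(PAGE)
print(info.title)
print(info.meta)
```

Running something like this against your own pages is a quick way to spot a missing or generic title.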

Back in the day, when you created a new site, you would submit it to search engines to be indexed. There were even services that charged you for submitting your site to multiple search engines. These days most search engines recommend you just make sure your site is widely linked and easily accessed.

There are a couple of ways you can communicate with the search engine (and other) robots that visit your site, telling them what you would like indexed and what you want them to ignore. First, robots are expected to follow what is called the “robot exclusion standard,” a way of signaling to robots the parts of a site that should be left out of their efforts. There are lots of reasons to do this. For example, some part of a site might be temporary, or might have very dynamic content, and by indexing it, the search engine will confuse visitors who are expecting one thing only to find it gone or changed.

There are a couple of ways of excluding robots from certain pages on your site. The first is by creating a file called robots.txt and putting it at the root of your website. Within this file, you can provide rules for which robots can reach which parts of your site. You can also specify how much of a delay there should be between requests, so that the robot does not overburden your server. The Wikipedia page on the robot exclusion standard provides some examples of working robots.txt files. The second option is to include a robots meta tag in the page itself. The following tag should mean that a page is ignored by search engines:

<meta name="robots" content="NOINDEX,NOFOLLOW" />
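Going back to the first option, here is a rough sketch of what robots.txt rules can look like and how a well-behaved robot might read them, using Python's standard urllib.robotparser module. The rules, the "/drafts/" path, the robot names "SomeCrawler" and "BadBot", and the example.com address are all invented for illustration. Note too that Crawl-delay is honored by some robots and ignored by others.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths and robot names are made up.
ROBOTS_TXT = """\
User-agent: *
Disallow: /drafts/
Crawl-delay: 10

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# An ordinary robot may read the blog but not the drafts.
blog_ok = parser.can_fetch("SomeCrawler", "http://example.com/blog/")
drafts_ok = parser.can_fetch("SomeCrawler", "http://example.com/drafts/new-post/")
# The robot singled out by name is shut out of the whole site.
badbot_ok = parser.can_fetch("BadBot", "http://example.com/blog/")
# Requested pause between requests, in seconds.
delay = parser.crawl_delay("SomeCrawler")

print(blog_ok, drafts_ok, badbot_ok, delay)
```

Keep in mind that these rules are only a request. Robots that behave well follow them, but nothing forces a badly behaved robot to do so.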

Robots.txt gives you a way to stop robots, but Google provides you with a way to help the search engine out, with their suite of “webmaster tools.” These tools will provide you with information about when your site was indexed and whether the crawler had any troubles, as well as very useful information about what people are searching for when they come to your site. They also help you build Google Sitemaps, files that provide the Google crawler with some information about how to find things on your site.
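A sitemap is just an XML file listing the addresses of your pages. As a minimal sketch, assuming the basic format published at sitemaps.org (a urlset element containing one url and loc pair per page), something like the following would generate one. The build_sitemap function and the example.com addresses are made up for the example, and a real sitemap can carry optional details this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per address."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical pages on a hypothetical site.
pages = ["http://example.com/", "http://example.com/about/"]
sitemap = build_sitemap(pages)
print(sitemap)
```

The resulting text would typically be saved as a file such as sitemap.xml at the root of the site.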

Onsite Search

The focus here is on searchers of the web finding your site, but there is the related question of whether you provide search functionality for your own site. For a one-page site, search is a bit silly, as it is for a three-page site. But it’s not clear how large a site necessitates providing search functionality. Certainly a small blog does. Why is it necessary? Because users have come to expect to find it, and even on a site with outstanding information architecture, many users will bypass clear navigation and instead want to type in a keyword search.

Of course, you can create a search engine for your own site, or you can find search engines that you can integrate. But it is usually best to leverage online services that are already doing the crawling and indexing. A number of services, including Google Custom Search, Freefind, and Picosearch, among others, can provide you with code and support to place a search engine within your site.
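To give a sense of what building your own search involves, here is a toy sketch of the core idea, an index that maps each word to the pages containing it. The pages and their text are invented, and this leaves out everything that makes a real search engine useful (ranking, word stemming, phrase matching, and keeping the index up to date), which is a large part of why hosted services are attractive.

```python
import re
from collections import defaultdict

# Hypothetical pages: URL path -> page text.
PAGES = {
    "/wrist-gps/": "A review of wrist GPS units for runners",
    "/about/": "About the author of this blog",
    "/running/": "Notes on running with a GPS watch",
}

def build_index(pages):
    """Map each lowercase word to the set of pages containing it."""
    index = defaultdict(set)
    for path, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(path)
    return index

def search(index, query):
    """Return the pages containing every word in the query, sorted by path."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(results)

index = build_index(PAGES)
one_word = search(index, "gps")
two_words = search(index, "GPS running")
print(one_word)
print(two_words)
```

Note that "runners" on the first page does not match a search for "running," which is exactly the kind of gap a mature search service handles for you.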

Marketing

As noted above, nothing replaces links to your site and positive references to it around the web, particularly when these come from influential existing sites. Not only do such links drive traffic to your page, they cause search engines to index your page and rank it more highly in the results of searches. As a result, there is a cycle by which the high-traffic sites continue to get even more traffic.

How do you get started in that attention cycle? You aren’t the only one who wants to know that. Many people are trying to draw audiences to their work. There are a few things you might try. If you know people who already have a lot of traffic, you can ask them to link to you, or you can comment on their sites. Be careful: it’s easy to fall into the trap of spamming a new site. Still, this is a good time to make use of your social networks. The people in them may not even be webmasters or bloggers; with services like Facebook, just about anyone can link to something. These informal marketing networks tend to be very successful when they work, but they do not always work.

The next step is actually advertising your site. Again, Google has taken the lead here, and it is possible to buy as little or as much publicity as you can afford through AdWords. AdWords (the little textual Google ads) are so common on the web these days that they have become almost invisible. If there is a relevant website that accepts private banner ads, you can also purchase ads directly, and draw traffic from that site to your own.

Logs & Analytics

When the web was new, people would just put up a site, and it was “published.” That was the end of it. Then people started looking at the server logs to see who had been visiting their pages. Most web hosts provide access to the raw server logs. The recommended web host for this class, NearlyFreeSpeech, requires that you turn on logging. Take a look at the “sites” tab when logged in to NearlyFreeSpeech, and you will see that there is the ability to “enable” the access log. (Note that this is turned off by default because, for a popular site, these files can get very large, very quickly.)

Here is an example of what a few lines from such a log might look like:


72.30.65.52 - - [28/Feb/2009:00:38:38 -0800] "GET /wrist-gps/ HTTP/1.0" 200 4400 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)"
213.155.0.208 - - [28/Feb/2009:00:39:11 -0800] "HEAD / HTTP/1.1" 200 281 "http://host-tracker.com/web-site-uptime-monitor/1454480/" "Mozilla/4.0 (compatible; HostTracker.com/1.0;+http://host-tracker.com/)"
81.52.143.26 - - [28/Feb/2009:00:39:36 -0800] "GET /phone-sex-construction/ HTTP/1.1" 200 4399 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1) VoilaBot BETA 1.2 (support.voilabot@orange-ftgroup.com)"
74.87.194.100 - - [28/Feb/2009:00:40:12 -0800] "GET /1-and-1-hosting-sucks/ HTTP/1.1" 200 18495 "http://www.google.com/search?hl=en&q=1+and+1+sucks&btnG=Search" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.48 Safari/525.19"
74.87.194.100 - - [28/Feb/2009:00:40:12 -0800] "GET /wp-content/themes/greyness/style.css HTTP/1.1" 200 9929 "http://alex.halavais.net/1-and-1-hosting-sucks/" "Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/1.0.154.48 Safari/525.19"

Obviously, this isn’t something very nice to look at, and there are literally thousands of lines like this each hour even for a fairly low-traffic blog. To get an idea of what this log is logging, let’s take a look at the first line. It says that someone from the IP address 72.30.65.52 hit my site, gives the date and time, and indicates what file they requested (“/wrist-gps/”). It also provides a string indicating the user agent; in this case, it is Yahoo’s robot, Slurp. In fact, the first three of these accesses are from robots of various sorts, and a big part of my site traffic is from various types of robots (search engines, aggregators, spammers, etc.) requesting information from my site.
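To show how the pieces of such a line can be pulled apart by a program, here is a minimal sketch that splits the first log line into named fields with a regular expression. The pattern is my own approximation of this log layout, not an official definition of it, and the quotation marks are written as plain ASCII double quotes, as a server would emit them.

```python
import re

# Approximate pattern for the log layout shown above (an assumption,
# not an official grammar): IP, two unused fields, timestamp, request,
# status, size, referrer, and user agent.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

LINE = ('72.30.65.52 - - [28/Feb/2009:00:38:38 -0800] '
        '"GET /wrist-gps/ HTTP/1.0" 200 4400 "-" '
        '"Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; '
        'http://help.yahoo.com/help/us/ysearch/slurp)"')

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

hit = parse_line(LINE)
print(hit["ip"], hit["time"], hit["path"], hit["status"])
print(hit["agent"])
```

Once each line is a set of named fields, it becomes easy to count requests, filter out robots by user agent, or list referrers.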

This might be useful as a log if I am looking for something specific, but it needs some kind of massaging if it is going to be useful to me. One of the more popular (free) pieces of software that does this is called AWStats. Here, for example, it counts some of the “hits” on my site over time:

[Image: AWStats report showing hits on the site over time]
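The counting behind a report like that is not complicated. As a rough sketch, assuming the timestamps have already been pulled out of the log, hits per day can be tallied as follows. The timestamps here are invented, the parsing assumes English month abbreviations, and this is not how AWStats itself is implemented.

```python
from collections import Counter
from datetime import datetime

# Timestamps in the format used by the access log; the values are invented.
TIMESTAMPS = [
    "28/Feb/2009:00:38:38 -0800",
    "28/Feb/2009:00:39:11 -0800",
    "28/Feb/2009:23:59:59 -0800",
    "01/Mar/2009:00:00:05 -0800",
]

def hits_per_day(timestamps):
    """Count requests per calendar day (in the log's own time zone)."""
    counts = Counter()
    for stamp in timestamps:
        when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        counts[when.date().isoformat()] += 1
    return dict(counts)

daily = hits_per_day(TIMESTAMPS)
print(daily)
```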

Later in the page, it tells me what people are searching for when they show up on my blog.

[Image: AWStats report listing referring sites and search phrases]

Yes, I guess people actually do search for “rss porn” and “how to cheat.” Above these is the tail end of a list of referring sites. So, links from Boing Boing, Smart Mobs, and BootyCallFriends all lead to my site. (Actually, the last site, in case you hadn’t figured this out, is one of the many ways spammers manage to annoy us: referrer spam.)
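Search phrases like these can be recovered from the referrer field in the log, when the search engine includes the query in its address. Here is a minimal sketch using the Google referrer from the sample log above. It assumes the search phrase is carried in a parameter named "q", as it is in that sample; other search engines may use other parameter names, and the search_terms function is just a name I made up.

```python
from urllib.parse import urlparse, parse_qs

def search_terms(referrer):
    """Return the search phrase in a referrer URL's q parameter, or None."""
    query = parse_qs(urlparse(referrer).query)
    terms = query.get("q")
    return terms[0] if terms else None

# Referrers taken from the sample log lines above.
from_search = search_terms(
    "http://www.google.com/search?hl=en&q=1+and+1+sucks&btnG=Search"
)
from_page = search_terms("http://alex.halavais.net/1-and-1-hosting-sucks/")

print(from_search)
print(from_page)
```

The first referrer yields the phrase the visitor typed, while the second, an ordinary page address with no query, yields nothing.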

More recently, I’ve been using Google Analytics, a service that has you include a link to a small piece of JavaScript on your pages and then provides you with fairly detailed information over time. There is almost too much information, but it’s a wonderful set of data. Here’s just one view: a map of the cities from which people accessed my blog:

[Image: Google Analytics map of the cities from which visitors accessed the blog]

There are lots of other features. I can benchmark my site against other sites of a similar size, I can find out how loyal my visitors are over time, I can see which AdWords are successful in drawing people to my site, and I can test different layouts to see whether one leads to people exploring the site longer or buying more stuff. In all, it’s a pretty powerful way to begin testing.

This goes beyond the scope of this course, but it starts you off on the right foot when it comes to continual testing and improvement of your design. By watching carefully how users behave on your site, you are able to fine-tune it. Of course, this won’t work all on its own, but these kinds of metrics are a great start.