Dave’s JDE Blog

Technology, Web and Marketing

SEARCH ENGINES DO IT WITH SPIDERS

At least the good ones do. There’s always a lot of talk online and offline about "how do I get to #1 in (insert search engine here, mostly Google)?". The answer is much along the lines of the famous New York quip "How do I get to Carnegie Hall?"…"With practice".

Search engines vary in the exact "how". At companies like Google, the search algorithms are KFC-secret-recipe, for-your-eyes-only guarded secrets. Yet there are some conventionally accepted "norms" that apply to Google and most others.

The term "spidering" or "crawling" is as old as the "web". If you imagine the Internet, or even a single website, as being structured very much like a spider’s web, with one thread linking to another, then search engines "crawl" along these threads and, based on certain algorithms, give "points" that ultimately increase or decrease a site’s ranking.
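To make the "crawl along the threads" idea a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the three-page site in `PAGES`, the `crawl` function, and especially the scoring, which simply counts inbound links as "points". Real engines keep their scoring secret and it is certainly far more involved than this.

```python
from collections import deque
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical three-page site, held in memory so nothing is fetched.
PAGES = {
    "/": '<a href="/listings">Listings</a> <a href="/about">About</a>',
    "/listings": '<a href="/">Home</a> <a href="/about">About</a>',
    "/about": '<a href="/">Home</a>',
}


def crawl(start, pages):
    """Follow links breadth-first and count inbound links per page."""
    inbound = {}            # page -> number of links pointing at it
    seen = {start}          # pages already queued, so we never loop forever
    queue = deque([start])
    while queue:
        url = queue.popleft()
        parser = LinkExtractor()
        parser.feed(pages.get(url, ""))
        for link in parser.links:
            inbound[link] = inbound.get(link, 0) + 1
            if link in pages and link not in seen:
                seen.add(link)
                queue.append(link)
    return seen, inbound


visited, points = crawl("/", PAGES)
print(sorted(visited))
print(points)
```

The `seen` set is the part that matters most in practice: a web is full of loops (every page here links back to the home page), and without it the spider would go around in circles.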

It’s a little more complex than that, but for the sake of this conversation, let’s keep it as simple as we can.

"Spiders" or "crawlers" are little programs that do this searching and cataloging ("indexing" is the proper term). They look at pages in "text mode". Certain HTML "markups" such as headlines, page headers, links, etc. are weighed and scored. As a site owner, you can regulate some of what is looked at and what isn’t, at least by the engines that play by the rules. There are codes that can be added to pages, or placed in a file called "robots.txt", which give specific instructions to spiders.

Having relevant content within the text becomes important, yet there is a fine line between what is good for a site and what is bad. For example, let’s use the term "Arizona Real Estate Information". Ideally, when people type this into Google, we want our site to be one of the first ones that appear. So we put the search terms into our page headers and the "meta tags" (parts of the page that are also used for indexing), and we try to work them into the text. Too little and it might negatively affect our site’s placement. Too much and we run the risk of having content that looks bad to a visitor and may be considered "keyword spamming" by the search engines.
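One simple way to get a feel for "too little" versus "too much" is to count how often the phrase turns up relative to the length of the text. The sketch below does just that. The `keyword_density` function and both sample texts are made up for illustration, and no search engine publishes a "safe" figure, so treat the numbers as a way to compare your own drafts and nothing more.

```python
import re


def keyword_density(text, phrase):
    """Return (occurrences, share of all words taken up by the phrase)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    if not words or n == 0:
        return 0, 0.0
    # Slide a window of n words across the text and count exact matches.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits, hits * n / len(words)


phrase = "Arizona Real Estate Information"

natural = (
    "Looking for Arizona real estate information? Our guides cover "
    "neighborhoods, prices and schools across the state."
)
stuffed = (
    "Arizona real estate information. Arizona real estate information here. "
    "Best Arizona real estate information."
)

natural_hits, natural_share = keyword_density(natural, phrase)
stuffed_hits, stuffed_share = keyword_density(stuffed, phrase)
print(natural_hits, round(natural_share, 2))
print(stuffed_hits, round(stuffed_share, 2))
```

The second text is almost nothing but the phrase, which is the kind of copy that reads badly to a visitor and risks looking like keyword spamming.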

And this is one piece of the whole puzzle, but it is a piece that you, as a website owner, have a degree of control over.

There are, of course, many other things that affect placement, including inbound links from reputable sites, proper HTML coding, site maps, image "alt" tags and a host of other technical details.
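Of the items in that list, the image "alt" text is one of the easiest to check for yourself. Here is a small sketch that lists images with no alt attribute at all. The `AltChecker` class and the sample HTML are made up for the example, and it deliberately does not flag an empty `alt=""`, since that is a common way to mark purely decorative images.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Records the src of every <img> tag that has no alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing_alt.append(attributes.get("src", "(no src)"))


# Hypothetical page fragment: one image described, one not.
page = (
    '<img src="house.jpg" alt="Three-bedroom home in Tempe">'
    '<img src="banner.jpg">'
)

checker = AltChecker()
checker.feed(page)
print(checker.missing_alt)
```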

Sites that are all "Flash" are harder, if not impossible, to index because what the viewer may see as text is more or less an animated "picture". Ditto pages that are in fact images rather than text. You can test this out by using your cursor to highlight text on the page: if you can’t, then it’s probably going to be difficult to index. The best search engine technologies, which are always evolving, are finding ways around this to allow much more flexibility in site design. Even PDFs are now routinely indexed if they’re part of a site.

SEO, or search engine optimization, deals with all these and more. The industry is part art, part science…and part luck, which makes it very hard to get right and do well, hence a lot of bad press.

Anyway, I thought this might be of interest. If you’re a little technically inclined, you can download "Lynx", a text-only browser, which gives you a "spider-like" view of a site. The results can be quite interesting…
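If you’d rather not install anything, a few lines of Python can give a rough approximation of that text-only view. This is only a sketch and makes no attempt to match what Lynx or any real spider actually produces. The `TextView` class, the `text_view` function and the sample page are all made up for illustration.

```python
from html.parser import HTMLParser


class TextView(HTMLParser):
    """Keeps visible text and image alt text; drops scripts and styles."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "img":
            alt = dict(attrs).get("alt")
            if alt:
                # An image contributes only its alt text, if it has any.
                self.chunks.append(f"[{alt}]")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_view(html):
    parser = TextView()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# Hypothetical page: note the banner image has no alt text at all.
page = """<html><head><title>Arizona Homes</title><style>h1 {color: red}</style></head>
<body><h1>Arizona Real Estate Information</h1>
<img src="banner.jpg"><img src="map.png" alt="Map of Phoenix">
<p>Browse listings by city.</p><script>track();</script></body></html>"""

print(text_view(page))
```

In this sketch the banner image disappears entirely, because it has no text attached to it. That is the same problem as the all-Flash or all-image pages mentioned above.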

May 17, 2009 - Posted by Dave | Technology, Web Design
