Workshops
Search Engine Optimization
Overview
Introduction
What is a search engine and how does it work?
Techniques
- code
- other
Introduction
Your goal: get noticed
Tools: search engines
Tool you probably didn't think about: humans
How Search Engines Work
Yahoo, Google, etc. all work substantially the same. The differences come in features and in page ranking algorithms.
- They run a spider, which follows links
- The spider puts what it finds into a database
- When you run a query, it goes against the database, not against the live web. The search engine returns ranked results.
- Google can only follow links. No link, no find.
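The crawl, store, query cycle above can be sketched in a few lines. This is only a toy model, not how any real search engine is built: the WEB dictionary, its URLs, and the crawl and search functions are all made up for illustration, and ranking is left out entirely.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
WEB = {
    "/": ("welcome to the history department", ["/courses", "/faculty"]),
    "/courses": ("history courses and schedules", ["/"]),
    "/faculty": ("faculty directory", ["/courses"]),
    "/orphan": ("history page nobody links to", []),
}


def crawl(start):
    """Follow links breadth-first from start and build a word -> URLs index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        text, links = WEB[url]
        # Store every word of the page in the index (the "database").
        for word in text.split():
            index.setdefault(word, set()).add(url)
        # The spider only discovers pages that something links to.
        for link in links:
            if link in WEB and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, word):
    """Answer a query from the stored index, never from the live pages."""
    return sorted(index.get(word, set()))


index = crawl("/")
print(search(index, "history"))  # prints ['/', '/courses']
```

Note that /orphan contains the word "history" but never shows up, because no crawled page links to it: no link, no find.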
Considerations
The page must be linked from somewhere though not necessarily from your site. If the page is still there, but you have removed your link to it, some other site may still have a link, so it will still show up in Google.
You can check the links to a given page. Enter link:url into Google (or Yahoo) and you will see the results. You can then go about contacting all those page owners.
The reference is in their database. When you delete the page, it doesn't instantly disappear from the Google database. It could take as long as a month or even two months. And Google isn't the only one out there. You just have to be patient.
The Well-Titled Web Page
- The <title> tag is important
- All search engines look at it
- Make sure every page on your site has a title specific to that page
- General rule: specific to general
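One way to check the "every page has its own title" rule is a small audit script. The sketch below assumes you already have each page's HTML in a dictionary; the URLs and titles are hypothetical, and get_title and audit_titles are names invented here.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def get_title(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()


def audit_titles(pages):
    """Return (URLs with no title, titles shared by more than one URL)."""
    missing = []
    by_title = {}
    for url, html in pages.items():
        title = get_title(html)
        if not title:
            missing.append(url)
        else:
            by_title.setdefault(title, []).append(url)
    duplicates = {t: urls for t, urls in by_title.items() if len(urls) > 1}
    return missing, duplicates


# Hypothetical pages; titles run from specific to general.
pages = {
    "/": "<html><head><title>Workshops - OIT - Boise State</title></head></html>",
    "/seo": "<html><head><title>Workshops - OIT - Boise State</title></head></html>",
    "/contact": "<html><head></head><body>Contact</body></html>",
}
missing, duplicates = audit_titles(pages)
print(missing)     # prints ['/contact']
print(duplicates)  # prints {'Workshops - OIT - Boise State': ['/', '/seo']}
```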
META Tags
Concept
Tags that provide meta-information about the document. There are many such, but we'll look at two: keywords and description
Keywords
Useful for entering words or phrases that are not in the body.
For example, on a page that says Boise State, you might put BSU in the keywords.
If your page has acronyms, the keywords area is a place to spell out those acronyms.
Google does look at these tags, but no one knows how they affect ranking. That doesn't matter; the reasons outlined above are good enough on their own.
Make what you enter here specific to the page.
Certain keywords should appear on multiple pages. For example, "OIT" should appear in the keywords of all OIT pages. So, put department-wide keywords into your template, then add page-specific keywords by hand for individual pages.
I'm really ambivalent about adding "bsu" and "boise state" to all Boise State pages.
Syntax
<meta name="keywords" content="text" />
Delimit with commas. You can have one word or several within each.
Example (for a web page for this workshop)
<meta name="keywords" content="seo, search engine optimization, spider, crawler, robot, bsu workshop, web training" />
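The template-plus-page approach (department-wide keywords first, page-specific ones added after) could be automated roughly like this. The function name keywords_meta and the sample keyword lists are assumptions for illustration, not part of any particular template system.

```python
from html import escape


def keywords_meta(template_keywords, page_keywords):
    """Merge department-wide and page-specific keywords into one meta tag."""
    merged = []
    seen = set()
    for kw in list(template_keywords) + list(page_keywords):
        key = kw.strip().lower()
        # Skip blanks and case-insensitive repeats, keeping the first spelling.
        if key and key not in seen:
            seen.add(key)
            merged.append(kw.strip())
    content = escape(", ".join(merged), quote=True)
    return f'<meta name="keywords" content="{content}" />'


print(keywords_meta(["OIT"], ["seo", "search engine optimization", "oit"]))
# prints <meta name="keywords" content="OIT, seo, search engine optimization" />
```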
Description
The description tag is very useful because Google will use whatever it finds there as the description in its search results.
If you don't have a description meta tag, it picks up the first couple of lines from your page. On many documents, that will include the navigation bar, resulting in a less-than-useful description field.
Example
search on "human genome"
first hit is a good description; second is just the content; third is no content at all because the first few lines are Javascript or images
page 4 shows a really useless one: the title is "frame" and the description is the menu
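To see why a missing description produces a poor result, here is a rough imitation of the fallback behavior. It is a sketch under the assumption that the engine simply takes the first visible words of the page; real search engines choose snippets in more elaborate ways. The snippet function and the sample pages are hypothetical.

```python
from html.parser import HTMLParser


class SnippetParser(HTMLParser):
    """Record the description meta tag and the visible words of a page."""

    def __init__(self):
        super().__init__()
        self.description = None
        self.words = []
        self.skip_depth = 0  # non-zero while inside <script>, <style>, <title>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content") or ""
        elif tag in ("script", "style", "title"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style", "title") and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.words.extend(data.split())


def snippet(html, max_words=20):
    """Use the description if present, else the first words on the page."""
    parser = SnippetParser()
    parser.feed(html)
    if parser.description:
        return parser.description
    return " ".join(parser.words[:max_words])


with_description = (
    '<html><head><title>T</title>'
    '<meta name="description" content="Notes from the SEO workshop." />'
    '</head><body><p>Home | About | Contact</p></body></html>'
)
without_description = (
    '<html><head><title>T</title></head>'
    '<body><p>Home | About | Contact</p><p>Welcome.</p></body></html>'
)
print(snippet(with_description))        # prints Notes from the SEO workshop.
print(snippet(without_description, 5))  # prints Home | About | Contact
```

The second result is the navigation bar, which is exactly the less-than-useful description you want to avoid.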
Use complete sentences and keep it under 100 words. Make what you enter here specific to the page. My practice is to standardize where I can. For example, the language for the description of a "Mission Statement" page should be standard across departments.
Syntax
<meta name="description" content="text" />
Other SEO Tactics
How old is your content?
Frequency of indexing
Frequency of updating
How Big are Your Pages?
Search engines will process only so much information for a page. 4K is typical, which should be plenty for most pages. This really only matters for excessively long pages.
I'm only going to say this Once
Duplicate content is one of the leading causes of search engine penalties.
We don't do this much; just be aware of it
Make sure your site is cleanly organized and your content is original and not duplicated in any way.
Note that this doesn't mean you can't reuse pieces of content. The penalty applies only to entire pages being duplicated.
Canonical URLs
This is correct: http://www.boisestate.edu/courses/lma/
This is not correct: http://www.boisestate.edu/courses/lma
Nor is this: http://www.boisestate.edu/courses/lma/index.shtml
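If you generate links in code, a small helper can turn the two incorrect forms into the correct one. This is a sketch: the list of index file names is an assumption, and treating a last path segment without a dot as a directory is a heuristic that will not suit every site.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed default-document names; adjust to match your server.
INDEX_NAMES = {"index.shtml", "index.html", "index.htm"}


def canonical_directory_url(url):
    """Drop a trailing index file and make directory URLs end with a slash."""
    scheme, netloc, path, query, fragment = urlsplit(url)
    head, _, last = path.rpartition("/")
    if last in INDEX_NAMES:
        path = head + "/"
    elif "." not in last and not path.endswith("/"):
        # Heuristic: no file extension, so assume this names a directory.
        path = path + "/"
    return urlunsplit((scheme, netloc, path, query, fragment))


for url in (
    "http://www.boisestate.edu/courses/lma/",
    "http://www.boisestate.edu/courses/lma",
    "http://www.boisestate.edu/courses/lma/index.shtml",
):
    print(canonical_directory_url(url))
# each line prints http://www.boisestate.edu/courses/lma/
```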
2-4-6-8 you really need to Validate!
Broken links
Valid HTML
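A broken-link check does not need special software to get started. The sketch below works on an in-memory copy of a site (a dictionary of site-relative URLs to HTML, all hypothetical) and only checks internal links; external links and HTML validation would need other tools.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def broken_links(pages):
    """Return (page, target) pairs where an internal target does not exist."""
    broken = []
    for url, html in pages.items():
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links against the page and drop any #fragment.
            target, _ = urldefrag(urljoin(url, href))
            if target.startswith("/") and target not in pages:
                broken.append((url, target))
    return broken


# Hypothetical two-page site.
pages = {
    "/": '<a href="courses/">Courses</a> <a href="/missing.html">Gone</a> '
         '<a href="http://example.com/">Elsewhere</a>',
    "/courses/": '<a href="/">Home</a> <a href="lma/">LMA</a>',
}
print(broken_links(pages))
# prints [('/', '/missing.html'), ('/courses/', '/courses/lma/')]
```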
Angels in the Architecture
- Functional forms
- Meaningful urls
- Each page should have a single content focus
- Don't break up text with tables (e.g., to achieve columns)
- Create a sitemap (an index)
- Navigation structure should move from broad to narrow
- Usability
- The more they use your site, the higher Google will rank it
- Functional
- Accessible
- Relevant
- Timely
- Accurate
One great page is worth a thousand good pages
Us, Robots
Concept
We need a way to prevent spiders from crawling certain portions of our site.
There is a standard for this. It's called a robots.txt file.
Usage
The file must be called robots.txt and it must be in the root directory. You specify subdirectories by name from there. A robots.txt file that is placed in a subdirectory will be ignored.
Subdomains are their own root. Thus, even though history.boisestate.edu is actually a folder on www, the robots.txt file would go in the root of the history folder and would be in effect.
You need to specify the user-agent. The simplest approach is to apply your rules to all user agents:
User-Agent: *
But if you want to block some user agents and not others, you can.
To block the entire site, use a forward slash.
Disallow: /
To block a directory and everything in it, follow the directory name with a forward slash.
Disallow: /private_directory/
To block a page, list the page.
Disallow: /private_file.html
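Before uploading a robots.txt file, you can test the plain directory and page rules with the parser in Python's standard library. The rules below reuse the examples above; the agent name ExampleBot and the tested paths are made up. Note that this parser does simple prefix matching and does not interpret wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-Agent: *
Disallow: /private_directory/
Disallow: /private_file.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """True if the given (hypothetical) robot may fetch the path."""
    return parser.can_fetch(agent, path)


print(allowed("/index.html"))                    # prints True
print(allowed("/private_file.html"))             # prints False
print(allowed("/private_directory/notes.html"))  # prints False
```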
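Major crawlers such as Googlebot support wildcards (also called pattern matching), though not every robot does. Thus, you can block any URL containing .asp, including .asp?string and .aspx, with this:
Disallow: /*.asp
The asterisk matches any string of characters. Because a rule matches from the start of the path, anything that follows .asp is covered as well. The dollar sign anchors the end of the URL: Disallow: /*.asp$ would block only URLs that end in exactly .asp, and would not block .asp?string or .aspx.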
Using meta tags
To prevent all robots from indexing a single page on your site, place the following meta tag into the <HEAD> section of your page:
<meta name="robots" content="noindex, nofollow">
To exclude a specific robot:
<meta name="googlebot" content="noindex, nofollow">
To allow robots to index the page but instruct them not to follow the links on it, you'd use the following tag:
<meta name="robots" content="nofollow">
The meta tag method is the only one open to editors who do not have rights to the server root.
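To double-check what a page's meta tags tell a robot, something like the following sketch could be run against the page source. It handles only the noindex and nofollow values shown above; robot_permissions and the sample pages are hypothetical, and real crawlers recognize more directives than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Gather directives from robots meta tags that apply to one agent."""

    def __init__(self, agent):
        super().__init__()
        # A robot reads the generic "robots" tag and any tag naming it.
        self.names = {"robots", agent.lower()}
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() in self.names:
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def robot_permissions(html, agent="googlebot"):
    """Report whether the agent may index the page and follow its links."""
    parser = RobotsMetaParser(agent)
    parser.feed(html)
    return {
        "index": "noindex" not in parser.directives,
        "follow": "nofollow" not in parser.directives,
    }


page = '<head><meta name="googlebot" content="noindex, nofollow"></head>'
print(robot_permissions(page, "googlebot"))  # prints {'index': False, 'follow': False}
print(robot_permissions(page, "otherbot"))   # prints {'index': True, 'follow': True}
```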
To Market, To Market
"Holistic" SEO: pay attention to all your media, all your avenues of communication.
External
- print literature should include urls
- press releases and other pr work
- create communities and participate in communities
- send notice to your peers
- social networking
Internal
Send notice to Comm&Marketing
- Index
- Other pages
Send notice to related departments
Develop a standard format
- Brief description of the information
- Specific url
You can market your whole site, but you can also market specific pages within the site
Boise State GSA
We have our own search appliance.
