XML & Web Services

Google Sitemaps - Collaborative Crawling with the SiteMap XML

Google Sitemaps lets you submit an XML file to Google that tells its crawlers which pages to crawl for updates… wow.

Quote:

Google Sitemaps is an experiment in web crawling. Using Sitemaps to inform and direct our crawlers, we hope to expand our coverage of the web and improve the time to inclusion in our index.

It's a collaborative crawling system that enables you to communicate directly with Google to keep us informed of all your web pages, and when you make changes to these pages.

By placing a Sitemap-formatted file on your webserver, you enable our crawlers to find out what pages are present and which have recently changed, and to crawl your site accordingly.

You get:

  • Better crawl coverage to help people find more of your web pages
  • Fresher search results
  • A smarter crawl because you can provide specific information about all your web pages, such as when a page was last modified or how frequently a page changes

Amazing feature… there is even a Google Sitemap Generator available, and you are allowed to update the Google crawler once per hour. The Sitemap Generator is just a small Python script to run, not a heavy Windows application.

Quote:

Webmasters with a Unix webserver may consider setting this up as a cron job.

Google created its own Sitemap XML schema, in which you can specify details such as:

  • changefreq — how frequently the content at the URL is likely to change
  • lastmod — the time the content at the URL was last modified
  • loc — the URL location
  • priority — the priority of the page relative to other pages on the same site
  • url — this tag encapsulates the first four tags in this list
  • urlset — this tag encapsulates the first five tags in this list
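To make the tag list above more concrete, here is a minimal sketch of how such a file could be built with Python's standard library. It is not Google's Sitemap Generator, just an illustration: the function name build_sitemap, the example.com URLs and the field values are all made up, and the namespace is an assumption (it is the one from the later sitemaps.org 0.9 protocol, so check the schema documentation for the value your version expects).

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the sitemaps.org 0.9 protocol; the schema version
# described in this post may expect a different namespace URI.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical pages, for illustration only.
EXAMPLE_PAGES = [
    {
        "loc": "http://www.example.com/",
        "lastmod": "2005-06-03",
        "changefreq": "daily",
        "priority": "1.0",
    },
    # Only loc is given here; the other tags are simply left out.
    {"loc": "http://www.example.com/archive.html"},
]


def build_sitemap(pages, namespace=SITEMAP_NS):
    """Return a Sitemap XML string for a list of page dictionaries."""
    # urlset is the root element and wraps every url entry.
    urlset = ET.Element("urlset", {"xmlns": namespace})
    for page in pages:
        # Each url element wraps loc, lastmod, changefreq and priority.
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap(EXAMPLE_PAGES))
```

The resulting string could be saved as a file on the webserver and regenerated whenever pages change, for example from the cron job mentioned in the quote above.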

I am sure there will be great discussions about this, and plenty of applications for it…
