What is a sitemap and what does it do?
Author: Andrew Gray
A lot of misleading information is written about sitemaps, suggesting that the presence or absence of a sitemap will have an impact on rankings. For the vast majority of websites you can relax: this is simply not the case. Google will find all your pages and rank them in exactly the same way whether you feed them to Google using a sitemap or allow Google to find them itself.
So what are sitemaps really about? Their true purpose is revealed in this quote from Google:
Sitemaps are a way to tell Google about pages on your site we might not otherwise discover.
The important part of this is the last few words, "we might not otherwise discover", which I'll talk more about in a second. But first, let's remember how Google finds pages.
How does Google find your pages?
In order to have any chance of ranking well in search results you first have to get your pages into the index. In other words, Google must be able to find each page. This can happen in one of three ways:
- “crawling” the site – the most common way for Google to find pages. The Googlebot application retrieves a page from your website (e.g. the homepage) and finds all the links on that page. It then retrieves those pages and looks for further links. The process continues until it has indexed every page that it has found.
- “third party links” – if other sites link to your site then that’s another way in which the search engines will find pages
- “sitemap” – you can tell the search engine about pages that it might not otherwise find by using a sitemap file
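The crawling process described above can be sketched in a few lines of Python. This is a simplified illustration, not how Googlebot is actually implemented: the `crawl` function, the `LinkExtractor` class and the in-memory `SITE` dictionary (which stands in for real HTTP requests) are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch):
    """Breadth-first crawl from start_url, staying on the same host.

    `fetch` is any function that returns a page's HTML, or None if the
    page does not exist. Returns the URLs found, in discovery order.
    """
    host = urlparse(start_url).netloc
    seen = [start_url]
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.append(absolute)
                queue.append(absolute)
    return seen


# Hypothetical site held in memory instead of fetched over the network.
SITE = {
    "http://example.test/": '<a href="/about">About</a> <a href="/contact">Contact</a>',
    "http://example.test/about": '<a href="/">Home</a>',
    "http://example.test/contact": '<a href="/">Home</a>',
    "http://example.test/orphan": "<p>No other page links here</p>",
}

found = crawl("http://example.test/", SITE.get)
print(found)
```

Note that the `/orphan` page is never discovered because nothing links to it. That is exactly the kind of page a sitemap exists to report.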
When are sitemaps essential?
Imagine you had a website that provided access to a directory of some sort, for example a directory of solicitors in the UK. Your website would certainly include a form that allowed your users to search the directory in various ways: by name, by location, by skills, etc. Let's imagine that the form was the only way of getting into the directory. Then you would have a problem, because Googlebot isn't able to interact with forms and the crawling process wouldn't find any links to the directory pages. It's situations like this that sitemaps were invented for: a sitemap is a method to "tell Google about pages on your site we might not otherwise discover".
Non-essential sitemaps do no harm
Most websites don't include directories of this kind, so Googlebot will find all their pages whether or not they are included in a sitemap. However, even in these situations, there's no downside to generating a sitemap other than the time and effort needed to keep it up to date. The good news is that our technical platform does this automatically through a process very similar to Googlebot: we have a spider which runs across the site finding pages and adding them to the full-text-search index. We then extract that information into an XML file in the approved format for sitemaps.
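To give a feel for what that XML file contains, here is a minimal sketch of turning a list of page addresses into a sitemap using Python's standard library. It is not the code our platform runs: the `build_sitemap` function and the example page addresses are made up for illustration. The `urlset`, `url` and `loc` element names and the namespace come from the published sitemap protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as bytes) with one <url><loc> entry per address."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical directory pages that are only reachable through a search form.
PAGES = [
    "http://www.yourdomainname.co.uk/solicitors/example-firm-one",
    "http://www.yourdomainname.co.uk/solicitors/example-firm-two",
]

sitemap_xml = build_sitemap(PAGES)
print(sitemap_xml.decode("utf-8"))
```

The protocol also allows optional elements such as `lastmod` for each entry, but `loc` is the only one that is required.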
What is the difference between sitemap.xml vs. sitemap.gz?
There is no significant difference. The .gz file extension simply indicates to Google that the sitemap.xml file has been compressed (gzipped), which makes it smaller and quicker to transfer. If you had a small site or were trying to manage the sitemap file manually then you might not bother with compressing it. Our technical platform compresses it automatically, and since some of the sites are very large, using the compressed format does make a significant difference.
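As a rough illustration of the saving, the sketch below gzips a sitemap held in memory and compares the sizes. The 1,000 page addresses are invented for the example, and the exact saving on a real site will vary.

```python
import gzip

# Build a hypothetical sitemap with 1,000 entries (addresses are made up).
entries = "".join(
    "<url><loc>http://www.yourdomainname.co.uk/page-%d</loc></url>" % n
    for n in range(1000)
)
sitemap_xml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    + entries
    + "</urlset>"
).encode("utf-8")

# Compress it the same way a .gz file is produced.
compressed = gzip.compress(sitemap_xml)

print("uncompressed bytes:", len(sitemap_xml))
print("compressed bytes:  ", len(compressed))

# Decompressing gives back exactly the same XML, so nothing is lost.
restored = gzip.decompress(compressed)
```

Because sitemap entries are so repetitive, they compress very well, which is why the format suits large sites.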
Sitemap files can be stored anywhere but are almost always stored at the root level, so you can find out whether you have a sitemap simply by asking for the file. The URL would be something like http://www.yourdomainname.co.uk/sitemap.xml or in our case /sitemap.gz