XML Sitemaps

An XML sitemap is a file that lists the important pages on your website so that search engines can discover and crawl your content more efficiently. It acts as a roadmap of your site for search engine crawlers.

Why XML Sitemaps Matter

Discoverability

Helps search engines find pages, especially new or updated ones

Faster Indexing

New content can be discovered, and therefore indexed, more quickly. A sitemap does not guarantee indexing.

Metadata

Provides additional info like last modified dates and priority

XML Sitemap Structure

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
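
A file with this structure is usually generated rather than written by hand. The following is a minimal sketch using Python's standard library; the function name build_sitemap and the (url, lastmod) input shape are assumptions made for illustration, not part of the sitemap protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML (bytes) for an iterable of (url, lastmod) pairs.

    lastmod may be None, because the element is optional.
    """
    # Register the sitemap namespace as the default so tags have no prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes characters such as & in the URL text.
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod:
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    ("https://example.com/page/", "2024-01-15"),
    ("https://example.com/search?a=1&b=2", None),
])
```

Building the tree with an XML library instead of string concatenation means special characters in URLs are escaped automatically.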

Sitemap Elements Explained

Element       Required  Description
<loc>         Yes       The full URL of the page
<lastmod>     No        Last modification date in W3C Datetime format (for example, YYYY-MM-DD)
<changefreq>  No        How often the page changes (always, hourly, daily, weekly, monthly, yearly, never)
<priority>    No        Relative importance (0.0 to 1.0). Default is 0.5

Note: Google ignores changefreq and priority values, and uses lastmod only when it is consistently accurate. Focus on accurate lastmod dates instead.
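
Since lastmod is the value worth getting right, it helps to format it in one place. The sketch below produces W3C Datetime strings from Python date and datetime objects; the helper name format_lastmod is hypothetical, and treating naive datetimes as UTC is an assumption you should adjust to match how your system stores timestamps.

```python
from datetime import date, datetime, timezone


def format_lastmod(value):
    """Format a date or datetime as a W3C Datetime string for <lastmod>."""
    # datetime is a subclass of date, so it must be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            # Assumption: naive timestamps are stored in UTC.
            value = value.replace(tzinfo=timezone.utc)
        return value.isoformat(timespec="seconds")
    return value.isoformat()


day_only = format_lastmod(date(2024, 1, 15))
with_time = format_lastmod(datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc))
```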

Sitemap Index Files

For large sites, use a sitemap index to reference multiple sitemap files:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2024-01-15</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2024-01-14</lastmod>
  </sitemap>
</sitemapindex>
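
An index file can be generated the same way as a regular sitemap. This is a sketch under the assumption that each child sitemap is described by a (url, lastmod) pair; build_sitemap_index is an illustrative name, not a standard function.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemaps):
    """Return sitemap index XML (bytes) for (sitemap_url, lastmod) pairs."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url, lastmod in sitemaps:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        if lastmod:
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


index_xml = build_sitemap_index([
    ("https://example.com/sitemap-posts.xml", "2024-01-15"),
    ("https://example.com/sitemap-products.xml", "2024-01-14"),
])
```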

Sitemap Types

Standard Sitemap

Lists regular web pages. Most common type. Typical file name: sitemap.xml

Image Sitemap

Helps Google discover images for Google Images. Uses image:image elements from the image sitemap namespace.

Video Sitemap

Provides metadata about videos on your site. Uses video:video elements from the video sitemap namespace.

News Sitemap

For Google News publishers. Lists articles published in the last 2 days. Uses news:news elements from the news sitemap namespace.
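
The extension types add their own namespaced elements inside each url entry. As an illustration, here is a minimal sketch of an image sitemap built with Python's standard library. The namespace URI is the one Google documents for image sitemaps; the function name and the (page_url, image_urls) input shape are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """Return XML (bytes) for an iterable of (page_url, [image_url, ...])."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # Each image gets an <image:image> wrapper with an <image:loc>.
            image = ET.SubElement(entry, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


image_xml = build_image_sitemap([
    ("https://example.com/page/", ["https://example.com/images/photo.jpg"]),
])
```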

Sitemap Best Practices

  • Only include canonical URLs - Don't list duplicate pages or non-canonical versions
  • Only indexable pages - Exclude pages blocked by robots.txt or noindex
  • Keep it updated - Regenerate automatically when content changes
  • Stay under limits - At most 50,000 URLs and 50 MB (uncompressed) per sitemap file
  • Use absolute URLs - Include the full URL with protocol
  • Submit to search engines - Add to Google Search Console and Bing Webmaster Tools
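
The URL limit can be enforced at generation time by splitting the URL list into batches and writing one sitemap file per batch. A minimal sketch follows; chunk_urls is a hypothetical helper, and it only handles the URL count, so the file size limit still needs a separate check.

```python
from itertools import islice

MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield lists of at most `size` URLs, one list per sitemap file."""
    iterator = iter(urls)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Placeholder URLs for illustration only.
all_urls = (f"https://example.com/item/{n}/" for n in range(120_001))
batch_sizes = [len(batch) for batch in chunk_urls(all_urls)]
```

Each batch would then become its own sitemap file, with all of the files referenced from a single sitemap index.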

What NOT to Include

Exclude These
  • Duplicate pages
  • Paginated archive pages
  • Tag/category pages (if thin)
  • Internal search results
  • Pages with noindex
  • Redirected URLs
  • 404 error pages

Include These
  • All important content pages
  • Product pages
  • Category pages (if valuable)
  • Blog posts
  • Service pages
  • Landing pages
  • Key informational pages
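
Several of the exclusion rules above can be checked mechanically before a URL is written to the sitemap. The sketch below assumes a hypothetical Page record populated from your CMS or a crawl; the field names are illustrative. Judgement calls such as whether a category page is thin are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    # Hypothetical record; field names are assumptions for this example.
    url: str
    status_code: int = 200
    canonical_url: Optional[str] = None
    noindex: bool = False


def belongs_in_sitemap(page):
    """Return True if the page passes the mechanical exclusion checks."""
    if page.status_code != 200:
        return False  # redirected URLs and 404 error pages
    if page.noindex:
        return False  # pages with noindex
    if page.canonical_url and page.canonical_url != page.url:
        return False  # duplicates that point to another canonical URL
    return True


pages = [
    Page("https://example.com/blog/post/"),
    Page("https://example.com/old/", status_code=301),
    Page("https://example.com/missing/", status_code=404),
    Page("https://example.com/search?q=x", noindex=True),
    Page("https://example.com/post/?ref=a",
         canonical_url="https://example.com/blog/post/"),
]
included = [p.url for p in pages if belongs_in_sitemap(p)]
```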

Submitting Your Sitemap

  1. Google Search Console - Go to Sitemaps > Add a new sitemap
  2. Bing Webmaster Tools - Go to Sitemaps > Submit sitemap
  3. robots.txt - Add a Sitemap line with the absolute URL, for example: Sitemap: https://example.com/sitemap.xml

robots.txt Reference

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
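
To confirm that a robots.txt file exposes the sitemap as intended, it can be parsed with Python's standard urllib.robotparser module (the site_maps() method requires Python 3.8 or later). This sketch parses the example above from a string rather than fetching it over the network.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns the Sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()

# can_fetch() applies the Disallow rules for the given user agent.
admin_allowed = parser.can_fetch("*", "https://example.com/admin/")
page_allowed = parser.can_fetch("*", "https://example.com/page/")
```

A URL that can_fetch() reports as disallowed should not appear in the sitemap, which matches the advice to exclude pages blocked by robots.txt.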

External Resources