Background sitemap generation for Django

Django Sitemap Generate

Use case

Almost every content site has a sitemap. Django provides an application that serves sitemap views, which is fine if your website is small. If your sitemap generation logic is complicated, or the sitemap contains millions of items, you will see massive load spikes when Google and other search engines arrive with thousands of their indexer bots. These bots request the same sitemap pages in parallel, and those requests cannot be cached effectively because the interval between indexing runs is long and the cache hit rate is low.

The solution is to regenerate the sitemap files periodically, for example once per day, rather than once per search engine indexer request. The files can then be served as static files, which does not affect backend performance at all.

Prerequisites

This project uses the sitemap index view and the per-model sitemap views to generate sitemap XML files. To provide them, you will need the following.

  1. Add django.contrib.sitemaps to installed apps

    INSTALLED_APPS.append('django.contrib.sitemaps')
    
  2. Configure at least one sitemap

    from django.contrib.sitemaps import Sitemap
    
    from testproject.testapp import models
    
    
    class VideoSitemap(Sitemap):
        name = 'video'
        changefreq = 'daily'
        limit = 50000
    
        def items(self):
            return models.Video.objects.order_by('id')
    

    Note that the changefreq parameter is only a hint for search engine indexers; it does not affect how often the sitemap files are generated.

  3. Configure sitemap serving

    from django.contrib.sitemaps import views
    from django.urls import path
    
    from testproject.testapp.sitemaps import VideoSitemap, ArticleSitemap
    
    sitemaps = {
        VideoSitemap.name: VideoSitemap,
        ArticleSitemap.name: ArticleSitemap
    }
    
    urlpatterns = [
        path('sitemap.xml', views.index, {'sitemaps': sitemaps},
             name='sitemap-index'),
        path('sitemap-<section>.xml', views.sitemap, {'sitemaps': sitemaps},
             name='django.contrib.sitemaps.views.sitemap'),
    ]
    

Now your website supports sitemap views.
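
As a rough illustration of the limit attribute from step 2: Django splits a sitemap section into pages of at most limit items and serves page N through the p query parameter. The helper below is a hypothetical sketch (sitemap_page_count is not part of Django or of this package) that shows only the arithmetic.

```python
import math


def sitemap_page_count(item_count: int, limit: int = 50000) -> int:
    """Return how many sitemap pages a section with item_count items needs.

    Hypothetical helper for illustration only. An empty section still
    produces one (empty) page, matching Django's default paginator behaviour.
    """
    if item_count < 0:
        raise ValueError("item_count must not be negative")
    return max(1, math.ceil(item_count / limit))


if __name__ == "__main__":
    for count in (0, 50000, 50001, 120000):
        print(count, "items ->", sitemap_page_count(count), "page(s)")
```

With limit = 50000, a table of 120,000 videos yields three pages, so three files have to be generated and served for that section.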

Installation

pip install django-sitemap-generate

A working example is in testproject.testapp.

  1. Add the sitemap_generate application to the installed apps in the Django settings:

    INSTALLED_APPS.append('sitemap_generate')
    
  2. Add a reference to the sitemap mapping to the Django settings:

    SITEMAP_MAPPING = 'testproject.testapp.urls.sitemaps'
    
  3. Specify the name of the sitemap index URL. If your urlpatterns look like the example above, you can write:

    SITEMAP_INDEX_NAME = 'sitemap-index'
    

    default: 'sitemap-index'

  4. Specify the name of the views.sitemap view. If your urlpatterns look like the example above, you can write:

    SITEMAPS_VIEW_NAME = 'django.contrib.sitemaps.views.sitemap'
    

    default: 'django.contrib.sitemaps.views.sitemap'

  5. Set the media path under which sitemaps are stored:

    SITEMAP_MEDIA_PATH = 'sitemaps'
    

    default: 'sitemaps'

  6. You may also need to set up forwarded protocol handling in the Django settings:

    SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
    
  7. Note that Django paginates sitemaps with the p query parameter, but the corresponding sitemap files are named sitemap-video.xml, sitemap-video-2.xml and so on, so you will need to configure some rewrites in your web server.

  8. Optional. Change the storage used for generated sitemaps:

    SITEMAP_STORAGE = custom_storage
    

    default: django.core.files.storage.default_storage
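
The rewrite described in step 7 can be sketched in plain Python. This is only an illustration of the naming convention stated above; static_name_for_url is a hypothetical helper, not part of the package, and in production the mapping would normally live in the web server configuration.

```python
import posixpath
from urllib.parse import parse_qs, urlsplit


def static_name_for_url(url: str) -> str:
    """Map a paginated sitemap URL to the name of its generated file.

    Hypothetical helper: page 1 keeps the plain name, later pages get
    a "-<page>" suffix before the extension.
    """
    parts = urlsplit(url)
    name = posixpath.basename(parts.path)      # e.g. "sitemap-video.xml"
    stem, ext = posixpath.splitext(name)       # ("sitemap-video", ".xml")
    page = int(parse_qs(parts.query).get("p", ["1"])[0])
    if page < 1:
        raise ValueError("page numbers start at 1")
    return name if page == 1 else f"{stem}-{page}{ext}"


if __name__ == "__main__":
    print(static_name_for_url("/sitemap-video.xml"))
    print(static_name_for_url("/sitemap-video.xml?p=2"))
```

Whatever rewrite rule you configure should produce the same mapping from request URL to file name.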

Usage

When you request a sitemap over HTTP, Django builds the links in the sitemap XML from the domain name of the request. In the background there is no request, so you need to set some environment variables instead. By default, links are generated for localhost over HTTPS.

export \
  SITEMAP_PROTO=https \
  SITEMAP_HOST=github.com \
  SITEMAP_PORT=443

# generate all sitemaps
python manage.py generate_sitemap

# generate sitemap for single model
python manage.py generate_sitemap video
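
To make the role of these variables concrete, here is a minimal sketch of how a base URL could be assembled from them. The variable names and the localhost/HTTPS defaults come from the text above; the function name and the dropping of default ports are assumptions made for illustration, and this is not the package's actual implementation.

```python
import os
from typing import Mapping

# Ports that are implied by the scheme and can be left out of the URL.
DEFAULT_PORTS = {"http": 80, "https": 443}


def sitemap_base_url(env: Mapping[str, str] = os.environ) -> str:
    """Build the scheme://host[:port] prefix for sitemap links.

    Hypothetical helper: falls back to https://localhost when the
    variables are not set.
    """
    proto = env.get("SITEMAP_PROTO", "https")
    host = env.get("SITEMAP_HOST", "localhost")
    port = env.get("SITEMAP_PORT")
    if port is None or int(port) == DEFAULT_PORTS.get(proto):
        return f"{proto}://{host}"
    return f"{proto}://{host}:{port}"


if __name__ == "__main__":
    print(sitemap_base_url())
```

Passing a plain dict instead of os.environ makes it easy to check the result for a given combination of variables.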

You may run sitemap generation from crontab:

0 0 * * * python manage.py generate_sitemap

You may run sitemap generation from celery:

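# Assumed import path; check where SitemapGenerator lives in your installed version.
from sitemap_generate.generator import SitemapGenerator


@celery.task  # `celery` is your project's Celery app instance
def generate_sitemap():
    generator = SitemapGenerator()  # Uses django settings by default
    generator.generate()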

You will also need to serve the generated XML files as static responses, e.g. in nginx:

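location ~* /sitemaps/(?<fn>sitemap(-(article|video))?)\.xml {
    # $fn$arg_p joins the base name and the p query argument directly;
    # make sure the result matches the generated file names.
    try_files /media/sitemaps/$fn$arg_p.xml @backend;
}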

location /media/ {
    alias /app/media/;
}

location @backend {
    proxy_set_header Host $http_host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    set $app http://app:8000;
    proxy_pass $app;
}
