
A simple WordPress eXtended RSS (WXR) parser

Project description

WXR-parser is a very simple parser for WordPress eXtended RSS files, written in Python.

Feed it a WXR file and it returns data about posts, categories, tags and comments in a dictionary, so you can use this data in your own projects.


WXR-parser has been tested under Python 2.7 and 3.4.


The recommended way to install it is with pip, which also handles any dependencies:

pip install wxr-parser

If you install it manually, you should also install lxml.


To parse a file, import the module and pass the file's path to parse:

import wxr_parser

# parse a file
parsed_data = wxr_parser.parse('path_to_your_wxr.xml')

You can also parse a string containing WXR data.


WXR-parser returns a standard Python dictionary with the following keys:

  • site: data about the website. Not fully implemented.
  • categories: categories data (described below)
  • tags: tags data (described below)
  • posts: posts data (described below)
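
For a quick overview of what a parse produced, the keys listed above can be inspected directly. The sketch below is only an illustration: sample is a hand-written stand-in shaped like the documented result, and summarize is a hypothetical helper that is not part of WXR-parser.

```python
def summarize(parsed):
    """Count the entries found under each top-level key of a parsed result."""
    return {
        'categories': len(parsed.get('categories', {})),
        'tags': len(parsed.get('tags', {})),
        'posts': len(parsed.get('posts', [])),
    }


# Hand-written stand-in for the dictionary returned by wxr_parser.parse().
sample = {
    'site': {},
    'categories': {
        'uncategorized': {'slug': 'uncategorized', 'title': 'Uncategorized'},
    },
    'tags': {},
    'posts': [
        {'id': 1, 'title': 'Hello world!',
         'categories': ['uncategorized'], 'tags': []},
    ],
}

summary = summarize(sample)
print(summary)
```

With real data, you would pass the result of wxr_parser.parse() instead of sample.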


A dictionary of parsed categories, keyed by category nicename:


# output
{
    'a-category': {'slug': 'a-category',
                   'title': 'A category'},
    'another-category': {'slug': 'another-category',
                         'title': 'Another category'},
    'uncategorized': {'slug': 'uncategorized',
                      'title': 'Uncategorized'}
}


A dictionary of parsed tags, keyed by tag nicename:


# output
{
    'another-tag': {'slug': 'another-tag',
                    'title': 'another tag'},
    'arbitrary-tag': {'slug': 'arbitrary-tag',
                      'title': 'arbitrary tag'},
    'some-tag': {'slug': 'some-tag',
                 'title': 'Some tag'}
}
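
Because both dictionaries are keyed by nicename, they can serve as lookup tables for display titles. This is a minimal sketch, assuming the shapes documented in this section: titles_for is a hypothetical helper, and tags is sample data rather than output from a real file.

```python
def titles_for(slugs, lookup):
    """Map nicenames to titles, falling back to the nicename when unknown."""
    return [lookup.get(slug, {}).get('title', slug) for slug in slugs]


# Sample data shaped like parsed_data['tags'].
tags = {
    'some-tag': {'slug': 'some-tag', 'title': 'Some tag'},
    'another-tag': {'slug': 'another-tag', 'title': 'another tag'},
}

titles = titles_for(['some-tag', 'missing-tag'], tags)
print(titles)
```

Falling back to the nicename keeps the lookup from raising KeyError if a nicename has no matching entry. The same helper works unchanged with the categories dictionary.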


A list of dictionaries, each dictionary corresponding to a parsed post:

# get the first parsed post
parsed_data['posts'][0]

# output
{
    'categories': ['uncategorized'],
    'comment_status': 'open',
    'comments': [{'author': 'Mr WordPress',
                  'author_IP': None,
                  'author_url': '',
                  'content': 'Hi, this is a comment.<br />To delete a comment, just log in and view the post&#039;s comments. There you will have the option to edit or delete them.',
                  'date': datetime.datetime(2012, 7, 1, 18, 32, 32),
                  'id': 1}],
    'content': u'Welcome to WordPress. This is your first post. Edit or delete it, then start blogging!',
    'creator': 'admin',
    'guid': u'',
    'id': 1,
    'link': u'',
    'password': None,
    'ping_status': 'open',
    'pub_date': datetime.datetime(2012, 7, 1, 18, 32, 32),
    'slug': u'hello-world',
    'status': 'publish',
    'tags': [],
    'title': u'Hello world!'
}
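
Dates in posts and comments are datetime objects, which the standard json module cannot serialize on its own. If you want to dump parsed posts to JSON, one possible approach is a default hook that converts them to ISO 8601 strings. The sketch below assumes the post shape shown in this section: to_jsonable is a hypothetical helper, and post is a trimmed, hand-written sample.

```python
import datetime
import json


def to_jsonable(value):
    """Fallback for json.dumps: render datetime values as ISO 8601 strings."""
    if isinstance(value, datetime.datetime):
        return value.isoformat()
    raise TypeError('Cannot serialize %r' % (value,))


# Sample post shaped like an entry of parsed_data['posts'], trimmed for brevity.
post = {
    'id': 1,
    'title': 'Hello world!',
    'status': 'publish',
    'pub_date': datetime.datetime(2012, 7, 1, 18, 32, 32),
    'comments': [{'id': 1,
                  'date': datetime.datetime(2012, 7, 1, 18, 32, 32)}],
}

encoded = json.dumps(post, default=to_jsonable, sort_keys=True)
decoded = json.loads(encoded)
print(decoded['pub_date'])
```

The hook is only called for values json cannot handle natively, so nested comment dates are converted as well.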


0.1 - 12/10/2014

Initial release.


Contributions and feedback are welcome. You can fork the project and send me a link to your forked repo so I can merge it.

Feel free to email me at <>.


The project is released under the BSD licence.


