
A Python API for scraping AO3 (the Archive of Our Own)

Project description

This Python package provides a scripted interface to some of the data on AO3 (the Archive of Our Own).

It is not an official API.

Motivation

I want to be able to write Python scripts that use data from AO3.

An official API for AO3 data has been on the roadmap for a couple of years. Until that appears, I’ve cobbled together my own page-scraping code that does the job. It’s a bit messy and fragile, but it seems to work most of the time.

If/when we get the proper API, I’d drop this in a heartbeat and do it properly.

Installation

Install using pip:

$ pip install ao3

I’m trying to support Python 2.7, Python 3.3+ and PyPy.

Usage

Create an API instance:

>>> from ao3 import AO3
>>> api = AO3()

Looking up information about a work

Getting a work:

>>> work = api.work(id='258626')

The id is the numeric portion of the URL. For example, the work ID of https://archiveofourown.org/works/258626 is 258626.
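If you’re starting from a full URL rather than an ID, one way to pull out the numeric portion is a quick regular expression with the standard library (a sketch, not part of this package):

>>> import re
>>> re.search(r'/works/(\d+)', 'https://archiveofourown.org/works/258626').group(1)
'258626'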

Get a URL:

>>> work.url
'https://archiveofourown.org/works/258626'

You can then look up a number of attributes, similar to the Stats panel at the top of a page. Here’s the full set:

>>> work.title
'The Morning After'

>>> work.author
'ambyr'

>>> work.summary
"<p>Delicious just can't understand why it's the shy, quiet ones who get all the girls.</p>"

>>> work.rating
['Teen And Up Audiences']

>>> work.warnings
[]

(An empty list is synonymous with “No Archive Warnings”, so it evaluates as falsy.)
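That means you can test for archive warnings with a plain truthiness check; for example, with the same work as above:

>>> if not work.warnings:
...     print('No Archive Warnings Apply')
...
No Archive Warnings Apply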

>>> work.category
['F/M']

>>> work.fandoms
['Anthropomorfic - Fandom']

>>> work.relationship
['Pinboard/Fandom']

>>> work.characters
['Pinboard', 'Delicious - Character', 'Diigo - Character']

>>> work.additional_tags
['crackfic', 'Meta', 'so very not my usual thing']

>>> work.language
'English'

>>> work.published
datetime.date(2011, 9, 29)

>>> work.words
605

>>> work.comments
122

>>> work.kudos
1238

>>> for name in work.kudos_left_by:
...     print(name)
...
winterbelles
AnonEhouse
SailAweigh
# and so on

>>> work.bookmarks
99

>>> work.hits
43037

There’s also a method for dumping all the information about a work into JSON, for easy export or for passing into other tools:

>>> work.json()
'{"rating": ["Teen And Up Audiences"], "fandoms": ["Anthropomorfic - Fandom"], "characters": ["Pinboard", "Delicious - Character", "Diigo - Character"], "language": "English", "additional_tags": ["crackfic", "Meta", "so very not my usual thing"], "warnings": [], "id": "258626", "stats": {"hits": 43037, "words": 605, "bookmarks": 99, "comments": 122, "published": "2011-09-29", "kudos": 1238}, "author": "ambyr", "category": ["F/M"], "title": "The Morning After", "relationship": ["Pinboard/Fandom"], "summary": "<p>Delicious just can\'t understand why it\'s the shy, quiet ones who get all the girls.</p>"}'

Looking up your account

If you have an account on AO3, you can log in to access pages that aren’t available to the public:

>>> api.login('username', 'password')

Currently there’s only one thing you can do with this: if you have Viewing History enabled, you can get a list of work IDs from that history, like so:

>>> for entry in api.user.reading_history():
...     print(entry.work_id)
...
123
456
789
# and so on

You can enable Viewing History in the settings pane.

One interesting side effect of this is that you can use it to get a list of works where you’ve left kudos:

from ao3 import AO3
from ao3.works import RestrictedWork

api = AO3()
api.login('username', 'password')

for entry in api.user.reading_history():
    try:
        work = api.work(id=entry.work_id)
    except RestrictedWork:
        # Restricted works can't be scraped, so skip them.
        continue
    print(work.url + '... ', end='')
    if api.user.username in work.kudos_left_by:
        print('yes')
    else:
        print('no')

Warning: this is very slow. It has to go back and load a page for everything you’ve ever read. Don’t use this if you’re on a connection with limited bandwidth.

This doesn’t include “restricted” works – works that require you to be a logged-in user to see them.

(The reading page tells you when you last read something. If you cached the results, and then subsequent runs only rechecked fics you’d read since the last run, you could make this quite efficient. Exercise for the reader.)
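Here’s a rough sketch of that idea, written against nothing more than the reading_history() and work() calls shown above. It takes a cruder approach than the date-based one suggested: it just records which work IDs have already been checked in a local JSON file (the filename and cache layout are made up for this example), and skips them on later runs:

import json
import os

from ao3 import AO3
from ao3.works import RestrictedWork

CACHE_PATH = 'kudos_cache.json'  # arbitrary local file for remembering results

api = AO3()
api.login('username', 'password')

# Results from previous runs, keyed by work ID.
cache = {}
if os.path.exists(CACHE_PATH):
    with open(CACHE_PATH) as f:
        cache = json.load(f)

for entry in api.user.reading_history():
    if entry.work_id in cache:
        # Already checked on an earlier run; no need to reload the page.
        continue
    try:
        work = api.work(id=entry.work_id)
    except RestrictedWork:
        continue
    cache[entry.work_id] = api.user.username in work.kudos_left_by

with open(CACHE_PATH, 'w') as f:
    json.dump(cache, f)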

License

The project is licensed under the MIT license.

