Access the IETF Data Tracker and RFC Index

The ietfdata library - Access the IETF Datatracker and related resources

This project contains Python 3 libraries to interact with, and access, the IETF Datatracker, RFC index, and related resources.

Installation

The ietfdata library is distributed as a Python package. You should be able to install via pip in the usual manner:

pip install ietfdata

Accessing the IETF Datatracker

The DataTracker class provides an interface for programmatic access to the IETF Datatracker, providing metadata about the development of IETF standards.

Instantiation

There are two ways to instantiate this class, depending on how it is to be used. The usual way, when analysing a snapshot of the IETF data (for example, for a research paper, a dissertation, or a student project), is to use an archive file:

dt = DataTracker(DTBackendArchive("archive/ietfdata-dt.sqlite"))

When instantiated in this manner, the DataTracker class will read from the specified sqlite database.

If the specified sqlite database does not exist, then the DataTracker class will fetch a complete copy of the data from the IETF Datatracker. This will take around 24 hours and will produce a database that is about 2GB in size. If the download is interrupted, it is safe to rerun the above operation; the download will resume where it left off. Once the sqlite database is downloaded, future instantiations of the DataTracker will read from it directly and will not access the online IETF Datatracker, making them much faster and avoiding overloading the IETF's servers.

The following can be run from the command line to fetch a copy of the database into the named file (pass the same path to DTBackendArchive when instantiating the class):

  python3 -m ietfdata.tools.download_dt archive/ietf_dt.sqlite

If you are working on a paper, project, or dissertation with a group of people, one person should create the sqlite database and share a copy with the others. This avoids overloading the IETF's servers, and ensures that everyone working in the group generates the same results.
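When a database is shared within a group, it can help to confirm that everyone holds an identical copy. The following is a minimal sketch using only the Python standard library; the helper name `sha256_of_file` is illustrative and is not part of ietfdata.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        # Read fixed-size chunks so a multi-gigabyte database is never
        # loaded into memory all at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Each group member can run, for example, `print(sha256_of_file("archive/ietfdata-dt.sqlite"))` and compare the printed digest with the one from the person who created the database.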

Alternatively, when writing code to perform live queries of the IETF Datatracker, for example as part of a tool that provides an interactive dashboard or status report, the DataTracker should be instantiated as follows:

dt = DataTracker(DTBackendLive())

In this case, the DataTracker class will directly query the online IETF Datatracker for every request you make. This is appropriate when making small numbers of queries, for exploratory programming or when performing a live status check, but must not be used for tasks that need to make large numbers of queries. The IETF will block your access if you make many queries using DTBackendLive().
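One way to avoid accidentally sending a large number of live queries is to put a hard cap on how many calls a script may make. The sketch below is independent of ietfdata: `QueryBudget` and `fake_lookup` are hypothetical names introduced for illustration, and `fake_lookup` stands in for whatever live call your code makes.

```python
from typing import Any, Callable


class QueryBudget:
    """Wrap a callable and refuse to call it more than `limit` times."""

    def __init__(self, func: Callable[..., Any], limit: int) -> None:
        self._func = func
        self._limit = limit
        self.used = 0

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self.used >= self._limit:
            raise RuntimeError(
                f"query budget of {self._limit} exhausted; "
                "consider switching to an archive file"
            )
        self.used += 1
        return self._func(*args, **kwargs)


def fake_lookup(email: str) -> str:
    # Hypothetical stand-in for a live query; a real script would call
    # a DataTracker method here instead.
    return f"record for {email}"


lookup = QueryBudget(fake_lookup, limit=2)
print(lookup("a@example.org"))
print(lookup("b@example.org"))
# A third call would raise RuntimeError instead of sending another query.
```

Failing loudly once the cap is reached makes it obvious when an exploratory script has grown into a bulk analysis that should use the archive backend.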

Usage

The DataTracker provides an extensive API that is best explored by reading the source code for datatracker.py and datatracker_types.py. The examples/ directory contains a number of examples of how to use the library.

Start by importing and instantiating the library:

from ietfdata.datatracker import *
dt = DataTracker(DTBackendArchive("archive/ietfdata-dt.sqlite"))

To find information about a person:

p = dt.person_from_email("csp@csperkins.org")
print(p.name)
print(p.biography)

To find information about a document:

d = dt.document_from_rfc("RFC9000")
print(d.title)
print(d.group)

To find information about a group:

g = dt.group(d.group)
print(g.acronym)

for e in dt.group_events(group = g):
  print(e.time)
  print(e.desc)
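
The events returned by a loop like the one above can be summarised with the standard library. This sketch counts events per calendar year, assuming each event's `time` attribute is a `datetime`. `FakeEvent` is a hypothetical stand-in with made-up sample values so that the example runs without a database.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass
class FakeEvent:
    """Stand-in exposing the `time` and `desc` attributes used above."""
    time: datetime
    desc: str


def events_per_year(events: Iterable) -> Counter:
    """Count events by the calendar year of their `time` attribute."""
    return Counter(e.time.year for e in events)


# Made-up sample data, for illustration only.
sample = [
    FakeEvent(datetime(2016, 10, 4), "first example event"),
    FakeEvent(datetime(2021, 5, 27), "second example event"),
    FakeEvent(datetime(2021, 6, 1), "third example event"),
]
for year, count in sorted(events_per_year(sample).items()):
    print(year, count)
```

With a real archive, the same function could be given the result of `dt.group_events(group = g)` in place of `sample`, provided the assumption about `time` holds.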

There is a lot of information in the Datatracker. Read the source code for datatracker.py to understand what functions can be called, and the code for datatracker_types.py to understand the objects they take or return.

Accessing the IETF Mail Archive

The MailArchive class, defined in the mailarchive3 module, provides an interface for accessing the IETF email archive.

Instantiation

The MailArchive class is instantiated as follows, giving a path to an sqlite database containing a copy of the archive:

ma = MailArchive("archive/ietfdata-ma.sqlite")

Once instantiated, a call to ma.update() will bring the sqlite database up to date with the IETF mail archive. The first time the ma.update() function is called, it will download a complete copy of the mail archive. This is approximately 40 gigabytes in size and will take around 24 hours to download. Subsequent calls only fetch new messages, and are much faster.

The following can be run from the command line to fetch a copy of the mail archive into the named file (pass the same path to MailArchive when instantiating the class):

  python3 -m ietfdata.tools.download_ma archive/ietf_ma.sqlite

If you are working on a paper, project, or dissertation with a group of people, one person should create the sqlite database and share a copy with the others. This avoids overloading the IETF's servers, and ensures that everyone working in the group generates the same results.
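Because the archive is an ordinary sqlite file, a copy received from a colleague can be inspected with Python's built-in sqlite3 module before it is handed to the library. The sketch below lists the tables in a database without assuming anything about the schema; `list_tables` is an illustrative helper and is not part of ietfdata.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path: str) -> list[str]:
    """Return the names of the tables in an existing sqlite database."""
    # Open read-only through a file: URI so that a mistyped path raises
    # an error instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `print(list_tables("archive/ietfdata-ma.sqlite"))` prints the table names, or raises `sqlite3.OperationalError` if the file does not exist.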

Usage

Start by importing and instantiating the library:

from ietfdata.mailarchive3 import *
ma = MailArchive("archive/ietfdata-ma.sqlite")

Once this is done, you can find the mailing list names:

for ml_name in ma.mailing_list_names():
  print(ml_name)

You can find information about a particular mailing list:

ml = ma.mailing_list("quic")
print(ml.num_messages())

You can find information about the messages:

for msg in ml.messages():
  print(f"From:    {msg.from_()}")
  print(f"To:      {msg.to()}")
  print(f"Subject: {msg.subject()}")
  print("")

Read the source code for mailarchive3.py for details.
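
As an illustration of working with the message accessors shown above, the following sketch tallies the most frequent senders on a list. It assumes `from_()` returns a string. `FakeMessage` is a hypothetical stand-in with made-up values so that the example runs without a downloaded archive.

```python
from collections import Counter
from typing import Iterable


class FakeMessage:
    """Stand-in with the from_() and subject() accessors used above."""

    def __init__(self, sender: str, subject: str) -> None:
        self._sender = sender
        self._subject = subject

    def from_(self) -> str:
        return self._sender

    def subject(self) -> str:
        return self._subject


def top_senders(messages: Iterable, n: int = 3) -> list[tuple[str, int]]:
    """Return the n most frequent From: values with their message counts."""
    counts = Counter(msg.from_() for msg in messages)
    return counts.most_common(n)


# Made-up sample data, for illustration only.
sample = [
    FakeMessage("alice@example.org", "Example thread"),
    FakeMessage("bob@example.org", "Re: Example thread"),
    FakeMessage("alice@example.org", "Re: Example thread"),
]
for sender, count in top_senders(sample, n=2):
    print(f"{count:3d}  {sender}")
```

With a real archive, `ml.messages()` could be passed in place of `sample`, provided the assumption about `from_()` holds.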

Accessing the RFC Index

(tbd)

See rfcindex.py

Development

To modify the ietfdata library, clone it from GitHub, then follow the instructions below to install dependencies and test the results. If you just intend to use the library to support writing a paper or to perform some other analysis, you can skip this section.

The project uses pipenv for dependency management. To begin, run:

pipenv install --dev -e .

to create a Python virtual environment with the appropriate packages installed. Then, run:

pipenv shell

to start the virtual environment, within which you can run the scripts.

Once the virtual environment is started, running:

python3 tests/test_datatracker.py 

will run the test suite for the datatracker module. Running:

python3 tests/test_rfcindex.py

will test the rfcindex module.

Release Process

  • Edit CHANGELOG.md and ensure it is up to date
  • Edit setup.py to ensure the correct version number is present
  • Edit pyproject.toml to ensure the correct version number is present
  • Edit ietfdata/dt_backend.py to ensure the correct version number
  • Run make test to run the test suite. If any tests fail, fix them, then restart the release process
  • Commit changes and push to GitHub
  • Check that the GitHub Continuous Integration run succeeds, and fix any problems (this runs with a fresh cache, so can sometimes catch problems that aren't found by local tests).
  • Run python3 setup.py sdist bdist_wheel to prepare the package
  • Run python3 -m twine upload dist/* to upload the package
  • Commit the package files in dist/* and push to GitHub
  • Tag the release in GitHub
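
Since the version number has to match across several files, a small check before tagging can catch a missed edit. The sketch below is hypothetical and is not part of the project's tooling: it assumes each file contains a line of the form `version = "..."`, which may not hold for every file in the list above, and the sample file contents are made up for illustration.

```python
import re
from typing import Dict, Optional

# Assumed form: a line such as  version = "0.9.0"  (optionally indented).
VERSION_LINE = re.compile(
    r"""^\s*version\s*=\s*["']([^"']+)["']""", re.MULTILINE
)


def extract_version(text: str) -> Optional[str]:
    """Return the first version string found in `text`, or None."""
    match = VERSION_LINE.search(text)
    return match.group(1) if match else None


def versions_agree(sources: Dict[str, str]) -> bool:
    """True if every source yields a version and all versions are equal."""
    found = {name: extract_version(text) for name, text in sources.items()}
    return None not in found.values() and len(set(found.values())) == 1


# Made-up file contents; a real check would read the files from disk.
sources = {
    "setup.py": 'setup(\n    name="ietfdata",\n    version="0.9.0",\n)\n',
    "pyproject.toml": '[project]\nname = "ietfdata"\nversion = "0.9.0"\n',
}
print(versions_agree(sources))
```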
