
Parse formats defined in IETF RFCs


This project is a gut reaction to the wealth of ways to parse URLs, MIME headers, HTTP messages, and other things described by IETF RFCs. The parsers range from modules in the Python standard library (urllib) to code buried in the guts of kitchen-sink libraries (werkzeug), and most of them are broken in one way or another.

So why create another one? Good question... glad that you asked. This is a companion library to the great packages out there that are responsible for communicating with other systems. I'm going to concentrate on providing a crisp and usable set of APIs for parsing text. Nothing more. Hopefully, by focusing on the specific task of parsing, the result will be a beautiful and usable interface to the text strings that power the Internet.

Here's a sample of the code that this library lets you write:

import json
import typing

from ietfparse import algorithms, constants

default_content_type = constants.APPLICATION_JSON
supported = [constants.APPLICATION_JSON, constants.TEXT_HTML]

def render_widget(request, widget):
    """Render `widget` based on the accept header"""
    selected, requested = algorithms.select_content_type(
        request.headers.get('accept'), supported,
        default=default_content_type)
    match selected:
        case constants.APPLICATION_JSON:
            body = json.dumps(widget)
        case constants.TEXT_HTML:
            body = translate_to_html(widget)
        case _ as unreachable:
            typing.assert_never(unreachable)

    return Response(body=body, content_type=str(requested))

The render_widget function is an implementation of Proactive Content Negotiation as described in RFC 9110. It calls the select_content_type function to determine the most appropriate content type based on the Accept header from the request and the list of content types that the application supports. Then it renders widget in the selected format.
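
To give a feel for what the selection step involves, here is a deliberately simplified, standard-library-only sketch. It is not this library's algorithm: the names parse_accept, matches, and choose are made up for illustration, and the sketch ignores media-type parameters, the rule that more specific patterns win, and the rule that q=0 excludes a type even when a wildcard matches it.

```python
def parse_accept(header):
    """Return (media_type, quality) pairs, most preferred first."""
    entries = []
    for item in header.split(','):
        parts = [part.strip() for part in item.split(';')]
        media_type = parts[0].lower()
        if not media_type:
            continue
        quality = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition('=')
            if name.strip().lower() == 'q':
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        entries.append((media_type, quality))
    # sorted() is stable, so equal qualities keep their header order
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def matches(pattern, candidate):
    """True if `pattern` (possibly using * wildcards) covers `candidate`."""
    p_type, _, p_subtype = pattern.partition('/')
    c_type, _, c_subtype = candidate.partition('/')
    return p_type in ('*', c_type) and p_subtype in ('*', c_subtype)


def choose(accept_header, supported, default):
    """Pick the first supported type matching the most preferred pattern."""
    for pattern, quality in parse_accept(accept_header or ''):
        if quality <= 0:
            continue  # the client explicitly refuses this pattern
        for candidate in supported:
            if matches(pattern, candidate):
                return candidate
    return default
```

Even this toy version has to deal with quality values, wildcards, and missing headers, which is the kind of detail that the library is meant to take off your hands.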

As usual, the devil is in the details. This library understands how to parse HTTP headers into data structures and contains algorithms that do useful things with the parsed values. The data structures themselves hide a lot of useful functionality. Take the HTTP Link header, which is synonymous with REST APIs. Links between resources are represented as a target URL and a relationship type. Consider an implementation of paging through a search result set. A naive implementation places the onus on the client to select each page by iteratively sending requests with the page number in the request.

GET /search?q=...&page-size=100
GET /search?q=...&page=1&page-size=100
GET /search?q=...&page=2&page-size=100

The client knows that it is "done" when it gets an empty response. Despite its simplicity, this approach has a few drawbacks. The largest is that every client has intimate knowledge of the query parameters and of how to go from one response to the next request. This is a common web antipattern that you probably recognize. If you have been on the implementation side of a search endpoint for a large dataset using an SQL backend, then you may have run into the performance problems associated with SELECT ... WHERE ... OFFSET {page} LIMIT {size} style queries.
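
The coupling is easy to see in a sketch of such a client. Everything here is hypothetical: fake_get stands in for a real HTTP request, ITEMS stands in for the result set, and the page numbers are assumed to start at zero.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ITEMS = list(range(250))  # pretend these are the search results


def fake_get(url):
    """Hypothetical stand-in for an HTTP GET against /search."""
    query = parse_qs(urlsplit(url).query)
    page = int(query.get('page', ['0'])[0])
    size = int(query['page-size'][0])
    return ITEMS[page * size:(page + 1) * size]


def fetch_all(term, page_size=100):
    """Naive client: builds every URL itself, page by page."""
    results, page = [], 0
    while True:
        # The parameter names and the paging scheme are baked into the client.
        url = '/search?' + urlencode(
            {'q': term, 'page': page, 'page-size': page_size})
        batch = fake_get(url)
        if not batch:  # an empty response means we are done
            return results
        results.extend(batch)
        page += 1
```

Note that the client, not the server, decides what the next URL looks like. Any change to the paging scheme on the server breaks this loop.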

What happens when we change the pattern from offset and page size to use a server-side cursor?

The short answer is that we have to change every client implementation. The next iteration is usually to add pagination information into the response structure. Something like:

{
  "data": [],
  "paging": {
    "total": 1234,
    "next": "/search?q=...&page=4&page-size=100",
    "previous": "/search?q=...&page=2&page-size=100",
    "first": "/search?q=..."
  }
}

Now our clients can follow links embedded in the response structure and the server is completely in control of the pagination API. If the traversal algorithm changes to pass a cursor in the URL, then the server simply changes the links between pages in the response. This is at the core of what Roy Fielding termed the Representational State Transfer interaction pattern. There is still a problem here ... clients need to parse metadata from the responses. In essence, they have to separate the data from the pagination metadata in the response. The HTTP Link header is used to move the links between representations out of the body and into HTTP headers.
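
A client written against this style of response might look like the following sketch. The PAGES table and fake_get are hypothetical stand-ins for a server, and the cursor parameter is invented to show that the client does not care how the server builds its links.

```python
# Hypothetical canned responses keyed by URL. The second URL uses a
# made-up cursor parameter that the client never has to understand.
PAGES = {
    '/search?q=...': {
        'data': [1, 2],
        'paging': {'next': '/search?q=...&cursor=abc'},
    },
    '/search?q=...&cursor=abc': {
        'data': [3],
        'paging': {},
    },
}


def fake_get(url):
    """Hypothetical stand-in for an HTTP GET returning decoded JSON."""
    return PAGES[url]


def fetch_all(first_url):
    """Follow the "next" link until the server stops providing one."""
    results, url = [], first_url
    while url is not None:
        body = fake_get(url)
        results.extend(body['data'])
        url = body['paging'].get('next')
    return results
```

The loop never constructs a URL. It still has to know where the paging block lives inside the body, which is the remaining problem described above.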

GET /search?q=...&page=3&page-size=100 HTTP/1.1
Accept: application/json, application/msgpack;q=0.7

HTTP/1.1 200 OK
Content-Type: application/json
Link: </search?q=...&page=4&page-size=100>; rel="next"
Link: </search?q=...&page=2&page-size=100>; rel="previous"
Link: </search?q=...>; rel="first"

[]

Now the response is simply a list of the items found, which is much easier to handle on the client side. However, Link headers have a complex syntax, so parsing them requires some work. In addition to appearing as individual header lines, the values can be combined into a single Link header containing a comma-separated list. The headers.parse_link function transforms a Link header into a list of data structures that make accessing the individual properties simple.

>>> from ietfparse import headers
>>> links = headers.parse_link('</search?q=...&page=4&page-size=100>; rel="next"')
>>> len(links)
1
>>> links[0].rel
'next'
>>> links[0].target
'/search?q=...&page=4&page-size=100'
>>> str(links[0])
'</search?q=...&page=4&page-size=100>; rel="next"'
>>> links[0]
<ietfparse.datastructures.LinkHeader object at 0x100ad0620>
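
To see why the comma-separated form needs more than str.split, consider a header whose targets and quoted values contain commas themselves. The sketch below is a standard-library illustration of the problem, not how this library is implemented, and split_links is a made-up name. It only separates the link values; it does not parse the parameters.

```python
def split_links(header):
    """Split a combined Link header on commas that separate link values.

    Commas inside <...> targets and inside quoted strings are preserved.
    """
    values, current = [], []
    in_target = in_quotes = escaped = False
    for ch in header:
        if escaped:
            escaped = False  # this character was protected by a backslash
        elif in_quotes:
            if ch == '\\':
                escaped = True
            elif ch == '"':
                in_quotes = False
        elif in_target:
            if ch == '>':
                in_target = False
        elif ch == '<':
            in_target = True
        elif ch == '"':
            in_quotes = True
        elif ch == ',':
            # A comma outside of a target or quoted string ends a link value.
            values.append(''.join(current).strip())
            current = []
            continue
        current.append(ch)
    tail = ''.join(current).strip()
    if tail:
        values.append(tail)
    return values


header = ('</search?q=a,b&page=2>; rel="next"; title="Next, please", '
          '</search?q=a,b>; rel="first"')
naive = header.split(',')     # five fragments, none of them a complete link
links = split_links(header)   # two complete link values
```

Splitting the values apart is only the first step. Each value still has to be broken into a target and its parameters, which is the part that parse_link handles for you.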

The datastructures.LinkHeader class is also useful for generating link headers.

>>> from ietfparse import datastructures
>>> next_page = datastructures.LinkHeader(
...    '/search?q=...&page=4&page-size=100',
...    [('rel', 'next')])
>>> str(next_page)
'</search?q=...&page=4&page-size=100>; rel="next"'

Single link values are pretty simple. They get more complicated with the addition of properties. The LinkHeader instance knows how to correctly format property values that contain various problematic characters so that you do not need to be an expert.
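
As a rough idea of what "problematic characters" means, the sketch below writes every property value as an HTTP quoted-string, escaping backslashes and double quotes. It is an illustration only and not the LinkHeader implementation: quote_param and format_link are hypothetical names, and non-ASCII values, which need the extended title* style of encoding, are not handled.

```python
def quote_param(name, value):
    """Format one link parameter as name="value" using quoted-string rules."""
    # Escape backslashes first so the escapes added for quotes stay intact.
    escaped = value.replace('\\', '\\\\').replace('"', '\\"')
    return f'{name}="{escaped}"'


def format_link(target, params):
    """Join a target and its (name, value) parameters into one link value."""
    pieces = [f'<{target}>']
    pieces.extend(quote_param(name, value) for name, value in params)
    return '; '.join(pieces)
```

With this sketch, a title such as Say "hi", then go comes out as title="Say \"hi\", then go", so the embedded quotes and comma cannot be mistaken for the end of the value or the start of another link.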

The Link header is one of the many headers supported by this library. See the API Documentation for a complete (and up-to-date) list.

