multipart

Parser for multipart/form-data

This module provides multiple parsers for RFC 7578 multipart/form-data, both low-level for framework authors and high-level for WSGI application developers:

- PushMultipartParser: A low-level incremental SansIO (non-blocking) parser suitable for asyncio and other time or memory constrained environments.
- MultipartParser: A streaming parser emitting memory- and disk-buffered MultipartPart instances.
- parse_form_data: A helper function to parse both multipart/form-data and application/x-www-form-urlencoded form submissions from a WSGI environment.

Installation

pip install multipart

Features

- Pure Python single-file module with no dependencies.
- 100% test coverage. Tested with inputs as seen from actual browsers and HTTP clients.
- Parses multiple GB/s on modern hardware (quick tests, no proper benchmark).
- Quickly rejects malicious or broken inputs and emits useful error messages.
- Enforces configurable memory and disk resource limits to prevent DoS attacks.
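
As a rough illustration of what a resource limit means in practice, the sketch below wraps a binary stream and fails as soon as more than a fixed number of bytes has been read. It is not taken from this library: read_limited and LimitExceeded are hypothetical names, and the actual limits offered by the parsers are more fine-grained than a single byte counter.

```python
import io


class LimitExceeded(Exception):
    """Raised when the input is larger than the configured limit (hypothetical)."""


def read_limited(stream, limit, chunk_size=64 * 1024):
    """Yield chunks from a binary stream, failing once more than `limit` bytes were read."""
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return  # end of stream reached without exceeding the limit
        total += len(chunk)
        if total > limit:
            # Fail early instead of buffering an arbitrarily large upload.
            raise LimitExceeded(f"input exceeds {limit} bytes")
        yield chunk


# Ten bytes pass a 10 byte limit; the same input fails a 9 byte limit.
data = b"".join(read_limited(io.BytesIO(b"x" * 10), limit=10, chunk_size=4))
```

The point of checking while reading, rather than after, is that an oversized request is rejected before it can exhaust memory or disk space.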

Limitations: This parser implements multipart/form-data as it is used by actual modern browsers and HTTP clients, which means:

- Just multipart/form-data, not suitable for email parsing.
- No multipart/mixed support (deprecated in RFC 7578).
- No base64 or quoted-printable transfer encoding (deprecated in RFC 7578).
- No encoded-word or name=_charset_ encoding markers (discouraged in RFC 7578).
- No support for clearly broken input (e.g. invalid line breaks or header names).
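
To show the kind of input these limitations refer to, here is a minimal sketch that builds a multipart/form-data body by hand: CRLF line breaks, one Content-Disposition header per part, raw (unencoded) payloads and a closing boundary. build_form_data and the default boundary are hypothetical, and the sketch assumes names and filenames need no escaping and that the boundary does not occur in the data.

```python
def build_form_data(fields, files, boundary="example-boundary"):
    """Encode text fields and file uploads as a multipart/form-data body.

    fields: dict mapping field name to text value
    files:  dict mapping field name to (filename, content_type, data_bytes)
    """
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",  # an empty line separates headers from the payload
            value.encode("utf8"),
        ]
    for name, (filename, content_type, data) in files.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"'.encode(),
            f"Content-Type: {content_type}".encode(),
            b"",
            data,  # binary payload is sent as-is, no base64
        ]
    # The final boundary carries two extra dashes.
    lines += [f"--{boundary}--".encode(), b""]
    return b"\r\n".join(lines)


body = build_form_data({"title": "Hello"}, {})
```

A body like this, together with a request header of the form Content-Type: multipart/form-data; boundary=example-boundary, is roughly what browsers and HTTP clients send.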

Usage and examples

For WSGI application developers we strongly suggest using the parse_form_data helper function. It accepts a WSGI environ dictionary and parses both types of form submission (multipart/form-data and application/x-www-form-urlencoded) based on the actual content type of the request. You’ll get two MultiDict instances in return, one for text fields and the other for file uploads:

from multipart import parse_form_data

def wsgi(environ, start_response):
  if environ["REQUEST_METHOD"] == "POST":
    forms, files = parse_form_data(environ)

    title = forms["title"]    # string
    upload = files["upload"]  # MultipartPart
    upload.save_as(...)
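
For trying out a handler like this without a running server, a WSGI environ can be assembled by hand. The following is a sketch using only the standard library; make_post_environ is a hypothetical helper, and the url-encoded body is just an example payload.

```python
import io
from wsgiref.util import setup_testing_defaults


def make_post_environ(body: bytes, content_type: str) -> dict:
    """Build a minimal WSGI environ describing a POST request with the given body."""
    environ = {}
    setup_testing_defaults(environ)  # fills in SERVER_NAME, wsgi.* keys and similar
    environ["REQUEST_METHOD"] = "POST"
    environ["CONTENT_TYPE"] = content_type
    environ["CONTENT_LENGTH"] = str(len(body))
    environ["wsgi.input"] = io.BytesIO(body)  # the request body as a binary stream
    return environ


environ = make_post_environ(b"title=Hello+World", "application/x-www-form-urlencoded")
```

Passing such an environ to parse_form_data should yield the title field in the first MultiDict and leave the second one empty, since a url-encoded submission carries no file uploads.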

The parse_form_data helper function internally uses MultipartParser, a streaming parser that reads from a multipart/form-data encoded binary data stream and emits MultipartPart instances as soon as a part is fully parsed. This is most useful if you want to consume the individual parts as soon as they arrive, instead of waiting for the entire request to be parsed:

from multipart import parse_options_header, MultipartParser

def wsgi(environ, start_response):
  assert environ["REQUEST_METHOD"] == "POST"
  ctype, copts = parse_options_header(environ.get("CONTENT_TYPE", ""))
  boundary = copts.get("boundary")
  charset = copts.get("charset", "utf8")
  assert ctype == "multipart/form-data"

  parser = MultipartParser(environ["wsgi.input"], boundary, charset)
  for part in parser:
    if part.filename:
      print(f"{part.name}: File upload ({part.size} bytes)")
      part.save_as(...)
    elif part.size < 1024:
      print(f"{part.name}: Text field ({part.value!r})")
    else:
      print(f"{part.name}: Text field, but too big to print :/")
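
To make the ctype and copts values in the example above more concrete, here is a rough stand-in for the header split, built on the standard library's email.message.Message. It only illustrates the expected shape of the result and is not how parse_options_header is implemented; split_content_type is a hypothetical name.

```python
from email.message import Message


def split_content_type(header_value: str):
    """Split a Content-Type header into (content_type, options_dict)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_params() returns the content type first, followed by the parameters.
    params = msg.get_params(failobj=[])
    options = {key.lower(): value for key, value in params[1:]}
    return msg.get_content_type(), options


ctype, copts = split_content_type('multipart/form-data; boundary="abc123"; charset=utf8')
# ctype == "multipart/form-data"
# copts == {"boundary": "abc123", "charset": "utf8"}
```

The boundary option is the piece the parser cannot work without: it is the delimiter that separates the individual parts in the request body.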

The MultipartParser handles IO and file buffering for you, but does so using blocking APIs. If you need absolute control over the parsing process and want to avoid blocking IO at all costs, then have a look at PushMultipartParser, the low-level non-blocking incremental multipart/form-data parser that powers all the other parsers in this library:

import asyncio

from multipart import PushMultipartParser, MultipartSegment

async def process_multipart(reader: asyncio.StreamReader, boundary: str):
  with PushMultipartParser(boundary) as parser:
    while not parser.closed:
      chunk = await reader.read(1024*64)
      for result in parser.parse(chunk):
        if isinstance(result, MultipartSegment):
          print(f"== Start of segment: {result.name}")
          for header, value in result.headerlist:
            print(f"{header}: {value}")
        elif result:  # Result is a non-empty bytearray
          print(f"[received {len(result)} bytes of data]")
        else:         # Result is None
          print("== End of segment")
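
The SansIO idea behind this API can be shown with a much simpler toy parser. The sketch below is not the library's algorithm: PushLineSplitter is a hypothetical class that only splits input into CRLF-terminated lines, and treating an empty chunk as end of input is an assumption of this sketch. What it shares with the example above is the pattern: the caller performs all IO and pushes chunks in, while the parser buffers incomplete data and yields results as soon as they are complete.

```python
class PushLineSplitter:
    """Toy SansIO parser: accepts byte chunks, yields complete CRLF-terminated lines."""

    def __init__(self):
        self._buffer = bytearray()
        self.closed = False

    def parse(self, chunk: bytes):
        if self.closed:
            raise ValueError("parser is closed")
        if not chunk:
            # An empty chunk marks the end of input in this sketch.
            self.closed = True
            if self._buffer:
                raise ValueError("unexpected end of input inside a line")
            return
        self._buffer += chunk
        while True:
            end = self._buffer.find(b"\r\n")
            if end < 0:
                break  # incomplete line: keep it buffered until more data arrives
            line = bytes(self._buffer[:end])
            del self._buffer[:end + 2]  # drop the line and its CRLF from the buffer
            yield line


parser = PushLineSplitter()
lines = []
# Chunk boundaries may fall anywhere, even between CR and LF.
for chunk in (b"ab\r", b"\ncd\r\n", b"ef\r\n", b""):
    lines.extend(parser.parse(chunk))
```

Because the parser never reads from a socket or file itself, the same code works unchanged with blocking IO, asyncio or any other event loop.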

Changelog

1.1
- Some of these fixes changed behavior to match documentation or specification, but none of them should come as a surprise. Existing apps should be able to upgrade without changes.
- fix: Fail faster on input with invalid line breaks (#55)
- fix: Allow empty segment names (#56)
- fix: Avoid ResourceWarning when using parse_form_data (#57)
- fix: MultipartPart now always has a sensible content type.
- fix: Actually check parser state on context manager exit.
- fix: Honor Content-Length header, if present.
- perf: Reduce overhead for small segments (-21%)
- perf: Reduce write overhead for large uploads (-2%)
1.0
- A completely new, fast, non-blocking PushMultipartParser parser, which now serves as the basis for all other parsers.
- The new parser is stricter and rejects clearly broken input quicker, even in non-strict mode (e.g. invalid line breaks or header names). This should not affect data sent by actual browsers or HTTP clients.
- Default charset for MultipartParser headers and text fields changed to utf8, as recommended by W3C HTTP.
- Default disk and memory limits for MultipartParser increased, but multiple other limits added for finer control. Check if the new defaults still fit your needs.
- Undocumented APIs deprecated or removed, some of which were not strictly private. This includes parameters for MultipartParser and some MultipartPart methods, but those should not be used by anyone but the parser itself.
0.2.5
- Don’t test semicolon separators in urlencoded data (#33)
- Add python-requires directive, indicating Python 3.5 or later is required and preventing older Pythons from attempting to download this version (#32)
- Add official support for Python 3.10-3.12 (#38, #48)
- Default value of copy_file should be 2 ** 16, not 2 * 16 (#41)
- Update URL for Bottle (#42)
0.2.4
- Consistently decode non-utf8 URL-encoded form-data
0.2.3
- Import MutableMapping from collections.abc (#23)
- Fix a few more ResourceWarnings in the test suite (#24)
- Allow stream to contain data before first boundary (#25)
0.2.2
- Fix #21 ResourceWarnings on Python 3
0.2.1
- Fix #20 empty payload
0.2
- Dropped support for Python versions below 3.6. Stay on 0.1 if you need Python 2.5+ support.
0.1
- First release

Release history

- 1.1.0 (this version): Oct 3, 2024
- 1.0.0: Sep 20, 2024
- 0.2.5: Jun 18, 2024
- 0.2.4: Jan 27, 2021
- 0.2.3: Nov 20, 2020
- 0.2.2: Sep 4, 2020
- 0.2.1: Jun 13, 2020
- 0.2: May 19, 2019
- 0.1.1: Nov 20, 2020
- 0.1: Jun 21, 2010

Download files

- Source distribution: multipart-1.1.0.tar.gz (34.6 kB), uploaded Oct 3, 2024
- Built distribution: multipart-1.1.0-py3-none-any.whl (13.6 kB), uploaded Oct 3, 2024, Python 3

File details

Details for the file multipart-1.1.0.tar.gz.

File metadata

Download URL: multipart-1.1.0.tar.gz
Upload date: Oct 3, 2024
Size: 34.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for multipart-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ee32683f5c454740cd9139e1d6057053823da0729c426f156464f81111529ba1`
MD5	`d9832b0baa5b4f9083fdff7bac64a45a`
BLAKE2b-256	`eefc03c4a1db15b4365cddb7f18285267b599744a048f8e1a98759cf677e33f0`


File details

Details for the file multipart-1.1.0-py3-none-any.whl.

File metadata

Download URL: multipart-1.1.0-py3-none-any.whl
Upload date: Oct 3, 2024
Size: 13.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.3

File hashes

Hashes for multipart-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5a784677de8b49e6409e730dfe018f73c5d7aef360e44750e00f67d669b51e91`
MD5	`cab1a28ca4271ee53d4ea46c99b90801`
BLAKE2b-256	`edc482f2eef01dde7e142776203706c3b7a221656975bff61965207dcbc0c88d`