A fluent, pipeline-based interface for querying HTML/XML with BeautifulSoup.

These details have not been verified by PyPI

Project links

Homepage

Project description

chainsoup

chainsoup provides a fluent, pipeline-based interface for querying HTML and XML documents with BeautifulSoup, turning complex nested searches into clean, readable, and chainable method calls.

The Problem

Working with BeautifulSoup is great, but navigating deeply nested structures can lead to verbose and hard-to-read code:

# Standard BeautifulSoup
try:
    doc = soup.find('div', class_='document')
    wrapper = doc.find('div', class_='documentwrapper')
    body_wrapper = wrapper.find('div', class_='bodywrapper')
    body = body_wrapper.find('div', class_='body')
    section = body.find('section', recursive=False)
    p_tag = section.find_all('p', recursive=False)[0]
    print(p_tag.text)
except AttributeError:
    print("One of the tags was not found.")

This pattern is repetitive, and the error handling can obscure the main logic.

The Solution: A Fluent Pipeline

chainsoup elegantly solves this by introducing a Pipeline that lets you chain find operations. The same query becomes:

from chainsoup import Pipeline

# With chainsoup
pipeline = Pipeline().find_tag('div', class_='document') \
                     .find_tag('div', class_='documentwrapper') \
                     .find_tag('div', class_='bodywrapper') \
                     .find_tag('div', class_='body') \
                     .find_tag('section', recursive=False) \
                     .find_all_tags('p', recursive=False)[0]

# Execute the pipeline and get the result
first_p = pipeline.raise_for_error.run(soup)
print(first_p.text)

from chainsoup import Pipeline, NestedArg, SpecalArg

# With chainsoup
pipeline = Pipeline().find_nested_tag(
    name = NestedArg() >> 'div' >> 'div' >> 'div' >> 'div' >> 'section',
    class_ = NestedArg() >> 'document' >> 'documentwrapper' >> 'bodywrapper' >> 'body',
    recursive = NestedArg() >> True >> True >> True >> True >> False >> SpecalArg.EXPANDLAST
).find_all_tags('p', recursive=False)[0]

# Execute the pipeline and get the result
first_p = pipeline.raise_for_error.run(soup)
print(first_p.text)

Features

Fluent Chaining: Link find_tag and find_all_tags calls in a natural, readable sequence.
Powerful Nested Searches: Use find_nested_tag with NestedArg to perform complex deep searches with a single method call.
Sequence Operations: After a find_all_tags call, you can filter, map, and perform assertions on the sequence of results.
Robust Error Handling: Choose your style: either get a descriptive Error object back or have an exception raised automatically on failure.
Intelligent Argument Resolution: Automatically handle varying arguments for each level of a nested search.

Installation

pip install chainsoup

Quickstart

1. Basic Find

Create a Pipeline and chain find_tag calls to navigate to a specific element.

from bs4 import BeautifulSoup
from chainsoup import Pipeline

html = '''
<body>
  <div id="content">
    <h1>Title</h1>
    <p>First paragraph.</p>
    <p>Second paragraph.</p>
  </div>
</body>
'''
soup = BeautifulSoup(html, 'html.parser')

# Build the pipeline
pipeline = Pipeline().find_tag('body').find_tag('div', id='content').find_tag('p')

# Execute it and raise an exception if any tag is not found
first_p = pipeline.raise_for_error.run(soup)
print(first_p.text)
# Output: First paragraph.

# Alternatively, execute without raising an error
result = pipeline.run(soup)
if not result:
    print(f"Pipeline failed: {result.msg}")
else:
    print(result.text)

2. Finding All Tags and Filtering

Use find_all_tags to get a sequence of results. This returns a PipelineSequence object, which you can use to filter, map, or select items.

# Continues from the previous example...

# Find all <p> tags inside the div
p_sequence = Pipeline().find_tag('div', id='content').find_all_tags('p')

# Select the second paragraph (index 1)
second_p_pipeline = p_sequence[1]
print(second_p_pipeline.raise_for_error.run(soup).text)
# Output: Second paragraph.

# Or use .first / .last properties
first_p_pipeline = p_sequence.first
print(first_p_pipeline.raise_for_error.run(soup).text)
# Output: First paragraph.

# Filter the sequence
contains_second = lambda tag: "Second" in tag.text
filtered_sequence = p_sequence.filter(contains_second)

# This will now find the first (and only) tag that matches the filter
result = filtered_sequence.first.raise_for_error.run(soup)
print(result.text)
# Output: Second paragraph.

Advanced Usage: `find_nested_tag`

The find_nested_tag method is the most powerful feature of chainsoup. It allows you to define an entire path of find operations in a single, declarative call using NestedArg.

`NestedArg`

An NestedArg is a fluent builder for creating a list of arguments, one for each level of the search. You can chain values using the >> operator or the .add() method.

Example

Let's revisit the complex example from the introduction.

from chainsoup import Pipeline, NestedArg, SpecalArg

# ... setup soup ...

pipeline = Pipeline().find_nested_tag(
    # For each level of the search, specify the tag 'name'
    name = NestedArg() >> 'body' >> 'div' >> 'div' >> 'div' >> 'div',

    # Specify attributes for each level. The lists are matched by index.
    attrs={
        'class': NestedArg() >> None >> 'document' >> 'documentwrapper' >> 'bodywrapper' >> 'body'
    },
    
    # Specify the `recursive` flag. Here, we use a Special Argument.
    # It will be True, then False, and EXPANDLAST will repeat `False` for the rest.
    recursive = NestedArg() >> True >> False >> SpecalArg.EXPANDLAST

).find_all_tags(
    name='section',
    recursive=False
).first.find_all_tags(
    name='p',
    recursive=False
)

# Create two branches of the pipeline to get the first and second <p> tags
first_p_pipeline = pipeline[0]
second_p_pipeline = pipeline[1]

# Execute both
print(first_p_pipeline.raise_for_error.run(soup).text)
print(second_p_pipeline.raise_for_error.run(soup).text)

`SpecalArg` Enum

When argument lists have different lengths, SpecalArg controls how the shorter lists are padded to match the longest one.

SpecalArg.EXPANDLAST: Repeats the last provided value.
SpecalArg.FILLNONE: Fills with None (the default).
SpecalArg.FILLTRUE: Fills with True.
SpecalArg.FILLFALSE: Fills with False.

API Overview

Pipeline: The main object for building a query that results in a single Tag.
- .find_tag(...): Appends a find operation.
- .find_nested_tag(...): Appends a series of find operations.
- .find_all_tags(...): Transitions the query into a PipelineSequence.
- .run(soup): Executes the pipeline and returns a Tag or Error object.
- .run_and_raise_for_error(soup): Executes and raises an Error on failure.
PipelineSequence: An object for building a query that results in a sequence of Tags.
- .filter(fn): Filters the sequence.
- .map(fn): Applies a function to each tag in the sequence.
- .assert_all(fn): Asserts a condition for all tags.
- .first, .last, [index]: Selects a single element, returning control to a Pipeline.
NestedArg: A helper class to build argument lists for find_nested_tag.

Contributing

Contributions are welcome! If you have a feature request, find a bug, or want to improve the documentation, please open an issue or submit a pull request on our GitHub repository.

License

This project is licensed under the MIT License.

Project details

These details have not been verified by PyPI

Project links

Homepage

Release history Release notifications | RSS feed

This version

0.1.8

Aug 19, 2025

0.1.7

Aug 18, 2025

0.1.6

Jul 17, 2025

0.1.5

Jul 17, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

chainsoup-0.1.8.tar.gz (16.5 kB view details)

Uploaded Aug 19, 2025 Source

File details

Details for the file chainsoup-0.1.8.tar.gz.

File metadata

Download URL: chainsoup-0.1.8.tar.gz
Upload date: Aug 19, 2025
Size: 16.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.9.23

File hashes

Hashes for chainsoup-0.1.8.tar.gz
Algorithm	Hash digest
SHA256	`af0523fcc90c199218f5fd171ea53168dc77c2517a826598b524a2541a673304`
MD5	`237929da9c5ad0982c7a1505078bad26`
BLAKE2b-256	`c97e83829a576449b9f991b1ebb3334fcdf06a4be0f2c0aeeb5dced1c3ad7d1f`

See more details on using hashes here.

chainsoup 0.1.8

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

chainsoup

The Problem

The Solution: A Fluent Pipeline

Features

Installation

Quickstart

1. Basic Find

2. Finding All Tags and Filtering

Advanced Usage: `find_nested_tag`

`NestedArg`

Example

`SpecalArg` Enum

API Overview

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes

chainsoup 0.1.8

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

chainsoup

The Problem

The Solution: A Fluent Pipeline

Features

Installation

Quickstart

1. Basic Find

2. Finding All Tags and Filtering

Advanced Usage: find_nested_tag

NestedArg

Example

SpecalArg Enum

API Overview

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes

Advanced Usage: `find_nested_tag`

`NestedArg`

`SpecalArg` Enum