Package for advanced web scraping with BeautifulSoup

Project description

SoupSavvy

SoupSavvy is a library designed to make web scraping tasks more efficient and manageable. Automating web scraping can be a thankless and time-consuming job. SoupSavvy is built around the BeautifulSoup library and lets developers create more complex workflows and advanced searches with ease.

Key Features

  • Automated Web Scraping: SoupSavvy simplifies the process of web scraping by providing intuitive interfaces and tools for automating tasks.

  • Complex Workflows: With SoupSavvy, developers can create complex scraping workflows effortlessly, allowing for more intricate data extraction.

  • Advanced Searches: SoupSavvy extends BeautifulSoup's capabilities by offering advanced search options, enabling users to find and extract specific elements from HTML markup with precision.

  • Clear Type Hinting: The library offers clear and concise type hinting throughout its API, enhancing code readability and maintainability.

  • Productionize Scraping Code: By providing structured workflows and error-handling mechanisms, SoupSavvy makes scraping code easier to move into production and to integrate into larger projects and pipelines.

Getting Started

Installation

SoupSavvy is published on PyPI, and the latest stable version can be installed with pip using the following command:

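pip install soupsavvy

# Example: search markup for <a> elements by tag name and attributes.
from bs4 import BeautifulSoup
from soupsavvy import AttributeTag, ElementTag

text = """
    <div href="github">
        <a class="github/settings" href="github.com"></a>
        <a id="github pages"></a>
        <a href="github "></a>
    </div>
"""
# Name the parser explicitly so BeautifulSoup does not have to guess one.
markup = BeautifulSoup(text, "html.parser")

# Describe the target element: an <a> tag with the listed attributes.
tag = ElementTag(
    tag="a",
    attributes=[
        AttributeTag(name="href", value="github", re=True),
        AttributeTag(name="class", value="settings"),
    ],
)
# Run the search: find for a single match, find_all for every match.
first_match = tag.find(markup)
all_matches = tag.find_all(markup)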
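
For comparison, here is a minimal sketch of a similar attribute search written with BeautifulSoup alone. It is not part of SoupSavvy's API and makes no claim about how SoupSavvy matches elements internally; it only shows the plain `find` / `find_all` baseline that the library builds on. The names `html`, `soup`, `href_pattern`, `first` and `matches` are illustrative.

```python
import re

from bs4 import BeautifulSoup

html = """
<div href="github">
    <a class="github/settings" href="github.com"></a>
    <a id="github pages"></a>
    <a href="github "></a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A compiled pattern makes BeautifulSoup match the attribute value by regex search.
href_pattern = re.compile(r"github")

# Only <a> elements are considered, so the <div href="github"> is skipped,
# and the <a> without an href attribute does not match.
first = soup.find("a", attrs={"href": href_pattern})
matches = soup.find_all("a", attrs={"href": href_pattern})

for element in matches:
    print(repr(element["href"]))
```

Each extra condition in plain BeautifulSoup means another entry in `attrs` or a custom filter function, which is the kind of bookkeeping a declarative description of the target element is meant to reduce.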

Contributing

If you'd like to contribute to SoupSavvy, feel free to check out the GitHub repository and submit pull requests. Any feedback, bug reports, or feature requests are welcome!

License

SoupSavvy is licensed under the MIT License, allowing for both personal and commercial use. See the LICENSE file for more information.

Acknowledgements

SoupSavvy is built upon the foundation of the excellent BeautifulSoup library. We extend our gratitude to the developers and contributors of this project for their invaluable contributions to the Python community and for making our lives a lot easier!


Make your soup even more beautiful and savvier! Happy scraping! 🍲✨


