Package for advanced web scraping with BeautifulSoup
Project description
SoupSavvy
SoupSavvy is a library designed to make web scraping tasks more efficient and manageable. Automating web scraping can be a thankless and time-consuming job. SoupSavvy is built around the BeautifulSoup library and lets developers create more complex workflows and advanced searches with ease.
Key Features
- Automated Web Scraping: SoupSavvy simplifies the process of web scraping by providing intuitive interfaces and tools for automating tasks.
- Complex Workflows: With SoupSavvy, developers can create complex scraping workflows effortlessly, allowing for more intricate data extraction.
- Advanced Searches: SoupSavvy extends BeautifulSoup's capabilities by offering advanced search options, enabling users to find and extract specific elements from HTML markup with precision.
- Clear Type Hinting: The library offers clear and concise type hinting throughout its API, enhancing code readability and maintainability.
- Production-Ready Scraping Code: By providing structured workflows and error handling mechanisms, SoupSavvy makes scraping code easier to move into production and to integrate into larger projects and pipelines.
Getting Started
Installation
SoupSavvy is published on PyPI, and the latest stable version can be installed via pip with the following command:
pip install soupsavvy
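from bs4 import BeautifulSoup
from soupsavvy import AttributeTag, ElementTag

text = """
    <div href="github">
        <a class="github/settings" href="github.com"></a>
        <a id="github pages"></a>
        <a href="github "></a>
    </div>
"""
# Name the parser explicitly so the result does not depend on which parsers are installed
markup = BeautifulSoup(text, "html.parser")

# Describe the target: <a> elements, with a regex condition on href and a condition on class
tag = ElementTag(
    tag="a",
    attributes=[
        AttributeTag(name="href", value="github", re=True),
        AttributeTag(name="class", value="settings"),
    ],
)
tag.find(markup)      # search for a single matching element
tag.find_all(markup)  # search for every matching element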
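For comparison, a search that combines several attribute conditions can be sketched in plain BeautifulSoup. This is only an illustration of the kind of query involved, not SoupSavvy's implementation: the helper name find_github_links is hypothetical, and the class condition uses the full value "github/settings" because how SoupSavvy matches a partial class value is not stated here.

```python
import re

from bs4 import BeautifulSoup

html = """
<div href="github">
  <a class="github/settings" href="github.com"></a>
  <a id="github pages"></a>
  <a href="github "></a>
</div>
"""

markup = BeautifulSoup(html, "html.parser")


def find_github_links(soup):
    """Hypothetical helper: <a> tags whose href contains 'github'
    and whose class list includes 'github/settings'."""
    return soup.find_all(
        "a",
        attrs={
            "href": re.compile("github"),  # regex is applied with search()
            "class": "github/settings",    # matched against each CSS class
        },
    )


# Both conditions must hold for an element to be returned
matches = find_github_links(markup)

# Dropping the class condition also picks up the last <a> element
href_only = markup.find_all("a", attrs={"href": re.compile("github")})

print(len(matches), len(href_only))
```

Each additional condition narrows the result, and the outer div is never returned because the search is restricted to a elements.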
Contributing
If you'd like to contribute to SoupSavvy, feel free to check out the GitHub repository and submit pull requests. Any feedback, bug reports, or feature requests are welcome!
License
SoupSavvy is licensed under the MIT License, allowing for both personal and commercial use. See the LICENSE file for more information.
Acknowledgements
SoupSavvy is built upon the foundation of the excellent BeautifulSoup library. We extend our gratitude to the developers and contributors of this project for their invaluable contributions to the Python community and for making our lives a lot easier!
Make your soup even more beautiful and savvier! Happy scraping! 🍲✨
Hashes for soupsorcery-0.1.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 2941b489bc28093f568094b460382f4bbeecfb0cc8a6ab08d84422394c607c90
MD5 | 710b219066df45ecb24eebb3ba590ea5
BLAKE2b-256 | bf09643bab1db2942312dce96f954811798d500001f45b4bdf66a585276e9864