🐍 SnakyScraper
SnakyScraper is a lightweight and Pythonic web scraping toolkit built on top of BeautifulSoup and Requests. It provides an elegant interface for extracting structured HTML and metadata from websites with clean, direct outputs.
Fast. Accurate. Snake-style scraping. 🐍🎯
🚀 Features
- ✅ Extract metadata: title, description, keywords, author, and more
- ✅ Built-in support for Open Graph, Twitter Card, canonical, and CSRF tags
- ✅ Extract HTML structures: h1–h6, p, ul, ol, img, links
- ✅ Powerful filter() method with class, ID, and tag-based selectors
- ✅ return_html toggle to return clean text or raw HTML
- ✅ Simple return values: string, list, or dictionary
- ✅ Powered by BeautifulSoup4 and Requests
📦 Installation
pip install snakyscraper
Requires Python 3.7 or later
🛠️ Basic Usage
from snakyscraper import SnakyScraper
scraper = SnakyScraper("https://example.com")
# Get the page title
print(scraper.title()) # "Welcome to Example.com"
# Get meta description
print(scraper.description()) # "This is the example meta description."
# Get all <h1> elements
print(scraper.h1()) # ["Welcome", "Latest News"]
# Extract Open Graph metadata
print(scraper.open_graph()) # {"og:title": "...", "og:description": "...", ...}
# Custom filter: find all div.card elements and extract child tags
print(scraper.filter(
    element="div",
    attributes={"class": "card"},
    multiple=True,
    extract=["h1", "p", ".title", "#desc"]
))
🧪 Available Methods
🔹 Page Metadata
scraper.title()
scraper.description()
scraper.keywords()
scraper.keyword_string()
scraper.charset()
scraper.canonical()
scraper.content_type()
scraper.author()
scraper.csrf_token()
scraper.image()
🔹 Open Graph & Twitter Card
scraper.open_graph()
scraper.open_graph("og:title")
scraper.twitter_card()
scraper.twitter_card("twitter:title")
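To illustrate what collecting Open Graph and Twitter Card tags involves, here is a minimal standard-library sketch. It is not SnakyScraper's implementation (the library is built on BeautifulSoup); `MetaTagCollector`, `collect_meta`, and the sample HTML are hypothetical names and data used only for illustration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta> tags whose property/name starts with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Open Graph tags use the "property" attribute;
        # Twitter Card tags commonly use "name".
        key = attr_map.get("property") or attr_map.get("name")
        if key and key.startswith(self.prefix) and "content" in attr_map:
            self.tags[key] = attr_map["content"]


def collect_meta(html, prefix):
    """Return a dict of matching meta tags found in the HTML string."""
    collector = MetaTagCollector(prefix)
    collector.feed(html)
    collector.close()
    return collector.tags


# Made-up sample page, not taken from any real site.
SAMPLE_HTML = """
<html><head>
<meta property="og:title" content="Example OG Title">
<meta property="og:description" content="An example page.">
<meta name="twitter:title" content="Example Twitter Title">
</head><body></body></html>
"""

og_tags = collect_meta(SAMPLE_HTML, "og:")
twitter_tags = collect_meta(SAMPLE_HTML, "twitter:")
print(og_tags)
print(twitter_tags)
```

Returning a plain dictionary keyed by the full tag name mirrors the dictionary-style output described for open_graph() and twitter_card(), and a single key can then be read with an ordinary lookup.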
🔹 Headings & Text
scraper.h1()
scraper.h2()
scraper.h3()
scraper.h4()
scraper.h5()
scraper.h6()
scraper.p()
🔹 Lists
scraper.ul()
scraper.ol()
🔹 Images
scraper.images()
scraper.image_details()
🔹 Links
scraper.links()
scraper.link_details()
🔍 Custom DOM Filtering
Use filter() to target specific DOM elements and extract nested content.
▸ Single element
scraper.filter(
    element="div",
    attributes={"id": "main"},
    multiple=False,
    extract=[".title", "#description", "p"]
)
▸ Multiple elements
scraper.filter(
    element="div",
    attributes={"class": "card"},
    multiple=True,
    extract=["h1", ".subtitle", "#meta"]
)
The extract argument accepts tag names, class selectors (e.g., .title), or ID selectors (e.g., #meta).
Output keys are automatically normalized: .title → class__title, #meta → id__meta
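The key normalization described above can be sketched as a small helper. This is an illustrative reconstruction of the documented rule, not the library's code: `normalize_key` is a hypothetical name, and leaving plain tag names unchanged is an assumption.

```python
def normalize_key(selector):
    """Map an extract selector to an output key using the documented convention."""
    if selector.startswith("."):
        # Class selector: ".title" becomes "class__title"
        return "class__" + selector[1:]
    if selector.startswith("#"):
        # ID selector: "#meta" becomes "id__meta"
        return "id__" + selector[1:]
    # Assumption: tag names such as "h1" are used as keys unchanged.
    return selector


extract = ["h1", ".subtitle", "#meta"]
keys = [normalize_key(selector) for selector in extract]
print(keys)  # ['h1', 'class__subtitle', 'id__meta']
```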
▸ Clean Text Output
Set return_html=False to return clean text instead of raw HTML:
scraper.filter(
    element="p",
    attributes={"class": "dark-text"},
    multiple=True,
    return_html=False
)
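To show the difference between the two output modes, the sketch below converts a raw HTML fragment into clean text using only the standard library. It is an assumption about what "clean text" means here (tags removed, whitespace collapsed), not SnakyScraper's actual implementation; `TextExtractor` and `to_clean_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Accumulate only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_clean_text(raw_html):
    """Strip tags and collapse runs of whitespace into single spaces."""
    extractor = TextExtractor()
    extractor.feed(raw_html)
    extractor.close()
    return " ".join("".join(extractor.parts).split())


raw = '<p class="dark-text">Hello, <strong>world</strong>!</p>'
clean = to_clean_text(raw)
print(raw)    # the raw HTML form
print(clean)  # Hello, world!
```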
📦 Output Example
scraper.title()
# "Welcome to Example.com"
scraper.h1()
# ["Main Heading", "Another Title"]
scraper.open_graph("og:title")
# "Example OG Title"
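Because each accessor returns a plain string, list, or dictionary, the results are easy to combine. The sketch below gathers several of the documented methods into one dictionary. `summarize` and `FakeScraper` are hypothetical: the fake stands in for a real SnakyScraper instance so the example runs without network access, and it returns the sample values shown in this README.

```python
def summarize(scraper):
    """Gather a few documented accessors into a single dictionary."""
    return {
        "title": scraper.title(),
        "description": scraper.description(),
        "h1": scraper.h1(),
        "og_title": scraper.open_graph("og:title"),
    }


class FakeScraper:
    """Offline stand-in exposing the same method names as the README."""

    def title(self):
        return "Welcome to Example.com"

    def description(self):
        return "This is the example meta description."

    def h1(self):
        return ["Main Heading", "Another Title"]

    def open_graph(self, key=None):
        tags = {"og:title": "Example OG Title"}
        # With a key, return one value; without, return the whole dict.
        return tags.get(key) if key else tags


summary = summarize(FakeScraper())
print(summary)
```

With the real package installed, passing `SnakyScraper("https://example.com")` in place of `FakeScraper()` should work the same way, assuming the methods behave as listed above.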
🤝 Contributing
Contributions are welcome!
Found a bug or want to request a feature? Please open an issue or submit a pull request.
📄 License
MIT License © 2025 — SnakyScraper
🔗 Related Projects
💡 Why SnakyScraper?
Think of it as your Pythonic sniper — targeting HTML content with precision and elegance.
Download files
File details
Details for the file snakyscraper-1.0.0.tar.gz.
File metadata
- Download URL: snakyscraper-1.0.0.tar.gz
- Upload date:
- Size: 6.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 368dab92aff789b48fdf2b28d0e883b42b0df9261ddfd68a97b19d302875747b |
| MD5 | 9fdb11c1c4ac4e743470fba280798c3b |
| BLAKE2b-256 | 0b58b404a78e02290ad3cb5eb8467954a8f93c930226e6d6d5f6dc7e7d844ca4 |
File details
Details for the file snakyscraper-1.0.0-py3-none-any.whl.
File metadata
- Download URL: snakyscraper-1.0.0-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 564eeccaf88a83803526c0fcb4c35d398b70dfb995eec27c70eb47a3b3d87bca |
| MD5 | 5ad0873fd51fded67bc4a2d57bd88a82 |
| BLAKE2b-256 | b51a2004ba337ba6e958462e2d873f7335bdeb6dacaf7432a1431e31a7a54bd8 |