🐍 SnakyScraper

SnakyScraper is a lightweight and Pythonic web scraping toolkit built on top of BeautifulSoup and Requests. It provides an elegant interface for extracting structured HTML and metadata from websites with clean, direct outputs.

Fast. Accurate. Snake-style scraping. 🐍🎯


🚀 Features

  • ✅ Extract metadata: title, description, keywords, author, and more
  • ✅ Built-in support for Open Graph, Twitter Card, canonical, and CSRF tags
  • ✅ Extract HTML structures: h1-h6, p, ul, ol, img, links
  • ✅ Powerful filter() method with class, ID, and tag-based selectors
  • ✅ return_html toggle to return clean text or raw HTML
  • ✅ Simple return values: string, list, or dictionary
  • ✅ Powered by BeautifulSoup4 and Requests

📦 Installation

pip install snakyscraper

Requires Python 3.7 or later


🛠️ Basic Usage

from snakyscraper import SnakyScraper

scraper = SnakyScraper("https://example.com")

# Get the page title
print(scraper.title())  # "Welcome to Example.com"

# Get meta description
print(scraper.description())  # "This is the example meta description."

# Get all <h1> elements
print(scraper.h1())  # ["Welcome", "Latest News"]

# Extract Open Graph metadata
print(scraper.open_graph())  # {"og:title": "...", "og:description": "...", ...}

# Custom filter: find all div.card elements and extract child tags
print(scraper.filter(
    element="div",
    attributes={"class": "card"},
    multiple=True,
    extract=["h1", "p", ".title", "#desc"]
))

🧪 Available Methods

🔹 Page Metadata

scraper.title()
scraper.description()
scraper.keywords()
scraper.keyword_string()
scraper.charset()
scraper.canonical()
scraper.content_type()
scraper.author()
scraper.csrf_token()
scraper.image()

🔹 Open Graph & Twitter Card

scraper.open_graph()
scraper.open_graph("og:title")

scraper.twitter_card()
scraper.twitter_card("twitter:title")
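
For readers unfamiliar with these tags, the sketch below illustrates what Open Graph and Twitter Card extraction amounts to: collecting `<meta>` tags whose `property` or `name` starts with a given prefix. It uses only the standard library and is not SnakyScraper's implementation; `MetaTagCollector`, `collect_meta`, and `SAMPLE_HTML` are hypothetical names introduced for illustration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta> tags whose property/name starts with a prefix (illustrative helper)."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        # Open Graph tags typically use property="og:...";
        # Twitter Card tags typically use name="twitter:...".
        key = attr_map.get("property") or attr_map.get("name")
        if key and key.startswith(self.prefix) and "content" in attr_map:
            self.tags[key] = attr_map["content"]


def collect_meta(html, prefix):
    """Return a dict of matching meta tags, keyed by their property/name."""
    collector = MetaTagCollector(prefix)
    collector.feed(html)
    return collector.tags


# Hypothetical sample page, not fetched from any real site.
SAMPLE_HTML = """
<html><head>
  <meta property="og:title" content="Example OG Title">
  <meta property="og:description" content="An example page.">
  <meta name="twitter:title" content="Example Twitter Title">
</head><body></body></html>
"""

og = collect_meta(SAMPLE_HTML, "og:")
print(og)                  # all Open Graph tags as a dictionary
print(og.get("og:title"))  # a single tag looked up by key
```

The dictionary-versus-single-value pattern mirrors the two call styles listed above (`open_graph()` and `open_graph("og:title")`), although the library's exact return shapes may differ.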

🔹 Headings & Text

scraper.h1()
scraper.h2()
scraper.h3()
scraper.h4()
scraper.h5()
scraper.h6()
scraper.p()

🔹 Lists

scraper.ul()
scraper.ol()

🔹 Images

scraper.images()
scraper.image_details()

🔹 Links

scraper.links()
scraper.link_details()

🔍 Custom DOM Filtering

Use filter() to target specific DOM elements and extract nested content.

▸ Single element

scraper.filter(
    element="div",
    attributes={"id": "main"},
    multiple=False,
    extract=[".title", "#description", "p"]
)

▸ Multiple elements

scraper.filter(
    element="div",
    attributes={"class": "card"},
    multiple=True,
    extract=["h1", ".subtitle", "#meta"]
)

The extract argument accepts tag names, class selectors (e.g., .title), or ID selectors (e.g., #meta).
Output keys are automatically normalized:
.title → class__title, #meta → id__meta
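
The normalization rule can be sketched in a few lines. This is a minimal illustration of the mapping described above, not the library's source; `normalize_key` is a hypothetical helper, and the assumption that plain tag names pass through unchanged is not confirmed by the README.

```python
def normalize_key(selector):
    """Map an extract selector to the output key described in the README."""
    if selector.startswith("."):
        # Class selector: ".title" becomes "class__title"
        return "class__" + selector[1:]
    if selector.startswith("#"):
        # ID selector: "#meta" becomes "id__meta"
        return "id__" + selector[1:]
    # Assumption: plain tag names such as "h1" are used as-is.
    return selector


keys = [normalize_key(s) for s in ["h1", ".subtitle", "#meta"]]
print(keys)
```

Knowing the key names in advance helps when reading values out of the result of filter(), since the selector strings passed in extract are not the keys that come back.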

▸ Clean Text Output

Set return_html=False to get clean text instead of raw HTML:

scraper.filter(
    element="p",
    attributes={"class": "dark-text"},
    multiple=True,
    return_html=False
)
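
To make the difference concrete, the sketch below contrasts a raw HTML fragment with its text-only form. It assumes "clean text" means the element's text with tags stripped and whitespace collapsed; the library's exact behavior may differ. `TextExtractor` and `to_text` are hypothetical helpers built on the standard library, not part of SnakyScraper.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers only the text nodes of an HTML fragment (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_text(fragment):
    """Strip tags from a fragment and collapse runs of whitespace."""
    extractor = TextExtractor()
    extractor.feed(fragment)
    extractor.close()  # flush any buffered text
    return " ".join("".join(extractor.parts).split())


raw_html = '<p class="dark-text">Hello <strong>world</strong>!</p>'
clean_text = to_text(raw_html)

print(raw_html)    # the markup, as a raw-HTML result would look
print(clean_text)  # the text-only form
```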

📦 Output Example

scraper.title()
# "Welcome to Example.com"

scraper.h1()
# ["Main Heading", "Another Title"]

scraper.open_graph("og:title")
# "Example OG Title"

🤝 Contributing

Contributions are welcome!
Found a bug or want to request a feature? Please open an issue or submit a pull request.


📄 License

MIT License © 2025 — SnakyScraper


💡 Why SnakyScraper?

Think of it as your Pythonic sniper — targeting HTML content with precision and elegance.
