Universal content extraction library with tiered fetching strategies and anti-bot bypass

OmniFetch Python Library

Python implementation of OmniFetch, a universal content extraction library.

Features

  • Universal Extraction: Fetches content from arbitrary URLs, including standard sites, single-page applications (SPAs), and paywalled pages.
  • Tiered System:
    1. Light Fetch: Fast, standard HTTP request.
    2. Headless Browser: Renders dynamic, JavaScript-heavy sites (requires a Netlify endpoint).
    3. Search Fallback: Finds alternative sources for paywalled or blocked content.
  • Smart Parsing: Converts HTML to clean Markdown or JSON.
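The tiered system amounts to a fallback chain: each tier is tried in order, and the first one that returns usable content wins. The sketch below shows only that control flow, assuming each tier is a callable that returns text or None on failure. The `fetch_with_fallback` helper and the three stand-in tier functions are hypothetical and are not part of OmniFetch's API.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical tier interface: take a URL, return extracted content or None.
Tier = Callable[[str], Optional[str]]


def fetch_with_fallback(url: str, tiers: Sequence[Tuple[str, Tier]]) -> Tuple[str, str]:
    """Try each (name, tier) in order and return (name, content) from the first success."""
    errors = []
    for name, tier in tiers:
        try:
            content = tier(url)
        except Exception as exc:  # a crashing tier should not stop the chain
            errors.append(f"{name}: {exc}")
            continue
        if content:  # None or empty string means "try the next tier"
            return name, content
        errors.append(f"{name}: no content")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


# Hypothetical stand-ins for the three tiers.
def light_fetch(url: str) -> Optional[str]:
    return None  # e.g. the site answered a plain HTTP client with 403


def headless_fetch(url: str) -> Optional[str]:
    raise TimeoutError("browser endpoint timed out")


def search_fallback(url: str) -> Optional[str]:
    return f"Content found via an alternative source for {url}"


tier_name, text = fetch_with_fallback(
    "https://example.com/article",
    [("light", light_fetch), ("headless", headless_fetch), ("search", search_fallback)],
)
print(tier_name)  # search
```

Ordering the tiers from cheapest to most expensive means the slow paths only run when the fast ones fail.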

Installation

pip install omnifetch-lib

Quick Start

from omnifetch import omni_fetch

# Text extraction (Markdown)
result = omni_fetch('https://example.com', mode='TEXT')
print(result.content)

# JSON extraction (Structured Data)
json_result = omni_fetch('https://example.com', mode='JSON')
print(json_result.content['title'])
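The two modes differ in output shape: TEXT yields a Markdown string, while JSON yields a structured object whose fields (such as `title`) can be indexed. The stdlib-only sketch below illustrates that distinction on a tiny HTML snippet. It is not OmniFetch's parser; the `extract` helper, the `SimpleExtractor` class and the `headings`/`paragraphs` field names are assumptions made for illustration.

```python
from html.parser import HTMLParser
from typing import List, Optional


class SimpleExtractor(HTMLParser):
    """Collect a page's <title>, <h1> headings and <p> paragraphs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.headings: List[str] = []
        self.paragraphs: List[str] = []
        self._current: Optional[str] = None  # tag whose text is being collected
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "h1", "p"):
            self._current = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag != self._current:
            return
        text = " ".join("".join(self._buffer).split())  # collapse runs of whitespace
        if tag == "title":
            self.title = text
        elif tag == "h1":
            self.headings.append(text)
        else:
            self.paragraphs.append(text)
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)


def extract(html: str, mode: str = "TEXT"):
    """Hypothetical helper: Markdown string for 'TEXT', dict for 'JSON'."""
    parser = SimpleExtractor()
    parser.feed(html)
    parser.close()
    if mode == "JSON":
        return {"title": parser.title, "headings": parser.headings, "paragraphs": parser.paragraphs}
    if mode == "TEXT":
        blocks = [f"# {h}" for h in parser.headings] + parser.paragraphs
        return "\n\n".join(blocks)
    raise ValueError(f"unknown mode: {mode!r}")


page = "<html><head><title>Example</title></head><body><h1>Hello</h1><p>First  paragraph.</p></body></html>"
print(extract(page, "TEXT"))
print(extract(page, "JSON")["title"])  # Example
```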

Configuration

from typing import Optional

def omni_fetch(
    url: str,
    mode: str = 'TEXT',                      # 'TEXT' for Markdown, 'JSON' for structured data
    timeout: int = 30,                       # Request timeout in seconds
    netlify_endpoint: Optional[str] = None,  # Headless browser endpoint (Tier 2)
    headers: Optional[dict] = None,          # Custom HTTP headers
    skip_headless: bool = False,             # Skip Tier 2
    skip_search: bool = False,               # Skip Tier 3
    force_title: Optional[str] = None,       # Override the title used by the search fallback (Tier 3)
) -> OmniFetchResult: ...
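One way to read these options is in terms of which tiers they leave enabled. The sketch below mirrors the keyword arguments in a hypothetical `FetchOptions` dataclass (not part of the library) and assumes, based on the feature list, that Tier 2 only runs when `netlify_endpoint` is set and that the two `skip_*` flags simply disable their tier.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

VALID_MODES = {"TEXT", "JSON"}


@dataclass
class FetchOptions:
    """Hypothetical container mirroring omni_fetch's keyword arguments."""

    mode: str = "TEXT"
    timeout: int = 30
    netlify_endpoint: Optional[str] = None
    headers: Optional[Dict[str, str]] = None
    skip_headless: bool = False
    skip_search: bool = False
    force_title: Optional[str] = None

    def __post_init__(self) -> None:
        if self.mode not in VALID_MODES:
            raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {self.mode!r}")
        if self.timeout <= 0:
            raise ValueError("timeout must be a positive number of seconds")

    def enabled_tiers(self) -> List[int]:
        """Tiers that would run, under the assumptions stated above."""
        tiers = [1]  # the light HTTP fetch always runs first
        if self.netlify_endpoint and not self.skip_headless:
            tiers.append(2)  # headless browser needs a deployed endpoint
        if not self.skip_search:
            tiers.append(3)  # search fallback
        return tiers


print(FetchOptions().enabled_tiers())  # [1, 3]
print(FetchOptions(netlify_endpoint="https://your-site.netlify.app/.netlify/functions/headless-fetch").enabled_tiers())  # [1, 2, 3]
```

Defaulting `headers` to None rather than `{}` follows the usual Python idiom for avoiding a mutable default argument shared across calls.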

Advanced Usage

Handling Blocked Domains (e.g., X/Twitter)

Some domains block direct scraping. OmniFetch handles this automatically by falling back to search (Tier 3). For opaque URLs, such as status links that contain only a username and a numeric ID, you can pass force_title to give the search something specific to match.

result = omni_fetch(
    'https://x.com/someuser/status/12345',
    mode='TEXT',
    force_title='Specific Tweet Content Title' # Helps find the content via search
)
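Why force_title helps: an opaque URL leaves little for a search engine to match beyond the host, a username and an ID. A minimal sketch of how a search query might be derived, assuming a caller-supplied title takes priority and is quoted as a phrase; `build_search_query` is a hypothetical helper, not OmniFetch's actual query logic.

```python
from typing import Optional
from urllib.parse import urlparse


def build_search_query(url: str, force_title: Optional[str] = None) -> str:
    """Hypothetical helper: choose the text sent to a Tier 3 search."""
    if force_title:
        # A caller-supplied title is usually far more specific than the URL.
        return f'"{force_title.strip()}"'
    parsed = urlparse(url)
    # Otherwise fall back to the host plus the non-empty path segments.
    segments = [s for s in parsed.path.split("/") if s]
    return " ".join([parsed.netloc] + segments)


print(build_search_query("https://x.com/someuser/status/12345"))
# x.com someuser status 12345
print(build_search_query("https://x.com/someuser/status/12345", force_title="Specific Tweet Content Title"))
# "Specific Tweet Content Title"
```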

Headless Browser Support

To enable Tier 2 (Headless Browser) for dynamic sites, deploy the provided Netlify function and pass its URL as netlify_endpoint.

result = omni_fetch(
    'https://dynamic-site.com',
    netlify_endpoint='https://your-site.netlify.app/.netlify/functions/headless-fetch'
)
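A headless browser is only worth its cost when the light fetch returned an application shell instead of content. One possible way to detect that, sketched below, is to measure visible text outside script and style tags. The `looks_js_rendered` name and the 200-character threshold are arbitrary assumptions; this is not how OmniFetch itself decides when to escalate.

```python
from html.parser import HTMLParser

HIDDEN_TAGS = ("script", "style", "noscript")


class VisibleTextCounter(HTMLParser):
    """Gather text outside <script>/<style>/<noscript> and count <script> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks = []
        self.script_count = 0
        self._hidden_depth = 0  # > 0 while inside a non-visible element

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in HIDDEN_TAGS and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        if not self._hidden_depth:
            self.chunks.append(data)


def looks_js_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: scripts present but little visible text suggests an SPA shell."""
    parser = VisibleTextCounter()
    parser.feed(html)
    parser.close()
    visible = " ".join(" ".join(parser.chunks).split())
    return parser.script_count > 0 and len(visible) < min_text_chars


spa_shell = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'
static_page = "<html><body><p>" + "Server-rendered article text. " * 10 + "</p></body></html>"
print(looks_js_rendered(spa_shell))    # True
print(looks_js_rendered(static_page))  # False
```

A caller could run the light fetch first and only call the headless endpoint when this returns True.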

Development Installation

pip install -e .

Running Tests

pip install -e ".[dev]"
pytest

See the main README.md for full documentation.

Download files

Source Distribution

omnifetch_lib-1.2.0.tar.gz (15.5 kB)

Uploaded Source

Built Distribution

omnifetch_lib-1.2.0-py3-none-any.whl (17.9 kB)

Uploaded Python 3

File details

Details for the file omnifetch_lib-1.2.0.tar.gz.

File metadata

  • Download URL: omnifetch_lib-1.2.0.tar.gz
  • Upload date:
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for omnifetch_lib-1.2.0.tar.gz
Algorithm     Hash digest
SHA256        f523842c2a77e068bec55a3da5fb82b38ab0fbe129eacd21f9517e0b55cabb6c
MD5           bd022e3a4a5cc901e2736dbea658f05d
BLAKE2b-256   df2f3856d479eb4ce37dbcb612068afaa729f2fa7e89dbe2dac2f86c6d1f0737
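The published digests let you confirm that a downloaded file is byte-for-byte the one that was uploaded. A minimal stdlib sketch that checks the SHA256 row; the helper names are illustrative, and the commented call assumes the source distribution was saved in the current directory.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare a downloaded file against a published SHA256 digest."""
    return file_sha256(path) == expected_sha256.strip().lower()


# SHA256 digest published for omnifetch_lib-1.2.0.tar.gz.
EXPECTED = "f523842c2a77e068bec55a3da5fb82b38ab0fbe129eacd21f9517e0b55cabb6c"
# verify_download(Path("omnifetch_lib-1.2.0.tar.gz"), EXPECTED) should be True
# for an untampered copy of the file.
```

Reading in fixed-size chunks keeps memory use constant regardless of file size.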

Provenance

The following attestation bundles were made for omnifetch_lib-1.2.0.tar.gz:

Publisher: publish-python.yml on visy-ani/omni-fetch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file omnifetch_lib-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: omnifetch_lib-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 17.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for omnifetch_lib-1.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        7864bd1aa1e5fdd6a798314e393d225851b4621ba0cdc3f2012568ace653f746
MD5           7ded18245fabcaf8ae60687e197dda48
BLAKE2b-256   eb7a63d8d926d3ee547921a769e997afdbe44da186ebca8163a3a49bdd5250cf

Provenance

The following attestation bundles were made for omnifetch_lib-1.2.0-py3-none-any.whl:

Publisher: publish-python.yml on visy-ani/omni-fetch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
