Universal content extraction library with tiered fetching strategies and anti-bot bypass

Project description

OmniFetch Python Library

The Python implementation of OmniFetch, a universal content extraction library.

Features

  • Universal Extraction: Fetches content from arbitrary URLs, covering standard sites, single-page applications (SPAs), and paywalled pages.
  • Tiered System:
    1. Light Fetch: Fast, standard HTTP request.
    2. Headless Browser: Renders dynamic, JavaScript-heavy sites (requires a deployed Netlify endpoint).
    3. Search Fallback: Finds alternative sources for paywalled or blocked content.
  • Smart Parsing: Converts HTML to clean Markdown or JSON.
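
The tiered system described above can be pictured as an ordered chain of fallbacks. The following is a minimal sketch of that idea, not OmniFetch's actual implementation: the names fetch_with_tiers, light_fetch, headless_fetch, and search_fallback are hypothetical stand-ins, and the stub tiers perform no network I/O.

```python
from typing import Callable, Optional, Sequence

# Each tier is a hypothetical callable: it takes a URL and returns extracted
# text, or None when that tier cannot produce content.
Tier = Callable[[str], Optional[str]]


def fetch_with_tiers(url: str, tiers: Sequence[Tier]) -> Optional[str]:
    """Try each tier in order and return the first non-empty result."""
    for tier in tiers:
        try:
            content = tier(url)
        except Exception:
            # A failing tier counts as a miss, so the next tier gets a turn.
            continue
        if content:
            return content
    return None


# Stand-in tiers for illustration only; real tiers would perform network I/O.
def light_fetch(url: str) -> Optional[str]:
    return None  # pretend the plain HTTP request was blocked


def headless_fetch(url: str) -> Optional[str]:
    raise RuntimeError("no headless endpoint configured")


def search_fallback(url: str) -> Optional[str]:
    return f"content found via search for {url}"


result = fetch_with_tiers(
    "https://example.com/article",
    [light_fetch, headless_fetch, search_fallback],
)
print(result)
```

Ordering the tiers from cheapest to most expensive keeps the common case fast and reserves the slower strategies for pages that actually need them.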

Installation

pip install omnifetch-lib

Quick Start

from omnifetch import omni_fetch

# Text extraction (Markdown)
result = omni_fetch('https://example.com', mode='TEXT')
print(result.content)

# JSON extraction (Structured Data)
json_result = omni_fetch('https://example.com', mode='JSON')
print(json_result.content['title'])

Configuration

from typing import Optional

def omni_fetch(
    url: str,
    mode: str = 'TEXT',                      # 'TEXT' for Markdown, 'JSON' for structured data
    timeout: int = 30,                       # Request timeout in seconds
    netlify_endpoint: Optional[str] = None,  # Headless browser endpoint (Tier 2)
    headers: Optional[dict] = None,          # Custom HTTP headers
    skip_headless: bool = False,             # Skip Tier 2
    skip_search: bool = False,               # Skip Tier 3
    force_title: Optional[str] = None        # Override title for search fallback
) -> OmniFetchResult
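
When the same options are reused across many calls, it can help to assemble them in one place. The sketch below builds a keyword-argument dictionary from environment variables. The helper build_fetch_kwargs and the variable names OMNIFETCH_NETLIFY_ENDPOINT and OMNIFETCH_TIMEOUT are assumptions made for this example; the library itself is not documented as reading them. Only the parameter names timeout, netlify_endpoint, and skip_headless come from the signature above.

```python
import os
from typing import Any, Dict, Mapping


def build_fetch_kwargs(env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Build keyword arguments for omni_fetch from environment variables.

    OMNIFETCH_NETLIFY_ENDPOINT and OMNIFETCH_TIMEOUT are hypothetical names
    chosen for this sketch; the library does not read them itself.
    """
    endpoint = env.get("OMNIFETCH_NETLIFY_ENDPOINT", "").strip()
    kwargs: Dict[str, Any] = {
        "timeout": int(env.get("OMNIFETCH_TIMEOUT", "30")),
    }
    if endpoint:
        kwargs["netlify_endpoint"] = endpoint
    else:
        # Tier 2 requires an endpoint, so skip it explicitly when none is set.
        kwargs["skip_headless"] = True
    return kwargs


print(build_fetch_kwargs({}))
```

The resulting dictionary could then be unpacked into a call such as omni_fetch(url, **build_fetch_kwargs()).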

Advanced Usage

Handling Blocked Domains (e.g., X/Twitter)

Some domains block direct scraping. OmniFetch handles this automatically by falling back to search (Tier 3). For opaque URLs, whose path gives the search little to work with (for example, a numeric status ID), you can pass force_title to improve the search results.

result = omni_fetch(
    'https://x.com/someuser/status/12345',
    mode='TEXT',
    force_title='Specific Tweet Content Title' # Helps find the content via search
)
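
One way to decide when a force_title is worth supplying is to check how many descriptive words the URL path contains. The heuristic below is a hypothetical sketch (looks_opaque and its min_words threshold are not part of OmniFetch); it only illustrates what "opaque" might mean in practice.

```python
import re
from urllib.parse import urlparse


def looks_opaque(url: str, min_words: int = 3) -> bool:
    """Return True when the URL path carries few descriptive words.

    Hypothetical heuristic: count alphabetic runs of three or more letters
    in the path and compare the count against min_words.
    """
    path = urlparse(url).path
    words = re.findall(r"[A-Za-z]{3,}", path)
    return len(words) < min_words


url = "https://x.com/someuser/status/12345"
# Only pass force_title when the URL itself is unlikely to help the search.
extra_kwargs = {"force_title": "Specific Tweet Content Title"} if looks_opaque(url) else {}
print(extra_kwargs)
```

The threshold is arbitrary; a descriptive slug such as /blog/how-to-bake-sourdough-bread passes it easily, while an ID-based path does not.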

Headless Browser Support

To enable Tier 2 (Headless Browser) for dynamic sites, deploy the Netlify function provided with the project and pass its URL as netlify_endpoint.

result = omni_fetch(
    'https://dynamic-site.com',
    netlify_endpoint='https://your-site.netlify.app/.netlify/functions/headless-fetch'
)
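
A mistyped endpoint is an easy mistake to make, so a quick sanity check before calling omni_fetch may be useful. The sketch below assumes the function is served over HTTPS at Netlify's default /.netlify/functions/ path, as in the example above; a deployment that uses redirects or a custom path would need a different check. The helper name is_plausible_netlify_endpoint is hypothetical.

```python
from urllib.parse import urlparse

# Default path prefix under which Netlify serves deployed functions.
FUNCTIONS_PREFIX = "/.netlify/functions/"


def is_plausible_netlify_endpoint(endpoint: str) -> bool:
    """Check that the endpoint is an HTTPS URL naming a Netlify function."""
    parsed = urlparse(endpoint)
    return (
        parsed.scheme == "https"
        and bool(parsed.netloc)
        and parsed.path.startswith(FUNCTIONS_PREFIX)
        # Require a function name after the prefix.
        and len(parsed.path) > len(FUNCTIONS_PREFIX)
    )


endpoint = "https://your-site.netlify.app/.netlify/functions/headless-fetch"
print(is_plausible_netlify_endpoint(endpoint))
```

This validates only the shape of the URL; it does not confirm that the function is deployed or reachable.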

Development Installation

pip install -e .

Running Tests

pip install -e ".[dev]"
pytest

See the main README.md for full documentation.

Download files

Source Distribution

omnifetch_lib-1.3.0.tar.gz (18.6 kB)

Uploaded Source

Built Distribution

omnifetch_lib-1.3.0-py3-none-any.whl (21.6 kB)

Uploaded Python 3

File details

Details for the file omnifetch_lib-1.3.0.tar.gz.

File metadata

  • Download URL: omnifetch_lib-1.3.0.tar.gz
  • Upload date:
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for omnifetch_lib-1.3.0.tar.gz
Algorithm     Hash digest
SHA256        20971427b7646161627dcd16336b7f7b970112d1231471dc1754494208424a84
MD5           fcaef40e07e79e331d43be26c68ca0d1
BLAKE2b-256   8a54032efa2d11c2d23b94cb035e378ef936a3cdc0cc8f418d9284d17b47cf1b

Provenance

The following attestation bundles were made for omnifetch_lib-1.3.0.tar.gz:

Publisher: publish-python.yml on visy-ani/omni-fetch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file omnifetch_lib-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: omnifetch_lib-1.3.0-py3-none-any.whl
  • Upload date:
  • Size: 21.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for omnifetch_lib-1.3.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        a0bbd2ad715b511266485a382a03917bba82ed51682cf03c487c4743d68c3a1a
MD5           055b77d66106a36a8541ca98dc6fcdce
BLAKE2b-256   df439aad58d9e4c429bf135c3f3ae00f3a43ed2439180b67854dc7a865496cbb

Provenance

The following attestation bundles were made for omnifetch_lib-1.3.0-py3-none-any.whl:

Publisher: publish-python.yml on visy-ani/omni-fetch

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
