scitex-web

Web scraping + PubMed search + URL summarization helpers, extracted from the SciTeX ecosystem as a standalone package.

Install

pip install scitex-web
pip install "scitex-web[readability]"   # readability-lxml for cleaner extraction

API

import scitex_web as web

url = "https://example.com"  # placeholder: the page to scrape

# Scraping
web.get_urls(url, pattern=r"\.pdf$")
web.get_image_urls(url, min_size=128)
web.download_images(url, out_dir="imgs", same_domain=True)

# PubMed
web.search_pubmed("CRISPR Cas9 review", retmax=50)

# URL summarization (requires scitex.ai)
web.summarize_url("https://example.com/article")
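To make the `pattern` argument of `get_urls` more concrete, here is a minimal standard-library sketch of regex-filtered link extraction. It is not the package's implementation (which, per the dependency list, builds on requests and bs4); `extract_links`, `_LinkCollector`, and the sample HTML are hypothetical names introduced only for illustration, and the sketch works on an HTML string rather than fetching a page.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url, pattern=None):
    """Return absolute link URLs from html, optionally filtered by a regex.

    Hypothetical helper for illustration; not part of scitex_web.
    """
    collector = _LinkCollector()
    collector.feed(html)
    # Resolve relative hrefs against the page URL.
    absolute = [urljoin(base_url, href) for href in collector.hrefs]
    if pattern is None:
        return absolute
    regex = re.compile(pattern)
    return [link for link in absolute if regex.search(link)]


sample_html = """
<a href="/papers/a.pdf">A</a>
<a href="notes.html">Notes</a>
<a href="https://other.example/b.pdf">B</a>
"""
pdf_links = extract_links(sample_html, "https://example.com/docs/", pattern=r"\.pdf$")
print(pdf_links)
```

The regex is applied to the resolved absolute URL, so a pattern such as `r"\.pdf$"` matches relative and absolute links alike. Whether `get_urls` resolves links before filtering is an assumption here.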

Status

Standalone fork of scitex.web. Dependencies: requests, aiohttp, bs4, and tqdm. The umbrella package's scitex.web import path is preserved via a sys.modules alias bridge, so existing code that imports scitex.web keeps working.
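A sys.modules alias bridge can be sketched as follows. This is an illustration of the general technique, not the code used by scitex; the module names `standalone_web_demo` and `umbrella_demo` are hypothetical stand-ins built with `types.ModuleType` so the example runs without either package installed.

```python
import sys
import types

# Stand-ins for the real packages so the sketch runs without installing anything.
# "standalone_web_demo" and "umbrella_demo" are hypothetical names.
standalone = types.ModuleType("standalone_web_demo")
standalone.get_urls = lambda url, pattern=None: []
sys.modules["standalone_web_demo"] = standalone

umbrella = types.ModuleType("umbrella_demo")
umbrella.__path__ = []  # mark the stand-in as a package
sys.modules["umbrella_demo"] = umbrella

# The bridge: register the standalone module under the legacy dotted path
# and expose it as an attribute of the parent package.
sys.modules["umbrella_demo.web"] = standalone
umbrella.web = standalone

# The legacy import path now resolves to the very same module object.
import umbrella_demo.web as legacy_web

print(legacy_web is standalone)
```

Because both names point at one module object, state such as caches or patched attributes is shared between the old and new import paths rather than duplicated.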

Decoupling notes:

  • scitex.logging.getLogger → stdlib logging.getLogger.
  • scitex.str.printc (colored print) → tiny inline ANSI helper.
  • scitex.ai.GenAI (used by summarize_url) → deferred import that raises a clear ImportError if the umbrella scitex package isn't installed.
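The three substitutions above follow a common decoupling pattern, sketched below under stated assumptions. The names `colorize`, `load_optional`, and `_ANSI_CODES` are hypothetical and do not come from the package, and the error message wording is illustrative rather than the one scitex-web emits.

```python
import importlib
import logging

# Stdlib logger in place of a package-specific logging wrapper.
logger = logging.getLogger(__name__)

# Small ANSI color table; a stand-in for a colored-print helper.
_ANSI_CODES = {"red": "31", "green": "32", "yellow": "33", "blue": "34"}


def colorize(text, color="green"):
    """Wrap text in ANSI escape codes; unknown colors return text unchanged."""
    code = _ANSI_CODES.get(color)
    if code is None:
        return text
    return f"\033[{code}m{text}\033[0m"


def load_optional(module_name, feature):
    """Import module_name only when feature is used, with a readable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional module '{module_name}'; "
            "install the package that provides it to use this feature."
        ) from exc


print(colorize("ok", "green"))
try:
    # A module name chosen so that the import fails in this demonstration.
    load_optional("surely_not_installed_demo_pkg", "summarize_url")
except ImportError as err:
    logger.warning("optional feature unavailable: %s", err)
```

Deferring the import to call time means that importing the package never fails just because an optional dependency is missing; only the feature that needs it does.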

14 of 23 tests pass. Seven failures are pre-existing upstream issues around bs4 mocking that also fail in scitex-python and are unrelated to the extraction; two tests are skipped.

License

AGPL-3.0-only (see LICENSE).
