Web scraping (URLs, images), PubMed search, URL summarization helpers — standalone module from the SciTeX ecosystem

scitex-web

Web scraping + PubMed search + URL summarization helpers, extracted from the SciTeX ecosystem as a standalone package.

Install

pip install scitex-web
pip install "scitex-web[readability]"   # readability-lxml for cleaner extraction

API

import scitex_web as web

url = "https://example.com"  # placeholder: the page to scrape

# Scraping
web.get_urls(url, pattern=r"\.pdf$")
web.get_image_urls(url, min_size=128)
web.download_images(url, out_dir="imgs", same_domain=True)

# PubMed
web.search_pubmed("CRISPR Cas9 review", retmax=50)

# URL summarization (requires scitex.ai)
web.summarize_url("https://example.com/article")
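
To illustrate what a pattern-filtered link scrape such as get_urls(url, pattern=...) conceptually does, here is a minimal sketch using only the standard library. It is not the package's implementation: the names _LinkCollector and filter_links are hypothetical, and it parses an inline HTML string instead of fetching a page.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative hrefs into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def filter_links(html, base_url, pattern=None):
    """Return absolute links found in `html`, optionally filtered by a regex."""
    collector = _LinkCollector(base_url)
    collector.feed(html)
    if pattern is None:
        return collector.links
    regex = re.compile(pattern)
    return [link for link in collector.links if regex.search(link)]


# Hypothetical sample page content, used instead of a network request.
sample_html = """
<a href="/papers/a.pdf">Paper A</a>
<a href="notes.html">Notes</a>
<a href="https://other.org/b.pdf">Paper B</a>
"""
pdf_links = filter_links(sample_html, "https://example.com/docs/", pattern=r"\.pdf$")
print(pdf_links)
```

The regex is applied to the resolved absolute URL, so a pattern like r"\.pdf$" matches both relative and absolute links that end in .pdf.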

Status

Standalone fork of scitex.web. Dependencies: requests, aiohttp, bs4, and tqdm. The umbrella package's scitex.web import path is preserved via a sys.modules alias bridge.
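
A sys.modules alias bridge of this kind can be sketched as follows. This is an illustration of the general technique, not the actual bridge code in scitex: the module names standalone_pkg and umbrella are hypothetical stand-ins created in memory so the example runs without either package installed.

```python
import sys
import types

# Hypothetical stand-in for the standalone package.
standalone = types.ModuleType("standalone_pkg")
standalone.greet = lambda: "hello from standalone"
sys.modules["standalone_pkg"] = standalone

# Hypothetical stand-in for the umbrella package.
umbrella = types.ModuleType("umbrella")
umbrella.__path__ = []  # give it a __path__ so it looks like a package
sys.modules["umbrella"] = umbrella

# The bridge: register the standalone module under the legacy dotted name
# and expose it as an attribute of the parent package.
sys.modules["umbrella.web"] = standalone
umbrella.web = standalone

import umbrella.web as legacy
import standalone_pkg

# Both import paths resolve to the same module object.
same_object = legacy is standalone_pkg
print(same_object)
```

Because both names point at one module object, state such as module-level caches or monkeypatches is shared between the old and new import paths.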

Decoupling notes:

  • scitex.logging.getLogger → stdlib logging.getLogger.
  • scitex.str.printc (colored print) → tiny inline ANSI helper.
  • scitex.ai.GenAI (used by summarize_url) → deferred import that raises a clear ImportError if the umbrella scitex package isn't installed.
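
The three substitutions above follow common decoupling patterns, sketched below under stated assumptions. None of this is the package's actual code: colorize, load_optional, summarize, and the module name hypothetical_genai_backend are invented for illustration.

```python
import importlib
import logging

# Stdlib logger in place of a package-specific logging wrapper.
logger = logging.getLogger(__name__)

_ANSI_COLORS = {"red": "\033[31m", "green": "\033[32m", "yellow": "\033[33m"}
_ANSI_RESET = "\033[0m"


def colorize(text, color="green"):
    """Wrap text in an ANSI color code; unknown colors leave it unchanged."""
    code = _ANSI_COLORS.get(color)
    return f"{code}{text}{_ANSI_RESET}" if code else text


def load_optional(module_name, feature):
    """Import a module only when needed, with a descriptive error if missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} requires the optional module '{module_name}', "
            "which is not installed."
        ) from exc


def summarize(text):
    # The optional dependency is resolved at call time, not at import time,
    # so importing this file never fails when the dependency is absent.
    backend = load_optional("hypothetical_genai_backend", "summarize()")
    return backend.summarize(text)


print(colorize("ok", "green"))
try:
    summarize("some text")
except ImportError as err:
    message = str(err)
    logger.warning(message)
```

Deferring the import keeps the scraping and search helpers usable on their own, while the one function that needs the heavier dependency fails with a message that names what is missing.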

14 of 23 tests pass. Of the remaining 9, 7 are pre-existing upstream failures around bs4 mocking that also fail in scitex-python and are unrelated to the extraction, and 2 are skipped.

License

AGPL-3.0-only (see LICENSE).

Download files

Source Distribution

scitex_web-0.1.2.tar.gz (27.4 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12
  • Publisher: publish-pypi.yml on ywatanabe1989/scitex-web
  • SHA256: 091005bb47db7995bab16df0b9481f5cb20732f378ba6c144a5824a73e617109
  • MD5: 10d3b7cfc3ba9c5ec201b2893c87e9d7
  • BLAKE2b-256: 081e4ff7da270a8bffebe37f6e820657d26835b68755f7536030dfd9e251cb8f

Built Distribution

scitex_web-0.1.2-py3-none-any.whl (27.8 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12
  • Publisher: publish-pypi.yml on ywatanabe1989/scitex-web
  • SHA256: ad65685f51b405ba17349727220b6f901440ed47773ca2b933d664cf500c7d39
  • MD5: 557f5681b35e071f6e30225884654be4
  • BLAKE2b-256: 7064debd5acc563935777ffab97ed126a0adf80198afd0ea2488dddee7e4d794