Web scraping (URLs, images), PubMed search, URL summarization helpers — standalone module from the SciTeX ecosystem

scitex-web

Web scraping, PubMed search, and URL summarization helpers, extracted from the SciTeX ecosystem as a standalone package.

Install

pip install scitex-web
pip install "scitex-web[readability]"   # readability-lxml for cleaner extraction
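Because the `readability` extra is optional, calling code may want to check for it at runtime. A minimal sketch, assuming the extra installs readability-lxml under its usual import name `readability`; the helper name `has_module` is hypothetical and not part of scitex-web:

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


# readability-lxml is imported as `readability`.
if has_module("readability"):
    print("readability-lxml is available")
else:
    print('readability-lxml is missing; try: pip install "scitex-web[readability]"')
```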

API

import scitex_web as web

url = "https://example.com"  # placeholder: the page to scrape

# Scraping
web.get_urls(url, pattern=r"\.pdf$")
web.get_image_urls(url, min_size=128)
web.download_images(url, out_dir="imgs", same_domain=True)

# PubMed
web.search_pubmed("CRISPR Cas9 review", retmax=50)

# URL summarization (requires scitex.ai)
web.summarize_url("https://example.com/article")
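
The `pattern` argument of `get_urls` suggests regex filtering of the links found on a page. The following is a stdlib-only sketch of that idea, not the package's implementation: the class `LinkCollector` and the function `filter_links` are hypothetical names, and the HTML is an inline string so nothing is fetched from the network.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Turn relative links into absolute ones.
            self.links.append(urljoin(self.base_url, href))


def filter_links(html: str, base_url: str, pattern: str) -> list:
    """Return absolute links in `html` whose URL matches the regex `pattern`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    regex = re.compile(pattern)
    return [link for link in collector.links if regex.search(link)]


page = '<a href="/papers/a.pdf">A</a> <a href="/about.html">About</a>'
pdf_links = filter_links(page, "https://example.com/", r"\.pdf$")
print(pdf_links)
```

Using `regex.search` rather than `regex.match` lets an anchored suffix pattern such as `\.pdf$` match anywhere in the URL.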

Status

Standalone fork of scitex.web. Dependencies: requests, aiohttp, bs4, and tqdm. The umbrella package's scitex.web import path is preserved through a sys.modules alias bridge, so existing imports of scitex.web continue to work.
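
A `sys.modules` alias makes one module object importable under two names. Below is a minimal sketch of that mechanism using hypothetical module names (`standalone_pkg`, `umbrella.web`) instead of the real packages; the actual bridge in scitex may be written differently.

```python
import importlib
import sys
import types

# Stand-in for the standalone package (hypothetical name).
standalone = types.ModuleType("standalone_pkg")
standalone.greet = lambda: "hello from standalone"
sys.modules["standalone_pkg"] = standalone

# Stand-in for the umbrella package; __path__ marks it as a package.
umbrella = types.ModuleType("umbrella")
umbrella.__path__ = []
sys.modules["umbrella"] = umbrella

# The bridge: register the same module object under the legacy dotted path
# and expose it as an attribute of the parent package.
sys.modules["umbrella.web"] = standalone
umbrella.web = standalone

# The import system finds the alias in sys.modules and returns it as-is.
legacy = importlib.import_module("umbrella.web")
print(legacy is standalone)
```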

Decoupling notes:

  • scitex.logging.getLogger → stdlib logging.getLogger.
  • scitex.str.printc (colored print) → tiny inline ANSI helper.
  • scitex.ai.GenAI (used by summarize_url) → deferred import that raises a clear ImportError if the umbrella scitex package isn't installed.
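
The three replacements above can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's code: the names `colorize`, `printc`, and `load_genai`, the color table, and the error message wording are hypothetical. Only `logging.getLogger` and the `scitex.ai.GenAI` import path come from the notes above.

```python
import logging

# 1. stdlib logger in place of scitex.logging.getLogger
logger = logging.getLogger(__name__)

# 2. tiny inline ANSI helper in place of scitex.str.printc (colors assumed)
_ANSI = {"red": "\033[31m", "green": "\033[32m", "yellow": "\033[33m"}
_RESET = "\033[0m"


def colorize(text: str, color: str = "green") -> str:
    """Wrap `text` in ANSI color codes; unknown colors leave it unchanged."""
    code = _ANSI.get(color)
    return f"{code}{text}{_RESET}" if code else text


def printc(text: str, color: str = "green") -> None:
    """Print `text` in the given color."""
    print(colorize(text, color))


# 3. deferred import: scitex is only needed when summarization is requested
def load_genai():
    """Import scitex.ai.GenAI lazily, raising a clear error if unavailable."""
    try:
        from scitex.ai import GenAI
    except ImportError as exc:
        raise ImportError(
            "summarize_url requires the umbrella 'scitex' package, "
            "which is not installed"
        ) from exc
    return GenAI


printc("ok", "green")
```

Keeping the import inside the function means that importing the standalone package never fails just because the umbrella package is absent; the error only surfaces when summarization is actually requested.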

14 of 23 tests pass. The 7 failures are pre-existing upstream issues around bs4 mocking that also fail in scitex-python and are unrelated to the extraction; 2 tests are skipped.

License

AGPL-3.0-only (see LICENSE).

Download files

Source Distribution

scitex_web-0.1.1.tar.gz (25.5 kB)

Built Distribution

scitex_web-0.1.1-py3-none-any.whl (26.5 kB, Python 3)

File details

Details for the file scitex_web-0.1.1.tar.gz.

File metadata

  • Size: 25.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for scitex_web-0.1.1.tar.gz

  • SHA256: e7c0a56fbc61082415c125c44295814ee525abbf3f055c4ffbeb1b152a0c22a9
  • MD5: 0e36bf5d57a8c795fa42005a04ee6f37
  • BLAKE2b-256: 025b32ef2cc98c7461b2c8b9e8a0905e974f6911138850d5af23f82f90e75e22

Details for the file scitex_web-0.1.1-py3-none-any.whl.

File metadata

  • Size: 26.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for scitex_web-0.1.1-py3-none-any.whl

  • SHA256: 18ba5cdf4cddcc627018e419145ef01f6b49801b22cf6ef0de7f030529a90ef3
  • MD5: ebb9ecb56c47f091463ba5d7111c4660
  • BLAKE2b-256: 360d1132d32355117731076bb1d0a56daa8496e931cb4d2fd8589d8c0ded5863
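
The SHA256 digests above can be used to check a downloaded file. A minimal sketch using only `hashlib`; the helper name `sha256_of` and the local file path are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Digest listed above for scitex_web-0.1.1-py3-none-any.whl.
EXPECTED = "18ba5cdf4cddcc627018e419145ef01f6b49801b22cf6ef0de7f030529a90ef3"

wheel = Path("scitex_web-0.1.1-py3-none-any.whl")  # assumed download location
if wheel.exists():
    print("match" if sha256_of(wheel) == EXPECTED else "MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size.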
