Web scraping (URLs, images), PubMed search, URL summarization helpers — standalone module from the SciTeX ecosystem

scitex-web

Web scraping + PubMed search + URL summarization helpers.


Installation

pip install scitex-web
pip install "scitex-web[readability]"   # readability-lxml for cleaner extraction
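Whether the optional extra is present can be checked at runtime without importing it. The sketch below assumes the extra installs readability-lxml under its usual import name `readability`; `has_module` is a hypothetical helper, not part of scitex-web.

```python
from importlib.util import find_spec


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(name) is not None


# Assumption: readability-lxml is importable as `readability`.
HAS_READABILITY = has_module("readability")
print("readability-lxml available:", HAS_READABILITY)
```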

Quick Start

import scitex_web as web

results = web.search_pubmed("CRISPR Cas9 review", retmax=5)
images = web.get_image_urls("https://example.com/gallery", min_size=128)
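For readers unfamiliar with `retmax`: in NCBI's public E-utilities API it caps the number of records a search returns. The sketch below builds an ESearch request URL by hand to show where the value ends up. It is an illustration only; whether scitex-web calls this endpoint internally is an assumption, and `build_esearch_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Public NCBI E-utilities search endpoint (not confirmed as what scitex-web uses).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query: str, retmax: int = 20) -> str:
    """Hypothetical helper: build a PubMed ESearch URL for a query."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": query,       # free-text query
        "retmax": retmax,    # maximum number of IDs to return
        "retmode": "json",   # ask for a JSON response
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


demo_url = build_esearch_url("CRISPR Cas9 review", retmax=5)
demo_params = parse_qs(urlparse(demo_url).query)
print(demo_url)
```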

Interfaces

Python API
import scitex_web as web

# Scraping
url = "https://example.com/gallery"  # placeholder page to scrape
web.get_urls(url, pattern=r"\.pdf$")
web.get_image_urls(url, min_size=128)
web.download_images(url, out_dir="imgs", same_domain=True)

# PubMed
web.search_pubmed("CRISPR Cas9 review", retmax=50)

# URL summarization (requires the umbrella scitex package, which provides scitex.ai)
web.summarize_url("https://example.com/article")
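To make the `pattern` and `same_domain` arguments concrete, here is a minimal stdlib-only sketch of link filtering. It assumes `pattern` is applied as a regular-expression search on the resolved URL and that `same_domain` compares host names; the actual scitex-web behaviour may differ, and `LinkCollector` and `filter_links` are hypothetical names. The domain filter is applied to page links here purely for illustration.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def filter_links(html, base_url, pattern=None, same_domain=False):
    """Hypothetical helper: resolve links against base_url, then filter them."""
    collector = LinkCollector()
    collector.feed(html)
    base_host = urlparse(base_url).netloc
    links = []
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)  # turn relative links into full URLs
        if pattern and not re.search(pattern, absolute):
            continue  # URL does not match the regular expression
        if same_domain and urlparse(absolute).netloc != base_host:
            continue  # link points to another host
        links.append(absolute)
    return links


SAMPLE_HTML = """
<a href="/papers/a.pdf">A</a>
<a href="https://other.example.org/b.pdf">B</a>
<a href="/about.html">About</a>
"""
BASE = "https://example.com/gallery"
pdf_links = filter_links(SAMPLE_HTML, BASE, pattern=r"\.pdf$")
local_pdf_links = filter_links(SAMPLE_HTML, BASE, pattern=r"\.pdf$", same_domain=True)
```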

Status

Standalone fork of scitex.web. Dependencies: requests, aiohttp, bs4, and tqdm. The umbrella package's scitex.web import path is preserved through a sys.modules alias bridge, so existing imports of scitex.web continue to work.
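The general idea of a sys.modules alias can be sketched as follows. This is not the bridge code shipped with SciTeX; the module names `demo_standalone_web` and `demo_umbrella` are hypothetical stand-ins built in memory so the example runs without either package installed.

```python
import importlib
import sys
import types

# Stand-in for the standalone package (hypothetical name).
standalone = types.ModuleType("demo_standalone_web")
standalone.get_urls = lambda url: [url]

# Stand-in for the umbrella package (hypothetical name).
umbrella = types.ModuleType("demo_umbrella")
umbrella.__path__ = []  # mark it as a package

sys.modules["demo_standalone_web"] = standalone
sys.modules["demo_umbrella"] = umbrella

# The alias: register the standalone module under the legacy dotted name
# and expose it as an attribute of the parent package.
sys.modules["demo_umbrella.web"] = standalone
umbrella.web = standalone

# Importing the legacy path now returns the very same module object.
legacy = importlib.import_module("demo_umbrella.web")
print(legacy is standalone)
```

Because the import system consults sys.modules before searching for files, both names resolve to one module object, so state and monkeypatches are shared between them.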

Decoupling notes:

  • scitex.logging.getLogger → stdlib logging.getLogger.
  • scitex.str.printc (colored print) → tiny inline ANSI helper.
  • scitex.ai.GenAI (used by summarize_url) → deferred import that raises a clear ImportError if the umbrella scitex package isn't installed.
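The three substitutions above follow a common pattern, sketched below under stated assumptions. The helper names `colorize`, `load_optional`, and `summarize`, the backend module name, and the error wording are all hypothetical; scitex-web's own code may look different.

```python
import importlib
import logging

# Stdlib logger in place of a package-specific logging wrapper.
logger = logging.getLogger(__name__)

# Tiny inline ANSI helper in place of a colored-print dependency.
_ANSI = {"red": "\033[31m", "green": "\033[32m", "yellow": "\033[33m"}
_RESET = "\033[0m"


def colorize(text, color="green"):
    """Wrap text in an ANSI color code; unknown colors leave it unchanged."""
    code = _ANSI.get(color)
    return f"{code}{text}{_RESET}" if code else text


def load_optional(module_name, feature):
    """Import a module on first use, or raise an ImportError naming the feature."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{feature} needs the optional module '{module_name}', "
            "which is not installed."
        ) from exc


def summarize(url, backend_module="hypothetical_genai_backend"):
    """Deferred import: the backend is only looked up when this is called."""
    backend = load_optional(backend_module, "summarize")
    return backend.summarize(url)
```

Deferring the import to call time means the rest of the module stays importable when the optional backend is missing; only the one feature that needs it fails, and it fails with a message that says what to install.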

Part of SciTeX

scitex-web is part of SciTeX.

Four Freedoms for Research

  1. The freedom to run your research anywhere — your machine, your terms.
  2. The freedom to study how every step works — from raw data to final manuscript.
  3. The freedom to redistribute your workflows, not just your papers.
  4. The freedom to modify any module and share improvements with the community.

AGPL-3.0 — because we believe research infrastructure deserves the same freedoms as the software it runs on.

License

AGPL-3.0-only (see LICENSE).


Download files

Source Distribution

scitex_web-0.1.4.tar.gz (27.9 kB)

Uploaded Source

Built Distribution

scitex_web-0.1.4-py3-none-any.whl (28.3 kB)

Uploaded Python 3

File details

Details for the file scitex_web-0.1.4.tar.gz.

File metadata

  • Download URL: scitex_web-0.1.4.tar.gz
  • Upload date:
  • Size: 27.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scitex_web-0.1.4.tar.gz
Algorithm    Hash digest
SHA256       34d539610ba8757513f1ad30bf0420662be46fd727330bee2a6b166597611697
MD5          fbe81d2c2937fbb07550e0b136c7fd12
BLAKE2b-256  e955313053aba606611d8627c8abd414cc897a9f3c39d829e6cc531dfe276665

Provenance

The following attestation bundles were made for scitex_web-0.1.4.tar.gz:

Publisher: publish-pypi.yml on ywatanabe1989/scitex-web

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file scitex_web-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: scitex_web-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 28.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scitex_web-0.1.4-py3-none-any.whl
Algorithm    Hash digest
SHA256       138c1bc0af89566b26d8f1e7d131fb9e9b9cf56af0849fc4543f0e780235eff5
MD5          6609a270933c0c476a359b71d13c72af
BLAKE2b-256  63acc49afd1fc37327878aecfd816278cd207e4f45a8c8f264edf6209f7d715c

Provenance

The following attestation bundles were made for scitex_web-0.1.4-py3-none-any.whl:

Publisher: publish-pypi.yml on ywatanabe1989/scitex-web
