
Lightweight Web Scraping Automation for Everyone

Project description

AutoScrap


Installation

Install from PyPI with:

pip install autoscrap

For development, install the dependencies listed in requirements.txt from the project's source directory:

pip install -r requirements.txt

Features

  • Simple functions for web scraping:
    • get_text(url, tag): fetches a page and collects the text inside every element with the given HTML tag.
    • extract_table(url): extracts the first HTML table on a page as a list of lists, or as a pandas DataFrame when called with as_dataframe=True.
  • No need to learn BeautifulSoup or Selenium.
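For readers curious about what get_text handles for them, here is a minimal sketch of the same idea using only Python's standard html.parser module. It is not autoscrap's implementation: text_by_tag is a hypothetical name, it parses an HTML string instead of fetching a URL, and it assumes elements are explicitly closed.

```python
from html.parser import HTMLParser


class _TagTextCollector(HTMLParser):
    """Collects the text inside every element whose tag name matches `tag`."""

    def __init__(self, tag: str) -> None:
        super().__init__()          # convert_charrefs=True: entities arrive decoded
        self.tag = tag.lower()      # HTMLParser reports tag names in lowercase
        self.depth = 0              # > 0 while inside a matching element
        self.current = []           # text fragments of the element being read
        self.results = []           # one string per matching element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Outermost matching element closed: store its trimmed text.
                self.results.append("".join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        if self.depth > 0:
            self.current.append(data)


def text_by_tag(html: str, tag: str) -> list[str]:
    """Hypothetical offline stand-in for get_text: parses a string, not a URL."""
    parser = _TagTextCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.results
```

Text inside child elements (such as <b> within <p>) is included in the parent's string, which is usually what "all text within a tag" means.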

Usage (Python)

from autoscrap.core import get_text, extract_table

# Get all text inside <p> tags
paragraphs = get_text('https://example.com', 'p')
print(paragraphs)

# Extract the first table as a list of lists
rows = extract_table('https://example.com/table')
print(rows)

# Extract as pandas DataFrame (requires pandas)
df = extract_table('https://example.com/table', as_dataframe=True)
print(df)
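A comparable sketch for extract_table, again built on the standard library's html.parser and working on an HTML string instead of a URL. The helper first_table_rows is hypothetical: it only mirrors the documented list-of-lists result, treats <th> and <td> cells alike, and ignores tables nested inside cells. How autoscrap itself handles headers, colspan, or empty cells is not documented here.

```python
from html.parser import HTMLParser


class _FirstTableParser(HTMLParser):
    """Collects cell text from the first <table> element as a list of rows."""

    def __init__(self) -> None:
        super().__init__()
        self.table_depth = 0  # how many <table> elements are currently open
        self.done = False     # set once the first table has been closed
        self.rows = []        # list of rows, each a list of cell strings
        self.cell = None      # text fragments of the open <td>/<th>, or None

    def _flush_cell(self) -> None:
        # Finish the open cell (if any) and append its trimmed text to the last row.
        if self.cell is not None and self.rows:
            self.rows[-1].append("".join(self.cell).strip())
        self.cell = None

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "table":
            self.table_depth += 1
        elif self.table_depth == 1:  # skip tags belonging to nested tables
            if tag == "tr":
                self._flush_cell()
                self.rows.append([])
            elif tag in ("td", "th"):
                self._flush_cell()  # tolerate an unclosed previous cell
                self.cell = []

    def handle_endtag(self, tag):
        if self.done:
            return
        if tag == "table" and self.table_depth > 0:
            if self.table_depth == 1:
                self._flush_cell()
                self.done = True  # ignore everything after the first table
            self.table_depth -= 1
        elif self.table_depth == 1 and tag in ("td", "th", "tr"):
            self._flush_cell()

    def handle_data(self, data):
        if self.table_depth == 1 and self.cell is not None:
            self.cell.append(data)


def first_table_rows(html: str) -> list[list[str]]:
    """Hypothetical offline stand-in for extract_table: parses a string, not a URL."""
    parser = _FirstTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Turning such rows into a DataFrame is the kind of step the as_dataframe=True option covers, which is why pandas is only needed for that mode.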

Usage (Command Line)

# Extract all <p> tag text from a page
python -m autoscrap.cli get_text https://example.com p

# Extract the first table as plain text
python -m autoscrap.cli extract_table https://example.com

# Extract the first table as a pandas DataFrame (requires pandas)
python -m autoscrap.cli extract_table https://example.com --as-dataframe
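The command line takes the function name as a subcommand, followed by that function's arguments. The argparse parser below is a hypothetical sketch of that layout, built only from the commands shown above; it is not the contents of autoscrap.cli, and the real module may define other options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI commands."""
    parser = argparse.ArgumentParser(prog="autoscrap-sketch")
    sub = parser.add_subparsers(dest="command", required=True)

    # get_text URL TAG: print the text of every matching element.
    p_text = sub.add_parser("get_text", help="text of every element with TAG")
    p_text.add_argument("url")
    p_text.add_argument("tag")

    # extract_table URL [--as-dataframe]: print the first table.
    p_table = sub.add_parser("extract_table", help="first HTML table on the page")
    p_table.add_argument("url")
    p_table.add_argument("--as-dataframe", action="store_true",
                         help="return a pandas DataFrame (requires pandas)")
    return parser
```

A main function would then dispatch on args.command to the matching library call; argparse stores --as-dataframe as args.as_dataframe.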

Running Tests

pytest
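Scraper tests are more reliable when they do not depend on live websites. One approach, sketched here as an assumption rather than the project's own test suite, is to serve a fixed HTML page from a local HTTP server inside the test and point the scraper at that URL. FIXTURE_HTML, local_page, and the test name are hypothetical; pytest collects test_* functions like this one without extra configuration.

```python
import threading
import urllib.request
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, HTTPServer

FIXTURE_HTML = b"<html><body><p>first</p><p>second</p></body></html>"


class _FixtureHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer every GET with the same fixed page.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(FIXTURE_HTML)))
        self.end_headers()
        self.wfile.write(FIXTURE_HTML)

    def log_message(self, format, *args):
        pass  # keep test output quiet


@contextmanager
def local_page():
    """Serve FIXTURE_HTML on 127.0.0.1 at an OS-chosen free port; yield its URL."""
    server = HTTPServer(("127.0.0.1", 0), _FixtureHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        host, port = server.server_address
        yield f"http://{host}:{port}/"
    finally:
        server.shutdown()
        server.server_close()


def test_fixture_page_is_served():
    # Bypass any proxy settings so the request stays on the loopback interface.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with local_page() as url:
        with opener.open(url, timeout=5) as resp:
            body = resp.read()
    assert body == FIXTURE_HTML
    # With autoscrap installed, get_text(url, "p") could be called inside the
    # `with` block instead; its exact return type is not documented here.
```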


Download files

Download the file for your platform.

Source Distribution

autoscrap-0.1.0.tar.gz (3.5 kB)

Uploaded Source

Built Distribution


autoscrap-0.1.0-py3-none-any.whl (3.5 kB)

Uploaded Python 3
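Wheel file names encode compatibility tags in a fixed order: distribution name, version, an optional build tag, then the Python, ABI, and platform tags, as defined in the wheel binary distribution format specification. The small parser below is an illustrative sketch with hypothetical names; for real tooling, the packaging library's packaging.utils.parse_wheel_filename also handles name normalization and validation.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel file name into its components (illustrative sketch)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    # Dashes inside names and versions are escaped to underscores in wheel
    # file names, so "-" only separates the fields.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")
    return WheelTags(name, version, build, py, abi, plat)
```

For autoscrap-0.1.0-py3-none-any.whl this gives the Python tag py3, ABI tag none, and platform tag any: a pure-Python wheel that installs on any platform running Python 3, which is why it is listed under Python 3.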

File details

Details for the file autoscrap-0.1.0.tar.gz.

File metadata

  • Download URL: autoscrap-0.1.0.tar.gz
  • Size: 3.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for autoscrap-0.1.0.tar.gz
Algorithm Hash digest
SHA256 61e024fe57f4723bc059be5fe2379d8469f87d7b5470f8dea4eaa1fefee06fda
MD5 937e7ffabae153d43f7f2e25e8d72607
BLAKE2b-256 574d21f38509f6690470b6af1fc5731bb2a0c997b65b5016ead3cb0e3d7b0283
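To check that a downloaded file matches the SHA256 digest listed above, compute the digest locally and compare. A minimal sketch with hashlib; sha256_of and verify_sha256 are hypothetical names, and the file is read in chunks so large downloads need not fit in memory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the lowercase hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and stray whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For an intact copy of the source distribution, verify_sha256(Path("autoscrap-0.1.0.tar.gz"), "61e024fe57f4723bc059be5fe2379d8469f87d7b5470f8dea4eaa1fefee06fda") should return True. pip can also enforce digests at install time through its hash-checking mode (--hash=sha256:... options in a requirements file).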


File details

Details for the file autoscrap-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: autoscrap-0.1.0-py3-none-any.whl
  • Size: 3.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for autoscrap-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f212577e56cfb7977b98d241f6a2994aeaf5f911342fcf2cb2808bc1c1e9f2c4
MD5 20d183da6c3d98346e57cddd50737a3c
BLAKE2b-256 04dfbf6f8f6d9e1bd63b5eeb617296147e3a59e126dc93e60ac79fc79d718ef6

