xpath/css based scraper with pagination

Hodor

A simple HTML scraper driven by XPath or CSS selectors.

Install

pip install hodorlive

Usage

As a Python package

WARNING: By default this package does not verify SSL certificates. Set the ssl_verify argument to True to enable verification (see Arguments).

Sample code

from hodor import Hodor
from dateutil.parser import parse


def date_convert(data):
    return parse(data)

url = 'http://www.nasdaq.com/markets/stocks/symbol-change-history.aspx'

CONFIG = {
    'old_symbol': {
        'css': '#SymbolChangeList_table tr td:nth-child(1)',
        'many': True
    },
    'new_symbol': {
        'css': '#SymbolChangeList_table tr td:nth-child(2)',
        'many': True
    },
    'effective_date': {
        'css': '#SymbolChangeList_table tr td:nth-child(3)',
        'many': True,
        'transform': date_convert
    },
    '_groups': {
        'data': '__all__',
        'ticker_changes': ['old_symbol', 'new_symbol']
    },
    '_paginate_by': {
        'xpath': '//*[@id="two_column_main_content_lb_NextPage"]/@href',
        'many': False
    }
}

# Limit the crawl to 5 pages.
h = Hodor(url=url, config=CONFIG, pagination_max_limit=5)

# The scraped results are available on the data attribute.
h.data

Sample output

{'data': [{'effective_date': datetime.datetime(2016, 11, 1, 0, 0),
           'new_symbol': 'ARNC',
           'old_symbol': 'AA'},
          {'effective_date': datetime.datetime(2016, 11, 1, 0, 0),
           'new_symbol': 'ARNC$',
           'old_symbol': 'AA$'},
          {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),
           'new_symbol': 'MALN8',
           'old_symbol': 'AHUSDN2018'},
          {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),
           'new_symbol': 'MALN9',
           'old_symbol': 'AHUSDN2019'},
          {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),
           'new_symbol': 'MALQ6',
           'old_symbol': 'AHUSDQ2016'},
          {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),
           'new_symbol': 'MALQ7',
           'old_symbol': 'AHUSDQ2017'},
          {'effective_date': datetime.datetime(2016, 8, 16, 0, 0),
           'new_symbol': 'MALQ8',
           'old_symbol': 'AHUSDQ2018'}]}
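
The shape of this output follows from the _groups entry: each many-valued rule yields a list, and a group combines those lists position by position into one record per table row. The snippet below is a minimal sketch of that idea and not Hodor's actual implementation. The function name group_fields and the hard-coded input are hypothetical, and '__all__' is assumed to mean "every extracted field".

```python
def group_fields(extracted, groups):
    """Zip parallel per-field lists into lists of per-row dicts.

    Hypothetical helper for illustration only; not part of Hodor's API.
    """
    result = {}
    for group_name, fields in groups.items():
        # Assumption: '__all__' selects every extracted field.
        if fields == '__all__':
            fields = sorted(extracted)
        columns = [extracted[name] for name in fields]
        # zip stops at the shortest column, so ragged columns are truncated.
        result[group_name] = [dict(zip(fields, row)) for row in zip(*columns)]
    return result


# Hard-coded stand-in for the values the CSS rules would extract.
extracted = {
    'old_symbol': ['AA', 'AA$'],
    'new_symbol': ['ARNC', 'ARNC$'],
}
groups = {'data': '__all__', 'ticker_changes': ['old_symbol', 'new_symbol']}
grouped = group_fields(extracted, groups)
```

Here grouped['data'] holds one dict per row, each carrying both symbols, which mirrors the structure of the sample output.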

Arguments

- ua: User-Agent string
- proxies: proxy settings (see requesocks)
- auth
- crawl_delay: delay in seconds between paginated requests (default: 3)
- pagination_max_limit: maximum number of pages to crawl (default: 100)
- ssl_verify: verify SSL certificates (default: False)
- robots: if set, respects robots.txt (default: True)
- reppy_capacity: LRU capacity of the robots cache (default: 100)
- trim_values: if set, strips leading and trailing whitespace from output values (default: True)
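
To make the interplay of crawl_delay and pagination_max_limit concrete, here is a rough sketch of a pagination loop. It is not Hodor's code: the crawl function, the fetch and next_link callables, and the in-memory site are all hypothetical, and it assumes the limit counts the first page as well.

```python
import time


def crawl(start_url, fetch, next_link, crawl_delay=3, pagination_max_limit=100,
          sleep=time.sleep):
    """Follow 'next page' links, pausing between requests.

    Hypothetical sketch; `fetch` and `next_link` are caller-supplied
    callables standing in for the HTTP request and the _paginate_by rule.
    """
    pages = []
    url = start_url
    while url and len(pages) < pagination_max_limit:
        page = fetch(url)
        pages.append(page)
        url = next_link(page)  # None when there is no further page
        if url and len(pages) < pagination_max_limit:
            sleep(crawl_delay)  # wait only if another request will follow
    return pages


# Usage with an in-memory "site" so the example runs without network access.
site = {'/p1': {'body': 'one', 'next': '/p2'},
        '/p2': {'body': 'two', 'next': '/p3'},
        '/p3': {'body': 'three', 'next': None}}
delays = []  # records the requested delays instead of actually sleeping
pages = crawl('/p1', fetch=site.__getitem__, next_link=lambda p: p['next'],
              pagination_max_limit=2, sleep=delays.append)
```

With a limit of 2, the loop stops after the second page even though a third is linked, and it pauses only once, between the two requests.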

Config parameters:

By default, every key in the config is a parsing rule.
- Each rule is either an XPath or a CSS selector.
- Each rule extracts many values by default, unless 'many' is explicitly set to False.
- Each rule can transform its result with a function, if one is provided.
Extra parameters are grouping (_groups) and pagination (_paginate_by); the pagination entry also uses the rule format.
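
The rule options above can be illustrated with a small post-processing sketch. This is an assumption about how the options might combine, not Hodor's implementation: apply_rule is a hypothetical helper, raw_values stands in for whatever the selector matched, and both the order (trim first, then transform) and the None result for an empty single-value match are guesses.

```python
def apply_rule(raw_values, rule, trim_values=True):
    """Post-process the strings matched by one rule.

    Hypothetical helper; `raw_values` stands in for whatever the
    XPath or CSS selector matched.
    """
    values = list(raw_values)
    if trim_values:
        values = [v.strip() for v in values]
    transform = rule.get('transform')
    if transform is not None:
        values = [transform(v) for v in values]
    # 'many' defaults to True; with many=False only the first match is kept.
    if rule.get('many', True):
        return values
    return values[0] if values else None


symbols = apply_rule([' AA ', 'AA$\n'], {'css': 'td:nth-child(1)'})
first_upper = apply_rule([' next '], {'xpath': '//a/text()', 'many': False,
                                      'transform': str.upper})
```

The first call returns a list of trimmed strings; the second returns a single transformed value because 'many' is False.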

Building & Publishing

Prerequisites

Install uv.
Review the uvx execution model for running tools without global installs.
Hatch documentation: https://hatch.pypa.io/latest/.

Build workflow

Run the release helper to build and publish wheels and source archives via Hatch:

./upload.sh

The script shells out to uvx hatch build followed by uvx hatch publish so that Hatch is executed in an ephemeral environment.

Publishing requirements

Configure credentials in ~/.pypirc as described in the PyPI configuration specification.

Example configuration:

[distutils]
index-servers =
  pypi
  testpypi

[pypi]
repository = https://upload.pypi.org/legacy/
username = __token__
password = <pypi-token>

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
password = <testpypi-token>

Replace token placeholders with secrets from the team password manager and avoid committing the file to version control.
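
Before publishing, it can help to confirm that every index server listed in ~/.pypirc has token credentials. The following is a minimal sketch using the standard library's configparser; check_pypirc is a hypothetical helper that is not part of Hatch, uv, or this project, and it only checks that the fields are present, not that the token is valid.

```python
import configparser


def check_pypirc(text):
    """Return the index servers in a .pypirc that lack token credentials.

    Hypothetical helper for illustration only.
    """
    # interpolation=None so that '%' inside a token is not treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    servers = parser.get('distutils', 'index-servers', fallback='').split()
    problems = []
    for name in servers:
        if not parser.has_section(name):
            problems.append(name)  # listed but never configured
        elif parser.get(name, 'username', fallback='') != '__token__':
            problems.append(name)  # not using token authentication
        elif not parser.get(name, 'password', fallback=''):
            problems.append(name)  # token missing
    return problems


# Sample file contents in which testpypi has no password entry.
sample = """\
[distutils]
index-servers =
  pypi
  testpypi

[pypi]
repository = https://upload.pypi.org/legacy/
username = __token__
password = <pypi-token>

[testpypi]
repository = https://test.pypi.org/legacy/
username = __token__
"""
problems = check_pypirc(sample)
```

To check a real file, read it with pathlib.Path.home() / '.pypirc' and pass its text to the function.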

Release history

Version	Release date
1.2.17 (this version)	Nov 4, 2025
1.2.16	Nov 4, 2025
1.2.15	Nov 3, 2025
1.2.14	Nov 3, 2025
1.2.12	Jun 25, 2024
1.2.11	Nov 6, 2023
1.2.10	Nov 6, 2023
1.2.9	Nov 4, 2023
1.2.8	Oct 11, 2021
1.2.7	Apr 17, 2018
1.2.6	Apr 17, 2018
1.2.5	Jun 14, 2017
1.2.4	Feb 20, 2017
1.2.3	Jan 30, 2017
1.2.2	Jan 14, 2017
1.2.1	Jan 9, 2017
1.2	Jan 9, 2017
1.1.1	Oct 18, 2016
1.1	Sep 22, 2016
1.0.5	Apr 17, 2018
1.0.4	Apr 17, 2018
1.0.1	Sep 7, 2016
1.0	Sep 7, 2016

Download files

Source Distribution

hodorlive-1.2.17.tar.gz (23.7 kB)

Uploaded Nov 4, 2025 Source

Built Distribution

hodorlive-1.2.17-py3-none-any.whl (5.8 kB)

Uploaded Nov 4, 2025 Python 3

File details

Details for the file hodorlive-1.2.17.tar.gz.

File metadata

Download URL: hodorlive-1.2.17.tar.gz
Upload date: Nov 4, 2025
Size: 23.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: python-httpx/0.28.1

File hashes

Hashes for hodorlive-1.2.17.tar.gz
Algorithm	Hash digest
SHA256	`54a26e7322b1b64b117038c58625dc34f2810929b11d955b32aaaab1a3651248`
MD5	`7c8f346ed5e579c328f70b61410b1d06`
BLAKE2b-256	`55e4f21907dc770c3784218b7fdf1e33575c50a68f7f0b379159cf2e65666cba`

File details

Details for the file hodorlive-1.2.17-py3-none-any.whl.

File metadata

Download URL: hodorlive-1.2.17-py3-none-any.whl
Upload date: Nov 4, 2025
Size: 5.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: python-httpx/0.28.1

File hashes

Hashes for hodorlive-1.2.17-py3-none-any.whl
Algorithm	Hash digest
SHA256	`da021b8d5f39401df9bc0f5a9d09458ffc7d6ca8ceb30639e62ccb18d7867059`
MD5	`7ee85475c61e27cb49cb4b9aea9e5295`
BLAKE2b-256	`988489926f95ceebbcfecb0da3834260b1124e82975ddb7dea7ca146652aa812`
