keibascraper is a simple scraping library for netkeiba.com

Project description

Keiba Scraper

keibascraper is a Python library for parsing data from netkeiba.com, a prominent Japanese horse racing website. It lets you programmatically extract detailed information about races, entries, results, odds, and horses. Please note that heavy use, such as loading many races in bulk, can place a significant load on netkeiba.com.

Features

  • Flexible Data Loading: Supports loading of various data types such as race entries, results, odds, and horse information.
  • Configurable Parsing: Utilizes JSON configuration files to define parsing rules, making it easy to adapt to changes in the source website.
  • Error Handling: Raises exceptions (ValueError, RuntimeError) for unsupported data types and for network or parsing failures.
  • Caching: Implements caching mechanisms to improve performance and reduce redundant network requests.
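To illustrate the idea behind configuration-driven parsing, here is a minimal stdlib-only sketch. The config schema below (a list of columns, each with a `col_name` and a `var_type`) is a hypothetical example and is not necessarily the format keibascraper's own JSON files use; the sketch only shows how a declarative config can turn raw scraped strings into typed records.

```python
import json

# Hypothetical parsing config: each column has a name and a target type.
# keibascraper's real configuration files may use a different schema.
CONFIG_JSON = """
{
  "columns": [
    {"col_name": "bracket", "var_type": "integer"},
    {"col_name": "horse_number", "var_type": "integer"},
    {"col_name": "horse_name", "var_type": "text"}
  ]
}
"""

# Map the config's type names to Python converter functions.
CONVERTERS = {"integer": int, "real": float, "text": str}


def parse_row(raw_values, config):
    """Convert a list of raw cell strings into a typed dict, column by column."""
    columns = config["columns"]
    if len(raw_values) != len(columns):
        raise ValueError(f"expected {len(columns)} values, got {len(raw_values)}")
    record = {}
    for column, raw in zip(columns, raw_values):
        text = raw.strip()
        convert = CONVERTERS[column["var_type"]]
        # Treat empty cells as missing instead of failing the conversion.
        record[column["col_name"]] = convert(text) if text else None
    return record


config = json.loads(CONFIG_JSON)
print(parse_row(["7", " 13 ", "ゴールドシップ"], config))
# {'bracket': 7, 'horse_number': 13, 'horse_name': 'ゴールドシップ'}
```

With this approach, adapting to a layout change on the site means editing the JSON rather than the Python code.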

Installation

keibascraper is available on PyPI and can be installed using pip:

$ python -m pip install keibascraper

Supported Python Versions: keibascraper officially supports Python 3.8 and above.

Dependencies

  • requests: For handling HTTP requests.
  • BeautifulSoup4: For parsing HTML content.
  • jq: For parsing JSON content using jq expressions.

Usage

To use keibascraper, import the library and use the load function to fetch and parse data from netkeiba.com. The load function requires two parameters: the data type and the entity ID.
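Race IDs such as `201206050810` appear to follow netkeiba's convention of concatenating the year, a two-digit venue code, the meeting number (回), the day within that meeting (日), and the race number. The hypothetical helper `parse_race_id` below decodes an ID under that assumption; keibascraper does not necessarily provide such a function, and the layout should be checked against netkeiba.com before you rely on it.

```python
from typing import NamedTuple


class RaceIdParts(NamedTuple):
    year: int
    venue_code: str  # two-digit course code, kept as a string (e.g. "06")
    meeting: int     # 回: which meeting of the year at this course (assumed)
    day: int         # 日: day within the meeting (assumed)
    race_number: int


def parse_race_id(race_id: str) -> RaceIdParts:
    """Split a 12-digit race ID, assuming a YYYY+PP+KK+DD+RR layout."""
    if len(race_id) != 12 or not race_id.isdigit():
        raise ValueError(f"not a 12-digit race ID: {race_id!r}")
    return RaceIdParts(
        year=int(race_id[0:4]),
        venue_code=race_id[4:6],
        meeting=int(race_id[6:8]),
        day=int(race_id[8:10]),
        race_number=int(race_id[10:12]),
    )


print(parse_race_id("201206050810"))
# RaceIdParts(year=2012, venue_code='06', meeting=5, day=8, race_number=10)
```

The decoded race number agrees with the `race_number` of 10 that `load` returns for this ID in the examples.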

Loading Entry Data (出走データ)

>>> import keibascraper
>>> race, entry = keibascraper.load("entry", "201206050810")
>>> print(race)
[{'race_id': '201206050810', 'race_number': 10, 'race_name': '有馬記念', ... }]
>>> print(entry)
[{'bracket': 7, 'horse_number': 13, 'horse_name': 'ゴールドシップ', ...}, {...}, ...]

Loading Result Data (結果データ)

>>> import keibascraper
>>> race, entry = keibascraper.load("result", "201206050810")
>>> print(race)
[{'race_id': '201206050810', 'race_number': 10, 'race_name': '有馬記念', ... }]
>>> print(entry)
[{'rank': 1, 'horse_name': 'ゴールドシップ', 'rap_time': 151.9,...}, {...}, ...]
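In the result example, `rap_time` is printed as a plain number (151.9). Assuming it is a finishing time in seconds, a small helper like the hypothetical `format_race_time` below converts it to the minutes:seconds notation common in race results (2:31.9). This is a presentation sketch, not part of keibascraper.

```python
def format_race_time(seconds: float) -> str:
    """Format a time in seconds as M:SS.s (e.g. 151.9 -> '2:31.9')."""
    # Work in whole tenths of a second so float noise cannot produce '1:60.0'.
    tenths = round(seconds * 10)
    minutes, rem = divmod(tenths, 600)
    secs, tenth = divmod(rem, 10)
    return f"{minutes}:{secs:02d}.{tenth}"


# Sample row shaped like the result output above.
results = [{"rank": 1, "horse_name": "ゴールドシップ", "rap_time": 151.9}]
for row in results:
    print(row["rank"], row["horse_name"], format_race_time(row["rap_time"]))
```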

Loading Odds Data (オッズデータ)

>>> import keibascraper
>>> odds = keibascraper.load("odds", "201206050810")
>>> print(odds)
[{'horse_number': 13, 'win': 2.7, 'show_min': 1.3, 'show_max': 1.5, ...}, {...}, ...]
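Because both the entry list and the odds list carry a `horse_number` field, they can be joined in plain Python. The sketch below uses small sample lists shaped like the outputs above, with values taken from those examples; `merge_odds` is a hypothetical helper, not a keibascraper function.

```python
def merge_odds(entries, odds):
    """Return copies of the entry dicts with matching odds fields added.

    Entries whose horse_number has no odds row are returned unchanged.
    """
    odds_by_number = {row["horse_number"]: row for row in odds}
    merged = []
    for entry in entries:
        combined = dict(entry)  # copy so the caller's data is not mutated
        combined.update(odds_by_number.get(entry["horse_number"], {}))
        merged.append(combined)
    return merged


entries = [{"bracket": 7, "horse_number": 13, "horse_name": "ゴールドシップ"}]
odds = [{"horse_number": 13, "win": 2.7, "show_min": 1.3, "show_max": 1.5}]
print(merge_odds(entries, odds))
```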

Loading Horse Data (血統データ/出走履歴データ)

>>> import keibascraper
>>> horse, result = keibascraper.load("horse", "2009102739")
>>> print(horse)
[{'horse_id': '2009102739', 'father_name': 'ステイゴールド', ... }]
>>> print(result)
[{'race_date': '20151227', 'race_name': '有馬記念', 'rank': 8, ...}, {...}, ...]
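The `race_date` values in the horse history are `YYYYMMDD` strings. A sketch like the one below parses them with `datetime.strptime`, sorts the history chronologically, and counts wins by checking `rank == 1`. The helper name `summarize_history` and the second sample row are hypothetical.

```python
from datetime import date, datetime


def summarize_history(results):
    """Sort race history by date and count first-place finishes."""
    def race_day(row) -> date:
        # race_date is assumed to be a 'YYYYMMDD' string, as in the example above.
        return datetime.strptime(row["race_date"], "%Y%m%d").date()

    history = sorted(results, key=race_day)
    # .get() tolerates rows without a rank (e.g. a horse that did not finish).
    wins = sum(1 for row in history if row.get("rank") == 1)
    return history, wins


results = [
    {"race_date": "20151227", "race_name": "有馬記念", "rank": 8},
    # Made-up row for illustration only.
    {"race_date": "20150101", "race_name": "Example Race", "rank": 1},
]
history, wins = summarize_history(results)
print([row["race_date"] for row in history], wins)
```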

Bulk Data Loading

To load multiple races in bulk, you can use the race_list function to retrieve a list of race IDs for a specific year and month.

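import time

import keibascraper

# Get list of race IDs for July 2022
race_ids = keibascraper.race_list(2022, 7)

# Loop through race IDs and load entry data
for race_id in race_ids:
    race_info, entry_list = keibascraper.load("entry", race_id)
    # Process the data as needed; here we just print a short summary
    print(race_id, len(entry_list))
    # Pause between requests to limit load on netkeiba.com (1 second is an arbitrary choice)
    time.sleep(1)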
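Loading every race in a month issues at least one request per race, so it is worth pacing the loop and tolerating individual failures. The sketch below is a hypothetical wrapper, `load_many`, that takes the loader as a parameter (pass `keibascraper.load` in real use), waits a fixed delay between calls, and records the IDs whose loading raised `RuntimeError`, the exception the API reference lists for failed loading or parsing. The one-second default delay is an assumption, not a documented rate limit.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


def load_many(
    race_ids: List[str],
    loader: Callable[[str, str], Any],
    data_type: str = "entry",
    delay: float = 1.0,
) -> Tuple[Dict[str, Any], List[str]]:
    """Load each race ID in turn, pausing between requests.

    Returns (loaded, failed): a dict mapping race_id to the loader's result,
    and the list of IDs whose loading raised RuntimeError.
    """
    loaded: Dict[str, Any] = {}
    failed: List[str] = []
    for i, race_id in enumerate(race_ids):
        if i > 0:
            time.sleep(delay)  # space out requests to netkeiba.com
        try:
            loaded[race_id] = loader(data_type, race_id)
        except RuntimeError:
            failed.append(race_id)
    return loaded, failed


# Real use (requires network access):
# import keibascraper
# race_ids = keibascraper.race_list(2022, 7)
# loaded, failed = load_many(race_ids, keibascraper.load)
```

Collecting failures instead of stopping lets you rerun only the failed IDs later.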

Generating CREATE TABLE Queries for SQLite

The create_table_sql function generates an SQL query string for creating a table in an SQLite database. The table structure is defined dynamically from the configuration file for the given data_type, such as race, entry, or result. The generated statement uses CREATE TABLE IF NOT EXISTS, so the table is created only if it does not already exist, and the first column is designated as the primary key.

>>> import keibascraper
>>> query = keibascraper.create_table_sql("entry")
>>> print(query)
CREATE TABLE IF NOT EXISTS entry (bracket text, ... weight_diff integer);
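Once the table exists, the dicts returned by `load` can be written with parameterized `INSERT` statements. The sketch below uses the standard `sqlite3` module; `insert_rows` is a hypothetical helper, and the short `CREATE TABLE` statement in the usage is a stand-in for the output of `create_table_sql`, whose full column list is elided above. It assumes every row has the same keys and that those keys match the table's column names.

```python
import sqlite3
from typing import Dict, Iterable, List


def insert_rows(conn: sqlite3.Connection, table: str, rows: Iterable[Dict]) -> int:
    """Insert dict rows into `table`, using the first row's keys as columns.

    Identifiers are quoted but not validated, so `table` and the dict keys
    must come from trusted code (e.g. keibascraper output), not user input.
    """
    rows = list(rows)
    if not rows:
        return 0
    columns: List[str] = list(rows[0].keys())
    column_sql = ", ".join(f'"{name}"' for name in columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f'INSERT INTO "{table}" ({column_sql}) VALUES ({placeholders})'
    # Values are bound as parameters, so sqlite3 handles escaping.
    conn.executemany(sql, [tuple(row[name] for name in columns) for row in rows])
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
# Stand-in for keibascraper.create_table_sql("entry"); the real table has more columns.
conn.execute(
    "CREATE TABLE IF NOT EXISTS entry (bracket text, horse_number integer, horse_name text)"
)
count = insert_rows(
    conn, "entry", [{"bracket": 7, "horse_number": 13, "horse_name": "ゴールドシップ"}]
)
print(count, conn.execute("SELECT horse_name FROM entry").fetchall())
```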

API Reference

load Function

keibascraper.load(data_type, entity_id)
  • Description: Loads data from netkeiba.com based on the specified data type and entity ID.
  • Parameters:
    • data_type (str): Type of data to load. Supported types are 'entry', 'result', 'odds', and 'horse'.
    • entity_id (str): Identifier for the data entity (e.g., race ID, horse ID).
  • Returns:
    • For 'entry' and 'result': Returns two lists, [{race}] and [{entry1}, {entry2}, ...].
    • For 'odds': Returns a single list, [{odds1}, {odds2}, ...].
    • For 'horse': Returns two lists, [{horse}] and [{result1}, {result2}, ...].
  • Raises:
    • ValueError: If an unsupported data type is provided.
    • RuntimeError: If data loading or parsing fails.
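The two documented exceptions call for different handling: a `ValueError` means the call itself is wrong (an unsupported `data_type`), while a `RuntimeError` may come from a transient network problem. The hypothetical `load_with_retry` below retries only on `RuntimeError`, with exponential backoff, and lets `ValueError` propagate immediately. The retry count and delays are arbitrary assumptions, and the loader is passed in so the sketch does not depend on network access.

```python
import time
from typing import Any, Callable


def load_with_retry(
    loader: Callable[[str, str], Any],
    data_type: str,
    entity_id: str,
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call loader(data_type, entity_id), retrying on RuntimeError.

    ValueError is not caught, so it propagates on the first call.
    After `attempts` failed tries, the last RuntimeError is re-raised.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return loader(data_type, entity_id)
        except RuntimeError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, ... with the defaults


# Real use (requires network access):
# import keibascraper
# race, entry = load_with_retry(keibascraper.load, "entry", "201206050810")
```

Since parsing failures also surface as `RuntimeError`, a retry will not fix every failure; it only helps when the cause is temporary.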

race_list Function

keibascraper.race_list(year, month)
  • Description: Retrieves a list of race IDs for the specified year and month.
  • Parameters:
    • year (int): The target year.
    • month (int): The target month.
  • Returns:
    • A list of race IDs (list).
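`race_list` covers a single month, so collecting IDs for a longer period means calling it once per month. The hypothetical helper `months_between` below generates the (year, month) pairs for an inclusive range, handling the December-to-January rollover; each pair can then be passed to `race_list`.

```python
from typing import Iterator, Tuple


def months_between(start_year: int, start_month: int,
                   end_year: int, end_month: int) -> Iterator[Tuple[int, int]]:
    """Yield (year, month) pairs from start to end, inclusive."""
    # Convert each date to a running month count so the range is a plain integer loop.
    first = start_year * 12 + (start_month - 1)
    last = end_year * 12 + (end_month - 1)
    for index in range(first, last + 1):
        year, month_zero_based = divmod(index, 12)
        yield year, month_zero_based + 1


print(list(months_between(2022, 11, 2023, 2)))
# [(2022, 11), (2022, 12), (2023, 1), (2023, 2)]

# Real use (requires network access):
# import keibascraper
# all_ids = [race_id for year, month in months_between(2022, 7, 2022, 12)
#            for race_id in keibascraper.race_list(year, month)]
```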

Contributing

Contributions are welcome! If you have suggestions or find bugs, please open an issue or submit a pull request on the GitHub repository.

When contributing, please follow these guidelines:

  • Coding Standards: Follow PEP 8 style guidelines.
  • Testing: Ensure that your code passes existing tests and add new tests for your changes.
  • Documentation: Update documentation and docstrings as needed.

License

This project is licensed under the terms of the Apache-2.0 license. See the LICENSE file for details.

Disclaimer: This library is intended for personal use and educational purposes. Scraping data from websites may violate their terms of service. Please ensure that you comply with netkeiba.com's terms and conditions when using this library.

Download files

Source Distribution

keibascraper-3.1.1.tar.gz (23.6 kB)

Uploaded Source

Built Distribution

keibascraper-3.1.1-py3-none-any.whl (29.7 kB)

Uploaded Python 3

File details

Details for the file keibascraper-3.1.1.tar.gz.

File metadata

  • Download URL: keibascraper-3.1.1.tar.gz
  • Upload date:
  • Size: 23.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for keibascraper-3.1.1.tar.gz
Algorithm Hash digest
SHA256 a9461865d1c5b85cf3a0d5235383544dd2482d6a37e9425c618d4840dfc1ea10
MD5 f7a6fa32f7055b57e6a31e5859a7cac3
BLAKE2b-256 99a76f44115c60834b2894a7d6e46191423c0282700a2f657cd5afda063cff72

File details

Details for the file keibascraper-3.1.1-py3-none-any.whl.

File metadata

  • Download URL: keibascraper-3.1.1-py3-none-any.whl
  • Upload date:
  • Size: 29.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.7

File hashes

Hashes for keibascraper-3.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 83a14a1b1045a1e0e1bff18711719eab4d1a3a90c4e372582bc8ff96ca3895d9
MD5 ecdd35f3763a012fe59f03d5ec86f834
BLAKE2b-256 1ec9cc9baaadd0f1872ad23e5c13d3ef420f29343d09aa0be08beb716bc8e8bc
