Converts a chess pgn file into a csv dataset containing game information and move information

pgn2data

License: GPL v3

This library converts chess PGN files into tabular CSV data sets.

A PGN file can contain one or more chess games. The library parses the PGN file and creates two CSV files:

  • Games file: contains high-level information about each game (e.g. date, site, event, score, players)

  • Moves file: contains the moves for each game (e.g. notation, squares, FEN position, whether a player is in check)

The two files can be joined on a GUID (the game_id column), which the process inserts into both files.
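
As an illustration of that mapping, the sketch below joins a games table to a moves table on the shared ID using only the standard library. The game_id column name comes from the column lists in this document; the sample rows and the helper name join_on_game_id are made up for the example, and the real output files contain many more columns.

```python
import csv
import io

# Hypothetical, heavily trimmed stand-ins for the two output files.
GAMES_CSV = """game_id,white,black,result
g-001,Tal,Bronstein,1-0
g-002,Carlsen,Nakamura,1/2-1/2
"""

MOVES_CSV = """game_id,move_no,notation
g-001,1,e4
g-001,2,c5
g-002,1,d4
"""


def join_on_game_id(games_text, moves_text):
    """Attach the game-level fields to every move row that shares its game_id."""
    games = {row["game_id"]: row for row in csv.DictReader(io.StringIO(games_text))}
    joined = []
    for move in csv.DictReader(io.StringIO(moves_text)):
        game = games.get(move["game_id"])
        if game is None:
            continue  # move row without a matching game; skipped in this sketch
        joined.append({**game, **move})
    return joined


rows = join_on_game_id(GAMES_CSV, MOVES_CSV)
for row in rows:
    print(row["white"], row["black"], row["move_no"], row["notation"])
```

In practice the get_combined_df() method described under Implementation returns the joined data directly; the manual join is only meant to show what the shared ID makes possible.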

Installation

The library requires Python 3.7 or later.

To install, run the following command in a terminal:

pip install pgn2data

Implementation

Here is a basic example of how to convert a PGN file:

from converter.pgn_data import PGNData

pgn_data = PGNData("tal_bronstein_1982.pgn")
pgn_data.export()

The following example groups multiple PGN files into one set of output files, named using the second argument ("output"):

from converter.pgn_data import PGNData

pgn_data = PGNData(["file1.pgn", "file2.pgn"], "output")
pgn_data.export()

The export function returns a result object that lets you quickly check the size and location of the files created:

from converter.pgn_data import PGNData

pgn_data = PGNData("tal_bronstein_1982.pgn")
result = pgn_data.export()
result.print_summary()

To check that the files were created before doing further processing, you can do the following:

from converter.pgn_data import PGNData

pgn_data = PGNData("tal_bronstein_1982.pgn")
result = pgn_data.export()
if result.is_complete:
    print("Files created!")
else:
    print("Files not created!")

The result object also provides methods to import the created files into pandas dataframes:

from converter.pgn_data import PGNData

pgn_data = PGNData("tal_bronstein_1982.pgn")
result = pgn_data.export()
if result.is_complete:

    # read the games file
    games_df = result.get_games_df()
    print(games_df.head())

    # read the moves file
    moves_df = result.get_moves_df()
    print(moves_df.head())

    # read both files joined together
    combined_df = result.get_combined_df()
    print(combined_df.head())

To output the game information only, you can do the following:

from converter.pgn_data import PGNData

pgn_data = PGNData("tal_bronstein_1982.pgn")
pgn_data.export(moves_required=False)

Examples

The 'samples' folder in this repository contains some examples of the output from the library.

A Kaggle project also used the library to convert all of Magnus Carlsen's online Bullet games into CSV format.

Columns

This is a full list of the columns in each output file:

Games File

Field                  Description
game_id                ID of the game, generated by the process
game_order             Order of the game in the PGN file
event                  Event
site                   Site
date_played            Date played
round                  Round
white                  White player
black                  Black player
result                 Result
white_elo              White player rating
white_rating_diff      White rating difference from Black
black_elo              Black player rating
black_rating_diff      Black rating difference from White
white_title            White player title
black_title            Black player title
winner                 Winning player
winner_elo             Winning player rating
loser                  Losing player
loser_elo              Losing player rating
winner_loser_elo_diff  Difference in rating between winner and loser
eco                    Opening (ECO code)
termination            How the game ended
time_control           Time control
utc_date               Date played (UTC)
utc_time               Time played (UTC)
variant                Game type
ply_count              Ply count
date_created           Date the data was extracted
file_name              PGN source file
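
As a small illustration of working with these columns, the sketch below totals each player's points from the white, black and result fields. It assumes that result holds the standard PGN result string ("1-0", "0-1" or "1/2-1/2"); the sample rows and the score_table helper are hypothetical and not part of the library.

```python
from collections import defaultdict

# Hypothetical rows shaped like a trimmed games file (only three columns kept).
games = [
    {"white": "Tal", "black": "Bronstein", "result": "1-0"},
    {"white": "Bronstein", "black": "Tal", "result": "1/2-1/2"},
    {"white": "Tal", "black": "Bronstein", "result": "0-1"},
]

# Points for (white, black), assuming `result` holds the standard PGN result tag.
POINTS = {"1-0": (1.0, 0.0), "0-1": (0.0, 1.0), "1/2-1/2": (0.5, 0.5)}


def score_table(rows):
    """Sum the points each player scored across all games."""
    totals = defaultdict(float)
    for row in rows:
        points = POINTS.get(row["result"])
        if points is None:
            continue  # unfinished or unknown result, e.g. "*"
        totals[row["white"]] += points[0]
        totals[row["black"]] += points[1]
    return dict(totals)


scores = score_table(games)
print(scores)
```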

Moves File

Field                           Description
game_id                         ID of the game; maps to the games file
move_no                         Order of the move within the game
move_no_pair                    Chess move number
player                          Player name
notation                        Standard notation of the move
move                            Piece location before and after the move
from_square                     Piece location before the move
to_square                       Piece location after the move
piece                           Initial of the piece name
color                           Piece color
fen                             FEN position
is_check                        Whether there is a check on the board
is_check_mate                   Whether there is a checkmate on the board
is_fifty_moves                  Whether the fifty-move rule has been reached
is_fivefold_repetition          Whether there is a fivefold repetition on the board
is_game_over                    Whether the game is over
is_insufficient_material        Whether the game is over due to insufficient mating material
white_count                     Count of white pieces
black_count                     Count of black pieces
white_{piece}_count             Count of the specified white piece
black_{piece}_count             Count of the specified black piece
captured_score_for_white        Total of black pieces captured
captured_score_for_black        Total of white pieces captured
fen_row{number}_{colour}_count  Number of pieces of the specified colour on this row of the board
fen_row{number}_{colour}_value  Total value of pieces of the specified colour on this row of the board
move_sequence                   Sequence of moves up to the current position
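
To make the count columns more concrete, the sketch below derives piece counts from a FEN string, in which uppercase letters are white pieces and lowercase letters are black pieces. This is only one way such values could be computed and is not necessarily how the library does it; the function names are hypothetical, and how the library numbers its fen_row columns is not assumed here (rows are simply listed in FEN order, rank 8 first).

```python
def piece_counts(fen):
    """Return (white_count, black_count) from the board part of a FEN string."""
    board = fen.split()[0]  # the first FEN field is the piece placement
    white = sum(1 for ch in board if ch.isalpha() and ch.isupper())
    black = sum(1 for ch in board if ch.isalpha() and ch.islower())
    return white, black


def row_counts(fen, colour):
    """Count pieces of one colour on each rank, in FEN order (rank 8 first)."""
    is_colour = str.isupper if colour == "white" else str.islower
    ranks = fen.split()[0].split("/")
    return [sum(1 for ch in rank if ch.isalpha() and is_colour(ch)) for rank in ranks]


START = "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1"
print(piece_counts(START))         # (16, 16)
print(row_counts(START, "white"))  # [0, 0, 0, 0, 0, 0, 8, 8]
```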

Contributions

Contributions are welcome. All modifications should come with appropriate tests demonstrating that an issue has been resolved or that new functionality works as intended. Pull requests without tests will not be merged.

The library can be tested by doing the following:

from testing.tests import run_all_tests
run_all_tests()

New tests should be added to the above method.

Acknowledgements

This project makes use of the python-chess library.
