Containerize osu! data into a MySQL container and optionally serve .osu files via NGINX

osu! Data on Docker

pip install osu-data; osu-data -m mania -v top_1000 -ymd YYYY_MM_DD

Docker must be installed and running on your machine.

Retrieves database dumps from https://data.ppy.sh/ and hosts them on a local MySQL server. Optionally, stores all ranked/loved .osu files in an NGINX service when the -f flag is given.

Get Started

IMPORTANT: MySQL data persists across runs. Recreate the MySQL service if you change which data is used.

  1. Install via pip: pip install osu-data

  2. Minimally, specify:

    • -m, --mode: The game mode to build the database with. osu, taiko, catch or mania
    • -v, --version: The database version. top_1000, top_10000 or random_10000
  3. Optionally, specify:

    • -ymd, --year_month_day: The year, month, day of the database in the format YYYY_MM_DD
    • -p, --port: The port to expose MySQL on. Default is 3308
    • -f, --files: Whether to download .osu files.
    • -np, --nginx-port: The port to expose the nginx service on. Default is 8080. Not used if -f is not specified.
    • --...: See the table below. These optional flags include or exclude additional data. Specifying a flag INVERTS its default value.
Option                            Default Value
--beatmap-difficulty-attribs      False
--beatmap-difficulty              False
--scores                          True
--beatmap-failtimes               False
--user-beatmap-playcount          False
--beatmaps                        True
--beatmapsets                     True
--user-stats                      True
--sample-users                    True
--counts                          True
--difficulty-attribs              True
--beatmap-performance-blacklist   True

These defaults are chosen to balance usefulness for analysis against import performance.
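The inversion rule can be easy to misread, so here is a minimal sketch that models it. This is an illustration only, not the tool's actual implementation: default_value and effective_value are hypothetical helper names, and the defaults are copied from the table above.

```shell
# Illustrative model only (hypothetical helpers, not part of osu-data).

# Print the documented default for an optional data flag.
default_value() {
  case "$1" in
    --beatmap-difficulty-attribs|--beatmap-difficulty|--beatmap-failtimes|--user-beatmap-playcount)
      echo False ;;
    --scores|--beatmaps|--beatmapsets|--user-stats|--sample-users|--counts|--difficulty-attribs|--beatmap-performance-blacklist)
      echo True ;;
    *)
      return 1 ;;  # not one of the documented data flags
  esac
}

# effective_value FLAG [ARGS...]
# Print the value FLAG ends up with: its default, inverted if FLAG appears in ARGS.
effective_value() {
  local flag value passed
  flag=$1
  shift
  value=$(default_value "$flag") || return 1
  for passed in "$@"; do
    if [ "$passed" = "$flag" ]; then
      if [ "$value" = True ]; then value=False; else value=True; fi
      break
    fi
  done
  echo "$value"
}
```

For instance, effective_value --beatmap-difficulty -m osu -v top_1000 --beatmap-difficulty prints True (the data is included), while effective_value --scores --scores prints False (the data is excluded).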

E.g.

osu-data \
  -m osu -v top_1000 -ymd 2023_08_01 -p 3308 -f \
  --beatmap-difficulty 
  • Download the top 1000 osu! standard beatmaps
  • from 1st August 2023
  • expose MySQL on port 3308
  • download .osu files
  • include beatmap difficulty data
  4. Connect on:
    • localhost:<MYSQL_PORT>
    • localhost:<NGINX_PORT> (if -f is specified)
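As a rough sketch of the connection step, the helpers below wrap the stock mysql client and build the NGINX base URL. They assume the mysql client and curl are installed on the host, that the default ports (3308 and 8080) are in use unless overridden, and that the root user has an empty password. osu_mysql and osu_files_url are hypothetical names, not part of osu-data.

```shell
# Ports fall back to the documented defaults if not already set.
MYSQL_PORT=${MYSQL_PORT:-3308}
NGINX_PORT=${NGINX_PORT:-8080}

# Run the mysql client against the container, forwarding any extra arguments.
# 127.0.0.1 is used instead of "localhost" because the client may otherwise
# try a local Unix socket rather than the published TCP port.
osu_mysql() {
  mysql --host=127.0.0.1 --port="$MYSQL_PORT" --user=root "$@"
}

# Print the base URL of the files service (only running when -f was given).
osu_files_url() {
  echo "http://localhost:${NGINX_PORT}/"
}
```

With these defined, osu_mysql -e 'SHOW DATABASES;' lists the imported databases, and curl -I "$(osu_files_url)" checks that the files service responds.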

Common Issues

  • Docker daemon is not running. Make sure that Docker is installed and running. If you're using Docker Desktop, make sure it's actually started.
  • MySQL data is incorrect. A few possible reasons:
    • Import was abruptly stopped. This can cause some .sql files to be missing / incomplete. Delete the whole compose project and try again.
    • Didn't specify the optional flags to include files. By default, some .sql files are not loaded. Take a look at osu-data -h and specify the optional flags to include them.
    • Data is outdated. By default, on every re-run of osu-data, the data is preserved. To update the data, you must delete the whole compose project and try again.
  • wget: server returned error: HTTP/1.1 404 Not Found. This happens when you try to pull a YYYY_MM_DD that doesn't exist, which is common at the start of each month before the new data is ready. Check https://data.ppy.sh/ to see which YYYY_MM_DD dumps are available.
  • rm: can't remove '../osu.mysql.init/*': This is safe to ignore.
  • MySQL credentials. By default, the MySQL server has no password, so use root as the username and leave the password blank.
  • No files service. This is the default: the files service is optional and must be enabled with the -f flag. Run osu-data -h for more info.
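Several of the issues above can be caught or fixed from the shell. The following is a sketch under stated assumptions, with hypothetical helper names: docker_running relies on docker info exiting non-zero when the daemon is unreachable, valid_ymd only checks the shape of the date (not whether that dump exists on https://data.ppy.sh/), and reset_project assumes the persisted MySQL data lives in volumes owned by the compose project.

```shell
# Succeeds only if the Docker daemon is reachable.
docker_running() {
  docker info >/dev/null 2>&1
}

# Succeeds if $1 has the shape YYYY_MM_DD (digits and underscores only).
# This does not verify that the date is a real, published dump.
valid_ymd() {
  case "$1" in
    [0-9][0-9][0-9][0-9]_[0-9][0-9]_[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Tear down a compose project together with its volumes so that the next
# osu-data run imports from scratch. The project name is not stated in this
# README: find it with `docker compose ls` and pass it as $1.
reset_project() {
  docker compose -p "$1" down --volumes
}
```

A typical check before a run might be docker_running || echo "Start Docker first", followed by valid_ymd 2023_08_01 to catch a mistyped date such as 2023-08-01.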

mysql.cnf

The database is tuned for fast imports and therefore shouldn't be used in production. Notably, we set innodb_doublewrite = 0, which can compromise data integrity in the event of a crash. If you want to use this in production, we recommend setting it up from the Git repo and tweaking mysql.cnf.

Important Matters

  1. Do not distribute the built images as per peppy's request. Instead, you can just share the code to build your image, which should yield the same result.
  2. This database is meant for analysis and is not tuned for production. Tweak mysql.cnf after importing for further MySQL customization.
  3. Finally, be mindful of the conclusions you draw from the data.

Changelog

  • 0.1.5:
    • Allowed wider range of Python versions 3.9 ~ 4.0.
  • 0.2.0:
    • Added GitHub Actions to automatically create dataset on workflow dispatch.
    • Year, Month specification is now Year, Month, Day because some data dumps don't fall exactly on day 1.
      • -ym -> -ymd, --year-month -> --year-month-day
      • Default of -ymd is removed to encourage users to check source of data.
