Containerize osu! data into a MySQL container and optionally serve .osu files via NGINX
osu! Data on Docker
pip install osu-data; osu-data -m mania -v top_1000 -ymd YYYY_MM_DD
Docker must be installed and running on your machine.
Retrieves database dumps from https://data.ppy.sh/ and hosts them on a local MySQL server.
Optionally, serves all ranked/loved .osu files through an NGINX service when the -f flag is given.
Get Started
IMPORTANT: MySQL data persists across runs. If you change the data to import, recreate the MySQL service.
- Install via pip: pip install osu-data
- Minimally, specify:
  - -m, --mode: The game mode to build the database with: osu, taiko, catch or mania.
  - -v, --version: The database version: top_1000, top_10000 or random_10000.
- Optionally, specify:
  - -ymd, --year_month_day: The year, month and day of the database dump, in the format YYYY_MM_DD.
  - -p, --port: The port to expose MySQL on. Default is 3308.
  - -f, --files: Whether to download .osu files.
  - -np, --nginx-port: The port to expose the NGINX service on. Default is 8080. Not used if -f is not specified.
  - --...: Optional flags to include or exclude more data, listed in the table below. Specifying a flag INVERTS its default value.
| Option | Default Value |
|---|---|
| --beatmap-difficulty-attribs | False |
| --beatmap-difficulty | False |
| --scores | True |
| --beatmap-failtimes | False |
| --user-beatmap-playcount | False |
| --beatmaps | True |
| --beatmapsets | True |
| --user-stats | True |
| --sample-users | True |
| --counts | True |
| --difficulty-attribs | True |
| --beatmap-performance-blacklist | True |
These defaults were chosen to be the most useful for analysis while keeping the import fast.
E.g.
osu-data \
-m osu -v top_1000 -ymd 2023_08_01 -p 3308 -f \
--beatmap-difficulty
  - Use the osu! standard top_1000 dump
  - dated 1 August 2023 (2023_08_01)
  - Expose MySQL on port 3308
  - Download .osu files
  - Include beatmap difficulty data
- Connect on:
  - MySQL: localhost:<MYSQL_PORT>
  - NGINX: localhost:<NGINX_PORT> (if -f is specified)
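The steps above can be sketched in Python. This is a hypothetical helper, not part of osu-data: the names `build_command`, `effective_flags` and `mysql_connect_kwargs` are invented here. It only assembles an argument list from the documented flags, checks that -ymd is a real YYYY_MM_DD date, shows how passing a flag from the table inverts its default, and returns connection parameters for the root user with a blank password. The connection dict assumes a client such as PyMySQL, whose `connect()` accepts `host`, `port`, `user` and `password`.

```python
import re
from datetime import date

# Choices listed in this README for -m and -v.
MODES = ("osu", "taiko", "catch", "mania")
VERSIONS = ("top_1000", "top_10000", "random_10000")

# Defaults copied from the optional-flags table; passing a flag inverts its default.
OPTIONAL_DEFAULTS = {
    "--beatmap-difficulty-attribs": False,
    "--beatmap-difficulty": False,
    "--scores": True,
    "--beatmap-failtimes": False,
    "--user-beatmap-playcount": False,
    "--beatmaps": True,
    "--beatmapsets": True,
    "--user-stats": True,
    "--sample-users": True,
    "--counts": True,
    "--difficulty-attribs": True,
    "--beatmap-performance-blacklist": True,
}


def build_command(mode, version, ymd, port=3308, files=False, nginx_port=8080, flags=()):
    """Hypothetical helper: assemble an osu-data argument list for subprocess.run()."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if version not in VERSIONS:
        raise ValueError(f"unknown version: {version!r}")
    if not re.fullmatch(r"\d{4}_\d{2}_\d{2}", ymd):
        raise ValueError(f"-ymd must look like YYYY_MM_DD, got {ymd!r}")
    date(*map(int, ymd.split("_")))  # raises ValueError for impossible dates
    cmd = ["osu-data", "-m", mode, "-v", version, "-ymd", ymd, "-p", str(port)]
    if files:
        # -np only matters when the files (NGINX) service is enabled.
        cmd += ["-f", "-np", str(nginx_port)]
    for flag in flags:
        if flag not in OPTIONAL_DEFAULTS:
            raise ValueError(f"unknown optional flag: {flag!r}")
        cmd.append(flag)
    return cmd


def effective_flags(flags=()):
    """Which optional data sets end up included: a passed flag flips its default."""
    passed = set(flags)
    return {name: default != (name in passed) for name, default in OPTIONAL_DEFAULTS.items()}


def mysql_connect_kwargs(port=3308):
    """Connection parameters per this README: user root, blank password."""
    return {"host": "localhost", "port": port, "user": "root", "password": ""}
```

For example, `subprocess.run(build_command("osu", "top_1000", "2023_08_01", files=True, flags=["--beatmap-difficulty"]), check=True)` mirrors the E.g. invocation, and once the import has finished, `pymysql.connect(**mysql_connect_kwargs(3308))` should open a session. Both calls require osu-data, Docker and PyMySQL to be installed.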
Common Issues
- Docker daemon is not running. Make sure that Docker is installed and running. If you're using Docker Desktop, make sure it's actually started.
- MySQL data is incorrect. A few possible reasons:
  - The import was abruptly stopped. This can leave some .sql files missing or incomplete. Delete the whole compose project and try again.
  - You didn't specify the optional flags to include files. By default, some .sql files are not loaded. Run osu-data -h and specify the optional flags to include them.
  - The data is outdated. By default, data is preserved across re-runs of osu-data. To update the data, delete the whole compose project and try again.
- wget: server returned error: HTTP/1.1 404 Not Found. This happens when you try to pull a YYYY_MM_DD that doesn't exist, which is common at the start of each month before the new data is ready. Check https://data.ppy.sh/ to see which YYYY_MM_DD dates are available.
- rm: can't remove '../osu.mysql.init/*': This is safe to ignore.
- MySQL credentials. By default, MySQL has no password, so use root as the username and leave the password blank.
- No files service. This is the default: the files service is optional and must be activated with the -f flag. See osu-data -h for more info.
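To avoid the 404 above, the https://data.ppy.sh/ listing can be scanned for dates before running osu-data. A minimal standard-library sketch, assuming (this README does not confirm it) that the index page links to files whose names start with YYYY_MM_DD_ and that per-mode dumps are named like YYYY_MM_DD_performance_<mode>_<version>.tar.bz2; adjust the patterns if the listing differs. The names `fetch_index`, `available_dates` and `dates_for` are hypothetical.

```python
import re
import urllib.request

# Assumption: dump file names begin with a YYYY_MM_DD_ prefix.
DATE_RE = re.compile(r"\b(\d{4}_\d{2}_\d{2})_")


def fetch_index(url="https://data.ppy.sh/"):
    """Download the raw HTML of the dump listing."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def available_dates(index_html):
    """All distinct YYYY_MM_DD prefixes on the page, newest first."""
    return sorted(set(DATE_RE.findall(index_html)), reverse=True)


def dates_for(index_html, mode, version):
    """Dates that have a dump for this mode/version (assumed naming scheme)."""
    pattern = re.compile(
        rf"\b(\d{{4}}_\d{{2}}_\d{{2}})_performance_{re.escape(mode)}_{re.escape(version)}\b"
    )
    return sorted(set(pattern.findall(index_html)), reverse=True)
```

Because YYYY_MM_DD strings sort chronologically, the first element of `dates_for(fetch_index(), "mania", "top_1000")` is the newest candidate for -ymd; an empty list means that mode/version combination isn't published yet under the assumed naming.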
mysql.cnf
The database is tuned for import speed, so it shouldn't be used in production. Notably, we set innodb_doublewrite = 0, which can compromise data integrity in the event of a crash. If you want to use this in production, we recommend setting it up from this Git repo and tweaking mysql.cnf.
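Before reusing this setup for anything that must survive a crash, you can check whether a mysql.cnf still disables the doublewrite buffer. A minimal sketch with Python's configparser, assuming the settings live under a [mysqld] section and the file contains no !include or !includedir directives (configparser cannot parse those). The helper name `doublewrite_enabled` is hypothetical; when the option is absent it falls back to MySQL's own default for innodb_doublewrite, which is ON.

```python
import configparser

# MySQL accepts dashes and underscores interchangeably in option names.
OPTION_NAMES = ("innodb_doublewrite", "innodb-doublewrite")
OFF_VALUES = {"0", "off", "false"}


def doublewrite_enabled(cnf_text):
    """Return False if the [mysqld] section turns innodb_doublewrite off."""
    parser = configparser.ConfigParser(
        allow_no_value=True,  # bare options such as "skip-name-resolve"
        strict=False,  # tolerate repeated keys; the last one wins
        inline_comment_prefixes=("#",),
    )
    parser.read_string(cnf_text)
    for name in OPTION_NAMES:
        if parser.has_option("mysqld", name):
            value = parser.get("mysqld", name)
            # A bare boolean option means "enabled" in MySQL option files.
            return value is None or value.strip().lower() not in OFF_VALUES
    return True  # option absent: MySQL's default is ON
```

Calling it on the contents of the repo's mysql.cnf should return False while innodb_doublewrite = 0 is in place.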
Important Matters
- Do not distribute the built images, as per peppy's request. Instead, share the code used to build your image, which should yield the same result.
- This database is meant for analysis; it's not tuned for production. Tweak mysql.cnf after importing for more MySQL customizations.
- Finally, be mindful of the conclusions you draw from the data.
Changelog
- 0.1.5:
  - Allowed a wider range of Python versions: 3.9 ~ 4.0.
- 0.2.0:
  - Added GitHub Actions to automatically create the dataset on workflow dispatch.
  - The year, month specification is now year, month, day, because some data dumps don't fall exactly on day 1: -ym -> -ymd, --year-month -> --year-month-day.
  - The default of -ymd is removed to encourage users to check the source of the data.