Skip to main content

scrap mame games information and images

Project description

Downloads

mamescraper

Scrap mame games information and images from:

And generate a XML file for use with EmulationStation, the default frontend for RetroPie and Recalbox.

Simple example of the scraper in action:

$ mamescraper

Initializing the scraper...

Scraping from: bigode - http://mame.bigode.net/

Mame database xml not found
 Downloading: 100%
Done!

Generating list of images to download...
Done!

Downloading 10 images:
 aliensyn.png downloaded
 guwange.png downloaded
 bjourney.png downloaded
 ecofghtr.png downloaded
 captcomm.png downloaded
 spf2t.png downloaded
 nbahangt.png downloaded
 ldrun.png downloaded
 raiden2.png downloaded
 tmnt2.png downloaded
Done!

Generating new gamelist xml...
 New gamelist.xml file created!

Total time spent: 0m 1s

All set! Happy gaming :)

About

The purpose of this scraper is to be simple and fast. Currently it supports two sources with drastically different scraping methodologies.

The default source (bigode) is a lot faster because the scraper will download an entire mame database (1.1MB compressed) and scrap all the games information in one go. Besides that, this source uses a CDN and a very fast webserver to serve the images, resulting in faster responses and downloading speed overall.

The adb source is a more traditional approach, for each game found, the scraper will do a http request to get the information needed and then will download the appropriate image.

It is important to note that the images flyers on ‘bigode’ source are smaller in size, because they have a fixed 400 width.

To illustrate the difference in scraping speed, the following shows the time spent scraping an entire mame 037b5 set (2241 roms) on both sources:

‘bigode’ source:

Command: mamescraper -w 10 -i title
Time spent: 1m 10s

‘adb’ source:

Command: mamescraper -w 10 -i title -s adb
Time spent: 8m 29s

Note that mamescraper was run with 10 workers and scraping titles only. If ran in ‘mixed’ image mode (default) the difference is even higher, since images flyers are bigger on adb:

‘bigode’ source:

Command: mamescraper -w 10
Time spent: 2m 2s

‘adb’ source:

Command: mamescraper -w 10 -s adb
Time spent: 26m 47s

Based on that, the recommended way is to run using the default ‘bigode’ source and if a game is not found, just run the scraper again in ‘append’ mode (to scan only the missing games) using ‘adb’ as source.

A huge thanks for AntoPISA creator of Progetto Snaps for the images and tons of mame resources.

Also, a huge thanks to Motoschifo creator of Arcade Database for the awesome arcade database website.

Notes

  • Works on Python 2.7 and Python 3.3+

  • Uses only Python standard library for maximum compatibility

Install

Install using pip:

pip install mamescraper

or

Download and set executable permission on the script file:

chmod +x mamescraper.py

or

Download and run using the python interpreter:

python mamescraper.py

or

Download the Windows executable file from the releases page.

Usage

Usage: mamescraper [options]

scrap mame games information and images from 'mame.bigode.net' or
'adb.arcadeitalia.net'

Options:
--version           show program's version number and exit
-h, --help          show this help message and exit
-a, --append        scrap only missing roms from output file and append it
                    to the file (default: disabled)
-d ROMS_DIR         directory containing the games (default: current
                    directory)
-e IMAGES_DIR_NAME  directory name to download the images (default: images)
-f FORMAT           file format of the games: 'zip' or '7z' (default: zip)
-i IMAGES           images type: 'mixed', 'title' or 'flyer' - mixed will
                    download a flyer and fallback to title if a flyer is not
                    found (default: mixed)
-o OUTPUT_FILE      the xml file that will be created (default:
                    gamelist.xml)
-s SOURCE           information and images source: 'bigode' or 'adb'
                    (default: bigode)
-w WORKERS          number of workers threads to use (default: 5)

Examples

Simplest use case is to run on the mame games directory itself:

$ cd my_games_dir
$ mamescraper

Alternatively, you can pass the directory to the scraper:

$ mamescraper -d path_to_games_dir

Scrap only missing games in the existing ‘gamelist.xml’ file:

$ mamescraper -a

It is useful for scrap on both sources if a game is not found. Just run the scraper a second time with append enable and a different source:

$ mamescraper
$ mamescraper -a -s adb

If you have games in mixed formats, the following will run the first time scraping games in ‘zip’ format (default) and the second time appending the missing games in ‘7z’ format:

$ mamescraper
$ mamescraper -a -f 7z

Run with 10 workers downloading only titles images:

$ mamescraper -i title -w 10

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mamescraper-0.4.tar.gz (9.4 kB view details)

Uploaded Source

File details

Details for the file mamescraper-0.4.tar.gz.

File metadata

  • Download URL: mamescraper-0.4.tar.gz
  • Upload date:
  • Size: 9.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.9.1 pkginfo/1.4.1 requests/2.18.4 setuptools/33.1.1 requests-toolbelt/0.8.0 tqdm/4.19.5 CPython/2.7.14

File hashes

Hashes for mamescraper-0.4.tar.gz
Algorithm Hash digest
SHA256 6ea8ce9b124d2de342b765d5d776477b2812302637c82415181504780c7cda5e
MD5 ec813bc86a1c21a318da9f740c01e8f0
BLAKE2b-256 f399e9a3cd3d489700ae7e49b65a14189c3dc2cbfc65dcf8c1a47d00b72856bc

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page