Auction Scraper

An extensible auction house scraper for eBay, LiveAuctioneers, Catawiki, and other platforms. Scrapes auction data from auction sites into a sqlite database.

Currently supports: catawiki, ebay, liveauctioneers.

Can be used as a CLI tool, or interfaced with directly as a Python library.
Installation
You can install with pip:
```
pip install auction-scraper
```
New backend support
Want to scrape an auction house not listed above? Fear not - through our partnership with Dreaming Spires, you can request that we build additional backend scrapers to extend the functionality. Email contact@dreamingspires.dev for more info.
We also accept PRs, so feel free to write and submit your own backend if you require. Instructions can be found under the Building new backends section.
Usage
The auction-scraper tool scrapes data from auctions, profiles, and searches on the specified auction site. Resulting textual data is written to a sqlite3 database, with images and backup web pages optionally written to a data directory.
The tool is invoked as:
```
Usage: auction-scraper [OPTIONS] DB_PATH BACKEND:[ebay|liveauctioneers]
                       COMMAND [ARGS]...

Options:
  DB_PATH                         The path of the sqlite database file to be
                                  written to  [required]
  BACKEND:[ebay|liveauctioneers]  The auction scraping backend  [required]
  --data-location TEXT            The path additional image and html data is
                                  saved to
  --save-images / --no-save-images
                                  Save images to data-location. Requires
                                  --data-location  [default: False]
  --save-pages / --no-save-pages  Save pages to data-location. Requires
                                  --data-location  [default: False]
  --verbose / --no-verbose        [default: False]
  --base-uri TEXT                 Override the base url used to resolve the
                                  auction site
  --install-completion [bash|zsh|fish|powershell|pwsh]
                                  Install completion for the specified shell.
  --show-completion [bash|zsh|fish|powershell|pwsh]
                                  Show completion for the specified shell, to
                                  copy it or customize the installation.
  --help                          Show this message and exit.

Commands:
  auction  Scrapes an auction site auction page.
  profile  Scrapes an auction site profile page.
  search   Performs a search, returning the top n_results results for each...
```
Auction mode
In auction mode, an auction must be specified as either a unique auction ID or as a URL. The textual data is scraped into the [BACKEND]_auctions table of DB_PATH, the page into [data-location]/[BACKEND]/auctions, and the images into [data-location]/[BACKEND]/images. The --base-uri option, if specified, overrides the base URL from which auction IDs, profile IDs, and search query strings are resolved; otherwise the backend's default is used.
Example usage:
```
# Scraping an auction by URL
auction-scraper db.db liveauctioneers auction https://www.liveauctioneers.com/item/88566418_cameroon-power-or-reliquary-figure

# Equivalently, scraping by auction ID
auction-scraper db.db liveauctioneers auction 88566418

# Scraping an auction, including all images and the page itself, into data-location
auction-scraper --data-location=./data --save-images --save-pages db.db liveauctioneers auction 88566418
```
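The scraped rows land in the liveauctioneers_auctions table (following the [BACKEND]_auctions convention above), so they can be inspected with any sqlite client, for example:

```
sqlite3 db.db 'SELECT * FROM liveauctioneers_auctions LIMIT 5;'
```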
Profile mode
In profile mode, a profile must be specified as either a unique user ID or as a URL. The textual data is scraped into the [BACKEND]_profiles table of DB_PATH, and the page into [data-location]/[BACKEND]/profiles. The --base-uri option, if specified, overrides the base URL from which auction IDs, profile IDs, and search query strings are resolved; otherwise the backend's default is used.
Example usage:
```
# Scraping a profile by URL
auction-scraper db.db liveauctioneers profile https://www.liveauctioneers.com/auctioneer/197/hindman/

# Equivalently, scraping by profile ID
auction-scraper db.db liveauctioneers profile 197

# Scraping a profile, including the page itself, into data-location
auction-scraper --data-location=./data --save-pages db.db liveauctioneers profile 197
```
Search mode
In search mode, at least one QUERY_STRING must be provided alongside N_RESULTS. The tool scrapes the auctions pertaining to the top N_RESULTS results for each QUERY_STRING. The --base-uri option, if specified, overrides the base URL from which the search is resolved; otherwise the backend's default is used.
Example usage:
```
# Scrape one result for a single search term
auction-scraper db.db liveauctioneers search 1 "mambila art"

# Scrape ten results for each of two search terms, scraping images and pages into data-location
auction-scraper --data-location=./data --save-images --save-pages db.db liveauctioneers search 10 "mambila" "mambilla"
```
Running continuously using systemd
auction-scraper@.service and auction-scraper@.timer, once loaded by systemd, can be used to run auction-scraper with user-given arguments on a schedule.
Running as a systemd root service
Copy auction-scraper@.service and auction-scraper@.timer to /etc/systemd/system/. Modify auction-scraper@.timer to specify the schedule you require.
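The shipped unit files are authoritative; as a rough sketch of how such a template pair can work (the ExecStart command and OnCalendar schedule below are assumptions, not the shipped defaults):

```ini
# auction-scraper@.service (sketch)
[Unit]
Description=Run auction-scraper with arguments "%I"

[Service]
Type=oneshot
# %I expands to the unescaped instance name: the quoted argument string
# given after the '@' when the unit is started.
ExecStart=/bin/sh -c 'auction-scraper %I'
```

```ini
# auction-scraper@.timer (sketch) - activates auction-scraper@<instance>.service
[Unit]
Description=Schedule auction-scraper with arguments "%I"

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
```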
Reload the system daemons. As root:
```
systemctl daemon-reload
```
Run (start now) and enable (restart on boot) the systemd timer, specifying the given arguments, within quotes, after the '@'. For example, as root:

```
systemctl enable --now auction-scraper@"db.db liveauctioneers search 10 mambila".timer
```
Find information about your running timers with:

```
systemctl list-timers
```

Stop your currently running timer with:

```
systemctl stop auction-scraper@"db.db liveauctioneers search 10 mambila".timer
```

Disable your currently running timer with:

```
systemctl disable auction-scraper@"db.db liveauctioneers search 10 mambila".timer
```
A new timer is created for each unique argument string, so the arguments must be specified when stopping or disabling the timer.
Some modification may be required to run as a user service, including placing the service and timer files in ~/.local/share/systemd/user/.
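For example, a minimal user-service setup might look like this (assuming the unit files are in the current directory):

```
mkdir -p ~/.local/share/systemd/user/
cp auction-scraper@.service auction-scraper@.timer ~/.local/share/systemd/user/
systemctl --user daemon-reload
systemctl --user enable --now auction-scraper@"db.db liveauctioneers search 10 mambila".timer
```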
Building from source
Ensure poetry is installed. Then, from this directory, install dependencies into the poetry virtual environment and build:

```
poetry install
poetry build
```

Source and wheel files are built into auction_scraper/dist.
Install it for your user with pip, outside the venv, substituting the version you built:

```
cd ./dist
python3 -m pip install --user ./auction_scraper-0.4.2-py3-none-any.whl
```

or:

```
cd ./dist
pip install ./auction_scraper-0.4.2-py3-none-any.whl
```
Run auction-scraper to invoke the utility.
Interfacing with the API
Each backend of auction-scraper can also be invoked as a Python library to automate its operation. The backends implement the abstract class auction_scraper.abstract_scraper.AbstractAuctionScraper, alongside the abstract SQLAlchemy models auction_scraper.abstract_models.BaseAuction and auction_scraper.abstract_models.BaseProfile.
The resulting scraper exposes methods to scrape auction, profile, and search pages into these SQLAlchemy model objects, according to the following interface:
```python
def scrape_auction(self, auction, save_page=False, save_images=False):
    """
    Scrapes an auction page, specified by either a unique auction ID
    or a URI. Returns an auction model containing the scraped data.
    If specified by auction ID, constructs the URI using self.base_uri.
    If self.page_save_path is set, writes out the downloaded pages to disk at
    the given path according to the naming convention specified by
    self.auction_save_name.
    Returns a BaseAuction.
    """

def scrape_profile(self, profile, save_page=False):
    """
    Scrapes a profile page, specified by either a unique profile ID
    or a URI. Returns a profile model containing the scraped data.
    If specified by profile ID, constructs the URI using self.base_uri.
    If self.page_save_path is set, writes out the downloaded pages to disk at
    the given path according to the naming convention specified by
    self.profile_save_name.
    Returns a BaseProfile.
    """

def scrape_search(self, query_string, n_results=None, save_page=False,
                  save_images=False):
    """
    Scrapes a search page, specified by either a query_string and n_results,
    or by a unique URI.
    If specified by query_string, de-paginates the results and returns up
    to n_results results. If n_results is None, returns all results.
    If specified by a search_uri, returns just the results on the page.
    Returns a dict {auction_id: SearchResult}.
    """

def scrape_auction_to_db(self, auction, save_page=False, save_images=False):
    """
    Scrapes an auction page, writing the resulting auction to the database.
    Returns a BaseAuction.
    """

def scrape_profile_to_db(self, profile, save_page=False):
    """
    Scrapes a profile page, writing the resulting profile to the database.
    Returns a BaseProfile.
    """

def scrape_search_to_db(self, query_strings, n_results=None,
                        save_page=False, save_images=False):
    """
    Scrapes a set of query_strings, writing the resulting auctions and
    profiles to the database.
    Returns a tuple ([BaseAuction], [BaseProfile]).
    """
```
Building new backends
All backends live at auction_scraper/scrapers, each in its own directory. A backend should implement the abstract class auction_scraper.abstract_scraper.AbstractAuctionScraper in a file scraper.py, and the abstract SQLAlchemy models auction_scraper.abstract_models.BaseAuction and auction_scraper.abstract_models.BaseProfile in models.py.
The AuctionScraper class must extend AbstractAuctionScraper and implement the following methods:
```python
# Given a uri, scrape the auction page into an auction object (of type BaseAuction)
def _scrape_auction_page(self, uri)

# Given a uri, scrape the profile page into a profile object (of type BaseProfile)
def _scrape_profile_page(self, uri)

# Given a uri, scrape the search page into a dict of results (of type {auction_id: SearchResult})
def _scrape_search_page(self, uri)
```
It must also supply defaults for the following variables (a skeleton backend combining them is sketched after the list):
auction_table
profile_table
base_uri
auction_suffix
profile_suffix
search_suffix
backend_name
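Putting this together, a new backend might be sketched as follows. The suffix format strings, model class names, and method bodies are illustrative assumptions; only the attribute names and method signatures come from the interface above:

```python
# auction_scraper/scrapers/example/scraper.py - a skeleton backend.
# ExampleAuction and ExampleProfile would extend BaseAuction and
# BaseProfile in the adjacent models.py; the suffix formats below
# are assumptions for illustration.
from auction_scraper.abstract_scraper import AbstractAuctionScraper
from .models import ExampleAuction, ExampleProfile

class ExampleAuctionScraper(AbstractAuctionScraper):
    # Required defaults
    auction_table = ExampleAuction
    profile_table = ExampleProfile
    base_uri = 'https://www.example-auctions.com'
    auction_suffix = '/auction/{}'
    profile_suffix = '/profile/{}'
    search_suffix = '/search?q={}'
    backend_name = 'example'

    def _scrape_auction_page(self, uri):
        # Download uri, parse the relevant fields, and populate the model.
        auction = ExampleAuction()
        # auction.title = ...
        return auction

    def _scrape_profile_page(self, uri):
        profile = ExampleProfile()
        # profile.name = ...
        return profile

    def _scrape_search_page(self, uri):
        # Return a dict {auction_id: SearchResult} for the page at uri.
        return {}
```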
Authors
Edd Salkield (edd@salkield.uk) - Main codebase
Mark Todd - Liveauctioneers scraper
Jonathan Tanner - Catawiki scraper