bookshops

Get book information (by ISBN or keyword search) from real bookstores.


Project description

Web scraping to get book information

This library gets book information. You can search by keywords or by ISBN, use an advanced search, and paginate through the results.
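How the library decides between a keyword search and an ISBN lookup is not documented here. As a minimal sketch, assuming only ISBN-13 input needs to be recognised, a hypothetical helper can tell the two apart by validating the ISBN-13 check digit:

```python
def normalize_isbn(text):
    """Strip hyphens and spaces, e.g. '978-2-918059-36-3' -> '9782918059363'."""
    return text.replace("-", "").replace(" ", "")


def is_isbn13(text):
    """Return True if text is a valid ISBN-13 (13 digits, correct check digit)."""
    digits = normalize_isbn(text)
    if len(digits) != 13 or not digits.isdigit():
        return False
    # The first 12 digits are weighted 1, 3, 1, 3, ... and summed.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    # The check digit brings the weighted sum up to a multiple of 10.
    check = (10 - total % 10) % 10
    return check == int(digits[12])


def search_mode(query):
    """Hypothetical dispatcher: treat a valid ISBN-13 as an ISBN, anything else as keywords."""
    return "isbn" if is_isbn13(query) else "keywords"
```

With this sketch, search_mode("9782918059363") gives "isbn" and search_mode("antigone") gives "keywords". ISBN-10 input is deliberately left out to keep the example short.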

The data comes from existing bookshop websites. We scrape:

French books: http://www.librairie-de-paris.fr (also Decitre, but it is less complete)
Spanish books: http://www.casadellibro.com
German books: http://www.buchlentner.de

For each book we get the title and authors, the price, the publisher(s), the cover, etc.

Import data from an ods or csv file

Another goal of this library is to easily fetch information for a large number of books listed in an ods or csv file.

If your file has an ‘isbn’ and a ‘quantity’ column, it is easy: we will find all the book information.

If it only has the title and the publisher, it is doable but error-prone. We can still do it, but you should take an inventory of your stock afterwards.

See the odsimport module. It returns JSON, and it is your responsibility to add what you want to your database (this is what Abelujo does: https://gitlab.com/vindarel/abelujo).

Usable, but work in progress.
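The exact JSON produced by odsimport is not described here. As a minimal sketch of the CSV half of the job (ods files would need a third-party reader), a hypothetical helper can split rows into reliable ISBN lookups and error-prone title/publisher lookups, assuming a missing quantity means 1 and using a JSON layout invented for illustration:

```python
import csv
import io
import json


def rows_to_json(csv_text, delimiter=","):
    """Split CSV rows into ISBN lookups and title/publisher lookups, as JSON.

    Hypothetical helper: the output layout is an assumption, not odsimport's format.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    by_isbn, by_title = [], []
    for row in reader:
        # Normalize header names; skip surplus fields (stored under the None key).
        row = {k.strip().lower(): (v or "").strip() for k, v in row.items() if k is not None}
        quantity = int(row.get("quantity") or 1)  # assumption: default quantity is 1
        if row.get("isbn"):
            # Reliable case: the ISBN identifies the book unambiguously.
            by_isbn.append({"isbn": row["isbn"], "quantity": quantity})
        elif row.get("title"):
            # Error-prone case: a later search by title, publisher and authors.
            by_title.append({
                "title": row["title"],
                "publisher": row.get("publisher", ""),
                "authors": row.get("authors", ""),
                "quantity": quantity,
            })
    return json.dumps({"isbn": by_isbn, "title": by_title})
```

Keeping the two kinds of rows apart makes it easy to check the title-based matches by hand, which is the inventory step recommended above.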

Accepted format and columns

We can read ods and csv files.

a file with “isbn” and “quantity” columns;
a file with the columns “title”, “publisher”, “isbn” (optional in this case), “shelf”, “distributor” and “quantity”. There is no “price” column. “authors” is optional (it can help to fetch the correct book).
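To make the two accepted layouts concrete, here is a minimal sketch of a hypothetical header check (not part of bookshops). It assumes that in the second layout only “title”, “publisher” and “quantity” are mandatory, and it prefers the ISBN layout whenever both “isbn” and “quantity” are present:

```python
def detect_layout(headers):
    """Classify a header row: "isbn" or "title" layout, else raise ValueError.

    Hypothetical helper; which columns are mandatory is an assumption.
    """
    cols = {h.strip().lower() for h in headers}
    if {"isbn", "quantity"} <= cols:
        # The ISBN identifies each book, so this layout is the reliable one.
        return "isbn"
    if {"title", "publisher", "quantity"} <= cols:
        # shelf, distributor and authors are treated as optional extras here.
        return "title"
    raise ValueError("unsupported columns: %s" % sorted(cols))
```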

If the file has no headers, use the “odsettings.py” configuration file (in particular, to set the csv delimiter, either “,” or “;”).
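For files without a header row, the column names and the delimiter have to come from somewhere else. As a minimal sketch, assuming the column list is passed in directly rather than read from odsettings.py, Python's standard csv.Sniffer can guess between “,” and “;”:

```python
import csv
import io


def read_headerless(csv_text, fieldnames):
    """Parse a headerless CSV, guessing whether the delimiter is ',' or ';'.

    `fieldnames` stands in for the column list odsettings.py would provide;
    this helper is hypothetical.
    """
    sample = csv_text[:1024]
    try:
        # Restrict the guess to the two delimiters the configuration allows.
        delimiter = csv.Sniffer().sniff(sample, delimiters=",;").delimiter
    except csv.Error:
        delimiter = ","  # fall back when the sample is ambiguous
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=fieldnames, delimiter=delimiter)
    return list(reader)
```

Sniffing is a heuristic, so an explicit setting such as the one in odsettings.py remains the safer option for unusual files.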

Why not Amazon?

Amazon kills the book industry and its employees. Moreover, we can add value to our results: from within our application we can link to a good, independent bookshop, we could order books from it, we could say whether it has copies in stock, etc. And we learn a lot by doing this!

Technically speaking, the Amazon web service API can be too limiting and unsuitable: one must register for Amazon Product Advertising and for AWS, and the request rate is limited to 1 request per second.

Why not Google Books?

It has very little data.

Why not the BNF (Bibliothèque Nationale de France)?

Because bookshops need recent books and up-to-date information, and new books only enter the BNF database after a few months. There aren’t many tools for it either.

Install

Install from PyPI:

pip install bookshops

Use

Command line

You can try this library on the command line with the following commands:

livres: French books
libros: Spanish books
bucher: German books

Come and ask for more :)

For example:

livres antigone

livres 9782918059363

and you get the corresponding book information.

Options (these may vary between scrapers; check them with -h):

-i or --isbn: make sure to get all the ISBNs. By default, the command line tool does not fetch them when each book needs an extra HTTP request, which depends on the website.
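When ISBNs live on each book’s detail page, fetching them one by one is slow; the changelog mentions multiprocessing for this in the German scraper. The sketch below swaps in a thread pool instead, since the work is I/O-bound, and assumes a hypothetical fetch_isbn(card) callable that performs the per-book HTTP request:

```python
from concurrent.futures import ThreadPoolExecutor


def fill_isbns(cards, fetch_isbn, max_workers=8):
    """Complete cards that lack an ISBN by calling fetch_isbn(card) concurrently.

    `fetch_isbn` is hypothetical: one HTTP request per book, returning the ISBN or None.
    """
    missing = [card for card in cards if not card.get("isbn")]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `missing`.
        for card, isbn in zip(missing, pool.map(fetch_isbn, missing)):
            card["isbn"] = isbn
    return cards
```

Cards that already carry an ISBN are left untouched, so the extra requests are only paid for the books that need them.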

As a library

But most of all, from within your program:

from bookshops.frFR.librairiedeparis.librairiedeparisScraper import Scraper as frenchScraper

scraper = frenchScraper("search keywords")
cards = scraper.search()
# we get a list of dictionaries with the title, the authors, etc.
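Since the results are plain dictionaries, they are easy to post-process. As a minimal sketch, here is a hypothetical summarize function that prints one line per book. The “title” and “authors” keys follow the comment in the example; whether “authors” is a list or a string, and the “price” key, are assumptions, hence the defensive .get() calls:

```python
def summarize(cards):
    """Format search results as one line per book (hypothetical helper).

    Assumes "title" and "authors" keys; "authors" may be a list or a string,
    and "price" is an assumed, optional key.
    """
    lines = []
    for card in cards:
        authors = card.get("authors") or ""
        if isinstance(authors, (list, tuple)):
            authors = ", ".join(str(a) for a in authors)
        line = "%s, by %s" % (card.get("title", "?"), authors or "unknown")
        if card.get("price") is not None:
            line += " [%s]" % card["price"]
        lines.append(line)
    return "\n".join(lines)
```

With a scraper from the example, usage would be print(summarize(scraper.search())).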

Advanced search

Work in progress.

Use the ed: prefix to search for a specific publisher, e.g. ed:agone.
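The parsing rules behind ed: are not documented beyond this example. As a minimal sketch, assuming prefixed tokens may be mixed freely with ordinary keywords, a hypothetical parser could separate the publisher filter from the rest of the query:

```python
def parse_query(query):
    """Split "ed:agone antigone" into ({"publisher": "agone"}, "antigone").

    Only the "ed:" prefix comes from the documentation; the rules are hypothetical.
    """
    filters, keywords = {}, []
    for token in query.split():
        prefix, sep, value = token.partition(":")
        if sep and prefix == "ed" and value:
            # "ed:" introduces a publisher (French "éditeur").
            filters["publisher"] = value
        else:
            keywords.append(token)
    return filters, " ".join(keywords)
```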

Pagination

Results are paginated; to get a given page, pass PAGE:

scraper = frenchScraper("search keywords", PAGE=2)
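To collect several pages in one go, a loop over PAGE can stop at the first empty page. This is a minimal sketch: make_scraper stands for a scraper class such as frenchScraper, and both the assumption that pages start at 1 and the assumption that an empty list marks the end are unverified:

```python
def search_all(make_scraper, query, max_pages=5):
    """Collect results page by page until a page comes back empty.

    `make_scraper(query, PAGE=n)` stands for a scraper class such as frenchScraper;
    page numbering from 1 and the empty-page stop condition are assumptions.
    """
    results = []
    for page in range(1, max_pages + 1):
        cards = make_scraper(query, PAGE=page).search()
        if not cards:
            break
        results.extend(cards)
    return results
```

The max_pages cap keeps a runaway loop from hammering the website if the stop condition never triggers.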

Develop and test

See http://dev.abelujo.cc/webscraping.html

Development mode:

pip install -e .

Now you can edit the project and run the development version the way the library is meant to be run, i.e. through its entry points: livres, libros, etc.

doc: https://python-packaging-user-guide.readthedocs.org/en/latest/distributing/#working-in-development-mode

Bugs and shortcomings

This is web scraping, so it comes with pitfalls:

the site can go down. It has happened already.
the site can change, in which case we have to change our scraper too. This can be caught early with automated and frequent tests (work in progress).
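A cheap way to catch layout changes early is a scheduled test that runs a known query through each scraper and checks the shape of the results. As a minimal sketch, assuming the “title” and “authors” keys are always expected, a hypothetical check_cards helper reports what looks broken:

```python
def check_cards(cards, required=("title", "authors")):
    """Return a list of problems in scraper output (empty if it looks sane).

    Hypothetical helper; the required keys are an assumption.
    """
    if not cards:
        return ["no results: the site may be down or its layout may have changed"]
    problems = []
    for i, card in enumerate(cards):
        for key in required:
            if not card.get(key):
                problems.append("card %d: missing %r" % (i, key))
    return problems
```

A frequent CI job could then assert check_cards(frenchScraper("antigone").search()) == [] and fail as soon as the site changes.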

Changelog

0.2.1

German scraper: search by ISBN

0.2.0

German scraper
multiprocessing for the German scraper (from 15 s to 9 s, see issue #1)
--isbn option for it

0.1.x

French and Spanish scrapers
command line tool

Licence

LGPLv3


Release history

0.2.1 (this version): May 13, 2016
0.2.0: May 13, 2016
0.1.3: Apr 25, 2016
0.1.2: Apr 20, 2016
0.1.1: Apr 20, 2016
0.1.1dev3 (pre-release): Apr 20, 2016
0.1.1dev2 (pre-release): Apr 20, 2016
0.1.1dev (pre-release): Apr 20, 2016
0.1: Apr 18, 2016

Download files


Source Distribution

bookshops-0.2.1.tar.gz (37.9 kB)

Uploaded May 13, 2016 Source

File details

Details for the file bookshops-0.2.1.tar.gz.

File metadata

Download URL: bookshops-0.2.1.tar.gz
Upload date: May 13, 2016
Size: 37.9 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for bookshops-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`fceea12cb8f518d70e1dedd9dbe239b85cd04817fc0f7305514029ef03b22fe3`
MD5	`96592aead7102c02e74e2cd4ed7ce105`
BLAKE2b-256	`0519c0834038edd74b247caf01d7885499a22cf470e2d0e9df8206db28825a53`

