kulchur

Pull publicly available Goodreads data on books, authors and users.

description

kulchur is a Python package for working with publicly available Goodreads data on books, authors, and users (with public profiles). Each of the three sources exposes a number of attributes, from the ratings distribution for a book to an author's birthplace and birth date. Data can also be fetched asynchronously, which allows multiple records to be pulled efficiently. Given the nature of this data, commercial use is not condoned or supported.

installation

You can install kulchur via PyPI:

$ pip install kulchur

usage

kulchur allows you to pull data on publicly available books, authors, and users on Goodreads. There are three classes, one for each data source:

  • Alexandria for book data
  • Pound for author data
  • FalseDmitry for user data

Each class requires either

  • the unique Goodreads ID for an item; for example, the ID for Ivan Turgenev is 410680, found in the URL of Turgenev's page.
  • the URL of an author's, book's, or user's page.
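
Because either form is accepted, it can help to see how the two relate. The following is a minimal sketch, not part of kulchur, of pulling the numeric ID out of a page URL. The helper name extract_goodreads_id and the URL shape used in the example are assumptions for illustration only.

```python
import re
from urllib.parse import urlparse


def extract_goodreads_id(identifier: str) -> str:
    """Return the numeric ID from a bare ID or a Goodreads-style URL (hypothetical helper)."""
    identifier = identifier.strip()
    if identifier.isdigit():
        # Already a bare ID such as '410680'
        return identifier
    # Take the last path segment, e.g. '410680.Ivan_Turgenev'
    path = urlparse(identifier).path
    last_segment = path.rstrip("/").rsplit("/", 1)[-1]
    match = re.match(r"\d+", last_segment)
    if match is None:
        raise ValueError(f"no numeric ID found in {identifier!r}")
    return match.group(0)


# Assumed URL shape; the real page URL may differ.
print(extract_goodreads_id("https://www.goodreads.com/author/show/410680.Ivan_Turgenev"))
```

Passing the ID directly avoids any dependence on the URL layout, so this kind of parsing is only useful when all you have is a link.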

loading the data

After initializing one of the three classes, you can load the data either asynchronously or synchronously:

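import aiohttp

alx = Alexandria()

# asynchronously (must run inside a coroutine, since it uses await)
# 19117 is the book ID for Fathers and Sons
async with aiohttp.ClientSession() as sesh:
    await alx.load_book_async(session=sesh,
                              book_identifier='19117')

# synchronously
alx.load_book(book_identifier='19117')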

Errors will occur when a non-200 response is received, such as when an item does not exist. Further, when pulling user data, an error will occur if the user's profile is private.
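
The exception types kulchur raises are not stated here, so the sketch below catches Exception broadly. That choice, the helper name try_load, and the stand-in loader fake_load_book are all assumptions for illustration; the stand-in fails the way a non-200 response might, without touching the network.

```python
from typing import Callable, Optional


def try_load(loader: Callable[..., object], **kwargs) -> Optional[str]:
    """Call a load method; return None on success or an error message on failure."""
    try:
        loader(**kwargs)
    except Exception as exc:  # assumption: narrow this once the real exception types are known
        return f"{type(exc).__name__}: {exc}"
    return None


def fake_load_book(book_identifier: str) -> None:
    """Stand-in for a load method such as alx.load_book (hypothetical)."""
    if book_identifier != "19117":
        raise RuntimeError(f"non-200 response for {book_identifier}")


print(try_load(fake_load_book, book_identifier="19117"))
print(try_load(fake_load_book, book_identifier="0"))
```

With a real instance, the same helper could be called as try_load(alx.load_book, book_identifier='19117').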

pulling specific fields

Each of the three classes has a number of methods for pulling specific fields once the data is loaded, all starting with get_. A handful of examples for each:

# book data
alx = Alexandria()
alx.get_title() # returns title name
alx.get_top_genres() # returns list of book's genres
alx.get_rating_dist() # returns rating-share dictionary, e.g., {'1': .2, '2': .4,..., '5': .05} 

# author data
pnd = Pound()
pnd.get_review_count() # returns number of user reviews submitted
pnd.get_quotes_sample() # returns sample (n=3) of quotes by author
pnd.get_birth_place() # returns author's birthplace

# user data
dmtry = FalseDmitry()
dmtry.get_favorite_genres() # returns list of user's favorite genres
dmtry.get_follower_count() # returns user's number of followers
dmtry.get_name() # returns user name

In cases where the field is not available, None will be returned.
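
Since any getter may return None, downstream code usually needs a guard. Here is a minimal sketch that relies only on the get_ naming convention described above; collect_fields and the StubBook stand-in are hypothetical and not part of kulchur.

```python
from typing import Any, Dict, Iterable


def collect_fields(source: Any, fields: Iterable[str]) -> Dict[str, Any]:
    """Call get_<field>() for each name and keep only values that are not None."""
    collected = {}
    for field in fields:
        value = getattr(source, f"get_{field}")()
        if value is not None:
            collected[field] = value
    return collected


class StubBook:
    """Stand-in for a loaded Alexandria instance; returns fixed values."""

    def get_title(self):
        return "Fathers and Sons"

    def get_top_genres(self):
        return None  # simulates an unavailable field


print(collect_fields(StubBook(), ["title", "top_genres"]))
```

Dropping missing fields is one option; substituting a default value would work equally well if downstream code expects every key to be present.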

pulling fields in bulk

Each of the three classes also has a bulk data method, get_all_data(). You can pass the attributes you want to exclude and choose whether the result is a dictionary or a SimpleNamespace:

alx = Alexandria()
dat = alx.get_all_data(exclude_attrs=['rating', 'similar_books'],
                       to_dict=True)
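
The two return formats differ mainly in how fields are accessed. The sketch below uses a made-up record rather than a real get_all_data() result, and the drop_attrs helper is hypothetical. It only illustrates dictionary versus types.SimpleNamespace access and what excluding attributes amounts to.

```python
from types import SimpleNamespace


def drop_attrs(record: dict, exclude_attrs: list) -> dict:
    """Return a copy of record without the excluded keys (hypothetical helper)."""
    return {key: value for key, value in record.items() if key not in exclude_attrs}


# Made-up record; real field names and values may differ.
record = {"title": "Fathers and Sons", "rating": 4.0, "similar_books": ["..."]}

as_dict = drop_attrs(record, ["rating", "similar_books"])
as_namespace = SimpleNamespace(**as_dict)

print(as_dict["title"])               # key access on the dictionary form
print(as_namespace.title)             # attribute access on the SimpleNamespace form
print(vars(as_namespace) == as_dict)  # both forms hold the same data
```

The dictionary form is convenient for JSON export, while the namespace form reads more naturally in interactive use.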

pulling multiple items

If you'd like to pull more than one item at a time, you can use the bulk asynchronous functions, such as bulk_books_aio. These functions have a number of configuration options available, from JSON exports to semaphore counts to the number of attempts per pull:

# pull book data on 
#   Fathers and Sons by Ivan Turgenev
#   The Master and Margarita by Mikhail Bulgakov
#   The Year of Magical Thinking by Joan Didion
dat = await bulk_books_aio(book_ids=['19117', '117833', '7815'],
                           exclude_attrs=['similar_books'],
                           semaphore_count=2,
                           batch_delay=None,
                           batch_size=None,
                           to_dict=True,
                           see_progress=True,
                           write_json='out_books.json')
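
To make the semaphore and retry settings concrete, here is a minimal sketch of the general technique using only asyncio. It is not kulchur's implementation: fetch_one, pull_many, and the attempts parameter are hypothetical names, and the fake fetch stands in for a network call.

```python
import asyncio
from typing import Dict, List, Optional


async def fetch_one(book_id: str) -> Dict[str, str]:
    """Fake fetch standing in for a network request (hypothetical)."""
    await asyncio.sleep(0.01)
    if book_id == "bad":
        raise RuntimeError("non-200 response")
    return {"id": book_id}


async def pull_many(book_ids: List[str], semaphore_count: int = 2,
                    attempts: int = 2) -> List[Optional[Dict[str, str]]]:
    """Fetch all IDs with at most semaphore_count requests in flight at once."""
    semaphore = asyncio.Semaphore(semaphore_count)

    async def guarded(book_id: str) -> Optional[Dict[str, str]]:
        async with semaphore:
            for _ in range(attempts):
                try:
                    return await fetch_one(book_id)
                except RuntimeError:
                    continue  # retry until the attempts are used up
            return None  # every attempt failed

    # gather returns results in the same order as the input IDs
    return await asyncio.gather(*(guarded(b) for b in book_ids))


results = asyncio.run(pull_many(["19117", "117833", "bad"]))
print(results)
```

A lower semaphore count means fewer simultaneous requests, which trades speed for a lighter load on the server.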

Try to be considerate of Goodreads server load. And again, given the nature of this data, commercial use is not condoned or supported.
