Query information about scientific publications on the web.

Project description

Pubfisher: Effectively explore scientific publications

Pubfisher queries scientific publications from web sources such as Google Scholar. These sources often lack a convenient programmable API, so complex or comprehensive queries require many manual steps, such as solving captchas.

Pubfisher offers a simple data model and API that reduce the manual effort to a minimum and make complex queries simple to express. In particular, Pubfisher does not fail when it encounters a captcha: it shows the captcha to you, you solve it, and the query continues.

Let's say you are interested in the first 200 citations of a paper according to Google Scholar. Then this could be your query:

from itertools import islice

from pubfisher.fishers.googlescholar import PublicationGSFisher


def my_query():
    fisher = PublicationGSFisher()

    # Tell the fisher which key words (here: a paper title) to search for.
    fisher.look_for_key_words('Parachute use to prevent death '
                              'and major trauma related to '
                              'gravitational challenge: '
                              'systematic review of randomised '
                              'controlled trials')

    # Keep at most the first 200 results of the query.
    return islice(fisher.fish_all(), 200)
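The query above leans on `islice` to stop after 200 results. The sketch below illustrates why that is cheap if `fish_all()` produces its results lazily, which the description suggests but does not state. `fake_fish_all` is a hypothetical stand-in generator, not part of Pubfisher; it records every result it produces so you can see that nothing beyond the requested 200 is generated.

```python
from itertools import islice


def fake_fish_all(log):
    """Hypothetical stand-in for fisher.fish_all(): yields results one by one."""
    n = 0
    while True:
        n += 1
        log.append(n)  # record every result that was actually produced
        yield f"publication {n}"


produced = []
first_200 = list(islice(fake_fish_all(produced), 200))

# islice stops pulling from the generator once 200 items were taken,
# so the endless generator is never advanced past its 200th result.
print(len(first_200), len(produced))
```

With a real fisher, the same pattern would mean that no more results are requested from the web service than the query actually consumes, again assuming lazy evaluation.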

You can perform many queries with one and the same fisher object. Pubfisher reuses the session cookies across requests so that your queries appear natural to the underlying web services.
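To make the idea of cookie reuse concrete, here is a minimal sketch of the general technique using only the Python standard library. It is not Pubfisher's actual implementation, which the description does not show. The toy server, the `CookieEchoHandler` class, the `demo` function, and the cookie value `session=abc123` are all made up for illustration.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieEchoHandler(BaseHTTPRequestHandler):
    """Toy server: hands out a session cookie once, then echoes what it receives."""

    def do_GET(self):
        received = self.headers.get("Cookie", "")
        self.send_response(200)
        if "session=" not in received:
            # First contact: issue an (illustrative) session cookie.
            self.send_header("Set-Cookie", "session=abc123; Path=/")
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(received.encode())

    def log_message(self, *args):
        pass  # keep the demo output quiet


def demo():
    # Start the toy server on a free local port in a background thread.
    server = HTTPServer(("127.0.0.1", 0), CookieEchoHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    url = f"http://127.0.0.1:{server.server_port}/"

    # One cookie jar shared by all requests made through this opener.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    try:
        first = opener.open(url).read().decode()
        second = opener.open(url).read().decode()
    finally:
        server.shutdown()
        server.server_close()
    return first, second


first, second = demo()
print(repr(first), repr(second))
```

The first request carries no cookie; the second one sends back the cookie the server issued, because both requests go through the same opener and cookie jar. Creating a fresh opener per request would lose that state, which is the situation a long-lived fisher object avoids.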

Download files

Source Distribution

pubfisher-2019.11.tar.gz (14.6 kB)

Uploaded Source

Built Distribution

pubfisher-2019.11-py3-none-any.whl (16.2 kB)

Uploaded Python 3

File details

Details for the file pubfisher-2019.11.tar.gz.

File metadata

  • Download URL: pubfisher-2019.11.tar.gz
  • Upload date:
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.5

File hashes

Hashes for pubfisher-2019.11.tar.gz
Algorithm Hash digest
SHA256 b51437e314da5c261bace91de9e28dc56f5992752d43a936362e92c4d14fd6ba
MD5 2b66cf0f70234a77a965a6bdb381c100
BLAKE2b-256 d73aa5d1dde5a85328294e664b9740c9d061c020750c3d5071d33ad3cfb37165

File details

Details for the file pubfisher-2019.11-py3-none-any.whl.

File metadata

  • Download URL: pubfisher-2019.11-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.0.0 requests-toolbelt/0.9.1 tqdm/4.41.1 CPython/3.7.5

File hashes

Hashes for pubfisher-2019.11-py3-none-any.whl
Algorithm Hash digest
SHA256 2d8a64e7d9ac55593b7d9f6b0291db76919c0efadae1d77bd68cb6f567d3f216
MD5 65ad9a8b40b10d932191a39df254916a
BLAKE2b-256 e0c70bc635f2a126b397fa04311d442e19ad6b4ed7a20a9b321f353795eb7f4a
