Skip to main content

Small library for extracting references used in scholarly communication.

Project description

https://travis-ci.org/inspirehep/refextract.svg?branch=master https://coveralls.io/repos/github/inspirehep/refextract/badge.svg?branch=master

About

A small library for extracting references used in scholarly communication.

Install

$ pip install refextract

Usage

To get structured information from a publication reference:

>>> from refextract import extract_journal_reference
>>> reference = extract_journal_reference('J.Phys.,A39,13445')
>>> print(reference)
{
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': u'',
    'page': u'13445',
    'title': u'J. Phys.',
    'type': 'JOURNAL',
    'volume': u'A39',
    'year': '',
}

To extract references from a PDF:

>>> from refextract import extract_references_from_file
>>> references = extract_references_from_file('1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}

To extract directly from a URL:

>>> from refextract import extract_references_from_url
>>> references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}

Notes

refextract depends on pdftotext.

Acknowledgments

refextract is based on code and ideas from the following people, who contributed to the docextract module in Invenio:

  • Alessio Deiana

  • Federico Poli

  • Gerrit Rindermann

  • Graham R. Armstrong

  • Grzegorz Szpura

  • Jan Aage Lavik

  • Javier Martin Montull

  • Micha Moskovic

  • Samuele Kaplun

  • Thorsten Schwander

  • Tibor Simko

License

GPLv2

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

refextract-0.2.20.tar.gz (6.5 MB view details)

Uploaded Source

File details

Details for the file refextract-0.2.20.tar.gz.

File metadata

  • Download URL: refextract-0.2.20.tar.gz
  • Upload date:
  • Size: 6.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.43.0 CPython/2.7.15

File hashes

Hashes for refextract-0.2.20.tar.gz
Algorithm Hash digest
SHA256 38e0e1480a7184cd6010320819c624633571361ec902bf761474efdfed015e60
MD5 d4a35e91c10a0fdf33a265f081a48a5b
BLAKE2b-256 338bef85eaf82d9932f325a019d0199a1a34c38717f7358abe0fd7c6290fc31c

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page