Skip to main content

Small library for extracting references used in scholarly communication.

Project description

https://travis-ci.org/inspirehep/refextract.svg?branch=master https://coveralls.io/repos/github/inspirehep/refextract/badge.svg?branch=master

About

A small library for extracting references used in scholarly communication.

Install

$ pip install refextract

Usage

To get structured information from a publication reference:

>>> from refextract import extract_journal_reference
>>> reference = extract_journal_reference('J.Phys.,A39,13445')
>>> print(reference)
{
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': u'',
    'page': u'13445',
    'title': u'J. Phys.',
    'type': 'JOURNAL',
    'volume': u'A39',
    'year': '',
}

To extract references from a PDF:

>>> from refextract import extract_references_from_file
>>> references = extract_references_from_file('1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}

To extract directly from a URL:

>>> from refextract import extract_references_from_url
>>> references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}

Notes

refextract depends on pdftotext.

Acknowledgments

refextract is based on code and ideas from the following people, who contributed to the docextract module in Invenio:

  • Alessio Deiana
  • Federico Poli
  • Gerrit Rindermann
  • Graham R. Armstrong
  • Grzegorz Szpura
  • Jan Aage Lavik
  • Javier Martin Montull
  • Micha Moskovic
  • Samuele Kaplun
  • Thorsten Schwander
  • Tibor Simko

License

GPLv2

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for refextract, version 1.0.3
Filename, size File type Python version Upload date Hashes
Filename, size refextract-1.0.3.tar.gz (6.5 MB) File type Source Python version None Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page