Small library for extracting references used in scholarly communication.

About

A small library for extracting references used in scholarly communication.

Install

$ pip install refextract

Usage

To get structured information from a publication reference:

>>> from refextract import extract_journal_reference
>>> reference = extract_journal_reference('J.Phys.,A39,13445')
>>> print(reference)
{
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': u'',
    'page': u'13445',
    'title': u'J. Phys.',
    'type': 'JOURNAL',
    'volume': u'A39',
    'year': '',
}
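
The returned dictionary can be turned back into a compact citation string. The sketch below is only an illustration: `format_journal_reference` is a hypothetical helper, not part of refextract, and it assumes the result carries the `title`, `volume`, `page`, and `year` keys shown above, with empty strings for missing values.

```python
def format_journal_reference(parsed):
    """Join the non-empty title, volume and page fields, then append the year if present."""
    parts = [parsed.get('title', ''), parsed.get('volume', ''), parsed.get('page', '')]
    text = ' '.join(part for part in parts if part)
    year = parsed.get('year', '')
    return '{} ({})'.format(text, year) if year else text


# Sample result copied from the output shown above (not produced by this snippet).
parsed = {
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': '',
    'page': '13445',
    'title': 'J. Phys.',
    'type': 'JOURNAL',
    'volume': 'A39',
    'year': '',
}

print(format_journal_reference(parsed))
```

Using `.get` with a default keeps the helper from failing if a key is absent, which may or may not happen in practice.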

To extract references from a PDF:

>>> from refextract import extract_references_from_file
>>> references = extract_references_from_file('1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}
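
Every field in the output above is a list of strings, so downstream code usually has to unwrap values. The following is a minimal sketch under that assumption; `first_value` and `collect_dois` are hypothetical helpers, not part of refextract, and the sample data is a subset of the reference printed above. It also assumes that some references may lack a `doi` key.

```python
def first_value(reference, key, default=None):
    """Return the first entry of a list-valued field, or default if it is missing or empty."""
    values = reference.get(key) or []
    return values[0] if values else default


def collect_dois(references):
    """Gather all DOIs from a list of references, dropping a leading 'doi:' prefix."""
    dois = []
    for reference in references:
        for doi in reference.get('doi', []):
            dois.append(doi[4:] if doi.lower().startswith('doi:') else doi)
    return dois


# Subset of the reference shown above (not produced by this snippet).
references = [
    {
        'author': ['F. Englert and R. Brout'],
        'doi': ['doi:10.1103/PhysRevLett.13.321'],
        'journal_title': ['Phys. Rev. Lett.'],
        'journal_year': ['1964'],
    },
]

print(first_value(references[0], 'journal_year'))
print(collect_dois(references))
```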

To extract directly from a URL:

>>> from refextract import extract_references_from_url
>>> references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}
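
When the input may be either a local path or a URL, a small dispatcher can pick between the two functions shown above. This is a sketch, not part of refextract: `is_url` and `extract_references` are hypothetical names, and the extractor functions are passed in as arguments so the snippet runs without refextract installed.

```python
from urllib.parse import urlparse


def is_url(source):
    """Treat a string as a URL only if it has an http or https scheme."""
    return urlparse(source).scheme in ('http', 'https')


def extract_references(source, from_file, from_url):
    """Call from_url for http(s) sources and from_file for everything else."""
    extractor = from_url if is_url(source) else from_file
    return extractor(source)


# With refextract installed, the call might look like this:
#   from refextract import extract_references_from_file, extract_references_from_url
#   refs = extract_references('1503.07589.pdf',
#                             extract_references_from_file,
#                             extract_references_from_url)
```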

Notes

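refextract depends on pdftotext, a command-line tool distributed with Poppler and Xpdf (for example, in the poppler-utils package on Debian and Ubuntu). It is not a Python package, so it must be installed separately.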
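
Since pdftotext is an external command-line program, it can be worth checking that it is available before extracting from PDFs. The sketch below looks it up on `PATH` with the standard library; `find_pdftotext` is a hypothetical helper, and it is an assumption that refextract locates the tool via `PATH`.

```python
import shutil


def find_pdftotext():
    """Return the full path to the pdftotext executable, or None if it is not on PATH."""
    return shutil.which('pdftotext')


path = find_pdftotext()
if path is None:
    print('pdftotext not found on PATH; install it before extracting from PDFs')
else:
    print('pdftotext found at', path)
```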

Acknowledgments

refextract is based on code and ideas from the following people, who contributed to the docextract module in Invenio:

  • Alessio Deiana
  • Federico Poli
  • Gerrit Rindermann
  • Graham R. Armstrong
  • Grzegorz Szpura
  • Jan Aage Lavik
  • Javier Martin Montull
  • Micha Moskovic
  • Samuele Kaplun
  • Thorsten Schwander
  • Tibor Simko

License

GPLv2

