Small library for extracting references used in scholarly communication.

Project description

Originally exported from Invenio https://github.com/inveniosoftware/invenio.

Installation

pip install refextract

Usage

To get structured info from a publication reference:

from refextract import extract_journal_reference
reference = extract_journal_reference("J.Phys.,A39,13445")
print(reference)
{
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': u'',
    'page': u'13445',
    'title': u'J. Phys.',
    'type': 'JOURNAL',
    'volume': u'A39',
    'year': ''
}
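
The returned dictionary can be turned back into a compact citation string. The following is a minimal sketch that works only on the dictionary shown above; `format_journal_reference` is a hypothetical helper, not part of refextract, and the "title volume page (year)" layout is an assumption made for illustration.

```python
def format_journal_reference(ref):
    """Join title, volume, page and (if present) year into one string.

    Hypothetical helper for illustration; not part of refextract.
    """
    parts = [ref["title"], ref["volume"], ref["page"]]
    # Skip empty fields so that missing values do not leave double spaces.
    text = " ".join(part for part in parts if part)
    if ref.get("year"):
        text += " (%s)" % ref["year"]
    return text


# Sample dictionary copied from the output shown above.
sample = {
    "extra_ibids": [],
    "is_ibid": False,
    "misc_txt": u"",
    "page": u"13445",
    "title": u"J. Phys.",
    "type": "JOURNAL",
    "volume": u"A39",
    "year": "",
}

print(format_journal_reference(sample))  # prints: J. Phys. A39 13445
```

Because the year is empty in this example, the helper leaves it out rather than printing empty parentheses.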

To extract references from the full-text PDF of a publication:

from refextract import extract_references_from_file
reference = extract_references_from_file("some/fulltext/1503.07589v1.pdf")
print(reference)
{
    'references': [
            {'author': [u'F. Englert and R. Brout'],
             'doi': [u'10.1103/PhysRevLett.13.321'],
             'journal_page': [u'321'],
             'journal_reference': ['Phys.Rev.Lett.,13,1964'],
             'journal_title': [u'Phys.Rev.Lett.'],
             'journal_volume': [u'13'],
             'journal_year': [u'1964'],
             'linemarker': [u'1'],
             'title': [u'Broken symmetry and the mass of gauge vector mesons'],
             'year': [u'1964']}, ...
       ],
    'stats': {
          'author': 15,
          'date': '2016-01-12 10:52:58',
          'doi': 1,
          'misc': 0,
          'old_stats_str': '0-1-1-15-0-1-0',
          'reportnum': 1,
          'status': 0,
          'title': 1,
          'url': 0,
          'version': u'0.1.0.dev20150722'
    }
}
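
In the output above, each entry under 'references' maps field names to lists of values. A minimal sketch of collecting one field (for example the DOIs) across all extracted references follows. It operates on a plain dictionary shaped like the output above, so it runs without refextract installed; `collect_field` is a hypothetical helper, and treating a missing field as an empty list is an assumption.

```python
def collect_field(result, field):
    """Return every value of `field` across result['references'], flattened.

    Hypothetical helper for illustration; not part of refextract.
    """
    values = []
    for ref in result.get("references", []):
        # Assumption: a reference without this field contributes nothing.
        values.extend(ref.get(field, []))
    return values


# Sample shaped like the output shown above (first reference only).
sample_result = {
    "references": [
        {
            "author": [u"F. Englert and R. Brout"],
            "doi": [u"10.1103/PhysRevLett.13.321"],
            "journal_page": [u"321"],
            "journal_reference": ["Phys.Rev.Lett.,13,1964"],
            "journal_title": [u"Phys.Rev.Lett."],
            "journal_volume": [u"13"],
            "journal_year": [u"1964"],
            "linemarker": [u"1"],
            "title": [u"Broken symmetry and the mass of gauge vector mesons"],
            "year": [u"1964"],
        },
    ],
}

print(collect_field(sample_result, "doi"))
```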

You can also extract directly from a URL:

from refextract import extract_references_from_url
reference = extract_references_from_url("http://arxiv.org/pdf/1503.07589v1.pdf")
print(reference)
{
    'references': [
            {'author': [u'F. Englert and R. Brout'],
             'doi': [u'10.1103/PhysRevLett.13.321'],
             'journal_page': [u'321'],
             'journal_reference': ['Phys.Rev.Lett.,13,1964'],
             'journal_title': [u'Phys.Rev.Lett.'],
             'journal_volume': [u'13'],
             'journal_year': [u'1964'],
             'linemarker': [u'1'],
             'title': [u'Broken symmetry and the mass of gauge vector mesons'],
             'year': [u'1964']}, ...
       ],
    'stats': {
          'author': 15,
          'date': '2016-01-12 10:52:58',
          'doi': 1,
          'misc': 0,
          'old_stats_str': '0-1-1-15-0-1-0',
          'reportnum': 1,
          'status': 0,
          'title': 1,
          'url': 0,
          'version': u'0.1.0.dev20150722'
    }
}
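
The result shown above contains only strings, integers, lists and dictionaries, so it can be stored with the standard `json` module. This is a minimal sketch under that assumption; `to_json` is a hypothetical helper, not part of refextract, and the sample reuses only the 'stats' block from the output above.

```python
import json


def to_json(result):
    """Serialize an extraction result to a JSON string with stable key order.

    Hypothetical helper for illustration; not part of refextract.
    """
    return json.dumps(result, indent=2, sort_keys=True)


# Sample copied from the 'stats' block of the output shown above.
sample_stats = {
    "stats": {
        "author": 15,
        "date": "2016-01-12 10:52:58",
        "doi": 1,
        "misc": 0,
        "old_stats_str": "0-1-1-15-0-1-0",
        "reportnum": 1,
        "status": 0,
        "title": 1,
        "url": 0,
        "version": u"0.1.0.dev20150722",
    }
}

serialized = to_json(sample_stats)
# Loading the string back yields an equal dictionary.
restored = json.loads(serialized)
print(restored == sample_stats)
```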

Changes

Version 0.1.0 (2016-01-12)
