About
A small library for extracting references used in scholarly communication.
Install
$ pip install refextract
Usage
To get structured information from a publication reference:
>>> from refextract import extract_journal_reference
>>> reference = extract_journal_reference('J.Phys.,A39,13445')
>>> print(reference)
{
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': u'',
    'page': u'13445',
    'title': u'J. Phys.',
    'type': 'JOURNAL',
    'volume': u'A39',
    'year': '',
}
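The returned dictionary can be turned back into a compact citation string. The following is a minimal sketch that works on a copy of the sample output above; `format_journal_reference` is a hypothetical helper, not part of refextract, and it assumes the `title`, `volume`, `page`, and `year` keys are present as in that output.

```python
def format_journal_reference(ref):
    """Build a short citation string from a parsed journal reference.

    Hypothetical helper, not part of refextract. Assumes the 'title',
    'volume', 'page' and 'year' keys shown in the sample output.
    """
    parts = [ref.get('title', ''), ref.get('volume', '')]
    if ref.get('year'):
        parts.append('({})'.format(ref['year']))
    parts.append(ref.get('page', ''))
    # Drop empty fields so missing values do not leave stray spaces.
    return ' '.join(part for part in parts if part)


# Sample dictionary copied from the output above.
sample = {
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': '',
    'page': '13445',
    'title': 'J. Phys.',
    'type': 'JOURNAL',
    'volume': 'A39',
    'year': '',
}

print(format_journal_reference(sample))
```

Because the `year` field is empty in this sample, the helper omits it rather than printing empty parentheses.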
To extract references from a PDF:
>>> from refextract import extract_references_from_file
>>> references = extract_references_from_file('1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}
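Every field in this sample output is a list of strings, even when it holds a single value. A minimal sketch of flattening such a record for display follows; `first_values` is a hypothetical helper, not part of refextract, and it assumes every value is a list, as in the output above.

```python
def first_values(reference):
    """Map each field to its first value, skipping empty lists.

    Hypothetical helper, not part of refextract.
    """
    return {key: values[0] for key, values in reference.items() if values}


# A subset of the sample record shown above.
sample = {
    'author': ['F. Englert and R. Brout'],
    'journal_title': ['Phys. Rev. Lett.'],
    'journal_volume': ['13'],
    'journal_year': ['1964'],
    'journal_page': ['321'],
}

flat = first_values(sample)
citation = '{author}, {journal_title} {journal_volume} ({journal_year}) {journal_page}'.format(**flat)
print(citation)
```

Taking only the first value discards any additional entries, so this suits display purposes rather than lossless processing.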
To extract directly from a URL:
>>> from refextract import extract_references_from_url
>>> references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
>>> print(references[0])
{
    'author': [u'F. Englert and R. Brout'],
    'doi': [u'doi:10.1103/PhysRevLett.13.321'],
    'journal_page': [u'321'],
    'journal_reference': [u'Phys. Rev. Lett. 13 (1964) 321'],
    'journal_title': [u'Phys. Rev. Lett.'],
    'journal_volume': [u'13'],
    'journal_year': [u'1964'],
    'linemarker': [u'1'],
    'raw_ref': [u'[1] F. Englert and R. Brout, \u201cBroken symmetry and the mass of gauge vector mesons\u201d, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
    'texkey': [u'Englert:1964et'],
    'year': [u'1964'],
}
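In these examples the extraction functions return a list of such dictionaries, so a field like `doi` can be collected across the whole list. The sketch below assumes that the `doi` key may be absent from some entries and that values may carry the `doi:` prefix seen above; `collect_dois` is a hypothetical helper, not part of refextract, and the second list entry is made up for illustration.

```python
def collect_dois(references):
    """Return the unique DOIs found in a list of extracted references.

    Hypothetical helper, not part of refextract. Strips a leading
    'doi:' prefix and preserves first-seen order.
    """
    dois = []
    for reference in references:
        for doi in reference.get('doi', []):
            if doi.lower().startswith('doi:'):
                doi = doi[len('doi:'):]
            if doi not in dois:
                dois.append(doi)
    return dois


references = [
    # Fields copied from the sample output above.
    {'doi': ['doi:10.1103/PhysRevLett.13.321'], 'linemarker': ['1']},
    # Made-up entry without a DOI, for illustration only.
    {'linemarker': ['2']},
]

print(collect_dois(references))
```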
Notes
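refextract depends on pdftotext, a command-line utility distributed with Poppler and Xpdf. It is not installed by pip and must be installed separately.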
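A quick way to check for the dependency before extracting from a PDF is to look for the executable on the `PATH`. This is only a sketch: `pdftotext_available` is a hypothetical helper, not part of refextract, and how refextract itself locates the binary is not stated here, so treat the `PATH` lookup as an assumption.

```python
import shutil


def pdftotext_available(lookup=shutil.which):
    """Return True if a 'pdftotext' executable can be located.

    Hypothetical helper, not part of refextract. The lookup function
    is injectable so the check can be exercised without the binary.
    """
    return lookup('pdftotext') is not None


if not pdftotext_available():
    print('pdftotext was not found on PATH; install it before using refextract on PDFs.')
```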
Acknowledgments
refextract is based on code and ideas from the following people, who contributed to the docextract module in Invenio:
Alessio Deiana
Federico Poli
Gerrit Rindermann
Graham R. Armstrong
Grzegorz Szpura
Jan Aage Lavik
Javier Martin Montull
Micha Moskovic
Samuele Kaplun
Thorsten Schwander
Tibor Simko
License
GPLv2
File details
Details for the file refextract-1.1.1.tar.gz.
File metadata
- Download URL: refextract-1.1.1.tar.gz
- Upload date:
- Size: 6.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 077d24cc7b5ac059f6bb01f2919694561740e27aec075ddc46b92d0a22d29cab
MD5 | 259df0cc7186a75bb4a3d6639f79d9c3
BLAKE2b-256 | f916bb644c20c1f24c8a6095a2fa446f680effce8ad270df9add4d04f204cfde
File details
Details for the file refextract-1.1.1-py3-none-any.whl.
File metadata
- Download URL: refextract-1.1.1-py3-none-any.whl
- Upload date:
- Size: 355.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.0 CPython/3.8.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | ffad7b7ee90671afe0363f385dc53ffd031ea2bccf32ffff3e8e79ae2cb8e86c
MD5 | 543e9917592892618387f7a5724707c7
BLAKE2b-256 | 53e72dc09da761236cb3c9b9d21c53451e72cae5132eb7933a0aa896c26cc20a