Small library for extracting references used in scholarly communication.
refextract
About
A library for extracting references used in scholarly communication.
Getting Started
Note: because this library relies on mmap resize functionality, it cannot be installed locally on macOS.
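The limitation in the note above can be probed with Python's standard mmap module, whose resize() method is documented as available only on Windows and on systems that provide mremap(). The following is a minimal sketch, not part of refextract: the helper name mmap_resize_supported is hypothetical, and it is an assumption that refextract hits the limitation through an equivalent call.

```python
import mmap
import tempfile


def mmap_resize_supported():
    """Return True if a file-backed mmap can be resized on this platform."""
    with tempfile.TemporaryFile() as handle:
        # The file needs real content before it can be mapped.
        handle.write(b'\0' * mmap.PAGESIZE)
        handle.flush()
        mapped = mmap.mmap(handle.fileno(), mmap.PAGESIZE)
        try:
            # Platforms without mremap() raise here instead of resizing.
            mapped.resize(2 * mmap.PAGESIZE)
            return True
        except (SystemError, OSError, AttributeError, ValueError):
            return False
        finally:
            mapped.close()


supported = mmap_resize_supported()
print('mmap resize supported:', supported)
```

If this prints False on your machine, the Docker setup described below is the likely route.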
Docker Setup:
Before the first use, and whenever a library or dependency changes, build a new Docker image with:
docker build --target refextract-tests -t refextract .
After that, spin up a refextract service with:
docker run -it --name refextract -v ./tests:/refextract/tests -v ./refextract:/refextract/refextract refextract
Running tests
Open a shell in the running container with:
docker exec -it refextract /bin/bash
Then simply run
pytest .
Usage
To get structured information from a publication reference:
>>> from refextract import extract_journal_reference
>>> reference = extract_journal_reference('J.Phys.,A39,13445')
>>> print(reference)
{
'extra_ibids': [],
'is_ibid': False,
'misc_txt': '',
'page': '13445',
'title': 'J. Phys.',
'type': 'JOURNAL',
'volume': 'A39',
'year': '',
}
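Once a reference has been parsed, the resulting dictionary can be turned back into a normalized citation string. The sketch below assumes the dictionary has the keys shown in the output above; format_journal_reference is a hypothetical helper, not part of refextract, and the sample dictionary is copied from that output so the snippet runs without the library installed.

```python
def format_journal_reference(ref):
    """Build a compact citation string from a parsed journal reference."""
    parts = [ref.get('title', ''), ref.get('volume', '')]
    # The year is optional; wrap it in parentheses only when present.
    if ref.get('year'):
        parts.append('({})'.format(ref['year']))
    parts.append(ref.get('page', ''))
    # Drop empty pieces so missing fields do not leave stray spaces.
    return ' '.join(part for part in parts if part)


# Sample copied from the output shown above.
parsed = {
    'extra_ibids': [],
    'is_ibid': False,
    'misc_txt': '',
    'page': '13445',
    'title': 'J. Phys.',
    'type': 'JOURNAL',
    'volume': 'A39',
    'year': '',
}

citation = format_journal_reference(parsed)
print(citation)  # J. Phys. A39 13445
```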
To extract references from a PDF:
>>> from refextract import extract_references_from_file
>>> references = extract_references_from_file('1503.07589.pdf')
>>> print(references[0])
{
'author': ['F. Englert and R. Brout'],
'doi': ['doi:10.1103/PhysRevLett.13.321'],
'journal_page': ['321'],
'journal_reference': ['Phys. Rev. Lett. 13 (1964) 321'],
'journal_title': ['Phys. Rev. Lett.'],
'journal_volume': ['13'],
'journal_year': ['1964'],
'linemarker': ['1'],
'raw_ref': ['[1] F. Englert and R. Brout, “Broken symmetry and the mass of gauge vector mesons”, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
'texkey': ['Englert:1964et'],
'year': ['1964'],
}
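Every field in an extracted reference is a list, even when it holds a single value. A small helper can flatten a reference into a plain record for display or export. This is a sketch under the assumption that references have the shape shown above; first_values and its default key selection are hypothetical and not part of refextract.

```python
def first_values(reference, keys=('author', 'journal_title', 'journal_volume',
                                  'journal_year', 'journal_page', 'doi')):
    """Return the first value of each requested list-valued field."""
    # Fields that are missing or empty are skipped rather than set to None.
    return {key: reference[key][0] for key in keys if reference.get(key)}


# Sample trimmed from the output shown above.
reference = {
    'author': ['F. Englert and R. Brout'],
    'doi': ['doi:10.1103/PhysRevLett.13.321'],
    'journal_page': ['321'],
    'journal_title': ['Phys. Rev. Lett.'],
    'journal_volume': ['13'],
    'journal_year': ['1964'],
    'linemarker': ['1'],
}

record = first_values(reference)
print(record['journal_title'], record['journal_volume'], record['journal_page'])
```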
To extract directly from a URL:
>>> from refextract import extract_references_from_url
>>> references = extract_references_from_url('https://arxiv.org/pdf/1503.07589.pdf')
>>> print(references[0])
{
'author': ['F. Englert and R. Brout'],
'doi': ['doi:10.1103/PhysRevLett.13.321'],
'journal_page': ['321'],
'journal_reference': ['Phys. Rev. Lett. 13 (1964) 321'],
'journal_title': ['Phys. Rev. Lett.'],
'journal_volume': ['13'],
'journal_year': ['1964'],
'linemarker': ['1'],
'raw_ref': ['[1] F. Englert and R. Brout, “Broken symmetry and the mass of gauge vector mesons”, Phys. Rev. Lett. 13 (1964) 321, doi:10.1103/PhysRevLett.13.321.'],
'texkey': ['Englert:1964et'],
'year': ['1964'],
}
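Both extraction functions return a list of such dictionaries, so post-processing is the same in either case. As an illustration, the sketch below gathers the DOIs from a list of references, strips the 'doi:' prefix seen in the output above, and removes duplicates while keeping the original order. collect_dois is a hypothetical helper, and the second sample entry is invented purely to show a reference without a DOI.

```python
def collect_dois(references):
    """Return unique DOIs from extracted references, in first-seen order."""
    seen = set()
    dois = []
    for reference in references:
        for raw in reference.get('doi', []):
            # Values in the sample output carry a 'doi:' prefix; strip it.
            doi = raw[4:] if raw.lower().startswith('doi:') else raw
            if doi not in seen:
                seen.add(doi)
                dois.append(doi)
    return dois


references = [
    # Trimmed from the output shown above.
    {'doi': ['doi:10.1103/PhysRevLett.13.321'], 'linemarker': ['1']},
    # Invented entry: a reference for which no DOI was found.
    {'linemarker': ['2']},
]

dois = collect_dois(references)
print(dois)  # ['10.1103/PhysRevLett.13.321']
```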
Notes
refextract depends on
Acknowledgments
refextract is based on code and ideas from the following people, who
contributed to the docextract module in Invenio:
- Alessio Deiana
- Federico Poli
- Gerrit Rindermann
- Graham R. Armstrong
- Grzegorz Szpura
- Jan Aage Lavik
- Javier Martin Montull
- Micha Moskovic
- Samuele Kaplun
- Thorsten Schwander
- Tibor Simko
License
GPLv2
File details
Details for the file refextract-1.1.6.tar.gz.
File metadata
- Download URL: refextract-1.1.6.tar.gz
- Upload date:
- Size: 259.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1cfd235286f1e77af9992c493a3fab83bd3c6d69e91962f0c8c97dae45dc226 |
| MD5 | bee3ba760883bd8dce08ad1f9caaa216 |
| BLAKE2b-256 | f25dec25190dd00f7121eebcde4656402c59ee565f88adcee40e1c8f8e602c00 |
File details
Details for the file refextract-1.1.6-py3-none-any.whl.
File metadata
- Download URL: refextract-1.1.6-py3-none-any.whl
- Upload date:
- Size: 276.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8fab1374a91e264dc23fac81f3b7ab31fcd4bd970756b9d4417974640fa03e77 |
| MD5 | ec803f8993c3e2ec0220679ed4fac2a8 |
| BLAKE2b-256 | bc39f00089a804db6b1516568a7479a816dd413f2d12c526d65e746574634f97 |