Extract survey content and bibliography from arXiv papers

Project description

bibextract

A Python package (with a Rust backend) for extracting survey content and bibliography from arXiv papers.

There are already a lot of arXiv MCP tools. This is another one.

What sets it apart is that it extracts content directly from the paper's LaTeX source rather than parsing the PDF.
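As a rough illustration of working from the LaTeX source, the sketch below lists the `.tex` and `.bbl` files inside an arXiv source archive. It is not bibextract's (Rust) implementation: `find_source_files` is a hypothetical helper, and it assumes the archive bytes have already been downloaded (arXiv typically serves sources at `https://arxiv.org/e-print/<id>`) and form a tarball, possibly gzip-compressed. Submissions that consist of a single gzipped `.tex` file are not handled.

```python
import io
import tarfile
from typing import Dict, List


def find_source_files(archive: bytes) -> Dict[str, List[str]]:
    """Group the .tex and .bbl members of a source tarball by extension.

    Hypothetical helper: assumes `archive` holds a tar file, optionally
    gzip-compressed; mode "r:*" lets tarfile detect the compression.
    """
    found: Dict[str, List[str]] = {"tex": [], "bbl": []}
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, etc.
            extension = member.name.rsplit(".", 1)[-1].lower()
            if extension in found:
                found[extension].append(member.name)
    return found
```

In a fuller pipeline you would fetch the bytes first (for example with `urllib.request.urlopen`) and then read the chosen members with `tar.extractfile`.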

bibextract also focuses exclusively on survey, background, and related-work sections; all other sections are currently ignored.
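A minimal sketch of what restricting extraction to survey-like sections can look like: scan `\section{...}` headings and keep the ones whose titles resemble related work. The title keywords and the `survey_sections` helper are assumptions for illustration; bibextract's actual heuristics may differ, and this version ignores `\input` files and `\subsection`-level structure.

```python
import re
from typing import List, Tuple

# Assumed heading keywords; bibextract's real heuristics may differ.
SURVEY_TITLES = re.compile(r"related work|background|survey|literature review", re.IGNORECASE)
# Matches \section{Title} and \section*{Title}, but not \subsection{...}.
SECTION_RE = re.compile(r"\\section\*?\{([^}]*)\}")


def survey_sections(latex: str) -> List[Tuple[str, str]]:
    """Return (title, body) pairs for top-level sections that look like related work."""
    matches = list(SECTION_RE.finditer(latex))
    results = []
    for i, match in enumerate(matches):
        title = match.group(1).strip()
        if not SURVEY_TITLES.search(title):
            continue
        # The body runs until the next \section, or to the end of the input.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(latex)
        results.append((title, latex[match.end():end].strip()))
    return results
```

Keyword matching on titles is cheap but brittle: a section titled "Prior Art" would be missed, which is one reason a real tool needs broader heuristics.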

After extracting the content, it reads the paper's BBL file and tries to reconstruct the corresponding .bib file, normalising the entries along the way. Not all BBL files are supported (see tests/bbls for examples). Once it has a title, author, and year for an entry, it tries to look up the paper's arXiv ID or DOI and uses that in the BibTeX entry instead of the raw entry from the BBL file.
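The first step of the BBL handling can be sketched as splitting the file into `\bibitem` entries keyed by citation key. `split_bbl` is a hypothetical helper, not bibextract's parser; it assumes classic `thebibliography` or natbib-style output (`\bibitem[label]{key}`), so biblatex `.bbl` files, which use `\entry` blocks instead, are out of scope.

```python
import re
from typing import Dict

# \bibitem{key} or natbib-style \bibitem[label]{key}; biblatex's \entry is not handled.
BIBITEM_RE = re.compile(r"\\bibitem(?:\[[^\]]*\])?\{([^}]*)\}")


def split_bbl(bbl: str) -> Dict[str, str]:
    """Map each citation key in a .bbl file to its raw, whitespace-collapsed entry text."""
    matches = list(BIBITEM_RE.finditer(bbl))
    entries: Dict[str, str] = {}
    for i, match in enumerate(matches):
        # Each entry runs until the next \bibitem, or to the end of the file.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(bbl)
        text = bbl[match.end():end].replace(r"\end{thebibliography}", "")
        entries[match.group(1).strip()] = " ".join(text.split())
    return entries
```

Extracting a title, authors, and year from each raw entry is the harder, style-dependent part, which is likely why some BBL files do not work.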

Because citations are normalised, you can pass multiple papers at once: bibextract extracts the related-work content and bibliography from each of them and merges everything into a single output with little duplication.
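One way to picture how normalised citations allow several bibliographies to be merged is to deduplicate on a stable key: the arXiv ID if known, then the DOI, then a normalised title. The sketch assumes entries are plain dicts with hypothetical `title`, `arxiv`, and `doi` fields; it is not bibextract's data model.

```python
import re
import unicodedata
from typing import Dict, List


def normalise_title(title: str) -> str:
    """Lower-case and strip accents, braces and punctuation so near-identical titles match."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def dedup_key(entry: Dict[str, str]) -> str:
    """Prefer a stable identifier (arXiv ID, then DOI) and fall back to the title."""
    if entry.get("arxiv"):
        return "arxiv:" + entry["arxiv"].lower()
    if entry.get("doi"):
        return "doi:" + entry["doi"].lower()
    return "title:" + normalise_title(entry.get("title", ""))


def merge_bibliographies(*bibliographies: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Merge entry lists from several papers, keeping the first copy of each work."""
    seen = set()
    merged = []
    for bibliography in bibliographies:
        for entry in bibliography:
            key = dedup_key(entry)
            if key not in seen:
                seen.add(key)
                merged.append(entry)
    return merged
```

This single-key scheme misses a work that has an arXiv ID in one bibliography but only a title in another; indexing each entry under all of its available keys would close that gap.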

The goal of this tool is to make it easy for LLM agents to read, cite, and write background sections of papers. In a loop, an agent could read a paper, extract its related-work section, and then use the arXiv IDs cited there to extract the related-work sections of those papers, and so on. This way, you can build a large corpus of related-work content without manually searching for papers.
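The agent loop described above can be sketched as a breadth-first crawl. `crawl` is a hypothetical helper; its `extract` argument is meant to be something like `bibextract.extract_survey`, which takes a list of paper IDs and returns a dict with a `"bibtex"` key. That the generated BibTeX contains arXiv IDs in a form this regular expression finds (new-style `YYMM.NNNNN` identifiers) is an assumption.

```python
import re
from collections import deque
from typing import Callable, Dict, List

# New-style arXiv IDs (YYMM.NNNN or YYMM.NNNNN, optional version suffix);
# old-style IDs such as hep-th/9901001 are not matched.
ARXIV_ID_RE = re.compile(r"\b(\d{4}\.\d{4,5})(?:v\d+)?\b")


def crawl(
    seeds: List[str],
    extract: Callable[[List[str]], Dict[str, str]],
    max_papers: int = 20,
) -> Dict[str, Dict[str, str]]:
    """Breadth-first walk: extract each paper, then queue the arXiv IDs its BibTeX cites."""
    queue = deque(seeds)
    visited: Dict[str, Dict[str, str]] = {}
    while queue and len(visited) < max_papers:
        paper_id = queue.popleft()
        if paper_id in visited:
            continue
        result = extract([paper_id])
        visited[paper_id] = result
        for cited in ARXIV_ID_RE.findall(result.get("bibtex", "")):
            if cited not in visited:
                queue.append(cited)
    return visited
```

For example, `crawl(["2104.08653"], bibextract.extract_survey, max_papers=5)` would process at most five papers. When running against arXiv, add a delay between requests, since arXiv asks automated clients to limit their request rate.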

Future TODOs

  • push to Smithery
  • improve test coverage
  • add more .bbl files to tests
  • improve the MCP docs for the tool
  • add a CLI binding to run directly with uvx

Installation

Installing via Smithery

To install bibextract for Claude Desktop automatically via Smithery:

npx -y @smithery/cli install @gautierdag/bibextract --client claude

FastMCP server implementation

uv run bibextract_mcp.py

FastMCP server from URL

# obviously check the file before running it; don't trust random scripts from the internet
uv run --python 3.12 https://raw.githubusercontent.com/gautierdag/bibextract/refs/heads/main/bibextract_mcp.py

From PyPI

uv add bibextract

From Source

  1. Install Rust (if not already installed):

    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
    source ~/.cargo/env
    
  2. Install maturin:

    pip install maturin
    
  3. Clone and build:

    git clone https://github.com/gautierdag/bibextract.git
    cd bibextract
    maturin develop
    

Usage

Python API

import bibextract

# Process one or more arXiv papers
result = bibextract.extract_survey(['2104.08653', '1912.02292'])

# Access the extracted content
survey_text = result['survey_text']  # Raw LaTeX with sections
bibtex = result['bibtex']           # BibTeX bibliography

# Save to files
with open('survey.tex', 'w', encoding='utf-8') as f:
    f.write(survey_text)

with open('bibliography.bib', 'w', encoding='utf-8') as f:
    f.write(bibtex)

Command Line (original Rust binary)

# Build the CLI tool
cargo build --release

# Process papers
./target/release/bibextract --paper-ids 2104.08653 1912.02292 --output survey.tex

Development

Running Tests

cargo test
pytest tests

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

  • bibextract-0.1.1.tar.gz (69.2 kB)

Built Distributions

  • bibextract-0.1.1-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB): PyPy, manylinux (glibc 2.17+) x86-64
  • bibextract-0.1.1-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB): PyPy, manylinux (glibc 2.17+) x86-64
  • bibextract-0.1.1-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB): CPython 3.14, manylinux (glibc 2.17+) x86-64
  • bibextract-0.1.1-cp313-cp313-win_amd64.whl (1.9 MB): CPython 3.13, Windows x86-64
  • bibextract-0.1.1-cp313-cp313-win32.whl (1.8 MB): CPython 3.13, Windows x86
  • bibextract-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB): CPython 3.13, manylinux (glibc 2.17+) x86-64
  • bibextract-0.1.1-cp313-cp313-macosx_11_0_arm64.whl (1.9 MB): CPython 3.13, macOS 11.0+ ARM64
  • bibextract-0.1.1-cp313-cp313-macosx_10_12_x86_64.whl (2.0 MB): CPython 3.13, macOS 10.12+ x86-64
  • bibextract-0.1.1-cp312-cp312-win_amd64.whl (1.9 MB): CPython 3.12, Windows x86-64
  • bibextract-0.1.1-cp312-cp312-win32.whl (1.8 MB): CPython 3.12, Windows x86
  • bibextract-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB): CPython 3.12, manylinux (glibc 2.17+) x86-64
  • bibextract-0.1.1-cp312-cp312-macosx_11_0_arm64.whl (1.9 MB): CPython 3.12, macOS 11.0+ ARM64
  • bibextract-0.1.1-cp312-cp312-macosx_10_12_x86_64.whl (2.0 MB): CPython 3.12, macOS 10.12+ x86-64
  • bibextract-0.1.1-cp311-cp311-win_amd64.whl (1.9 MB): CPython 3.11, Windows x86-64
  • bibextract-0.1.1-cp311-cp311-win32.whl (1.8 MB): CPython 3.11, Windows x86
  • bibextract-0.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB): CPython 3.11, manylinux (glibc 2.17+) x86-64
  • bibextract-0.1.1-cp311-cp311-macosx_11_0_arm64.whl (1.9 MB): CPython 3.11, macOS 11.0+ ARM64
  • bibextract-0.1.1-cp311-cp311-macosx_10_12_x86_64.whl (2.0 MB): CPython 3.11, macOS 10.12+ x86-64
  • bibextract-0.1.1-cp310-cp310-win_amd64.whl (1.9 MB): CPython 3.10, Windows x86-64
  • bibextract-0.1.1-cp310-cp310-win32.whl (1.8 MB): CPython 3.10, Windows x86
  • bibextract-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB): CPython 3.10, manylinux (glibc 2.17+) x86-64
  • bibextract-0.1.1-cp39-cp39-win_amd64.whl (1.9 MB): CPython 3.9, Windows x86-64
  • bibextract-0.1.1-cp39-cp39-win32.whl (1.8 MB): CPython 3.9, Windows x86
  • bibextract-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB): CPython 3.9, manylinux (glibc 2.17+) x86-64
  • bibextract-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.3 MB): CPython 3.8, manylinux (glibc 2.17+) x86-64

File details

Details for the file bibextract-0.1.1.tar.gz.

File metadata

  • Download URL: bibextract-0.1.1.tar.gz
  • Size: 69.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.0

File hashes

Hashes for bibextract-0.1.1.tar.gz
Algorithm Hash digest
SHA256 0ec661c1348f3876665bf19441056bba936e947a5fc87ae6e6015d5d5d30ebff
MD5 e2749348c32834cdf8e80a0c98d922c9
BLAKE2b-256 09e55c4d124d70ae15f2a8f04ac378392a02b894fc6c293227eeec6e46a5161f

File details

Details for the file bibextract-0.1.1-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-pp311-pypy311_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b90a966f23c65e63a18035514dd27ffd10c591ebfd36acbf3742c7f570ac5a51
MD5 10c09203c6b7d5333c7bbb70b29fbd66
BLAKE2b-256 97393d609d254adb1a5124a27286839f28055b1de25a3f4feccb0fffa5e50a09

File details

Details for the file bibextract-0.1.1-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3e92ebd368d12ca7a082e8acc7e45289cd7d42d0c5a50f33d3543e9103f6cc3a
MD5 06ecd231bb036a630974ba5949bb4961
BLAKE2b-256 20d10136e1653df4c943ad710741f70cbf804ca8e5f297277b7ff97acee39f92

File details

Details for the file bibextract-0.1.1-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-cp314-cp314-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 939fcc414d8602b113ced9ad6e4cc7ef279b8b5b614f1c263f68b1e096721bd2
MD5 eca7fff53f88a70d03b940265375e8bc
BLAKE2b-256 10383cd5e07565b0eb973d706c9e39143d4817d18838d1aaf76cf67085a8eaa6

File details

Details for the file bibextract-0.1.1-cp313-cp313-win_amd64.whl.

File hashes

Hashes for bibextract-0.1.1-cp313-cp313-win_amd64.whl
Algorithm Hash digest
SHA256 14d2981a2773a47750d2413679b8e35004ddcf2e88d1afc2b88fce19733273de
MD5 0ad9408f0972a0e2314e7e49ad2bad7f
BLAKE2b-256 67d3c154e6dedd46325411fc67ecd716c94a4743d00e3776ead6ffcd27b13fc3

File details

Details for the file bibextract-0.1.1-cp313-cp313-win32.whl.

File metadata

  • Download URL: bibextract-0.1.1-cp313-cp313-win32.whl
  • Size: 1.8 MB
  • Tags: CPython 3.13, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.0

File hashes

Hashes for bibextract-0.1.1-cp313-cp313-win32.whl
Algorithm Hash digest
SHA256 c1c3ad2d2de8c1fc18288908c079856028c41438df54f261a0a0f2262d89d636
MD5 9ad7efafc3161855270e4bba3e918faf
BLAKE2b-256 3718c4764585a18fff77ac46e5688a407e43682b27c182c16ee817fa65af7df8

File details

Details for the file bibextract-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c950b960dd9d677d1dce920fa10f882c462cd30f3f658eac9655e3ed14dd9c79
MD5 ecbdf6099a41c568743f2e3d159d6bad
BLAKE2b-256 2976d2f26648b047a66a471ec3365c5d67ba6c93b8a32acee1a6a8fd35e6a70b

File details

Details for the file bibextract-0.1.1-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for bibextract-0.1.1-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4ffe313b1407edfdcb6134d7510747f518ed669e9cd617b163dce33d05066197
MD5 56d86c69e1980b1c24d5b25d205b9524
BLAKE2b-256 01de83e5f7ecb94d18368814ab65e2c7b3c436196d00345c077ff56cff41a508

File details

Details for the file bibextract-0.1.1-cp313-cp313-macosx_10_12_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-cp313-cp313-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 b36869b37588419b2e056a6c047a4a89e889b36a28fbe5e3ba08514bb6e3443f
MD5 ca53866d53ad96287f0261431b4ffdfe
BLAKE2b-256 676178fc9b32ed41ef1e9b7cc9570ee6d71541e57abf879022167e85cf10e61c

File details

Details for the file bibextract-0.1.1-cp312-cp312-win_amd64.whl.

File hashes

Hashes for bibextract-0.1.1-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 b1c47380a47b1db8b595a25a5e2d3d6c3345378f8a44d203f76dc82a8adc028b
MD5 9c02d22f72074e214db3bd0f41e818da
BLAKE2b-256 a10e0526a2adaa3b49b60667b020d9f9684ef9ec05cfedc06cd423b6c36d32c9

File details

Details for the file bibextract-0.1.1-cp312-cp312-win32.whl.

File metadata

  • Download URL: bibextract-0.1.1-cp312-cp312-win32.whl
  • Size: 1.8 MB
  • Tags: CPython 3.12, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.0

File hashes

Hashes for bibextract-0.1.1-cp312-cp312-win32.whl
Algorithm Hash digest
SHA256 3696e304e3c8f32dc6cc7bccf6626d727b3a7224c0ae4bf1393c19f8ce81f946
MD5 0c8b6c69128e1d52866aed0f2ce0abda
BLAKE2b-256 0a208593ce6e0166945d6a58401fbdd99a2ae8fb7026d8a441aaaba98eb480c3

File details

Details for the file bibextract-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 9189a28e14e81a98ecfeb2d8bcbf1906415003f2e1e9ed37088928d89de84c5c
MD5 7b048de1708a0f9636d6dce3b6ccc646
BLAKE2b-256 e3481a7ac6701a95165686798ef4d4e3b0a7296c8b3c1152b04088dccbc41565

File details

Details for the file bibextract-0.1.1-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for bibextract-0.1.1-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 bdc8aa089ecc38200ebff5053b5ae771bdcbb80ff556ef8723bad82107f5cf04
MD5 2d80f85bef27941c495e4f2fd092b9f4
BLAKE2b-256 0920f3aa426e3ca9c656389f1dca405e8f20fccef1dba3b5088265051431e544

File details

Details for the file bibextract-0.1.1-cp312-cp312-macosx_10_12_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-cp312-cp312-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 fb4ac154c774af187624e953b201c6ebd15a942dedc3db3eb8eaf9982db3b174
MD5 cbaf000a55dee6dc61359cd36c4b1554
BLAKE2b-256 289132fb9dabb12a8ffc33b2cdf4b3db13c459ebfb27c2347edf4da49057ae91

File details

Details for the file bibextract-0.1.1-cp311-cp311-win_amd64.whl.

File hashes

Hashes for bibextract-0.1.1-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 5a664888373f6d34fae350945e10c45d289e6e2f7be5bf6977c0df06ef659a23
MD5 2808daf7a839645ae3c8d650364b8e3c
BLAKE2b-256 c458f6e636afec57462fa3456e016e92341c545993cc4403bee0dac2493316a6

File details

Details for the file bibextract-0.1.1-cp311-cp311-win32.whl.

File metadata

  • Download URL: bibextract-0.1.1-cp311-cp311-win32.whl
  • Size: 1.8 MB
  • Tags: CPython 3.11, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.0

File hashes

Hashes for bibextract-0.1.1-cp311-cp311-win32.whl
Algorithm Hash digest
SHA256 f7983ab8da7ce0c48e223dcd16f5b66e5b822c093072ed3da4774a2e0f291989
MD5 a90492f85a460e5cd11fc0db150615ba
BLAKE2b-256 185a73d38f697195d1eb4d5bf17d6857aeedc8933938e51f65b99bb155d241c5

File details

Details for the file bibextract-0.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 64efa54aab9ed8db93fa165e4634a323902581dc5acec415f297ad6f14c5cba0
MD5 bf141cdd7ac8ec9ea8f45271aad1215c
BLAKE2b-256 805f102a42b672733b21486541243556a1392a723bff6bbe86e209d23fd5bd73

File details

Details for the file bibextract-0.1.1-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for bibextract-0.1.1-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 32cbe7df869673579a73f81b0443bab2efa6b59509f470e64175b42f2b28ec85
MD5 0ea9af665e17ed1f311cd9e78b26a437
BLAKE2b-256 b1da3a6bd7a42e5c93ae1dac97ae26dc6f1319fb920f8952ae94e4e46eac2f3f

File details

Details for the file bibextract-0.1.1-cp311-cp311-macosx_10_12_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-cp311-cp311-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 2ddebc578271428a6c7503873eccf4c12d17d4fd9e3711116a42a801653ffcff
MD5 fb202996488d1a5f526dc7fe4e5b831e
BLAKE2b-256 f8902c3caca348cf36e43d366fd2414e9762d6b040f4ae350f9d2893d3957d98

File details

Details for the file bibextract-0.1.1-cp310-cp310-win_amd64.whl.

File hashes

Hashes for bibextract-0.1.1-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 7daf2970b8da2ce5cf1eb9cae5e6aa0e5ab3589c9798caae07fe38a58139c96a
MD5 3e233f90beacd000c0e104bc15a9fb88
BLAKE2b-256 463630f3adc4eecac5d8c3e136e91f8defb88dd09ffb9cedbd8a67b44af56ec9

File details

Details for the file bibextract-0.1.1-cp310-cp310-win32.whl.

File metadata

  • Download URL: bibextract-0.1.1-cp310-cp310-win32.whl
  • Size: 1.8 MB
  • Tags: CPython 3.10, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.0

File hashes

Hashes for bibextract-0.1.1-cp310-cp310-win32.whl
Algorithm Hash digest
SHA256 13c5b68e3ae2faad011e6013b6c10ea9dba3ee165c1195af94c816d6a881c62d
MD5 b1276861d2e589a61fd0af00f5702209
BLAKE2b-256 8c54ca34352085da957c0a4cde13f7f68c8ee6f385bda30b00bf03ff80776abf

File details

Details for the file bibextract-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3d185379437e68c56ec78bb9eee3331141e9fa74639b5a61da217ef945ad1d81
MD5 c120abf96bc8736657095aca191aa347
BLAKE2b-256 376385e354094b1ddbbe8a5ae3012c9fd92fe1c503bb9d8b2eaf91aa0b8a1f5d

File details

Details for the file bibextract-0.1.1-cp39-cp39-win_amd64.whl.

File hashes

Hashes for bibextract-0.1.1-cp39-cp39-win_amd64.whl
Algorithm Hash digest
SHA256 757a27969d403a009e44341fa67f8d69e4307fce42e91d7280cee7fda3169b28
MD5 d56d265482c48d5abd51845e9fe11df3
BLAKE2b-256 677b3030b34186f1d39152943b36b137170e6371566e3e0c447d1f1e6433e874

File details

Details for the file bibextract-0.1.1-cp39-cp39-win32.whl.

File metadata

  • Download URL: bibextract-0.1.1-cp39-cp39-win32.whl
  • Size: 1.8 MB
  • Tags: CPython 3.9, Windows x86
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.9.0

File hashes

Hashes for bibextract-0.1.1-cp39-cp39-win32.whl
Algorithm Hash digest
SHA256 2a733260caa80736b64cbc40669ded026efb3aca84c64d42c87739d45e0262f2
MD5 0dc141c9b64ab7de463476292335816f
BLAKE2b-256 93b2c143df1d1ad197a87f127a274cf70e2857adb627e67738925d67a32717e0

File details

Details for the file bibextract-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 32fb7d8c59e9b784a2b844a84be3812f39426046637d0c72878beabe1260a025
MD5 2af508763a23d49f61794a9e8ea7fd94
BLAKE2b-256 ba3a6c0c15f9a5d5c654e20f74b04e3111d75a7ee132a7dd8df80385119b3d8b

File details

Details for the file bibextract-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bibextract-0.1.1-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8c2a7bec5f47f324a0abf7ceb694f9d8c4bd5ffd50200dd5351fb0214208b487
MD5 e3378a2eb1e644077d723bd68c8f6123
BLAKE2b-256 55bff0ea6348fdd87c4e767c347dbbc1510f6dd890153c04d807cdd5f4367446
