A command-line tool that converts a BibTeX file to a TSV of author-paper pairs for NSF COA forms

bib2coa

A command-line tool and Python library that converts a BibTeX file (.bib) into a tab-separated file (TSV) of unique author–paper pairs — designed to jumpstart the creation of NSF-compliant Collaborators & Other Affiliations (COA) documents.

What does this do?

  • Parses a .bib file and extracts every author and their associated paper title + year.
  • Merges duplicate entries (same year + title) and keeps each author only once, paired with the first paper in which they appear.
  • Writes a clean two-column TSV: author<TAB>year - paper title.
  • Optionally prints summary statistics (number of entries, unique papers, unique authors).
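
The steps above can be sketched in plain Python. This is a minimal illustration, not bib2coa's actual implementation: the entries list stands in for already-parsed BibTeX records, the helper names clean_title and author_paper_pairs are hypothetical, and it assumes authors are separated by the BibTeX keyword "and".

```python
import re


def clean_title(title):
    """Strip LaTeX braces and collapse runs of whitespace."""
    no_braces = title.replace("{", "").replace("}", "")
    return re.sub(r"\s+", " ", no_braces).strip()


def author_paper_pairs(entries):
    """Return (pairs, stats); each author is kept once, with their first paper.

    `entries` is a list of dicts with 'author', 'title' and 'year' keys
    (an assumed stand-in for parsed BibTeX records).
    """
    seen_papers = set()
    seen_authors = set()
    pairs = []
    for entry in entries:
        label = f"{entry['year']} - {clean_title(entry['title'])}"
        if label in seen_papers:
            continue  # same year + title: treat as a duplicate entry
        seen_papers.add(label)
        for author in re.split(r"\s+and\s+", entry["author"].strip()):
            if author not in seen_authors:
                seen_authors.add(author)
                pairs.append((author, label))
    stats = {
        "num_entries": len(entries),
        "num_unique_papers": len(seen_papers),
        "num_unique_authors": len(seen_authors),
    }
    return pairs, stats


entries = [
    {"author": "Smith, John and Doe, Jane",
     "title": "A study of {protein} folding", "year": "2020"},
    {"author": "Smith, John and Doe, Jane",
     "title": "A study of protein folding", "year": "2020"},  # duplicate
    {"author": "Garcia, Maria and Smith, John",
     "title": "Phase separation in the cell", "year": "2021"},
]
pairs, stats = author_paper_pairs(entries)
for author, label in pairs:
    print(f"{author}\t{label}")
```

Note that "Smith, John" appears in two papers but is emitted only once, with the 2020 paper that was encountered first.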

What does this NOT do?

It does not automatically look up author affiliations, institutions, or any other metadata beyond what is in the BibTeX file. You will still need to fill those in manually — but this tool saves you from copying and pasting author names out of PDFs.

Installation

From PyPI (stable release)

pip install bib2coa

From source

git clone https://github.com/holehouse-lab/bib2coa.git
cd bib2coa
pip install .

Requires Python 3.6+ and the bibtexparser package (installed automatically).

Command-line usage

Basic usage

bib2coa my_references.bib

This reads my_references.bib and writes coa_initial.tsv in the current directory.

Custom output filename

bib2coa my_references.bib --output my_coa.tsv

Show summary statistics

bib2coa my_references.bib --stats

Example output:

+----------------------+
| Summary statistics   |
+----------------------+
Number of bibtex entries     : 42
Number of unique paper names : 38
Number of unique authors     : 107
Wrote 107 author-paper pairs to coa_initial.tsv

Show version

bib2coa --version

All options

Flag                 Description
file (positional)    Path to the BibTeX .bib file
--output FILENAME    Output TSV filename (default: coa_initial.tsv)
--stats              Print summary statistics after processing
--version            Show version and exit
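
To see how these options combine, here is a hypothetical argparse reconstruction of the interface described in the table. It is a sketch for illustration only and is not taken from the bib2coa source; the function name build_parser and the version string are placeholders.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="bib2coa",
        description="Convert a BibTeX file to a TSV of author-paper pairs.",
    )
    parser.add_argument("file", help="Path to the BibTeX .bib file")
    parser.add_argument(
        "--output",
        metavar="FILENAME",
        default="coa_initial.tsv",
        help="Output TSV filename (default: coa_initial.tsv)",
    )
    parser.add_argument(
        "--stats",
        action="store_true",
        help="Print summary statistics after processing",
    )
    # Placeholder version string; the real tool reports its own version.
    parser.add_argument("--version", action="version", version="%(prog)s X.Y")
    return parser


# Equivalent to running: bib2coa nsf.bib --stats
args = build_parser().parse_args(["nsf.bib", "--stats"])
print(args.file, args.output, args.stats)
```

With no --output given, the output filename falls back to the documented default, coa_initial.tsv.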

Python API

You can also use bib2coa as a library in your own scripts:

from bib2coa import parse_bibtex, write_tsv

# Parse a BibTeX file
unique_pairs, stats = parse_bibtex("my_references.bib")

# unique_pairs is a list of (author, paper_label) tuples
for author, paper in unique_pairs:
    print(f"{author}  ->  {paper}")

# stats is a dict with keys:
#   'num_entries', 'num_unique_papers', 'num_unique_authors'
print(f"Found {stats['num_unique_authors']} unique authors")

# Write to a TSV file
write_tsv(unique_pairs, "coa_initial.tsv")

parse_bibtex(filepath)

Parse a BibTeX file and return unique author–paper pairs.

Parameters:

  • filepath (str) — Path to a .bib file.

Returns:

  • unique_pairs (list of tuple) — Each tuple is (author_name, paper_label) where paper_label is formatted as "YEAR - Title".
  • stats (dict) — Keys: num_entries, num_unique_papers, num_unique_authors.

Raises:

  • Bib2COAException — If the file does not exist or cannot be parsed.

write_tsv(unique_pairs, output_path)

Write author–paper pairs to a tab-separated file.

Parameters:

  • unique_pairs (list of tuple) — Output from parse_bibtex().
  • output_path (str) — Path for the output TSV file.
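
For readers who want to adjust the output themselves, the following sketch produces a file of the same documented shape (one author<TAB>paper label pair per line) using only the standard library. The function name write_pairs_tsv is hypothetical, and this is not necessarily how write_tsv() is implemented internally.

```python
import csv
import os
import tempfile


def write_pairs_tsv(pairs, output_path):
    """Write (author, paper_label) tuples as tab-separated lines."""
    with open(output_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(pairs)


pairs = [
    ("Smith, John", "2020 - A study of protein folding"),
    ("Doe, Jane", "2020 - A study of protein folding"),
]

# Write to a temporary directory so the example leaves no files behind
# in the current working directory.
output_path = os.path.join(tempfile.mkdtemp(), "coa_initial.tsv")
write_pairs_tsv(pairs, output_path)

with open(output_path, encoding="utf-8") as handle:
    content = handle.read()
print(content)
```

Because the delimiter is a tab, the commas inside author names such as "Smith, John" need no quoting.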

Bib2COAException

Custom exception raised when a file is missing or a BibTeX file cannot be parsed.

Output format

The output TSV has two columns separated by a tab character:

Smith, John     2020 - A study of protein folding
Doe, Jane       2020 - A study of protein folding
Garcia, Maria   2021 - Phase separation in the cell
Kumar, Raj      2019 - Intrinsically disordered regions

  • Column 1: Author name (as it appears in the BibTeX file).
  • Column 2: Year and cleaned title (LaTeX braces {} are stripped).

Each author appears exactly once, associated with the first paper where they were encountered.

Recommended workflow

Below is the workflow we recommend for making an NSF-compliant COA file.

1. Build your references

Use your reference manager (we recommend Paperpile) to select all the references you want to process and export them as a single BibTeX file.

In Paperpile, select all the papers you want to include, copy their citations to your clipboard in BibTeX format, paste them into a plain-text file, and save it with a .bib extension (e.g. nsf.bib).

2. Run bib2coa

bib2coa --stats nsf.bib

This generates coa_initial.tsv.

3. Edit in a spreadsheet

Open coa_initial.tsv in Excel, Google Sheets, or Numbers using a tab delimiter. From there, reorganize and add affiliations to match the appropriate table in the NSF COA template.
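
If you would rather prepare the spreadsheet layout in code before opening it, the sketch below sorts the rows by author name and adds an empty affiliation column to fill in by hand. The function name spreadsheet_rows and the column headers are placeholders chosen for illustration; they are not part of bib2coa and are not the official NSF column names, so check them against the current COA template.

```python
import csv
import io


def spreadsheet_rows(tsv_text):
    """Turn bib2coa-style TSV text into sorted rows with a blank affiliation column."""
    rows = []
    for author, label in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        # The label is documented as "YEAR - Title"; take the part before " - ".
        year = label.split(" - ", 1)[0]
        rows.append([author, "", year, label])
    rows.sort(key=lambda row: row[0].casefold())
    header = ["Name", "Affiliation (fill in)", "Year", "Source paper"]
    return [header] + rows


tsv_text = (
    "Smith, John\t2020 - A study of protein folding\n"
    "Doe, Jane\t2020 - A study of protein folding\n"
    "Garcia, Maria\t2021 - Phase separation in the cell\n"
)
table = spreadsheet_rows(tsv_text)
for row in table:
    print("\t".join(row))
```

Because each author is paired with the first paper in which they were encountered, the year shown may not be the most recent one for that collaborator; verify dates by hand where the template asks for them.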

License

MIT
