Convert EndNote XML to CSV with streaming parse and TXT report.
EndNote Utils
Convert EndNote XML files into clean CSVs with automatic TXT reports.
Supports both Python API and command-line interface (CLI).
Features
- ✅ Parse one XML file (--xml) or an entire folder of *.xml (--folder)
- ✅ Streams <record> elements using iterparse (low memory usage)
- ✅ Extracts fields: database, ref_type, title, journal, authors, year, volume, number, abstract, doi, urls, extracted_date
- ✅ Adds a database column from the XML filename stem (IEEE.xml → IEEE)
- ✅ Normalizes DOI (10.xxxx → https://doi.org/...)
- ✅ Always generates a TXT report (default: <csv>_report.txt) with:
  - per-file counts (exported/skipped records)
  - totals, files processed
  - run timestamp & duration
- ✅ Auto-creates output folders if missing
- ✅ CLI options for CSV formatting, filters, verbosity
- ✅ Importable Python API for scripting & integration
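The DOI normalization and the filename-derived database column listed above can be illustrated with a small sketch. This is not the package's own code: `normalize_doi` and `database_from_path` are hypothetical helper names, and the set of prefixes stripped before normalizing is an assumption.

```python
from pathlib import Path

DOI_URL = "https://doi.org/"

# Prefixes stripped before normalizing (an assumption for this sketch).
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Turn a bare DOI such as 10.xxxx/yyy into a https://doi.org/ URL."""
    value = raw.strip()
    lowered = value.lower()
    for prefix in _KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):].strip()
            break
    if value.startswith("10."):
        return DOI_URL + value
    # Anything that does not look like a DOI is returned trimmed but unchanged.
    return raw.strip()


def database_from_path(xml_path: Path) -> str:
    """Use the filename stem as the database name (IEEE.xml -> IEEE)."""
    return xml_path.stem
```

Returning non-DOI values unchanged keeps the function safe to apply to every row, including rows where the DOI field is empty.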
Installation
From PyPI
pip install endnote-utils
Requires Python 3.8+.
Usage
Command Line
Single file
endnote-utils --xml data/IEEE.xml --csv output/ieee.csv
Folder with multiple files
endnote-utils --folder data/xmls --csv output/all_records.csv
Custom report path
endnote-utils \
--xml data/Scopus.xml \
--csv output/scopus.csv \
--report reports/scopus_run.txt
If --report is not provided, it defaults to <csv>_report.txt.
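As a rough illustration of that default, the sketch below derives a report path from the CSV path. The helper name `default_report_path` is hypothetical, and reading `<csv>` as the CSV filename without its extension is an assumption; the tool may name the file differently.

```python
from pathlib import Path
from typing import Optional


def default_report_path(csv_path: Path, report_path: Optional[Path] = None) -> Path:
    """Return the explicit report path, or <csv stem>_report.txt next to the CSV."""
    if report_path is not None:
        return report_path
    # Assumption: "<csv>" means the CSV filename with the extension removed.
    return csv_path.with_name(f"{csv_path.stem}_report.txt")
```

Under that assumption, `output/ieee.csv` would map to `output/ieee_report.txt`.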
CLI Options
| Option | Description | Default |
|---|---|---|
| --xml | Path to a single EndNote XML file | – |
| --folder | Path to a folder containing multiple *.xml files | – |
| --csv | Output CSV path | – |
| --report | Output TXT report path | <csv>_report.txt |
| --delimiter | CSV delimiter | , |
| --quoting | CSV quoting: minimal, all, nonnumeric, none | minimal |
| --no-header | Suppress CSV header row | – |
| --encoding | Output CSV encoding | utf-8 |
| --ref-type | Only include records with this ref_type name | – |
| --year | Only include records with this year | – |
| --max-records | Stop after N records per file (useful for testing) | – |
| --verbose | Verbose logging with debug details | – |
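The formatting options map naturally onto the standard library's `csv` module. The sketch below shows one way the four `--quoting` names could correspond to the `csv.QUOTE_*` constants; `build_parser` and `write_rows` are hypothetical names, and the real CLI may wire these options differently.

```python
import argparse
import csv
import io

# Option value -> csv module constant (the constants are standard library).
QUOTING_MODES = {
    "minimal": csv.QUOTE_MINIMAL,
    "all": csv.QUOTE_ALL,
    "nonnumeric": csv.QUOTE_NONNUMERIC,
    "none": csv.QUOTE_NONE,
}


def build_parser() -> argparse.ArgumentParser:
    """Parser covering only the CSV formatting options from the table."""
    parser = argparse.ArgumentParser(prog="endnote-utils-sketch")
    parser.add_argument("--delimiter", default=",")
    parser.add_argument("--quoting", choices=sorted(QUOTING_MODES), default="minimal")
    parser.add_argument("--no-header", action="store_true")
    return parser


def write_rows(rows, fieldnames, args) -> str:
    """Write dict rows to an in-memory CSV using the parsed options."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter=args.delimiter,
        quoting=QUOTING_MODES[args.quoting],
        lineterminator="\n",
    )
    if not args.no_header:
        writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Note that `csv.QUOTE_NONE` raises an error when a field contains the delimiter and no escape character is configured, so `none` is only safe for data known to be free of the delimiter.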
Example Report
Run started: 2025-09-11 14:30:22
IEEE.xml: 120 exported, 0 skipped
Scopus.xml: 95 exported, 2 skipped
TOTAL exported: 215
Files processed: 2
Duration: 3.14 seconds
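A report with this layout could be assembled roughly as follows. `format_report` is a hypothetical helper written for illustration, not the package's own function; the line formats are copied from the example above.

```python
from datetime import datetime
from typing import Dict, Tuple


def format_report(
    started: datetime,
    per_file: Dict[str, Tuple[int, int]],
    duration_seconds: float,
) -> str:
    """Build the report text from per-file (exported, skipped) counts."""
    lines = [f"Run started: {started:%Y-%m-%d %H:%M:%S}"]
    total = 0
    for name, (exported, skipped) in per_file.items():
        lines.append(f"{name}: {exported} exported, {skipped} skipped")
        total += exported
    lines.append(f"TOTAL exported: {total}")
    lines.append(f"Files processed: {len(per_file)}")
    lines.append(f"Duration: {duration_seconds:.2f} seconds")
    return "\n".join(lines) + "\n"
```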
Python API
You can also use it directly in Python scripts:
from pathlib import Path
from endnote_utils import export, export_folder

# Single file
total, csv_out, report_out = export(
    Path("data/IEEE.xml"), Path("output/ieee.csv")
)

# Folder
total, csv_out, report_out = export_folder(
    Path("data/xmls"), Path("output/all.csv"),
    ref_type="Conference Proceedings", year="2024"
)
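Once an export has written its CSV, the result can be inspected with the standard library alone. The sketch below tallies rows by a column such as `database` or `year` (both named in the feature list); `count_by_column` is a hypothetical helper, and it assumes the CSV was written with a header row and the default comma delimiter.

```python
import csv
from collections import Counter
from pathlib import Path


def count_by_column(csv_path, column: str, encoding: str = "utf-8") -> Counter:
    """Count exported rows per distinct value of one column."""
    with Path(csv_path).open(newline="", encoding=encoding) as handle:
        reader = csv.DictReader(handle)
        return Counter(row[column] for row in reader)
```

For example, `count_by_column(csv_out, "database")` would give a per-source tally that can be compared against the per-file counts in the TXT report.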
Development Notes
- Pure Python, uses only standard library (argparse, csv, xml.etree.ElementTree, logging, pathlib).
- Streaming XML parsing avoids high memory usage.
- Robust error handling: skips malformed records but logs them in verbose mode.
- Follows PEP 621 packaging (pyproject.toml).
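The streaming approach described in these notes can be sketched with `xml.etree.ElementTree.iterparse`. This is an illustration, not the package's implementation: the element paths (`titles/title`, `dates/year`, `contributors/authors/author`) assume a typical EndNote XML export, and treating a record without a title as malformed is an assumption made only to show the skip-and-log pattern.

```python
import logging
import xml.etree.ElementTree as ET

logger = logging.getLogger("endnote_sketch")


def text_of(record, path: str) -> str:
    """Collect all nested text under a child path (EndNote often wraps text in <style>)."""
    node = record.find(path)
    return "".join(node.itertext()).strip() if node is not None else ""


def iter_records(source):
    """Yield one dict per <record>, reading the XML incrementally."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "record":
            continue
        title = text_of(elem, "titles/title")
        row = {
            "title": title,
            "year": text_of(elem, "dates/year"),
            "authors": "; ".join(
                "".join(author.itertext()).strip()
                for author in elem.findall("contributors/authors/author")
            ),
        }
        # Drop the record's children and text once extracted; the emptied
        # element itself stays attached to its parent.
        elem.clear()
        if not title:
            # Assumption: a record with no title counts as malformed.
            logger.debug("Skipping record without a title")
            continue
        yield row
```

Because the function is a generator, rows can be written to the CSV one at a time without holding the whole file's records in memory. XML that is not well-formed still raises `ET.ParseError`, which this sketch does not try to recover from.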
License
MIT License © 2025 Minh Quach