Oscapify

Oscapify converts scientific literature CSV files to an OSCAP-compatible format. It processes neuroscience connectivity data from PubMed/PMC sources, validates headers, retrieves DOIs from NCBI, and keeps processing when individual records fail.

Features

Intelligent Header Validation: Automatically detects and corrects common header issues
Flexible Header Mapping: Support for custom column names and formats
DOI Retrieval: Fetches DOIs from NCBI API with built-in caching
Error Recovery: Continues processing even when individual records fail
Detailed Debugging: Comprehensive logging and header analysis tools
Batch Processing: Process multiple files or entire directories
Performance: Persistent caching and efficient batch operations
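
The DOI retrieval feature can be pictured with a small sketch. This is not Oscapify's internal code: it assumes the lookup goes through the NCBI E-utilities ESummary endpoint with JSON output, and the helper names (`build_esummary_url`, `extract_doi`, `fetch_doi`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumption: the ESummary endpoint of NCBI E-utilities is the lookup source.
ESUMMARY_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"


def build_esummary_url(pmid: str) -> str:
    """Build the ESummary request URL for one PubMed ID."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "json"})
    return f"{ESUMMARY_URL}?{query}"


def extract_doi(summary: dict, pmid: str) -> Optional[str]:
    """Pull the DOI out of a parsed ESummary JSON response, if one is listed."""
    record = summary.get("result", {}).get(pmid, {})
    for article_id in record.get("articleids", []):
        if article_id.get("idtype") == "doi":
            return article_id.get("value")
    return None


def fetch_doi(pmid: str, timeout: float = 10.0) -> Optional[str]:
    """Perform the network request and return the DOI, or None if absent."""
    with urllib.request.urlopen(build_esummary_url(pmid), timeout=timeout) as resp:
        return extract_doi(json.load(resp), pmid)
```

Keeping the parsing step separate from the network call makes it possible to test DOI extraction against a saved response without contacting NCBI.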

Installation

pip install oscapify

Development Installation

git clone https://github.com/yourusername/oscapify.git
cd oscapify
pip install -e ".[dev]"

Quick Start

Basic Usage

# Process a single file
oscapify process input.csv

# Process multiple files
oscapify process file1.csv file2.csv

# Process all CSV files in a directory
oscapify process /path/to/csv/directory/

# Specify output directory
oscapify process input.csv --output ./results
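
As a rough illustration of how a mix of file and directory arguments might be expanded into a list of CSV files, here is a minimal sketch. The function name `collect_csv_files` is hypothetical, and the non-recursive directory scan is an assumption, not documented Oscapify behavior.

```python
from pathlib import Path


def collect_csv_files(paths):
    """Expand file and directory arguments into a flat list of CSV paths."""
    files = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Assumption: only the top level of a directory is scanned.
            files.extend(sorted(p for p in path.iterdir() if p.suffix.lower() == ".csv"))
        elif path.suffix.lower() == ".csv":
            files.append(path)
    return files
```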

Header Validation and Debugging

# Validate CSV headers and see debugging info
oscapify validate input.csv

# Get header mapping suggestions
oscapify validate input.csv --suggest-mappings
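
One way mapping suggestions could be produced is fuzzy string matching. The sketch below uses the standard library's `difflib`; it is an illustration under that assumption, not the algorithm Oscapify actually uses, and `suggest_mappings` is a hypothetical name.

```python
import difflib

# Column names taken from the Expected Input Format section.
EXPECTED = ["pmid", "sentence", "pmcid", "pubmed_url"]


def suggest_mappings(headers, cutoff=0.6):
    """Suggest which file header could stand in for each missing expected column."""
    lowered = {h.lower(): h for h in headers}
    # Headers that already match an expected name are not candidates.
    candidates = [name for name in lowered if name not in EXPECTED]
    suggestions = {}
    for expected in EXPECTED:
        if expected in lowered:
            continue
        matches = difflib.get_close_matches(expected, candidates, n=1, cutoff=cutoff)
        if matches:
            suggestions[expected] = lowered[matches[0]]
            candidates.remove(matches[0])  # each header is suggested at most once
    return suggestions
```

Calling `suggest_mappings(["PubMedID", "Sentence", "score"])` would propose `PubMedID` for `pmid`, which could then be passed to `--header-pmid`.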

Custom Header Mapping

If your CSV files use different column names:

oscapify process input.csv \
  --header-pmid "PubMedID" \
  --header-sentence "text" \
  --preserve-fields "custom_field1" "custom_field2"

Expected Input Format

Oscapify expects CSV files with the following columns (case-insensitive):

pmid - PubMed ID
sentence - Text content
pmcid (optional) - PubMed Central ID
pubmed_url (optional) - URL to PubMed/PMC article

Additional columns are preserved in the output.

Example Input CSV

ID,pmid,pmcid,sentence,structure_1,structure_2,relation,score,pubmed_url
1,12345678,PMC1234567,"The brain connects to the spinal cord.",brain,spinal cord,connects,0.95,https://pubmed.ncbi.nlm.nih.gov/12345678/
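
The case-insensitive column requirement can be checked before running the tool. The following is a minimal sketch using only the standard library; `missing_columns` is a hypothetical helper, not part of the Oscapify API.

```python
import csv
import io

# Required columns as listed above; the optional ones are not checked.
REQUIRED = {"pmid", "sentence"}


def missing_columns(csv_text: str):
    """Return the required column names absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    present = {name.strip().lower() for name in header}
    return sorted(REQUIRED - present)
```

An empty list means the header row satisfies the minimum requirements.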

Output Format

Oscapify outputs CSV files with OSCAP-compatible formatting:

id - Unique identifier (format: nlp-{index}-{date})
pmid - PubMed ID
pmcid - PubMed Central ID
doi - Digital Object Identifier (retrieved from NCBI)
sentence - Original text
batch_name - Processing batch identifier
sentence_id - Sentence identifier
out_of_scope - "yes" if DOI couldn't be retrieved, "no" otherwise
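
To make the output fields concrete, here is a sketch of how one output row could be assembled. The `YYYYMMDD` date format in the id and the use of the row index as `sentence_id` are assumptions for illustration; the source only states the `nlp-{index}-{date}` pattern. Both function names are hypothetical.

```python
from datetime import date
from typing import Optional


def make_record_id(index: int, on: date) -> str:
    """Format an id following nlp-{index}-{date}; YYYYMMDD is an assumed date format."""
    return f"nlp-{index}-{on:%Y%m%d}"


def build_output_row(index: int, row: dict, doi: Optional[str], batch_name: str, on: date) -> dict:
    """Assemble one output record from an input row and a DOI lookup result."""
    return {
        "id": make_record_id(index, on),
        "pmid": row["pmid"],
        "pmcid": row.get("pmcid", ""),
        "doi": doi or "",
        "sentence": row["sentence"],
        "batch_name": batch_name,
        "sentence_id": str(index),  # assumption: derived from the row index
        "out_of_scope": "no" if doi else "yes",  # "yes" when no DOI was retrieved
    }
```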

Advanced Features

Cache Management

# View cache statistics
oscapify cache-stats

# Clear the DOI cache
oscapify clear-cache
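
For readers curious what a persistent DOI cache might look like, the sketch below stores PMID-to-DOI pairs in a JSON file. The storage format, file location, and the `DoiCache` class are assumptions made for illustration; Oscapify's real cache may work differently.

```python
import json
from pathlib import Path
from typing import Optional


class DoiCache:
    """A tiny JSON-file cache mapping PMIDs to DOIs (illustrative only)."""

    def __init__(self, path):
        self.path = Path(path)
        # Load existing entries so lookups survive between runs.
        self.entries = json.loads(self.path.read_text()) if self.path.exists() else {}

    def get(self, pmid: str) -> Optional[str]:
        return self.entries.get(pmid)

    def put(self, pmid: str, doi: str) -> None:
        self.entries[pmid] = doi
        self.path.write_text(json.dumps(self.entries))

    def stats(self) -> dict:
        return {"entries": len(self.entries)}

    def clear(self) -> None:
        self.entries = {}
        self.path.unlink(missing_ok=True)
```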

Error Handling Options

# Stop on first error (strict mode)
oscapify process input.csv --strict

# Disable caching for testing
oscapify process input.csv --no-cache

# Skip header validation
oscapify process input.csv --no-validation
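
The difference between the default error recovery and strict mode can be sketched as a simple loop. This is a generic pattern, not Oscapify's implementation; `process_records` and its `handler` parameter are hypothetical names.

```python
def process_records(records, handler, strict=False):
    """Apply handler to each record, collecting failures unless strict is set."""
    results, errors = [], []
    for index, record in enumerate(records):
        try:
            results.append(handler(record))
        except Exception as exc:
            if strict:
                raise  # strict mode: stop on the first error
            errors.append((index, str(exc)))  # default: note it and move on
    return results, errors
```

In the tolerant mode the caller gets both the successful results and a list of failures to report afterwards.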

Debug Mode

# Enable detailed debug logging
oscapify process input.csv --debug

Python API

from oscapify import OscapifyProcessor
from oscapify.models import ProcessingConfig

# Create configuration
config = ProcessingConfig(
    output_dir="./output",
    batch_name="my_batch"
)

# Process files
processor = OscapifyProcessor(config)
stats = processor.process_files(["input1.csv", "input2.csv"])

# Check results
print(f"Processed {stats.processed_files} files")
print(f"Total records: {stats.total_records}")
print(f"DOI lookups: {stats.successful_doi_lookups} successful, {stats.failed_doi_lookups} failed")

Configuration

Custom Header Mapping via API

from oscapify.models import HeaderMapping, ProcessingConfig

# Define custom mapping
header_mapping = HeaderMapping(
    pmid="PubMedID",
    sentence="abstract_text",
    pmcid="PMC_ID",
    preserve_fields=["experiment_type", "confidence_score"]
)

config = ProcessingConfig(
    header_mapping=header_mapping
)
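
Conceptually, a header mapping renames source columns to the canonical names and carries the preserved fields along unchanged. The sketch below shows that idea on plain dictionaries; it does not use or reproduce the `HeaderMapping` class, and `apply_header_mapping` is a hypothetical helper.

```python
def apply_header_mapping(row: dict, mapping: dict, preserve_fields=()) -> dict:
    """Rename source columns to canonical names and keep the preserved fields."""
    # mapping is {canonical_name: source_column_name}
    out = {canonical: row[source] for canonical, source in mapping.items() if source in row}
    for field in preserve_fields:
        if field in row:
            out[field] = row[field]
    return out
```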

Troubleshooting

Common Issues

Missing Headers Error

# Check what headers are in your file
oscapify validate problematic.csv

# Use suggested mappings
oscapify validate problematic.csv --suggest-mappings

DOI Retrieval Failures

- Check your internet connection
- The tool implements rate limiting (3 requests/second) for NCBI API compliance
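
A limit of 3 requests per second amounts to spacing calls at least one third of a second apart. The following sketch shows one simple way to enforce that; the `RateLimiter` class is hypothetical and is not taken from Oscapify's source.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between calls (illustrative only)."""

    def __init__(self, max_per_second=3.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now
```

Calling `wait()` before each request keeps the request rate at or below the configured limit.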

Encoding Errors

- Oscapify automatically tries multiple encodings
- If issues persist, convert your CSV to UTF-8
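
Trying multiple encodings usually means attempting each in turn until one decodes cleanly. The sketch below illustrates the idea; the encoding list and its order are assumptions, and `decode_with_fallback` is a hypothetical helper rather than part of Oscapify.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Decode bytes with the first encoding that succeeds; return (text, encoding)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError(f"could not decode data with any of: {', '.join(encodings)}")
```

The same helper can be used to convert a problematic file to UTF-8: decode it with the fallback, then write the text back out with `encoding="utf-8"`.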

Getting Help

# View all commands and options
oscapify --help

# View help for specific command
oscapify process --help

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

1. Fork the repository
2. Create your feature branch (git checkout -b feature/AmazingFeature)
3. Commit your changes (git commit -m 'Add some AmazingFeature')
4. Push to the branch (git push origin feature/AmazingFeature)
5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use Oscapify in your research, please cite:

@software{oscapify,
  author = {Troy Sincomb},
  title = {Oscapify: A tool for converting scientific literature CSV files to OSCAP format},
  year = {2025},
  url = {https://github.com/yourusername/oscapify}
}

Acknowledgments

Uses the NCBI E-utilities API for DOI retrieval
Built with Click for the command-line interface
Uses pandas for data processing
Uses Pydantic for data validation

Release history

0.1.1 - Jul 17, 2025
0.1.0 - Jul 17, 2025 (this version)

Download files

Source Distribution

oscapify-0.1.0.tar.gz (18.6 kB view details)

Uploaded Jul 17, 2025 Source

Built Distribution

oscapify-0.1.0-py3-none-any.whl (19.5 kB view details)

Uploaded Jul 17, 2025 Python 3

File details

Details for the file oscapify-0.1.0.tar.gz.

File metadata

Download URL: oscapify-0.1.0.tar.gz
Upload date: Jul 17, 2025
Size: 18.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.10.5 Darwin/24.5.0

File hashes

Hashes for oscapify-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`53bdc011d56d1a17d5851393d3df9d34829c48a7e29f8b617bb41cd6e6566a03`
MD5	`4a56c6c64ab46d52001f3bf6adb12b72`
BLAKE2b-256	`6cb3ebddc9615bb4e07a3564d6c9d352b4b4f5886183af27588ceaf31581b47e`

File details

Details for the file oscapify-0.1.0-py3-none-any.whl.

File metadata

Download URL: oscapify-0.1.0-py3-none-any.whl
Upload date: Jul 17, 2025
Size: 19.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.10.5 Darwin/24.5.0

File hashes

Hashes for oscapify-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0333d17142e1d182cb724685843315c6d51a03981346c91dffb82a0dd9e9111a`
MD5	`83478989e14f0de164c3ff586852e6b0`
BLAKE2b-256	`1395b1d2d3d21047da97abd67333002c1ccc67e00b094cd7c31a9ab2a9070751`
