Skip to main content

Parse NexisUni rtf files into a jsonlines file.

Project description

Nexis Uni Parser

PyPI Status Python Version License

Read the documentation at https://nexis-uni-parser.readthedocs.io/ Tests Codecov

pre-commit Black

This package can be used to convert NexisUni richtext files to jsonlines format.

Features

  • TODO

Requirements

  • TODO

Installation

You can install Nexis Uni Parser via pip from PyPI:

pip install nexis-uni-parser

Usage

There are two main functions that this package provides.

Convert an RTF file to plain text

Converting an RTF file to a plain text file can be achieved directly by using pandoc. That said, I have included a function that will convert an RTF file to a plain text file since it could be useful. Under the hood, it just uses pandoc.

from pathlib import Path
from nexis_uni_parser import convert_rtf_to_plain_text

inputfile = Path.home().joinpath("nexisuni-file.rtf")
output_filepath = convert_rtf_to_plain_text(inputfile)

print(output_filepath)
>>> /Users/name/nexisuni-file.txt

Parse Nexis Uni Files

The parse function can be used to parse a single file or a directory. Both produce a gzipped JSON lines file. I choose to convert to a compressed JSON lines file because the text data can get large if all files are read into memory.

from pathlib import Path
from nexis_uni_parser import parse

inputfile = Path.home().joinpath("nexisuni-file.rtf")

output_filepath = parse(inputfile)

# Reading the data into a pandas dataframe is easy from here.

import pandas as pd

nexisuni_df = pd.read_json(str(output_filepath), compression="gzip", lines=True)

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, Nexis Uni Parser is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

nexis-uni-parser-0.1.3.tar.gz (7.1 kB view hashes)

Uploaded Source

Built Distribution

nexis_uni_parser-0.1.3-py3-none-any.whl (7.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page