Parse NexisUni rtf files into a jsonlines file.

Nexis Uni Parser

Read the documentation at https://nexis-uni-parser.readthedocs.io/

This package converts Nexis Uni rich-text (RTF) files to JSON Lines format.

Features

  • TODO

Requirements

  • TODO

Installation

You can install Nexis Uni Parser via pip from PyPI:

pip install nexis-uni-parser

Usage

This package provides two main functions.

Convert an RTF file to plain text

You can convert an RTF file to plain text directly with pandoc. Because it can still be useful, this package also includes a function that does the conversion for you. Under the hood, it simply calls pandoc.

from pathlib import Path
from nexis_uni_parser import convert_rtf_to_plain_text

inputfile = Path.home().joinpath("nexisuni-file.rtf")
output_filepath = convert_rtf_to_plain_text(inputfile)

print(output_filepath)
# Prints, for example: /Users/name/nexisuni-file.txt
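
For readers who want to see what "just uses pandoc" could look like, here is a minimal sketch of a direct pandoc call from Python. It is not the package's actual implementation: the helper names `build_pandoc_command` and `rtf_to_plain_text` are hypothetical, and the sketch assumes a pandoc version with RTF input support (2.14.2 or later) is installed and on your PATH.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(rtf_path: Path) -> list[str]:
    """Build a pandoc command that writes a .txt file next to the input file.

    Hypothetical helper for illustration; not part of nexis_uni_parser.
    """
    txt_path = rtf_path.with_suffix(".txt")
    return [
        "pandoc",
        "--from", "rtf",      # read RTF (assumes pandoc >= 2.14.2)
        "--to", "plain",      # write plain text
        "--output", str(txt_path),
        str(rtf_path),
    ]


def rtf_to_plain_text(rtf_path: Path) -> Path:
    """Run pandoc on one RTF file and return the path of the text file."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc was not found on PATH")
    # check=True raises CalledProcessError if pandoc exits with an error.
    subprocess.run(build_pandoc_command(rtf_path), check=True)
    return rtf_path.with_suffix(".txt")
```

Building the command in its own function keeps the argument list easy to inspect without running pandoc.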

Parse Nexis Uni Files

The parse function accepts either a single file or a directory. In both cases it produces a gzipped JSON Lines file. I chose a compressed JSON Lines file because the text data can get large if all files are read into memory.

from pathlib import Path
from nexis_uni_parser import parse

inputfile = Path.home().joinpath("nexisuni-file.rtf")

output_filepath = parse(inputfile)

# Reading the data into a pandas dataframe is easy from here.

import pandas as pd

nexisuni_df = pd.read_json(str(output_filepath), compression="gzip", lines=True)
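
If the output is too large to load into a dataframe at once, a gzipped JSON Lines file can also be read one record at a time with the standard library. The sketch below shows one way to do that. The `iter_jsonl_gz` helper is hypothetical, and the sample file with its `title` and `text` fields is a stand-in created only for this example; the fields in real parser output may differ.

```python
import gzip
import json
import tempfile
from pathlib import Path
from typing import Iterator


def iter_jsonl_gz(path: Path) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of a gzipped JSON Lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield json.loads(line)


# Build a tiny stand-in file; the field names here are made up.
sample_path = Path(tempfile.mkdtemp()) / "sample.jsonl.gz"
with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
    handle.write(json.dumps({"title": "First", "text": "alpha"}) + "\n")
    handle.write(json.dumps({"title": "Second", "text": "beta"}) + "\n")

# Only one record is held in memory at a time while iterating.
titles = [record["title"] for record in iter_jsonl_gz(sample_path)]
print(titles)
```

To use it with real output, pass the path returned by parse to `iter_jsonl_gz` instead of `sample_path`.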

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the MIT license, Nexis Uni Parser is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was generated from @cjolowicz's Hypermodern Python Cookiecutter template.
