Set of Python tools for the RATOM project
Python library and supporting utilities to parse and process PST and MBOX email sources.
This project is under active development.
Libratom requires Python 3.6 or newer, and can be installed via the Python Package Index (PyPI). Installing via pip will automatically install all required dependencies.
To install and test this software in a new Python virtual environment on Ubuntu 16.04 LTS or newer:
Make sure Python 3.6 or newer, python3-pip, and python3-venv are installed:
sudo apt install python3 python3-pip python3-venv
Create and activate a Python virtual environment:
python3 -m venv venv
source venv/bin/activate
Make sure pip is upgraded to the latest version:
pip install --upgrade pip
Install libratom:
pip install libratom
Libratom provides a CLI with planned support for a range of email processing tasks. Currently, the CLI supports entity extraction from individual PST files and directories of PST files.
To see available commands, type:
(venv) user@host:~$ ratom -h
To see detailed help for the entity extraction command, type:
(venv) user@host:~$ ratom entities -h
To run the extractor with default settings over a PST file or directory of PST files, type the following:
(venv) user@host:~$ ratom entities -p /path/to/PST-file-or-directory
Progress is displayed in a bar at the bottom of the window. To terminate a job early and shut down all workers, type Ctrl-C.
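For scripted use, the same command can be launched from Python. The following is a minimal sketch, not part of libratom: the helper names build_entities_command and run_entity_extraction are hypothetical, and the only CLI arguments used are the ones shown above (ratom entities -p).

```python
import os
import shutil
import subprocess


def build_entities_command(source):
    """Return the argument list for `ratom entities -p <source>`.

    `source` is a path to a PST file or a directory of PST files.
    Hypothetical helper for illustration; it only assembles the
    command documented above.
    """
    return ["ratom", "entities", "-p", os.fspath(source)]


def run_entity_extraction(source):
    """Run the extractor with default settings and return its exit code.

    Assumes the `ratom` executable is on PATH, for example because the
    virtual environment it was installed into is active.
    """
    if shutil.which("ratom") is None:
        raise RuntimeError("ratom not found on PATH; activate the virtual environment first")
    completed = subprocess.run(build_entities_command(source), check=False)
    return completed.returncode
```

Calling run_entity_extraction("/path/to/PST-file-or-directory") blocks until the job finishes. The progress bar is still drawn in the terminal because the child process inherits standard output.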
By default, the tool uses the spaCy en_core_web_sm model and starts as many concurrent jobs as there are virtual cores available. Entities are written to a sqlite3 file that is automatically named from the existing file or directory name and the current datetime stamp. The file has the following single-table schema:
sqlite> .schema
CREATE TABLE entities (
    id INTEGER NOT NULL,
    text VARCHAR,
    label_ VARCHAR,
    filename VARCHAR,
    message_id INTEGER,
    PRIMARY KEY (id)
);
In this schema, id is the primary key, text is the entity instance, label_ is the entity type, filename is the PST file associated with this message and entity instance, and message_id is the PST-internal identifier for the message.
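The output file can be read with Python's standard sqlite3 module. The sketch below counts entities per type. So that it runs without a real PST job, it recreates the schema above in an in-memory database and inserts a few made-up rows. The sample values, the file name sample.pst, and the helper count_entities_by_label are illustrative assumptions, not libratom output or API.

```python
import sqlite3

# Schema copied from the documentation above.
SCHEMA = """
CREATE TABLE entities (
    id INTEGER NOT NULL,
    text VARCHAR,
    label_ VARCHAR,
    filename VARCHAR,
    message_id INTEGER,
    PRIMARY KEY (id)
);
"""


def count_entities_by_label(conn):
    """Return (label_, count) pairs, most frequent entity type first."""
    query = (
        "SELECT label_, COUNT(*) AS n FROM entities "
        "GROUP BY label_ ORDER BY n DESC, label_"
    )
    return conn.execute(query).fetchall()


# Stand-in for a real output file: an in-memory database with made-up rows.
# For real data, pass the generated file's path to sqlite3.connect() instead.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO entities (text, label_, filename, message_id) VALUES (?, ?, ?, ?)",
    [
        ("Chapel Hill", "GPE", "sample.pst", 101),
        ("UNC", "ORG", "sample.pst", 101),
        ("Raleigh", "GPE", "sample.pst", 102),
    ],
)

label_counts = count_entities_by_label(conn)
print(label_counts)
```

The same query works unchanged against a real output file, since it touches only the columns listed in the schema.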
Additional libratom use cases
More usage documentation will appear here as the project matures. For now, you can try out some of the functionality in Jupyter notebooks we've prepared at:
Logos, documentation, and other non-software products of the RATOM team are distributed under the terms of Creative Commons 4.0 Attribution. Software items in RATOM repositories are distributed under the terms of the MIT License. See the LICENSE file for additional details.
© 2019, The University of North Carolina at Chapel Hill.
Development Team and Support
Developed by the RATOM team at the University of North Carolina at Chapel Hill.
See https://ratom.web.unc.edu for additional project details, staff bios, and news.