Set of Python tools for the RATOM project

libratom

Python library and supporting utilities to parse and process PST and MBOX email sources.

This project is under development

Installation

Libratom requires Python 3.6 or newer, and can be installed via the Python Package Index (PyPI). Installing via pip will automatically install all required dependencies.

To install and test this software in a new Python virtual environment on Ubuntu 16.04 LTS or newer:

Make sure Python 3.6 or newer, python3-pip, and python3-venv are installed:

sudo apt install python3 python3-pip python3-venv

Create and activate a Python virtual environment:

python3 -m venv venv
source venv/bin/activate

Make sure pip is upgraded to the latest version:

pip install --upgrade pip

Install libratom:

pip install libratom

Entity extraction

Libratom provides a CLI with planned support for a range of email processing tasks. Currently, the CLI supports entity extraction from individual PST files and directories of PST files.

To see available commands, type:

(venv) user@host:~$ ratom -h

To see detailed help for the entity extraction command, type:

(venv) user@host:~$ ratom entities -h

To run the extractor with default settings over a PST file or directory of PST files, type the following:

(venv) user@host:~$ ratom entities -p /path/to/PST-file-or-directory

Progress is displayed in a bar at the bottom of the window. To terminate a job early and shut down all workers, press Ctrl-C.
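If you want to start the extractor from a Python script instead of an interactive shell, the command above can be wrapped with the standard library. The following is only a minimal sketch: the helper names `build_entities_command` and `run_entity_extraction` are hypothetical and not part of libratom, and it assumes the `ratom` executable is on the PATH of the active environment. It uses only the `entities` command and `-p` option shown above.

```python
import shlex
import subprocess


def build_entities_command(source_path):
    """Build the argument list for `ratom entities -p <path>`.

    Hypothetical helper, not part of libratom.
    """
    return ["ratom", "entities", "-p", str(source_path)]


def run_entity_extraction(source_path):
    """Run the extractor as a child process and return its exit code.

    Assumes `ratom` is installed and on PATH; otherwise subprocess.run
    raises FileNotFoundError.
    """
    cmd = build_entities_command(source_path)
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    completed = subprocess.run(cmd)
    return completed.returncode


if __name__ == "__main__":
    # Placeholder path; replace it with a real PST file or directory
    # before calling run_entity_extraction().
    print(build_entities_command("/path/to/PST-file-or-directory"))
```

Passing the arguments as a list avoids shell quoting problems when the path contains spaces.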

By default, the tool uses the spaCy en_core_web_sm model and starts as many concurrent jobs as there are virtual cores available. Entities are written to a sqlite3 file that is automatically named from the input file or directory name and the current datetime stamp. The file has the following single-table schema:

sqlite> .schema
CREATE TABLE entities (
	id INTEGER NOT NULL, 
	text VARCHAR, 
	label_ VARCHAR, 
	filename VARCHAR, 
	message_id INTEGER, 
	PRIMARY KEY (id)
);

In this schema, id is the primary key, text is the entity instance, label_ is the entity type, filename is the PST file associated with this message and entity instance, and message_id is the PST-internal identifier for the message.
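Because the output is a regular sqlite3 file, it can be inspected with Python's built-in `sqlite3` module. The sketch below counts entities per type using the schema above. To stay self-contained it builds an in-memory database with a few made-up rows; the sample values, the `demo.pst` filename, and the helper names are assumptions for illustration only. In real use you would connect to the file written by `ratom entities` instead.

```python
import sqlite3

# Schema copied from the section above.
SCHEMA = """
CREATE TABLE entities (
    id INTEGER NOT NULL,
    text VARCHAR,
    label_ VARCHAR,
    filename VARCHAR,
    message_id INTEGER,
    PRIMARY KEY (id)
);
"""


def count_entities_by_label(conn):
    """Return (label_, count) pairs, most frequent label first."""
    query = (
        "SELECT label_, COUNT(*) AS n FROM entities "
        "GROUP BY label_ ORDER BY n DESC, label_ ASC"
    )
    return conn.execute(query).fetchall()


def make_demo_db():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    rows = [
        (1, "Alice Example", "PERSON", "demo.pst", 1),
        (2, "Example Corp", "ORG", "demo.pst", 1),
        (3, "Bob Example", "PERSON", "demo.pst", 2),
    ]
    conn.executemany(
        "INSERT INTO entities (id, text, label_, filename, message_id) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


if __name__ == "__main__":
    # For a real run, replace make_demo_db() with
    # sqlite3.connect("<path to the generated sqlite3 file>").
    conn = make_demo_db()
    for label, n in count_entities_by_label(conn):
        print(label, n)
    conn.close()
```

The same query also works in the `sqlite3` command-line shell used to print the schema.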

Additional libratom use cases

More usage documentation will appear here as the project matures. For now, you can try out some of the functionality in Jupyter notebooks we've prepared at:

https://github.com/libratom/ratom-notebooks

License(s)

Logos, documentation, and other non-software products of the RATOM team are distributed under the terms of the Creative Commons Attribution 4.0 license. Software items in RATOM repositories are distributed under the terms of the MIT License. See the LICENSE file for additional details.

Development Team and Support

libratom is a product of the RATOM team. See https://ratom.web.unc.edu for up-to-date information.


