
ACT SCIO


act-scio2

Scio v2 is a reimplementation of Scio in Python3.

Scio uses Apache Tika to extract text from documents (PDF, HTML, DOC, etc.).

The extracted text is then sent to the Scio Analyzer, which extracts information using a combination of NLP (Natural Language Processing) and pattern matching.

Source code

The source code for the workers is available on GitHub.

Setup

To set up Scio, first install it from PyPI:

sudo pip3 install act-scio

You will also need to install beanstalkd. On Debian/Ubuntu you can run:

sudo apt install beanstalkd

You then need to install the NLTK data files. A helper utility for this is included:

scio-nltk-download

You will also need to create a default configuration:

scio-config user

API

To run the API, execute:

scio-api

This starts the API on 127.0.0.1:3000. Use --port <PORT> and --host <IP> to listen on another port and/or interface.
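To confirm that scio-api is up after starting it, a minimal Python sketch can probe the default address. It only checks that something accepts TCP connections on 127.0.0.1:3000; it does not call any Scio endpoint, since none are documented here, and the helper name api_is_listening is hypothetical.

```python
import socket


def api_is_listening(host: str = "127.0.0.1", port: int = 3000, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper: it only checks that a socket is open and does
    not call any specific Scio API endpoint.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, etc.
        return False
```

If you started the API with --host or --port, pass the same values to the helper.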

Configuration

You can create a default configuration with this command (run it as the user that runs Scio):

scio-config user

The common configuration is found in ~/.config/scio/etc/scio.ini.
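scio.ini is an INI-style file, so it can be inspected with Python's standard configparser. The sketch below is a hypothetical helper (load_scio_config) that assumes no particular section or key names; it simply returns everything it finds. Interpolation is disabled so values containing % are returned verbatim.

```python
import configparser
from pathlib import Path

# Path taken from the text above; run as the user that owns the Scio config.
SCIO_INI = Path.home() / ".config" / "scio" / "etc" / "scio.ini"


def load_scio_config(path: Path = SCIO_INI) -> dict:
    """Parse an INI file and return {section: {key: value}}.

    Hypothetical helper; no section or key names are assumed.
    """
    parser = configparser.ConfigParser(interpolation=None)
    # read() silently skips missing files, so check its return value
    if not parser.read(path):
        raise FileNotFoundError(f"no config at {path}; run 'scio-config user' first")
    return {section: dict(parser[section]) for section in parser.sections()}
```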

Running Manually

Scio Tika Server

The Scio Tika Server reads jobs from the beanstalk tube scio_doc and sends the extracted text to the tube scio_analyze.

The first time the server runs, it downloads Tika using Maven. It will use a proxy if $https_proxy is set.

scio-tika-server
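scio_doc and scio_analyze are ordinary beanstalkd tubes, so work can be queued with any beanstalk client. The minimal sketch below speaks beanstalkd's plain-text protocol directly over a socket (use <tube>, then put <pri> <delay> <ttr> <bytes>) against beanstalkd's default port 11300. The job body format that the Tika server expects is not documented here, so the payload is a placeholder assumption; the helper names, the priority 65536 and the 120-second TTR are arbitrary illustrative choices.

```python
import socket


def encode_put(tube: str, body: bytes, pri: int = 65536, delay: int = 0, ttr: int = 120) -> bytes:
    """Build the raw beanstalkd commands that select `tube` and enqueue `body`.

    Protocol: "use <tube>\r\n" followed by
    "put <pri> <delay> <ttr> <bytes>\r\n<data>\r\n".
    """
    use_cmd = f"use {tube}\r\n".encode()
    put_cmd = f"put {pri} {delay} {ttr} {len(body)}\r\n".encode() + body + b"\r\n"
    return use_cmd + put_cmd


def put_job(tube: str, body: bytes, host: str = "127.0.0.1", port: int = 11300) -> int:
    """Send one job to a running beanstalkd and return the new job id (hypothetical helper)."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(encode_put(tube, body))
        reply = b""
        # Expect two reply lines: "USING <tube>" and then the put result
        while reply.count(b"\r\n") < 2:
            chunk = sock.recv(4096)
            if not chunk:
                raise ConnectionError("beanstalkd closed the connection")
            reply += chunk
    lines = reply.decode().split("\r\n")
    if not lines[1].startswith("INSERTED "):
        raise RuntimeError(f"unexpected reply: {lines[1]!r}")
    return int(lines[1].split()[1])
```

For example, put_job("scio_doc", payload) would queue one job for the Tika server, where payload must be whatever Scio actually expects.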

Scio Analyze Server

By default, the Scio Analyze Server reads jobs from the beanstalk tube scio_analyze.

scio-analyze

You can also feed text directly on stdin, as in this example (note the empty --beanstalk= option):

echo "The companies in the Bus; Financial, Aviation and Automobile industry are large." | scio-analyze --beanstalk=
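The same stdin invocation can be driven from Python with subprocess. This is a minimal sketch with a hypothetical helper, analyze_text; it returns whatever scio-analyze prints and makes no assumption about the output format. The cmd parameter defaults to the command shown above and can be swapped out for testing.

```python
import subprocess
from typing import Sequence

# Command from the stdin example above (note the empty --beanstalk= option).
DEFAULT_CMD = ("scio-analyze", "--beanstalk=")


def analyze_text(text: str, cmd: Sequence[str] = DEFAULT_CMD, timeout: float = 300) -> str:
    """Pipe `text` to the analyzer on stdin and return its stdout."""
    result = subprocess.run(
        list(cmd),
        input=text,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```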

Running as a service

Systemd-compatible service files can be found under examples/systemd.

To install:

sudo cp examples/systemd/*.service /usr/lib/systemd/system
sudo systemctl daemon-reload
sudo systemctl enable scio-tika-server
sudo systemctl enable scio-analyze
sudo systemctl start scio-tika-server
sudo systemctl start scio-analyze
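After starting the units, a quick status check can be scripted. This minimal Python sketch (hypothetical helper unit_states) relies only on systemctl is-active --quiet, whose exit status is zero when a unit is active; the unit names are the ones enabled above. The run parameter is there so the check can be exercised without systemd.

```python
import subprocess
from typing import Callable, Dict, Iterable

UNITS = ("scio-tika-server", "scio-analyze")


def unit_states(units: Iterable[str] = UNITS,
                run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> Dict[str, bool]:
    """Map each systemd unit name to True if it is currently active."""
    states = {}
    for unit in units:
        # --quiet suppresses output; the exit status alone tells us the state
        proc = run(["systemctl", "is-active", "--quiet", unit], check=False)
        states[unit] = proc.returncode == 0
    return states
```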

Local development

Use pip to install in local development mode. act-scio uses namespace packages, so it is not compatible with setup.py install or setup.py develop.

In the repository root, run:

pip3 install --user -e .



Download files


Source Distribution

act-scio-0.0.14.tar.gz (2.4 MB)


File details

Details for the file act-scio-0.0.14.tar.gz.

File metadata

  • Download URL: act-scio-0.0.14.tar.gz
  • Upload date:
  • Size: 2.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.6.0 requests-toolbelt/0.8.0 tqdm/4.48.2 CPython/3.6.12

File hashes

Hashes for act-scio-0.0.14.tar.gz
Algorithm Hash digest
SHA256 825f009fbde058fac410773e9441a084659ab357e77279512d046e1b172031d6
MD5 d77810b81c0b3a59804393c6705a814c
BLAKE2b-256 c0c59d7b7f4e704568a3f57e0a16e01e3386eca2c5914f1e73d01f8488bae0e3
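A downloaded copy of the source distribution can be checked against the SHA256 digest above with Python's hashlib. This is a minimal sketch; the helper names sha256_of and verify are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for act-scio-0.0.14.tar.gz
EXPECTED_SHA256 = "825f009fbde058fac410773e9441a084659ab357e77279512d046e1b172031d6"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

pip's hash-checking mode performs the same check at install time, e.g. with a requirements line such as act-scio==0.0.14 --hash=sha256:<digest>.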

