Hemlock is a way of providing a common data access layer.
Project description
Hemlock
=======
[![PyPI version](https://badge.fury.io/py/hemlock.png)](http://badge.fury.io/py/hemlock) [![Build Status](https://travis-ci.org/Lab41/Hemlock.png?branch=master)](https://travis-ci.org/Lab41/Hemlock) [![downloads](https://pypip.in/d/hemlock/badge.png)](http://crate.io/packages/hemlock/) [![Coverage Status](https://coveralls.io/repos/Lab41/Hemlock/badge.png?branch=master)](https://coveralls.io/r/Lab41/Hemlock?branch=master)
Hemlock is an open-source project exploring ways to create a common data access
layer that eliminates the need to understand underlying data topologies but
still preserving the requirements of each data source such as access control,
performance, and formats.
![Hemlock L](https://raw.github.com/Lab41/Hemlock/master/docs/images/overview_hemlock.png "Hemlock")
Install instructions
====================
Option A, install using pip:
```bash
sudo pip install hemlock
```
Option B, build from source:
```bash
git clone https://github.com/Lab41/Hemlock.git
cd Hemlock
sudo python setup.py install
```
Required Dependencies
---------------------
Python modules:
- [MySQLdb](http://mysql-python.sourceforge.net/MySQLdb.html)
- [texttable](https://pypi.python.org/pypi/texttable)
- [couchbase](http://www.couchbase.com/communities/python/getting-started) >= 1.0
- [APScheduler](https://pypi.python.org/pypi/APScheduler)
Build a server running [MySQL](http://www.mysql.com/) to store user accounts, tenants, and registered
systems.
Build a [Couchbase 2.0](http://www.couchbase.com/) cluster to store metadata and data of registered systems.
Build an [ElasticSearch 0.90.2](http://www.elasticsearch.org/) cluster to store the index of Couchbase.
Add XDCR one-way replication from Couchbase to ElasticSearch using this [plugin](https://github.com/couchbaselabs/elasticsearch-transport-couchbase) (Note, grab version 1.1.0).
Once the plugin is installed, be sure and update the couchbase_template.json under plugins/transport-couchbase/ to have the following:
```json
{
"template" : "*",
"order" : 10,
"mappings" : {
"couchbaseCheckpoint" : {
"_source" : {
"includes" : ["doc.*"]
},
"date_detection" : false,
"dynamic_templates": [
{
"store_no_index": {
"match": "*",
"mapping": {
"store" : "no",
"index" : "no",
"include_in_all" : false
}
}
}
]
},
"_default_" : {
"_source" : {
"includes" : ["meta.*"]
},
"date_detection" : false,
"properties" : {
"meta" : {
"type" : "object",
"include_in_all" : false
}
}
}
}
}
```
Once that is added, start up ElasticSearch with ``bin/elasticsearch`` and then perform the following the first time:
```bash
curl -XPUT http://localhost:9200/_template/couchbase -d @plugins/transport-couchbase/couchbase_template.json
```
Installing required databases
-----------------------------
1. Create database ``hemlock`` in [MySQL](http://www.mysql.com/).
2. Create bucket ``hemlock`` in [Couchbase](http://www.couchbase.com/).
3. Create index ``hemlock`` in [ElasticSearch](http://www.elasticsearch.org/).
Getting started
----------------
1. Create Hemlock credentials (see 'Credential files')
```bash
HEMLOCK_MYSQL_SERVER=192.168.1.10
HEMLOCK_MYSQL_USERNAME=user
HEMLOCK_MYSQL_DB=hemlock
HEMLOCK_MYSQL_PW=pass
HEMLOCK_COUCHBASE_SERVER=192.168.1.20
HEMLOCK_COUCHBASE_BUCKET=hemlock
HEMLOCK_COUCHBASE_USERNAME=hemlock
HEMLOCK_COUCHBASE_PW=pass
HEMLOCK_ELASTICSEARCH_ENDPOINT=192.168.1.30
```
(if you'd like these to persist, consider adding export before each line and performing ``source`` on the file)
2. Create a tenant, role, user, and data source system
```bash
hemlock tenant-create --name Project1
hemlock tenant-list
hemlock role-create --name User
hemlock role-list
hemlock user-create --name User1 \
--username Username1 \
--email user1@email.com \
--rold_id 42ba73f9-0ab6-4a50-908c-1585955754f4 \
--tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
hemlock user-list
hemlock register-local-system --name System1 \
--data_type csv \
--description "description" \
--tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
--hostname system1.fqdn \
--endpoint http://hemlock.server/ \
--poc_name user1 \
--poc_email user1@email.com
hemlock system-list
```
3. Add credentials for data source system, for example: mysql_creds
```bash
MYSQL_SERVER=192.168.1.30
MYSQL_DB=db1
#MYSQL_TABLE=table1
MYSQL_USERNAME=user
MYSQL_PW=pass
```
4. Store a client
```bash
hemlock client-store --name mysql_client_1 --type mysql --system_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 --credential_file /path/to/mysql_creds
hemlock client-list
```
5. Add credentials for hemlock
```bash
hemlock hemlock-server-store --credential_file /path/to/hemlock_creds
```
6. Create a schedule server (optional)
```bash
hemlock schedule-server-create --name schedule_server_1
hemlock schedule-server-list
```
7. Add a schedule for the data source system to run (optional)
```bash
hemlock client-schedule --name schedule1 \
--minute "54" \
--hour "12" \
--day_of_month "*" \
--month "*" \
--day_of_week "*" \
--client_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
--schedule_server_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
hemlock schedule-list
```
8. Perform a test run for pulling data from the data source system
```bash
hemlock client-run --uuid 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
```
9. Search for data that has been loaded into Hemlock
```bash
hemlock query-data --user 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 --query foo
```
or
```bash
Direct with elasticsearch:
http://elasticsearch.fqdn:9200/hemlock/_search?q=foo
Which returns something the following:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 20,
"successful": 20,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 3.6582048,
"hits": [
{
"_index": "hemlock",
"_type": "couchbaseDocument",
"_id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
"_score": 3.6582048,
"_source": {
"meta": {
"id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
"rev": "1-0010f1ac6045ccf40000000000000000",
"flags": 0,
"expiration": 0
}
}
}
]
}
}
Now we can feed the 'id' into Couchbase to return the full document:
http://couchbase.fqdn:8092/hemlock/865f458b4421ae5fd758e3c81aca9f8d8b4696b6
Which returns something like the following:
{
"hemlock-system": "a50b86c2-59f7-42a3-aa67-3367579189fe",
"hemlock-date": "2013-09-03 16:10:20",
"stream": "DOYLIE"
}
```
Credential files
----------------
1. Create a ``hemlock_creds`` file (see hemlock_creds_sample for an example):
```bash
HEMLOCK_MYSQL_SERVER=192.168.1.10
HEMLOCK_MYSQL_USERNAME=user
HEMLOCK_MYSQL_DB=hemlock
HEMLOCK_MYSQL_PW=pass
HEMLOCK_COUCHBASE_SERVER=192.168.1.20
HEMLOCK_COUCHBASE_BUCKET=hemlock
HEMLOCK_COUCHBASE_USERNAME=hemlock
HEMLOCK_COUCHBASE_PW=pass
```
2. Create credential files for each client you intend to use ([examples](https://github.com/Lab41/Hemlock/tree/master/hemlock/clients/)).
Currently supported data sources
================================
Technology | Parameter | Python Module Dependencies
---------- | --------- | ------------
MySQL | mysql | MySQLdb
MongoDB | mongo | pymongo
Redis | redis | redis
Local FileSystem | fs | magic, pdfminer, xmltodict
RESTful API | rest |
Streams | stream_odd |
Adding a new data source type
-----------------------------
Create a new class under the clients folder for each new data source type. Most
classes will need two methods defined: ``connect_client`` and ``get_data``.
The following is a template that can be used to work from:
```python
class HMyclient:
def connect_client(self, client_dict):
# return a handle that can be used to get data from the data source
return c_server
def get_data(self, client_dict, c_server, h_server, client_uuid):
# data_list is an array of arrays to contain the data
data_list = [[]]
# desc_list is an array that contains the schema (if exists or known)
desc_list = []
return data_list, desc_list
```
Usage examples
==============
- Create a tenant
```bash
hemlock tenant-create --name Project1
```
- Create a role
```bash
hemlock role-create --name User
```
- Create a user
```bash
hemlock user-create --name User1 \
--username Username1 \
--email user1@email.com \
--rold_id 42ba73f9-0ab6-4a50-908c-1585955754f4 \
--tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
```
- Register a local system
```bash
hemlock register-local-system --name System1 \
--data_type csv \
--description "description" \
--tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
--hostname system1.fqdn \
--endpoint http://hemlock.server/ \
--poc_name user1 \
--poc_email user1@email.com
```
- List registered systems
```bash
hemlock system-list
```
- List created users
```bash
hemlock user-list
```
- Lists created tenants
```bash
hemlock tenant-list
```
- [Connecting to a client](https://github.com/Lab41/Hemlock/tree/master/hemlock/clients/)
- [Full CLI API list](https://github.com/Lab41/Hemlock/blob/master/docs/CLI.md)
Related repositories
====================
- [Hemlock-REST](http://lab41.github.io/Hemlock-REST/)
- [Hemlock-Frontend](http://lab41.github.io/Hemlock-Frontend/)
Documentation
=============
- [Docs](http://lab41.github.io/Hemlock/docs/_build/html/index.html)
Tests
=====
The tests for this project use [py.test](http://pytest.org/latest/)
Contributing to Hemlock
=======================
What to contribute? Awesome! Issue a pull request or see more details [here](https://github.com/Lab41/Hemlock/blob/master/CONTRIBUTING.md).
=======
[![PyPI version](https://badge.fury.io/py/hemlock.png)](http://badge.fury.io/py/hemlock) [![Build Status](https://travis-ci.org/Lab41/Hemlock.png?branch=master)](https://travis-ci.org/Lab41/Hemlock) [![downloads](https://pypip.in/d/hemlock/badge.png)](http://crate.io/packages/hemlock/) [![Coverage Status](https://coveralls.io/repos/Lab41/Hemlock/badge.png?branch=master)](https://coveralls.io/r/Lab41/Hemlock?branch=master)
Hemlock is an open-source project exploring ways to create a common data access
layer that eliminates the need to understand underlying data topologies but
still preserving the requirements of each data source such as access control,
performance, and formats.
![Hemlock L](https://raw.github.com/Lab41/Hemlock/master/docs/images/overview_hemlock.png "Hemlock")
Install instructions
====================
Option A, install using pip:
```bash
sudo pip install hemlock
```
Option B, build from source:
```bash
git clone https://github.com/Lab41/Hemlock.git
cd Hemlock
sudo python setup.py install
```
Required Dependencies
---------------------
Python modules:
- [MySQLdb](http://mysql-python.sourceforge.net/MySQLdb.html)
- [texttable](https://pypi.python.org/pypi/texttable)
- [couchbase](http://www.couchbase.com/communities/python/getting-started) >= 1.0
- [APScheduler](https://pypi.python.org/pypi/APScheduler)
Build a server running [MySQL](http://www.mysql.com/) to store user accounts, tenants, and registered
systems.
Build a [Couchbase 2.0](http://www.couchbase.com/) cluster to store metadata and data of registered systems.
Build an [ElasticSearch 0.90.2](http://www.elasticsearch.org/) cluster to store the index of Couchbase.
Add XDCR one-way replication from Couchbase to ElasticSearch using this [plugin](https://github.com/couchbaselabs/elasticsearch-transport-couchbase) (Note, grab version 1.1.0).
Once the plugin is installed, be sure and update the couchbase_template.json under plugins/transport-couchbase/ to have the following:
```json
{
"template" : "*",
"order" : 10,
"mappings" : {
"couchbaseCheckpoint" : {
"_source" : {
"includes" : ["doc.*"]
},
"date_detection" : false,
"dynamic_templates": [
{
"store_no_index": {
"match": "*",
"mapping": {
"store" : "no",
"index" : "no",
"include_in_all" : false
}
}
}
]
},
"_default_" : {
"_source" : {
"includes" : ["meta.*"]
},
"date_detection" : false,
"properties" : {
"meta" : {
"type" : "object",
"include_in_all" : false
}
}
}
}
}
```
Once that is added, start up ElasticSearch with ``bin/elasticsearch`` and then perform the following the first time:
```bash
curl -XPUT http://localhost:9200/_template/couchbase -d @plugins/transport-couchbase/couchbase_template.json
```
Installing required databases
-----------------------------
1. Create database ``hemlock`` in [MySQL](http://www.mysql.com/).
2. Create bucket ``hemlock`` in [Couchbase](http://www.couchbase.com/).
3. Create index ``hemlock`` in [ElasticSearch](http://www.elasticsearch.org/).
Getting started
----------------
1. Create Hemlock credentials (see 'Credential files')
```bash
HEMLOCK_MYSQL_SERVER=192.168.1.10
HEMLOCK_MYSQL_USERNAME=user
HEMLOCK_MYSQL_DB=hemlock
HEMLOCK_MYSQL_PW=pass
HEMLOCK_COUCHBASE_SERVER=192.168.1.20
HEMLOCK_COUCHBASE_BUCKET=hemlock
HEMLOCK_COUCHBASE_USERNAME=hemlock
HEMLOCK_COUCHBASE_PW=pass
HEMLOCK_ELASTICSEARCH_ENDPOINT=192.168.1.30
```
(if you'd like these to persist, consider adding export before each line and performing ``source`` on the file)
2. Create a tenant, role, user, and data source system
```bash
hemlock tenant-create --name Project1
hemlock tenant-list
hemlock role-create --name User
hemlock role-list
hemlock user-create --name User1 \
--username Username1 \
--email user1@email.com \
--rold_id 42ba73f9-0ab6-4a50-908c-1585955754f4 \
--tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
hemlock user-list
hemlock register-local-system --name System1 \
--data_type csv \
--description "description" \
--tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
--hostname system1.fqdn \
--endpoint http://hemlock.server/ \
--poc_name user1 \
--poc_email user1@email.com
hemlock system-list
```
3. Add credentials for data source system, for example: mysql_creds
```bash
MYSQL_SERVER=192.168.1.30
MYSQL_DB=db1
#MYSQL_TABLE=table1
MYSQL_USERNAME=user
MYSQL_PW=pass
```
4. Store a client
```bash
hemlock client-store --name mysql_client_1 --type mysql --system_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 --credential_file /path/to/mysql_creds
hemlock client-list
```
5. Add credentials for hemlock
```bash
hemlock hemlock-server-store --credential_file /path/to/hemlock_creds
```
6. Create a schedule server (optional)
```bash
hemlock schedule-server-create --name schedule_server_1
hemlock schedule-server-list
```
7. Add a schedule for the data source system to run (optional)
```bash
hemlock client-schedule --name schedule1 \
--minute "54" \
--hour "12" \
--day_of_month "*" \
--month "*" \
--day_of_week "*" \
--client_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
--schedule_server_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
hemlock schedule-list
```
8. Perform a test run for pulling data from the data source system
```bash
hemlock client-run --uuid 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
```
9. Search for data that has been loaded into Hemlock
```bash
hemlock query-data --user 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 --query foo
```
or
```bash
Direct with elasticsearch:
http://elasticsearch.fqdn:9200/hemlock/_search?q=foo
Which returns something the following:
{
"took": 14,
"timed_out": false,
"_shards": {
"total": 20,
"successful": 20,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 3.6582048,
"hits": [
{
"_index": "hemlock",
"_type": "couchbaseDocument",
"_id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
"_score": 3.6582048,
"_source": {
"meta": {
"id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
"rev": "1-0010f1ac6045ccf40000000000000000",
"flags": 0,
"expiration": 0
}
}
}
]
}
}
Now we can feed the 'id' into Couchbase to return the full document:
http://couchbase.fqdn:8092/hemlock/865f458b4421ae5fd758e3c81aca9f8d8b4696b6
Which returns something like the following:
{
"hemlock-system": "a50b86c2-59f7-42a3-aa67-3367579189fe",
"hemlock-date": "2013-09-03 16:10:20",
"stream": "DOYLIE"
}
```
Credential files
----------------
1. Create a ``hemlock_creds`` file (see hemlock_creds_sample for an example):
```bash
HEMLOCK_MYSQL_SERVER=192.168.1.10
HEMLOCK_MYSQL_USERNAME=user
HEMLOCK_MYSQL_DB=hemlock
HEMLOCK_MYSQL_PW=pass
HEMLOCK_COUCHBASE_SERVER=192.168.1.20
HEMLOCK_COUCHBASE_BUCKET=hemlock
HEMLOCK_COUCHBASE_USERNAME=hemlock
HEMLOCK_COUCHBASE_PW=pass
```
2. Create credential files for each client you intend to use ([examples](https://github.com/Lab41/Hemlock/tree/master/hemlock/clients/)).
Currently supported data sources
================================
Technology | Parameter | Python Module Dependencies
---------- | --------- | ------------
MySQL | mysql | MySQLdb
MongoDB | mongo | pymongo
Redis | redis | redis
Local FileSystem | fs | magic, pdfminer, xmltodict
RESTful API | rest |
Streams | stream_odd |
Adding a new data source type
-----------------------------
Create a new class under the clients folder for each new data source type. Most
classes will need two methods defined: ``connect_client`` and ``get_data``.
The following is a template that can be used to work from:
```python
class HMyclient:
def connect_client(self, client_dict):
# return a handle that can be used to get data from the data source
return c_server
def get_data(self, client_dict, c_server, h_server, client_uuid):
# data_list is an array of arrays to contain the data
data_list = [[]]
# desc_list is an array that contains the schema (if exists or known)
desc_list = []
return data_list, desc_list
```
Usage examples
==============
- Create a tenant
```bash
hemlock tenant-create --name Project1
```
- Create a role
```bash
hemlock role-create --name User
```
- Create a user
```bash
hemlock user-create --name User1 \
--username Username1 \
--email user1@email.com \
--rold_id 42ba73f9-0ab6-4a50-908c-1585955754f4 \
--tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
```
- Register a local system
```bash
hemlock register-local-system --name System1 \
--data_type csv \
--description "description" \
--tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
--hostname system1.fqdn \
--endpoint http://hemlock.server/ \
--poc_name user1 \
--poc_email user1@email.com
```
- List registered systems
```bash
hemlock system-list
```
- List created users
```bash
hemlock user-list
```
- Lists created tenants
```bash
hemlock tenant-list
```
- [Connecting to a client](https://github.com/Lab41/Hemlock/tree/master/hemlock/clients/)
- [Full CLI API list](https://github.com/Lab41/Hemlock/blob/master/docs/CLI.md)
Related repositories
====================
- [Hemlock-REST](http://lab41.github.io/Hemlock-REST/)
- [Hemlock-Frontend](http://lab41.github.io/Hemlock-Frontend/)
Documentation
=============
- [Docs](http://lab41.github.io/Hemlock/docs/_build/html/index.html)
Tests
=====
The tests for this project use [py.test](http://pytest.org/latest/)
Contributing to Hemlock
=======================
What to contribute? Awesome! Issue a pull request or see more details [here](https://github.com/Lab41/Hemlock/blob/master/CONTRIBUTING.md).
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
hemlock-0.1.6.tar.gz
(1.1 MB
view details)
File details
Details for the file hemlock-0.1.6.tar.gz
.
File metadata
- Download URL: hemlock-0.1.6.tar.gz
- Upload date:
- Size: 1.1 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 60c6f909bdab813b45255474e7a1489448063d36f4d63b3f7071ffe790d26d03 |
|
MD5 | 9a5ff30ef80eb3586b5de22e86ff1f6c |
|
BLAKE2b-256 | 08bf84fe2cc76a6af4f69cfedbcd82026c0d7d7cdc9dbbf19a9d9d377b08bfb3 |