Data processing library
chrisdata
Data processing tools for data analysis
Installation
- Install Miniforge
    wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
    bash Miniforge3-$(uname)-$(uname -m).sh
- Clone the repository
    rm -rf chrisdata*
    git clone git@github.com:chrisjihee/chrisdata.git
    cd chrisdata*
- Create a new environment
    mamba create -n chrisdata python=3.11 -y
    mamba activate chrisdata
- Install the required packages
    pip install -U -e .
    rm -rf chrisbase*; git clone git@github.com:chrisjihee/chrisbase.git
    pip install -U -e chrisbase*
    pip list | grep -E "mongo|search|Wiki|wiki|json|pydantic|chris|Flask"
- Install MongoDB
    mkdir mongodb; cd mongodb; mkdir data log
    if [ "$(uname)" = "Linux" ]; then
      aria2c https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu2204-8.0.0.tgz
    elif [ "$(uname)" = "Darwin" ]; then
      aria2c https://fastdl.mongodb.org/osx/mongodb-macos-arm64-8.0.0.tgz
    fi
    tar zxvf mongodb-*.tgz --strip-components=1
    cd ..
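The platform check in the MongoDB step can also be written as a small function, which makes an unsupported platform fail loudly instead of silently skipping the download. This is a minimal sketch: `mongodb_url` is a hypothetical helper name, not part of chrisdata, and the two URLs are the ones listed in the step above.

```shell
# Sketch: pick the MongoDB 8.0.0 archive URL for a platform name.
# mongodb_url is a hypothetical helper, not something chrisdata provides.
mongodb_url() {
  # $1: platform name as printed by `uname` (defaults to the current platform)
  case "${1:-$(uname)}" in
    Linux)  echo "https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu2204-8.0.0.tgz" ;;
    Darwin) echo "https://fastdl.mongodb.org/osx/mongodb-macos-arm64-8.0.0.tgz" ;;
    *)      echo "unsupported platform: ${1:-$(uname)}" >&2; return 1 ;;
  esac
}
```

It could be used as `url=$(mongodb_url) && aria2c "$url"`, so that nothing is downloaded when the platform is not recognized.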
- Run MongoDB
    cd mongodb
    bin/mongod --config ../cfg/mongod-8800.yaml
    cd ..

    cd mongodb
    bin/mongod --config ../cfg/mongod-8801.yaml
    cd ..
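To confirm that both instances are accepting connections, their ports can be probed from another shell. The sketch below assumes the instances listen on ports 8800 and 8801, which is only inferred from the config file names; the actual value is whatever `net.port` says in each YAML file. `port_open` is a hypothetical helper and relies on bash's `/dev/tcp` pseudo-device, so it needs bash rather than a plain POSIX shell.

```shell
# Sketch: test whether a TCP port accepts connections (bash only).
# port_open is a hypothetical helper; ports 8800/8801 are an assumption
# taken from the config file names, not from the config contents.
port_open() {
  # $1: host, $2: port. Succeeds if a TCP connection can be opened.
  # The subshell ensures the descriptor is closed again on return.
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}
```

For example, `for p in 8800 8801; do port_open 127.0.0.1 "$p" && echo "$p up" || echo "$p down"; done` reports the state of each assumed port.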
- Install Elasticsearch
    mkdir elasticsearch7; cd elasticsearch7
    if [ "$(uname)" = "Linux" ]; then
      aria2c https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.10-linux-x86_64.tar.gz
    elif [ "$(uname)" = "Darwin" ]; then
      aria2c https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.10-darwin-aarch64.tar.gz
    fi
    tar zxf elasticsearch-*.tar.gz --strip-components 1
    sed -i.bak 's/#http.port: 9200/http.port: 9717/g' ./config/elasticsearch.yml
    echo "xpack.security.enabled: true" >> ./config/elasticsearch.yml
    cd ..
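In-place editing with `sed` differs between platforms: BSD `sed` (macOS) expects a backup suffix as a separate argument after `-i`, while GNU `sed` (Linux) only accepts a suffix attached directly to `-i`. A form that both accept is `-i.bak`. The following is a minimal sketch of the port change written that way; `set_http_port` is a hypothetical helper name, and it removes the backup file once the edit succeeds.

```shell
# Sketch: uncomment http.port in an Elasticsearch config and set a new port.
# set_http_port is a hypothetical helper, not part of chrisdata.
set_http_port() {
  # $1: path to elasticsearch.yml, $2: new port number
  # -i.bak is accepted by both GNU and BSD sed; the backup is removed afterwards.
  sed -i.bak "s/#http.port: 9200/http.port: $2/g" "$1" && rm -f "$1.bak"
}
```

Called as `set_http_port ./config/elasticsearch.yml 9717`, it applies the same substitution as the installation step.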
- Link input data
    cd input
    ln -s /mnt/geo/data/wikidata .
    ln -s /mnt/geo/data/wikipedia .
    cd ..
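`ln -s` creates a link even when its target does not exist, so a mistyped or unmounted data path only shows up later as a missing-file error. A minimal sketch that checks each source directory first is shown below; `link_input` is a hypothetical helper, and the directory names `wikidata` and `wikipedia` are the ones used in the step above.

```shell
# Sketch: link the wikidata and wikipedia directories, checking they exist first.
# link_input is a hypothetical helper, not part of chrisdata.
link_input() {
  # $1: directory that holds the data, $2: directory to create the links in
  src=$1
  dst=$2
  for name in wikidata wikipedia; do
    if [ ! -d "$src/$name" ]; then
      echo "missing: $src/$name" >&2
      return 1
    fi
    # -f replaces an existing link, -n avoids descending into it on reruns
    ln -sfn "$src/$name" "$dst/$name"
  done
}
```

With the paths from this README, the call would be `link_input /mnt/geo/data input`.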
Execution
- Show help
    python -m chrisdata.cli --help
    python -m chrisdata.cli wikipedia --help
    python -m chrisdata.cli wikidata --help
- Run command
    - To convert Wikipedia articles
        python -m chrisdata.cli wikipedia convert
    - To parse Wikidata dump
        python -m chrisdata.cli wikidata parse
    - To filter Wikidata entities
        python -m chrisdata.cli wikidata filter
    - To convert Wikidata entities
        python -m chrisdata.cli wikidata convert
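If the three Wikidata commands are meant to run in the order listed (parse, then filter, then convert), which is an assumption since the README does not say so, they can be wrapped so that a failing step stops the run. In this sketch `run_wikidata_steps` and the `CHRISDATA_CLI` variable are hypothetical names; only the `python -m chrisdata.cli wikidata ...` subcommands come from the list above.

```shell
# Sketch: run the Wikidata steps in listed order, stopping at the first failure.
# run_wikidata_steps and CHRISDATA_CLI are hypothetical names; the step order
# is an assumption based on the order of the commands in this README.
run_wikidata_steps() {
  # CHRISDATA_CLI can be overridden, e.g. with "echo", for a dry run.
  cli=${CHRISDATA_CLI:-python -m chrisdata.cli}
  for step in parse filter convert; do
    # $cli is left unquoted on purpose so it splits into command and arguments.
    $cli wikidata "$step" || { echo "step failed: $step" >&2; return 1; }
  done
}
```

Running `CHRISDATA_CLI=echo run_wikidata_steps` prints the arguments of each step without executing anything.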
Reference
File details
Details for the file chrisdata-0.5.1.tar.gz.
File metadata
- Download URL: chrisdata-0.5.1.tar.gz
- Upload date:
- Size: 49.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea6a126468996734e0fc193c683558bc7cb1f34ff16afcc74cddff6386a8734f |
| MD5 | d4672bc347b32ba61888ba440e9399b7 |
| BLAKE2b-256 | cfc746da417eaab2ef0d1cab665ec015ae80e9fd039fe7cfa50cc12b08b673c6 |
File details
Details for the file chrisdata-0.5.1-py3-none-any.whl.
File metadata
- Download URL: chrisdata-0.5.1-py3-none-any.whl
- Upload date:
- Size: 62.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f1d7bc2fedd46403c299530c98b0a3b4f663c5183a61324b8d17193f729cd8e1 |
| MD5 | f5a1babe133b52ea40bdfd9e8056cdcf |
| BLAKE2b-256 | 5f3f297db29cd9e3d39292539892ee3b4161b5b66639e1711267faa94bb57b51 |