Data processing library

chrisdata

Data processing tools for data analysis

Installation

  1. Install Miniforge

    wget "https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-$(uname)-$(uname -m).sh"
    bash Miniforge3-$(uname)-$(uname -m).sh
    
  2. Clone the repository

    rm -rf chrisdata*
    git clone git@github.com:chrisjihee/chrisdata.git
    cd chrisdata*
    
  3. Create a new environment

    mamba create -n chrisdata python=3.11 -y
    mamba activate chrisdata
    
  4. Install the required packages

    pip install -U -e .
    rm -rf chrisbase*; git clone git@github.com:chrisjihee/chrisbase.git
    pip install -U -e chrisbase*
    pip list | grep -E "mongo|search|Wiki|wiki|json|pydantic|chris|Flask"
    
  5. Install MongoDB

    mkdir mongodb; cd mongodb; mkdir data log
    if [ "$(uname)" = "Linux" ]; then
      aria2c https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu2204-8.0.0.tgz
    elif [ "$(uname)" = "Darwin" ]; then
      aria2c https://fastdl.mongodb.org/osx/mongodb-macos-arm64-8.0.0.tgz
    fi
    tar zxvf mongodb-*.tgz --strip-components=1
    cd ..
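The platform check in this step handles only Linux and macOS; on any other system no archive is downloaded and the later `tar` call fails with a less obvious error. A minimal sketch of the same selection as a function that fails loudly instead. The helper name `mongodb_url` is hypothetical, and only the two URLs already given in this step are used:

```shell
# Sketch: choose the MongoDB 8.0.0 archive URL for a given `uname` value.
# mongodb_url is a hypothetical helper, not part of chrisdata.
mongodb_url() {
  case "$1" in
    Linux)  echo "https://fastdl.mongodb.org/linux/mongodb-linux-x86_64-ubuntu2204-8.0.0.tgz" ;;
    Darwin) echo "https://fastdl.mongodb.org/osx/mongodb-macos-arm64-8.0.0.tgz" ;;
    *)      echo "unsupported platform: $1" >&2; return 1 ;;
  esac
}

# Print the URL that would be fetched on the current machine.
url="$(mongodb_url "$(uname)")" && echo "would download: $url"
```

The printed URL can then be passed to `aria2c` as in the step itself.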
    
  6. Run MongoDB

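    # mongod stays in the foreground unless the config file enables forking,
    # so start each instance in its own terminal (or as a background job).
    cd mongodb
    bin/mongod --config ../cfg/mongod-8800.yaml
    cd ..
    
    cd mongodb
    bin/mongod --config ../cfg/mongod-8801.yaml
    cd ..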
    
  7. Install Elasticsearch

    mkdir elasticsearch7; cd elasticsearch7
    if [ "$(uname)" = "Linux" ]; then
      aria2c https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.10-linux-x86_64.tar.gz
    elif [ "$(uname)" = "Darwin" ]; then
      aria2c https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.17.10-darwin-aarch64.tar.gz
    fi
    tar zxf elasticsearch-*.tar.gz --strip-components 1
    sed -i.bak 's/#http.port: 9200/http.port: 9717/g' ./config/elasticsearch.yml
    echo "xpack.security.enabled: true" >> ./config/elasticsearch.yml
    cd ..
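In-place editing with `sed` differs between GNU sed (Linux) and BSD sed (macOS): a separate empty argument after `-i` is BSD-only, while a suffix attached directly to `-i` (for example `-i.bak`) is accepted by both. A minimal sketch that applies the port change to a scratch file and checks the result. The helper name `set_http_port` is hypothetical, and nothing under `elasticsearch7/` is touched:

```shell
# Sketch: portable in-place edit of the Elasticsearch HTTP port.
# set_http_port is a hypothetical helper, not part of chrisdata.
set_http_port() {
  # $1: path to an elasticsearch.yml-style file, $2: port number
  # -i.bak (suffix attached to -i) works with both GNU and BSD sed.
  sed -i.bak "s/^#http.port: 9200/http.port: $2/" "$1" && rm -f "$1.bak"
}

# Try it on a scratch file that mimics the commented-out default.
scratch="$(mktemp)"
printf '#network.host: 192.168.0.1\n#http.port: 9200\n' > "$scratch"
set_http_port "$scratch" 9717
grep '^http.port' "$scratch"   # prints: http.port: 9717
rm -f "$scratch"
```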
    
  8. Link input data

    cd input
    ln -s /mnt/geo/data/wikidata .
    ln -s /mnt/geo/data/wikipedia .
    cd ..
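The link targets under `/mnt/geo/data` may not exist on every machine, and `ln -s` succeeds even when the target is missing. A minimal sketch of a check that both entries under `input/` resolve before any command is run. The helper name `check_input_links` is hypothetical:

```shell
# Hypothetical helper: report whether the expected entries under an input
# directory exist and resolve (a dangling symlink fails the -e test).
check_input_links() {
  dir="${1:-input}"
  status=0
  for name in wikidata wikipedia; do
    if [ -e "$dir/$name" ]; then
      echo "ok: $dir/$name"
    else
      echo "missing: $dir/$name"
      status=1
    fi
  done
  return "$status"
}

check_input_links input || echo "link the dump directories before running the CLI"
```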
    

Execution

  1. Show help

    python -m chrisdata.cli --help
    
    python -m chrisdata.cli wikipedia --help
    
    python -m chrisdata.cli wikidata --help
    
  2. Run command

  • To convert Wikipedia articles

    python -m chrisdata.cli wikipedia convert
    
  • To parse Wikidata dump

    python -m chrisdata.cli wikidata parse
    
  • To filter Wikidata entities

    python -m chrisdata.cli wikidata filter
    
  • To convert Wikidata entities

    python -m chrisdata.cli wikidata convert
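The three Wikidata commands can be chained in a small wrapper. The sketch below runs them in the order they are listed above and stops at the first failure. It is an assumption that this listing order is the intended processing order and that the subcommands need no further arguments; `run_wikidata_steps` and `DRY_RUN` are hypothetical names, not part of the chrisdata CLI:

```shell
# Sketch: run the Wikidata subcommands in listing order, stopping at the
# first failure. Set DRY_RUN=1 to print the commands instead of running them.
run_wikidata_steps() {
  for step in parse filter convert; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "python -m chrisdata.cli wikidata $step"
    else
      python -m chrisdata.cli wikidata "$step" || return 1
    fi
  done
}

# Preview the commands without executing anything.
DRY_RUN=1 run_wikidata_steps
```

Calling `run_wikidata_steps` without `DRY_RUN=1` executes the commands in the active environment.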
    

Reference
