
Datapackage containing orders of Cabinet Secretariat from https://cabsec.gov.in/.

Project description

Data-package: orgpedia_cabsec

Posting data of Ministers of India. The data is obtained by processing posting orders from Cabinet Secretariat's website.

To get a quick peek, check out tenures-sample.csv; it contains a snapshot of the tenure information of Cabinet Secretariat officers.

The tenure information is built by processing orders found on the Cabinet Secretariat's webpage (import/documents). The orders are processed into higher-level concepts: Tenures and an org chart. To understand the processing logic, check out the Data Processing section.

Accessing the data

All the data is available in the flow/buildTenure_/output folder, which contains the following files:

  1. tenures.json, tenures.csv: Tenure information in json and csv format

  2. orders.json: Order information in json format.

  3. officer_infos.json: Officer ID to name mapping and additional information if available.

  4. post_infos.json: Contains hierarchies of the different components that make up a post: dept, role, juri, loca and stat, which map to Department, Rank, Jurisdiction, Location and Status.

  5. orders/*.order.json: Individual orders in json format.

  6. schema/*.schema.json: Schema information for all of these json files can be found in the data/schema directory; check out its README.md for an introduction.
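As a rough sketch of how the tenure data might be consumed: the record below is hypothetical (the real field names are defined by the schema/*.schema.json files), but the parsing pattern is the same for tenures.json.

```python
import json

# Hypothetical tenures.json snippet -- the actual field names come from
# the schema/*.schema.json files in the repository.
sample = """
[
  {"officer_id": "off_001", "post_id": "post_042",
   "start_date": "2014-05-26", "end_date": "2019-05-30"}
]
"""

tenures = json.loads(sample)
for t in tenures:
    # Each record links an officer to a post for a date range,
    # so tenures can be grouped by officer to rebuild a career timeline.
    print(t["officer_id"], t["start_date"], "->", t["end_date"])
```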

You can also install the orgpedia_cabsec package, which contains all the data created by this repository.

python -m pip install orgpedia_cabsec

Once you install the package, all the data is available in data.zip. Use this command to print the path of the data.zip installed on your computer.


python -c "import pkg_resources; print(pkg_resources.resource_filename('orgpedia_cabsec', 'data.zip'))"

<path/to/data.zip>
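The archive can then be read in place with the standard zipfile module, with no extraction needed. A minimal sketch, using a tiny in-memory zip as a stand-in for data.zip (the member name tenures.csv matches the files listed above; the column names are illustrative):

```python
import csv
import io
import zipfile

# Build a small in-memory zip as a stand-in for the installed data.zip.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("tenures.csv", "officer_id,start_date\noff_001,2014-05-26\n")

# Read a member directly from the archive -- the same calls work on
# the real data.zip, given its path from the command above.
with zipfile.ZipFile(buf) as zf:
    with zf.open("tenures.csv") as f:
        rows = list(csv.DictReader(io.TextIOWrapper(f, encoding="utf-8")))

print(rows[0]["officer_id"])
```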

Data Stats

These are high-level statistics; please check the flow directory for more information.

  • Number of Documents: 904

  • Documents Processed: 817

  • Number of Pages: 2,145

  • Total Edits: 3,885

  • Edits per Page: 1.8112 (3,885/2,145)

Data Processing

This is a data package repository - it contains documents, configuration and code for processing the documents and creating data. In a sense it is different from code repositories that only contain code and not the artifacts the code generates.

The data processing is broken down into a series of Tasks, where each task processes the data created by the upstream task (linked in the input folder) and generates new data stored in the output folder. The directory layout of this repository follows the ideas mentioned in this video: Principled Data Processing by Patrick Ball. There are 3 main top-level directories: import, flow and export. A simple makefile orchestrates the document flow across these folders; run make help to find out more about the commands.

You can check out the template repository template.datapackage, where each directory and sub-directory is explained. To understand how the data (/flow/buildTenure_/output) is generated from the documents (/import/documents/), explore the flow directory.
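The task convention described above can be sketched as follows. The function and file names here are illustrative, not the repository's actual code; the point is the input/ -> output/ contract each task follows.

```python
import json
import tempfile
from pathlib import Path

def run_task(task_dir: Path) -> None:
    """Illustrative task: read upstream data from input/, write results to output/."""
    in_file = task_dir / "input" / "orders.json"    # soft-link to upstream output
    out_file = task_dir / "output" / "tenures.json"
    out_file.parent.mkdir(parents=True, exist_ok=True)

    orders = json.loads(in_file.read_text())
    # ... build the higher-level concept (tenures) from the raw orders ...
    tenures = [{"order_id": o["order_id"]} for o in orders]
    out_file.write_text(json.dumps(tenures, indent=2))

# Demo with a temporary task directory standing in for flow/buildTenure_.
with tempfile.TemporaryDirectory() as d:
    task = Path(d) / "buildTenure_"
    (task / "input").mkdir(parents=True)
    (task / "input" / "orders.json").write_text('[{"order_id": "ord_1"}]')
    run_task(task)
    print((task / "output" / "tenures.json").read_text())
```

Because each task only reads its input folder and writes its output folder, the makefile can chain tasks by soft-linking one task's output into the next task's input.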

Developer Notes

If you want to make changes and regenerate the data, you have two choices:

  1. Use GitHub codespaces (WIP).
  2. Build locally. For this you will need at least 20 GB of space, as documents, intermediate data and final data are all stored locally. To minimize the space requirement, it is recommended that you work only on the buildOrder/* and downstream tasks.

Local Development

Prerequisites

  • Git with Git LFS
  • Python 3.7+
  • Poetry
  • make

Installation

Git & Git LFS

To install Git, visit the Git website and follow the installation instructions for your operating system. On Windows, make sure Git LFS stays enabled (the default option). For other platforms, follow these instructions on GitHub.

Python

To install Python, visit the Python website and download the version of Python 3.x for your operating system. Follow the installation instructions for your operating system.

Poetry

To install Poetry, visit the Poetry website and follow the installation instructions for your operating system.

Make

On Unix-based systems make should come pre-installed; on Windows, use winget to install make, following the instructions here.

Setup

The Orgpedia repository makes heavy use of soft-links, which are stored in the GitHub repository. On non-Windows platforms this is not a problem; on Windows you need to do two things: 1) enable soft-links and 2) tell Git about them.

Symlinks Setup On Windows

On Windows 11, make sure you have enabled developer mode; this automatically enables soft-links on your machine. On Windows 10, soft-link support was added in Build 14972 and only works in an Administrator command prompt. More info at this link.

Next you need to tell Git that it should create soft-links when it sees them in the repository; check the Stack Overflow answer to learn more about this. Execute the following command:

git config --global core.symlinks true

To set up the project, clone the repository using git (this is a large repository and will take several minutes):

git clone https://github.com/orgpedia/cabsec.git

Navigate to the project directory:

cd cabsec

Use poetry to install the software dependencies (one time only):

make install

Import the models and other data-packages required for the document flow (one time only). These will be downloaded into the import folders, which takes a long time.

make import

Generate Data

After this you should have all the files needed to generate the data. Make whatever changes you need and then execute:

make flow

This will generate the data based on your changes. Currently, make does not track dependencies, so the entire document flow is re-executed!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

orgpedia_cabsec-0.0.4.tar.gz (22.1 MB)

Uploaded Source

Built Distribution

orgpedia_cabsec-0.0.4-py3-none-any.whl (22.1 MB)

Uploaded Python 3

File details

Details for the file orgpedia_cabsec-0.0.4.tar.gz.

File metadata

  • Download URL: orgpedia_cabsec-0.0.4.tar.gz
  • Upload date:
  • Size: 22.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.7.13 Darwin/22.4.0

File hashes

Hashes for orgpedia_cabsec-0.0.4.tar.gz

  • SHA256: 920416f9ceb58e5d7e3521b1ccd33e4d3ab3c90a86025c446bbb4b9af429c8d2
  • MD5: e99cca99cca21dff41040d7df600c314
  • BLAKE2b-256: bbb5f15826bcebb55f3c78bae94d970c19e8041bee4d12cce63cb9bc821a3d0e

See more details on using hashes here.
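To verify a downloaded file against the digests above, the standard hashlib module is enough. A small sketch; the filename is the one listed, and the expected digest is copied from the table:

```python
import hashlib

def sha256_of(path: str) -> str:
    """Stream the file in chunks so large archives don't need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

# Usage, after downloading the sdist:
# expected = "920416f9ceb58e5d7e3521b1ccd33e4d3ab3c90a86025c446bbb4b9af429c8d2"
# assert sha256_of("orgpedia_cabsec-0.0.4.tar.gz") == expected
```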

File details

Details for the file orgpedia_cabsec-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: orgpedia_cabsec-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 22.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.5.1 CPython/3.7.13 Darwin/22.4.0

File hashes

Hashes for orgpedia_cabsec-0.0.4-py3-none-any.whl

  • SHA256: 05199528b6f6cdb29554fd81193dfae8d1c5005962c9f145b3ba103608d558f8
  • MD5: 90c901a6ada938f0cfef62ba2ca259fc
  • BLAKE2b-256: 2a45c2402268b03ef58fb43593096f711ebb48e6e968cf2a0261c10edec34024

See more details on using hashes here.
