Utilities for merging session data with LinkedIn member exports and optional demographic enrichment.

jcp-data-manager

This folder contains a packaged Python implementation of the workflow from merge_testing.ipynb. It works with arbitrary input files, not just the example JSON files in this repo, as long as they follow the two source formats described under "Expected input shapes".

What it does

  • Loads LinkedIn member JSON data
  • Loads session JSON data
  • Normalizes and merges both datasets on user_id
  • By default, enriches rows with image-based DeepFace analysis
  • By default, enriches rows with name-based gender and ethnicity predictions
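
The normalize-and-merge step listed above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual implementation: the function names normalize_member and merge_on_user_id, the sample fields (headline), the coercion of IDs to strings, and the left join from sessions to members are all assumptions made for the example.

```python
from typing import Any


def normalize_member(member: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a member record keyed by a string user_id.

    Assumption: wordpress_user_id is treated as an alias for user_id.
    """
    record = dict(member)
    alias = record.pop("wordpress_user_id", None)
    if "user_id" not in record:
        if alias is None:
            raise ValueError("member record needs wordpress_user_id or user_id")
        record["user_id"] = alias
    # Assumption: IDs are compared as strings so 7 and "7" match.
    record["user_id"] = str(record["user_id"])
    return record


def merge_on_user_id(
    sessions: list[dict[str, Any]], members: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Left-join session rows to member rows on user_id (assumed join type)."""
    by_id = {m["user_id"]: m for m in map(normalize_member, members)}
    merged = []
    for session in sessions:
        key = str(session["user_id"])
        # Session fields win on name collisions; sessions without a
        # matching member keep only their own fields.
        merged.append({**by_id.get(key, {}), **session, "user_id": key})
    return merged


if __name__ == "__main__":
    members = [{"wordpress_user_id": 7, "headline": "Engineer"}]
    sessions = [
        {"user_id": "7", "session_id": "a1"},
        {"user_id": 9, "session_id": "b2"},
    ]
    for row in merge_on_user_id(sessions, members):
        print(row)
```

Keeping one row per session is a design choice of this sketch. The real package may join differently, so compare against the notebook output before relying on it.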

Expected input shapes

The LinkedIn file should be a top-level JSON list of member records and must include either wordpress_user_id or user_id.

The sessions file should be a top-level JSON object with a sessions key whose value is a list. Each session record must include at least user_id and session_id.
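
To check your own files against these shapes before running a merge, something like the following may help. It is a sketch only: validate_linkedin and validate_sessions are hypothetical helpers, not functions exported by this package, and the error messages are invented for the example.

```python
import json
from typing import Any


def validate_linkedin(data: Any) -> list[dict]:
    """Check for a top-level list whose records carry a user identifier."""
    if not isinstance(data, list):
        raise ValueError("LinkedIn file must be a top-level JSON list")
    for i, member in enumerate(data):
        if not isinstance(member, dict):
            raise ValueError(f"member {i} is not a JSON object")
        if "wordpress_user_id" not in member and "user_id" not in member:
            raise ValueError(f"member {i} needs wordpress_user_id or user_id")
    return data


def validate_sessions(data: Any) -> list[dict]:
    """Check for an object with a 'sessions' list of well-formed records."""
    if not isinstance(data, dict) or not isinstance(data.get("sessions"), list):
        raise ValueError("sessions file must be an object with a 'sessions' list")
    for i, session in enumerate(data["sessions"]):
        if not isinstance(session, dict):
            raise ValueError(f"session {i} is not a JSON object")
        missing = {"user_id", "session_id"} - session.keys()
        if missing:
            raise ValueError(f"session {i} is missing {sorted(missing)}")
    return data["sessions"]


if __name__ == "__main__":
    # Inline samples stand in for real files; use json.load(open(path)) instead.
    linkedin = json.loads('[{"wordpress_user_id": 7}]')
    sessions = json.loads('{"sessions": [{"user_id": 7, "session_id": "a1"}]}')
    print(len(validate_linkedin(linkedin)), "member record(s) OK")
    print(len(validate_sessions(sessions)), "session record(s) OK")
```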

Install

pip install -e .

Optional extras:

pip install -e ".[image]"
pip install -e ".[names]"
pip install -e ".[image,names]"

CLI usage

jcp-data-manager ^
  --sessions .\jcpst-sessions-2026-04-07-22-48-30.json ^
  --linkedin .\linkedin-member-data-2026-04-07-224846.json ^
  --output .\merged.parquet

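The example above uses Windows cmd syntax (^ line continuations and .\ paths). In a POSIX shell, use \ for line continuation and ./ for paths.

You can also run it as a module: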

python -m jcp_data_manager.cli --help

To skip an enrichment step, use --skip-image-analysis or --skip-name-analysis.
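
For readers wiring these options into their own scripts, here is a rough argparse sketch of how flags with these names could behave. It is not a copy of cli.py: build_parser and enabled_steps are hypothetical names, and treating --sessions, --linkedin, and --output as required is an assumption. Run the --help command above for the authoritative option list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the flag names shown in this README."""
    parser = argparse.ArgumentParser(prog="jcp-data-manager")
    # Assumption: all three paths are required.
    parser.add_argument("--sessions", required=True, help="sessions JSON file")
    parser.add_argument("--linkedin", required=True, help="LinkedIn member JSON file")
    parser.add_argument("--output", required=True, help="output file path")
    # Enrichment runs by default; each flag switches one step off.
    parser.add_argument("--skip-image-analysis", action="store_true")
    parser.add_argument("--skip-name-analysis", action="store_true")
    return parser


def enabled_steps(args: argparse.Namespace) -> list[str]:
    """Return the enrichment steps that remain enabled (labels are illustrative)."""
    steps = []
    if not args.skip_image_analysis:
        steps.append("image")
    if not args.skip_name_analysis:
        steps.append("names")
    return steps


if __name__ == "__main__":
    args = build_parser().parse_args(
        [
            "--sessions", "sessions.json",
            "--linkedin", "linkedin.json",
            "--output", "merged.parquet",
            "--skip-image-analysis",
        ]
    )
    print(enabled_steps(args))
```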

Project layout

src/jcp_data_manager/
  __init__.py
  cli.py
  enrichment.py
  io.py
  merge.py

The notebook was left untouched so you can compare outputs while you migrate.
