Mine and extract complete package lists from the npm registry

NPM Package Miner

This tool mines the npm registry to collect information about all npm packages.

Features

  • Fetches the complete list of npm packages from the official npm registry
  • Retrieves package metadata, including homepage and repository URLs, via the npm registry API
  • Parallel processing with 50 workers for efficient data collection
  • Handles various repository URL formats (git+https, git@github, etc.)
  • Progress tracking with visual feedback
  • Outputs to CSV format compatible with cross-ecosystem analysis
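
As an illustration of the repository URL handling listed above, here is a minimal sketch of one way to turn common git URL spellings into HTTPS URLs. The function name `normalize_repo_url` and the exact rules (including upgrading `http://` to `https://` and trimming a trailing `.git`) are assumptions for this example, not taken from mine_npm.py.

```python
import re


def normalize_repo_url(raw):
    """Return an https:// URL for common git URL spellings, or "nan".

    Hypothetical helper: the rules below are assumptions for illustration.
    """
    if not isinstance(raw, str) or not raw.strip():
        return "nan"
    url = raw.strip()
    # Drop the "git+" prefix (git+https://..., git+ssh://...)
    if url.startswith("git+"):
        url = url[len("git+"):]
    # scp-like form: git@github.com:user/repo.git
    match = re.match(r"^git@([^:/]+):(.+)$", url)
    if match:
        url = f"https://{match.group(1)}/{match.group(2)}"
    # ssh://git@host/path, git://host/path and http://host/path
    url = re.sub(r"^(?:ssh://git@|git://|http://)", "https://", url)
    if not url.startswith("https://"):
        return "nan"
    # Trim a trailing slash and a trailing ".git"
    url = url.rstrip("/")
    if url.endswith(".git"):
        url = url[:-4]
    return url


print(normalize_repo_url("git+https://github.com/user/repo.git"))
print(normalize_repo_url("git@github.com:user/repo.git"))
```

Both calls print `https://github.com/user/repo`. Anything the sketch cannot recognize is returned as "nan", matching the marker the output uses for missing or invalid URLs.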

Setup

Run the setup script

chmod +x setup.sh
./setup.sh

This will:

  • Create a virtual environment
  • Install required dependencies (requests, tqdm)

Manual Setup (Alternative)

# Create virtual environment
python3 -m venv .venv

# Activate virtual environment
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Usage

source .venv/bin/activate
python mine_npm.py

The script will:

  1. Download the complete list of package names from the npm registry (~2-3 million packages)
  2. Fetch detailed metadata for each package in parallel
  3. Save results to ../../../Resource/Package/Package-List/NPM.csv
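
Step 2 above can be sketched with a thread pool from the standard library. This is a rough outline under stated assumptions, not the script's actual code: `fetch_all` and `fetch_one` are hypothetical names, and `fetch_one` stands in for whatever function performs the real registry request.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(names, fetch_one, workers=50):
    """Call fetch_one(name) for every name using a pool of worker threads.

    Returns {name: result}. A failed fetch is stored as None so that one
    bad package does not stop the run.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fetch_one, name): name for name in names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception:
                results[name] = None
    return results


def fake_fetch(name):
    # Stand-in for a real HTTP request; fails for one name on purpose.
    if name == "bad":
        raise ValueError("simulated API error")
    return {"name": name}


print(fetch_all(["a", "bad", "b"], fake_fetch, workers=4))
```

For a list with millions of names, submitting every task at once keeps all futures in memory, so a real run would probably feed the pool in batches.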

Output Format

CSV file with columns:

  • ID: Sequential package identifier
  • Platform: "NPM"
  • Name: Package name
  • Homepage URL: Package homepage URL (from package.json)
  • Repository URL: Source code repository URL (normalized to HTTPS format)
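
To make the column layout concrete, the following sketch writes rows in this format with Python's `csv` module. The helper `write_rows`, the sample package names, and the choice to start IDs at 1 are assumptions for illustration only.

```python
import csv
import io

COLUMNS = ["ID", "Platform", "Name", "Homepage URL", "Repository URL"]


def write_rows(packages, out):
    """Write (name, homepage, repository) tuples as CSV rows to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(COLUMNS)
    # Assumption: IDs are sequential and start at 1.
    for i, (name, homepage, repo) in enumerate(packages, start=1):
        writer.writerow([i, "NPM", name, homepage or "nan", repo or "nan"])


buf = io.StringIO()
write_rows(
    [
        ("example-pkg", "https://example.com", "https://github.com/user/example-pkg"),
        ("no-urls-pkg", None, None),
    ],
    buf,
)
print(buf.getvalue())
```

Writing to an `io.StringIO` buffer keeps the example self-contained; passing an open file instead would produce the same rows on disk.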

Data Source

Performance

  • Expected runtime: 10-20 hours for ~2-3 million packages
  • 50 parallel workers for API requests
  • Network-dependent (typically limited by API rate and network speed)
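
The runtime estimate implies an average request rate, which the short calculation below works out. It assumes one API request per package, which the description suggests but does not state outright; `required_rate` is a name made up for this example.

```python
def required_rate(packages, hours):
    """Average requests per second needed to finish in the given time."""
    return packages / (hours * 3600)


# Slowest case: 2 million packages in 20 hours.
low = required_rate(2_000_000, 20)
# Fastest case: 3 million packages in 10 hours.
high = required_rate(3_000_000, 10)

print(f"{low:.1f} to {high:.1f} requests/s overall")
print(f"{low / 50:.2f} to {high / 50:.2f} requests/s per worker")
```

Under these assumptions the tool sustains roughly 28 to 83 requests per second overall, or about 0.6 to 1.7 per worker across 50 workers.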

Notes

  • The npm registry is continuously updated, so package counts may vary
  • Repository URLs are normalized to HTTPS format
  • Missing or invalid URLs are marked as "nan"
  • The script handles API errors gracefully and continues processing
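
The "nan" convention can be illustrated with a small extraction helper. This is a sketch, not the script's code: `extract_urls` is a hypothetical name, and it assumes the metadata is a dict in which `repository` is either a plain string or an object with a `url` key, as in package.json.

```python
MISSING = "nan"


def extract_urls(metadata):
    """Return (homepage, repository_url) from a metadata dict, using "nan" for gaps."""
    if not isinstance(metadata, dict):
        return MISSING, MISSING
    homepage = metadata.get("homepage")
    repo = metadata.get("repository")
    # "repository" may be a plain string or an object with a "url" key.
    if isinstance(repo, dict):
        repo = repo.get("url")

    def clean(value):
        # Only non-empty strings count as valid URLs here.
        if isinstance(value, str) and value.strip():
            return value.strip()
        return MISSING

    return clean(homepage), clean(repo)


print(extract_urls({"homepage": "https://example.com",
                    "repository": {"type": "git", "url": "git+https://github.com/u/r.git"}}))
print(extract_urls(None))
```

Returning a marker instead of raising keeps a malformed or failed response from interrupting the rest of the run.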

Download files

  • Source distribution: npm_miner-1.0.0.tar.gz (9.5 kB)
  • Built distribution: npm_miner-1.0.0-py3-none-any.whl (8.7 kB, Python 3)
