Mine and extract complete package lists from the npm registry

NPM Package Miner

This tool mines the npm registry to collect information about all npm packages.

Features

  • Fetches the complete list of npm packages from the official npm registry
  • Retrieves package metadata, including homepage and repository URLs, via the npm registry API
  • Parallel processing with 50 workers for efficient data collection
  • Handles various repository URL formats (git+https, git@github, etc.)
  • Progress tracking with visual feedback
  • Outputs to CSV format compatible with cross-ecosystem analysis
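
As an illustration of the repository URL handling mentioned above, here is a minimal sketch of how git+https, git+ssh, git:// and git@host:path forms could be rewritten as plain HTTPS URLs. The function name normalize_repo_url and the exact rules (including stripping a trailing ".git") are assumptions for this example, not necessarily what mine_npm.py does.

```python
import re


def normalize_repo_url(raw: str) -> str:
    """Rewrite common repository URL spellings as a plain HTTPS URL (hypothetical helper)."""
    url = raw.strip()
    # Drop the "git+" prefix used in forms such as git+https://... or git+ssh://...
    if url.startswith("git+"):
        url = url[len("git+"):]
    # scp-style SSH form: git@github.com:user/repo.git -> https://github.com/user/repo.git
    match = re.match(r"^git@([^:/]+):(.+)$", url)
    if match:
        url = f"https://{match.group(1)}/{match.group(2)}"
    # ssh://git@host/path and git://host/path -> https://host/path
    url = re.sub(r"^(?:ssh://git@|git://)", "https://", url)
    # Assumption: a trailing ".git" is not wanted in the normalized form
    if url.endswith(".git"):
        url = url[: -len(".git")]
    return url


for example in (
    "git+https://github.com/user/repo.git",
    "git@github.com:user/repo.git",
    "git+ssh://git@github.com/user/repo.git",
    "git://github.com/user/repo",
):
    print(example, "->", normalize_repo_url(example))
```

All four example inputs map to https://github.com/user/repo under these rules.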

Setup

Run the setup script

chmod +x setup.sh
./setup.sh

This will:

  • Create a virtual environment
  • Install required dependencies (requests, tqdm)

Manual Setup (Alternative)

# Create virtual environment
python3 -m venv .venv

# Activate virtual environment
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Usage

source .venv/bin/activate
python mine_npm.py

The script will:

  1. Download the complete list of package names from the npm registry (~2-3 million packages)
  2. Fetch detailed metadata for each package in parallel
  3. Save results to ../../../Resource/Package/Package-List/NPM.csv
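
The parallel step above can be sketched with a thread pool. This is an illustrative outline only: fetch_all and fake_fetch are hypothetical names, and fake_fetch stands in for a real HTTP request so that the example runs without network access. The default of 50 workers mirrors the figure quoted in this README.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def fetch_all(names: Iterable[str],
              fetch_one: Callable[[str], dict],
              workers: int = 50) -> List[dict]:
    """Apply fetch_one to every package name in parallel; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_one, names))


def fake_fetch(name: str) -> dict:
    # Stand-in for a real registry request (assumption: one request per package)
    return {"name": name, "homepage": f"https://example.com/{name}"}


results = fetch_all(["left-pad", "lodash", "express"], fake_fetch)
for record in results:
    print(record["name"], record["homepage"])
```

Threads suit this kind of workload because each worker spends most of its time waiting on the network rather than computing.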

Output Format

CSV file with columns:

  • ID: Sequential package identifier
  • Platform: "NPM"
  • Name: Package name
  • Homepage URL: Package homepage URL (from package.json)
  • Repository URL: Source code repository URL (normalized to HTTPS format)
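
For reference, a file with these columns could be produced with the standard csv module, as in the sketch below. The helper write_rows and the sample row are made up for illustration, and starting the ID at 1 is an assumption.

```python
import csv
import io

COLUMNS = ["ID", "Platform", "Name", "Homepage URL", "Repository URL"]


def write_rows(packages, out) -> None:
    """Write a header plus one row per (name, homepage, repository) tuple."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    # Assumption: IDs are assigned sequentially starting at 1
    for idx, (name, homepage, repo) in enumerate(packages, start=1):
        writer.writerow([idx, "NPM", name, homepage, repo])


buffer = io.StringIO()
write_rows([("left-pad", "nan", "https://github.com/example/left-pad")], buffer)
print(buffer.getvalue())
```

When writing to a real file instead of a StringIO buffer, open it with newline="" as the csv module documentation recommends.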

Data Source

  • The official npm registry, queried through the npm registry API

Performance

  • Expected runtime: 10-20 hours for ~2-3 million packages
  • 50 parallel workers for API requests
  • Network-dependent (typically limited by API rate limits and network speed)
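
To put these figures in perspective, the short calculation below derives the sustained request rate they imply, assuming one API request per package (an assumption; the script may issue more or fewer requests).

```python
def required_rate(packages: int, hours: float) -> float:
    """Average requests per second needed to process `packages` in `hours`."""
    return packages / (hours * 3600)


WORKERS = 50

slow = required_rate(2_000_000, 20)  # fewest packages, longest runtime
fast = required_rate(3_000_000, 10)  # most packages, shortest runtime

print(f"overall: {slow:.1f} to {fast:.1f} requests/s")
print(f"per worker: {slow / WORKERS:.2f} to {fast / WORKERS:.2f} requests/s")
```

Under that assumption the quoted runtime corresponds to roughly 28 to 83 requests per second overall, or about 0.6 to 1.7 per worker.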

Notes

  • The npm registry is continuously updated, so package counts may vary
  • Repository URLs are normalized to HTTPS format
  • Missing or invalid URLs are marked as "nan"
  • The script handles API errors gracefully and continues processing
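
The "nan" marker and the continue-on-error behaviour described above could look roughly like the following sketch. The names extract_urls and safe_extract are hypothetical. The metadata layout is an assumption based on package.json conventions, where "homepage" is a string and "repository" is either a string or an object with a "url" key.

```python
MISSING = "nan"


def extract_urls(metadata: dict) -> tuple:
    """Return (homepage, repository), using "nan" for missing or malformed values."""
    homepage = metadata.get("homepage")
    repository = metadata.get("repository")
    # "repository" may be a plain string or an object such as {"type": "git", "url": "..."}
    if isinstance(repository, dict):
        repository = repository.get("url")

    def clean(value):
        # Anything that is not a non-empty string counts as missing
        return value.strip() if isinstance(value, str) and value.strip() else MISSING

    return clean(homepage), clean(repository)


def safe_extract(fetch_one, name: str) -> tuple:
    """Fetch and extract; any failure yields a ("nan", "nan") pair so the run continues."""
    try:
        return extract_urls(fetch_one(name))
    except Exception:
        return MISSING, MISSING


sample = {"homepage": "https://example.com",
          "repository": {"type": "git", "url": "git+https://github.com/a/b.git"}}
print(extract_urls(sample))
print(extract_urls({}))
```

Catching a broad Exception is a deliberate simplification here; a real run would probably also log the failing package name.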
