Mine and extract complete package lists from the npm registry
NPM Package Miner
This tool mines the npm registry to collect information about all npm packages.
Features
- Fetches complete list of npm packages from the official npm registry
- Retrieves package metadata including homepage and repository URLs via npm registry API
- Parallel processing with 50 workers for efficient data collection
- Handles various repository URL formats (git+https, git@github, etc.)
- Progress tracking with visual feedback
- Outputs to CSV format compatible with cross-ecosystem analysis
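The repository URL handling mentioned above could look roughly like the following. This is a minimal sketch, not the tool's actual code: the function name `normalize_repo_url` is hypothetical, and rewriting `git://` and `http://` to `https://` is an assumption about what "normalized to HTTPS" means.

```python
import re


def normalize_repo_url(raw):
    """Best-effort conversion of an npm `repository` URL to plain HTTPS.

    Hypothetical helper; returns "nan" when no HTTPS URL can be derived.
    """
    if not isinstance(raw, str) or not raw.strip():
        return "nan"
    url = raw.strip()
    # git+https://host/path -> https://host/path
    if url.startswith("git+"):
        url = url[len("git+"):]
    # git@host:user/repo -> https://host/user/repo
    match = re.match(r"^git@([^:/]+)[:/](.+)$", url)
    if match:
        url = f"https://{match.group(1)}/{match.group(2)}"
    # ssh://git@host/path, git://host/path, http://host/path -> https://host/path
    url = re.sub(r"^(?:ssh://git@|git://|http://)", "https://", url)
    # Drop a trailing .git suffix
    if url.endswith(".git"):
        url = url[: -len(".git")]
    # Anything that still is not HTTPS (e.g. a bare "user/repo") is treated as invalid
    return url if url.startswith("https://") else "nan"


print(normalize_repo_url("git+https://github.com/user/repo.git"))
print(normalize_repo_url("git@github.com:user/repo.git"))
```

Both calls print `https://github.com/user/repo`.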
Setup
Run the setup script
chmod +x setup.sh
./setup.sh
This will:
- Create a virtual environment
- Install required dependencies (requests, tqdm)
Manual Setup (Alternative)
# Create virtual environment
python3 -m venv .venv
# Activate virtual environment
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
Usage
source .venv/bin/activate
python mine_npm.py
The script will:
- Download the complete list of package names from npm registry (~2-3 million packages)
- Fetch detailed metadata for each package in parallel
- Save results to ../../../Resource/Package/Package-List/NPM.csv
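The steps above can be sketched with a thread pool. This is an illustration under stated assumptions rather than the contents of mine_npm.py: the names `mine` and `fake_fetch` are hypothetical, IDs are assumed to start at 1, and the registry request is replaced by a stand-in so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def mine(names, fetch_one, workers=50):
    """Fetch (homepage, repository) for each name in parallel and build CSV rows.

    `names` must be a list; `fetch_one` is any callable returning a 2-tuple.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the same order as `names`.
        results = list(pool.map(fetch_one, names))
    rows = []
    for package_id, (name, (homepage, repository)) in enumerate(
        zip(names, results), start=1
    ):
        rows.append((package_id, "NPM", name, homepage, repository))
    return rows


def fake_fetch(name):
    # Stand-in for the real registry request so the sketch runs offline.
    return (f"https://example.com/{name}", "nan")


rows = mine(["alpha", "beta"], fake_fetch, workers=2)
```

Threads suit this workload because each worker spends most of its time waiting on the network rather than computing.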
Output Format
CSV file with columns:
- ID: Sequential package identifier
- Platform: "NPM"
- Name: Package name
- Homepage URL: Package homepage URL (from package.json)
- Repository URL: Source code repository URL (normalized to HTTPS format)
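A file with these columns could be produced with Python's standard `csv` module, as in the sketch below. The column names come from the list above; the helper name `write_rows` and the sample row are made up for illustration.

```python
import csv
import io

COLUMNS = ["ID", "Platform", "Name", "Homepage URL", "Repository URL"]


def write_rows(rows, out):
    """Write the header followed by one line per package to a text stream."""
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    writer.writerows(rows)


# In-memory buffer used here instead of a real file, to keep the example self-contained.
buffer = io.StringIO()
write_rows([(1, "NPM", "example-pkg", "https://example.com", "nan")], buffer)
lines = buffer.getvalue().splitlines()
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends, so that line endings are not translated twice.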
Data Source
- Registry: https://registry.npmjs.org/
- All packages list: https://replicate.npmjs.com/_all_docs
- Package metadata: https://registry.npmjs.org/{package-name}
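The sketch below shows one way these endpoints might be used. It makes several assumptions: the helper names are hypothetical, the `_all_docs` response is taken to follow the usual CouchDB shape (a `rows` array whose entries carry an `id`), and the metadata document is taken to expose top-level `homepage` and `repository` fields, where `repository` may be a string or an object with a `url` key. No network request is made here.

```python
from urllib.parse import quote

REGISTRY = "https://registry.npmjs.org/"


def metadata_url(name):
    """Build the metadata URL for a package name."""
    # Scoped names such as @scope/pkg contain a slash, which is escaped as %2F.
    return REGISTRY + quote(name, safe="@")


def names_from_all_docs(payload):
    """Pull package names out of a decoded _all_docs response, skipping design docs."""
    return [row["id"] for row in payload["rows"] if not row["id"].startswith("_design/")]


def extract_urls(doc):
    """Return (homepage, repository) from a metadata document, using "nan" for gaps."""
    homepage = doc.get("homepage")
    repository = doc.get("repository")
    if isinstance(repository, dict):
        repository = repository.get("url")
    return (
        homepage if isinstance(homepage, str) and homepage else "nan",
        repository if isinstance(repository, str) and repository else "nan",
    )


print(metadata_url("@scope/pkg"))
```

This prints `https://registry.npmjs.org/@scope%2Fpkg`.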
Performance
- Expected runtime: 10-20 hours for ~2-3 million packages
- 50 parallel workers for API requests
- Runtime depends on the network (typically limited by API rate limits and connection speed)
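As a rough check on these figures, the average request rate they imply can be computed directly. The calculation assumes one metadata request per package and ignores the initial list download; `required_rate` is a name chosen for this example.

```python
def required_rate(packages, hours):
    """Average requests per second needed to finish `packages` lookups in `hours`."""
    return packages / (hours * 3600)


slowest = required_rate(2_000_000, 20)  # fewest packages, longest runtime
fastest = required_rate(3_000_000, 10)  # most packages, shortest runtime

print(f"{slowest:.1f} to {fastest:.1f} requests/s overall")
print(f"{slowest / 50:.2f} to {fastest / 50:.2f} requests/s per worker")
```

The stated range works out to roughly 28 to 83 requests per second overall, or about 0.6 to 1.7 requests per second for each of the 50 workers.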
Notes
- The npm registry is continuously updated, so package counts may vary
- Repository URLs are normalized to HTTPS format
- Missing or invalid URLs are marked as "nan"
- The script handles API errors gracefully and continues processing
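One way to get the "continue on error" behavior described above is to wrap each lookup so that a failure yields "nan" values instead of stopping the run. The wrapper below is a sketch only: `fetch_or_nan`, its `retries` parameter, and the retry policy are assumptions, not details taken from the script.

```python
def fetch_or_nan(name, get_urls, retries=2):
    """Call get_urls(name); after repeated failures return ("nan", "nan")."""
    for _ in range(retries + 1):
        try:
            return get_urls(name)
        except Exception:
            # Any error (timeout, HTTP failure, bad JSON) triggers another attempt.
            continue
    return ("nan", "nan")


def always_fails(name):
    # Stand-in for a request that never succeeds.
    raise RuntimeError("simulated API error")


result = fetch_or_nan("example-pkg", always_fails)
```

Here `result` is `("nan", "nan")`, so the package still gets a row in the output and the remaining packages are processed normally.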
File details
Details for the file npm_miner-1.0.1.tar.gz.
File metadata
- Download URL: npm_miner-1.0.1.tar.gz
- Upload date:
- Size: 9.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8221a1f75f2d238a579c123bb135b0d24c4898bcc51ad73b39ffbdb17c8149a7 |
| MD5 | 62bfb807c46f0e7599994365a6180bdd |
| BLAKE2b-256 | f79c4f00af56783167daef742a2ebe94ee8e8b0f9e6368d3a2330f428db1d2c1 |
File details
Details for the file npm_miner-1.0.1-py3-none-any.whl.
File metadata
- Download URL: npm_miner-1.0.1-py3-none-any.whl
- Upload date:
- Size: 8.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bfdfd332356e7000e53cc24c0718c453c57fa5dcc8c17c58dd0e0d0002eead34 |
| MD5 | 9768a06db7f2d745be91973964e85fa3 |
| BLAKE2b-256 | 7f97ed1f0e9cdcd1611b503dfec020189b482108e3d0abeed3ec7e24d19f4e13 |