Piiip Interactively Installs Intended Packages
piiip - piiip interactively installs intended packages
piiip (Piiip Interactively Installs Intended Packages) is a wrapper around pip that helps to
avoid installing a different package than was intended. For example, when
executing piiip install pandaa (pandaa instead of pandas), piiip asks
for confirmation before commencing the installation of pandaa. Accidentally
installing a different package than was intended can result in security risks,
including attackers gaining control over the machine on which the unintended
package is installed^3. piiip is a drop-in replacement for pip; usage is
identical.
What can go wrong?
Using pip, it is trivial to install any desired package from PyPI by just specifying the desired package name. However, if the package name is incorrect, for example due to a typo, a different package is installed than was intended. This package might contain outdated, vulnerable or even outright malicious software, which can result in a compromised machine (see ^3 for an overview of when and how packages can execute arbitrary code). Malicious parties are actively uploading malicious packages to compromise systems, similar to domain typosquatting attacks. These packages, which have a name that is designed to be confused with a legitimate package name, are used to steal information or private keys, or to install backdoors on target machines^10.
Does this actually go wrong in practice?
Yes. Several projects to protect users of pip have registered dummy packages with names that can be easily confused with popular packages. By claiming these names, real attackers cannot use the names for typosquatting purposes anymore. This is called "defensive typosquatting". Two defensive typosquatting projects ^8 ^9 received more than a million downloads in total on their packages, showing how often a typo happens. Furthermore, a student was able to run code on 17,000 unique hosts only 7 weeks after uploading 200 packages with a name that could be easily confused with popular packages^7. The Advanced Persistent Threat (APT) Lazarus also employed the package name confusion technique^6. Other groups have also attracted attention by using package name confusion techniques to steal source code, cryptocurrency, SSH and GPG keys, credentials and Discord tokens.
Package name confusion and typosquatting
The term "package name confusion" is used to describe all ways in which a user can install a different package than intended. The most intuitive example of package name confusion is a typing error (typosquatting: panddas instead of pandas). Other causes include a different spelling (colourama instead of colorama), delimiter modification (charsetnormalizer instead of charset-normalizer) and prefix/suffix augmentation (py-pandas instead of pandas). Neupane et al. created an overview of package name confusion categories[^1].
How does piiip help?
piiip adds a layer of safety by asking for confirmation before installing packages.
It only asks for confirmation if a package name might not represent the package
that was intended to be installed. This way, piiip is not a burden on the user,
but can prevent security issues. For example, when running piiip install pandas
the behavior of piiip is identical to pip. But when running
piiip install pandaa, piiip asks:
A package named pandas instead of pandaa exists. Are you sure you want to install pandaa? (y/n)
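The confirmation step can be pictured with a minimal sketch. This is not piiip's actual code: the function name confirm_install and its ask parameter are hypothetical and only illustrate a yes/no prompt built around the message shown above.

```python
def confirm_install(requested: str, alternative: str, ask=input) -> bool:
    """Ask the user whether `requested` should really be installed.

    Hypothetical helper for illustration; `ask` defaults to the built-in
    input() and can be replaced to make the function testable.
    """
    prompt = (
        f"A package named {alternative} instead of {requested} exists. "
        f"Are you sure you want to install {requested}? (y/n) "
    )
    while True:
        answer = ask(prompt).strip().lower()
        if answer in ("y", "n"):
            return answer == "y"
        # Any other answer repeats the question.
```

A caller would proceed with the installation only when the function returns True.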
Examples of real malicious packages that would have triggered a warning by piiip are:
Malicious package name | Real package name | Category according to [^1] | Source |
---|---|---|---|
python3-dateutil | dateutil | Prefix/suffix augmentation | Snyk |
urlib3 | urllib3 | 1-step Damerau/Levenshtein distance | IQT |
colourama | colorama | Alternate spelling | Neupane et al. |
Usage
piiip is fully compatible with pip. You can use piiip in exactly the same manner
as pip (or pip3) and you won't see any difference until a possible name
confusion occurs. In that case, piiip will ask you to confirm the installation of
the package. Note that packages installed with the option --index-url are
not analyzed for name confusion.
For example, if you want to install pandas, you run:
piiip install pandas
For more information, run
piiip --help
Features
piiip currently detects the following categories[^1] of package name confusion:
Category | Protects against: | Example |
---|---|---|
Character omission | Forgetting a character in the package name | panda |
Character addition | Adding an additional character in the package name | panddas |
Swapped character | Changing the location of two characters | panads |
Substituted character | Exchanging a character for a random other character | panfas |
Prefix/suffix augmentation[^2] | Adding a keyword before or after the package name | pandas-py |
Alternate spelling | Exchanging a British word for an American word or vice versa | colorama -> colourama |
Homographic replacement | Exchanging one or more characters that look alike | colorama -> col0rama |
Note that piiip only detects package names that contain a single mistake. Names with two mistakes, or with mistakes from two categories, are not detected. Examples of what is not detected: panddas-py, pandddas and pndass.
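To make the single-mistake rule concrete, here is a minimal sketch covering the first four categories in the table (omission, addition, swapped and substituted character). It is an illustration under stated assumptions, not piiip's implementation: the function name is_single_edit is hypothetical, and "swapped" is interpreted here as a swap of two adjacent characters.

```python
def is_single_edit(typed: str, intended: str) -> bool:
    """Return True if `typed` differs from `intended` by exactly one
    omitted, added, substituted, or adjacent-swapped character.
    Hypothetical helper, not part of piiip's API."""
    if typed == intended:
        return False
    lt, li = len(typed), len(intended)
    if abs(lt - li) > 1:
        return False  # at least two characters were added or omitted

    # Position of the first character that differs.
    i = 0
    while i < min(lt, li) and typed[i] == intended[i]:
        i += 1

    if lt == li:
        # Substituted character: everything after position i matches.
        if typed[i + 1:] == intended[i + 1:]:
            return True
        # Swapped characters: positions i and i+1 are exchanged.
        return (
            i + 1 < lt
            and typed[i] == intended[i + 1]
            and typed[i + 1] == intended[i]
            and typed[i + 2:] == intended[i + 2:]
        )
    if lt < li:
        # Character omission: `typed` lacks intended[i].
        return typed[i:] == intended[i + 1:]
    # Character addition: `typed` has one extra character at position i.
    return typed[i + 1:] == intended[i:]


for name in ["panda", "panddas", "panads", "panfas", "pandddas", "pndass"]:
    print(name, is_single_edit(name, "pandas"))
```

The first four names are reported as a single edit away from pandas, while pandddas and pndass are not, which matches the examples of undetected names above.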
Installing piiip
Method 1:
- Clone the repository
- Run
python -m pip install .
Method 2:
Run: pip install piiip
Roadmap
- Add detection methods for other categories[^1] of package name confusion:
  - Sequence reordering
  - Grammatical substitution
  - Semantic substitution
  - Asemantic substitution
  - Homophonic similarity
  - Simplification
- Implement a more robust method to determine package popularity
How does piiip work?
piiip performs two main tasks when it receives a package name:
- Generating alternative package names that the user might have intended instead of the received package name
- Determining the popularity of all alternative package names and of the received package
If one of the alternative package names belongs to a package that is more popular than the received package name, the warning is shown. The generation of alternative package names is performed for the categories listed under Features. Popularity of packages is currently determined by using download statistics from pypistats.org.
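The two tasks can be sketched as follows. This is a simplified illustration, not piiip's source code: the function names are hypothetical, only the four single-character categories are generated, and the download counts are made-up numbers in a dictionary rather than statistics fetched from pypistats.org.

```python
import string

# Characters considered when inserting or substituting (an assumption).
ALPHABET = string.ascii_lowercase + string.digits + "-_."


def alternative_names(name: str) -> set[str]:
    """Task 1: generate names one character edit away from `name`."""
    splits = [(name[:i], name[i:]) for i in range(len(name) + 1)]
    deletes = {a + b[1:] for a, b in splits if b}
    swaps = {a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1}
    replaces = {a + c + b[1:] for a, b in splits if b for c in ALPHABET}
    inserts = {a + c + b for a, b in splits for c in ALPHABET}
    return (deletes | swaps | replaces | inserts) - {name}


def more_popular_alternatives(name: str, downloads: dict[str, int]) -> list[str]:
    """Task 2: keep alternatives with more downloads than `name`,
    most popular first. Unknown names count as zero downloads."""
    received = downloads.get(name, 0)
    hits = [alt for alt in alternative_names(name)
            if downloads.get(alt, 0) > received]
    return sorted(hits, key=lambda alt: downloads[alt], reverse=True)


# Made-up download counts, for illustration only.
HYPOTHETICAL_DOWNLOADS = {"pandas": 1_000_000, "pandaa": 10}

print(more_popular_alternatives("pandaa", HYPOTHETICAL_DOWNLOADS))  # ['pandas']
print(more_popular_alternatives("pandas", HYPOTHETICAL_DOWNLOADS))  # []
```

A non-empty result corresponds to the situation in which the warning is shown; an empty result means the installation proceeds as with pip.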
Alternatives for other online package repositories
piiip only works for pip. For npm, TypoGard^5 by Taylor et al. can be used. TypoGard has the same goal as piiip and has been integrated in (a specific version of) the npm package installer^4.
[^1]: the listed categories are taken from "Beyond Typosquatting: An In-depth Look at Package Confusion" by Neupane et al.
[^2]: for a very limited set of prefixes/suffixes