
Parses setup.py files

Project description

This package returns information from a Python package’s setup.py file, without installing the package or trusting the code inside the setup.py.

What’s this for?

This is really an experiment in large-scale processing of setup.py files. These files list the project description, author and other metadata such as package dependencies.

Unfortunately, there isn’t an easy way to access this information programmatically for an arbitrary setup.py script. Each script is a full Python program, and the data can be arbitrary Python objects passed as arguments to a function. This package aims to give Python programs a way to retrieve all of this metadata without installing the package or trusting the file to be free of malicious side effects.

I’ve verified it by running on every Python repository on GitHub that has more than 500 stars (about 2500 repos). Every package that can be installed on a clean python installation can be parsed by this code, and additionally a bunch of packages that can’t be installed for different reasons can also have their setup contents retrieved here.

How does this work?

Since setup.py files are Python files, the arguments to the setup call can be arbitrary Python expressions. This means that the only reliable way of getting these arguments is by evaluating the file using a Python interpreter.
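To illustrate why static inspection falls short, here is a minimal sketch (not part of this package) that reads the keyword arguments of a setup call with the standard ast module. The sample source and the helper name static_setup_kwargs are made up for the example. Any argument that is computed rather than written as a literal comes back as None:

```python
import ast

# Hypothetical setup.py contents: the version is computed at run time.
SETUP_SOURCE = '''
import os
from setuptools import setup

def read_version():
    return "1." + str(len(os.sep))

setup(name="demo", version=read_version(), packages=["demo"])
'''


def static_setup_kwargs(source):
    """Return the keywords of the first setup(...) call, with None for non-literals."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and getattr(node.func, "id", None) == "setup":
            result = {}
            for keyword in node.keywords:
                try:
                    result[keyword.arg] = ast.literal_eval(keyword.value)
                except ValueError:
                    # The value is an expression that only an interpreter can evaluate
                    result[keyword.arg] = None
            return result
    return {}


static_args = static_setup_kwargs(SETUP_SOURCE)
print(static_args)
```

The name and packages arguments are recovered, but version is lost because it is the result of a function call.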

To return the arguments passed to the setuptools.setup call from a setup.py file, I’m temporarily monkey patching the setuptools.setup function to collect its arguments, and then using exec to execute the file:

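import setuptools
import distutils.core

setup_args = [None]

# patch setup functions to just keep track of arguments passed to them
def patched_setup(**kwargs):
    setup_args[0] = kwargs

# patch the setup function itself, not the distutils.core module
setuptools.setup = distutils.core.setup = patched_setup

# read the file and close it again before executing its contents
with open(setup_py_filename) as f:
    source = f.read()

exec(source, {
     "__name__": "__main__",
     "__builtins__": __builtins__,
     "__file__": setup_py_filename})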

Globals are explicitly set up to match what most setup.py scripts expect, including __name__ == "__main__" guards. Likewise, the full code takes care of some special cases such as setting up the Python path and the current directory.
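The "temporarily" part of the patching can be sketched with a context manager that restores the original function afterwards. This is an illustration under stated assumptions, not the package's actual code: capture_setup_kwargs is a made-up helper, and fake_setuptools is a stand-in module so the sketch runs without setuptools installed.

```python
import contextlib
import sys
import types


@contextlib.contextmanager
def capture_setup_kwargs(module, attr="setup"):
    """Swap module.<attr> for a recorder and restore the original on exit."""
    captured = {}
    original = getattr(module, attr)

    def recorder(*args, **kwargs):
        captured.update(kwargs)

    setattr(module, attr, recorder)
    try:
        yield captured
    finally:
        setattr(module, attr, original)


# Hypothetical stand-in for setuptools (an assumption for this sketch only)
fake_setuptools = types.ModuleType("fake_setuptools")
fake_setuptools.setup = lambda **kwargs: None
sys.modules["fake_setuptools"] = fake_setuptools
original_setup = fake_setuptools.setup

SETUP_SOURCE = '''
from fake_setuptools import setup

if __name__ == "__main__":
    setup(name="demo", version="1.0", install_requires=["requests"])
'''

with capture_setup_kwargs(fake_setuptools) as setup_args:
    # Globals mimic running the file as a script, so the __main__ guard passes
    exec(compile(SETUP_SOURCE, "setup.py", "exec"),
         {"__name__": "__main__", "__file__": "setup.py"})

print(setup_args)
```

Restoring the function in a finally clause means a setup.py that raises halfway through does not leave the patched function in place.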

Sandboxing with Docker

Running exec on an untrusted python file is a bad idea. As an example, some scripts do interesting things like mess around with your root git config - but the potential harm could be much much worse than that.

To prevent harmful side effects, this package runs by default in a sandboxed Docker container. In addition to the security benefits, this also lets us cleanly fall back to a Python 2.7 Docker image when the syntax is invalid for Python 3.

Running in a Docker container can be disabled by setting the trusted=True flag when calling parse. Also note that it’s probably worth configuring Docker to use gVisor for some extra peace of mind when parsing untrusted code.
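As a rough sketch of the two ideas above, the snippet below picks an image by checking whether the source compiles under the running Python 3 interpreter, and builds a locked-down docker run command line. The text does not say which options this package actually passes to Docker, so the helper names (pick_image, docker_command), the chosen flags and the inner command are all assumptions. The snippet only builds the argument list and never invokes Docker.

```python
def pick_image(source, filename="setup.py"):
    """Choose a Python 2.7 image when the source is not valid Python 3 syntax."""
    try:
        compile(source, filename, "exec")
    except SyntaxError:
        return "python:2.7"
    return "python:3"


def docker_command(project_dir, image, inner_command, use_gvisor=False):
    """Build a `docker run` argument list (project_dir should be an absolute path)."""
    command = [
        "docker", "run", "--rm",
        "--network", "none",               # no network access from the container
        "-v", f"{project_dir}:/src:ro",    # mount the project read-only
        "-w", "/src",
    ]
    if use_gvisor:
        # runsc is gVisor's runtime; it must already be configured in Docker
        command += ["--runtime", "runsc"]
    return command + [image] + list(inner_command)


legacy_image = pick_image('print "hello"')     # Python 2 print statement
modern_image = pick_image('print("hello")')
command = docker_command("/tmp/project", legacy_image,
                         ["python", "-c", "print('parse here')"],
                         use_gvisor=True)
print(" ".join(command))
```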

Handling Missing Dependencies

A common pattern for setup.py files is to import the uninstalled module to look up things like version strings. While this works, it can have the side effect of importing the module’s dependencies before they are installed.

As an example, tensorlayer imports some metadata from its root module - which in turn imports tensorflow, which hasn’t been installed yet in the docker image.

To hack around this problem, this code has the option of hooking into Python’s import handling to prevent ImportErrors from surfacing when running.

The idea is to provide a module importer to sys.meta_path that always finds a module if the existing resolution fails:

import sys

class MockModuleImporter(object):
    def find_module(self, fullname, path=None):
        return self

    def load_module(self, name):
        mock = MockModule(name)
        sys.modules[name] = mock
        return mock

# This hooks into Python's import mechanism, meaning that any
# module that fails to import will be replaced with a MockModule
# object
sys.meta_path.append(MockModuleImporter())

The MockModule inherits from types.ModuleType and just returns a Mock object with the common magic methods defined:

import types

class MockModule(types.ModuleType):
    def __getattr__(self, name, *args, **kwargs):
        return Mock()

    def __call__(self, *args, **kwargs):
        return Mock()

class Mock(object):
    def __getattr__(self, *args, **kwargs):
        return self
    __call__ = __getitem__ = __setitem__ = __add__ = __getattr__

    ... etc ...

This prevents a sizeable number of errors, and doesn’t seem to affect the output noticeably. This behaviour can be disabled by setting mock_imports=False.
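The find_module hook shown above is, as far as I know, no longer consulted by the import system on Python 3.12 and later. A minimal sketch of the same idea using the newer find_spec / create_module / exec_module protocol might look like the following. The names MockFinder, mock_missing_imports and hypothetical_missing_pkg_xyz are made up for the example and are not part of this package.

```python
import contextlib
import importlib.abc
import importlib.machinery
import sys
import types


class Mock(object):
    """Absorbs attribute access and calls by returning itself."""

    def __getattr__(self, name):
        return self

    def __call__(self, *args, **kwargs):
        return self


class MockModule(types.ModuleType):
    """Module whose unknown attributes resolve to Mock objects."""

    def __getattr__(self, name):
        return Mock()


class MockFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Last-resort finder that claims every module the earlier finders missed."""

    def __init__(self):
        self.mocked = []

    def find_spec(self, fullname, path=None, target=None):
        # is_package=True marks the mock as a package so submodules resolve too
        return importlib.machinery.ModuleSpec(fullname, self, is_package=True)

    def create_module(self, spec):
        module = MockModule(spec.name)
        module.__path__ = []  # set explicitly so __getattr__ never supplies it
        self.mocked.append(spec.name)
        return module

    def exec_module(self, module):
        pass  # nothing to execute for a mock


@contextlib.contextmanager
def mock_missing_imports():
    finder = MockFinder()
    sys.meta_path.append(finder)  # appended last, so real modules still win
    try:
        yield finder
    finally:
        sys.meta_path.remove(finder)
        for name in finder.mocked:
            sys.modules.pop(name, None)


with mock_missing_imports() as finder:
    import hypothetical_missing_pkg_xyz.sub
    from hypothetical_missing_pkg_xyz import __version__ as version
    import json  # a module that really exists is imported normally

mocked_names = sorted(finder.mocked)
print(mocked_names)
```

Removing the finder and the mocked entries from sys.modules on exit keeps the mocks from leaking into code that runs afterwards and expects a real ImportError.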


This code can be installed via pip:

pip install parsesetup

To run

import parsesetup

# Parses the setup.py file, returning the setup arguments as a dict
setup_args = parsesetup.parse(path_to_setup_py)

# Parses a single package without using docker (dangerous!)
setup_args = parsesetup.parse(path_to_setup_py, trusted=True)

# Parses multiple packages in a single docker container. All packages
# need to share a common directory root for this to work
with parsesetup.DockerSetupParser(ROOT_PATH) as parser:
    setup_a = parser.parse(path_to_setup_py_a)
    setup_b = parser.parse(path_to_setup_py_b)
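Once the arguments are available as a dict, pulling out individual fields is ordinary dictionary access. The sketch below assumes the dict simply mirrors the keyword arguments of the setup call (name, version, install_requires and so on). It works on a hand-written example dict rather than calling parsesetup, and summarize is a made-up helper.

```python
def summarize(setup_args):
    """Pick a few common setup() keywords out of a parsed-arguments dict."""
    return {
        "name": setup_args.get("name"),
        "version": setup_args.get("version"),
        # install_requires may be missing or None, so fall back to an empty list
        "requires": sorted(setup_args.get("install_requires") or []),
    }


# Hand-written stand-in for the dict returned by a parse call
example_args = {
    "name": "demo",
    "version": "1.0",
    "install_requires": ["requests", "attrs"],
}
summary = summarize(example_args)
print(summary)
```

Using .get keeps the helper tolerant of setup calls that omit some keywords.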


  • Lets you programmatically inspect information contained in setup.py files
  • Handles both Python 2.7 and 3.6 scripts
  • Hooks into setuptools, distutils.core and numpy.distutils.core setups
  • Runs untrusted files in a Docker container
  • Reads files with a __name__ == "__main__" guard

Released under the MIT License


Files for parsesetup, version 0.0.1: parsesetup-0.0.1.tar.gz (6.6 kB), source distribution
