A tool for fetching and filtering IRS 990 data.

PyRS990

It's a pun. Get it?

A Python application and library that can grab all sorts of IRS Form 990 data on non-profit organizations and put it into a format that can be consumed easily by other applications.

Up and Running

The instructions below should get the software working for your purpose (user or developer). If you run into trouble, please let us know so that we can update the instructions (or fix the bug you ran into).

User

For now you need to clone the repo to use it. Eventually we'll package it.

  1. Make sure you have Python 3.8 available
  2. Install Poetry if you don't already have it
  3. Clone the whole repo, cd into the pyrs990 directory
  4. Install dependencies - poetry install
  5. Run it. Some very simple examples are below:
    1. poetry run python -m pyrs990 --zip 59801 --use-disk-cache
    2. ...more examples coming soon
  6. Run the commands again and notice the cache speedup
  7. The cache is stored in ./.pyrs990-cache/
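
The speedup in step 6 comes from reusing downloads that are already on disk. As a rough illustration of that idea (not PyRS990's actual implementation), here is a minimal sketch of a disk cache. The function name cached_fetch, the fetch callback, and the choice of hashing the URL to build a file name are all assumptions made for this example.

```python
import hashlib
from pathlib import Path
from typing import Callable

# Same directory name the README mentions; how PyRS990 lays out files
# inside it is not documented here, so the layout below is an assumption.
CACHE_DIR = Path(".pyrs990-cache")


def cached_fetch(
    url: str,
    fetch: Callable[[str], bytes],
    cache_dir: Path = CACHE_DIR,
) -> bytes:
    """Return the body for `url`, reading from disk when a copy exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so it maps to a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / key
    if path.exists():
        return path.read_bytes()  # cache hit: skip the download
    body = fetch(url)  # cache miss: call the (slow) fetcher once
    path.write_bytes(body)
    return body
```

The fetch argument is any function that takes a URL and returns bytes, so the first call pays for the download and later calls with the same URL read the stored file instead.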

Developer

This project uses Poetry because it's pretty slick and does a lot of stuff automatically. The developers are not usually Python people, so that's great!

  1. Make sure you have Python 3.8 available
  2. Install Poetry if you don't already have it
  3. Clone the whole repo, cd into the pyrs990 directory
  4. Install dependencies - poetry install
  5. If you need to add dependencies:
    1. poetry add coolpkg
  6. Make a pull request!

About the Data

Right now we pull data that originated with the IRS (hence the silly name), but we get it from a couple of sources, and information about what is actually available is a little spread out as well.

Structure

There are two indices used to narrow down the list of filing documents that must be downloaded to satisfy a given query. The first is an annual index (we refer to it as "Annual" or "Annual Index" in the code). This index contains all filings processed by the IRS in a given calendar year.

Note that this does not necessarily have anything to do with the filing year. An organization might, for example, file its 2016 990 in either 2017 or 2018 (or even later). There is a field, described below, called tax_period that reflects the filing period. In the future, we intend to further abstract this so that it is easier to use.
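
To make the difference between the index year and tax_period concrete, here is a small sketch that selects rows by tax period regardless of which annual index they came from. It assumes tax_period is a string shaped like "YYYYMM"; the rows, the index_year key, and the function name are hypothetical and only for illustration.

```python
from typing import Dict, Iterable, List


def filings_for_tax_year(
    rows: Iterable[Dict[str, str]], year: int
) -> List[Dict[str, str]]:
    """Keep index rows whose tax_period falls in `year`.

    Assumes tax_period looks like "YYYYMM" (for example "201612").
    """
    prefix = str(year)
    return [row for row in rows if row["tax_period"].startswith(prefix)]


# Hypothetical rows drawn from two different annual indices.
rows = [
    {"ein": "000000001", "tax_period": "201612", "index_year": "2017"},
    {"ein": "000000002", "tax_period": "201612", "index_year": "2018"},
    {"ein": "000000003", "tax_period": "201706", "index_year": "2018"},
]

matches = filings_for_tax_year(rows, 2016)
print([row["ein"] for row in matches])  # ['000000001', '000000002']
```

Both matching rows cover a 2016 tax period even though they were processed in different calendar years, which is why a query for one tax period may need more than one annual index.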

The annual index also contains a field called object_id that tells us where to find the XML document that corresponds to that row in the index. PyRS990 abstracts this away, but it is still good to be aware of it.
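
As a sketch of what "tells us where to find the XML document" can mean in practice, the helper below turns an object_id into a document location. The base URL, the "_public.xml" suffix, and the function name are assumptions for illustration; PyRS990 handles this internally and its real logic may differ.

```python
def filing_url(object_id: str, base_url: str, suffix: str = "_public.xml") -> str:
    """Build a document URL from an index row's object_id.

    `base_url` and `suffix` are placeholders, not confirmed values.
    """
    return f"{base_url.rstrip('/')}/{object_id}{suffix}"


# Hypothetical object_id and host, used only to show the shape of the result.
print(filing_url("123456789", "https://example.invalid/filings/"))
# https://example.invalid/filings/123456789_public.xml
```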

The second index is the "Exempt Organizations Business Master File" distributed by the IRS. We refer to it as the "BMF Index". This index provides the physical address of each organization, along with some other helpful information. This allows the data to be queried by state, zip code, and so on, which greatly reduces the number of filing documents that must be downloaded for many queries.

Indices may be used to query filing documents from the command line using various options. Note that there are options for both indices and for the filing documents themselves. Where possible, filter on as many index fields as you can to reduce the number of files you have to download.
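
The narrowing described above can be sketched as a two-step filter: use the BMF Index to find organizations at a location, then use the Annual Index to find only their filings. The field names (ein, zip, object_id), the sample rows, and both function names are assumptions for this example rather than PyRS990's API.

```python
from typing import Dict, Iterable, List, Set


def eins_in_zip(bmf_rows: Iterable[Dict[str, str]], zip_code: str) -> Set[str]:
    """Collect the EINs of organizations whose ZIP starts with `zip_code`.

    A prefix match is used so ZIP+4 values such as "59801-1234" also match.
    """
    return {row["ein"] for row in bmf_rows if row["zip"].startswith(zip_code)}


def object_ids_to_download(
    annual_rows: Iterable[Dict[str, str]], eins: Set[str]
) -> List[str]:
    """Keep only the filings that belong to the selected organizations."""
    return [row["object_id"] for row in annual_rows if row["ein"] in eins]


# Hypothetical index rows.
bmf = [
    {"ein": "000000001", "zip": "59801-1234"},
    {"ein": "000000002", "zip": "10001"},
]
annual = [
    {"ein": "000000001", "object_id": "obj-1"},
    {"ein": "000000002", "object_id": "obj-2"},
    {"ein": "000000001", "object_id": "obj-3"},
]

wanted = object_ids_to_download(annual, eins_in_zip(bmf, "59801"))
print(wanted)  # ['obj-1', 'obj-3']
```

Only two of the three filings need to be downloaded here; on real indices the reduction from an index-level filter is usually much larger.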

See the example queries for more information.

Sources

The IRS BMF index files are hosted by the IRS directly and are available by state and region.

Descriptions of the variables contained in the files and the process used to build them are also available (they are also linked from the page above).

The annual index files come from an AWS S3 bucket managed by the IRS. The contents of the bucket are described there.

There is also a readme that demonstrates how to download the files (it is also linked from the page above).

The filing documents themselves also come from this same AWS S3 bucket in XML format. For the extremely XML-savvy, you can check out the schema documentation on the IRS website. PyRS990 abstracts this away, however, so there's no real need to understand it if you only want to access the data in a convenient format.

Finally, while not strictly a data source, the IRSx documentation created by ProPublica contains descriptions of many of the filing fields in a simple, readable format. For developers, PyRS990 has been designed to work with the exact XPath selectors listed in the IRSx documentation, so if you want to add a field to the Filing object, this is the place to look first.
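
For a feel of how a path-style selector maps onto a filing document, here is a minimal sketch using Python's standard xml.etree.ElementTree, which supports only a limited XPath subset. The sample document is heavily simplified and hypothetical, the element names and paths are assumptions, and select_text is not part of PyRS990.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace used by the sample document below.
NS = {"irs": "http://www.irs.gov/efile"}

# A tiny, made-up stand-in for a filing document.
SAMPLE = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnHeader>
    <Filer><EIN>000000001</EIN></Filer>
  </ReturnHeader>
  <ReturnData>
    <IRS990><CYTotalRevenueAmt>12345</CYTotalRevenueAmt></IRS990>
  </ReturnData>
</Return>"""


def select_text(root: ET.Element, path: str) -> Optional[str]:
    """Look up a slash-separated path relative to the root element.

    Each step is prefixed with the namespace so ElementTree can match it.
    Returns None when no element matches.
    """
    steps = path.strip("/").split("/")
    qualified = "/".join(f"irs:{step}" for step in steps)
    node = root.find(qualified, NS)
    return node.text if node is not None else None


root = ET.fromstring(SAMPLE)
print(select_text(root, "/ReturnHeader/Filer/EIN"))               # 000000001
print(select_text(root, "/ReturnData/IRS990/CYTotalRevenueAmt"))  # 12345
```

Real filings are far larger and vary by schema version, so a missing element (a None result here) is something any field lookup should expect.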
