
Python interface to EDGAR filings.


pyedgar

Python package for downloading EDGAR documents and data.


Usage

There are two primary interfaces to this library, namely filings and indices.

filing.py

filing.py is the main module for interacting with EDGAR forms.

Simple example:

from pyedgar import Filing
f = Filing(20, '0000893220-96-000500')

print(f)
# Output: <EDGAR filing (20/0000893220-96-000500) Headers:False, Text:False, Documents:False>

print(f.type, f)
# Output: 10-K <EDGAR filing (20/0000893220-96-000500) Headers:True, Text:True, Documents:False>

print(f.documents[0]['full_text'][:800])
# Output:
#                         SECURITIES AND EXCHANGE COMMISSION
#                               WASHINGTON, D.C. 20549
#
#                                     FORM 10-K
#
#  (Mark One)
#  /X/  Annual report pursuant to section 13 or 15(d) of the Securities Exchange
#       Act of 1934 [Fee Required] for the fiscal year ended December 30, 1995 or
#
#  / / Transition report pursuant to section 13 or 15(d) of the Securities
#      Exchange Act of 1934 [No Fee Required] for the transition period from
#      ________ to ________
#
#  COMMISSION FILE NUMBER 0-9576
#
#
#                             K-TRON INTERNATIONAL, INC.
#               (EXACT NAME OF REGISTRANT AS SPECIFIED IN ITS CHARTER)
#
#                 New Jersey                                22-1759452
#     (State or other jurisdiction of         (I.R.S. Employer Identification No.)

Filings are loaded lazily: the file is read from disk, or downloaded from the EDGAR website, only when you request its data. Filing objects have the following properties:

  • path: path to cached filing on disk
  • urls: URLs on the EDGAR website for the full text file and the index file
  • full_text: Full text of the entire .nc filing (not just the first document)
  • headers: Dictionary of all the headers from the full filing (i.e. not the exhibits). E.g. CIK, ACCESSION, PERIOD, etc.
  • type: The general type of the document, extracted from the TYPE header and cleaned up (so 10-K405 --> 10-K)
  • type_exact: The exact text extracted from the TYPE field
  • documents: Array of all the documents (between <DOCUMENT> tags). The 0th is typically the main form, e.g. the 10-K filing; subsequent documents are exhibits.
    • Each document in this array is itself a dictionary, with fields: TYPE, SEQUENCE, DESCRIPTION (typically the file name), FULL_TEXT. The last holds the text of that single document, e.g. just the 10-K filing, as text or HTML.
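The lazy loading described above can be pictured with a small stand-alone sketch. This is not pyedgar's implementation: the LazyFiling class, its loader argument, and the load_count counter are hypothetical names used only to show the pattern of deferring I/O until a property is read.

```python
from functools import cached_property


class LazyFiling:
    """Hypothetical stand-in for a lazily loaded filing (not pyedgar's Filing)."""

    def __init__(self, cik, accession, loader):
        self.cik = cik
        self.accession = accession
        self._loader = loader  # callable that would read from disk or download
        self.load_count = 0    # only here to make the laziness observable

    @cached_property
    def full_text(self):
        # Runs on first access only; the result is then cached on the instance.
        self.load_count += 1
        return self._loader(self.cik, self.accession)


def fake_loader(cik, accession):
    # Placeholder for real I/O (assumption: real code would hit disk or the network).
    return f"text of {cik}/{accession}"


f = LazyFiling(20, "0000893220-96-000500", fake_loader)
print(f.load_count)  # 0: constructing the object loads nothing
_ = f.full_text
_ = f.full_text
print(f.load_count)  # 1: the loader ran once, on first access
```

This mirrors the Simple example, where the printed Headers and Text flags change from False to True only after f.type is read.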

index.py

index.py is the main module for accessing extracted EDGAR indices. The indices are created in pyedgar.utilities.indices by the IndexMaker class. Once these indices are created (which you can do by setting force_download=True), you can view them via the indices property:

from pyedgar import EDGARIndex
all_indices = EDGARIndex(force_download=False)

print(all_indices.indices)
# Output:
# {'form_all.tab': '/data/storage/edgar/indices/form_all.tab',
#  'form_10-Q.tab': '/data/storage/edgar/indices/form_10-Q.tab',
#  'form_13s.tab': '/data/storage/edgar/indices/form_13s.tab',
#  'form_DEF14A.tab': '/data/storage/edgar/indices/form_DEF14A.tab',
#  'form_8-K.tab': '/data/storage/edgar/indices/form_8-K.tab',
#  'form_20-F.tab': '/data/storage/edgar/indices/form_20-F.tab',
#  'form_10-K.tab': '/data/storage/edgar/indices/form_10-K.tab'}

These indices are accessible as pandas dataframes via [] or the get_index method. Select an index by its key above, with or without the form_ prefix and the .tab suffix.

form_10k = all_indices['10-K']

print(form_10k.head(1))
# Output:
#       cik                      name  form    filedate             accession
#    0   20  K TRON INTERNATIONAL INC  10-K  1996-03-28  0000893220-96-000500

To get a type of form that isn't automatically extracted, you can use form_all:

df_s1 = EDGARIndex().get_index('all').query("form.str.startswith('S-1')")

print(df_s1.head(1))
# Output:
#        cik        name form    filedate             accession
# 5600  1961  WORLDS INC  S-1  2014-02-04  0001264931-14-000033

All indices are loaded and saved by pandas, so pandas is a requirement for using this functionality.
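For a quick look at an index file outside the library, the same prefix filter can be sketched with the standard library alone. This assumes the .tab files are tab-delimited with a header row matching the column names in the output above; that layout is an assumption, and the sample rows are the two rows already shown in this README.

```python
import csv
import io

# Assumed layout: tab-delimited, header row as in the dataframe output above.
SAMPLE = (
    "cik\tname\tform\tfiledate\taccession\n"
    "20\tK TRON INTERNATIONAL INC\t10-K\t1996-03-28\t0000893220-96-000500\n"
    "1961\tWORLDS INC\tS-1\t2014-02-04\t0001264931-14-000033\n"
)


def rows_with_form_prefix(handle, prefix):
    """Return the rows whose 'form' column starts with the given prefix."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row["form"].startswith(prefix)]


s1_rows = rows_with_form_prefix(io.StringIO(SAMPLE), "S-1")
print([row["name"] for row in s1_rows])  # ['WORLDS INC']
```

To try it on a real file, pass an open file handle instead of the io.StringIO sample.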

Config

Config files named pyedgar.conf, .pyedgar, or pyedgar.ini are searched for in the following locations, in order:

  1. os.environ.get("PYEDGAR_CONF", '.') <-- PYEDGAR_CONF environment variable
  2. ./
  3. ~/.config/pyedgar
  4. ~/AppData/Local/pyedgar
  5. ~/AppData/Roaming/pyedgar
  6. ~/Library/Preferences/pyedgar
  7. ~/.config/
  8. ~/
  9. ~/Documents/
  10. os.path.abspath(os.path.dirname(__file__)) <-- directory of the package. The package ships with a default config file here.

See the example config file for commented config settings.
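The search order above amounts to a first-match lookup, which can be sketched as follows. The find_config function is a hypothetical helper, and checking every file name in one directory before moving to the next is an assumption. The real lookup may differ, for example by accepting a full file path in PYEDGAR_CONF.

```python
import os
import tempfile

CONFIG_NAMES = ("pyedgar.conf", ".pyedgar", "pyedgar.ini")


def find_config(search_dirs, names=CONFIG_NAMES):
    """Return the first existing config path, or None (hypothetical helper)."""
    for directory in search_dirs:
        for name in names:
            candidate = os.path.join(os.path.expanduser(directory), name)
            if os.path.isfile(candidate):
                return candidate
    return None


# Demo: the first directory is empty, so the file in the second one wins.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    open(os.path.join(second, "pyedgar.ini"), "w").close()
    found = find_config([first, second])
    print(os.path.basename(found))  # pyedgar.ini
```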

To switch between multiple configs, set PYEDGAR_CONF in os.environ before importing pyedgar:

import os
# os.environ['PYEDGAR_CONF'] = os.path.expanduser('~/Dropbox/config/pyedgar/hades.local.pyedgar.conf')
os.environ['PYEDGAR_CONF'] = os.path.expanduser('~/Dropbox/config/pyedgar/hades.desb.pyedgar.conf')

from pyedgar import config
print(config.CONFIG_FILE)

# Output:
#     WARNING:pyedgar.config:Loaded config file from '[~]/Dropbox/config/pyedgar/hades.desb.pyedgar.conf'.
#     ALERT!!!! FILING_PATH_FORMAT is '{accession[11:13]}/{accession}.nc'.
#     [~]/Dropbox/config/pyedgar/hades.desb.pyedgar.conf
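To see what the FILING_PATH_FORMAT value in that output points at, the sketch below applies the same slice to the accession number from the Simple example. The filing_relpath function is a hypothetical helper. It uses ordinary Python slicing, since str.format does not accept slices inside replacement fields, and how pyedgar itself evaluates the template is not shown here.

```python
def filing_relpath(accession):
    """Build a relative path shaped like '{accession[11:13]}/{accession}.nc'.

    Hypothetical helper: in an accession such as '0000893220-96-000500',
    characters 11 and 12 are the two-digit year.
    """
    return f"{accession[11:13]}/{accession}.nc"


print(filing_relpath("0000893220-96-000500"))  # 96/0000893220-96-000500.nc
```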

downloader

There is a convenience downloader script for downloading filing feed files and indices.

To see the status of current cached downloads (shows the latest downloaded files) and to see the config setup:

$ python -m pyedgar.downloader --status --config

To download and extract index files:

$ python -m pyedgar.downloader -i --log info

And to download and extract the last 30 days of filings:

$ python -m pyedgar.downloader -d

To download and extract filings since the beginning:

$ python -m pyedgar.downloader -d --start-date 1995-01-01

Install

Pip installable:

pip install pyedgar

Or install with pip directly from GitHub:

pip install git+https://github.com/gaulinmp/pyedgar#egg=pyedgar

Or clone the repository from GitHub and install it in editable mode:

git clone https://github.com/gaulinmp/pyedgar
cd pyedgar
pip install -e ./

Requirements

w3m for converting HTML to plaintext (tested on Linux). A fallback method might one day be added.

Tested only on Python >3.4

HTML parsing tested only on Linux. Other HTML->text conversion methodologies were tried (html2text, BeautifulSoup, lxml) but w3m was fastest even with the subprocess calling. Converting multiple HTML files could probably be optimized with one instance of w3m instead of spawning a subprocess for each call. But that's for future Mac to work on.
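The one-subprocess-per-call approach described above can be sketched like this. It is not pyedgar's code: html_to_text is a hypothetical helper, and it assumes w3m is on the PATH and reads HTML from standard input when given the -dump and -T text/html options.

```python
import shutil
import subprocess


def html_to_text(html):
    """Convert an HTML string to plain text by spawning w3m once per call."""
    if shutil.which("w3m") is None:
        raise RuntimeError("w3m not found on PATH")
    result = subprocess.run(
        ["w3m", "-dump", "-T", "text/html"],  # render stdin as HTML, print text
        input=html,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if shutil.which("w3m"):
    print(html_to_text("<p>Hello, <b>EDGAR</b></p>"))
```

Each call pays the cost of starting a new process, which is the overhead the paragraph above suggests could be avoided by reusing one w3m instance.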
