sec/edgar file downloader

pysec_downloader

Downloader for SEC filings and other data available from the SEC.

install:

pip install pysec_downloader

Features: supports most filing types, though it still needs a lot of refining. Exposes part of the SEC XBRL API. Keeps a self-updating ticker:CIK lookup table, so the XBRL API can be searched by ticker instead of only by CIK. Not async: the SEC rate limit is quite low, so the benefit would be minimal for the added complexity (correct me if I am wrong).
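
To illustrate why a synchronous design is enough under a low rate limit, here is a minimal sketch of a blocking throttle. It is not taken from pysec_downloader: the class name `Throttle`, its `wait()` method and the figure of 10 requests per second (the SEC's published fair-access limit at the time of writing, check the current guidance) are assumptions made for the example.

```python
import time


class Throttle:
    """Hypothetical helper: space calls evenly so that at most
    `max_per_second` of them happen per second."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock          # injectable, which makes the class testable
        self._sleep = sleep
        self._next_allowed = 0.0     # earliest time the next call may start

    def wait(self):
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay > 0:
            self._sleep(delay)
        # schedule the following call one interval after this one
        self._next_allowed = max(now, self._next_allowed) + self.min_interval
        return delay


# Usage (assumed limit of 10 requests per second):
# throttle = Throttle(10)
# for url in urls:
#     throttle.wait()
#     ...  # send one request
```

With the requests spaced like this, the download time is dominated by the rate limit rather than by waiting on the network, which is the reason concurrency adds little here.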

no tests at the moment.

usage:

General Usage

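# NOTE: the import path below is an assumption; adjust it if the package layout differs
from pysec_downloader.downloader import Downloader

# Make sure you have the needed permissions for the root_path!
# Instantiate the Downloader and download some 10-Q filings as XBRL for AAPL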
dl = Downloader(r"C:\Users\Download_Folder", user_agent="john smith js@test.com")
dl.get_filings(
    ticker_or_cik="AAPL",
    form_type="10-Q",
    after_date="2019-01-01",
    before_date="",
    prefered_file_type="xbrl",
    number_of_filings=10,
    want_amendments=False,
    skip_not_prefered_extension=True,
    save=True)

# If `number_of_filings` is large, consider using `get_filings_bulk()`
# instead of `get_filings()` for more efficient index creation.

Bulk Files (companyfacts XBRL and submissions)

# get Facts (individual values) from a single Concept ("AccountsPayableCurrent") of a Taxonomy ("us-gaap")
facts_file = dl.get_xbrl_companyconcept("AAPL", "us-gaap", "AccountsPayableCurrent")
# download the zip containing all information on submissions of every company and extract it
# Calling `get_bulk_submissions` or `get_bulk_companyfacts` downloads >10GB of files!
dl.get_bulk_submissions()

# get the company-ticker map/file 
other_file = dl.get_file_company_tickers()
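
A sketch of how the facts of a single concept could be turned into flat rows once they are loaded. It assumes the data is the companyconcept JSON from the SEC XBRL API parsed into a dict, with a `units` mapping of unit name to a list of facts. What `get_xbrl_companyconcept` actually returns is not documented here, so the function name `flatten_concept_facts` and the sample values are purely illustrative.

```python
def flatten_concept_facts(concept):
    """Hypothetical helper: turn {"units": {unit: [fact, ...]}} into a
    list of flat rows, sorted by the period end date."""
    rows = []
    for unit, facts in concept.get("units", {}).items():
        for fact in facts:
            rows.append({
                "unit": unit,
                "end": fact.get("end"),      # period end date (ISO string)
                "val": fact.get("val"),      # reported value
                "form": fact.get("form"),    # form type the value came from
                "filed": fact.get("filed"),  # filing date
            })
    # ISO dates sort correctly as strings; facts without a date come first
    rows.sort(key=lambda r: r["end"] or "")
    return rows


# Made-up placeholder values, only to show the expected layout
sample = {
    "taxonomy": "us-gaap",
    "tag": "AccountsPayableCurrent",
    "units": {
        "USD": [
            {"end": "2020-06-30", "val": 2, "form": "10-Q", "filed": "2020-07-31"},
            {"end": "2020-03-31", "val": 1, "form": "10-Q", "filed": "2020-05-01"},
        ]
    },
}
for row in flatten_concept_facts(sample):
    print(row["end"], row["unit"], row["val"])
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(rows)` if a dataframe is preferred.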

13f securities (CUSIPS of most securities)

Get the file containing all CUSIPs relating to 13f securities (as defined in 17 CFR § 240.13f-1)

# download the most current 13f securities pdf
dl.get_13f_securities_pdf("path_to/save_as.pdf")
# get a byte representation of the pdf without saving it
dl.get_13f_securities_pdf(target_path=None)

An easy way to convert the 13f securities pdf into a usable dataframe/list is tabula-py:

from tabula import read_pdf
from pathlib import Path
import pandas as pd


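def convert_13f_securities_pdf(pdf_path: str, target_path: str=None, mode: str="csv", overwrite=True):
    '''
    Convert the 13f securities pdf into a csv file or a list of DataFrames.

    Args:
        pdf_path: path to the pdf file
        target_path: output file. if None, nothing is written and the tables are returned
        mode: set output mode. valid modes are: 'csv'
        overwrite: if True, replace an existing file at target_path

    Returns:
        list of DataFrames (one per parsed table) if target_path is None, else None

    Raises:
        FileExistsError: if overwrite is False and a file already exists at target_path
    '''
    # read_pdf returns a list with one DataFrame per table found in the pdf
    tables = read_pdf(pdf_path, pages="all", pandas_options={"header": None})

    # only touch the filesystem when there is a file to write to
    write_csv = mode == "csv" and target_path is not None
    if write_csv and Path(target_path).is_file():
        if overwrite is False:
            raise FileExistsError("a file with that name already exists")
        # remove the old file first, since the tables are appended below
        Path(target_path).unlink()
    dfs = []
    for d in tables:
        # tables parsed with 5 columns lose their second column ...
        if d.shape[1] == 5:
            d = d.drop(d.columns[1], axis="columns")
        # ... and tables with 4 columns (including the ones just reduced) lose their last column
        if d.shape[1] == 4:
            d = d.drop(d.columns[-1], axis="columns")
        if write_csv:
            d.to_csv(target_path, mode="a", index=False, header=False)
        if target_path is None:
            dfs.append(d)
    if target_path is None:
        return dfs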

Usage of IndexHandler

# check if any S-3s were filed after "2020-01-01", get the submission info and download them.

newfiles = dl.index_handler.get_newer_filings_meta("0001718405", "2020-01-01", {"S-3"})
for key, values in newfiles.items():
    for v in values:
        dl.get_filing_by_accession_number(key, *v)
# If you don't know the CIK, call `dl._convert_to_cik10(ticker)` to get it

# check the index for nonexistent files and remove their entries from the index
dl.index_handler.check_index()

# get index entry of downloaded filings with the same file number
dl.index_handler.get_related_filings("some cik", "some file number")
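
The CIKs used above are 10-digit, zero-padded strings such as "0001718405". As a rough sketch of what a ticker to CIK lookup involves, the snippet below builds such a map from data shaped like the SEC's company_tickers.json (an object of entries with `cik_str`, `ticker` and `title`). This is not the library's implementation: the name `build_ticker_to_cik10` is hypothetical, and it is an assumption that the file from `get_file_company_tickers()` has this layout.

```python
def build_ticker_to_cik10(company_tickers):
    """Hypothetical helper: map upper-case ticker -> 10-digit CIK string."""
    lookup = {}
    for entry in company_tickers.values():
        ticker = entry["ticker"].upper()
        # the CIK is stored as an integer; pad it with leading zeros to 10 digits
        lookup[ticker] = str(entry["cik_str"]).zfill(10)
    return lookup


# Two entries in the assumed layout of company_tickers.json
sample_tickers = {
    "0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."},
    "1": {"cik_str": 789019, "ticker": "MSFT", "title": "MICROSOFT CORP"},
}
ticker_to_cik = build_ticker_to_cik10(sample_tickers)
print(ticker_to_cik["AAPL"])  # 0000320193
```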
