Skip to main content

Scrape data from SEC's EDGAR

Project description

EDGAR

A small library to access files from SEC's edgar.

Installation

pip install edgar

Example

To get a company's latest 5 10-Ks, run

from edgar import Company
company = Company("Oracle Corp", "0001341439")
tree = company.get_all_filings(filing_type = "10-K")
docs = Company.get_documents(tree, no_of_documents=5)

or

from edgar import Company, TXTML

company = Company("INTERNATIONAL BUSINESS MACHINES CORP", "0000051143")
doc = company.get_10K()
text = TXTML.parse_full_10K(doc)

To get all companies and find a specific one, run

from edgar import Edgar
edgar = Edgar()
possible_companies = edgar.find_company_name("Cisco System")

To avoid pull of all company data from sec.gov on Edgar initialization, pass in a local path to the data

from edgar import Edgar
edgar = Edgar("/path/to/cik-lookup-data.txt")
possible_companies = edgar.find_company_name("Cisco System")

To get XBRL data, run

from edgar import Company, XBRL, XBRLElement

company = Company("Oracle Corp", "0001341439")
results = company.get_data_files_from_10K("EX-101.INS", isxml=True)
xbrl = XBRL(results[0])
XBRLElement(xbrl.relevant_children_parsed[15]).to_dict() // returns a dictionary of name, value, and schemaRef

API

Company

Company(name, cik, timeout=10)
  • name (company name)
  • cik (company CIK number)
  • timeout (optional) (default: 10)

Methods

get_filings_url(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> str

Returns a url to fetch filings data

  • filing_type: The type of document you want. i.e. 10-K, S-8, 8-K. If not specified, it'll return all documents
  • prior_to: Time prior which documents are to be retrieved. If not specified, it'll return all documents
  • ownership: defaults to include. Options are include, exclude, only.
  • no_of_entries: defaults to 100. Returns the number of entries to be returned. Maximum is 100.

get_all_filings(self, filing_type="", prior_to="", ownership="include", no_of_entries=100) -> lxml.html.HtmlElement

Returns the HTML in the form of lxml.html

  • filing_type: The type of document you want. i.e. 10-K, S-8, 8-K. If not specified, it'll return all documents
  • prior_to: Time prior which documents are to be retrieved. If not specified, it'll return all documents
  • ownership: defaults to include. Options are include, exclude, only.
  • no_of_entries: defaults to 100. Returns the number of entries to be returned. Maximum is 100.

get_10Ks(self, no_of_documents=1, as_documents=False) -> List[lxml.html.HtmlElement]

Returns the HTML in the form of lxml.html of concatenation of all the documents in the 10-K

  • no_of_documents (default: 1): numer of documents to be retrieved
  • When as_documents is set to True, it returns -> List[edgar.document.Documents] a list of Documents

get_10Ks_metadata(self) -> List[dict]

Returns the HTML in the form of a dictionary of concatenation of all the document metadata in the 10-K

get_document_type_from_10K(self, document_type, no_of_documents=1) -> List[lxml.html.HtmlElement]

Returns the HTML in the form of lxml.html of the document within 10-K

  • document_type: Tye type of document you want, i.e. 10-K, EX-3.2
  • no_of_documents (default: 1): numer of documents to be retrieved

get_data_files_from_10K(self, document_type, no_of_documents=1, isxml=False) -> List[lxml.html.HtmlElement]

Returns the HTML in the form of lxml.html of the data file within 10-K

  • document_type: Tye type of document you want, i.e. EX-101.INS
  • no_of_documents (default: 1): numer of documents to be retrieved
  • isxml (default: False): by default, things aren't case sensitive and is parsed with html in lxml. If this is True, then it is parsed with etree` which is case sensitive

Class Method

get_documents(self, tree: lxml.html.Htmlelement, no_of_documents=1, debug=False, as_documents=False) -> List[lxml.html.HtmlElement] Returns a list of strings, each string contains the body of the specified document from input

  • tree: lxml.html form that is returned from Company.getAllFilings
  • no_of_documents: number of document returned. If it is 1, the returned result is just one string, instead of a list of strings. Defaults to 1.
  • debug (default: False): if True, displays the URL and form
  • When as_documents is set to True, it returns -> List[edgar.document.Documents] a list of Documents

Edgar

Gets all companies from EDGAR

get_cik_by_company_name(company_name: str) -> str: Returns the CIK if given the exact name or the company

get_company_name_by_cik(cik: str) -> str: Returns the company name if given the CIK (with the 000s)

find_company_name(words: str) -> List[str]: Returns a list of company names by exact word matching

find_company_name_cik(words: str) -> List[tuple[str, str]]: Return a list of company names and their CIK values

match_company_by_company_name(self, name, top=5) -> List[Dict[str, Any]]: Returns a list of dictionarys, with company names, CIK, and their fuzzy match score

  • top (default: 5) returns the top number of fuzzy matches. If set to None, it'll return the whole list (which is a lot)

XBRL

Parses data from XBRL

Properties

relevant_children

  • get children that are not context relevant_children_parsed
  • get children that are not context, unit, schemaRef
  • cleans tags

Documents

Filing and Documents Details for the SEC EDGAR Form (such as 10-K)

Documents(url, timeout=10)

Properties

url: str: URL of the document

content: dict: Dictionary of meta data of the document

content['Filing Date']: str: Document filing date

content['Accepted']: str: Document accepted datetime

content['Period of Report']: str: The date period that the document is for

element: lxml.html.HtmlElement: The HTML element for the Document (from the url) so it can be further parsed

Contribution

Buy Me A Coffee

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

edgar-5.6.3.tar.gz (24.9 kB view details)

Uploaded Source

Built Distribution

edgar-5.6.3-py3-none-any.whl (23.6 kB view details)

Uploaded Python 3

File details

Details for the file edgar-5.6.3.tar.gz.

File metadata

  • Download URL: edgar-5.6.3.tar.gz
  • Upload date:
  • Size: 24.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/7.1.0 keyring/24.3.1 pkginfo/1.10.0 readme-renderer/34.0 requests-toolbelt/1.0.0 requests/2.31.0 rfc3986/1.5.0 tqdm/4.66.2 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for edgar-5.6.3.tar.gz
Algorithm Hash digest
SHA256 6e8ca83809a72ee3872736fac211b01fada7d3228052a088437717147f445a07
MD5 dfdd5f639ac0b9f567abad987371ac9d
BLAKE2b-256 fc626aad0e8c98c11935b90e4c256bdc209fd38ebfe4c4dfd4ffe7a4f14cad07

See more details on using hashes here.

File details

Details for the file edgar-5.6.3-py3-none-any.whl.

File metadata

  • Download URL: edgar-5.6.3-py3-none-any.whl
  • Upload date:
  • Size: 23.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/7.1.0 keyring/24.3.1 pkginfo/1.10.0 readme-renderer/34.0 requests-toolbelt/1.0.0 requests/2.31.0 rfc3986/1.5.0 tqdm/4.66.2 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for edgar-5.6.3-py3-none-any.whl
Algorithm Hash digest
SHA256 4f4d1c299c8ef61e4be40812d24588ff6d230ed16e884083204a5601f6a465ba
MD5 b95e999d6713697f85d1c445197e8a31
BLAKE2b-256 d0b436966cabbaddc606111c59adb6b0fa25b615679089fcbc9bbfc167e2f582

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page