chakki Financial Report Corpus
chaFiC: chakki Financial Report Corpus
We organized Japanese financial reports to encourage applying NLP techniques to financial analytics.
Dataset
You can download the dataset with the command-line tool.
pip install chafic
chafic download --kind F --year 2014
Please refer to the usage with --.
chafic --
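If you need several fiscal years, the download command above can be scripted. The following is a minimal sketch, not part of chafic itself: the helper names `build_download_command` and `download_all` are hypothetical, and it assumes that `--kind` accepts `E` for the text extracted version in the same way it accepts `F` for the raw file version.

```python
import subprocess


def build_download_command(kind, year):
    """Build the argument list for one `chafic download` call.

    Assumption: kind is "F" (raw files) or "E" (extracted text).
    """
    if kind not in ("F", "E"):
        raise ValueError(f"unknown kind: {kind!r}")
    return ["chafic", "download", "--kind", kind, "--year", str(year)]


def download_all(kind, years, dry_run=True):
    """Return the commands for every year; run them only if dry_run is False."""
    commands = [build_download_command(kind, year) for year in years]
    if not dry_run:
        for command in commands:
            # check=True raises CalledProcessError if chafic exits with an error.
            subprocess.run(command, check=True)
    return commands


# Dry run: only print what would be executed.
for command in download_all("E", range(2014, 2019)):
    print(" ".join(command))
```

Keeping `dry_run=True` as the default avoids starting a multi-gigabyte download by accident; pass `dry_run=False` once the printed commands look right.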
Raw dataset file
The corpus is split by fiscal year.
fiscal_year | Raw file version (F) | Text extracted version (E) |
---|---|---|
2014 | .zip (9.3GB) | .zip (270.8MB) |
2015 | .zip (9.8GB) | .zip (291.1MB) |
2016 | .zip (10.2GB) | .zip (334.7MB) |
2017 | .zip (9.1GB) | .zip (310.2MB) |
2018 | .zip (10.5GB) | .zip (260.9MB) |
Statistics
fiscal_year | number_of_reports | has_csr_reports | has_financial_data | has_stock_data |
---|---|---|---|---|
2014 | 3,724 | 92 | 3,583 | 3,595 |
2015 | 3,870 | 96 | 3,725 | 3,751 |
2016 | 4,066 | 97 | 3,924 | 3,941 |
2017 | 3,578 | 89 | 3,441 | 3,472 |
2018 | 3,513 | 70 | 2,893 | 3,413 |
- Financial data is from 決算短信情報 (earnings summary information).
- We use non-consolidated data if it exists.
- Stock data is from 月間相場表(内国株式) (monthly quotation table, domestic stocks). close is the fiscal period end and open is 1 year before it.
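To see how much of each year's corpus carries the extra data, the counts in the statistics table can be turned into shares of `number_of_reports`. This is a small illustrative calculation over the figures above; the `STATS` dictionary and the `coverage` function are names introduced here, not part of the dataset or the tool.

```python
# Counts copied from the statistics table above.
STATS = {
    2014: {"number_of_reports": 3724, "has_csr_reports": 92,
           "has_financial_data": 3583, "has_stock_data": 3595},
    2015: {"number_of_reports": 3870, "has_csr_reports": 96,
           "has_financial_data": 3725, "has_stock_data": 3751},
    2016: {"number_of_reports": 4066, "has_csr_reports": 97,
           "has_financial_data": 3924, "has_stock_data": 3941},
    2017: {"number_of_reports": 3578, "has_csr_reports": 89,
           "has_financial_data": 3441, "has_stock_data": 3472},
    2018: {"number_of_reports": 3513, "has_csr_reports": 70,
           "has_financial_data": 2893, "has_stock_data": 3413},
}


def coverage(stats):
    """Return, per year, each has_* count as a fraction of number_of_reports."""
    result = {}
    for year, row in stats.items():
        total = row["number_of_reports"]
        result[year] = {
            column: count / total
            for column, count in row.items()
            if column != "number_of_reports"
        }
    return result


for year, shares in coverage(STATS).items():
    formatted = ", ".join(f"{name}={share:.1%}" for name, share in shares.items())
    print(year, formatted)
```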
Content
Raw file version
The dataset is structured as follows.
chakki_esg_financial_{year}.zip
└──{year}
├── documents.csv
└── docs/
docs includes XBRL and PDF files.
- XBRL files of annual reports (files are retrieved from [EDINET]).
- PDF files of CSR reports (additional content).
documents.csv has metadata like the following.
- edinet_code: E0000X
- filer_name: XXX株式会社
- fiscal_year: 201X
- fiscal_period: FY
- doc_path: docs/S000000X.xbrl
- csr_path: docs/E0000X_201X_JP_36.pdf
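The metadata above can be read with the standard `csv` module. The sketch below assumes that documents.csv has a header row using the field names listed above and that `doc_path` and `csr_path` are relative to the `{year}` directory; `read_documents` is a hypothetical helper, not something shipped with chafic.

```python
import csv
from pathlib import Path


def read_documents(lines, year_dir):
    """Parse documents.csv rows and resolve file paths against year_dir.

    lines: an open text file (or any iterable of CSV lines) with a header row.
    year_dir: the extracted {year} directory that contains docs/.
    Empty doc_path / csr_path values become None (for example, a filer
    without a CSR report).
    """
    year_dir = Path(year_dir)
    records = []
    for row in csv.DictReader(lines):
        record = dict(row)
        for key in ("doc_path", "csr_path"):
            value = (row.get(key) or "").strip()
            record[key] = year_dir / value if value else None
        records.append(record)
    return records


# Typical use (assuming the archive was extracted into ./2014 and is UTF-8):
# with open("2014/documents.csv", newline="", encoding="utf-8") as f:
#     documents = read_documents(f, "2014")
```

Resolving the paths once at load time keeps later code independent of the current working directory.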
Text extracted version
The text extracted version includes txt files that match each part of an annual report.
The extracted parts are defined in edinet-python.
chakki_esg_financial_{year}_extracted.zip
└──{year}
├── documents.csv
└── docs/
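The extracted texts can be read straight from the archive without unpacking it. This is a rough sketch under a few assumptions that the layout above does not confirm: the txt files sit directly or indirectly under `{year}/docs/`, they are encoded as UTF-8, and the helper names `select_text_members` and `read_texts` are made up for this example.

```python
import zipfile
from pathlib import PurePosixPath


def select_text_members(names, year):
    """Keep archive member names that are .txt files under {year}/docs/."""
    prefix = (str(year), "docs")
    selected = []
    for name in names:
        path = PurePosixPath(name)
        if path.parts[:2] == prefix and path.suffix == ".txt":
            selected.append(name)
    return selected


def read_texts(zip_path, year):
    """Yield (member_name, text) for every extracted txt file in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        for name in select_text_members(archive.namelist(), year):
            # Assumption: the text files are UTF-8 encoded.
            yield name, archive.read(name).decode("utf-8")


# Typical use (the archive name follows the pattern shown above):
# for name, text in read_texts("chakki_esg_financial_2014_extracted.zip", 2014):
#     print(name, len(text))
```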
Utilize Data for NLP
We offer a parser for financial documents based on GiNZA. Please refer to ficser to use this feature.
Example: Parse
Example: NER