A collection of tools to conduct research on the Modern Slavery Statements text corpus.
Research on Modern Slavery
This repository will contain a collection of experiments and analyses performed on the Modern Slavery Statements Dataset.
Introduction
The UN Sustainable Development Goal 8.7 states: Take immediate and effective measures to eradicate forced labour, end modern slavery and human trafficking and secure the prohibition and elimination of the worst forms of child labour, including recruitment and use of child soldiers, and by 2025 end child labour in all its forms.
In 2018, the Global Slavery Index found that there were 40.3 million people in modern slavery, of whom 25 million were in forced labor producing computers, clothing, agricultural products, raw materials, etc., and 15 million were in forced marriage.
The Future Society, an independent nonprofit think-and-do tank, launched a partnership with the Walk Free Initiative to automate the analysis of modern slavery statements produced by businesses, with the aim of boosting compliance and helping to combat and eradicate modern slavery. The team at The Future Society is curating an up-to-date repository of more than 16,000 modern slavery statements (and counting) to support machine learning research in this area. The data is scraped from the collection of report links provided by modernslaveryregistry.org.
By sharing your analysis and contributing to this repository you help the global community to hold multi-national corporations accountable for how they treat their workforce and suppliers.
Prerequisites
- Python 3.6+ installed on your system
- If you'd like to use the provided tutorials, you also need access to a Jupyter notebook
Quickstart
It's recommended that you use a virtual environment such as virtualenv, pipenv or similar.
Option 1 - notebook
Copy this notebook and follow the tutorial.
The linked notebook above also shows you how to plot the word count distribution of all the documents downloaded.
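For readers working outside the notebook, the word count distribution can be approximated with the standard library alone. The following is a minimal sketch, not the notebook's own code: it assumes the downloaded statements are plain `.txt` files in a single folder, and the function names `word_counts` and `histogram` are made up for this illustration.

```python
from collections import Counter
from pathlib import Path


def word_counts(folder):
    """Return {file name: number of whitespace-separated tokens} for each .txt file."""
    counts = {}
    for path in sorted(Path(folder).glob("*.txt")):
        # Scraped text may contain odd bytes, so undecodable ones are skipped.
        text = path.read_text(encoding="utf-8", errors="ignore")
        counts[path.name] = len(text.split())
    return counts


def histogram(counts, bin_width=1000):
    """Group word counts into bins of bin_width words, keyed by lower bin edge."""
    bins = Counter((n // bin_width) * bin_width for n in counts.values())
    return dict(sorted(bins.items()))


def print_histogram(folder, bin_width=1000):
    """Print one text bar per bin, one '#' per document."""
    for lower, n_docs in histogram(word_counts(folder), bin_width).items():
        print(f"{lower:>7}-{lower + bin_width - 1:<7} {'#' * n_docs}")
```

Calling `print_histogram` with the data folder named in the download logs prints a rough text histogram; the bin width of 1000 words is an arbitrary choice.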
Option 2 - command line
Install the package:
pip install modern-slavery-statements-research
Pass your AWS credentials as the -i (AWS access key ID) and -a (AWS secret access key) arguments and run the command below, replacing the placeholders in curly brackets with your own values:
download-statements -i {aws_access_key_id} -a {aws_secret_access_key}
The logs printed in the console will tell you the name of the data folder.
If you've set up your AWS CLI credentials for the modern slavery project as the default, you can simply run:
download-statements
You can explore more options by running download-statements --help.
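If you prefer to start the download from a Python script, the two invocations above can be wrapped as follows. This is a sketch under assumptions: it uses only the `-i` and `-a` flags described above, reads the credentials from the standard `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables, and the helper names `build_download_command` and `run_download` are hypothetical.

```python
import os
import subprocess


def build_download_command(env=None):
    """Build the download-statements argument list.

    If both credentials are present in env, they are passed via -i and -a;
    otherwise the bare command is returned, which relies on default credentials.
    """
    env = os.environ if env is None else env
    command = ["download-statements"]
    key_id = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key_id and secret:
        command += ["-i", key_id, "-a", secret]
    return command


def run_download(env=None):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_download_command(env), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if a key contains special characters.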
How to get data access?
The data is available in the modern-slavery-dataset-txt bucket in AWS S3. There are plans to release the dataset for general public access.
The bucket contains multiple copies of the statements from different scraping runs, so the rule is to always pick the latest folder. The provided scripts and examples take care of this automatically. The raw documents are in the /data/raw_statements/ folder of the modern-slavery-dataset-raw bucket. Metadata is in s3://modern-slavery-dataset-raw/data/ms_registry/ and follows the same latest-folder logic.
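The "pick the latest folder" rule can be sketched as a small pure function over object keys. This is an illustration rather than the logic used by the provided scripts: it assumes each scraping run is stored under its own top-level folder and that folder names sort chronologically (for example, date-stamped names). The real bucket layout may differ, and the keys in the usage note are invented.

```python
def latest_folder(keys):
    """Return the top-level folder that sorts last among the given object keys.

    Assumes folder names sort chronologically, e.g. ISO dates.
    """
    folders = {key.split("/", 1)[0] for key in keys if "/" in key}
    if not folders:
        raise ValueError("no folders found in the given keys")
    return max(folders)


def keys_in_folder(keys, folder):
    """Return only the keys stored under the given top-level folder."""
    prefix = folder + "/"
    return [key for key in keys if key.startswith(prefix)]
```

With the hypothetical keys `["2020-04-01/1.txt", "2020-05-01/1.txt", "2020-05-01/2.txt"]`, `latest_folder` selects `2020-05-01` and `keys_in_folder` returns the two keys under it. The key list itself would come from an S3 listing made with your IAM credentials.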
This is a work in progress. For now, if you'd like to work with this data, please send an email to edgar@bravetech.io with a link to your social profile (LinkedIn, Facebook or similar). You'll receive IAM user credentials as soon as possible, which will allow you to download and access the data.
Get Help
If you'd like to get help with domain expertise or technical requirements and implementations then get in touch with Adriana or Edgar respectively.
Roadmap
Over the next few weeks and months, the following improvements are planned to the dataset and the repository:
- Provide a convenient one-command entry point to the data, including download to a pandas dataframe.
- Improve the dataset quality by continuously including more documents and improving the data cleaning pipeline.
- Provide examples of analysis.
- Provide manually annotated labels for a subset of the corpus to enable analyses using supervised methods.
- Open source the data and research for public access.
Citation
If you intend to share any form of public research or analysis based on the data from this repository and the modern-slavery-dataset-raw and modern-slavery-dataset-txt buckets in AWS S3, please include the following citation in your publication:
The Future Society. (2020) Modern Slavery Statements Research. Retrieved from https://github.com/the-future-society/modern-slavery-statements-research.
Contributions
If you'd like to contribute to the research then take a look at any of the issues or get in touch with Adriana or Edgar.
Take a look at colab notebooks based on the modern slavery corpus:
Hashes for modern-slavery-statements-research-0.1.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | a4509d2347d6e4228868bbaf66703d26592bd2875ee54426e0e09e69af42fa6e
MD5 | 2a69dd394c5842d90e5f08b0ec6f931d
BLAKE2b-256 | 0c6df4ca59b55a4eff70c6fd7911c1098459594167af8950f496536206fbb4ba
Hashes for modern_slavery_statements_research-0.1.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 09ae42d060d4281734f96ed6a4a6015a73f8e156b04a5e7ee2090046ab1650c6
MD5 | 8f0c91fbb1165befed83a531bc51aa3f
BLAKE2b-256 | 8fd4069b53a33987bc9f7493934e2650b313f0cad356af8fecf8c50b263971ab
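If you download one of the distribution files manually, you can check it against the SHA256 digests listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `matches` are made up for this example.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("modern-slavery-statements-research-0.1.2.tar.gz", "a4509d2347d6e4228868bbaf66703d26592bd2875ee54426e0e09e69af42fa6e")` should return True for an intact copy of the source distribution saved under that file name in the current directory.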