Command line tool for matching cases to controls
Using opensafely-matching
System requirements
Requires Python 3.8+
Install with:
pip install opensafely-matching
Input data
Input data must be provided as two files in one of the supported formats (.csv, .csv.gz or .arrow): one for the case/exposed group and one for the population to be matched.
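As a rough illustration, two small input files could be produced with the standard library as follows. The column names (`patient_id`, `sex`, `age`, `indexdate`) and all values are assumptions made for this sketch; which columns each file actually needs depends on your matching configuration.

```python
import csv

# Hypothetical example rows; real data will come from your own study extract.
cases = [
    {"patient_id": 1, "sex": "F", "age": 54, "indexdate": "2021-03-01"},
    {"patient_id": 2, "sex": "M", "age": 61, "indexdate": "2021-04-15"},
]
population = [
    {"patient_id": 101, "sex": "F", "age": 52},
    {"patient_id": 102, "sex": "M", "age": 65},
    {"patient_id": 103, "sex": "F", "age": 70},
]


def write_csv(path, rows):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)


# One file for the case/exposed group, one for the population to be matched.
write_csv("input_cases.csv", cases)
write_csv("input_matches.csv", population)
```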
Use
In a python script
Run matching by calling the match function with at least the required arguments, for example:
from osmatching import match, load_config, load_dataframe

# Build the matching configuration from a plain dictionary.
config = load_config(
    {
        "matches_per_case": 3,
        "index_date_variable": "indexdate",
        "match_variables": {
            "sex": "category",
            "age": 5
        }
    }
)

# config has already been loaded above, so pass it directly.
match(
    case_df=load_dataframe("input_cases.arrow"),
    match_df=load_dataframe("input_matches.arrow"),
    match_config=config,
)
This selects 3 matches per case, matching on sex and age (±5 years), and writes the output files in the default .arrow format.
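To make the matching rule concrete, here is a minimal pure-Python sketch of the idea. It is not the library's implementation. The function names, the record layout with an `id` key, the reading of "category" as exact equality, the inclusive tolerance, and the greedy selection without replacement are all assumptions made for illustration.

```python
def is_eligible(case, candidate, match_variables):
    """Return True if the candidate satisfies every matching rule for the case."""
    for name, rule in match_variables.items():
        if rule == "category":
            # Assumed: category variables must be equal.
            if case[name] != candidate[name]:
                return False
        else:
            # Assumed: numeric variables must lie within +/- the tolerance, inclusive.
            if abs(case[name] - candidate[name]) > rule:
                return False
    return True


def greedy_match(cases, candidates, match_variables, matches_per_case):
    """Assign up to matches_per_case unused candidates to each case, in order."""
    used = set()
    result = {}
    for case in cases:
        chosen = []
        for cand in candidates:
            if len(chosen) == matches_per_case:
                break
            if cand["id"] not in used and is_eligible(case, cand, match_variables):
                chosen.append(cand["id"])
                used.add(cand["id"])
        result[case["id"]] = chosen
    return result


cases = [{"id": 1, "sex": "F", "age": 50}]
candidates = [
    {"id": 10, "sex": "F", "age": 53},  # eligible: same sex, 3 years apart
    {"id": 11, "sex": "M", "age": 50},  # rejected: different sex
    {"id": 12, "sex": "F", "age": 56},  # rejected: 6 years apart
    {"id": 13, "sex": "F", "age": 45},  # eligible: exactly 5 years apart
]
rules = {"sex": "category", "age": 5}
matched = greedy_match(cases, candidates, rules, matches_per_case=3)
print(matched)  # {1: [10, 13]}
```

In this toy data only two candidates qualify, so the case receives fewer than the requested 3 matches. How the real tool handles that situation is not covered here.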
Outputs:
output/matched_cases.arrow
output/matched_matches.arrow
output/matched_combined.arrow
output/matching_report.txt
From the command line
usage: match [-h] (--config CONFIG | --config-file CONFIG) [--cases CASES]
             [--controls CONTROLS] [--output-format {arrow,csv.gz,csv}]

Matches cases to controls if provided with 2 datasets

options:
  -h, --help            show this help message and exit
  --config CONFIG       The configuration for the matching action (a JSON string)
  --config-file CONFIG  Path to the configuration JSON file for the matching action
  --cases CASES         Data file that contains the cases
  --controls CONTROLS   Data file that contains the cohort for cases
  --output-format {arrow,csv.gz,csv}
                        Format for the output files
To run the above example from the command line:
match --cases input_cases.arrow --controls input_matches.arrow --config-file config.json
where config.json is a file containing the matching configuration:
{
    "matches_per_case": 3,
    "match_variables": {
        "sex": "category",
        "age": 5
    },
    "index_date_variable": "indexdate"
}
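If you would rather generate config.json from a script than write it by hand, a sketch using only the standard library might look like this. The keys and values are copied from the example above; the file name is the one used in the command.

```python
import json

# Same configuration as the config.json example above.
config = {
    "matches_per_case": 3,
    "match_variables": {"sex": "category", "age": 5},
    "index_date_variable": "indexdate",
}

# Write the configuration as JSON so it can be passed with --config-file.
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```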
Alternatively, pass the config on the command line as a JSON string:
match \
    --cases input_cases.arrow \
    --controls input_matches.arrow \
    --config '{"matches_per_case": 3, "match_variables": {"sex": "category", "age": 5}, "index_date_variable": "indexdate"}'
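Quoting a JSON string by hand is easy to get wrong, so one option is to build the command in Python and let the standard library handle the quoting. This is only a sketch: it prints the command rather than running it, and the file names are the ones used in the examples above.

```python
import json
import shlex

config = {
    "matches_per_case": 3,
    "match_variables": {"sex": "category", "age": 5},
    "index_date_variable": "indexdate",
}

# Each argument is a separate list item, so no manual shell quoting is needed.
command = [
    "match",
    "--cases", "input_cases.arrow",
    "--controls", "input_matches.arrow",
    "--config", json.dumps(config),
]

# shlex.join produces a shell-safe string that can be pasted into a terminal.
print(shlex.join(command))
```

The same list could be passed to `subprocess.run(command, check=True)` to execute the command directly, without going through a shell.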
For configuration options, please see the reusable action documentation.
Download files
File details
Details for the file opensafely_matching-1.2.3.tar.gz.
File metadata
- Download URL: opensafely_matching-1.2.3.tar.gz
- Upload date:
- Size: 29.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b8b5fbfffeef8c02ca8cf8f59aa67007b20540fe55c8b7abb1d94e52ee420131 |
| MD5 | 496321484d8469fbf3093352131a874b |
| BLAKE2b-256 | 0b2ac8dcaab8ce9cae4e73cf7f87786bbb670c724a1a3ecd24932ea51333b859 |
File details
Details for the file opensafely_matching-1.2.3-py3-none-any.whl.
File metadata
- Download URL: opensafely_matching-1.2.3-py3-none-any.whl
- Upload date:
- Size: 27.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97e55331bd13b90c3bbdad4230481cda0d6c49240fa14f52041491513a3abbdd |
| MD5 | db5198e8a285434ea1d0629028771a60 |
| BLAKE2b-256 | 80d154af815a9e9f2b9539986162393ea531248e26b76dcb64feeb7733734cfd |
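A downloaded file can be checked against the SHA256 digests listed above with a few lines of standard-library Python. The helper names below are hypothetical and the sketch assumes the file has already been downloaded to a local path.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_digest("opensafely_matching-1.2.3.tar.gz", "<SHA256 from the table>")` should return True for an intact download.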