Project description
Summary
A simple package for converting PubMed output (i.e., output that includes all PubMed fields) into a flat file or pandas.DataFrame object. Simple functions are provided to merge the results of separate searches and to automatically create tags (variables that are 'Y' or '') indicating which search(es) each reference was found in.
Users can then add notes on the references by importing the flat file into a spreadsheet program and adding their notes in new columns. The spreadsheet can then be exported back to a flat file, converted to a data frame, merged with the results of additional searches, and converted back to a text file for further notes, and so on.
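The tagging scheme can be approximated with plain pandas (this is an illustrative analogue with made-up data, not flatmed's actual implementation):

```python
import pandas as pd

# Two hypothetical search results keyed on PMID, as in PubMed output.
search_a = pd.DataFrame({'PMID': ['101', '102'],
                         'TI': ['Paper one', 'Paper two']})
search_b = pd.DataFrame({'PMID': ['102', '103'],
                         'TI': ['Paper two', 'Paper three']})

# Tag every record with the search it came from.
search_a['lta_A'] = 'Y'
search_b['lta_B'] = 'Y'

# An outer join keeps every record found in either search; after filling
# missing tags with '', each tag column is 'Y' or '' as described above.
merged = search_a.merge(search_b, on=['PMID', 'TI'], how='outer')
merged[['lta_A', 'lta_B']] = merged[['lta_A', 'lta_B']].fillna('')
merged = merged.sort_values('PMID').reset_index(drop=True)
```

Record 102, found by both searches, ends up with 'Y' in both tag columns; the other records carry 'Y' only for the search that found them.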
This is not a particularly robust 'database system', and in particular users should take care when updating flat files not to omit any columns they created manually in a previous version.
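A minimal safeguard against that mistake (plain Python, not part of flatmed) is to compare the column lists of the old and new flat files before overwriting:

```python
# Hypothetical helper: report columns present in a previous flat file but
# missing from a newly built one (e.g. manually added note columns).
def missing_columns(old_cols, new_cols):
    """Return the old columns that are absent from the new column list."""
    new_set = set(new_cols)
    return [c for c in old_cols if c not in new_set]

# A manually added 'l_mynotes' column would be flagged before it is lost.
dropped = missing_columns(['PMID', 'TI', 'AB', 'l_mynotes'],
                          ['PMID', 'TI', 'AB'])
```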
Trademark Notice
The PubMed wordmark is a registered trademark of the U.S. Department of Health and Human Services. This software is not endorsed by or affiliated with the trademark holders.
Example
import flatmed as fm
# When rerunning with previously saved notes, uncomment the `txt_to_df` call
# below and replace the `pubmed_to_df()` call with a `merge_to_df()` call.
#master_df = fm.txt_to_df('ex_refs_mynotes.txt')
inpdir = 'downloaded/'
# Start with a single set of results and convert to a data frame
master_df = fm.pubmed_to_df(inpdir + 'TMLE.txt')
master_df['lta_TMLE'] = 'Y'
# Merge to this data frame four other searches using the default merge
# method (outer join) and create four more indicator variables.
master_df = fm.merge_to_df(master_df, inpdir + 'MSM.txt', 'lta_MSM')
master_df = fm.merge_to_df(master_df, inpdir + 'TVCONF.txt', 'lta_TVCONF')
master_df = fm.merge_to_df(master_df, inpdir + 'AIPW.txt', 'lta_AIPW')
master_df = fm.merge_to_df(master_df, inpdir + 'MSM_excl.txt',
'lta_MSM_excl')
# The `lta_MSM_excl` are a subset of the MSM results that we have already
# determined can be excluded.
master_df = master_df[~(master_df.lta_MSM_excl == 'Y')]
# After the above filter, the variable is always empty, so it can be dropped.
master_df = master_df.drop(columns=['lta_MSM_excl'])
# Extract the publication year from the DP (date of publication) field.
master_df['l_pubyear'] = master_df.DP.str.slice(start=0, stop=4)
# Keep only a subset of the PubMed fields; add more if desired. By
# convention, any tag created automatically starts with 'lta_', and other
# variables we create (either externally or in this file) start with 'l_'.
my_cols = [x for x in master_df.columns
           if x.startswith('lta_') or x.startswith('l_')]
fm.df_to_txt(master_df, 'ex_refs.txt',
             ['PMID', 'TI', 'AU', 'SO', 'l_pubyear'] + my_cols + ['AB'])
Release Notes
Version 0.0.1
- Initial release
File details
Details for the file flatmed-0.0.1.tar.gz.
File metadata
- Download URL: flatmed-0.0.1.tar.gz
- Upload date:
- Size: 44.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8147140aaa9da3a3318c72082713480bd5fa7cc8430fe8a2c269cfe1e0ffe02a
MD5 | 5f4214fe40e2da8ba76b0b94602dee79
BLAKE2b-256 | 18ba629732fc9834927eeb362911ba6be8c5288841dc1d20be7aecabcf267477
File details
Details for the file flatmed-0.0.1-py3-none-any.whl.
File metadata
- Download URL: flatmed-0.0.1-py3-none-any.whl
- Upload date:
- Size: 33.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9c50a58a7d62d01bf11ef1be5ff107d8579e64c0fc3085199e795bfe2c4d5bf8
MD5 | fe3ac7cf6c59c039f057ee65cca74774
BLAKE2b-256 | 3bf4d871bc490449ac0d9935d14e20d8d872d61d16050014a5afda42bb3b7d7b