
Download files from the Cencora secure file transfer site

Project description

Purpose

This Python package can be used to download files from the Cencora (formerly AmerisourceBergen) secure file transfer site for ingestion into clinical data systems.

Downloads are performed from the web-based secure site located at https://secure.amerisourcebergen.com/. FTP is not supported. (There are many easier ways to automate FTP-based downloads.)

Requirements

  • Python 3.10 or newer

Installation

Use pip to install the medberg package.

pip install medberg

Usage

Establishing a connection

Import the SecureSite class from the medberg module.

from medberg import SecureSite

Initialize a connection to the secure site by providing a username and password.

con = SecureSite(username='yourname', password='yourpassword')

Reviewing files

A list of files is automatically downloaded when the connection is established and stored in the files attribute. Each file is represented by a File object with name, filesize, and date (upload date) attributes.

print(con.files)
# [File(name=340B037AM1234567890330.TXT, filesize=self.filesize='1.3MB', date='03/30/2025'),  ...]

print(con.files[0].name)
# 340B037AM1234567890330.TXT

print(con.files[0].filesize)
# 1.3MB

print(con.files[0].date)
# 2025-03-30 08:13:58  (a datetime.datetime object)

The library will attempt to automatically extract additional metadata from the filename describing account type (e.g., 340B, GPO, WAC), file specification (e.g., 037, 039), and account number.

print(con.files[0].account_type)
# 340B

print(con.files[0].specification)
# 037AM

print(con.files[0].account_number)
# 123456789

If the metadata is not present in the filename, the corresponding property will simply evaluate to None.
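To illustrate how this metadata maps onto a filename, the sketch below splits names shaped like 340B037AM1234567890330.TXT into their parts. The regex and the parse_filename() helper are hypothetical and are not medberg's actual parsing rules. They assume the name consists of an account type, a three-digit specification with an optional letter suffix, a nine-digit account number, and four trailing digits (which look like an MMDD date, given the 03/30 upload date above). Names in other styles, such as 039A_012345678_0101.TXT, would need their own pattern.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern, not medberg's own: account type, specification,
# 9-digit account number, then 4 more digits before the .TXT extension.
FILENAME_PATTERN = re.compile(
    r"^(?P<account_type>340B|GPO|WAC)"
    r"(?P<specification>\d{3}[A-Z]*)"
    r"(?P<account_number>\d{9})"
    r"\d{4}\.TXT$"
)

FIELDS = ("account_type", "specification", "account_number")


def parse_filename(name: str) -> Dict[str, Optional[str]]:
    """Return the metadata fields, with every field set to None if the name doesn't fit."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return {field: None for field in FIELDS}
    return {field: match.group(field) for field in FIELDS}


print(parse_filename("340B037AM1234567890330.TXT"))
# {'account_type': '340B', 'specification': '037AM', 'account_number': '123456789'}
```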

Downloading files

Any individual file can be downloaded using the get() method of the File class. The optional save_dir and save_name parameters set the destination directory and local filename; if they are omitted, the file is saved to the current working directory under its original filename. By default, up to five download attempts are made; use the max_tries parameter to change this.

con.files[0].get(save_dir='C:\\Users\\yourname\\Downloads\\',
                 save_name='new_filename.txt',
                 max_tries=10)
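This README doesn't say what counts as a failed attempt or whether get() waits between tries. As a rough sketch of what a max_tries limit implies, a generic retry wrapper might look like the following; with_retries() and its delay parameter are hypothetical names for illustration, not part of medberg.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], max_tries: int = 5, delay: float = 0.0) -> T:
    """Call action() up to max_tries times in total (hypothetical helper)."""
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    # Every attempt except the last swallows its error and tries again.
    for _ in range(max_tries - 1):
        try:
            return action()
        except Exception:  # a real client would catch narrower network errors
            if delay > 0:
                time.sleep(delay)  # optional pause before the next attempt
    # Final attempt: any error propagates to the caller unchanged.
    return action()
```

Letting the last attempt's exception propagate, rather than wrapping it, keeps the original error visible to the caller.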

Files can also be downloaded using the get_file() method of the SecureSite class. In this case, the file to download must be passed as the first argument, either as a File object or as a string containing the filename as it appears on the remote site. All other optional arguments, raised exceptions, and return values are the same as for File.get().

# Using a File object
file_to_get = con.files[0]
con.get_file(file_to_get)

# Using a string filename
con.get_file('039A_012345678_0101.TXT')

When a file is downloaded using either of the methods above, the return value will be a pathlib Path object pointing to the local file.

Filtering files

The list of files obtained from the server can be filtered using the match_files() method, which can take any number of arguments in the format file_property=filter_value. For example, to retrieve all files with account number 123456789, you can call match_files(account_number="123456789"). The result will be a list of File objects matching the specified arguments.

con.match_files(account_number="123456789")
# [File(name=340B037AM1234567890330.TXT, filesize=self.filesize='1.3MB', date='03/30/2025'),  ...]

Files can be matched on any attribute. When the type of a filter value differs from the type of the file property, the filter value is converted to the property's type automatically. For example, account_number is stored as a string in the File class, so the filter above used a string, but an integer works just as well:

con.match_files(account_number=123456789)
# [File(name=340B037AM1234567890330.TXT, filesize=self.filesize='1.3MB', date='03/30/2025'),  ...]

String filter values can contain a wildcard (*) at the beginning or end. For example, match_files(specification="039*") will match "039", "039A", "039AM", etc.

List and tuple filters will cause a match if any one of the inner values matches. Effectively, this acts as a nested OR filter.

Callables can also be passed to allow for more complex filtering. For example, we can get all files from the current month as follows:

from datetime import datetime

# Midnight on the first day of the current month (calls now() only once)
start_of_month = datetime.now().replace(day=1, hour=0, minute=0, second=0, microsecond=0)
con.match_files(date=lambda x: x >= start_of_month)

Multiple filter arguments can be passed together to create a more specific filter; a file must satisfy every argument to be included.
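To make the combined rules concrete, here is a self-contained sketch of matching logic that follows the behavior described above: type coercion, leading or trailing wildcards, lists and tuples as OR, callables as predicates, and multiple arguments combined with AND. It is not medberg's implementation; value_matches() and match_items() are hypothetical names, and details such as treating a pattern with a wildcard at both ends as a substring search are assumptions.

```python
from typing import Any, Iterable, List, TypeVar

T = TypeVar("T")


def value_matches(actual: Any, expected: Any) -> bool:
    """Check one attribute value against one filter value (hypothetical rules)."""
    if callable(expected):
        # Callables are arbitrary predicates on the attribute value.
        return bool(expected(actual))
    if isinstance(expected, (list, tuple)):
        # Lists and tuples act as a nested OR over their elements.
        return any(value_matches(actual, option) for option in expected)
    if actual is None or expected is None:
        return actual is expected
    if not isinstance(expected, type(actual)):
        # Coerce the filter value to the attribute's type, e.g. 123 -> "123".
        try:
            expected = type(actual)(expected)
        except (TypeError, ValueError):
            return False
    if isinstance(expected, str):
        # A "*" at the start and/or end of a string acts as a wildcard.
        if len(expected) > 1 and expected.startswith("*") and expected.endswith("*"):
            return expected[1:-1] in actual
        if expected.endswith("*"):
            return actual.startswith(expected[:-1])
        if expected.startswith("*"):
            return actual.endswith(expected[1:])
    return actual == expected


def match_items(items: Iterable[T], **filters: Any) -> List[T]:
    """Keep the items whose attributes satisfy every filter (logical AND)."""
    return [
        item
        for item in items
        if all(value_matches(getattr(item, name, None), want) for name, want in filters.items())
    ]
```

One consequence of coercing through the property's type in this sketch: an integer filter such as 12345678 becomes "12345678", so it will not match an account number stored with a leading zero, such as "012345678". Passing the filter as a string avoids that.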

To get a single file with the most recent upload time that matches a filter or series of filters, use match_latest_file(). This method takes the same arguments as the match_files() method.
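Once the matching files are known, picking the most recent one amounts to a max() over upload dates. The sketch below assumes only that each candidate has a datetime date attribute. FileStub and latest_file() are hypothetical, and returning None for an empty result is this sketch's own choice, since this README doesn't say what match_latest_file() does when nothing matches.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class FileStub(NamedTuple):
    """Hypothetical stand-in for a File: just a name and an upload date."""

    name: str
    date: datetime


def latest_file(matches: Iterable[FileStub]) -> Optional[FileStub]:
    """Return the match with the most recent upload date, or None if there are none."""
    return max(matches, key=lambda f: f.date, default=None)
```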

Manipulating files

Once a file is downloaded, you can perform row-level manipulations on it using the File.filter_() method. If the file has not yet been downloaded with get(), filter_() will download it first using the default parameters.

Filtering also requires a row pattern in File.row_pattern. This is essentially a regex whose named capture groups define the fields of each line in the file. When the file is downloaded, the library attempts to select a row pattern based on the parsed specification. If this fails, you must set it manually, e.g.:

file.row_pattern = RowPattern.ICS_039A

Filters are defined as lambda functions that operate on Row objects. Each Row has two properties: raw, the row exactly as it appears in the file, and parts, a dictionary of the elements parsed from the row. Take the following row as an example:

11111111111222222  333333333444444444

When parsed with the ICS_039A row pattern, parts contains the following:

{
    "ndc11": "11111111111",
    "item_id": "222222",
    "price": "333333333",
    "pack_size": "444444444"
}
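For illustration, a regex with named capture groups that produces exactly this parts dictionary from the example row might look like the one below. It assumes fixed-width digit fields of 11, 6, 9, and 9 characters with optional whitespace before the price; it is a hypothetical stand-in inferred from this one row, not the actual RowPattern.ICS_039A definition.

```python
import re

# Hypothetical stand-in for RowPattern.ICS_039A, inferred from the example row only.
ICS_039A_SKETCH = re.compile(
    r"(?P<ndc11>\d{11})"     # 11-digit NDC
    r"(?P<item_id>\d{6})"    # 6-digit item ID
    r"\s*"                   # padding between item ID and price
    r"(?P<price>\d{9})"      # 9-digit price field
    r"(?P<pack_size>\d{9})"  # 9-digit pack size field
)

# The example row: 11 ones, 6 twos, two spaces, 9 threes, 9 fours.
row = "1" * 11 + "2" * 6 + "  " + "3" * 9 + "4" * 9
match = ICS_039A_SKETCH.match(row)
parts = match.groupdict() if match else {}
print(parts)
```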

To keep only the rows whose NDC-11 begins with 11111, use the following lambda:

file.filter_(lambda row: row.parts['ndc11'].startswith("11111"))

When called on its own, filter_() opens the file, filters its rows, and saves the result immediately. If you need to apply filter_() several times, use a with block instead: the file is opened at the start of the block and saved once at the end.

with file as f:
    # Writing to disk occurs only after the final filter is applied
    f.filter_(lambda row: row.parts['ndc11'].startswith("11111"))
    f.filter_(lambda row: int(row.parts['price']) / 1000 > 100)
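The difference between the two modes can be sketched as a small context manager: __enter__ reads and parses the rows, filter_() narrows them in memory, and __exit__ writes the remaining rows once. FilterableFile and this Row tuple are hypothetical stand-ins for medberg's File and Row, built only from the behavior described above; the real classes may handle unparseable lines, encodings, and errors differently.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Optional


class Row(NamedTuple):
    """Hypothetical stand-in for medberg's Row."""

    raw: str               # the line as it appears in the file
    parts: Dict[str, str]  # named groups from the row pattern ({} if no match)


class FilterableFile:
    """Sketch: filter_() writes immediately unless called inside a with block."""

    def __init__(self, path: str, row_pattern: "re.Pattern[str]") -> None:
        self.path = path
        self.row_pattern = row_pattern
        self._rows: Optional[List[Row]] = None  # populated only inside a with block

    def _load(self) -> List[Row]:
        rows = []
        with open(self.path, encoding="utf-8") as fh:  # encoding is an assumption
            for line in fh:
                raw = line.rstrip("\n")
                match = self.row_pattern.match(raw)
                rows.append(Row(raw, match.groupdict() if match else {}))
        return rows

    def _save(self, rows: List[Row]) -> None:
        with open(self.path, "w", encoding="utf-8") as fh:
            fh.writelines(row.raw + "\n" for row in rows)

    def __enter__(self) -> "FilterableFile":
        self._rows = self._load()
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        if exc_type is None and self._rows is not None:
            self._save(self._rows)  # a single write for the whole block
        self._rows = None

    def filter_(self, keep: Callable[[Row], bool]) -> None:
        if self._rows is None:
            # Standalone call: read, filter, and write right away.
            self._save([row for row in self._load() if keep(row)])
        else:
            # Inside a with block: narrow the in-memory rows only.
            self._rows = [row for row in self._rows if keep(row)]
```

Skipping the write when the block raises is this sketch's own choice; it avoids saving a half-filtered file.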

Contributing

Pull requests are welcome. Please ensure all code submitted is formatted with Black and tested with pytest. For major changes, please open an issue first to discuss what you would like to change.

When editing the codebase locally, you may install medberg in development mode to use it in REPLs:

pip install -e '.[dev]'

License

This software is licensed under the MIT License.

Disclaimer

This package and its authors are not affiliated with, associated with, authorized by, or endorsed by Cencora, Inc. All names and brands are properties of their respective owners.



Download files


Source Distribution

medberg-1.2.0.tar.gz (14.6 kB)

Uploaded Source

Built Distribution


medberg-1.2.0-py3-none-any.whl (11.5 kB)

Uploaded Python 3

File details

Details for the file medberg-1.2.0.tar.gz.

File metadata

  • Download URL: medberg-1.2.0.tar.gz
  • Upload date:
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for medberg-1.2.0.tar.gz
Algorithm Hash digest
SHA256 86eb6058b7f1185773597e8461dd03e186f712572b9d850e25bf73efb82144ac
MD5 985c0da643cb6a4fc832a655fc4b09b0
BLAKE2b-256 7bd847f4a80c974ea005ef188e609ce91a54b22b2376b5cba8d13068ae6689d9


Provenance

The following attestation bundles were made for medberg-1.2.0.tar.gz:

Publisher: python-publish.yml on eddie-cosma/medberg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file medberg-1.2.0-py3-none-any.whl.

File metadata

  • Download URL: medberg-1.2.0-py3-none-any.whl
  • Upload date:
  • Size: 11.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for medberg-1.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 62329871dbb2dee31769c2ee7f5c47224a1e96d473506778a5cd6b4fccb1834c
MD5 21b35657d305c7bdec792eefced2fbfa
BLAKE2b-256 26e727abf4f8e3e7c4d5c94dd5741f3dc8952e4e58ed9130572cf87a523909b5


Provenance

The following attestation bundles were made for medberg-1.2.0-py3-none-any.whl:

Publisher: python-publish.yml on eddie-cosma/medberg

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
