Extract chemical data from Safety Data Sheet documents

SDSParser

SDSParser is a browser-based app for extracting chemical data from Safety Data Sheet documents. SDSParser will speed up your data-entry process by eliminating the need to read through Safety Data Sheets to get the data you care about.

For a live demo, click here: SDSParser

For testing purposes, here are some SDS files to download and use:

Motivation

SDSParser was built out of the need to quickly access chemical data from Safety Data Sheets for data-entry purposes. Each chemical manufacturer styles and structures its SDSs a little differently. SDSParser can be updated to read a new manufacturer's format by adding a new set of regular expressions that match that format.
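
As a rough illustration of that idea, the sketch below keeps one set of regular expressions per manufacturer format and selects a set by name. The names FORMAT_REGEXES, extract_fields, and "example_vendor", along with the patterns themselves, are hypothetical and are not taken from SDSParser's configs module; the real patterns will differ.

```python
import re

# Hypothetical registry: one dict of compiled patterns per manufacturer format.
FORMAT_REGEXES = {
    "sigma_aldrich": {
        "flash_point": re.compile(r"Flash point\s*:?\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
        "product_name": re.compile(r"Product name\s*:\s*(.+)", re.IGNORECASE),
    },
}

# Supporting a new manufacturer means adding one more entry to the registry.
FORMAT_REGEXES["example_vendor"] = {
    "flash_point": re.compile(r"FLASH PT\.?\s*[-:]\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
    "product_name": re.compile(r"Material\s*[-:]\s*(.+)", re.IGNORECASE),
}


def extract_fields(text, sds_format):
    """Return {field: first captured match, or None} for the given format."""
    results = {}
    for field, pattern in FORMAT_REGEXES[sds_format].items():
        match = pattern.search(text)
        results[field] = match.group(1).strip() if match else None
    return results


# Made-up SDS text in the hypothetical "example_vendor" layout.
sample = "Material: Example Solvent\nFLASH PT. - 23 C"
fields = extract_fields(sample, "example_vendor")
```

Keeping the patterns in data rather than in branching code means a new format touches only the registry, not the extraction loop.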

Tech/framework used

  • pdfminer, a tool for extracting information from PDF documents
  • pytesseract, a Python wrapper for Google's Tesseract-OCR

Features

Have physical SDSs that you need to scan and extract data from? Have no fear: SDSParser will recognize a scanned file as an image and perform optical character recognition (OCR) to extract the text for you.
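
One common way to structure this kind of fallback is to try the PDF's embedded text layer first and switch to OCR only when little or no text comes back. The sketch below shows that decision only; it is not SDSParser's actual detection logic. The function name, the min_chars threshold, and the two stand-in readers are assumptions made for illustration.

```python
def extract_text_with_ocr_fallback(path, read_text_layer, run_ocr, min_chars=20):
    """Try the PDF's embedded text first; fall back to OCR for scanned files.

    read_text_layer and run_ocr are caller-supplied functions (an assumption
    of this sketch) that each take a file path and return a string.
    Returns a (text, source) tuple, where source is "text_layer" or "ocr".
    """
    text = read_text_layer(path) or ""
    # A scanned PDF usually carries little or no embedded text.
    if len(text.strip()) >= min_chars:
        return text, "text_layer"
    return run_ocr(path), "ocr"


# Stand-ins so the sketch runs without any PDF or OCR libraries installed.
def fake_text_layer(path):
    return "" if "scan" in path else "Product name: Example Solvent, flash point 23"


def fake_ocr(path):
    return "text recovered by OCR"


digital = extract_text_with_ocr_fallback("digital.pdf", fake_text_layer, fake_ocr)
scanned = extract_text_with_ocr_fallback("scan_01.pdf", fake_text_layer, fake_ocr)
```

In real use, read_text_layer could wrap pdfminer.six's pdfminer.high_level.extract_text, and run_ocr could rasterize each page and pass it to pytesseract.image_to_string.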

How to use?

Initialize SDSParser, optionally passing request_keys a list of the data fields you wish to extract (e.g. ['manufacturer', 'flash_point']). See configs.SDSRegexes.SDS_DATA_TITLES for the proper keys to use. If no keys are requested, all available data fields will be searched.

sds_parser = SDSParser(request_keys=['manufacturer', 'flash_point'])

Then call .get_sds_data() with the path to your SDS document in .pdf format to retrieve the matches.

chemical_data = sds_parser.get_sds_data(file_path)

chemical_data will be a dictionary object mapping request key names to their corresponding matches:

{'Manufacturer': 'Sigma-Aldrich', 
 'Product Name': 'Sodium dodecyl sulfate', 
 'Flash Point': '338', 
 'Specific Gravity': 'No data available', 
 'NFPA Fire': '3', 
 'NFPA Health': '2', 
 'NFPA Reactivity': '3', 
 'SARA 311/312': 'Data not listed', 
 'Revision Date': '06/13/2018', 
 'Physical State': 'Rods', 
 'CAS # (if pure)': '151-21-3', 
 'format': 'sigma_aldrich', 
 'filename': 'sigma_aldrich_23.pdf'}

If a requested field is not found in the SDS, its value in the returned dictionary will be the string 'Data not listed'. If the field is found but no data is listed under it, the value will be the string 'No data available'.
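
Because both outcomes come back as ordinary strings, downstream data-entry code will likely want to separate real values from these two placeholders. Here is a minimal sketch of one way to do that; the helper split_results and the constant names are hypothetical and not part of SDSParser, and the sample dictionary is a subset of the example output above.

```python
# The two placeholder strings described above.
NOT_LISTED = "Data not listed"
NO_DATA = "No data available"


def split_results(chemical_data):
    """Separate extracted values from fields that came back as placeholders."""
    found, missing = {}, {}
    for key, value in chemical_data.items():
        if value in (NOT_LISTED, NO_DATA):
            missing[key] = value
        else:
            found[key] = value
    return found, missing


# Subset of the example output shown earlier.
chemical_data = {
    "Manufacturer": "Sigma-Aldrich",
    "Flash Point": "338",
    "Specific Gravity": "No data available",
    "SARA 311/312": "Data not listed",
}
found, missing = split_results(chemical_data)
```

Keeping the placeholder text in missing preserves the distinction between a field that is absent from the document and one that is present but empty.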

License

MIT © Aris Stepe
