
Extract chemical data from Safety Data Sheet documents

SDSParser

SDSParser is a browser-based app for extracting chemical data from Safety Data Sheet documents. SDSParser will speed up your data-entry process by eliminating the need to read through Safety Data Sheets to get the data you care about.

For a live demo, click here: SDSParser

For testing purposes, here are some SDS files to download and use:

Motivation

SDSParser was built out of the need to quickly access chemical data from Safety Data Sheets for data-entry purposes. Each chemical manufacturer styles and structures its SDSs a little differently. SDSParser can be updated to read a new manufacturer's format by adding a new set of regular expressions that match the format that manufacturer uses.
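As a rough illustration of the per-manufacturer regular expression idea, here is a minimal sketch. The dictionary layout, the `example_manufacturer` name, the `find_field` helper, and the patterns themselves are all assumptions made for this example; they are not taken from the package's actual configs.SDSRegexes module.

```python
import re

# Hypothetical layout: one dict of compiled patterns per manufacturer.
# The names and patterns here are illustrative assumptions, not the
# package's real configuration.
MANUFACTURER_REGEXES = {
    "example_manufacturer": {
        "flash_point": re.compile(
            r"flash\s*point\s*[:\-]?\s*(?P<data>[^\n]+)", re.IGNORECASE
        ),
        "specific_gravity": re.compile(
            r"specific\s*gravity\s*[:\-]?\s*(?P<data>[\d.]+)", re.IGNORECASE
        ),
    },
}


def find_field(text, manufacturer, key):
    """Return the first match for `key` in `text`, or None if nothing matches."""
    pattern = MANUFACTURER_REGEXES[manufacturer][key]
    match = pattern.search(text)
    return match.group("data").strip() if match else None


# Made-up text standing in for the extracted contents of an SDS.
sample_text = "Flash point: 170 °C\nSpecific gravity: 1.84\n"
flash_point = find_field(sample_text, "example_manufacturer", "flash_point")
specific_gravity = find_field(sample_text, "example_manufacturer", "specific_gravity")
```

Under this layout, supporting another manufacturer would amount to adding one more entry to the outer dictionary.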

Tech/framework used

  • pdfminer, a tool for extracting information from PDF documents
  • pytesseract, a Python wrapper for Google's Tesseract-OCR

Features

Have some physical SDSs you need to scan and get data from? Have no fear: SDSParser will recognize your scanned file as an image and perform optical character recognition (OCR) to extract the text for you.
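The README does not describe how scanned files are detected, so the following is only a sketch of one common approach: try normal text extraction first and fall back to OCR when almost no text comes back. The `min_chars` threshold, the function names, and the use of pdf2image to render pages are assumptions for illustration; only pdfminer and pytesseract are named by this project.

```python
def extract_pdf_text(path):
    """Extract embedded text using pdfminer.six's high-level helper."""
    from pdfminer.high_level import extract_text
    return extract_text(path)


def ocr_pdf_text(path):
    """OCR each page image. Assumes pdf2image is installed to render pages."""
    from pdf2image import convert_from_path
    import pytesseract
    pages = convert_from_path(path)
    return "\n".join(pytesseract.image_to_string(page) for page in pages)


def get_text(path, extract=extract_pdf_text, ocr=ocr_pdf_text, min_chars=20):
    """Return embedded text, or OCR text when the PDF looks like a scan.

    A PDF that yields fewer than `min_chars` non-whitespace-trimmed
    characters is treated as a scanned image (an assumed heuristic).
    """
    text = extract(path) or ""
    if len(text.strip()) >= min_chars:
        return text
    return ocr(path)
```

The extractor functions are passed in as parameters so the fallback decision can be exercised without a real PDF or a Tesseract installation.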

How to install

pip install SDSParser

How to use

Initialize SDSParser, optionally passing a list of the data fields you wish to extract (e.g. ['manufacturer', 'flash_point']) as the request_keys keyword argument. See configs.SDSRegexes.REQUEST_KEYS for the valid keys. If no keys are requested, all available data fields will be searched.

>>> from sdsparser import SDSParser
>>> request_keys = ['manufacturer', 'flash_point', 'specific_gravity', 'product_name', 'sara_311', 'nfpa_fire']
>>> parser = SDSParser(request_keys=request_keys)

Here is a list of the keys to use.

>>> from sdsparser.configs.SDSRegexes import REQUEST_KEYS
>>> REQUEST_KEYS
[
    'manufacturer',
    'product_name',
    'flash_point',
    'specific_gravity',
    'nfpa_fire',
    'nfpa_health',
    'nfpa_reactivity',
    'sara_311',
    'revision_date',
    'physical_state',
    'cas_number',
]

Call parser.get_sds_data with the path to your SDS document to get the SDS data.

>>> sds_data = parser.get_sds_data('path/to/SafetyDataSheet.pdf')

.get_sds_data returns a dictionary mapping each request key to its corresponding match.

>>> sds_data
{
 'manufacturer': 'Sigma-Aldrich',
 'product_name': 'Sodium dodecyl sulfate',
 'flash_point': '338 F',
 'specific_gravity': '3.2',
 'sara_311': 'Data not listed',
 'nfpa_fire': 'No data available'
}

If the heading for a requested field is not found in the SDS, the value for that key is the string 'Data not listed'. If the heading is found but no data is found under it, the value is the string 'No data available'.
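Because missing values come back as strings rather than None, downstream code may want to separate real matches from the two placeholder strings. A minimal sketch follows; the `split_results` helper and the constant names are hypothetical and not part of SDSParser, and the sample dictionary is the one shown above.

```python
# The two placeholder strings described in this README.
NOT_LISTED = "Data not listed"
NO_DATA = "No data available"


def split_results(sds_data):
    """Split a result dict into (found, missing) based on the placeholders."""
    found, missing = {}, {}
    for key, value in sds_data.items():
        if value in (NOT_LISTED, NO_DATA):
            missing[key] = value
        else:
            found[key] = value
    return found, missing


sds_data = {
    "manufacturer": "Sigma-Aldrich",
    "product_name": "Sodium dodecyl sulfate",
    "flash_point": "338 F",
    "specific_gravity": "3.2",
    "sara_311": "Data not listed",
    "nfpa_fire": "No data available",
}
found, missing = split_results(sds_data)
```

Keeping the placeholder text in `missing` preserves the distinction between an absent heading and an empty one.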

SDSParser-cli

In your terminal:

path/to/sds/directory $ sdsparser parse --flash_point --specific_gravity
{'fisher_1.pdf': {'flash_point': 'No data available',
                  'specific_gravity': 'No data available'},
 'fisher_2.pdf': {'flash_point': 'No data available',
                  'specific_gravity': 'No data available'},
 'fisher_3.pdf': {'flash_point': 'No data available',
                  'specific_gravity': '1.84'},
 'fisher_5.pdf': {'flash_point': 'No data available',
                  'specific_gravity': 'No data available'}}

or

path/to/sds/directory $ sdsparser parse --csv
path/to/sds/directory $ cat sds_data.csv
Fisher,Data not listed,No data available,No data available,1,0,0,/312 Hazard CategoriesSee section 2 for more informationCWA (Clean Water Act)Not,26-Jan-2018,Powder,Data not listed
Fisher,"Salicylic acid, sodium salt",No data available,(etc...)

For more information:

$ sdsparser --help

or

$ sdsparser parse --help
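When the CLI is not convenient, a similar filename-to-data mapping can be built from Python. This is a sketch only: `parse_directory` is a hypothetical helper, not part of the package, and it makes no claim about how the CLI itself is implemented. It relies solely on the get_sds_data method documented above.

```python
from pathlib import Path


def parse_directory(parser, directory):
    """Map each PDF file name in `directory` to the parser's result for it.

    `parser` is any object with a get_sds_data(path) method, such as an
    SDSParser instance.
    """
    results = {}
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        results[pdf_path.name] = parser.get_sds_data(str(pdf_path))
    return results
```

With the package installed, this could be called as parse_directory(SDSParser(request_keys=['flash_point', 'specific_gravity']), 'path/to/sds/directory').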

License

MIT © Aris Stepe
