A tool for parsing crime statistics reports (form 4-ЕГС) from crimestat.ru.

Project description

crimestat3000

A tool for automated parsing of Russian crime statistics reports (form 4-ЕГС) from crimestat.ru. All you need to know is which section of the report you need, and which sheets and columns. (Beware: these tend to change over the years, so check for that and, if needed, split your parsing process into several parts with different configurations.)

There's no need to download files manually -- crimestat3000 takes care of that without generating temporary files. But if you happen to have the files locally, you can pass the path to their directory as the local_dir argument to slightly increase processing speed.
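As a small illustration of the local_dir option, the sketch below passes a directory only when it actually exists and otherwise falls back to None, the documented default. The helper name resolve_local_dir and the example directory are hypothetical, and it is an assumption that local_dir accepts a plain string path.

```python
from pathlib import Path


def resolve_local_dir(path):
    """Return `path` as a string if it is an existing directory, else None."""
    candidate = Path(path).expanduser()
    return str(candidate) if candidate.is_dir() else None


kwargs = {
    'first_month': '01-2016',
    'last_month': '12-2016',
    'section': 2,
    # Hypothetical directory; None is the documented default for local_dir.
    'local_dir': resolve_local_dir('~/crimestat_reports'),
}
```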

Important: a 4-ЕГС report mostly shows cumulative sums since the beginning of the year. By default crimestat3000 turns them into monthly values -- you can switch this off and keep the cumulative sums by setting the cumsum argument to True.
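To make the difference between cumulative and monthly values concrete, here is a minimal sketch of the arithmetic in plain Python. It is not crimestat3000's own code: the function name and the numbers are made up for illustration, and it assumes the counter restarts from zero each January.

```python
def cumulative_to_monthly(cumulative):
    """Turn year-to-date totals into per-month values by differencing."""
    monthly = []
    previous = 0  # assumed: the running total starts from zero in January
    for value in cumulative:
        monthly.append(value - previous)
        previous = value
    return monthly


# Made-up year-to-date totals for January, February and March.
year_to_date = [120, 250, 400]
print(cumulative_to_monthly(year_to_date))  # [120, 130, 150]
```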

You can also optionally specify the level of detail you need. Some sheets cover a specific part or paragraph of a previously mentioned article. You can drop those sheets, keep them, or start by parsing every sheet and make an informed decision later.
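The three levels of detail correspond to the keep argument listed in the example call further down ('all', 'articles', 'articles+'). The sketch below shows one way such a filter could be expressed. It is an assumption, not the library's implementation: the function name, the regular expressions and the sample descriptions are all hypothetical, and dotted article numbers such as 228.1 are treated as "specific" only because the README's own example suggests so.

```python
import re

# An article reference such as "ст. 228".
ARTICLE = re.compile(r'\bст\.\s*\d+')
# Something more specific: a part ("ч. 2"), a paragraph ("п. «б»"),
# or a dotted article number ("ст. 228.1").
SPECIFIC = re.compile(r'\bч\.\s*\d+|\bп\.\s*«\w»|\bст\.\s*\d+\.\d+')


def keep_sheet(description, keep='all'):
    """Decide whether a sheet passes the chosen level of detail."""
    if keep == 'all':
        return True
    if ARTICLE.search(description) is None:
        return False  # no article mentioned at all
    if keep == 'articles+':
        return True
    if keep == 'articles':
        return SPECIFIC.search(description) is None
    raise ValueError(f'unknown keep value: {keep!r}')


# Made-up sheet descriptions for illustration.
print(keep_sheet('Строка 5: ст. 228 УК РФ', keep='articles'))        # True
print(keep_sheet('Строка 6: ч. 2 ст. 228 УК РФ', keep='articles'))   # False
print(keep_sheet('Строка 6: ч. 2 ст. 228 УК РФ', keep='articles+'))  # True
```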

Finally, you can set the shorten_descr argument to True to turn column names like Строка 12: умышленное причинение легкого вреда здоровью, совершенное по мотивам политической, идеологической, расовой, национальной или религиозной ненависти или вражды либо по мотивам ненависти или вражды в отношении какой-либо социальной группы п. «б» ч. 2 ст. 115 УК РФ into 115_ч2_б. It is neat -- but keep in mind that you should use the shortener only if you are interested just in the sheets dedicated to a specific article or an article's part/paragraph. If no article is mentioned, the shortener returns the sheet's number with a "no articles mentioned" comment instead of a proper column name: e.g. Строка 3: небольшой и средней тяжести turns into Строка 3: no articles mentioned.
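The shortening rule described above can be sketched with a few regular expressions. This is an illustration that reproduces the two examples from this README, not the library's actual code. The function name and patterns are assumptions, and the first description is abbreviated.

```python
import re

ARTICLE = re.compile(r'\bст\.\s*(\d+(?:\.\d+)?)')  # "ст. 115" -> "115"
PART = re.compile(r'\bч\.\s*(\d+)')                # "ч. 2"    -> "2"
PARAGRAPH = re.compile(r'\bп\.\s*«(\w)»')          # "п. «б»"  -> "б"


def shorten(description):
    """Build a short column name such as '115_ч2_б' from a description."""
    article = ARTICLE.search(description)
    if article is None:
        # Keep the "Строка N" prefix and flag the missing article.
        prefix = description.split(':', 1)[0]
        return f'{prefix}: no articles mentioned'
    pieces = [article.group(1)]
    part = PART.search(description)
    if part:
        pieces.append('ч' + part.group(1))
    paragraph = PARAGRAPH.search(description)
    if paragraph:
        pieces.append(paragraph.group(1))
    return '_'.join(pieces)


print(shorten('Строка 12: умышленное причинение легкого вреда здоровью '
              'п. «б» ч. 2 ст. 115 УК РФ'))               # 115_ч2_б
print(shorten('Строка 3: небольшой и средней тяжести'))  # Строка 3: no articles mentioned
```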

To install crimestat3000 use pip:

pip install crimestat3000

Here's an example call:

import crimestat3000 as cs

kwargs = {
    'first_month': '01-2016',
    'last_month' : '12-2016',
    'section'    : 2,

    # optional arguments                                  
    # ==================                                  ========
    # 'sheets'       : {'all', a list of sheets}          # defaults to 'all'

    # 'keep'         : {'all', 'articles', 'articles+'}   # 'all'       -- get all sheets (default).
                                                          # 'articles'  -- all sheets with an article mentioned in description,
                                                          #                but not the sheets with specific article part or paragraph:
                                                          #                i.e. it will get you 228 data but not 228.1.
                                                          # 'articles+' -- all sheets with anything specific mentioned.

    'columns'      : ['C', 'E'],                          # defaults to 'C', usually the sheet's total.
                                                          # Include only the value columns in your list --
                                                          # the regions column is always included automatically.
    'shorten_descr': True                                 # defaults to False
    # 'local_dir'    : {None, path to a local directory}  # defaults to None
    # 'cumsum'       : {True, False}                      # defaults to False
}

table_2016 = cs.parse.period(**kwargs)

(OLE2 inconsistency warnings may pop up sometimes. Don't worry about them: they appear while reading .xls file content into pandas because some of the files at crimestat.ru are slightly malformed, and they don't affect the results.)
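Because sheets and columns can change from year to year, one way to split the parsing into several parts is to keep the shared arguments in one dictionary and the period-specific ones in a list. This is a sketch only: the periods and column letters below are placeholders, not verified report layouts, and the final call is commented out because it needs crimestat3000 and its data source.

```python
# Arguments shared by every period.
base = {'section': 2, 'shorten_descr': True}

# Hypothetical per-period settings: check the actual report layout
# for each year before relying on these column letters.
periods = [
    {'first_month': '01-2015', 'last_month': '12-2015', 'columns': ['C']},
    {'first_month': '01-2016', 'last_month': '12-2016', 'columns': ['C', 'E']},
]

# Merge the shared arguments with each period's overrides.
configs = [{**base, **period} for period in periods]

# Each configuration would then be parsed separately, e.g.:
# import crimestat3000 as cs
# tables = [cs.parse.period(**config) for config in configs]
```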

Download files

Source Distribution

crimestat3000-0.1.8.tar.gz (18.2 kB)

Uploaded Source

Built Distribution

crimestat3000-0.1.8-py3-none-any.whl (18.8 kB)

Uploaded Python 3

File details

Details for the file crimestat3000-0.1.8.tar.gz.

File metadata

  • Download URL: crimestat3000-0.1.8.tar.gz
  • Upload date:
  • Size: 18.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for crimestat3000-0.1.8.tar.gz
  • SHA256: 23d9814211d6223493123e4f82da713b0bf8f11c01c0875f2b1fe93ddb2d0abf
  • MD5: 8689c652ea81b1918cdf14e7ea9d2d59
  • BLAKE2b-256: 2b27f98f4187d7ec8e3e68c04929e127c40ed5936838935e693aa29c09c58a1a

File details

Details for the file crimestat3000-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: crimestat3000-0.1.8-py3-none-any.whl
  • Upload date:
  • Size: 18.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.9

File hashes

Hashes for crimestat3000-0.1.8-py3-none-any.whl
  • SHA256: cbb460128a73dc55129259b128c32100fb7515a7b0e848c8b1362c6b2cdd1867
  • MD5: 5f9854371bee58474abb548aaef89100
  • BLAKE2b-256: 0ebe9bfaee04e3a7f1ba10b1b41a5a1ffc02a61e9a6696efda432dbe8db8983e
