A tool for parsing crime statistics reports (form 4-ЕГС) from crimestat.ru.
crimestat3000
A tool for automated parsing of Russian crime statistics reports (form 4-ЕГС) from crimestat.ru. All you need to know is which section of the report you need, and which sheets and columns. (Beware: these tend to change over the years, so make sure to check for that and, if needed, split your parsing process into several parts with different configurations.)
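One way to organise such a split is to keep one configuration per period and merge in the settings they share. The sketch below is only an illustration: the periods, the column letters and the helper name `build_configs` are made up, and only the argument names come from the example call further down.

```python
# Hypothetical example: suppose the column of interest moved from 'E' to 'F'
# starting in 2017. The dates and column letters are made up for illustration.
PERIODS = [
    {"first_month": "01-2015", "last_month": "12-2016", "columns": ["C", "E"]},
    {"first_month": "01-2017", "last_month": "12-2018", "columns": ["C", "F"]},
]


def build_configs(periods, **shared):
    """Merge settings shared by all periods with each period's own settings."""
    return [{**shared, **period} for period in periods]


configs = build_configs(PERIODS, section=2, shorten_descr=True)

for cfg in configs:
    print(cfg["first_month"], cfg["last_month"], cfg["columns"])

# Each configuration would then be parsed separately, e.g.:
#   import crimestat3000 as cs
#   tables = [cs.parse.period(**cfg) for cfg in configs]
```

Keeping the per-period differences in plain dictionaries makes it easy to see at a glance where the report layout changed.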
There's no need to download files manually -- crimestat3000 will take care of that without generating temporary files. But if you happen to have the files locally, pass the path to their directory as the local_dir argument to slightly increase processing speed.
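If you are not sure whether the local copy exists on every machine, you could fall back to downloading when it does not. A minimal sketch, assuming local_dir accepts a plain string path; `with_local_dir` is a hypothetical helper, not part of crimestat3000:

```python
from pathlib import Path


def with_local_dir(kwargs, directory):
    """Return a copy of kwargs with 'local_dir' set only if the directory exists."""
    result = dict(kwargs)
    path = Path(directory)
    # None is the documented default and means "download the files".
    result["local_dir"] = str(path) if path.is_dir() else None
    return result


base = {"first_month": "01-2016", "last_month": "12-2016", "section": 2}
print(with_local_dir(base, "no/such/directory")["local_dir"])
```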
A 4-ЕГС report shows cumulative sums since the beginning of the year. By default crimestat3000 turns them into monthly values -- you can switch this off by setting the cumsum argument to True.
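To make the difference concrete, here is a minimal sketch of the conversion for a single value column within one calendar year. It only illustrates the arithmetic; it is not the library's actual implementation, and the numbers are made up.

```python
def cumulative_to_monthly(cumulative):
    """Convert year-to-date totals (January first) into per-month values."""
    monthly = []
    previous = 0
    for value in cumulative:
        # Each month's value is the growth of the running total.
        monthly.append(value - previous)
        previous = value
    return monthly


year_to_date = [120, 250, 390, 500]  # made-up totals for January..April
print(cumulative_to_monthly(year_to_date))  # [120, 130, 140, 110]
```

Since the totals restart every January, a sketch like this has to be applied to each year separately.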
You can also optionally specify the level of detail you need with the keep argument. Some sheets contain information on a specific part or paragraph of a previously mentioned article. You can drop those, keep them, or start by parsing all the sheets there are and make an informed choice later.
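The three levels can be pictured as a filter over sheet descriptions. The sketch below is an assumption about how such a filter might look, based only on the option names 'all', 'articles' and 'articles+' used in the example call; the regular expressions and the function `matches_keep` are hypothetical, and the second description is made up.

```python
import re

# 'ст.' introduces an article, 'ч.' a part, 'п.' a paragraph.
ARTICLE = re.compile(r"\bст\.\s*\d+")
PART_OR_PARAGRAPH = re.compile(r"\b[чп]\.\s*\S+")


def matches_keep(description, keep="all"):
    """Decide whether a sheet description passes the chosen level of detail."""
    if keep == "all":
        return True
    has_article = bool(ARTICLE.search(description))
    has_subdivision = bool(PART_OR_PARAGRAPH.search(description))
    if keep == "articles+":
        return has_article
    if keep == "articles":
        return has_article and not has_subdivision
    raise ValueError(f"unknown keep value: {keep!r}")


detailed = "Строка 12: п. «б» ч. 2 ст. 115 УК РФ"
article_only = "Строка 11: ст. 115 УК РФ"  # made-up description
general = "Строка 3: небольшой и средней тяжести"

for text in (detailed, article_only, general):
    print(text, matches_keep(text, "articles"), matches_keep(text, "articles+"))
```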
Finally, you can set the shorten_descr argument to True to turn column names like Строка 12: умышленное причинение легкого вреда здоровью, совершенное по мотивам политической, идеологической, расовой, национальной или религиозной ненависти или вражды либо по мотивам ненависти или вражды в отношении какой-либо социальной группы п. «б» ч. 2 ст. 115 УК РФ (a description of hate-motivated minor bodily harm under paragraph «б» of part 2 of article 115 of the Russian Criminal Code) into 115_ч2_б. It is neat, but keep in mind that you should use the shortener only if you are interested just in the sheets dedicated to a specific article or an article's part or paragraph. If no article is mentioned, the shortener will return the sheet's number with a "no articles mentioned" comment instead of a proper column name: e.g. Строка 3: небольшой и средней тяжести ("Row 3: of minor and medium gravity") turns into Строка 3: no articles mentioned.
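The shortening rule can be approximated with a few regular expressions. This is a rough sketch that reproduces the two examples above, not the library's own code; the patterns and the function name `shorten` are assumptions.

```python
import re

ARTICLE = re.compile(r"\bст\.\s*(\d+(?:\.\d+)?)")   # e.g. 'ст. 115'
PART = re.compile(r"\bч\.\s*(\d+(?:\.\d+)?)")       # e.g. 'ч. 2'
PARAGRAPH = re.compile(r"\bп\.\s*«(\w+)»")          # e.g. 'п. «б»'
ROW = re.compile(r"^(Строка\s+\d+)")                # e.g. 'Строка 3'


def shorten(description):
    """Build a short name like '115_ч2_б' from a full column description."""
    article = ARTICLE.search(description)
    if article is None:
        # No article to build a name from: keep only the row number.
        row = ROW.match(description)
        prefix = row.group(1) if row else description
        return f"{prefix}: no articles mentioned"
    pieces = [article.group(1)]
    part = PART.search(description)
    if part:
        pieces.append("ч" + part.group(1))
    paragraph = PARAGRAPH.search(description)
    if paragraph:
        pieces.append(paragraph.group(1))
    return "_".join(pieces)


long_name = (
    "Строка 12: умышленное причинение легкого вреда здоровью, совершенное "
    "по мотивам политической, идеологической, расовой, национальной или "
    "религиозной ненависти или вражды либо по мотивам ненависти или вражды "
    "в отношении какой-либо социальной группы п. «б» ч. 2 ст. 115 УК РФ"
)
print(shorten(long_name))
print(shorten("Строка 3: небольшой и средней тяжести"))
```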
To install crimestat3000 use pip:
pip install crimestat3000
Here's an example call:
import crimestat3000 as cs

kwargs = {
    'first_month': '01-2016',
    'last_month' : '12-2016',
    'section'    : 2,

    # optional arguments                               defaults
    # ==================                               ========
    # 'sheets' : {'all', a list of sheets}             # 'all'
    # 'keep'   : {'all', 'articles', 'articles+'}      # 'all'
    #            'articles+' will get you all the sheets with
    #            anything specific mentioned: an article,
    #            its part or paragraph;
    #            choose 'articles' if you need ONLY article
    #            sheets, WITH NO subdifferentiation.
    'columns' : ['C', 'E'],   # 'C' -- usually the sheet's total.
                              # Include only the value columns
                              # in your list -- the regions column
                              # is always included automatically.
    'shorten_descr': True,    # default: False
    # 'local_dir' : {None, path to a local directory}  # None
    # 'cumsum'    : {True, False}                      # False
}

table_2016 = cs.parse.period(**kwargs)
(OLE2 inconsistency warnings may pop up sometimes. Don't worry about them: they appear while reading .xls file contents into pandas because some of the files at crimestat.ru are slightly malformed, and they don't affect the results.)
File details
Details for the file crimestat3000-0.1.6.tar.gz.
File metadata
- Download URL: crimestat3000-0.1.6.tar.gz
- Upload date:
- Size: 18.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 315d6c781e40a47e19386172ee678042d24fff7d74f1b1a835e1aaa7c14ce9dc |
| MD5 | 38253e2defa054ae97924b91e60b5752 |
| BLAKE2b-256 | 5fef997fb0b83fd014922cff36f18839746e26ded5e50d4a2717f28b4c22dcc6 |
File details
Details for the file crimestat3000-0.1.6-py3-none-any.whl.
File metadata
- Download URL: crimestat3000-0.1.6-py3-none-any.whl
- Upload date:
- Size: 18.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0da47f8d9857e78d57aee26d11b887d60c91ac873eabdf11f1bd5315f3ff5b21 |
| MD5 | 270aee1222f945d170708e44971c79fa |
| BLAKE2b-256 | 7a276ff6b71d99022362bc380736abe0b86cd8bd887e1789c9a887d29703ca4d |