Parser for CVs written in Russian. Supported formats: pdf, txt, docx.
General information
The parser extracts as much information as possible from the CV text. It uses the natasha library for named-entity recognition and the yargy parser for rule-based parsing.
Information that can be extracted:
- socdem:
  - name
  - gender
  - date_of_birth
  - age
  - location
- career:
  - period
  - org_name
  - occupation
- education:
  - year
  - name
  - specialization
- hobby:
  - name
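The field list above can be written down as a set of type hints, which may help when passing the parsed result to other code. This is only a sketch: the class names (`ParsedCV`, `Socdem`, and so on) are hypothetical and not part of russianCVparser, and the value types are inferred from the example output in this README rather than from the library itself. Every key is declared optional (`total=False`) on the assumption that a CV may lack any given field.

```python
from typing import List, TypedDict


class PartialDate(TypedDict, total=False):
    # Dates in the example output carry year/month and sometimes day.
    year: int
    month: int
    day: int


class Period(TypedDict, total=False):
    from_date: PartialDate
    to_date: PartialDate


class Location(TypedDict, total=False):
    name: str


class Socdem(TypedDict, total=False):
    name: str
    gender: str
    date_of_birth: PartialDate
    age: str  # a string such as "39 лет" in the example output
    location: Location


class CareerItem(TypedDict, total=False):
    period: Period
    org_name: str
    occupation: str


class EducationItem(TypedDict, total=False):
    year: int
    name: str
    specialization: str


class HobbyItem(TypedDict, total=False):
    name: List[str]


class ParsedCV(TypedDict, total=False):
    # Hypothetical top-level shape, mirroring the four groups listed above.
    socdem: Socdem
    career: List[CareerItem]
    education: List[EducationItem]
    hobby: List[HobbyItem]


# A small hand-written value that follows the sketch.
sample: ParsedCV = {
    "socdem": {"name": "Иванов Иван Иванович", "gender": "male"},
    "hobby": [{"name": ["футбол", "шахматы"]}],
}
```

At runtime a `TypedDict` is an ordinary `dict`, so these hints only help static checkers and readers; they do not validate what the parser returns.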
Installation
pip install russianCVparser
Usage
The parser supports documents in docx, pdf and txt formats.
from russianCVparser import CVparser, Document, show_json
parser = CVparser()
document = Document('path/to/doc.pdf')
data = parser.parse_text(document.text) # returns an OrderedDict instance
show_json(data)
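Since only three formats are supported, it can be convenient to check the file extension before handing a path to `Document`. The sketch below is one possible way to do that. `SUPPORTED_EXTENSIONS`, `is_supported` and `parse_cv_file` are hypothetical helpers, not part of russianCVparser, and how `Document` itself reacts to an unsupported file is not described in this README.

```python
from pathlib import Path

# Extensions taken from the list of supported formats in this README.
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx"}


def is_supported(path):
    """Return True if the file extension is one of the supported formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def parse_cv_file(path):
    """Parse a CV file, rejecting unsupported extensions up front."""
    if not is_supported(path):
        raise ValueError(f"Unsupported CV format: {path}")
    # Imported here so the extension check works without the package installed.
    from russianCVparser import CVparser, Document

    document = Document(str(path))
    return CVparser().parse_text(document.text)
```

With the package installed, `parse_cv_file('path/to/doc.pdf')` would follow the same steps as the usage snippet above.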
Example
from russianCVparser import CVparser, Document, show_json
parser = CVparser()
document = Document('CV.pdf')
data = parser.parse_text(document.text)
show_json(data)
Output:
{
    "socdem": {
        "name": "Иванов Иван Иванович",
        "gender": "male",
        "date_of_birth": {
            "year": 1981,
            "month": 5,
            "day": 2
        },
        "age": "39 лет",
        "location": {
            "name": "Казань"
        }
    },
    "career": [
        {
            "period": {
                "from_date": {
                    "month": 12,
                    "year": 2017
                }
            },
            "org_name": "ООО \"Инвест-консалт\"",
            "occupation": "Ведущий специалист"
        },
        {
            "period": {
                "from_date": {
                    "month": 2,
                    "year": 2011
                },
                "to_date": {
                    "month": 6,
                    "year": 2017
                }
            },
            "org_name": "Казгорсеть",
            "occupation": "Ведущий специалист"
        },
        {
            "period": {
                "from_date": {
                    "month": 2,
                    "year": 2010
                },
                "to_date": {
                    "month": 2,
                    "year": 2011
                }
            },
            "org_name": "ООО Адванс",
            "occupation": "Аналитик"
        }
    ],
    "education": [
        {
            "year": 2015,
            "name": "Российский государственный аграрный университет"
        },
        {
            "year": 2016,
            "name": "Московский Государственный Технический Университет"
        }
    ],
    "hobby": [
        {
            "name": [
                "футбол",
                "рыбалка",
                "шахматы"
            ]
        }
    ]
}
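The result is a plain nested mapping, so it can be processed with ordinary dictionary access. The sketch below works on a hand-copied excerpt of the output above rather than on a live parser call. `format_period` is a hypothetical helper, and treating a missing `to_date` as "present" is an assumption, not documented behaviour of the parser.

```python
import json


def format_period(period):
    """Render a period mapping as 'MM.YYYY - MM.YYYY'."""

    def fmt(date):
        # A date may be missing entirely or lack a month.
        if not date or date.get("year") is None:
            return None
        month = date.get("month")
        return f"{month:02d}.{date['year']}" if month else str(date["year"])

    start = fmt(period.get("from_date")) or "?"
    # Assumption: no to_date means the job is ongoing.
    end = fmt(period.get("to_date")) or "present"
    return f"{start} - {end}"


# Excerpt of the example output, copied by hand for illustration.
data = {
    "career": [
        {
            "period": {"from_date": {"month": 12, "year": 2017}},
            "org_name": 'ООО "Инвест-консалт"',
            "occupation": "Ведущий специалист",
        },
        {
            "period": {
                "from_date": {"month": 2, "year": 2011},
                "to_date": {"month": 6, "year": 2017},
            },
            "org_name": "Казгорсеть",
            "occupation": "Ведущий специалист",
        },
    ]
}

# One summary line per job.
lines = [
    f"{format_period(job['period'])}: {job['occupation']}, {job['org_name']}"
    for job in data["career"]
]

# ensure_ascii=False keeps Cyrillic text readable in the serialized JSON.
as_json = json.dumps(data, ensure_ascii=False, indent=4)
```

`json.dumps` escapes the inner quotes in organization names, so the serialized text remains valid JSON and can be written to a file or loaded back with `json.loads`.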