Parser for CV in russian language. Supported formats: pdf, txt, docx
Project description
General information
Parser extracts as many as possible information from the CV text. It uses natasha
library for
entities recognition and yargy
parser for rule-based parsing.
Information that can be extracted:
- socdem:
- name,
- gender,
- date_of_birth,
- age,
- location
- career:
- period,
- org_name,
- occupation
- education:
- year,
- name,
- specialization
- hobby:
- name
Installation
pip install russianCVparser
Usage
Parser supports documents in docx, pdf and txt formats.
from russianCVparser import CVparser, Document, show_json
parser = CVparser()
document = Document('path/to/doc.pdf')
data = parser.parse_text(document.text) # returns an OrderedDict instance
show_json(data)
Example
from russianCVparser import CVparser, Document, show_json
parser = CVparser()
document = Document('CV.pdf')
data = parser.parse_text(document.text)
show_json(data)
Output:
{
"socdem": {
"name": "Иванов Иван Иванович",
"gender": "male",
"date_of_birth": {
"year": 1981,
"month": 5,
"day": 2
},
"age": "39 лет",
"location": {
"name": "Казань"
}
},
"career": [
{
"period": {
"from_date": {
"month": 12,
"year": 2017
}
},
"org_name": "ООО "Инвест-консалт"",
"occupation": "Ведущий специалист"
},
{
"period": {
"from_date": {
"month": 2,
"year": 2011
},
"to_date": {
"month": 6,
"year": 2017
}
},
"org_name": "Казгорсеть",
"occupation": "Ведущий специалист"
},
{
"period": {
"from_date": {
"month": 2,
"year": 2010
},
"to_date": {
"month": 2,
"year": 2011
}
},
"org_name": "ООО Адванс",
"occupation": "Аналитик"
}
],
"education": [
{
"year": 2015,
"name": "Российский государственный аграрный университет"
},
{
"year": 2016,
"name": "Московский Государственный Технический Университет"
}
],
"hobby": [
{
"name": [
"футбол",
"рыбалка",
"шахматы"
]
}
]
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
russianCVparser-1.0.tar.gz
(102.0 kB
view hashes)
Built Distribution
russianCVparser-1.0-py3-none-any.whl
(107.1 kB
view hashes)
Close
Hashes for russianCVparser-1.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | da78f5ec1a0021fa377a822148d6e666969ac7e67fe1a5f114f966fa35488736 |
|
MD5 | 9e1a465cdd1bb1f2aeb62cf3c1bdb1c7 |
|
BLAKE2b-256 | d1e5c8eafc808c0e789acdc6f4b32856a814663dc29f407abfbe7f9075db393e |