NSPD.GOV.RU/EGRN data parser for working with cadastral numbers

NSPD Data Parser

Russian Version

Description

This script parses EGRN data by cadastral number. It takes a JSON file containing a list of cadastral numbers as input and produces output in XLSX or JSON format.

Requirements

Python 3.6+
Selenium
```
pip install selenium
```
OpenPyXL
```
pip install openpyxl
```
ChromeDriver for Google Chrome
Make sure the ChromeDriver version matches the installed browser version and that ChromeDriver is on your PATH.

File Structure

bot.py: Contains the nspd_bot function, which opens the site, performs the required actions, and returns the page's text content.
config.py: Configuration file containing, for example, the fields variable, a list of key fields to parse.
script.py: Main script that reads the input data, processes it, and saves the results.

Input File Format

The input file must be in JSON format and contain a list of cadastral numbers. Example cad_numbers.json file:

[
  "31:05:1901001:831",
  "31:05:1901001:832",
  "31:05:1901001:978",
  "31:05:1901001:982",
  "31:05:1901001:985"
]

Command-Line Arguments

input_file: Path to the input JSON file with cadastral numbers.
-o, --output: Path to the output file (default: земельный_участок.xlsx).
-f, --format: Output data format. Allowed values:
- xlsx – save as Excel (default)
- json – save as JSON

Usage Examples

Saving as Excel:

python script.py cad_numbers.json -o result.xlsx -f xlsx

Saving as JSON:

python script.py cad_numbers.json -o result.json -f json

How the Script Works

Reading the input file: The script reads the list of cadastral numbers from the specified JSON file.
Data processing: For each cadastral number:
- A Selenium WebDriver is initialized.
- The nspd_bot function is called; it interacts with the site and returns the text information.
- The returned text is parsed by the parse_egrn_data function, which extracts the required fields.
Saving the results: The collected data is saved to the specified file in XLSX or JSON format.

Notes

To run in headless mode, you can modify the driver initialization in the script by adding the appropriate options.
Make sure that all dependencies are installed and that ChromeDriver is configured correctly.

English Version

Description

This script parses EGRN (Unified State Register of Real Estate) data by cadastral number. It takes a JSON file containing a list of cadastral numbers as input and writes the results in either XLSX or JSON format.

Requirements

Python 3.6+
Selenium
```
pip install selenium
```
OpenPyXL
```
pip install openpyxl
```
ChromeDriver for Google Chrome
Ensure that the ChromeDriver version matches your installed browser version and that the ChromeDriver executable is on your PATH.

File Structure

bot.py: Contains the nspd_bot function which opens the website, performs necessary actions, and returns the page's text content.
config.py: Configuration file; for example, it holds the fields variable, a list of the key fields to parse.
script.py: Main script that reads the input data, processes it, and saves the results.

Input File Format

The input file should be in JSON format and contain a list of cadastral numbers. Example cad_numbers.json:

[
  "12:34:5678901:234",
  "56:78:9012345:678",
  "90:12:3456789:012",
  "34:56:7890123:456",
  "78:90:1234567:890"
]
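A minimal sketch of how such a file could be read and sanity-checked before any browser work starts. The function name load_cad_numbers and the validation pattern are assumptions made for illustration, not part of the package; the pattern only checks for four colon-separated groups of digits.

```python
import json
import re
from pathlib import Path

# Assumption: a cadastral number is four colon-separated groups of digits,
# e.g. "12:34:5678901:234". Group lengths are deliberately not enforced.
CAD_NUMBER_RE = re.compile(r"\d+:\d+:\d+:\d+")


def load_cad_numbers(path):
    """Read a JSON list of cadastral numbers and reject malformed entries."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of cadastral numbers")
    numbers = []
    for item in data:
        # Each entry must be a string that matches the pattern as a whole.
        if not isinstance(item, str) or not CAD_NUMBER_RE.fullmatch(item.strip()):
            raise ValueError(f"not a cadastral number: {item!r}")
        numbers.append(item.strip())
    return numbers
```

Failing early on a malformed entry avoids launching a browser session for a number that cannot be looked up.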

Command-Line Arguments

input_file: Path to the input JSON file with cadastral numbers.
-o, --output: Path to the output file (default: земельный_участок.xlsx).
-f, --format: Output data format. Allowed values:
- xlsx – save as Excel (default)
- json – save as JSON
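The arguments above map naturally onto the standard argparse module. The sketch below is one way to declare them, assuming nothing about how script.py actually does it; build_parser is a hypothetical helper name.

```python
import argparse


def build_parser():
    """Declare the three documented arguments with their documented defaults."""
    parser = argparse.ArgumentParser(
        description="Parse EGRN data by cadastral numbers."
    )
    parser.add_argument(
        "input_file",
        help="path to the input JSON file with cadastral numbers",
    )
    parser.add_argument(
        "-o", "--output",
        default="земельный_участок.xlsx",
        help="path to the output file",
    )
    parser.add_argument(
        "-f", "--format",
        choices=["xlsx", "json"],
        default="xlsx",
        help="output data format",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(["cad_numbers.json", "-o", "result.json", "-f", "json"])
print(args.input_file, args.output, args.format)
```

Using choices makes argparse reject any format other than xlsx or json with a usage error.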

Usage Examples

Saving as Excel:

python script.py cad_numbers.json -o result.xlsx -f xlsx

Saving as JSON:

python script.py cad_numbers.json -o result.json -f json
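For orientation, here is a hedged sketch of what writing the two output formats might look like. The function save_results and the row layout (one dict per cadastral number) are assumptions; only the json module and the openpyxl calls Workbook, append, and save are taken as given.

```python
import json


def save_results(rows, output_path, fmt="xlsx"):
    """Write a list of dicts (one per cadastral number) as JSON or XLSX."""
    if fmt == "json":
        with open(output_path, "w", encoding="utf-8") as fh:
            # ensure_ascii=False keeps Cyrillic field names readable in the file.
            json.dump(rows, fh, ensure_ascii=False, indent=2)
    elif fmt == "xlsx":
        # Imported lazily: openpyxl is only needed for Excel output.
        from openpyxl import Workbook

        # Header row: the union of all keys, in first-seen order.
        headers = []
        for row in rows:
            for key in row:
                if key not in headers:
                    headers.append(key)

        workbook = Workbook()
        sheet = workbook.active
        sheet.append(headers)
        for row in rows:
            sheet.append([row.get(key, "") for key in headers])
        workbook.save(output_path)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
```

A call such as save_results(rows, "result.json", fmt="json") would correspond to the second command above.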

How the Script Works

Reading the Input File: The script reads a list of cadastral numbers from the specified JSON file.
Data Processing: For each cadastral number:
- A Selenium WebDriver is initialized.
- The nspd_bot function is called to interact with the website and retrieve textual information.
- The retrieved text is parsed by the parse_egrn_data function to extract the necessary fields.
Saving the Results: The collected data is saved to the specified file in either XLSX or JSON format.
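The steps above can be sketched as a small loop. Everything here is illustrative: parse_fields_sketch is a hypothetical stand-in for parse_egrn_data, fetch_text stands in for the WebDriver setup plus the nspd_bot call, and the assumption that each field label sits on its own line with the value on the following line is a guess about the page text, not a documented format.

```python
def parse_fields_sketch(text, fields):
    """Return {field: value}, taking the value from the line after each label."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    result = {}
    for field in fields:
        value = ""
        if field in lines:
            index = lines.index(field)
            if index + 1 < len(lines):
                value = lines[index + 1]
        result[field] = value  # missing fields stay as empty strings
    return result


def process_all(cad_numbers, fetch_text, fields):
    """Run fetch -> parse for every cadastral number and collect the rows."""
    rows = []
    for number in cad_numbers:
        text = fetch_text(number)  # stand-in for driver setup + nspd_bot
        row = {"cad_number": number}
        row.update(parse_fields_sketch(text, fields))
        rows.append(row)
    return rows


# Usage with canned text instead of a live browser session.
sample_text = "Status\nActive\nArea\n1500 sq m\n"
rows = process_all(
    ["12:34:5678901:234"],
    lambda number: sample_text,
    ["Status", "Area", "Owner"],
)
print(rows)
```

Passing the fetch step in as a function keeps the parsing logic testable without Selenium or network access.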

Notes

To run in headless mode, modify the WebDriver initialization in the script to pass the appropriate Chrome options.
Ensure that all dependencies are installed and that ChromeDriver is properly configured.
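As a rough illustration of the headless note, the sketch below builds a Chrome driver with headless switches through Selenium's ChromeOptions. The helper names are hypothetical, and the `--headless=new` switch assumes a reasonably recent Chrome; older versions may need plain `--headless` instead.

```python
def headless_arguments(window_size="1920,1080"):
    """Chrome command-line switches commonly used for headless runs."""
    return ["--headless=new", f"--window-size={window_size}"]


def make_headless_driver():
    """Create a Chrome WebDriver that runs without opening a browser window."""
    # Imported here so this module can be loaded without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in headless_arguments():
        options.add_argument(argument)
    return webdriver.Chrome(options=options)
```

The driver returned by make_headless_driver would then replace the one the script creates by default; remember to call quit() on it when processing finishes.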

About the Author

I'm Glebushnik
I am the author of this project. Feel free to check out my GitHub repository for more information.

Release history

0.1.3: Feb 15, 2025
0.1.2: Feb 15, 2025 (this version)
0.1.1: Feb 15, 2025
0.1.0: Feb 15, 2025

Download files

Source Distribution

nspd_parser-0.1.2.tar.gz (6.1 kB), uploaded Feb 15, 2025, Source

Built Distribution

nspd_parser-0.1.2-py3-none-any.whl (6.8 kB), uploaded Feb 15, 2025, Python 3

File details

Details for the file nspd_parser-0.1.2.tar.gz.

File metadata

Download URL: nspd_parser-0.1.2.tar.gz
Upload date: Feb 15, 2025
Size: 6.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for nspd_parser-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`db0989b77bc449249c9cb36a7504f2e78d2132e3eb9105b5d28b070fcb296ec2`
MD5	`9e51d70e6368150cf5b3efb60747a930`
BLAKE2b-256	`4d84931402635584760db8dcf6480dd313856d221a91f794310555220bdeb024`

File details

Details for the file nspd_parser-0.1.2-py3-none-any.whl.

File metadata

Download URL: nspd_parser-0.1.2-py3-none-any.whl
Upload date: Feb 15, 2025
Size: 6.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.0

File hashes

Hashes for nspd_parser-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ff550359d54b3daad45885960eecec6fbeaf4aadc4851063d661b8b6ef1cadbc`
MD5	`ffe960c01cd7a0c7e529743a56385a4d`
BLAKE2b-256	`51157d94d7ab943ed207ac7af9f836700918284df808c950f68bf183a316ed57`
