Pure-Python converter for legacy Nesstar survey files

Nesstar Converter

Pure-Python conversion for legacy Nesstar survey files - no NesstarExporter.exe, no Windows-only GUI, no dependency on discontinued desktop tooling.

Nesstar was once a common dissemination format across social-science archives, national data services, and statistical agencies worldwide. The legacy ecosystem persists, but the original tooling is fragmented: many servers are gone, much documentation is outdated, and the surviving migration tools still depend on a proprietary Windows executable.

nesstar-converter takes the opposite approach. It reverse-engineers the binary format directly in Python and writes open outputs such as Parquet, CSV, Excel, Stata, and JSON on Linux, macOS, and Windows.

This project started with India's MoSPI survey archives, but the underlying problem is global. The wider Nesstar ecosystem touched the UK Data Archive, the European Social Survey, Statistics Canada / ODESI, GESIS, SSJDA in Japan, and the IHSN / World Bank metadata workflow. See docs/global-coverage.md for the evidence-backed map.


Why this exists

  • Zero .exe dependency - no NesstarExporter.exe, no batch wrappers, no Wine
  • Cross-platform - works anywhere Python 3.10+ works
  • Reverse-engineered binary parser - reads .Nesstar files directly
  • Open output formats - Parquet, CSV, TSV, Excel, Stata, JSON, JSONL, fixed-width text
  • Validation-first - compares converted output against official Nesstar Explorer exports
  • Friendly to non-technical users - one CLI, clear commands, sensible defaults

nesstar-converter vs ihsn/nesstar-exporter

The IHSN tool is useful if you already have the official Windows exporter binary and want to automate that workflow. It is not a replacement for the binary itself.

| Dimension | ihsn/nesstar-exporter | nesstar-converter |
|---|---|---|
| Core approach | Python wrapper around NesstarExporter.exe | Pure-Python binary parser |
| Requires NesstarExporter.exe | Yes | No |
| OS model | Windows-oriented workflow | Linux / macOS / Windows |
| Reads binary directly | No | Yes |
| Reverse-engineered format support | No | Yes |
| Parquet output | No | Yes |
| RDF / DDI export via official tool | Yes | No |
| Validation against text exports | No built-in validation layer | Yes |
| Install model | Repo scripts + external exe path | Standard Python package / console script |

Evidence: the IHSN repo's own README, config.json, src/config.py, and src/exporter.py all require a path to NesstarExporter.exe and shell out to it with subprocess.run(...).


Who uses Nesstar?

Nesstar was not just an India-specific format. It was part of a broader international archive ecosystem.

| Institution / repository | Country / region | What we verified |
|---|---|---|
| NSD / Sikt | Norway | Original Nesstar developer and ESS host |
| UK Data Archive / UK Data Service | United Kingdom | Co-developer and former Nesstar WebView operator |
| European Social Survey | Pan-European | Disseminated through Nesstar from 2004 |
| Statistics Canada / ODESI | Canada | Licensed the full Nesstar suite; former WebView instance |
| GESIS ZACAT | Germany | Former Nesstar WebView catalog |
| Sciences Po / CDSP | France | Publicly documented migration away from Nesstar |
| SSJDA / CSRDA | Japan | Publicly documented Nesstar deployment |
| IHSN / World Bank ecosystem | Global | Still distributes Nesstar Publisher and maintains migration tooling |
| India MoSPI / NSO | India | Active distributor of .Nesstar survey files |
| DataFirst / Stats SA | South Africa | Important related archive / testing target, but evidence is legacy or mixed |

For the full institution table, confidence levels, and source links, see docs/global-coverage.md.


Supported formats

| Format | Extension | Best for |
|---|---|---|
| parquet | .parquet | Analytics, DuckDB, pandas, R, long-term storage |
| csv | .csv | Universal spreadsheet compatibility |
| tsv | .tsv | Tab-separated workflows and legacy survey tooling |
| excel | .xlsx | Non-technical users |
| stata | .dta | Stata users, with leading zeros preserved |
| json | .json | Web apps and structured interchange |
| jsonl | .jsonl | Streaming and line-oriented pipelines |
| fwf | .txt | Fixed-width text output |

Quick start

Install from source

git clone https://github.com/abhinavjnu/nesstar-converter.git
cd nesstar-converter
python -m pip install -e ".[dev]"

Inspect a file

nesstar-converter info path/to/file.Nesstar path/to/ddi.xml

Convert to open formats

nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv,parquet,stata
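
As an illustration of how a comma-separated --formats value relates to the extensions in the "Supported formats" table, here is a minimal sketch. The extension mapping is copied from that table; the helper names parse_formats and planned_outputs, and the idea of naming outputs after a block, are assumptions made for the example and are not taken from the tool's source.

```python
# Extensions copied from the "Supported formats" table.
# The helpers below are hypothetical and only illustrate the mapping.
FORMAT_EXTENSIONS = {
    "parquet": ".parquet",
    "csv": ".csv",
    "tsv": ".tsv",
    "excel": ".xlsx",
    "stata": ".dta",
    "json": ".json",
    "jsonl": ".jsonl",
    "fwf": ".txt",
}


def parse_formats(value: str) -> list[str]:
    """Split a comma-separated format list and reject unknown names."""
    names = [part.strip().lower() for part in value.split(",") if part.strip()]
    unknown = [name for name in names if name not in FORMAT_EXTENSIONS]
    if unknown:
        raise ValueError(f"unsupported format(s): {', '.join(unknown)}")
    return names


def planned_outputs(block_name: str, value: str) -> list[str]:
    """Return one illustrative file name per requested format."""
    return [block_name + FORMAT_EXTENSIONS[name] for name in parse_formats(value)]


print(planned_outputs("block_1", "csv,parquet,stata"))
# ['block_1.csv', 'block_1.parquet', 'block_1.dta']
```

The actual output file names chosen by the converter may differ; only the format names and extensions are grounded in the table.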

Validate against official text exports

nesstar-converter validate ./output ./exported_text

If the companion ddi.xml sits beside the .Nesstar file, you can omit the DDI path argument and the tool will auto-detect the file.
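
If you want to check in advance whether a package is complete, the "sits beside" rule can be approximated in a few lines. This is a sketch under assumptions: find_companion_ddi is a hypothetical helper, and the case-insensitive match on the file name is a guess at reasonable behaviour, not a description of the tool's own detection logic.

```python
import tempfile
from pathlib import Path


def find_companion_ddi(nesstar_path):
    """Return the ddi.xml in the same folder as nesstar_path, or None."""
    folder = Path(nesstar_path).resolve().parent
    if not folder.is_dir():
        return None
    # Assumption: match the name case-insensitively (ddi.xml, DDI.XML, ...).
    for candidate in sorted(folder.iterdir()):
        if candidate.is_file() and candidate.name.lower() == "ddi.xml":
            return candidate
    return None


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    survey = Path(tmp) / "survey.Nesstar"
    survey.write_bytes(b"")
    before = find_companion_ddi(survey)  # no ddi.xml yet
    (Path(tmp) / "ddi.xml").write_text("<codeBook/>", encoding="utf-8")
    after = find_companion_ddi(survey)   # now present

print(before, after.name)
# None ddi.xml
```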


Validation and coverage

This repository distinguishes between:

  1. Cell-level validation - converted output matched official Nesstar Explorer exports row-for-row and value-for-value.
  2. Structure-level verification - official export files matched published file counts and variable counts, but the raw package lacked the companion DDI XML required for full binary re-validation.

| Survey / corpus | Years / rounds | Verification level | Result |
|---|---|---|---|
| EUS | 38th Round (1983) | Cell-level | 9/9 blocks, 3.4M rows, zero mismatches against official exports |
| HCES | 38th (1983), 45th (1989-90), 66th (2009-10) | Cell-level | 27/28 blocks, 23.4M+ rows, zero mismatches for blocks present in DDI |
| PLFS | 2017-18 to 2022-23 | Structure-level | 24/24 official export files matched NADA data-dictionary row/column counts; one 2017-18 revisit export includes a trailing blank tab column |

PLFS note: the local PLFS raw ZIPs contain .Nesstar files, official text exports, and the legacy Nesstar Explorer installer, but not the companion DDI XML needed for full binary decoding in the current open parser. PLFS is therefore confirmed as a real Nesstar distribution corpus, but the evidence in this repo is structure-level only, not full cell-level re-validation.
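
To make the two verification levels concrete, the sketch below compares two delimited text files value-for-value (cell-level) and reports row and column counts (structure-level). It uses only the standard library and is not the project's validator; compare_cells and table_shape are hypothetical names, and the sample data is invented for the example.

```python
import csv
import io
from itertools import zip_longest


def compare_cells(converted, official, delimiter=","):
    """Return (row, column, converted_value, official_value) per mismatch."""
    mismatches = []
    rows_a = csv.reader(converted, delimiter=delimiter)
    rows_b = csv.reader(official, delimiter=delimiter)
    for row_no, (a, b) in enumerate(zip_longest(rows_a, rows_b), start=1):
        if a is None or b is None:
            # One file has more rows than the other.
            mismatches.append((row_no, None, a, b))
            continue
        for col_no, (x, y) in enumerate(zip_longest(a, b), start=1):
            if x != y:
                mismatches.append((row_no, col_no, x, y))
    return mismatches


def table_shape(handle, delimiter=","):
    """Return (row count, widest column count) for a delimited file."""
    rows = list(csv.reader(handle, delimiter=delimiter))
    return len(rows), max((len(row) for row in rows), default=0)


# Invented sample data: the header is row 1, so the mismatch is on row 3.
converted = io.StringIO("id,age\n001,34\n002,51\n")
official = io.StringIO("id,age\n001,34\n002,15\n")
mismatches = compare_cells(converted, official)
print(mismatches)
# [(3, 2, '51', '15')]
```

Values are compared as strings, so "007" and "7" count as a mismatch. A trailing tab at the end of each line shows up in table_shape as one extra, empty column, which is the kind of artifact noted for the 2017-18 revisit export.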


For non-technical users

If your goal is simply "turn this old survey file into something Excel can open", the shortest path is:

git clone https://github.com/abhinavjnu/nesstar-converter.git
cd nesstar-converter
python -m pip install -e .
nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv

Then open the generated .csv files in Excel, LibreOffice, Google Sheets, Stata, R, or Python.
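
If you continue in Python, one detail is worth knowing: spreadsheet programs often strip leading zeros from codes such as "09", while reading the CSV as text keeps them. The sketch below uses only the standard library; load_rows is a hypothetical helper and the column names are invented for the example, not taken from any real survey.

```python
import csv
import io


def load_rows(handle, numeric_columns=()):
    """Read CSV rows as text, converting only the named columns to float."""
    rows = []
    for record in csv.DictReader(handle):
        for column in numeric_columns:
            value = record[column]
            record[column] = float(value) if value else None
        rows.append(record)
    return rows


# Invented sample standing in for a generated .csv file.
sample = io.StringIO(
    "state_code,household_id,monthly_expenditure\n"
    "09,00417,5230.50\n"
)
rows = load_rows(sample, numeric_columns=["monthly_expenditure"])
print(rows[0]["state_code"], rows[0]["household_id"], rows[0]["monthly_expenditure"])
# 09 00417 5230.5
```

For a real file, replace the io.StringIO sample with open("path/to/block.csv", newline="", encoding="utf-8").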

If you are unsure which format to choose:

| You want to... | Use |
|---|---|
| Open the data in Excel | csv |
| Work in Stata | stata |
| Analyze in Python / R / DuckDB | parquet |
| Preserve a text-like interchange format | tsv or fwf |

Python API

from nesstar_converter import convert_nesstar, show_info

show_info("survey.Nesstar", "ddi.xml")

report = convert_nesstar(
    "survey.Nesstar",
    "ddi.xml",
    "./output",
    formats=["csv", "parquet"],
    year="2022-23",
)

Key functions:

| Function | Purpose |
|---|---|
| convert_nesstar(...) | Convert one .Nesstar file to one or more formats |
| parse_ddi(...) | Parse DDI XML block and variable metadata |
| show_info(...) | Inspect a file before conversion |
| validate_against_export(...) | Compare converted output to official text exports |
| batch_convert(...) | Convert a survey corpus in batch mode |
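
The signature of batch_convert(...) is not shown here, so the following sketch does not use it. Instead it walks a folder tree and calls a converter with the same argument shape as the convert_nesstar example above (three positional paths plus formats). convert_tree is a hypothetical helper, the converter is passed in as a parameter, and the demonstration uses a stand-in function so the example runs without the package or real survey files.

```python
import tempfile
from pathlib import Path


def convert_tree(root, output_root, convert, formats=("csv", "parquet")):
    """Convert every .Nesstar file under root that has ddi.xml beside it."""
    converted, skipped = [], []
    for nesstar in sorted(Path(root).rglob("*.Nesstar")):
        ddi = nesstar.with_name("ddi.xml")
        if not ddi.is_file():
            skipped.append(nesstar)  # no companion DDI, cannot decode
            continue
        out_dir = Path(output_root) / nesstar.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        convert(str(nesstar), str(ddi), str(out_dir), formats=list(formats))
        converted.append(nesstar)
    return converted, skipped


# Demonstration with a stand-in converter that only records its arguments.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw"
    (raw / "round_a").mkdir(parents=True)
    (raw / "round_b").mkdir(parents=True)
    (raw / "round_a" / "survey_a.Nesstar").write_bytes(b"")
    (raw / "round_a" / "ddi.xml").write_text("<codeBook/>", encoding="utf-8")
    (raw / "round_b" / "survey_b.Nesstar").write_bytes(b"")  # no ddi.xml

    calls = []

    def fake_convert(nesstar, ddi, out_dir, formats):
        calls.append((Path(nesstar).name, Path(ddi).name, Path(out_dir).name, formats))

    done, missed = convert_tree(raw, Path(tmp) / "out", fake_convert)

print(calls)
print([path.name for path in missed])
```

With the package installed, the assumed real-world call would be convert_tree("raw", "out", convert_nesstar). Whether convert_nesstar needs further arguments, such as year, for a given corpus is not covered by this sketch.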

Limitations

  • Full decoding currently expects DDI metadata. If a distributor ships only the .Nesstar binary and omits the companion DDI XML, the current parser cannot yet do full open extraction on its own.
  • This is a data-conversion tool, not an RDF packager. If your goal is specifically DDI / RDF export via the official legacy toolchain, the IHSN wrapper may be the better fit, though it still requires NesstarExporter.exe.
  • Legacy ecosystems are inconsistent. Different institutions used different Nesstar-era conventions, so community test cases from outside India are especially valuable.
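
To give a sense of the metadata the first limitation refers to, the sketch below lists variable names and labels from a tiny DDI-style document using the standard library. It is not the package's parse_ddi. The sample XML is invented, the element and attribute names (var, labl, name, files) follow DDI Codebook conventions, and real Nesstar-era DDI files may use namespaces or additional structure that this does not handle beyond stripping a namespace prefix.

```python
import xml.etree.ElementTree as ET

# Invented, minimal DDI-style sample; real files carry far more metadata.
SAMPLE_DDI = """<codeBook>
  <dataDscr>
    <var ID="V1" name="state_code" files="F1">
      <labl>State code</labl>
    </var>
    <var ID="V2" name="age" files="F1">
      <labl>Age in years</labl>
    </var>
  </dataDscr>
</codeBook>"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{ddi:codebook:2_5}var'."""
    return tag.rsplit("}", 1)[-1]


def list_variables(xml_text):
    """Return name, file reference, and label for each var element."""
    root = ET.fromstring(xml_text)
    variables = []
    for element in root.iter():
        if local_name(element.tag) != "var":
            continue
        label = next(
            (child.text or "" for child in element if local_name(child.tag) == "labl"),
            "",
        )
        variables.append(
            {"name": element.get("name"), "file": element.get("files"), "label": label.strip()}
        )
    return variables


for variable in list_variables(SAMPLE_DDI):
    print(variable)
```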


Testing

python -m pip install -e ".[dev]"
pytest tests/ -v

CI runs unit tests on Python 3.10-3.13 and checks formatting with Ruff.


Contributing

Good contributions for this project:

  • Test the converter on non-MoSPI Nesstar files
  • Report datasets that still circulate as .Nesstar / .NSDstat
  • Share evidence of legacy Nesstar repositories or migrations
  • Improve metadata recovery for archives that omit ddi.xml

Community testing requests are tracked in the issue tracker, including:

  • Stats SA GHS
  • UK Data Archive legacy Nesstar packages
  • World Bank / IHSN LSMS-style Nesstar corpora

Citation

If you use this tool in research, please cite it using CITATION.cff.


License

MIT
