Pure-Python converter for legacy Nesstar survey files
Nesstar Converter
Pure-Python conversion for legacy Nesstar survey files - no NesstarExporter.exe, no Windows-only GUI, no dependency on discontinued desktop tooling.
Nesstar was once a common dissemination format across social-science archives, national data services, and statistical agencies worldwide. The legacy ecosystem persists, but the original tooling is fragmented: many servers are gone, much documentation is outdated, and the surviving migration tools still depend on a proprietary Windows executable.
nesstar-converter takes the opposite approach. It reverse-engineers the binary format directly in Python and writes open outputs such as Parquet, CSV, Excel, Stata, and JSON on Linux, macOS, and Windows.
This project started with India's MoSPI survey archives, but the underlying problem is global. The wider Nesstar ecosystem touched the UK Data Archive, the European Social Survey, Statistics Canada / ODESI, GESIS, SSJDA in Japan, and the IHSN / World Bank metadata workflow. See docs/global-coverage.md for the evidence-backed map.
Why this exists
- Zero .exe dependency - no NesstarExporter.exe, no batch wrappers, no Wine
- Cross-platform - works anywhere Python 3.10+ works
- Reverse-engineered binary parser - reads .Nesstar files directly
- Open output formats - Parquet, CSV, TSV, Excel, Stata, JSON, JSONL, fixed-width text
- Validation-first - compares converted output against official Nesstar Explorer exports
- Non-technical friendly - one CLI, clear commands, sensible defaults
nesstar-converter vs ihsn/nesstar-exporter
The IHSN tool is useful if you already have the official Windows exporter binary and want to automate that workflow. It is not a replacement for the binary itself.
| Dimension | ihsn/nesstar-exporter | nesstar-converter |
|---|---|---|
| Core approach | Python wrapper around NesstarExporter.exe | Pure-Python binary parser |
| Requires NesstarExporter.exe | Yes | No |
| OS model | Windows-oriented workflow | Linux / macOS / Windows |
| Reads binary directly | No | Yes |
| Reverse-engineered format support | No | Yes |
| Parquet output | No | Yes |
| RDF / DDI export via official tool | Yes | No |
| Validation against text exports | No built-in validation layer | Yes |
| Install model | Repo scripts + external exe path | Standard Python package / console script |
Evidence: the IHSN repo's own README, config.json, src/config.py, and src/exporter.py all require a path to NesstarExporter.exe and shell out to it with subprocess.run(...).
Who uses Nesstar?
Nesstar was not just an India-specific format. It was part of a broader international archive ecosystem.
| Institution / repository | Country / region | What we verified |
|---|---|---|
| NSD / Sikt | Norway | Original Nesstar developer and ESS host |
| UK Data Archive / UK Data Service | United Kingdom | Co-developer and former Nesstar WebView operator |
| European Social Survey | Pan-European | Disseminated through Nesstar from 2004 |
| Statistics Canada / ODESI | Canada | Licensed the full Nesstar suite; former WebView instance |
| GESIS ZACAT | Germany | Former Nesstar WebView catalog |
| Sciences Po / CDSP | France | Publicly documented migration away from Nesstar |
| SSJDA / CSRDA | Japan | Publicly documented Nesstar deployment |
| IHSN / World Bank ecosystem | Global | Still distributes Nesstar Publisher and maintains migration tooling |
| India MoSPI / NSO | India | Active distributor of .Nesstar survey files |
| DataFirst / Stats SA | South Africa | Important related archive / testing target, but evidence is legacy or mixed |
For the full institution table, confidence levels, and source links, see docs/global-coverage.md.
Supported formats
| Format | Extension | Best for |
|---|---|---|
| parquet | .parquet | Analytics, DuckDB, pandas, R, long-term storage |
| csv | .csv | Universal spreadsheet compatibility |
| tsv | .tsv | Tab-separated workflows and legacy survey tooling |
| excel | .xlsx | Non-technical users |
| stata | .dta | Stata users, with leading zeros preserved |
| json | .json | Web apps and structured interchange |
| jsonl | .jsonl | Streaming and line-oriented pipelines |
| fwf | .txt | Fixed-width text output |
Quick start
Install from source
git clone https://github.com/abhinavjnu/nesstar-converter.git
cd nesstar-converter
python -m pip install -e ".[dev]"
Inspect a file
nesstar-converter info path/to/file.Nesstar path/to/ddi.xml
Convert to open formats
nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv,parquet,stata
Validate against official text exports
nesstar-converter validate ./output ./exported_text
If the companion ddi.xml sits beside the .Nesstar file, you can omit it and the tool will auto-detect it.
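The auto-detection described above can be pictured with a small sketch. It is not the tool's actual lookup logic: the helper name `find_companion_ddi` and the rule it applies (a file named ddi.xml, matched case-insensitively, in the same directory as the .Nesstar file) are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional


def find_companion_ddi(nesstar_path: str) -> Optional[Path]:
    """Return a ddi.xml that sits beside the .Nesstar file, or None.

    Hypothetical helper: the real tool's detection rules may differ.
    """
    folder = Path(nesstar_path).resolve().parent
    if not folder.is_dir():
        return None
    # Assumption: accept DDI.xml, Ddi.xml, etc. by comparing lowercase names.
    for candidate in sorted(folder.iterdir()):
        if candidate.is_file() and candidate.name.lower() == "ddi.xml":
            return candidate
    return None
```

A caller could fall back to asking for an explicit DDI path whenever this returns None.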
Validation and coverage
This repository distinguishes between:
- Cell-level validation - converted output matched official Nesstar Explorer exports row-for-row and value-for-value.
- Structure-level verification - official export files matched published file counts and variable counts, but the raw package lacked the companion DDI XML required for full binary re-validation.
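To make "row-for-row and value-for-value" concrete, here is a minimal sketch of such a comparison between one converted file and one official export. It is an illustration only, not the project's `validate_against_export` implementation; the function name `count_row_mismatches`, the UTF-8 encoding, and the assumption that both files use the same delimiter are all hypothetical.

```python
import csv
from itertools import zip_longest


def count_row_mismatches(converted_path: str, official_path: str,
                         delimiter: str = ",") -> int:
    """Count rows that differ between two delimited files, compared as text."""
    mismatches = 0
    with open(converted_path, newline="", encoding="utf-8") as a, \
         open(official_path, newline="", encoding="utf-8") as b:
        rows_a = csv.reader(a, delimiter=delimiter)
        rows_b = csv.reader(b, delimiter=delimiter)
        # zip_longest pads the shorter file with None, so a missing or
        # extra row is counted as a mismatch instead of being ignored.
        for row_a, row_b in zip_longest(rows_a, rows_b):
            # Two rows match only if every value is identical as a string.
            if row_a != row_b:
                mismatches += 1
    return mismatches
```

Comparing values as strings keeps leading zeros and formatting differences visible; a result of 0 corresponds to "zero mismatches" for that pair of files.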
| Survey / corpus | Years / rounds | Verification level | Result |
|---|---|---|---|
| EUS | 38th Round (1983) | Cell-level | 9/9 blocks, 3.4M rows, zero mismatches against official exports |
| HCES | 38th (1983), 45th (1989-90), 66th (2009-10) | Cell-level | 27/28 blocks, 23.4M+ rows, zero mismatches for blocks present in DDI |
| PLFS | 2017-18 to 2022-23 | Structure-level | 24/24 official export files matched NADA data-dictionary row/column counts; one 2017-18 revisit export includes a trailing blank tab column |
PLFS note: the local PLFS raw ZIPs contain .Nesstar files, official text exports, and the legacy Nesstar Explorer installer, but not the companion DDI XML needed for full binary decoding in the current open parser. That means PLFS is confirmed as a real Nesstar distribution corpus, but its current evidence in this repo is structural rather than full cell-level re-validation.
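A structure-level check of the kind described for PLFS can be sketched as counting the rows and columns of an official text export and comparing them with the published data-dictionary counts. The helper below is an assumption-laden illustration, not code from this repository: the name `export_shape`, the tab delimiter, the UTF-8 encoding, and the heuristic for spotting a trailing blank column are all hypothetical, and whether a header row should be subtracted depends on the export.

```python
import csv


def export_shape(path: str, delimiter: str = "\t") -> tuple[int, int, bool]:
    """Return (row_count, column_count, has_trailing_blank_column)."""
    rows = 0
    columns = 0
    trailing_blank = True
    with open(path, newline="", encoding="utf-8") as handle:
        for record in csv.reader(handle, delimiter=delimiter):
            if not record:
                continue  # skip completely empty lines
            rows += 1
            columns = max(columns, len(record))
            # Heuristic: a trailing blank column means every row ends
            # with an empty field (for example, a stray final tab).
            if record[-1] != "":
                trailing_blank = False
    return rows, columns, trailing_blank and rows > 0
```

When the third value is True, the reported column count is one higher than the number of real variables, which is the situation noted for the 2017-18 revisit export.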
For non-technical users
If your goal is simply "turn this old survey file into something Excel can open", the shortest path is:
git clone https://github.com/abhinavjnu/nesstar-converter.git
cd nesstar-converter
python -m pip install -e .
nesstar-converter convert path/to/file.Nesstar path/to/ddi.xml ./output --formats csv
Then open the generated .csv files in Excel, LibreOffice, Google Sheets, Stata, R, or Python.
If you are unsure which format to choose:
| You want to... | Use |
|---|---|
| Open the data in Excel | csv |
| Work in Stata | stata |
| Analyze in Python / R / DuckDB | parquet |
| Preserve a text-like interchange format | tsv or fwf |
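If you open a generated CSV in Python rather than a spreadsheet, reading every value as text keeps code-like fields with leading zeros intact. The sketch below uses only the standard library; the function name `read_rows` is hypothetical, and it assumes the CSV has a header row and is UTF-8 encoded, which this page does not state.

```python
import csv


def read_rows(csv_path: str, limit: int = 5) -> list[dict[str, str]]:
    """Read the first `limit` rows of a CSV, keeping every value as text."""
    rows: list[dict[str, str]] = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # DictReader uses the first line as column names (assumed header).
        for index, row in enumerate(csv.DictReader(handle)):
            if index >= limit:
                break
            rows.append(row)
    return rows
```

Convert individual columns to numbers only after checking that they are genuinely numeric and not identifiers.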
Python API
from nesstar_converter import convert_nesstar, show_info
show_info("survey.Nesstar", "ddi.xml")
report = convert_nesstar(
    "survey.Nesstar",
    "ddi.xml",
    "./output",
    formats=["csv", "parquet"],
    year="2022-23",
)
Key functions:
| Function | Purpose |
|---|---|
| convert_nesstar(...) | Convert one .Nesstar file to one or more formats |
| parse_ddi(...) | Parse DDI XML block and variable metadata |
| show_info(...) | Inspect a file before conversion |
| validate_against_export(...) | Compare converted output to official text exports |
| batch_convert(...) | Convert a survey corpus in batch mode |
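The signature of `batch_convert(...)` is not shown here, so the sketch below loops over `convert_nesstar` instead, using only the call shape from the example above (file path, DDI path, output directory, `formats=`). The wrapper name `convert_folder`, the one-output-folder-per-file layout, and the rule of skipping files without a sibling ddi.xml are assumptions for illustration, not the package's batch behavior.

```python
from pathlib import Path
from typing import Callable, Optional


def convert_folder(
    source_dir: str,
    output_root: str,
    formats: list[str],
    convert: Optional[Callable] = None,
) -> dict[str, object]:
    """Convert every .Nesstar file under source_dir that has a ddi.xml beside it."""
    if convert is None:
        # Imported lazily so the sketch can be tried with a stand-in callable.
        from nesstar_converter import convert_nesstar as convert

    reports: dict[str, object] = {}
    # Note: the "*.Nesstar" pattern is case-sensitive on Linux.
    for nesstar_file in sorted(Path(source_dir).rglob("*.Nesstar")):
        ddi_file = nesstar_file.with_name("ddi.xml")
        if not ddi_file.is_file():
            continue  # full decoding currently expects DDI metadata
        target = Path(output_root) / nesstar_file.stem
        target.mkdir(parents=True, exist_ok=True)
        reports[str(nesstar_file)] = convert(
            str(nesstar_file), str(ddi_file), str(target), formats=formats
        )
    return reports
```

Passing a custom `convert` callable makes it possible to dry-run the directory walk before running a long conversion.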
Limitations
- Full decoding currently expects DDI metadata. If a distributor ships only the .Nesstar binary and omits the companion DDI XML, the current parser cannot yet do full open extraction on its own.
- This is a data-conversion tool, not an RDF packager. If your goal is specifically DDI / RDF export via the official legacy toolchain, the IHSN wrapper may still be useful - but it still requires NesstarExporter.exe.
- Legacy ecosystems are inconsistent. Different institutions used different Nesstar-era conventions, so community test cases from outside India are especially valuable.
Documentation
- docs/TECHNICAL.md - binary format notes and implementation details
- docs/global-coverage.md - institutions, countries, archives, and source links
Testing
python -m pip install -e ".[dev]"
pytest tests/ -v
CI runs unit tests on Python 3.10-3.13 and checks formatting with Ruff.
Contributing
Good contributions for this project:
- Test the converter on non-MoSPI Nesstar files
- Report datasets that still circulate as .Nesstar / .NSDstat
- Share evidence of legacy Nesstar repositories or migrations
- Improve metadata recovery for archives that omit ddi.xml
Community testing requests are tracked in the issue tracker, including:
- Stats SA GHS
- UK Data Archive legacy Nesstar packages
- World Bank / IHSN LSMS-style Nesstar corpora
Citation
If you use this tool in research, please cite it using CITATION.cff.
License