Synthetic multilingual accommodation review data generator for Hack4Her travel-safety prototypes.

Maintainer: behrouznl

Hack4Her Mock Accommodation Reviews

This repo contains a dependency-free Python generator for synthetic Booking.com-style accommodation reviews for the Hack4Her challenge theme: women's safety while travelling.

The generated data is mock data only. Reviews, properties, labels, and coordinates are synthetic and must not be interpreted as real Booking.com customer reviews or real safety ratings for any location.

Generated Files

The default 1k balanced dataset has already been generated:

data/mock_reviews_balanced_1000.csv
data/mock_reviews_balanced_1000.jsonl
data/mock_reviews_balanced_1000.summary.json
data/mock_review_source_context_pool_10000.csv
data/mock_review_source_context_pool_10000.jsonl
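
The .jsonl files can be read with the Python standard library alone. The following is a minimal sketch, assuming each line holds one JSON object per review; the helper name read_jsonl is hypothetical and not part of this repo.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Path taken from the file list above; skipped quietly if absent.
    path = Path("data/mock_reviews_balanced_1000.jsonl")
    if path.exists():
        print(len(read_jsonl(path)), "records loaded")
```

Run it from the repo root so the relative path resolves.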

Additional 1k scenario datasets are available in:

data/scenarios/
data/random/

Larger 10k scenario datasets are available in:

data/scenarios_10k/
data/random_10k/

Pre-generated participant-ready starter packs are available in:

data/starter_1000/
data/starter_10000/

Newly generated outputs are written to data_output_generated/ by default, which is ignored by Git.

The dataset includes multilingual reviews in English, Spanish, French, German, Dutch, Italian, Portuguese, and Arabic.
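
To check the language mix of a generated CSV, a short standard-library sketch such as the one below may help. It relies on the language column listed under Useful Columns; the sample rows and the language codes in them are hypothetical placeholders, since the exact values stored in that column are not described here.

```python
import csv
import io
from collections import Counter


def count_by_language(csv_file):
    """Count rows per value of the `language` column in an open CSV file."""
    reader = csv.DictReader(csv_file)
    return Counter(row["language"] for row in reader)


# Tiny in-memory sample; the language codes are assumed, not taken from the data.
sample = io.StringIO(
    "review_text,language\n"
    "Great stay,en\n"
    "Muy buena ubicación,es\n"
    "Quiet street,en\n"
)
print(count_by_language(sample))
```

For a real file, pass open("data/mock_reviews_balanced_1000.csv", newline="", encoding="utf-8") instead of the in-memory sample.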

Run

For detailed usage, see docs/USAGE.md. For PyPI publishing, see docs/PUBLISHING.md.

Installable Package

Install the published package from PyPI:

python -m pip install hack4her-review-data
hack4her-data --starter-pack --records 1000

For the Rich/Typer visual terminal:

python -m pip install "hack4her-review-data[cli]"
hack4her-data-cli

Participant Start Point

Use starter packs when teams need data to begin building without seeing organizer labels.

Fancy terminal UI:

macOS/Linux:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements-cli.txt
python3 scripts/hack4her_cli.py

Windows PowerShell:

py -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements-cli.txt
python scripts\hack4her_cli.py

The fancy CLI opens a Booking.com Hack4Her branded terminal menu where teams select the dataset type, record count, output format, and output folder. Outputs generated from the visual menu automatically hide organizer/evaluation labels in the main dataset, and a separate 10% labeled golden sample is written for validation or scoring.

The interface is cross-platform and built on Rich and Typer. It shows an animated Booking.com header with smaller Hack4Her text in pink, scenario safety-mix previews, output-folder checks, generation-plan panels, written-file summaries, and animated progress bars.

The dependency-free script below remains available for teams that only want Python standard library commands.

Direct fancy CLI commands also work:

python3 scripts/hack4her_cli.py menu
python3 scripts/hack4her_cli.py doctor
python3 scripts/hack4her_cli.py starter --records 1000
python3 scripts/hack4her_cli.py scenarios

Generate participant-ready CSV files for all deterministic scenarios:

python3 scripts/generate_mock_reviews.py --starter-pack --records 1000

Choose any size from 1000 to 10000 in steps of 1000:

python3 scripts/generate_mock_reviews.py --starter-pack --records 5000
python3 scripts/generate_mock_reviews.py --starter-pack --records 10000
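
When wrapping these commands in your own tooling, it can be useful to check the record count before calling the script. The sketch below assumes only the documented range (1000 to 10000 in steps of 1000); the names are hypothetical and the script's own validation may behave differently.

```python
# Record counts documented for --records: 1000, 2000, ..., 10000.
VALID_RECORD_COUNTS = range(1000, 10001, 1000)


def check_record_count(records):
    """Return `records` unchanged if it is a documented size, else raise."""
    if records not in VALID_RECORD_COUNTS:
        raise ValueError(
            f"--records should be one of {list(VALID_RECORD_COUNTS)}, got {records}"
        )
    return records


print(check_record_count(5000))
```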

Starter packs default to:

data_output_generated/

Each starter pack contains one public CSV per scenario, one 10% labeled golden CSV per scenario, summaries, and a small README explaining how to choose a dataset.

Generate the default deterministic 1k balanced dataset:

python3 scripts/generate_mock_reviews.py

Generate a specific scenario:

python3 scripts/generate_mock_reviews.py --records 1000 --scenario safety_heavy --output-dir data_output_generated
python3 scripts/generate_mock_reviews.py --records 1000 --scenario location_focus --output-dir data_output_generated
python3 scripts/generate_mock_reviews.py --records 1000 --scenario host_focus --output-dir data_output_generated
python3 scripts/generate_mock_reviews.py --records 1000 --scenario stay_focus --output-dir data_output_generated
python3 scripts/generate_mock_reviews.py --records 1000 --scenario mostly_positive --output-dir data_output_generated

Generate all deterministic scenarios:

python3 scripts/generate_mock_reviews.py --all-scenarios --records 1000 --output-dir data_output_generated

Generate all deterministic 10k scenarios:

python3 scripts/generate_mock_reviews.py --all-scenarios --records 10000 --output-dir data_output_generated

Generate a deliberately random set. This changes on each run unless --seed is provided:

python3 scripts/generate_mock_reviews.py --scenario random --records 1000 --output-dir data_output_generated

Generate a 10k random set:

python3 scripts/generate_mock_reviews.py --scenario random --records 10000 --output-dir data_output_generated

Generate a participant-facing version without helper labels:

python3 scripts/generate_mock_reviews.py --records 1000 --scenario balanced --public --format csv --output-dir data_output_generated

With --public, the full main dataset hides organizer labels and the script also writes a _golden_10pct.csv file with labels for 10% of rows.
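
The idea behind the public/golden split can be illustrated as follows. This is only a sketch of the concept, not the script's implementation: which columns --public hides, how the 10% sample is drawn, and the function name are all assumptions made for illustration.

```python
import random

# Assumed label columns, taken from the Useful Columns list in this README.
LABEL_COLUMNS = (
    "is_safety_related", "safety_category", "safety_concern_level",
    "safety_signal", "topic", "sentiment", "labels",
)


def split_public_and_golden(rows, fraction=0.10, seed=20260522):
    """Return (public_rows, golden_rows) from a list of row dicts.

    public_rows: every row with the label columns removed.
    golden_rows: a seeded sample of `fraction` of the rows, labels kept.
    """
    rng = random.Random(seed)
    golden_size = round(len(rows) * fraction)
    golden_indexes = sorted(rng.sample(range(len(rows)), golden_size))
    public = [
        {key: value for key, value in row.items() if key not in LABEL_COLUMNS}
        for row in rows
    ]
    golden = [rows[i] for i in golden_indexes]
    return public, golden


# Hypothetical rows for demonstration only.
rows = [
    {"review_text": f"review {i}", "language": "en", "is_safety_related": "0"}
    for i in range(20)
]
public, golden = split_public_and_golden(rows)
print(len(public), len(golden))
```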

Reproducibility

Deterministic scenarios use a stable 10k source context pool and a default seed of 20260522, so everyone running the same command gets the same records. The normal record choices are 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000, and 10000. The random scenario intentionally uses a fresh random seed unless you pass --seed.
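
Why a fixed seed gives identical output can be shown with a small standard-library sketch. It assumes, purely for illustration, that records are drawn from the 10k pool by seeded sampling; the generator's real selection logic is not described here, and pick_indexes is a hypothetical helper.

```python
import random

DEFAULT_SEED = 20260522  # default seed stated in this README


def pick_indexes(pool_size, records, seed=DEFAULT_SEED):
    """Draw `records` distinct pool indexes using a private, seeded RNG."""
    rng = random.Random(seed)
    return rng.sample(range(pool_size), records)


first = pick_indexes(10_000, 5)
second = pick_indexes(10_000, 5)
print(first == second)  # True: same seed, same draw
```

Using a private random.Random instance rather than the module-level functions keeps the draw independent of any other random calls in the program.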

Use --write-source-pool to write the synthetic 10k source context pool:

python3 scripts/generate_mock_reviews.py --records 1000 --scenario balanced --write-source-pool --output-dir data_output_generated

Scenarios

balanced: mixed travel reviews with a visible safety signal.
safety_heavy: many safety-related reviews across location, host, and stay.
location_focus: safety around neighborhood, route, entrance, or transit.
host_focus: host conduct, check-in conduct, and support response.
stay_focus: room, lock, access, privacy, and on-property safety concerns.
mostly_positive: mostly normal or positive reviews with sparse safety concerns.
random: non-deterministic topic mix for surprise testing.
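
To script several scenario runs from Python, the documented flags can be assembled into argument lists. The sketch below uses only the flags shown in this README (--records, --scenario, --output-dir); the helper name build_command is hypothetical.

```python
import sys

# Deterministic scenario names from the list above ("random" is excluded).
DETERMINISTIC_SCENARIOS = (
    "balanced", "safety_heavy", "location_focus",
    "host_focus", "stay_focus", "mostly_positive",
)


def build_command(scenario, records=1000, output_dir="data_output_generated"):
    """Build the argument list for one generate_mock_reviews.py run."""
    return [
        sys.executable, "scripts/generate_mock_reviews.py",
        "--records", str(records),
        "--scenario", scenario,
        "--output-dir", output_dir,
    ]


for scenario in DETERMINISTIC_SCENARIOS:
    print(" ".join(build_command(scenario)))
```

Each list can be passed to subprocess.run(command, check=True) from the repo root to execute the run.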

Useful Columns

review_text, review_title, language, rating: primary participant-facing review fields.
city, country, latitude, longitude, area_type: useful for map prototypes.
is_safety_related, safety_category, safety_concern_level, safety_signal: helper labels for testing or evaluation.
topic, sentiment, labels: additional organizer-facing metadata.
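
For a map prototype, the columns above can be combined to extract points for safety-related reviews. The sketch below assumes is_safety_related is stored as a text flag such as "true" or "1"; the actual encoding is not specified here, so adjust TRUTHY after inspecting a generated file. The sample row values are hypothetical.

```python
import csv
import io

# Assumed encodings of a positive flag; verify against the real data.
TRUTHY = {"1", "true", "yes"}


def safety_points(csv_file):
    """Return (city, latitude, longitude) for rows flagged as safety-related."""
    points = []
    for row in csv.DictReader(csv_file):
        if row["is_safety_related"].strip().lower() in TRUTHY:
            points.append(
                (row["city"], float(row["latitude"]), float(row["longitude"]))
            )
    return points


sample = io.StringIO(
    "city,latitude,longitude,is_safety_related\n"
    "Sampletown,10.5,20.25,true\n"
    "Sampletown,11.0,21.0,false\n"
)
print(safety_points(sample))
```

Because coordinates are synthetic, any map built this way shows mock positions only.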

Project details

Release history

0.1.2: May 22, 2026
0.1.0: May 22, 2026 (this version)

Download files

Source Distribution

hack4her_review_data-0.1.0.tar.gz (48.1 kB)

Uploaded May 22, 2026 Source

Built Distribution

hack4her_review_data-0.1.0-py3-none-any.whl (44.1 kB)

Uploaded May 22, 2026 Python 3

File details

Details for the file hack4her_review_data-0.1.0.tar.gz.

File metadata

Download URL: hack4her_review_data-0.1.0.tar.gz
Upload date: May 22, 2026
Size: 48.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hack4her_review_data-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`3128a17d7d9d5d5b88119dbfb153ea8be897d5817da53607d9ae1c7a8e6bba69`
MD5	`a985263704bb3587c361367ce62faeff`
BLAKE2b-256	`f4be4798681cea0ba5fad2ed9314a0da50f37e7676245595fb4ee1b08692c4fa`

Provenance

The following attestation bundles were made for hack4her_review_data-0.1.0.tar.gz:

Publisher: publish.yml on iflashlord/hack4her-review-data

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hack4her_review_data-0.1.0.tar.gz
- Subject digest: 3128a17d7d9d5d5b88119dbfb153ea8be897d5817da53607d9ae1c7a8e6bba69
- Sigstore transparency entry: 1604910349
- Sigstore integration time: May 22, 2026
Source repository:
- Permalink: iflashlord/hack4her-review-data@4e465891a70e8b234ff0743ccfa50a03f84f8960
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/iflashlord
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4e465891a70e8b234ff0743ccfa50a03f84f8960
- Trigger Event: release

File details

Details for the file hack4her_review_data-0.1.0-py3-none-any.whl.

File metadata

Download URL: hack4her_review_data-0.1.0-py3-none-any.whl
Upload date: May 22, 2026
Size: 44.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hack4her_review_data-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6345d25d79604cea51af6b977a015b3e3c62c6bc65ce1f11344df4a6a52b615c`
MD5	`aa66c48c5b2fc69dc2689b308fe33244`
BLAKE2b-256	`0598f3ed5998b4da16c09ac27ab02a1a69e149ca751dff76beb6f46c9f1442b5`

Provenance

The following attestation bundles were made for hack4her_review_data-0.1.0-py3-none-any.whl:

Publisher: publish.yml on iflashlord/hack4her-review-data

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hack4her_review_data-0.1.0-py3-none-any.whl
- Subject digest: 6345d25d79604cea51af6b977a015b3e3c62c6bc65ce1f11344df4a6a52b615c
- Sigstore transparency entry: 1604910483
- Sigstore integration time: May 22, 2026
Source repository:
- Permalink: iflashlord/hack4her-review-data@4e465891a70e8b234ff0743ccfa50a03f84f8960
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/iflashlord
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4e465891a70e8b234ff0743ccfa50a03f84f8960
- Trigger Event: release
