
HoWDe

HoWDe (Home and Work Detection) is a Python package designed to identify home and work locations from individual timestamped sequences of stop locations. It processes stop location data to label each location as 'Home', 'Work', or 'None' based on user-defined parameters and heuristics.

A complete description of the algorithm can be found in our pre-print.

Features

  • Processes stop location datasets to detect home and work locations.
  • Allows customization through various parameters to fine-tune detection heuristics.
  • Supports batch processing with multiple parameter configurations.
  • Outputs results as a PySpark DataFrame for seamless integration with big data workflows.

Installation

HoWDe requires Python 3.6 or later and a functional PySpark environment.

1. Install PySpark

Before installing HoWDe, ensure PySpark and Java are properly configured. For detailed setup instructions, please refer to the official PySpark Installation Guidelines.

Installation Note:
PySpark may raise a Py4JJavaError if Java or Spark is not properly configured. We recommend checking the Debugging PySpark and Py4JJavaError Guidelines.

Compatibility Note:
Once PySpark/Java is correctly configured, HoWDe runs consistently across macOS, Ubuntu, and Windows. The following environments have been tested:

  • Python 3.9 + PySpark 3.3 + Java 20.0
  • Python 3.12 + PySpark 4.0 + Java 17.0

2. Install HoWDe

Once PySpark is installed and configured, you can install HoWDe via pip:

pip install HoWDe

Usage

The core function of the HoWDe package is HoWDe_labelling, which performs the detection of home and work locations.

def HoWDe_labelling(
    input_data,
    edit_config_default=None,
    range_window_home=28,
    range_window_work=42,
    C_hours=0.4,
    C_days_H=0.4,
    C_days_W=0.5,
    f_hours_H=0.7,
    f_hours_W=0.4,
    f_days_W=0.6,
    output_format="stop",
    verbose=False,
):
    """
    Perform Home and Work Detection (HoWDe)
    """

📥 Input Data

HoWDe expects the input to be a PySpark DataFrame containing one row per user stop, with the following columns:

| Column | Type | Description |
| --- | --- | --- |
| useruuid | str or int | Unique user identifier. |
| loc | str or int | Stop location ID (unique per useruuid). |
| start | long | Start time of the stop (Unix timestamp). |
| end | long | End time of the stop (Unix timestamp). |
| tz_hour_start, tz_minute_start | int | Optional. Time zone offsets (hours and minutes) used to convert UTC timestamps to local time, if applicable. |
| country | str | Optional. Country code; if not provided, the default label "GL0B" is assigned. |

⚠️ Avoid using -1 to label meaningful stops, as stops with loc = -1 are dropped, following the Infostop convention.

Example

+---------+-----+-------------+-------------+---------------+----------------+---------+
| useruuid| loc | start       | end         | tz_hour_start | tz_minute_start| country |
+---------+-----+-------------+-------------+---------------+----------------+---------+
| 1001    |  1 | 1704031200  | 1704034800  | 1             | 0              | DK      |
| 1001    |  2 | 1704056400  | 1704060000  | 1             | 0              | DK      |
+---------+-----+-------------+-------------+---------------+----------------+---------+

💡 Scalability Tip: This package involves heavy computations (e.g., window functions, UDFs). To ensure efficient parallel processing, use df.repartition("useruuid") to distribute data across partitions evenly. This reduces memory bottlenecks and improves resource utilization.

⚙️ Key Parameters

| Parameter | Type | Description | Suggested value [range] |
| --- | --- | --- | --- |
| range_window_home | int or list | Sliding window size (in days) used to detect home locations. | 28 [14-112] |
| range_window_work | int or list | Sliding window size (in days) used to detect work locations. | 42 [14-112] |
| C_hours | float or list | Minimum fraction of night/business hourly bins with data in a day. | 0.4 [0.2-0.9] |
| C_days_H | float or list | Minimum fraction of days with data in a home-detection window. | 0.4 [0.1-0.6] |
| C_days_W | float or list | Minimum fraction of days with data in a work-detection window. | 0.5 [0.4-0.6] |
| f_hours_H | float or list | Minimum average fraction of night hourly bins (across days in the window) required for a location to qualify as Home. | 0.7 [0.5-0.9] |
| f_hours_W | float or list | Minimum average fraction of business hourly bins (across days in the window) required for a location to qualify as Work. | 0.4 [0.4-0.6] |
| f_days_W | float or list | Minimum fraction of days within the window a location must be visited to qualify as Work. | 0.6 [0.5-0.8] |

All parameters listed above can also be provided as lists to explore multiple configurations in a single run.

💡 Tuning Tip: When adjusting detection parameters, start by refining the temporal coverage filters C_days_H and C_days_W to match the characteristics of your data. Once these are well aligned, tune the estimation thresholds f_hours_H, f_hours_W, and f_days_W according to the specifics of your case study. These estimation thresholds play a major role in determining how strictly the algorithm identifies consistent home and work locations.

While we provide recommended parameter ranges to guide your exploration, the hard-coded limits in howde/config.py are intentionally more relaxed: they simply prevent nonsensical values. Inputs falling outside these hard limits will raise an error.

🔧 Other Parameters

  • edit_config_default (dict, optional): Optional dictionary that allows overriding the default settings in howde/config.py to fine-tune preprocessing and detection behavior.
    The dictionary should include parameters:

    • is_time_local — interpret timestamps as local time (True) or UTC (False)
    • min_stop_t — minimum stop duration (seconds)
    • start_hour_day, end_hour_day — hours used for home detection
    • start_hour_work, end_hour_work — hours used for work detection
    • data_for_predict — use only past data for estimation
  • output_format (str): If "stop", returns stop-level data with a location_type column and one row per stop. If "change", returns a compact DataFrame with one row per day on which the home or work location changes.

  • verbose (bool): If True, reports processing steps.
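As an illustration, an override dictionary might look like the following. The key names are those listed above; all values here are purely illustrative and hypothetical (the actual defaults live in howde/config.py and should be checked there):

```python
# Hypothetical edit_config_default override; values are illustrative only.
my_config = {
    "is_time_local": True,     # timestamps are already in local time
    "min_stop_t": 300,         # drop stops shorter than 300 seconds (illustrative)
    "start_hour_day": 22,      # night window for home detection (illustrative)
    "end_hour_day": 7,
    "start_hour_work": 9,      # business window for work detection (illustrative)
    "end_hour_work": 17,
    "data_for_predict": False, # use the full sliding window, not only past data
}

# Passed to the main entry point as:
# labeled_data = HoWDe_labelling(input_data, edit_config_default=my_config)
```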

📤 Returns

If a single parameter configuration is used, the function returns a PySpark DataFrame with three additional columns:

  • detect_H_loc The location ID (loc) identified as Home. Assigned if the location satisfies all filtering criteria. As such, it represents a day-level assessment, taking into account observations within a sliding window of t ± range_window_home / 2 days.
  • detect_W_loc The location ID (loc) identified as Work. Assigned if the location satisfies all filtering criteria. As such, it represents a day-level assessment, taking into account observations within a sliding window of t ± range_window_work / 2 days.
  • location_type Indicates the detected location type for each stop ('H' for Home, 'W' for Work, or 'O' for Other), based on matching the stop location to the inferred home/work labels.

If multiple parameter configurations are provided (as lists), the function returns a list of dictionaries, each with keys:

  • configs: including the configuration used
  • res: including the resulting labeled PySpark DataFrame (as described above)

Example Usage

from pyspark.sql import SparkSession
from howde import HoWDe_labelling

# Initialize Spark session
spark = SparkSession.builder.appName('HoWDeApp').getOrCreate()

# Load your stop location data
input_data = spark.read.parquet('path_to_your_data.parquet')

# Run HoWDe labelling
labeled_data = HoWDe_labelling(
    input_data,
    range_window_home=28,
    range_window_work=42,
    C_hours=0.4,
    C_days_H=0.4,
    C_days_W=0.5,
    f_hours_H=0.7,
    f_hours_W=0.4,
    f_days_W=0.6,
    output_format="stop",
    verbose=False,
)

# Show the results
labeled_data.show()

See more examples at /tutorials

Data

Anonymized stop location data with true home and work labels will be available at:

De Sojo Caso, Silvia; Lucchini, Lorenzo; Alessandretti, Laura (2025). Benchmark datasets for home and work location detection: stop sequences and annotated labels. Technical University of Denmark. Dataset. https://doi.org/10.11583/DTU.28846325

License

This project is licensed under the MIT License. See the License file for details.
