TourismOutboundStatisticsCalculation Configuration

To initialise and run the component, two configuration files are used: general_config.ini and tourism_outbound_statistics_calculation.ini.

General configuration

The paths to all data objects used by the component must be specified in general_config.ini. Example:

...
[Paths.Silver]
...
time_segments_silver = ${Paths:silver_dir}/time_segments
mcc_iso_timezones_data_bronze = ${Paths:bronze_dir}/mcc_iso_timezones
tourism_outbound_trips_silver = ${Paths:silver_dir}/tourism_outbound_trips
tourism_outbound_aggregations_silver = ${Paths:silver_dir}/tourism_outbound_aggregations
...

time_segments_silver, mcc_iso_timezones_data_bronze and tourism_outbound_trips_silver are input data paths.

tourism_outbound_trips_silver and tourism_outbound_aggregations_silver are output data paths.

tourism_outbound_trips_silver is both an input and an output path. It does not need to contain data for the component to run, but any data it does contain can be used as input.
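
The ${Paths:silver_dir} references in the example match the syntax of ExtendedInterpolation in Python's configparser module. The following is a minimal sketch of how such paths could be resolved, assuming the file is read that way. The [Paths] section and its directory values are hypothetical placeholders, not taken from the real general_config.ini.

```python
from configparser import ConfigParser, ExtendedInterpolation

# Hypothetical excerpt of general_config.ini. The base directories in
# [Paths] are placeholders chosen for illustration only.
GENERAL_CONFIG = """
[Paths]
silver_dir = /data/silver
bronze_dir = /data/bronze

[Paths.Silver]
time_segments_silver = ${Paths:silver_dir}/time_segments
mcc_iso_timezones_data_bronze = ${Paths:bronze_dir}/mcc_iso_timezones
tourism_outbound_trips_silver = ${Paths:silver_dir}/tourism_outbound_trips
tourism_outbound_aggregations_silver = ${Paths:silver_dir}/tourism_outbound_aggregations
"""

# ExtendedInterpolation resolves ${section:key} references on access.
parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read_string(GENERAL_CONFIG)

# Collect the fully resolved paths of the section into a plain dict.
paths = dict(parser["Paths.Silver"])
for name, path in paths.items():
    print(f"{name} -> {path}")
```

Resolving the paths once into a dictionary makes it easy to check, before running the component, that each input and output location points where expected.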

Configuration parameters

The component configuration file tourism_outbound_statistics_calculation.ini has three sections: Spark, Logging and TourismOutboundStatisticsCalculation.

The Spark and Logging sections contain the generic session name and logging parameters.

The section TourismOutboundStatisticsCalculation controls component logic and contains the following parameters:

  • data_period_start: YYYY-MM format string. Indicates the first month for which the component will generate results. Example: 2023-01.
  • data_period_end: YYYY-MM format string. Indicates the last month for which the component will generate results. Example: 2023-02.
  • clear_destination_directory: Boolean. Indicates if existing results should be deleted before execution. If True, existing data in path tourism_outbound_aggregations_silver will be deleted before calculations start. Example: True.
  • delete_existing_trips: Boolean. Indicates if existing trips (from previous executions) should be deleted before execution. If True, existing data in path tourism_outbound_trips_silver will be deleted before calculations start. If they are not deleted, they may be used as input data during the execution. Example: False.
  • max_outbound_trip_gap_h: Integer. Maximum number of hours allowed between two time segments for them to be considered part of the same trip. It is also used to determine the size of the look-forward window used to retrieve entries from the following month when processing monthly data. Example: 24.
  • min_duration_segment_m: Integer. Minimum duration in minutes for a time segment to be used as input data. Example: 72.
  • functional_midnight_h: Integer. Hour of day acting as the functional midnight. Example: 4.
  • min_duration_segment_night_m: Integer. Minimum duration in minutes for a time segment to be possibly marked as an overnight segment. Example: 200.
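
The parameter types listed above can be illustrated with a short sketch that reads the section using Python's configparser and converts each value. This is not the component's own loading code. The helper names parse_month and months_between are hypothetical, and the values are the per-parameter examples given in the list.

```python
from configparser import ConfigParser
from datetime import date, datetime

# Section content built from the per-parameter examples in the list above.
COMPONENT_CONFIG = """
[TourismOutboundStatisticsCalculation]
data_period_start = 2023-01
data_period_end = 2023-02
clear_destination_directory = True
delete_existing_trips = False
max_outbound_trip_gap_h = 24
min_duration_segment_m = 72
functional_midnight_h = 4
min_duration_segment_night_m = 200
"""


def parse_month(value: str) -> date:
    """Parse a YYYY-MM string into the first day of that month (hypothetical helper)."""
    return datetime.strptime(value, "%Y-%m").date()


def months_between(start: date, end: date) -> list[str]:
    """List every month from start to end, both inclusive, as YYYY-MM strings."""
    if end < start:
        raise ValueError("data_period_end must not precede data_period_start")
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


parser = ConfigParser()
parser.read_string(COMPONENT_CONFIG)
section = parser["TourismOutboundStatisticsCalculation"]

settings = {
    "months": months_between(
        parse_month(section["data_period_start"]),
        parse_month(section["data_period_end"]),
    ),
    # getboolean accepts True/False, true/false, yes/no, on/off and 1/0.
    "clear_destination_directory": section.getboolean("clear_destination_directory"),
    "delete_existing_trips": section.getboolean("delete_existing_trips"),
    "max_outbound_trip_gap_h": section.getint("max_outbound_trip_gap_h"),
    "min_duration_segment_m": section.getint("min_duration_segment_m"),
    "functional_midnight_h": section.getint("functional_midnight_h"),
    "min_duration_segment_night_m": section.getint("min_duration_segment_night_m"),
}
print(settings)
```

Converting values up front surfaces malformed entries, such as a non-numeric hour or a reversed period, before any data is read or deleted.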

Configuration example

[Logging]
level = DEBUG

[Spark]
session_name = TourismOutboundStatisticsCalculation

[TourismOutboundStatisticsCalculation]
data_period_start = 2023-01
data_period_end = 2023-02
clear_destination_directory = True
delete_existing_trips = False
max_outbound_trip_gap_h = 72
min_duration_segment_m = 180
functional_midnight_h = 4
min_duration_segment_night_m = 200
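
To make the look-forward window controlled by max_outbound_trip_gap_h more concrete, the sketch below computes the time range that might be read when processing one month with the example value of 72 hours. It assumes the window starts at the end of the month and spans exactly max_outbound_trip_gap_h hours. The component may define the window differently, and the function name month_window_with_lookahead is hypothetical.

```python
from datetime import datetime, timedelta


def month_window_with_lookahead(month: str, max_outbound_trip_gap_h: int):
    """Return (month_start, month_end, lookahead_end) for a YYYY-MM month.

    Assumption: the look-forward window begins at the first instant of the
    following month and lasts max_outbound_trip_gap_h hours.
    """
    start = datetime.strptime(month, "%Y-%m")
    # First instant of the following month, used as the exclusive month end.
    if start.month == 12:
        end = start.replace(year=start.year + 1, month=1)
    else:
        end = start.replace(month=start.month + 1)
    return start, end, end + timedelta(hours=max_outbound_trip_gap_h)


start, end, lookahead_end = month_window_with_lookahead("2023-01", 72)
print(f"month:      {start} to {end}")
print(f"look-ahead: {end} to {lookahead_end}")
```

Under this assumption, a larger max_outbound_trip_gap_h both allows longer gaps within a trip and pulls in more of the following month's time segments for each processed month.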