TourismStatisticsCalculation Configuration
To initialise and run the component, two configuration files are used: general_config.ini and tourism_statistics_calculation.ini.
General configuration
In general_config.ini, all paths to the corresponding data objects must be specified. Example:
...
[Paths.Silver]
...
tourism_stays_silver = ${Paths:silver_dir}/tourism_stays
mcc_iso_timezones_data_bronze = ${Paths:bronze_dir}/mcc_iso_timezones
tourism_trips_silver = ${Paths:silver_dir}/tourism_trips
tourism_geozone_aggregations_silver = ${Paths:silver_dir}/tourism_geozone_aggregations
tourism_trip_aggregations_silver = ${Paths:silver_dir}/tourism_trip_aggregations
...
tourism_stays_silver, mcc_iso_timezones_data_bronze and tourism_trips_silver are input data paths.
tourism_geozone_aggregations_silver, tourism_trip_aggregations_silver and tourism_trips_silver are output data paths.
tourism_trips_silver is both an input and an output path. It does not need to contain data for the component to be executed, but any data it does contain can be used as input.
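The `${Paths:silver_dir}` references above match the `${section:option}` syntax of Python's configparser with ExtendedInterpolation. The following is a minimal sketch of how such paths resolve, assuming the file is read that way. The [Paths] section and its directory values are hypothetical placeholders, since the real ones are elided in the example above.

```python
from configparser import ConfigParser, ExtendedInterpolation

# Hypothetical fragment: the [Paths] directories are placeholders for illustration only.
GENERAL_CONFIG = """
[Paths]
bronze_dir = /data/bronze
silver_dir = /data/silver

[Paths.Silver]
tourism_stays_silver = ${Paths:silver_dir}/tourism_stays
mcc_iso_timezones_data_bronze = ${Paths:bronze_dir}/mcc_iso_timezones
tourism_trips_silver = ${Paths:silver_dir}/tourism_trips
tourism_geozone_aggregations_silver = ${Paths:silver_dir}/tourism_geozone_aggregations
tourism_trip_aggregations_silver = ${Paths:silver_dir}/tourism_trip_aggregations
"""

# ExtendedInterpolation resolves ${section:option} references when a value is read.
parser = ConfigParser(interpolation=ExtendedInterpolation())
parser.read_string(GENERAL_CONFIG)
paths = parser["Paths.Silver"]

# Input and output roles as described in the text; tourism_trips_silver appears in both.
INPUT_KEYS = ("tourism_stays_silver", "mcc_iso_timezones_data_bronze", "tourism_trips_silver")
OUTPUT_KEYS = (
    "tourism_geozone_aggregations_silver",
    "tourism_trip_aggregations_silver",
    "tourism_trips_silver",
)

input_paths = {key: paths[key] for key in INPUT_KEYS}
output_paths = {key: paths[key] for key in OUTPUT_KEYS}

for role, mapping in (("input", input_paths), ("output", output_paths)):
    for key, value in mapping.items():
        print(f"{role:6s} {key} -> {value}")
```

Printing the resolved paths this way is a quick check that every key the component needs is present before a run.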
Configuration parameters
The configuration file tourism_statistics_calculation.ini has three sections.
Spark and Logging contain the generic session name and logging parameters.
The TourismStatisticsCalculation section controls the component logic and contains the following parameters:
- data_period_start: YYYY-MM format string. Indicates the first month for which the component will generate results. Example: 2023-01.
- data_period_end: YYYY-MM format string. Indicates the last month for which the component will generate results. Example: 2023-02.
- clear_destination_directory: Boolean. Indicates whether existing results should be deleted before execution. If True, existing data in the paths tourism_geozone_aggregations_silver and tourism_trip_aggregations_silver will be deleted before calculations start. Example: True.
- delete_existing_trips: Boolean. Indicates whether existing trips (from previous executions) should be deleted before execution. If True, existing data in the path tourism_trips_silver will be deleted before calculations start. If they are not deleted, they may be used as input data during the execution. Example: False.
- zoning_dataset_ids_and_levels_list: List of pairs of zoning dataset name and hierarchical levels. Each pair in the list should specify the name of a zoning dataset and the list of hierarchical levels to calculate results for. For single-level datasets (such as INSPIRE_1KM), the list of hierarchical levels should be [1]. Example: [('test_dataset',[1,2,3])].
- max_trip_gap_h: Integer. Maximum number of hours allowed between two stays for them to possibly be marked as part of the same trip. It is also used to determine the size of the look-forward window used to retrieve entries from the next month when processing monthly data. Example: 24.
- max_visit_gap_h: Integer. Maximum number of hours allowed between two stays for them to possibly be marked as part of the same visit. Example: 24.
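To illustrate what a maximum-gap threshold such as max_trip_gap_h does, the sketch below groups time-ordered stays so that a new group starts whenever the gap to the previous stay exceeds the threshold. This is an assumption-laden simplification and not the component's actual trip logic: the function name `group_by_max_gap`, the (start, end) tuple layout and the sample stays are all hypothetical, and the real component may apply further conditions before marking stays as one trip.

```python
from datetime import datetime, timedelta


def group_by_max_gap(stays, max_gap_h):
    """Group (start, end) stays; a gap longer than max_gap_h hours starts a new group.

    Hypothetical helper for illustration only. The gap is measured from the end
    of the previous stay to the start of the next one.
    """
    max_gap = timedelta(hours=max_gap_h)
    groups = []
    for start, end in sorted(stays):
        previous_end = groups[-1][-1][1] if groups else None
        if previous_end is not None and start - previous_end <= max_gap:
            groups[-1].append((start, end))
        else:
            groups.append([(start, end)])
    return groups


# Made-up sample stays: the first two straddle the January/February boundary.
stays = [
    (datetime(2023, 1, 31, 8), datetime(2023, 1, 31, 20)),
    (datetime(2023, 2, 1, 6), datetime(2023, 2, 1, 18)),   # 10 h after the previous stay
    (datetime(2023, 2, 4, 9), datetime(2023, 2, 4, 17)),   # 63 h after the previous stay
]

groups = group_by_max_gap(stays, max_gap_h=24)
for index, group in enumerate(groups, start=1):
    print(index, [(s.isoformat(), e.isoformat()) for s, e in group])
```

The first two sample stays end up in the same group even though they fall in different months. A case like this shows why a look-forward window into the next month is useful when data is processed month by month.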
Configuration example
[Logging]
level = DEBUG
[Spark]
session_name = TourismStatisticsCalculation
[TourismStatisticsCalculation]
data_period_start = 2023-01
data_period_end = 2023-02
clear_destination_directory = true
delete_existing_trips = False
zoning_dataset_ids_and_levels_list = [('test_dataset',[1,2,3])]
max_trip_gap_h = 24
max_visit_gap_h = 24
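The values in the example above are plain strings until they are parsed. The following sketch shows one way to turn them into typed Python values with the standard library, and to list the months covered by data_period_start and data_period_end. It assumes the file is read with Python's configparser; the helper `months_in_period` and the `settings` dictionary are hypothetical names, not part of the component.

```python
import ast
from configparser import ConfigParser
from datetime import datetime

# Copy of the component section from the example configuration.
EXAMPLE_CONFIG = """
[TourismStatisticsCalculation]
data_period_start = 2023-01
data_period_end = 2023-02
clear_destination_directory = true
delete_existing_trips = False
zoning_dataset_ids_and_levels_list = [('test_dataset',[1,2,3])]
max_trip_gap_h = 24
max_visit_gap_h = 24
"""


def months_in_period(start, end):
    """Return every YYYY-MM string from start to end, both inclusive (hypothetical helper)."""
    first = datetime.strptime(start, "%Y-%m")
    last = datetime.strptime(end, "%Y-%m")
    if first > last:
        raise ValueError("data_period_start must not be later than data_period_end")
    months = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        months.append(f"{year:04d}-{month:02d}")
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


parser = ConfigParser()
parser.read_string(EXAMPLE_CONFIG)
section = parser["TourismStatisticsCalculation"]

settings = {
    "months": months_in_period(section["data_period_start"], section["data_period_end"]),
    # getboolean accepts true/false in any letter case.
    "clear_destination_directory": section.getboolean("clear_destination_directory"),
    "delete_existing_trips": section.getboolean("delete_existing_trips"),
    # literal_eval safely parses the Python-literal list of (name, levels) pairs.
    "zoning": ast.literal_eval(section["zoning_dataset_ids_and_levels_list"]),
    "max_trip_gap_h": section.getint("max_trip_gap_h"),
    "max_visit_gap_h": section.getint("max_visit_gap_h"),
}

for dataset_name, levels in settings["zoning"]:
    print(dataset_name, levels, settings["months"])
```

Using `ast.literal_eval` rather than `eval` keeps the parsing restricted to literals, which is a reasonable precaution for values read from a configuration file.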