OutputIndicators Configuration
To initialise and run the component two configs are used - general_config.ini
and output_indicators.ini
. In general_config.ini
all paths to the corresponding data objects shall be specified. For example:
[Paths.Bronze]
# Optional dataset for Inbound Tourism use case, contains deduplication and mno-to-target-population factors
# for nationalities that visit the local country
inbound_estimation_factors_bronze = ${Paths:bronze_dir}/inbound_estimation_factors
[Paths.Silver]
# Only needed for Present Population and Usual Environment use cases, when spatial aggregation is to be performed
# for a custom zoning (not INSPIRE grid)
geozones_grid_map_data_silver = ${Paths:silver_dir}/geozones_grid_map
# Present Population
present_population_silver = ${Paths:silver_dir}/present_population
# Usual Environment
aggregated_usual_environments_silver = ${Paths:ue_dir}/aggregated_usual_environment
# Internal Migration
internal_migration_silver = ${Paths:ue_dir}/internal_migration
# Inbound Tourism
tourism_geozone_aggregations_silver = ${Paths:silver_dir}/tourism_geozone_aggregations
tourism_trip_aggregations_silver = ${Paths:silver_dir}/tourism_trip_aggregations
# Outbound Tourism
tourism_outbound_aggregations_silver = ${Paths:silver_dir}/tourism_outbound_aggregations
[Paths.Gold]
# Present Population
present_population_zone_gold = ${Paths:gold_dir}/present_population_zone_gold
# Usual Environment
aggregated_usual_environments_zone_gold = ${Paths:gold_dir}/aggregated_usual_environments_zone_gold
# Internal Migration
internal_migration_gold = ${Paths:gold_dir}/internal_migration
# Inbound Tourism
tourism_geozone_aggregations_gold = ${Paths:gold_dir}/tourism_geozone_aggregations
tourism_trip_aggregations_gold = ${Paths:gold_dir}/tourism_trip_aggregations
# Outbound Tourism
tourism_outbound_aggregations_gold = ${Paths:gold_dir}/tourism_outbound_aggregations
The OutputIndicators component can be executed for one of five use cases, processing the output data objects of that use case and preparing them to be exported out of the MNO premises. The execution is configured through the output_indicators.ini
file, that contains the following sections.
General section
Under the general section, the user may specify for which use case outputs the process will be executed and other parameters that are common to all use cases. The following parameters are expected:
- use_case: string, it specifies the use case whose outputs will be processed by the component. Currently there are five accepted values:
- PresentPopulationEstimation
for the Present Population use case
- UsualEnvironmentAggregation
for the Usual Environment use case
- InternalMigration
for the Internal Migration use case
- TourismStatisticsCalculation
for the Inbound Tourism use case
- TourismOutboundStatisticsCalculation
for the Outbound Tourism use case
- clear_destination_directory: boolean, whether to delete all previous results in the output directory before running the component.
- deduplication_factor_local: float, this factor will be used to multiply the appropriate values computed at device level to take into account the possibility that a real person might own more than one device. In particular, this factor will be applied to indicators that are based on devices that are part of the local MNO network. Example:
0.98
. - deduplication_factor_default_inbound: float, this factor will be used to multiply the appropriate values computed at device level to take into account the possibility that a real person might own more than one device. In particular, this factor will be applied to indicators that are based on devices that belong to an MNO of a country other than the local one, and will only be used as a default value for countries whose factor is not specified in the InboundEstimationFactors data object. Example:
0.95
. - mno_to_target_population_factor_local: float, this factor will be used to multiply the (now deduplicated) population value computed at device level to represent the total target population. In particular, this factor will be applied to indicators that are based on devices that are part of the local MNO network .Example:
5.4
. - mno_to_target_population_factor_default_inbound: float, this factor will be used to multiply the (now deduplicated) population value computed at device level to represent the total target population. In particular, this factor will be applied to indicators that are based on devices that belong to an MNO of a country other than the local one, and will only be used as a default value for countries whose factor is not specified in the InboundEstimationFactors data object. Example:
4.2
. - k: integer, it indicates the threshold value to select what records should have k-anonymity applied to them, that is, all records that have a population value strictly lower than k. Example:
5
. - anonymity_type: string, it indicates what k-anonymity metholodogy to employ. If it is set to
delete
, it will delete all records that have a population value strictly lower than k. If it is set toobfuscate
, it will replace all population values strictly lower than k by the negative value-1
. Example:delete
.
Present Population section
In this section, the user may configure exactly what part of the output of the present population use case will be processed by the component.
- zoning_dataset_id: string, ID of the zones dataset to use to aggregate the data from grid level to zone level. The user may also specify one of the reserved dataset names INSPIRE_100m
and INSPIRE_1km
, in which case no geozones_grid_map_data_silver
needs to be read.
- hierarchical_levels: comma-separated list of positive integers, they are the hierarchical levels of the zones dataset used that should be processed by this component. If the zones dataset was not hierarchical, this should just take the value 1
. The list of levels is ignored and set to be 1
if a different value is encountered when zoning_dataset_id is one of the reserved dataset names. Example: 1,2,3
.
- start_date: string, in YYYY-MM-DD
format, indicates the starting date (inclusive) for which the present population data should be processed by this component. Example: 2023-01-01
.
- end_date: string, in YYYY-MM-DD
format, indicates the ending date (inclusive) for which the present population data should be processed by this component. Example: 2023-01-01
.
Usual Environment section
In this section, the user may configure exactly what part of the output of the usual environment use case (also encompassing the home location use case) will be processed by the component.
- zoning_dataset_id: string, ID of the zones dataset to use to aggregate the data from grid level to zone level. The user may also specify one of the reserved dataset names INSPIRE_100m
and INSPIRE_1km
, in which case no geozones_grid_map_data_silver
needs to be read.
- hierarchical_levels: comma-separated list of positive integers, they are the hierarchical levels of the zones dataset used that should be processed by this component. If the zones dataset was not hierarchical, this should just take the value 1
. The list of levels is ignored and set to be 1
if a different value is encountered when zoning_dataset_id is one of the reserved dataset names. Example: 1,2,3
.
- labels: comma-separated list of strings, they indicate what usual environment labels should be processed by this component. Allowed values are home
, work
, ue
. Example: ue, home, work
.
- start_month: string, in YYYY-MM
format, it indicates the first month of the period used to compute the specific usual environment dataset that the user desires to process through this component. Example: 2023-01
.
- end_month: string, in YYYY-MM
format, it indicates the last month of the period used to compute the specific usual environment dataset that the user desires to process through this component. Example: 2023-03
.
- season: string, value of the season of the usual environment dataset that the user desires to process through this component. Allowed values are: all
, spring
, summer
, autumn
, winter
. Example: all
.
## Internal Migration section
In this section, the user may configure exactly what part of the output of the home location use case will be processed by the component.
- zoning_dataset_id: string, ID of the zones dataset that the internal migration data refers to.
- hierarchical_levels: comma-separated list of positive integers, they are the hierarchical levels of the zones dataset used that should be processed by this component. If the zones dataset was not hierarchical, this should just take the value 1
. Example: 1,2,3
.
- start_month_previous: string, in YYYY-MM
format, it indicates the first month of the first long-term period used to compute the internal migration. Example: 2023-01
.
- end_month_previous: string, in YYYY-MM
format, it indicates the last month of the first long-term period used to compute the internal migration. Example: 2023-06
.
- season_previous: string, value of the season of the first long-term period used to compute the internal migration. Allowed values are: all
, spring
, summer
, autumn
, winter
. Example: all
.
- start_month_new: string, in YYYY-MM
format, it indicates the first month of the second long-term period used to compute the internal migration. Example: 2023-07
.
- end_month_new: string, in YYYY-MM
format, it indicates the last month of the second long-term period used to compute the internal migration. Example: 2023-12
.
- season_new: string, value of the season of the second long-term period used to compute the internal migration. Allowed values are: all
, spring
, summer
, autumn
, winter
. Example: all
.
## Inbound Tourism section
In this section, the user may configure exactly what part of the output of the inbound tourism use case will be processed by the component.
- zoning_dataset_id: string, ID of the zones dataset that the inbound tourism data refers to.
- hierarchical_levels: comma-separated list of positive integers, they are the hierarchical levels of the zones dataset used that should be processed by this component. If the zones dataset was not hierarchical, this should just take the value 1
. The list of levels is ignored and set to be 1
if a different value is encountered when zoning_dataset_id is one of the reserved dataset names. Example: 1,2,3
.
- start_month: string, in YYYY-MM
format, it indicates the first month of the month interval of inbound tourism data that the user desires to process through this component. Example: 2023-01
.
- end_month: string, in YYYY-MM
format, it indicates the last month of the month interval of inbound tourism data that the user desires to process through this component. Example: 2023-03
.
## Outbound Tourism section
In this section, the user may configure exactly what part of the output of the outbound tourism use case will be processed by the component.
- start_month: string, in YYYY-MM
format, it indicates the first month of the month interval of outbound tourism data that the user desires to process through this component. Example: 2023-01
.
- end_month: string, in YYYY-MM
format, it indicates the last month of the month interval of outbound tourism data that the user desires to process through this component. Example: 2023-03
.
## Configuration example
[Spark]
session_name = OutputIndicators
[OutputIndicators]
use_case = InternalMigration
clear_destination_directory = False
deduplication_factor_local = 0.95
deduplication_factor_default_inbound = 0.95
mno_to_target_population_factor_local = 5.4
mno_to_target_population_factor_default_inbound = 4.2
anonymity_type = obfuscate # `obfuscate`, `delete`
k = 5
[OutputIndicators.InternalMigration]
# Target InternalMigration dataset
zoning_dataset_id = nuts # ID of the zoning dataset
hierarchical_levels = 1,2,3,4 # Hierarchical level(s) of the zoning dataset. Comma-separated list
start_month_previous = 2023-01 # Start month (inclusive) of the first long-term period
end_month_previous = 2023-06 # End month (inclusive) of the first long-term period
season_previous = all # Allowed values: `all`, `spring`, `summer`, `autumn`, `winter`. First long term period
start_month_new = 2023-07 # Start month (inclusive) of the second long-term period
end_month_new = 2023-12 # End month (inclusive) of the second long-term period
season_new = all # Allowed values: `all`, `spring`, `summer`, `autumn`, `winter`. Second long term period
[OutputIndicators.PresentPopulationEstimation]
# Target InternalMigration dataset
zoning_dataset_id = nuts # ID of the zoning dataset
hierarchical_levels = 1,2,3 # Hierarchical level(s) of the zoning dataset. Comma-separated list
start_date = 2023-01-03 # start date (inclusive)
end_date = 2023-01-03 # end date (inclusive)
[OutputIndicators.UsualEnvironmentAggregation]
zoning_dataset_id = nuts # ID of the zoning dataset
hierarchical_levels = 1,2,3 # Hierarchical level(s) of the zoning dataset. Comma-separated list
labels = ue, home, work # Allowed values: `ue`, `home`, `work`. Comma-separated list
start_month = 2023-01 # Start month (inclusive)
end_month = 2023-06 # End month (inclusive)
season = all
[OutputIndicators.TourismStatisticsCalculation]
zoning_dataset_id = nuts
hierarchical_levels = 1,2,3
start_month = 2023-01
end_month = 2023-02
[OutputIndicators.TourismOutboundStatisticsCalculation]
start_month = 2023-01
end_month = 2023-02