CellConnectionProbabilityEstimation Configuration

To initialise and run the component two configs are used - general_config.ini and cell_connection_probability_estimation.ini. In general_config.ini to execute the component specify all paths to its four corresponding data objects (input + output). Example:

[Paths.Bronze]
event_data_bronze = ${Paths:bronze_dir}/mno_events

[Paths.Silver]
cell_footprint_data_silver = ${Paths:silver_dir}/cell_footprint
grid_data_silver = ${Paths:silver_dir}/grid
cell_connection_probabilities_data_silver = ${Paths:silver_dir}/cell_conn_probs
# only if used
enriched_grid_data_silver = ${Paths:silver_dir}/grid_enriched

In cell_connection_probability_estimation.ini parameters are as follows: - clear_destination_directory - boolean, if True, the component will clear all the data in output paths.

partition_number - integer, the number of partitions to use for the Spark DataFrame. The higher the number, the more memory is required.
data_period_start - string, format should be “yyyy-MM-dd“ (e.g. 2023-01-01), the date from which start Event Cleaning
data_period_end - string, format should be “yyyy-MM-dd“ (e.g. 2023-01-05), the date till which perform Event Cleaning
use_land_use_prior - boolean, if True, the land use prior will be used for cell connection posterior probability estimation. If False, the land use prior will not be used, only connection probability based on cell footprint will be estimated.
landuse_prior_weights - dictionary, keys are land use types, values are weights for these landuse types. The land use types are:
- residential_builtup
- other_builtup
- roads
- other_human_activity
- open_area
- forest
- water

Configuration example

[CellConnectionProbabilityEstimation]
data_period_start = 2023-01-01
data_period_end = 2023-01-15
use_land_use_prior = False

landuse_prior_weights = {
    "residential_builtup": 1.0,
    "other_builtup": 1.0,
    "roads": 0.5,
    "other_human_activity": 0.1,
    "open_area": 0.0,
    "forest": 0.1,
    "water": 0.0
    }