utils
This module contains utility functions for the multimno package.
apply_schema_casting(sdf, schema)
This function takes a DataFrame and a schema, and applies the schema to the DataFrame. It selects the columns in the DataFrame that are in the schema, and casts each column to the type specified in the schema.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sdf | DataFrame | The DataFrame to apply the schema to. | required |
schema | StructType | The schema to apply to the DataFrame. | required |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | A new DataFrame that includes the same rows as the input DataFrame, but with the columns cast to the types specified in the schema. |
Source code in multimno/core/utils.py
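The select-and-cast behaviour described above can be illustrated without Spark. The following is a minimal plain-Python analogue, not the multimno implementation: rows are dictionaries, and the hypothetical `Schema` type maps each column name to a casting callable in place of a `StructType`. In PySpark the same idea is commonly expressed with `select` and `Column.cast`.

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-in for a Spark StructType: column name -> casting function.
Schema = Dict[str, Callable[[Any], Any]]


def apply_schema_casting_rows(
    rows: List[Dict[str, Any]], schema: Schema
) -> List[Dict[str, Any]]:
    """Keep only the columns named in the schema and cast each value."""
    return [
        {name: cast(row[name]) for name, cast in schema.items()}
        for row in rows
    ]


# Example data (made up for illustration).
rows = [{"user_id": "7", "lat": "52.5", "extra": "not in schema"}]
schema: Schema = {"user_id": int, "lat": float}

casted = apply_schema_casting_rows(rows, schema)
```

Columns that are absent from the schema (here `extra`) are dropped, and the remaining values take the types the schema names.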
calc_hashed_user_id(df, user_column=ColNames.user_id)
Calculates SHA2 hash of user id, takes the first 31 bits and converts them to a non-negative 32-bit integer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Data of clean synthetic events with a user id column. | required |
Returns:
Type | Description |
---|---|
DataFrame | DataFrame in which the user_id column is transformed to a hashed value. |
Source code in multimno/core/utils.py
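The hashing step can be sketched for a single value in plain Python. This is an illustration under stated assumptions, not the library's code: it assumes the SHA-256 variant of SHA2, UTF-8 encoding of the user id, and that "the first 31 bits" means the 31 most significant bits of the digest. The helper name `hashed_user_id` is hypothetical.

```python
import hashlib


def hashed_user_id(user_id: str) -> int:
    """Hash a user id and keep the leading 31 bits as a non-negative int."""
    # Assumption: SHA-256 over the UTF-8 bytes of the id.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Read the first 4 bytes (32 bits) as an unsigned big-endian integer.
    first_32_bits = int.from_bytes(digest[:4], byteorder="big")
    # Drop the lowest of those 32 bits, leaving the leading 31 bits.
    return first_32_bits >> 1


example_hash = hashed_user_id("user-123")
```

Because only 31 bits are kept, the result always fits in a signed 32-bit integer without becoming negative.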
clip_polygons_with_mask_polygons(input_sdf, mask_sdf, cols_to_keep, self_intersection=False, geometry_column='geometry')
Cuts polygons in the input DataFrame with mask polygons from another DataFrame. This function takes two DataFrames: one with input polygons and another with mask polygons. It cuts the input polygons with the mask polygons and returns a new DataFrame with the resulting polygons. Both DataFrames must use the same coordinate system.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_sdf | DataFrame | A DataFrame containing the input polygons. | required |
mask_sdf | DataFrame | A DataFrame containing the mask polygons. | required |
cols_to_keep | list | A list of column names to keep from the input DataFrame. | required |
geometry_column | str | The name of the geometry column in the DataFrames. Defaults to "geometry". | 'geometry' |
Returns:
Type | Description |
---|---|
DataFrame | A DataFrame containing the resulting polygons after cutting the input polygons with the mask polygons. |
Source code in multimno/core/utils.py
cut_geodata_to_extent(sdf, extent, target_crs, geometry_column='geometry')
Cuts geometries in a DataFrame to a specified extent.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sdf | DataFrame | The DataFrame to filter. The DataFrame must contain a geometry column. | required |
extent | tuple | A tuple representing the extent. The tuple contains four elements: (west, south, east, north), which are the western, southern, eastern, and northern bounds of the WGS84 extent. | required |
target_crs | int | The CRS of the DataFrame, to which the extent is transformed. | required |
geometry_column | str | The name of the geometry column. Defaults to "geometry". | 'geometry' |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | A DataFrame containing the same rows as the input DataFrame, but with the geometries cut to the extent. |
Source code in multimno/core/utils.py
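To make "cut to the extent" concrete, here is a simplified sketch that clips axis-aligned bounding boxes instead of full geometries. It assumes the box and the extent already share one CRS, so the WGS84-to-`target_crs` transformation that the function performs is left out. The names `BBox` and `clip_bbox_to_extent` are hypothetical, and returning `None` for a box outside the extent is a choice of this sketch only.

```python
from typing import Optional, Tuple

# (west, south, east, north), the same ordering as the extent tuple.
BBox = Tuple[float, float, float, float]


def clip_bbox_to_extent(box: BBox, extent: BBox) -> Optional[BBox]:
    """Return the part of `box` that lies inside `extent`, or None if none does."""
    west = max(box[0], extent[0])
    south = max(box[1], extent[1])
    east = min(box[2], extent[2])
    north = min(box[3], extent[3])
    if west > east or south > north:
        return None  # the box lies entirely outside the extent
    return (west, south, east, north)


extent: BBox = (0.0, 0.0, 10.0, 10.0)
partly_inside = clip_bbox_to_extent((9.0, 9.0, 12.0, 12.0), extent)
outside = clip_bbox_to_extent((11.0, 11.0, 12.0, 12.0), extent)
```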
filter_geodata_to_extent(sdf, extent, target_crs, geometry_column='geometry')
Filters a DataFrame to include only rows with geometries that intersect a specified extent.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sdf | DataFrame | The DataFrame to filter. The DataFrame must contain a geometry column. | required |
extent | tuple | A tuple representing the extent. The tuple contains four elements: (west, south, east, north), which are the western, southern, eastern, and northern bounds of the WGS84 extent. | required |
target_crs | int | The CRS of the DataFrame, to which the extent is transformed. | required |
geometry_column | str | The name of the geometry column. Defaults to "geometry". | 'geometry' |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | A DataFrame containing only the rows from the input DataFrame where the geometry intersects the extent. |
Source code in multimno/core/utils.py
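The intersection filter can be illustrated with bounding boxes in plain Python. This is a simplified sketch rather than the Spark implementation: it tests axis-aligned boxes instead of arbitrary geometries and assumes the boxes and the extent are already in the same CRS. The names `BBox`, `bbox_intersects` and `filter_to_extent` are hypothetical.

```python
from typing import List, Tuple

# (west, south, east, north), the same ordering as the extent tuple.
BBox = Tuple[float, float, float, float]


def bbox_intersects(a: BBox, b: BBox) -> bool:
    """True when two boxes overlap or touch on both axes."""
    overlaps_x = a[0] <= b[2] and b[0] <= a[2]
    overlaps_y = a[1] <= b[3] and b[1] <= a[3]
    return overlaps_x and overlaps_y


def filter_to_extent(boxes: List[BBox], extent: BBox) -> List[BBox]:
    """Keep only the boxes that intersect the extent, unchanged."""
    return [box for box in boxes if bbox_intersects(box, extent)]


extent: BBox = (0.0, 0.0, 10.0, 10.0)
boxes: List[BBox] = [
    (1.0, 1.0, 2.0, 2.0),      # fully inside
    (9.0, 9.0, 12.0, 12.0),    # partly inside
    (11.0, 11.0, 12.0, 12.0),  # fully outside
]
kept = filter_to_extent(boxes, extent)
```

Unlike cutting, filtering leaves the kept geometries untouched: the partly overlapping box is returned whole.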
fix_geometry(sdf, geometry_type, geometry_column='geometry')
Fixes the geometry of a given type in a DataFrame. This function applies several operations to the geometries in the specified geometry column of the DataFrame:
1. If a geometry is a collection of geometries, extracts only the geometries of the given type.
2. Filters out any geometries of a type other than the given one.
3. Removes any invalid geometries.
4. Removes any empty geometries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sdf | DataFrame | The DataFrame containing the geometries to check. | required |
geometry_column | str | The name of the column containing the geometries. Defaults to "geometry". | 'geometry' |
Returns:
Type | Description |
---|---|
DataFrame | The DataFrame with the fixed polygon geometries. |
Source code in multimno/core/utils.py
get_epsg_from_geometry_column(df)
Get the EPSG code from the geometry column of a DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame with a geometry column. | required |
Raises:
Type | Description |
---|---|
ValueError | If the DataFrame contains multiple EPSG codes. |
Returns:
Name | Type | Description |
---|---|---|
int | int | EPSG code of the geometry column. |
Source code in multimno/core/utils.py
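The single-code check can be sketched in plain Python over a list of per-row SRID values. The helper name `single_epsg` is hypothetical, and raising `ValueError` for an empty input is an assumption of this sketch; the documentation above only specifies the error for multiple codes.

```python
from typing import Iterable


def single_epsg(srids: Iterable[int]) -> int:
    """Return the one EPSG code shared by all values, or raise ValueError."""
    distinct = set(srids)
    if len(distinct) > 1:
        raise ValueError(f"Multiple EPSG codes found: {sorted(distinct)}")
    if not distinct:
        # Assumption of this sketch: an empty input is also treated as an error.
        raise ValueError("No EPSG code found")
    return distinct.pop()


epsg = single_epsg([3035, 3035, 3035])
```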
merge_geom_within_mask_geom(input_sdf, mask_sdf, cols_to_keep, geometry_col)
Merges geometries from an input DataFrame that intersect with geometries from a mask DataFrame.
This function performs a spatial join between input and mask DataFrames using ST_Intersects, calculates the geometric intersection between each matching pair of geometries, then groups by specified columns and unions the resulting intersection geometries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
input_sdf | DataFrame | Input DataFrame containing geometries to be processed. Must contain a 'geometry' column. | required |
mask_sdf | DataFrame | Mask DataFrame containing geometries that define the areas of interest. Must contain a 'geometry' column. | required |
cols_to_keep | List | List of column names from the input DataFrame to preserve in the output. These columns will be used as grouping keys. | required |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | A DataFrame containing merged geometries that result from intersecting the input geometries with the mask geometries. |
Source code in multimno/core/utils.py
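The join, intersect, group and union sequence described above can be mimicked in plain Python by modelling each geometry as a set of grid cells, so that set intersection and set union stand in for the spatial operations. This is only an analogy under that assumption, not the Spark implementation. The names `Geometry` and `merge_within_mask` are hypothetical, and a single string key plays the role of `cols_to_keep`.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set, Tuple

Cell = Tuple[int, int]
Geometry = FrozenSet[Cell]  # a geometry modelled as the grid cells it covers


def merge_within_mask(
    inputs: List[Tuple[str, Geometry]], masks: List[Geometry]
) -> Dict[str, Geometry]:
    """Intersect every input with every mask, then union the pieces per key."""
    merged: Dict[str, Set[Cell]] = defaultdict(set)
    for key, geom in inputs:
        for mask in masks:
            overlap = geom & mask  # geometric intersection of the pair
            if overlap:  # an empty overlap means the pair does not intersect
                merged[key] |= overlap  # union per grouping key
    return {key: frozenset(cells) for key, cells in merged.items()}


inputs = [
    ("a", frozenset({(0, 0), (0, 1)})),
    ("a", frozenset({(1, 1), (5, 5)})),
    ("b", frozenset({(9, 9)})),
]
masks = [frozenset({(0, 1), (1, 1)}), frozenset({(0, 0)})]

result = merge_within_mask(inputs, masks)
```

Key `b` does not appear in the result because its geometry intersects no mask, and cell `(5, 5)` is dropped because no mask covers it.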
project_to_crs(sdf, crs_in, crs_out, geometry_column='geometry')
Projects geometry to CRS.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sdf | DataFrame | Input DataFrame. | required |
crs_in | int | Input CRS. | required |
crs_out | int | Output CRS. | required |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | DataFrame with geometry projected to cartesian CRS. |
Source code in multimno/core/utils.py
spark_to_geopandas(df, epsg=None)
Convert a Spark DataFrame to a geopandas GeoDataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Spark DataFrame to convert. | required |
Returns:
Type | Description |
---|---|
GeoDataFrame | GeoDataFrame with the same data as the input DataFrame. |
Source code in multimno/core/utils.py