io_interface
Module that implements classes for reading data from, and writing data to, different data sources as Spark DataFrames.
CsvInterface
Bases: PathInterface
Class that implements the PathInterface abstract class for reading data from and writing data to a CSV data source.
Source code in multimno/core/io_interface.py
read_from_interface(spark, path, schema, header=True, sep=',')
Method that reads data from a csv type data source as a Spark DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spark | SparkSession | Spark session. | required |
path | str | Path to the data. | required |
schema | StructType | Schema of the data. | required |
header | bool | Whether the first line of the file contains the column names. | True |
sep | str | Column separator. | ',' |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Spark DataFrame. |
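A minimal sketch of the read path, assuming the method forwards `header` and `sep` to Spark's CSV reader under the same option names (an assumption; the method body is not reproduced on this page). `read_csv_source` is a hypothetical name, and `spark`/`schema` are typed loosely so the snippet does not need PySpark at import time.

```python
from typing import Any


def read_csv_source(spark: Any, path: str, schema: Any, header: bool = True, sep: str = ",") -> Any:
    """Hypothetical helper mirroring CsvInterface.read_from_interface.

    `spark` is expected to be a pyspark.sql.SparkSession and `schema` a
    pyspark.sql.types.StructType (or a DDL string); both are typed as Any so
    the sketch imports nothing from PySpark.
    """
    # Assumption: header and sep are forwarded unchanged as Spark CSV options.
    # The explicit schema gives typed columns without enabling inferSchema.
    return spark.read.schema(schema).options(header=header, sep=sep).csv(path)
```

With a real session this would be called as `read_csv_source(spark, "/data/cells.csv", schema, sep=";")` (hypothetical path). Supplying the schema up front yields typed columns without turning on `inferSchema`, which would cost an extra pass over the data.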
write_from_interface(df, path, partition_columns=None, header=True, sep=',')
Method that writes data from a Spark DataFrame to a csv data source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Spark DataFrame. | required |
path | str | Path to the data. | required |
partition_columns | List[str] | Columns used to partition the output. | None |
header | bool | Whether to write the column names as the first line. | True |
sep | str | Column separator. | ',' |
Raises: NotImplementedError: CSV files should not be written in this architecture.
GeoParquetInterface
Bases: PathInterface
Class that implements the PathInterface abstract class for reading data from and writing data to a GeoParquet data source.
HttpGeoJsonInterface
Bases: IOInterface
Class that implements the IOInterface abstract class for reading GeoJSON data from, and writing it to, an HTTP source.
read_from_interface(spark, url, timeout=60, max_retries=5)
Method that reads GeoJSON data from an HTTP source and converts it to a Spark DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spark | SparkSession | Spark session. | required |
url | str | URL of the GeoJSON data. | required |
timeout | int | Timeout for the GET request, in seconds. | 60 |
max_retries | int | Maximum number of retries for the GET request. | 5 |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Spark DataFrame. |
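The HTTP client used by the method is not shown in this reference. The sketch below swaps in the standard library's `urllib.request` to illustrate the documented `timeout`/`max_retries` behaviour: retry the GET on network errors with exponential backoff, then flatten the FeatureCollection into one record per feature. `fetch_geojson`, `features_to_records` and the `backoff_seconds` parameter are hypothetical, as is the choice to keep each geometry as a GeoJSON string.

```python
import json
import time
import urllib.request
from typing import Any, Dict, List


def fetch_geojson(url: str, timeout: int = 60, max_retries: int = 5, backoff_seconds: float = 1.0) -> Dict[str, Any]:
    """GET a GeoJSON document, retrying on network errors (hypothetical helper)."""
    last_error: Exception = RuntimeError("max_retries must be >= 1")
    for attempt in range(max_retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                return json.loads(response.read().decode("utf-8"))
        except OSError as error:  # URLError, HTTPError and timeouts all subclass OSError
            last_error = error
            if attempt < max_retries - 1:
                time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
    raise last_error


def features_to_records(collection: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten a FeatureCollection into one dict per feature (hypothetical helper)."""
    records = []
    for feature in collection.get("features", []):
        record = dict(feature.get("properties") or {})
        # Assumption: geometry is kept as a GeoJSON string so it can be parsed
        # later, e.g. with Apache Sedona's ST_GeomFromGeoJSON.
        record["geometry"] = json.dumps(feature.get("geometry"))
        records.append(record)
    return records
```

The records could then be turned into a DataFrame with `spark.createDataFrame(records)`. This sketch retries every `OSError`, including HTTP 4xx responses; a production client would usually retry only transient failures.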
write_from_interface(df, url, timeout=60, max_retries=5)
Method that writes a DataFrame to an HTTP source as GeoJSON data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | DataFrame to write. | required |
url | str | URL of the HTTP source. | required |
timeout | int | Timeout for the POST request, in seconds. | 60 |
max_retries | int | Maximum number of retries for the POST request. | 5 |
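A hedged sketch of the write side, again using `urllib.request` in place of whatever client the method actually uses: rows are converted into a FeatureCollection (assuming a `geometry` column holding GeoJSON strings) and POSTed with the same retry policy. The helper names, the `application/geo+json` content type and the `backoff_seconds` parameter are assumptions. For a Spark DataFrame the records could come from `[row.asDict() for row in df.collect()]`, which is only sensible for small results because it gathers everything on the driver.

```python
import json
import time
import urllib.request
from typing import Any, Dict, Iterable


def records_to_feature_collection(records: Iterable[Dict[str, Any]], geometry_field: str = "geometry") -> Dict[str, Any]:
    """Build a GeoJSON FeatureCollection from flat records (hypothetical helper).

    Assumption: each record stores its geometry as a GeoJSON string.
    """
    features = []
    for record in records:
        properties = {k: v for k, v in record.items() if k != geometry_field}
        raw_geometry = record.get(geometry_field)
        geometry = json.loads(raw_geometry) if raw_geometry else None
        features.append({"type": "Feature", "properties": properties, "geometry": geometry})
    return {"type": "FeatureCollection", "features": features}


def post_geojson(url: str, collection: Dict[str, Any], timeout: int = 60, max_retries: int = 5, backoff_seconds: float = 1.0) -> int:
    """POST a FeatureCollection, retrying on network errors; returns the HTTP status."""
    body = json.dumps(collection).encode("utf-8")
    last_error: Exception = RuntimeError("max_retries must be >= 1")
    for attempt in range(max_retries):
        request = urllib.request.Request(
            url, data=body, headers={"Content-Type": "application/geo+json"}, method="POST"
        )
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                return response.status
        except OSError as error:  # URLError, HTTPError and timeouts
            last_error = error
            if attempt < max_retries - 1:
                time.sleep(backoff_seconds * (2 ** attempt))  # exponential backoff
    raise last_error
```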
IOInterface
Abstract interface that provides functionality for reading and writing data.
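One way to express this contract in Python is with `abc`, so that a concrete interface must implement both methods before it can be instantiated. This is an illustrative sketch that assumes the abstract methods are `read_from_interface` and `write_from_interface` (the names used throughout this page); `InMemoryInterface` is a hypothetical subclass that only stores objects in a dictionary.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class IOInterface(ABC):
    """Illustrative sketch of the read/write contract (not the project's code)."""

    @abstractmethod
    def read_from_interface(self, *args: Any, **kwargs: Any) -> Any:
        """Return the data held by the source (a Spark DataFrame in multimno)."""

    @abstractmethod
    def write_from_interface(self, df: Any, *args: Any, **kwargs: Any) -> None:
        """Persist `df` to the source."""


class InMemoryInterface(IOInterface):
    """Hypothetical concrete interface that keeps written objects in a dict."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def read_from_interface(self, key: str) -> Any:
        return self._store[key]

    def write_from_interface(self, df: Any, key: str) -> None:
        self._store[key] = df
```

Because both methods are abstract, `IOInterface()` raises `TypeError`, while `InMemoryInterface()` can be created and used.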
JsonInterface
Bases: PathInterface
Class that implements the PathInterface abstract class for reading data from and writing data to a JSON data source.
ParquetInterface
Bases: PathInterface
Class that implements the PathInterface abstract class for reading data from and writing data to a Parquet data source.
PathInterface
Bases: IOInterface
Abstract interface for reading data from and writing data to a file type data source.
read_from_interface(spark, path, schema=None)
Method that reads data from a file type data source as a Spark DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spark | SparkSession | Spark session. | required |
path | str | Path to the data. | required |
schema | StructType | Schema of the data. | None |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Spark DataFrame. |
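A sketch of a format-driven reader, assuming the file-based subclasses (such as ParquetInterface and JsonInterface) differ only in the format string handed to `spark.read.format(...)`. The class names below and the `FILE_FORMAT` attribute are hypothetical stand-ins. A `"geoparquet"` format would additionally require Apache Sedona to be registered on the session.

```python
from typing import Any, Optional


class PathInterfaceSketch:
    """Illustrative format-driven reader; subclasses set FILE_FORMAT (assumed name)."""

    FILE_FORMAT: str = ""

    def read_from_interface(self, spark: Any, path: str, schema: Optional[Any] = None) -> Any:
        reader = spark.read.format(self.FILE_FORMAT)
        if schema is not None:
            # Without an explicit schema, Spark uses the schema stored in the
            # files (Parquet) or infers one by scanning the data (JSON).
            reader = reader.schema(schema)
        return reader.load(path)


class ParquetSketch(PathInterfaceSketch):
    FILE_FORMAT = "parquet"


class JsonSketch(PathInterfaceSketch):
    FILE_FORMAT = "json"
```

Keeping the Spark call in the base class means a new file format only needs a one-line subclass.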
write_from_interface(df, path, partition_columns=None, mode=SPARK_WRITING_MODES.APPEND)
Method that writes data from a Spark DataFrame to a file type data source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Spark DataFrame. | required |
path | str | Path to the data. | required |
partition_columns | List[str] | Columns used to partition the output. | None |
mode | | Spark save mode (for example, append or overwrite). | SPARK_WRITING_MODES.APPEND |
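A matching write sketch. It assumes `SPARK_WRITING_MODES.APPEND` corresponds to Spark's `"append"` save mode and that `partition_columns` feeds `DataFrameWriter.partitionBy`; `write_path` is a hypothetical helper, not the class method itself.

```python
from typing import Any, List, Optional


def write_path(df: Any, path: str, file_format: str, partition_columns: Optional[List[str]] = None, mode: str = "append") -> None:
    """Write a DataFrame to a file-based sink (hypothetical helper).

    Assumption: the default "append" stands in for SPARK_WRITING_MODES.APPEND.
    """
    writer = df.write.format(file_format).mode(mode)
    if partition_columns:
        # Hive-style layout: one sub-directory per distinct value,
        # e.g. path/year=2024/month=1/
        writer = writer.partitionBy(*partition_columns)
    writer.save(path)
```

With `partition_columns=["year", "month"]`, later reads that filter on those columns can skip whole directories (partition pruning).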
ShapefileInterface
Bases: PathInterface
Class that implements the PathInterface abstract class for reading data from and writing data to a ShapeFile data source.
read_from_interface(spark, path, schema=None)
Method that reads data from a ShapeFile type data source as a Spark DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
spark | SparkSession | Spark session. | required |
path | str | Path to the data. | required |
schema | StructType | Schema of the data. | None |
Returns:
Name | Type | Description |
---|---|---|
df | DataFrame | Spark DataFrame. |
write_from_interface(df, path, partition_columns=None)
Method that writes data from a Spark DataFrame to a ShapeFile data source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
df | DataFrame | Spark DataFrame. | required |
path | str | Path to the data. | required |
partition_columns | List[str] | Columns used to partition the output. | None |
Raises: NotImplementedError: ShapeFile files should not be written in this architecture.