Generic Utils¶
These are generic utils used in Sparkly.
-
sparkly.utils.
absolute_path
(file_path, *rel_path)[source]¶ Return absolute path to file.
- Usage:
>>> absolute_path('/my/current/dir/x.txt', '..', 'x.txt') '/my/current/x.txt'
>>> absolute_path('/my/current/dir/x.txt', 'relative', 'path') '/my/current/dir/relative/path'
>>> import os >>> absolute_path('x.txt', 'relative/path') == os.getcwd() + '/relative/path' True
Parameters: - file_path (str) – file
- rel_path (list[str]) – path parts
Returns: str
-
sparkly.utils.
kafka_get_topics_offsets
(host, topic, port=9092)[source]¶ Return available partitions and their offsets for the given topic.
Parameters: - host (str) – Kafka host.
- topic (str) – Kafka topic.
- port (int) – Kafka port.
Returns: [ – [(partition, start_offset, end_offset)].
Return type: int, int, int
-
sparkly.utils.
parse_schema
(schema)[source]¶ Generate schema by its string definition.
It’s basically an opposite action to DataType.simpleString method. Supports all atomic types (like string, int, float...) and complex types (array, map, struct) except DecimalType.
- Usages:
>>> parse_schema('string') StringType >>> parse_schema('int') IntegerType >>> parse_schema('array<int>') ArrayType(IntegerType,true) >>> parse_schema('map<string,int>') MapType(StringType,IntegerType,true) >>> parse_schema('struct<a:int,b:string>') StructType(List(StructField(a,IntegerType,true),StructField(b,StringType,true))) >>> parse_schema('unsupported') Traceback (most recent call last): ... sparkly.exceptions.UnsupportedDataType: Cannot parse type from string: "unsupported"