Generic Utils

These are generic utils used in Sparkly.

sparkly.utils.absolute_path(file_path, *rel_path)[source]

Return absolute path to file.

Usage:
>>> absolute_path('/my/current/dir/x.txt', '..', 'x.txt')
'/my/current/x.txt'
>>> absolute_path('/my/current/dir/x.txt', 'relative', 'path')
'/my/current/dir/relative/path'
>>> import os
>>> absolute_path('x.txt', 'relative/path') == os.getcwd() + '/relative/path'
True
Parameters:
  • file_path (str) – file
  • rel_path (list[str]) – path parts
Returns:

str

sparkly.utils.kafka_get_topics_offsets(host, topic, port=9092)[source]

Return available partitions and their offsets for the given topic.

Parameters:
  • host (str) – Kafka host.
  • topic (str) – Kafka topic.
  • port (int) – Kafka port.
Returns:

[ – [(partition, start_offset, end_offset)].

Return type:

int, int, int

sparkly.utils.parse_schema(schema)[source]

Generate schema by its string definition.

It’s basically an opposite action to DataType.simpleString method. Supports all atomic types (like string, int, float...) and complex types (array, map, struct) except DecimalType.

Usages:
>>> parse_schema('string')
StringType
>>> parse_schema('int')
IntegerType
>>> parse_schema('array<int>')
ArrayType(IntegerType,true)
>>> parse_schema('map<string,int>')
MapType(StringType,IntegerType,true)
>>> parse_schema('struct<a:int,b:string>')
StructType(List(StructField(a,IntegerType,true),StructField(b,StringType,true)))
>>> parse_schema('unsupported')
Traceback (most recent call last):
...
sparkly.exceptions.UnsupportedDataType: Cannot parse type from string: "unsupported"