Hive Metastore Utils

About Hive Metastore

The Hive Metastore is a database with metadata for Hive tables.

To configure `SparklySession` to work with an external Hive Metastore, you need to set the hive.metastore.uris option. You can do this in the hive-site.xml file in the Spark config directory ($SPARK_HOME/conf/hive-site.xml):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://<n.n.n.n>:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>

or set it dynamically via SparklySession options:

from sparkly import SparklySession


class MySession(SparklySession):
    options = {
        'hive.metastore.uris': 'thrift://<n.n.n.n>:9083',
    }

Tables management

Why: sometimes you need to do more than just create a table, e.g. check whether it exists, rename it, or drop it.

from sparkly import SparklySession


spark = SparklySession()

assert spark.catalog_ext.has_table('my_table') in {True, False}
spark.catalog_ext.rename_table('my_table', 'my_new_table')
spark.catalog_ext.drop_table('my_new_table')
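
The three calls above can be combined into a small "promote a staging table" helper. This is only a sketch: promote_table and the table names are hypothetical and not part of sparkly. It assumes the object passed in exposes has_table, drop_table and rename_table as shown above, and the swap is not atomic.

```python
def promote_table(catalog, staging_table, target_table):
    """Replace target_table with staging_table (hypothetical helper).

    `catalog` is assumed to behave like spark.catalog_ext. The drop and the
    rename are two separate steps, so the target is briefly missing.
    """
    if not catalog.has_table(staging_table):
        raise ValueError('staging table {} does not exist'.format(staging_table))
    # Remove the old target only if it is actually there.
    if catalog.has_table(target_table):
        catalog.drop_table(target_table)
    catalog.rename_table(staging_table, target_table)
```

With a live session this would be called as promote_table(spark.catalog_ext, 'events_staging', 'events').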

Table properties management

Why: sometimes you want to assign custom attributes to your table, e.g. creation time, last update, purpose, data source. The only way to interact with table properties in Spark is to use raw SQL queries. We implemented a more convenient interface to make your code cleaner.

from sparkly import SparklySession


spark = SparklySession()
spark.catalog_ext.set_table_property('my_table', 'foo', 'bar')
assert spark.catalog_ext.get_table_property('my_table', 'foo') == 'bar'
assert spark.catalog_ext.get_table_properties('my_table') == {'foo': 'bar'}

Note: properties are stored as strings. If you need other types, consider using a serialisation format, e.g. JSON.
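
As a rough illustration of the JSON approach, the sketch below round-trips a structured value through a string. The helper names encode_property and decode_property and the property name 'meta' are hypothetical; only the standard json module is used, and the catalog calls are left as comments because they need a live session.

```python
import json


def encode_property(value):
    """Serialise a Python value to a JSON string for storage as a property."""
    return json.dumps(value, sort_keys=True)


def decode_property(raw):
    """Parse a JSON string that was read back from a table property."""
    return json.loads(raw)


raw = encode_property({'partition_columns': ['id', 'created_at'], 'version': 3})
restored = decode_property(raw)

# With a live session the string would travel through the catalog instead:
# spark.catalog_ext.set_table_property('my_table', 'meta', raw)
# restored = decode_property(spark.catalog_ext.get_table_property('my_table', 'meta'))
```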

API documentation

class sparkly.catalog.SparklyCatalog(spark)

A set of tools to interact with the Hive Metastore.

drop_table(table_name, checkfirst=True)

Drop table from the metastore.

Note

See the official documentation to understand DROP TABLE semantics: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-DropTable

Parameters:
  • table_name (str) – A table name.
  • checkfirst (bool) – Only issue DROPs for tables that are present in the database.

get_table_properties(table_name)

Get table properties from the metastore.

Parameters: table_name (str) – A table name.
Returns: Key/value for properties.
Return type: dict[str, str]

get_table_property(table_name, property_name, to_type=None)

Get table property value from the metastore.

Parameters:
  • table_name (str) – A table name. May include a database name, e.g. “my_table” or “default.my_table”.
  • property_name (str) – A property name to read value for.
  • to_type (function) – Cast value to the given type. E.g. int or float.
Returns: Any
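
A minimal sketch of how to_type might be used, assuming a numeric timestamp was stored earlier under a property named 'last_updated' (a hypothetical name). The helper seconds_since_update is illustrative and not part of sparkly. It treats a None result as "unknown" defensively, since the behaviour for a missing property is not described here.

```python
import time


def seconds_since_update(catalog, table_name, now=None):
    """Return the age of a table in seconds, based on a stored property.

    `catalog` is assumed to behave like spark.catalog_ext. The property value
    is stored as a string, so to_type=float converts it back to a number.
    """
    last_updated = catalog.get_table_property(
        table_name, 'last_updated', to_type=float,
    )
    if last_updated is None:
        # Assumption: no usable value was found for this property.
        return None
    current = time.time() if now is None else now
    return current - last_updated
```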

has_table(table_name, db_name=None)

Check if table is available in the metastore.

Parameters:
  • table_name (str) – A table name.
  • db_name (str) – A database name.
Returns: bool

rename_table(old_table_name, new_table_name)

Rename table in the metastore.

Note

See the official documentation to understand ALTER TABLE semantics: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RenameTable

Parameters:
  • old_table_name (str) – The current table name.
  • new_table_name (str) – The new table name.

set_table_property(table_name, property_name, value)

Set value for table property.

Parameters:
  • table_name (str) – A table name.
  • property_name (str) – A property name to set value for.
  • value (Any) – Will be automatically cast to a string.