Apache Hive

The Apache Hive connector allows Trino to connect to a Hive metastore and query data stored in Apache Hadoop or S3 compatible objects storage.

Deploy a Hive Stacklet with the Stackable Operator for Apache Hive.

Example Hive catalog configuration

apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: hive-catalog  (1)
  labels:
    trino: simple-trino  (2)
spec:
  connector:
    hive:
      metastore:
        configMap: simple-hive  (3)
      s3:
        inline:
          host: test-minio
          port: 9000
          accessStyle: Path
          credentials:
            secretClass: minio-credentials
  configOverrides: (4)
    hive.metastore.username: trino

1	The name of the catalog as it will appear in Trino
2	TrinoCluster can use these labels to select which catalogs to include
3	The name of your Hive Stacklet
4	Use `configOverrides` to add arbitrary properties to the Trino catalog configuration

Connect to S3 store

The Hive connector can connect to an S3 store as follows:

spec:
  connector:
    hive:
      s3:
        inline:
          host: test-minio
          port: 9000
          accessStyle: Path
          credentials:
            secretClass: minio-credentials
      # OR
      s3:
        reference: my-minio

See S3 resources for details about S3 connections.

Please make sure that the underlying Hive metastore also has access to the S3 store, because it will e.g. check if the directory exists when creating tables.

Connect to HDFS cluster

The hive connector can connect to an HDFS operated by Stackable as follows:

spec:
  connector:
    hive:
      hdfs:
        configMap: simple-hdfs

Please make sure that the underlying Hive metastore also has access to the HDFS, because it will e.g. check if the directory exists when creating tables.

Adding unmanaged Hive clusters

You can add connect Trino to Hive catalogs from systems that are not managed by Stackable, including Hive running on existing Hadoop clusters. Unmanaged Hive instances can be defined by creating a ConfigMap containing the configuration for the remote Hive Metastore and HDFS or S3 storage services.

Create a Hive Metastore configMap

The Hive metastore ConfigMap contains the URL for the metastore’s thrift endpoint.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudera-hive
data:
  HIVE: thrift://10.132.0.59:9083

Create a HDFS configMap

When the Hive data is stored on HDFS you will need to provide a ConfigMap containing the HDFS configuration. To do this take the core-site.xml and hdfs-site.xml from your Hadoop cluster and create a ConfigMap with the keys core-site.xml and hdfs-site.xml.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cloudera-hdfs
data:
  core-site.xml: |-
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://my.hadoop.cluster:8020</value>
      </property>
    <!-- truncated for brevity -->
    </configuration>
  hdfs-site.xml: |-
    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
      <property>
        <name>dfs.namenode.servicerpc-address</name>
        <value>my.hadoop.cluster:8022</value>
      </property>
    <!-- truncated for brevity -->
    </configuration>

Create the Trino Hive catalog

To use the unmanaged Hive metastore we define a TrinoCatalog object in the same way we would for a managed cluster, referencing the custom ConfigMap we created for Hive and HDFS.

apiVersion: trino.stackable.tech/v1alpha1
kind: TrinoCatalog
metadata:
  name: clouderahive
  labels:
    trino: simple-trino
spec:
  connector:
    hive:
      metastore:
        configMap: cloudera-hive
      hdfs:
        configMap: cloudera-hdfs