How to configure HIVE metastore if you don't have one?
This article will guide you on how to set up a HIVE metastore in case you don't have a metastore.
Last updated
This article will guide you on how to set up a HIVE metastore in case you don't have a metastore.
Last updated
In case you don't have a HIVE metastore, we have provided this step-by-step guide to create one for you. But before we jump into that, it's important to understand what HIVE metastore brings to the table.
Hive metastore is a component that stores all the structure information(metadata) of objects like tables and partitions in the warehouse including column and column type information. To know more about HIVE metastore and its architecture, please refer to this article https://cwiki.apache.org/confluence/display/hive/design#Design-Metastore
Now let's go ahead and set up the HIVE metastore.
We have tried to simplify this process as much as possible but if you still face any issues you can email us at hello@e6data.com.
You can also build your own Metastore (on AWS or GCP) or you can use a managed Metastore service by GCP (Dataproc Metastore)
Once the Hive Metastore is setup, We need to make sure that we,
Create a Schema SQL file according to your S3/GCS data
Connect Presto with Hive Metastore
Run Schema SQL in presto to populate Schema and statistics in Hive Metastore using Presto
Create a Schema SQL file according to your S3/GCS data
The customer needs to prepare the SQL file with respect to the data, they want to query. We have provided the sample SQL files for you to go ahead and try it out.
Connecting Presto with Hive Metastore
Currently, Presto, Spark & many other engines can add schemas to HIVE metastore. We prefer Presto for the same. In case you don't have Presto with you, we have already created terraform scripts so that you can launch Presto in your own environment.
Run Schema SQL in presto to populate Schema and statistics in Hive Metastore using Presto
According to the Schema SQL file, the following operations will be performed.
Create Database
Create Table
Repair Table ( Only If the table is partitioned)
Analyze Table - Collect stats about a table so that it can be used for Cost Based optimization
For AWS: Login to the presto machine on EMR and run the following command:
For GCP: Login to the presto machine on Dataproc and run the following command: