# How to configure a Hive metastore if you don't have one?

In case you don't have a [Hive metastore](https://cwiki.apache.org/confluence/display/hive/design#Design-Metastore), we have put together this step-by-step guide to create one. But before we jump into that, it's important to understand what the Hive metastore brings to the table.

The Hive metastore is the component that stores all the structural information (metadata) about objects such as tables and partitions in the warehouse, including column and column-type information. To learn more about the Hive metastore and its architecture, refer to the [Hive design documentation](https://cwiki.apache.org/confluence/display/hive/design#Design-Metastore).

**Now let's go ahead and set up the Hive metastore.**

We have tried to simplify this process as much as possible, but if you still face any issues, you can email us at <hello@e6data.com>.

1. We have created Terraform scripts so that you can launch a Hive metastore in your own environment:
   1. [For AWS](https://github.com/e6x-labs/e6-oss-community/tree/main/terraform/aws/hive_metastore)
   2. [For GCP](https://github.com/e6x-labs/e6-oss-community/tree/main/terraform/gcp/hive_metastore)
2. Alternatively, you can build your own metastore (on AWS or GCP) or use a managed metastore service such as GCP's Dataproc Metastore.
3. Once the Hive metastore is set up, we need to:

   1. Create a schema SQL file according to your S3/GCS data
   2. Connect Presto with the Hive metastore
   3. Run the schema SQL in Presto to populate the schema and statistics in the Hive metastore
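If you go with the Terraform scripts, the usual Terraform workflow applies. A minimal command sketch (the repository path shown is for the AWS metastore module; substitute the GCP path as needed, and supply your own cloud credentials and variables):

```
# Clone the repository that contains the Terraform scripts
git clone https://github.com/e6x-labs/e6-oss-community.git
cd e6-oss-community/terraform/aws/hive_metastore   # or terraform/gcp/hive_metastore

# Standard Terraform workflow: initialize providers, review the plan, then apply
terraform init
terraform plan
terraform apply
```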

**Create a schema SQL file according to your S3/GCS data**

You need to prepare a SQL file that matches the data you want to query. We have provided sample SQL files for you to try out.

{% file src="https://3484040590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FeVBYKZm1xFKFFVzS0lRJ%2Fuploads%2FSfKN0g8oqSTG1YqS5x58%2Fddl_schema_with_partitions.sql?alt=media&token=83241ba1-a341-4d73-bbca-953fdb6d4e5b" %}

{% file src="https://3484040590-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FeVBYKZm1xFKFFVzS0lRJ%2Fuploads%2FyGlXbcf6yyxOK3bEKbKy%2Fddl_schema.sql?alt=media&token=c21a046e-1c4c-418e-98b2-c65175ea7551" %}
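To give a rough idea of what such a schema file contains, here is a minimal sketch in Presto's Hive-connector DDL. The schema name, table name, columns, and bucket path are all hypothetical; adapt them to your own data layout:

```
-- Hypothetical example: an external Parquet table partitioned by date
CREATE SCHEMA IF NOT EXISTS hive.sales
WITH (location = 's3://your-bucket/sales/');      -- use a gs:// path on GCP

CREATE TABLE IF NOT EXISTS hive.sales.orders (
    order_id    bigint,
    customer_id bigint,
    amount      double,
    dt          varchar                            -- partition column (must come last)
)
WITH (
    external_location = 's3://your-bucket/sales/orders/',
    format            = 'PARQUET',
    partitioned_by    = ARRAY['dt']
);
```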

**Connecting Presto with the Hive metastore**

Currently, Presto, Spark, and many other engines can add schemas to the Hive metastore; we prefer Presto. In case you don't have Presto, we have already created Terraform scripts so that you can launch Presto in your own environment.

1. [For AWS](https://github.com/e6x-labs/e6-oss-community/tree/main/terraform/aws/presto_on_emr)
2. [For GCP](https://github.com/e6x-labs/e6-oss-community/tree/main/terraform/gcp/presto_on_dataproc)
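Presto reaches the metastore through its Hive connector catalog file. A minimal sketch of `etc/catalog/hive.properties` (the metastore hostname is a placeholder for your own; 9083 is the metastore's default Thrift port):

```
connector.name=hive-hadoop2
hive.metastore.uri=thrift://<metastore-host>:9083
```

If you launched Presto from the Terraform scripts above, this catalog should already be wired up; otherwise, restart Presto after adding the file so the `hive` catalog becomes available.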

**Run the schema SQL in Presto to populate the schema and statistics in the Hive metastore**

Based on the schema SQL file, the following operations will be performed:

1. Create database
2. Create table
3. Repair table (only if the table is partitioned)
4. Analyze table: collect statistics about the table so they can be used for cost-based optimization
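In Presto's Hive-connector SQL, these four steps correspond roughly to the statements below. All schema and table names are placeholders; note that Presto uses the `system.sync_partition_metadata` procedure where Hive itself would use `MSCK REPAIR TABLE`:

```
-- 1. Create database (a schema in the hive catalog)
CREATE SCHEMA IF NOT EXISTS hive.sales;

-- 2. Create table (see the sample DDL files above for full examples)
-- CREATE TABLE hive.sales.orders ( ... );

-- 3. Repair table: register the partitions already present on S3/GCS
CALL system.sync_partition_metadata('sales', 'orders', 'ADD');

-- 4. Analyze table: collect statistics for cost-based optimization
ANALYZE hive.sales.orders;
```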

For AWS: log in to the Presto machine on EMR and run the following command:

```
./presto-cli --catalog hive --file input.sql
```

For GCP: log in to the Presto machine on Dataproc and run the following command:

```
./presto --catalog hive --file input.sql
```
