Unload Function (COPY INTO)

The Unload Large Result Set to Object Storage feature gives users of the Open Lakehouse Query Engine an efficient way to export and store query results in object storage. The feature is exposed through the COPY INTO command in the e6data Query editor, enabling users to export large query result sets directly into object storage solutions.

Key Benefits

  • Efficient Export: Users can export large result sets efficiently, overcoming limitations on the number of records that can be exported.

  • Seamless Integration: Integration of the COPY INTO command within the query editor streamlines the export process, ensuring a seamless user experience.

  • Scalability: Facilitates the export of millions of records, catering to real-world scenarios where large datasets must be managed. Support for the COPY INTO command in the e6data Query editor removes the previous 10,000-record limit on exported query results.

  • Object Store Support: Enables users to export query results into object stores like S3, providing flexibility and compatibility with popular storage solutions.

Pre-requisites

The user should satisfy the following prerequisites before executing the COPY INTO command:

COPY INTO Command

Unloads data from a table (or query) into a single file at a specified storage location.

COPY INTO '<Cloud storage path>' 
FROM (<SQL Query>) 
MAX_FILE_SIZE <File size in bytes>;

Where:

<Cloud storage path> - S3 file path where the output file will be written, for example:

-- S3 bucket
COPY INTO 's3://sample/file.csv'

<SQL Query> - SQL query whose output will be stored at the cloud storage path

<File size in bytes> - Maximum size of the output file, in bytes

Here's a sample COPY INTO command:

COPY INTO 's3://sample/unloads/call_center_files.csv'
FROM (SELECT * FROM call_center ORDER BY cc_call_center_sk)
MAX_FILE_SIZE 5000000000;
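
As a further illustration, any valid SELECT statement can be unloaded the same way; the sketch below is hypothetical (the web_sales table and bucket path are assumptions) and exports an aggregated result:

-- Hypothetical example: unload an aggregated result to a single CSV file.
-- The table web_sales and the bucket/path are illustrative assumptions.
COPY INTO 's3://sample/unloads/daily_sales_summary.csv'
FROM (
  SELECT ws_sold_date_sk, SUM(ws_net_paid) AS total_net_paid
  FROM web_sales
  GROUP BY ws_sold_date_sk
  ORDER BY ws_sold_date_sk
)
MAX_FILE_SIZE 1000000000;  -- cap the output file at roughly 1 GB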

Exporting Large Result Sets

  1. Execute the desired query using the e6data Query editor/Notebook.

  2. Upon receiving the query results, identify the large result set that needs to be exported.

  3. Utilize the COPY INTO command to export the result set into the desired object storage solution, such as S3 (a consolidated sketch of these steps follows the default values below).

  4. Specify the destination within the object storage where the result set should be stored.

  5. Initiate the export process.

  6. Verify the exported result set in the specified object storage location upon successful completion.

  7. For phase 1, the output file is written with the following default format values:

TYPE = CSV
RECORD_DELIMITER = '\n'
FIELD_DELIMITER = ","
FILE_EXTENSION = null
SKIP_HEADER = 0
DATE_FORMAT="YYYY-MM-DD"
TIME_FORMAT="HH24:MI:SS"
TIMESTAMP_FORMAT="YYYY-MM-DD HH24:MI:SS"
ESCAPE = NONE
ESCAPE_UNENCLOSED_FIELD = "\\"
TRIM_SPACE = false
FIELD_OPTIONALLY_ENCLOSED_BY = '\"'
NULL_IF = ('\"\"')
COMPRESSION = Auto
EMPTY_FIELD_AS_NULL = true
ENCODING = UTF8
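
Putting steps 1-6 together, a minimal sketch of the workflow (using the call_center table from the earlier sample and an illustrative bucket path) looks like this:

-- Steps 1-2: run the query in the e6data Query editor/Notebook and inspect the result set.
SELECT * FROM call_center ORDER BY cc_call_center_sk;

-- Steps 3-5: export the same result set to S3 with COPY INTO.
-- The output is a single CSV file written with the phase 1 defaults listed above
-- (comma-delimited, UTF-8 encoded, nulls written as "").
COPY INTO 's3://sample/unloads/call_center_files.csv'
FROM (SELECT * FROM call_center ORDER BY cc_call_center_sk)
MAX_FILE_SIZE 5000000000;

-- Step 6: verify that s3://sample/unloads/call_center_files.csv exists in the
-- bucket (for example, through the AWS console or CLI).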

Limitations

  1. Cloud storage paths must be specified as file paths; folder paths are not supported at this time. For instance, paths like s3://sample/unloads/call_center_files.csv are accepted, but paths such as s3://sample/unloads/ are not.

  2. The only supported output file format is CSV.

  3. AWS is the only supported cloud provider.

  4. Currently, only single-file exports are supported; exporting multiple files is not available.

  5. e6data does not apply any compression codec for output CSV files.

  6. Null values are currently replaced with "" in all columns, regardless of column type. Snowflake, by comparison, does not apply this rule to Integer and Decimal datatype columns (see the sketch after this list).

  7. Standard date and timestamp formats supported are:

    • Date format: YYYY-MM-DD

    • Timestamp format: YYYY-MM-DD HH24:MI:SS
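
To make limitations 6 and 7 concrete, here is a hedged sketch (the orders table and its columns are hypothetical) of how null, date, and timestamp values behave in the exported CSV:

-- Hypothetical illustration of limitations 6 and 7.
COPY INTO 's3://sample/unloads/orders_sample.csv'
FROM (
  SELECT order_id,     -- integer column
         order_date,   -- written in the YYYY-MM-DD format
         updated_at,   -- written in the YYYY-MM-DD HH24:MI:SS format
         coupon_code   -- string column that may be NULL
  FROM orders
)
MAX_FILE_SIZE 100000000;

-- Any NULL value, including one in the integer order_id column, is written to
-- the CSV as "" (limitation 6), unlike Snowflake's handling of Integer and
-- Decimal columns.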
