
Unload Function (Copy into)

Export query results to external storage efficiently.

The "Unload Large Result Set to Object Storage" feature gives users of the e6data Open Lakehouse Query Engine the ability to efficiently export and store query results in object storage. It is exposed through the COPY INTO command in the e6data Query editor, allowing users to export large query result sets directly into object storage solutions.

Key Benefits

  • Efficient Export: Users can export large result sets efficiently, overcoming limitations on the number of records that can be exported.

  • Seamless Integration: Integration of the COPY INTO command within the query editor streamlines the export process, ensuring a seamless user experience.

  • Scalability: Facilitates the export of millions of records, catering to real-world scenarios where large datasets must be managed. Support for the COPY INTO command within the e6data Query editor lets users export query results beyond the previous limit of 10,000 records, so large result sets can be handled and exported efficiently.

  • Object Store Support: Enables users to export query results into object stores like S3, providing flexibility and compatibility with popular storage solutions.

Pre-requisites

Before executing the COPY INTO command, the user must have:

  • The DataExport role, to execute the query through the e6data Query editor / Notebook.

  • The desired COPY INTO permission in the workspace.

COPY INTO Command

Unload data from a table (or query) into a single file at a specified storage location:

COPY INTO '<Cloud storage path>'
FROM (<SQL Query>)
MAX_FILE_SIZE <File size in bytes>
SINGLE <Boolean for single or multiple files upload>;

Where:

<Cloud storage path> - S3 Storage file path

-- S3 bucket
COPY INTO 's3://sample/file.csv'

<SQL Query> - SQL Query whose output results will be stored in Cloud storage

<File size in bytes> - Maximum output file size, in bytes

<Boolean for single or multiple files upload> - If set to True, multiple files are uploaded; if set to False, only a single file is uploaded. The default value is True.

Here's a sample COPY INTO command:

COPY INTO 's3://sample/unloads/call_center_files.csv'
FROM (SELECT * FROM call_center order by cc_call_center_sk)
MAX_FILE_SIZE 5000000000
SINGLE false;
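
Because FROM accepts an arbitrary SQL query, filtered or aggregated results can be unloaded in the same way. The following is an illustrative sketch only; the web_sales table, its columns, and the bucket path are hypothetical and not part of the documented sample schema.

-- Hypothetical example: unload an aggregated result set as a single CSV file
COPY INTO 's3://sample/unloads/daily_sales_summary.csv'
FROM (
    SELECT ws_sold_date_sk, SUM(ws_net_paid) AS total_net_paid
    FROM web_sales                 -- hypothetical source table
    GROUP BY ws_sold_date_sk
)
MAX_FILE_SIZE 5000000000           -- maximum output file size (~5 GB)
SINGLE false;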

Exporting Large Result Sets

Execute the desired query using the e6data Query editor / Notebook, then:

  1. Upon receiving the query results, identify the large result set that needs to be exported.

  2. Utilize the COPY INTO command to export the result set into the desired object storage solution, such as S3.

  3. Specify the destination within the object storage where the result set should be stored.

  4. Initiate the export process.

  5. Verify the exported result set in the specified object storage location upon successful completion.

  6. For phase 1, the exported file uses the following default format options:

TYPE = CSV
RECORD_DELIMITER = '\n'
FIELD_DELIMITER = ","
FILE_EXTENSION = null
SKIP_HEADER = 0
DATE_FORMAT="YYYY-MM-DD"
TIME_FORMAT="HH24:MI:SS"
TIMESTAMP_FORMAT="YYYY-MM-DD HH24:MI:SS"
ESCAPE = NONE
ESCAPE_UNENCLOSED_FIELD = "\\"
TRIM_SPACE = false
FIELD_OPTIONALLY_ENCLOSED_BY = '\"'
NULL_IF = ('\"\"')
COMPRESSION = Auto
EMPTY_FIELD_AS_NULL = true
ENCODING = UTF8
SINGLE = false
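
As a purely illustrative sketch of these defaults (the column names and values below are hypothetical): records are newline-terminated, fields are comma-delimited, string fields may be enclosed in double quotes, and NULL values are written as "" in every column (see the Limitations section).

1,"NY Metro","Julius Tran"
2,"Mid Atlantic",""
3,"North Midwest","Julius Durham"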

Limitations

  1. Cloud storage paths must be specified as file paths; folder paths are not supported at this time. For instance, a path like s3://sample/unloads/call_center_files.csv is accepted, but a path such as s3://sample/unloads/ is not supported.

  2. The only supported output file format is CSV.

  3. AWS is the only supported cloud provider.

  4. Currently, only single-file exports are supported; exporting multiple files is not available.

  5. e6data does not apply any compression codec for output CSV files.

  6. Null values are currently replaced with "" across all columns, regardless of column type. (Snowflake, by comparison, does not apply this rule to Integer and Decimal datatype columns.)

  7. Standard date and timestamp formats supported are:

    • Date format: YYYY-MM-DD

    • Timestamp format: YYYY-MM-DD HH24:MI:SS
