9th January 2024

We're excited to announce the latest enhancements to the e6data platform. This release introduces several new features, bug fixes, and performance optimisations.

Catalog

Catalog Refresh at the schema level

Catalog refresh now works at the schema level, so users can refresh and update catalog data for individual schemas. Users can select any existing database and update it. For more information, please refer to the documentation.

Enhanced Error Handling

Download Log file

If a catalog operation fails, you can now download its log file to help identify the cause of the failure. A dedicated Failures section at the bottom of the log describes the errors that occurred. For more information, please refer to the documentation.

Retry option during catalog failure

Users can now retry a catalog operation if it fails. Note that this option is available only for failures that occur after the catalog has been created. For more information, please refer to the documentation.

Faster catalog operations through optimization

Users reported that catalog load and refresh times were longer than expected, which degraded the user experience. We have upgraded our catalog reader, which should significantly reduce the time taken by catalog operations such as create, update, and refresh.

Cluster

Optimizing Workload Management with Load-Based Sizing

Load-Based Sizing lets you fine-tune your cluster size to match your workload, so that the cluster operates efficiently for your specific requirements. For more information, please refer to the documentation.

Cluster Monitoring

The enhanced Monitoring section lets you explore real-time query statistics, monitor the cluster's status, and view detailed cluster metrics. It provides a comprehensive view of your cluster's health and performance, helping you make informed decisions and optimize your cluster operations. For more information, please refer to the documentation.

Ability to add new users/tokens without needing a cluster restart

Previously, a cluster had to be restarted whenever a new user or personal access token was added to the system. We have improved our authentication mechanism so that new users and tokens can query and access active clusters within at most 30 seconds of being added, with no restart required.
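Because a new user or token can take up to 30 seconds to become active, a script that connects immediately after creating a token may briefly be rejected. The following is a minimal client-side sketch of a bounded retry loop. It assumes a hypothetical `run_query` callable that raises `PermissionError` on an authentication failure; neither name is part of the e6data client, so substitute whatever call and exception your client actually uses.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_until_authorized(
    run_query: Callable[[], T],
    timeout_s: float = 30.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call run_query until it stops raising PermissionError or timeout_s elapses.

    run_query is a hypothetical callable, and PermissionError stands in for
    whatever authentication error your client raises. clock and sleep are
    injectable so the loop can be tested without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return run_query()
        except PermissionError:
            # Stop retrying once the next attempt would fall past the deadline.
            if clock() + interval_s > deadline:
                raise
            sleep(interval_s)
```

The 30-second default mirrors the maximum propagation delay stated above; a shorter `interval_s` reacts sooner at the cost of more attempts.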

Schema explorer

Schema Reload Option

This option lets users reload the schema shown in the schema explorer. It is useful after a catalog refresh: reloading displays the updated schema in the schema explorer so that users can reference it while writing queries. For more information, please refer to the documentation.

Query Editor result pane

Export Result CSV

This feature lets users download query results in CSV format, providing a convenient way to work with and visualize their data in other tools. For more information, please refer to the documentation.

Access Control

Query History View Role

With the QueryHistoryView role, users can access and view the complete query history, giving them a comprehensive overview of query activity on the platform. Users with this role can download the entire query history without restrictions and efficiently search through it to find the information they need. For more information, please refer to the documentation.

Query optimizations & improvements

  • In-memory caching optimizations: Users encountered a few issues when explicitly caching data in executors. We have revamped our explicit caching methodology to make better use of executor memory. The new approach not only reduces failures during caching but also improves the performance of queries run on cached data.

  • Pruning of non-partition columns and optimizations: We have improved pruning by refactoring our pruning methodology. We have also introduced the ability to prune non-partition columns, which enables faster data skipping and therefore faster query execution.

  • Introducing CTE reuse: When a query contains multiple CTEs, the planner now optimizes their execution so that results are reused across the query. This reduces redundant computation and speeds up query execution.

  • We have fixed a series of bugs that caused query failures under heavy system load. The major fixes are:

    • When running a large number of concurrent queries (>100 QPS), queries could fail with socket timeout errors. This was caused by a bug in the communication layer between internal components, which has now been fixed.

    • When running a large number of concurrent queries (>100 QPS), queries could fail with a Planner IP mismatch error. This was caused by a bug in how we mapped planners to query requests via our network router, which has now been fixed.

    • We have also identified and fixed several other bugs that were causing query failures.
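These notes do not describe how non-partition pruning is implemented. One widely used technique for data skipping on ordinary columns is to keep per-file (or per-row-group) minimum and maximum values for each column, as Parquet does in its metadata, and to skip any file whose range cannot satisfy a filter. The sketch below illustrates that general technique only, not e6data's implementation; `FileStats` and `prune` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class FileStats:
    path: str
    # Per-column (min, max) statistics, e.g. read from Parquet metadata.
    ranges: Dict[str, Tuple[int, int]] = field(default_factory=dict)


def prune(files: List[FileStats], column: str, lo: int, hi: int) -> List[str]:
    """Return paths of files that may contain rows with lo <= column <= hi.

    A file is skipped only when its [min, max] range for the column does not
    overlap [lo, hi]. Files without statistics for the column are kept,
    because nothing proves they are irrelevant.
    """
    kept = []
    for f in files:
        stats = f.ranges.get(column)
        if stats is None:
            kept.append(f.path)
            continue
        col_min, col_max = stats
        if col_max >= lo and col_min <= hi:
            kept.append(f.path)
    return kept


if __name__ == "__main__":
    files = [
        FileStats("part-0.parquet", {"order_amount": (1, 50)}),
        FileStats("part-1.parquet", {"order_amount": (40, 200)}),
        FileStats("part-2.parquet"),
    ]
    # Filter: order_amount BETWEEN 100 AND 150
    print(prune(files, "order_amount", 100, 150))  # ['part-1.parquet', 'part-2.parquet']
```

The check is conservative by design: a file is read whenever its range overlaps the filter, so pruning can only remove work and never changes query results.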

New SQL function support

Announcing support for the following new SQL functions and operators:

  • GENERATE_TIMESTAMP_ARRAY

    • Generates an array of timestamps from a start timestamp to an end timestamp at a specified interval, making it easy to create time series data for analysis.

  • GENERATE_DATE_ARRAY

    • Similar to GENERATE_TIMESTAMP_ARRAY, this function generates an array of dates, simplifying date-related operations in your queries.

  • REGEXP_COUNT

    • Counts the number of occurrences of a specified regular expression pattern within a given string, which helps with complex string analysis.

  • REGEXP_LIKE

    • Enables pattern matching using regular expressions within your queries, providing a more versatile way to filter and analyze textual data.

  • EXCEPT

    • Introduces EXCEPT, which lets you specify the names of one or more columns to exclude from the result. All matching column names are omitted from the output, which is useful for tailoring query results by excluding specific columns.

  • REGEXP_EXTRACT_ALL

    • Extracts all occurrences of a regular expression pattern from a string, providing a comprehensive view of all matching substrings.

  • REGEXP_CONTAINS

    • Checks if a string contains a specified regular expression pattern, offering a flexible and efficient way to identify pattern presence.

  • Subscript operator (OFFSET / SAFE_OFFSET)

    • Retrieves an element from an array at a specified index, or a default value if the index is out of bounds, ensuring safe and controlled access to array elements.

  • ELEMENT_AT

    • Retrieves an element from an array at a specified index or a default value if the index is out of bounds, providing a convenient way to access array elements.

  • SIZE

    • Returns the number of elements in an array, offering a quick and straightforward way to determine the size of arrays.

  • LOCATE

    • Returns the position of the first occurrence of a substring in a string, simplifying substring search operations.

  • CONTAINS_SUBSTR

    • Checks if a string contains a specified substring, providing a straightforward way to determine the presence of substrings in a string.

  • INSTR

    • Returns the position of the first occurrence of a substring in a string, enhancing substring search capabilities.

  • SOUNDEX

    • Computes a phonetic representation of a string, enabling fuzzy string matching based on similar sounds.

  • PARSE_DATE

    • Parses a string into a date, ensuring accurate conversion of date-related information in your queries.

  • PARSE_DATETIME

    • Parses a string into a datetime, facilitating precise conversion of date and time information in your queries.

  • PARSE_TIMESTAMP

    • Parses a string into a timestamp, providing a reliable way to convert string representations into timestamp data for analysis.
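Because the descriptions above are brief, the following Python sketch models the behaviour of a few of these functions. It assumes BigQuery-style semantics: an inclusive end date and a step in days for GENERATE_DATE_ARRAY, a zero-based OFFSET that fails when out of bounds while SAFE_OFFSET returns NULL, and 1-based INSTR positions where 0 means "not found". It also assumes the standard American Soundex rules. The Python functions are illustrative stand-ins, not e6data code; check the documentation for exact argument order and edge cases.

```python
import re
from datetime import date, timedelta
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def generate_date_array(start: date, end: date, step_days: int = 1) -> List[date]:
    """GENERATE_DATE_ARRAY-like: dates from start to end inclusive (positive steps only)."""
    if step_days <= 0:
        raise ValueError("this sketch only supports positive steps")
    out, cur = [], start
    while cur <= end:
        out.append(cur)
        cur += timedelta(days=step_days)
    return out


def regexp_count(text: str, pattern: str) -> int:
    """REGEXP_COUNT-like: number of non-overlapping matches of pattern in text."""
    return sum(1 for _ in re.finditer(pattern, text))


def offset(arr: Sequence[T], i: int) -> T:
    """arr[OFFSET(i)]-like: zero-based, raises when i is out of bounds."""
    if not 0 <= i < len(arr):
        raise IndexError(f"offset {i} out of bounds for array of length {len(arr)}")
    return arr[i]


def safe_offset(arr: Sequence[T], i: int) -> Optional[T]:
    """arr[SAFE_OFFSET(i)]-like: zero-based, returns None (SQL NULL) when out of bounds."""
    return arr[i] if 0 <= i < len(arr) else None


def instr(source: str, search: str) -> int:
    """INSTR-like: 1-based position of the first occurrence of search, or 0 if absent."""
    return source.find(search) + 1


# American Soundex digit for each consonant; vowels, Y, H and W have no digit.
_SOUNDEX_CODES = {
    ch: digit
    for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                           "4": "L", "5": "MN", "6": "R"}.items()
    for ch in letters
}


def soundex(s: str) -> str:
    """SOUNDEX-like: first letter plus three digits, padded with zeros."""
    letters = [c for c in s.upper() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0]]
    prev = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX_CODES.get(c, "")
        # Skip letters without a digit and repeats of the previous digit.
        if code and code != prev:
            result.append(code)
        # H and W do not separate equal digits; vowels do (they reset prev).
        if c not in "HW":
            prev = code
    return ("".join(result) + "000")[:4]


if __name__ == "__main__":
    print(generate_date_array(date(2024, 1, 1), date(2024, 1, 7), 3))  # Jan 1, Jan 4, Jan 7
    print(regexp_count("a1b22c333", r"\d+"))   # 3
    print(safe_offset([10, 20, 30], 5))        # None
    print(instr("e6data platform", "data"))    # 3
    print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Matching Soundex codes for "Robert" and "Rupert" show the fuzzy, sound-based matching that SOUNDEX enables.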

Pip Package versions

The pip package has been updated to support the latest build versions. The latest version is 2.1.7. For more information, please refer to the documentation.

JDBC Driver versions

The JDBC driver has been updated. For more information, please refer to the documentation. Download the latest version of the e6data JDBC driver: https://e6-jdbc-repository.s3.amazonaws.com/e6data-jdbc-1.1.24.jar

Please note that the latest versions of the Python and JDBC clients are required to interact with the latest version of the engine and platform.
