9th December 2023

Summary

The 9th December 2023 release of e6data includes the following features & enhancements:

Plugin Enhancements

Performance Improvements in the e6data Python Connector

Version 2.0.0 of the e6data Python Connector has been released. The new version provides a significant boost in query performance.

Platform Enhancements

Performance Improvements in the Query Editor

The Query Editor now uses the latest Python connector, which improves query performance.

Engine Enhancements

Added Support for Deletion Vectors

The e6data engine now supports reading deletion vectors for enhanced storage optimization in Delta Lake tables. Deletion vectors allow DELETE operations to mark existing rows as removed without rewriting the entire Parquet file, improving performance and efficiency.
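The idea behind deletion vectors can be sketched in a few lines of Python. This is an illustrative model only, not the e6data engine's or Delta Lake's actual implementation: the function name `read_with_deletion_vector`, the sample rows, and the use of a plain set of row positions are all assumptions made for the example.

```python
# Illustrative model only: not e6data's or Delta Lake's real implementation.

def read_with_deletion_vector(rows, deleted_positions):
    """Return the rows whose position is not marked in the deletion vector."""
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]


# Rows as stored in a data file that is never rewritten (hypothetical sample data).
data_file = [
    {"id": "A1", "amount": 10},
    {"id": "B2", "amount": 20},
    {"id": "C3", "amount": 30},
]

# A DELETE of id = 'B2' records row position 1 instead of rewriting the file.
deletion_vector = {1}

# The reader filters out marked positions at query time.
visible_rows = read_with_deletion_vector(data_file, deletion_vector)
print([row["id"] for row in visible_rows])
```

The data file keeps all three rows; only the small set of marked positions changes on delete, which is why the write is cheap and why reads pay a filtering cost that grows with the number of deleted records.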

Limitations

  • Querying deleted rows with an IN clause may result in an error.

    • Example: SELECT * FROM "table_with_del_vector" WHERE id IN ('ABCXYZ');

  • Performance testing with deletion vectors covered up to 10 million deleted records. With a higher number of deleted records, query execution may slow down, with the added time ranging from a few milliseconds up to 1 second. Further optimizations are in the pipeline to resolve this.

Introduced DROP CACHE Command

The new DROP CACHE command lets users clear all files that the engine has cached in memory for the specified query.

Syntax

DROP CACHE <select * from table query>
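As a rough illustration of the syntax above, the following Python sketch builds a DROP CACHE statement as a string. It assumes that the angle brackets are placeholder notation and are not typed literally; the helper `build_drop_cache_statement` and the table name `sales_orders` are hypothetical and are not part of the e6data connector.

```python
def build_drop_cache_statement(table_name):
    """Build a DROP CACHE statement around a full-table SELECT (hypothetical helper)."""
    # Assumption: the angle brackets in the documented syntax mark a placeholder,
    # so the SELECT query follows DROP CACHE directly.
    return f"DROP CACHE SELECT * FROM {table_name}"


statement = build_drop_cache_statement("sales_orders")  # hypothetical table name
print(statement)  # prints: DROP CACHE SELECT * FROM sales_orders
```

The resulting string would then be submitted to the engine like any other SQL statement, for example through the Query Editor.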

Limitations

  • Only full/complete table cache drops are currently supported.

  • The DROP CACHE operation may fail in certain scenarios. Repeated execution of DROP CACHE can lead to a state where subsequent cache commands fail. A cluster restart may be required to recover from this state.

  • Issuing DROP CACHE on a table that doesn't exist is not handled gracefully. The error message incorrectly states "unsupported syntax," which may be misleading. Future updates will improve the error message for non-existent tables.

New SQL Functions Supported

The following functions have been added to enhance query capabilities:

  • WEEK: Extracts the week component from a given date or timestamp.

  • CONTAINS_SUBSTR: Checks if a string contains a specified substring and returns a boolean value.

  • INSTR: Returns the position of the first occurrence of a substring within a string.

  • LEFT: Returns a specified number of characters from the beginning (left) of a string.

  • REGEXP_CONTAINS: Checks if a string matches a regular expression pattern and returns a boolean value.

  • REGEXP_REPLACE: Replaces substrings matching a regular expression pattern with a specified replacement string.

  • SOUNDEX: Returns a phonetic representation (SOUNDEX code) of a string, which can be used to identify similar-sounding words.
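To make the descriptions above concrete, here are rough Python analogues of six of these functions. They are not the engine's implementations, and several details are assumptions based on common SQL conventions rather than on e6data's documentation: argument order, ISO week numbering for WEEK, case-sensitive matching for CONTAINS_SUBSTR, and 1-based positions with 0 for "not found" in INSTR. SOUNDEX is left out because its encoding rules vary between implementations.

```python
import re
from datetime import date


def week(d):
    # Assumption: ISO 8601 week numbering; the engine's convention may differ.
    return d.isocalendar()[1]


def contains_substr(s, sub):
    # Assumption: case-sensitive comparison.
    return sub in s


def instr(s, sub):
    # Assumption: 1-based position of the first occurrence, 0 when absent.
    return s.find(sub) + 1


def left(s, n):
    # First n characters of the string; a negative n yields an empty string here.
    return s[:max(n, 0)]


def regexp_contains(s, pattern):
    # Assumption: a partial match anywhere in the string counts.
    return re.search(pattern, s) is not None


def regexp_replace(s, pattern, replacement):
    # Replaces every non-overlapping match of the pattern.
    return re.sub(pattern, replacement, s)


print(week(date(2023, 12, 9)))                    # 49 under ISO numbering
print(instr("e6data", "data"))                    # 3
print(left("e6data", 2))                          # e6
print(regexp_replace("2023-12-09", "-", "/"))     # 2023/12/09
```

Check the engine's function reference for the exact signatures and edge-case behavior before relying on any of these conventions in a query.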
