feat(core): serialise table partitions to the Parquet format #4460

Draft
wants to merge 109 commits into
base: master

eugenels
Contributor

@eugenels eugenels commented Apr 30, 2024

Integrates native reading and writing of the parquet file format into QuestDB.

List of tasks:

  • Write all column types into parquet format
  • Encode QuestDB nulls as parquet nulls for all the data types
  • Support parquet encodings: PLAIN for major data types, RleDictionary for SYMBOLs, DeltaBinaryPacked for TIMESTAMP
  • Decide whether generic compression support is a must for the initial delivery (yes)
  • Include column ids into written parquet file (@puzpuzpuz)
  • Storage support for converting existing partitions into parquet files (column tops) @mtopolnik
  • SQL support to convert partitions to parquet format
  • Add SQL table function to query parquet files from anywhere in the local file system @ideoma
  • Integrate parquet file reading into existing PageFrame cursor factories using lazy Row Group decompression into memory buffers @puzpuzpuz
  • Read selected parquet columns using queries without filtering like SELECT col1 FROM parquet_table
  • Read parquet using queries with designated timestamp filters like SELECT col1 FROM parquet_table WHERE timestamp in '2024-05-22'
  • Read parquet files with other filters like SELECT col1, col3 FROM parquet_table WHERE timestamp in '2024-05-22' AND col2 > 1
  • Query tables with mixed parquet and non-parquet partitions
  • Support appending new rows to parquet files
  • Support O3 on partitions encoded in parquet
  • Support queries with parquet partitions with ORDER BY clause
  • Support queries with parquet partitions with GROUP BY clause
  • Support queries with parquet partitions with LATEST BY clause
  • Support other row factories with parquet storage
  • Support push-down filters to skip row groups by using parquet statistics
  • Support parquet data transfer pass-through to the query clients without decompression
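
The push-down item in the list above (skipping row groups by using parquet statistics) can be illustrated with a minimal sketch. It assumes each row group exposes min/max statistics for the designated timestamp column. `RowGroupStats` and `row_groups_to_read` are hypothetical names chosen for this example and are not taken from the PR or from QuestDB's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowGroupStats:
    """Hypothetical per-row-group min/max statistics for a timestamp column."""
    index: int
    min_ts: int
    max_ts: int


def row_groups_to_read(stats, lo, hi):
    """Return indices of row groups whose [min_ts, max_ts] overlaps [lo, hi].

    Row groups that cannot contain a matching row are skipped without
    being decompressed.
    """
    return [s.index for s in stats if s.max_ts >= lo and s.min_ts <= hi]


# Three row groups covering consecutive timestamp ranges (made-up values).
groups = [
    RowGroupStats(0, 0, 99),
    RowGroupStats(1, 100, 199),
    RowGroupStats(2, 200, 299),
]
selected = row_groups_to_read(groups, 150, 210)
```

With the made-up ranges above, only row groups 1 and 2 overlap the interval 150..210, so row group 0 would never be read.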

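As a rough illustration of why the list above pairs DeltaBinaryPacked with TIMESTAMP, the sketch below computes how many bits per value a single block of deltas would need. It is a simplification under stated assumptions: it ignores the block and miniblock layout and the header that the real parquet encoding defines, and `delta_block_stats` is a hypothetical helper, not part of this PR.

```python
def delta_block_stats(values):
    """Return (first_value, min_delta, bit_width) for one block of integers.

    Simplified view of delta encoding: store the first value, then each
    delta minus the smallest delta, packed at a fixed bit width.
    """
    if not values:
        raise ValueError("values must not be empty")
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return values[0], 0, 0
    min_delta = min(deltas)
    # The width depends on the spread of the deltas, not on their magnitude.
    widest = max(d - min_delta for d in deltas)
    return values[0], min_delta, widest.bit_length()


# Timestamps in microseconds, exactly one second apart (made-up values).
regular = [1_714_435_200_000_000 + i * 1_000_000 for i in range(8)]
# Timestamps with a few microseconds of jitter between rows.
jittered = [0, 1000, 2003, 3001]

regular_stats = delta_block_stats(regular)
jittered_stats = delta_block_stats(jittered)
```

Evenly spaced timestamps need zero bits per delta in this model, and slightly jittered ones need only a few, which is far less than the 64 bits a PLAIN-encoded value takes.
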
4 participants