After using projection, the same query statement has different values and cannot be hit. #63928

liubaochuan · 2024-05-16T02:51:53Z

Make sure to check documentation https://clickhouse.com/docs/en/ first. If the question is concise and probably has a short answer, asking it in community Slack is probably the fastest way to find the answer. For more complicated questions, consider asking them on StackOverflow with "clickhouse" tag https://stackoverflow.com/questions/tagged/clickhouse

If you still prefer GitHub issues, remove all this text and ask your question here.

clickhouse version: 23.11.1.2002

A:
sql explain:
explain PLAN select SEQ_ID from ds2.CW_SNAP_FEATURE_Face where SEQ_ID > 10000 order by SEQ_ID asc limit 1;
explain result:
Expression (Projection)
Limit (preliminary LIMIT (without OFFSET))
Sorting (Sorting for ORDER BY)
Expression (Before ORDER BY)
Filter (WHERE)
ReadFromMergeTree (ds2.CW_SNAP_FEATURE_Face)

B:
sql explain:
explain PLAN select SEQ_ID from ds2.CW_SNAP_FEATURE_Face where SEQ_ID > 1000000 order by SEQ_ID asc limit 1;
explain result:
Expression (Projection)
Limit (preliminary LIMIT (without OFFSET))
Sorting (Sorting for ORDER BY)
Filter
ReadFromMergeTree (prj_seq_id_group_id)

Projection:
ALTER TABLE ds2.CW_SNAP_FEATURE_Face
ADD PROJECTION prj_seq_id_group_id
(
SELECT *
ORDER BY SEQ_ID
);

The different values of SEQ_ID cause A to fail to hit the projection。

konsta-danyliuk · 2024-05-16T07:15:58Z

Hi @liubaochuan,
Would you mind sharing the DDL for the ds2.CW_SNAP_FEATURE_Face table?

liubaochuan · 2024-05-16T07:18:02Z

@konsta-danyliuk

CREATE TABLE ds2.CW_SNAP_FEATURE_Face
(

`SEQ_ID` Int64 DEFAULT toUnixTimestamp64Nano(now64()),

`FEATURE_ID` Int64,

`GROUP_ID_INT32` Int32,

`FEATURE` String,

`SHORT_FEATURE` String,

`IMAGE_URL` String,

`QUALITY_SCORE` String,

`QUALITY_SCORE_INT64` Int64,

`CAPTURE_TIME` Int64,

`CREATE_TIME` Int64,

`ASSOCIATED_OBJECT_ID` Int64,

`ASSOCIATED_OBJECT_IDS` String,

`LABEL` Int64,

`TRACKID` Int64,

`TRACK_BTIME` Int64,

`TRACK_ETIME` Int64,

`IS_SPARSE` Int8 DEFAULT '0',

`EXT_DATA` String,

`ATTRIBUTE` String,

INDEX CREATETIME_INDEX CREATE_TIME TYPE minmax GRANULARITY 8192,

INDEX LABEL_INDEX LABEL TYPE minmax GRANULARITY 8192,

INDEX SEQ_ID_INDEX1 SEQ_ID TYPE minmax GRANULARITY 1,

PROJECTION prj_seq_id_group_id
(
    SELECT *
    ORDER BY SEQ_ID
)

)
ENGINE = MergeTree
PRIMARY KEY (GROUP_ID_INT32,
SEQ_ID)
ORDER BY (GROUP_ID_INT32,
SEQ_ID)
SETTINGS index_granularity = 8192;

konsta-danyliuk · 2024-05-16T10:45:09Z

Hi @liubaochuan,
In general, the SQL engine uses its logic to decide which execution path to take in each case, depending on the query conditions and data. I'm also thinking whether the type of the SEQ_ID column Int64 DEFAULT toUnixTimestamp64Nano(now64()) has an impact here.

Do you see a performance difference between queries A and B?
I'm curious what would be the results of the EXPLAIN PIPELINE and EXPLAIN indexes=1 statements for both queries.

liubaochuan · 2024-05-17T07:42:23Z

@konsta-danyliuk

EXPLAIN PIPELINE Result:
A:
(Expression)
ExpressionTransform
(Limit)
Limit
(Sorting)
MergingSortedTransform 72 → 1
MergeSortingTransform × 72
LimitsCheckingTransform × 72
PartialSortingTransform × 72
(Expression)
ExpressionTransform × 72
(Filter)
FilterTransform × 72
(ReadFromMergeTree)
MergeTreeSelect(pool: ReadPool, algorithm: Thread) × 72 0 → 1

B:
(Expression)
ExpressionTransform
(Limit)
Limit
(Sorting)
MergingSortedTransform 85 → 1
(Filter)
FilterTransform × 85
(ReadFromMergeTree)
MergeTreeSelect(pool: ReadPoolInOrder, algorithm: InOrder) × 85 0 → 1

B query is 20 times faster than A query。

konsta-danyliuk · 2024-05-17T07:55:36Z

Wow! x20 sounds interesting.
It seems you only ran EXPLAIN PIPELINE, would you share EXPLAIN indexes=1results as well? I'm curious what is the difference in the data these queries read.

liubaochuan · 2024-05-17T07:59:00Z

@konsta-danyliuk

EXPLAIN indexes=1 Result:
A:
Expression (Projection)
Limit (preliminary LIMIT (without OFFSET))
Sorting (Sorting for ORDER BY)
Expression (Before ORDER BY)
Filter (WHERE)
ReadFromMergeTree (ds2.CW_SNAP_FEATURE_Face)
Indexes:
PrimaryKey
Keys:
SEQ_ID
Condition: (SEQ_ID in [10001, +Inf))
Parts: 14/14
Granules: 82901/82901
Skip
Name: SEQ_ID_INDEX1
Description: minmax GRANULARITY 1
Parts: 14/14
Granules: 82901/82901

B:
Expression (Projection)
Limit (preliminary LIMIT (without OFFSET))
Sorting (Sorting for ORDER BY)
Filter
ReadFromMergeTree (prj_seq_id_group_id)
Indexes:
PrimaryKey
Keys:
SEQ_ID
Condition: (SEQ_ID in [1000001, +Inf))
Parts: 14/14
Granules: 82782/82901

liubaochuan · 2024-05-20T09:51:06Z

@konsta-danyliuk
Is there any solution to this problem?

konsta-danyliuk · 2024-05-21T12:41:23Z

Hi @liubaochuan.

What does your data look like? Are there many records where SEQ_ID is less than 10000?
Could you also let me know how many records return both queries? I would be interested in the following part.

X rows in set. Elapsed: XXXX sec. Processed XXXXX thousand rows, XXX MB
Peak memory usage: XXXXX MiB.

liubaochuan added the question Question? label May 16, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

After using projection, the same query statement has different values and cannot be hit. #63928

After using projection, the same query statement has different values and cannot be hit. #63928

liubaochuan commented May 16, 2024 •

edited

konsta-danyliuk commented May 16, 2024

liubaochuan commented May 16, 2024 •

edited

konsta-danyliuk commented May 16, 2024

liubaochuan commented May 17, 2024

konsta-danyliuk commented May 17, 2024

liubaochuan commented May 17, 2024

liubaochuan commented May 20, 2024

konsta-danyliuk commented May 21, 2024

After using projection, the same query statement has different values ​​and cannot be hit. #63928

After using projection, the same query statement has different values ​​and cannot be hit. #63928

Comments

liubaochuan commented May 16, 2024 • edited

konsta-danyliuk commented May 16, 2024

liubaochuan commented May 16, 2024 • edited

konsta-danyliuk commented May 16, 2024

liubaochuan commented May 17, 2024

konsta-danyliuk commented May 17, 2024

liubaochuan commented May 17, 2024

liubaochuan commented May 20, 2024

konsta-danyliuk commented May 21, 2024

After using projection, the same query statement has different values and cannot be hit. #63928

After using projection, the same query statement has different values and cannot be hit. #63928

liubaochuan commented May 16, 2024 •

edited

liubaochuan commented May 16, 2024 •

edited