Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: Segfault while running a query that reads from compressed chunks #6925

Closed
feikesteenbergen opened this issue May 16, 2024 · 2 comments · Fixed by #6957
Closed

[Bug]: Segfault while running a query that reads from compressed chunks #6925

feikesteenbergen opened this issue May 16, 2024 · 2 comments · Fixed by #6957
Assignees
Labels

Comments

@feikesteenbergen
Copy link
Member

feikesteenbergen commented May 16, 2024

What type of bug is this?

Crash

What subsystems and features are affected?

Compression

What happened?

While running a query the database crashed with the following message:

TimescaleDB: 2024-05-16 13:53:42 UTC [79]: [664507f1.4f-27] @,app= [00000] LOG: server process (PID 539102) was terminated by signal 11: Segmentation fault

TimescaleDB version affected

2.15.0

PostgreSQL version used

16.3

What operating system did you use?

Docker based on Ubuntu x64

What installation method did you use?

Other

What platform did you run on?

Timescale Cloud

Relevant log output and stack trace

`gdb bt`

Core was generated by `postgres: s00hsv3wqq-an: grafana tsdb 172.20.91.123(29143) SELECT             '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  build_compressed_scan_pathkeys (info=0x55badfa594e0, chunk_pathkeys=0x55badfa3a7e8, root=0x55badfa159c0, sort_info=<synthetic pointer>) at /build/timescaledb/tsl/src/nodes/decompress_chunk/decompress_chunk.c:202
202				segmentby_columns =
(gdb) bt
#0  build_compressed_scan_pathkeys (info=0x55badfa594e0, chunk_pathkeys=0x55badfa3a7e8, root=0x55badfa159c0, sort_info=<synthetic pointer>) at /build/timescaledb/tsl/src/nodes/decompress_chunk/decompress_chunk.c:202
#1  create_compressed_scan_paths (sort_info=<synthetic pointer>, info=0x55badfa594e0, compressed_rel=0x55badfa538a0, root=0x55badfa159c0) at /build/timescaledb/tsl/src/nodes/decompress_chunk/decompress_chunk.c:1847
#2  ts_decompress_chunk_generate_paths (root=0x55badfa159c0, chunk_rel=0x55badfa49e08, ht=<optimized out>, chunk=0x55badfa3d658) at /build/timescaledb/tsl/src/nodes/decompress_chunk/decompress_chunk.c:772
#3  0x00007f457aff684a in apply_optimizations (ht=<optimized out>, rte=<optimized out>, rel=0x55badfa49e08, reltype=<optimized out>, root=<optimized out>) at /build/timescaledb/src/planner/planner.c:1074
#4  timescaledb_set_rel_pathlist (root=<optimized out>, rel=0x55badfa49e08, rti=<optimized out>, rte=<optimized out>) at /build/timescaledb/src/planner/planner.c:1229
#5  0x00007f457aff69df in set_rel_pathlist (rte=0x55badfa437a8, rti=7, rel=0x55badfa49e08, root=<optimized out>) at /build/timescaledb/src/import/allpaths.c:285
#6  ts_set_append_rel_pathlist (parent_rte=0x55badfa2d808, parent_rt_index=4, parent_rel=0x55badfa3a9f0, root=<optimized out>) at /build/timescaledb/src/import/allpaths.c:209
#7  reenable_inheritance (rte=<optimized out>, rti=<optimized out>, rel=<optimized out>, root=<optimized out>) at /build/timescaledb/src/planner/planner.c:1050
#8  timescaledb_set_rel_pathlist (root=<optimized out>, rel=0x55badfa3a9f0, rti=4, rte=0x55badfa2d808) at /build/timescaledb/src/planner/planner.c:1179
#9  0x000055badea385ce in set_rel_pathlist (root=0x55badfa159c0, rel=0x55badfa3a9f0, rti=4, rte=0x55badfa2d808) at optimizer/path/./build/../src/backend/optimizer/path/allpaths.c:542
#10 0x000055badea386a1 in set_append_rel_pathlist (rte=0x55badfa0db08, rti=1, rel=0x55badfa39468, root=0x55badfa159c0) at optimizer/path/./build/../src/backend/optimizer/path/allpaths.c:1274
#11 set_rel_pathlist (root=0x55badfa159c0, rel=0x55badfa39468, rti=1, rte=0x55badfa0db08) at optimizer/path/./build/../src/backend/optimizer/path/allpaths.c:482
#12 0x000055badea423ca in set_base_rel_pathlists (root=0x55badfa159c0) at optimizer/path/./build/../src/backend/optimizer/path/allpaths.c:354
#13 make_one_rel (root=0x55badfa159c0, joinlist=0x55badfa3a010) at optimizer/path/./build/../src/backend/optimizer/path/allpaths.c:224
#14 0x000055badea628b7 in query_planner (root=0x55badfa159c0, qp_callback=0x55badea6a4e0 <standard_qp_callback>, qp_extra=0x7ffd14092da0) at optimizer/plan/./build/../src/backend/optimizer/plan/planmain.c:278
#15 0x000055badea6508f in grouping_planner (tuple_fraction=<optimized out>, root=<optimized out>) at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:1495
#16 subquery_planner (glob=<optimized out>, parse=<optimized out>, parent_root=<optimized out>, hasRecursion=<optimized out>, tuple_fraction=<optimized out>) at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:1064
#17 0x000055badea4188b in set_subquery_pathlist (rte=<optimized out>, rti=1, rel=<optimized out>, root=<optimized out>) at optimizer/path/./build/../src/backend/optimizer/path/allpaths.c:2649
#18 set_rel_size (root=<optimized out>, rel=<optimized out>, rti=1, rte=<optimized out>) at optimizer/path/./build/../src/backend/optimizer/path/allpaths.c:424
#19 0x000055badea422b8 in set_base_rel_sizes (root=<optimized out>) at optimizer/path/./build/../src/backend/optimizer/path/allpaths.c:325
#20 make_one_rel (root=0x55badf9fa248, joinlist=0x55badfa0c268) at optimizer/path/./build/../src/backend/optimizer/path/allpaths.c:186
#21 0x000055badea628b7 in query_planner (root=0x55badf9fa248, qp_callback=0x55badea6a4e0 <standard_qp_callback>, qp_extra=0x7ffd14093130) at optimizer/plan/./build/../src/backend/optimizer/plan/planmain.c:278
#22 0x000055badea6508f in grouping_planner (tuple_fraction=<optimized out>, root=<optimized out>) at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:1495
#23 subquery_planner (glob=<optimized out>, parse=<optimized out>, parent_root=<optimized out>, hasRecursion=<optimized out>, tuple_fraction=<optimized out>) at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:1064
#24 0x000055badea66e71 in standard_planner (parse=0x55badf957680, query_string=<optimized out>, cursorOptions=2048, boundParams=<optimized out>) at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:413
#25 0x00007f4798f62ac9 in pgss_planner (parse=parse@entry=0x55badf957680, query_string=query_string@entry=0x55badf794180 "WITH base AS (\n    SELECT\n        queryid,\n        total_plan_time - lag(total_plan_time) OVER (w) AS delta_total_plan_time,\n        total_exec_time - lag(total_exec_time) OVER (w) AS delta_total_exec"...,
    cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) at ./build/../contrib/pg_stat_statements/pg_stat_statements.c:918
#26 0x00007f4798f3f0f1 in pgss_planner (parse=0x55badf957680, query_string=0x55badf794180 "WITH base AS (\n    SELECT\n        queryid,\n        total_plan_time - lag(total_plan_time) OVER (w) AS delta_total_plan_time,\n        total_exec_time - lag(total_exec_time) OVER (w) AS delta_total_exec"..., cursorOptions=2048, boundParams=0x0)
    at /build/ts_stat_statements/ts_stat_statements.c:1010
#27 0x00007f457aff15d3 in timescaledb_planner (parse=0x55badf957680, query_string=0x55badf794180 "WITH base AS (\n    SELECT\n        queryid,\n        total_plan_time - lag(total_plan_time) OVER (w) AS delta_total_plan_time,\n        total_exec_time - lag(total_exec_time) OVER (w) AS delta_total_exec"..., cursor_opts=2048, bound_params=0x0)
    at /build/timescaledb/src/planner/planner.c:540
#28 0x000055badeb47449 in planner (boundParams=<optimized out>, cursorOptions=<optimized out>, query_string=0x55badf794180 "WITH base AS (\n    SELECT\n        queryid,\n        total_plan_time - lag(total_plan_time) OVER (w) AS delta_total_plan_time,\n        total_exec_time - lag(total_exec_time) OVER (w) AS delta_total_exec"..., parse=0x55badf957680)
    at optimizer/plan/./build/../src/backend/optimizer/plan/planner.c:279
#29 pg_plan_query (querytree=0x55badf957680, query_string=query_string@entry=0x55badf794180 "WITH base AS (\n    SELECT\n        queryid,\n        total_plan_time - lag(total_plan_time) OVER (w) AS delta_total_plan_time,\n        total_exec_time - lag(total_exec_time) OVER (w) AS delta_total_exec"..., cursorOptions=cursorOptions@entry=2048,
    boundParams=boundParams@entry=0x0) at tcop/./build/../src/backend/tcop/postgres.c:902
#30 0x000055badeb4753a in pg_plan_queries (querytrees=0x55badfad9340, query_string=query_string@entry=0x55badf794180 "WITH base AS (\n    SELECT\n        queryid,\n        total_plan_time - lag(total_plan_time) OVER (w) AS delta_total_plan_time,\n        total_exec_time - lag(total_exec_time) OVER (w) AS delta_total_exec"...,
    cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) at tcop/./build/../src/backend/tcop/postgres.c:994
#31 0x000055badeb4bd95 in exec_simple_query (query_string=0x55badf794180 "WITH base AS (\n    SELECT\n        queryid,\n        total_plan_time - lag(total_plan_time) OVER (w) AS delta_total_plan_time,\n        total_exec_time - lag(total_exec_time) OVER (w) AS delta_total_exec"...) at tcop/./build/../src/backend/tcop/postgres.c:1191
#32 0x000055badeb4ebf7 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at tcop/./build/../src/backend/tcop/postgres.c:4654
#33 0x000055badeab6d65 in BackendRun (port=0x55badf80d450, port=0x55badf80d450) at postmaster/./build/../src/backend/postmaster/postmaster.c:4464
#34 BackendStartup (port=0x55badf80d450) at postmaster/./build/../src/backend/postmaster/postmaster.c:4192
#35 ServerLoop () at postmaster/./build/../src/backend/postmaster/postmaster.c:1782
#36 0x000055badeaad310 in PostmasterMain (argc=17, argv=<optimized out>) at postmaster/./build/../src/backend/postmaster/postmaster.c:1466
#37 0x000055bade772b6d in main (argc=17, argv=0x55badf6fe640) at main/./build/../src/backend/main/main.c:198


### How can we reproduce the bug?

```bash
Talk to me ;-)
WITH base AS (
    SELECT
        queryid,
        total_plan_time - lag(total_plan_time) OVER (w) AS delta_total_plan_time,
        total_exec_time - lag(total_exec_time) OVER (w) AS delta_total_exec_time,
        calls - lag(calls) OVER (w) AS delta_calls,
        shared_blks_hit - lag(shared_blks_hit) OVER (w) AS delta_shared_blks_hit,
        shared_blks_read - lag(shared_blks_read) OVER (w) AS delta_shared_blks_read,
        shared_blks_written - lag(shared_blks_written) OVER (w) AS delta_shared_blks_written,
        blk_read_time - lag(blk_read_time) OVER (w) AS delta_blk_read_time,
        blk_write_time - lag(blk_write_time) OVER (w) AS delta_blk_write_time,
        extract('epoch' from observed - lag(observed) OVER (w)) AS delta_seconds
    FROM
        public.stat_statements
    WHERE
        db_id = 7401
        AND invalidated >= now() - interval '90 days'
        AND observed <= now()
    WINDOW
        w AS (PARTITION BY queryid ORDER BY observed)
)
SELECT
    dense_rank() OVER (
        ORDER BY (coalesce(sum(delta_total_plan_time), 0)
                  +coalesce(sum(delta_total_exec_time), 0))/sum(delta_seconds) DESC, queryid
                 ) AS rank,
    queryid::text,
    sum(delta_total_plan_time)/sum(delta_seconds) AS plan_time,
    sum(delta_total_exec_time)/sum(delta_seconds) AS exec_time,
    (coalesce(sum(delta_total_plan_time), 0)
     +coalesce(sum(delta_total_exec_time), 0))/sum(delta_seconds) AS total_time,
    sum(delta_calls)/sum(delta_seconds) AS calls,
    sum(delta_shared_blks_hit)/sum(delta_seconds)*8192.0 AS shared_bytes_hit,
    sum(delta_shared_blks_read)/sum(delta_seconds)*8192.0 AS shared_bytes_read,
    sum(delta_shared_blks_written)/sum(delta_seconds)*8192.0 AS shared_blks_written,
    sum(delta_blk_read_time)/sum(delta_seconds)*8192.0 AS blk_read_time,
    sum(delta_blk_write_time)/sum(delta_seconds)*8192.0 AS blk_write_time,
    query
FROM
    base
LEFT JOIN
    public.stat_statements_queries USING (queryid)
WHERE
    delta_calls >= 0
    AND delta_seconds >= 0
GROUP BY
    queryid,
    query
ORDER BY
    (coalesce(sum(delta_total_plan_time), 0)
     +coalesce(sum(delta_total_exec_time), 0))/sum(delta_seconds) DESC,
     queryid ASC
LIMIT
    500;
@svenklemm
Copy link
Member

svenklemm commented May 18, 2024

Boiled down query:

SELECT queryid, lag(total_exec_time) OVER (PARTITION BY queryid)
FROM public.stat_statements
WHERE db_id = 7401 AND observed >= now() - interval '1 day' LIMIT 10;

Resolving the view we get:

SELECT queryid, lag(total_exec_time) OVER (PARTITION BY queryid)
FROM (
SELECT stat_statements.db_id,
    stat_statements.queryid,
    stat_statements.observed,
    stat_statements.total_exec_time
   FROM history.stat_statements
UNION ALL
 SELECT stat_statements.db_id,
    stat_statements.queryid,
    stat_statements.observed,
    stat_statements.total_exec_time
   FROM current.stat_statements
WHERE false
) v
WHERE db_id = 7401 AND observed >= now() - interval '1 day' LIMIT 10;

@feikesteenbergen
Copy link
Member Author

Applied 2.15.1 to the database; confirmed that the segfaults no longer occur.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants