bug: After a chain join, executing mutate causes an error #9154
Thanks for reporting, @stereoF! Could you provide some sample tables as text that we can use to help debug this? Thanks!
┏━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┓
┃ user_id ┃ account_id ┃ id2    ┃
┡━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━┩
│ int64   │ string     │ string │
├─────────┼────────────┼────────┤
│       1 │ 1          │ 1      │
│       2 │ 2          │ 2      │
│       3 │ 3          │ 3      │
│       4 │ 4          │ 4      │
└─────────┴────────────┴────────┘
Traceback (most recent call last):
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/types/core.py:99 in __rich_console__
  ❱ 99   rich_object = to_rich(self, console_width=console_width)
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/types/pretty.py:271 in to_rich
  ❱ 271  return _to_rich_table(
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/types/pretty.py:342 in _to_rich_table
  ❱ 342  result = table.limit(max_rows + 1).to_pyarrow()
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/types/core.py:486 in to_pyarrow
  ❱ 486  return self._find_backend(use_default=True).to_pyarrow(
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/backends/clickhouse/__init__.py:290 in to_pyarrow
  ❱ 290  table = reader.read_all()
in pyarrow.lib.RecordBatchReader.read_all:757
in pyarrow.lib._datatype_to_pep3118:88
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/backends/clickhouse/__init__.py:357 in batcher
  ❱ 357  with self.con.query_column_block_stream(
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_connect/driver/client.py:212 in query_column_block_stream
  ❱ 212  return self._context_query(locals(), use_numpy=False, streaming=True).column_blo
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_connect/driver/client.py:721 in _context_query
  ❱ 721  return self._query_with_context((self.create_query_context(**kwargs)))
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_connect/driver/httpclient.py:213 in _query_with_context
  ❱ 213  response = self._raw_request(body,
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_connect/driver/httpclient.py:437 in _raw_request
  ❱ 437  self._error_handler(response)
/home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_connect/driver/httpclient.py:361 in _error_handler
  ❱ 361  raise OperationalError(err_str) if retried else DatabaseError(err_str) from None
DatabaseError: :HTTPDriver for http://cc-j6c9l15edn66o817k.public.clickhouse.ads.aliyuncs.com:8123 returned response code 404)
 Code: 47. DB::Exception: There's no column 't8.user_id' in table 't8': While processing t8.user_id. (UNKNOWN_IDENTIFIER) (version 22.8.5.29)
@gforsyth Not only mutate: grouping and other operations also raise this kind of error after a chain join.
@gforsyth Hi, is there a temporary workaround until this is fixed?
Hi @stereoF -- it's not clear to me that there's a bug here, except that maybe we should be raising an error message? This line:
isn't going to work because
Sorry, that was a clerical error. But it raises the same error when using
DatabaseError: :HTTPDriver for http://cc-j6c9l15edn66o817k.public.clickhouse.ads.aliyuncs.com:8123 returned response code 404)
 Code: 47. DB::Exception: There's no column 't8.user_id' in table 't8': While processing t8.user_id. (UNKNOWN_IDENTIFIER) (version 22.8.5.29)
Hey @stereoF -- sorry for the delay, we were at PyCon last week and it was pretty busy. This is the SQL we're generating for ClickHouse for this mutate:
[ins] In [62]: expr = df.mutate(id2=df.account_id)
[ins] In [63]: ibis.to_sql(expr)
Out[63]:
SELECT
"t8"."user_id",
"t8"."account_id",
"t8"."account_id_right",
"t8"."user_id_shift_7d",
"t8"."account_id_shift_7d",
"t8"."user_id_shift_7d_right",
"t8"."account_id_shift_7d_right",
"t8"."account_id" AS "id2"
FROM (
SELECT
"t4"."user_id",
"t4"."account_id",
"t5"."account_id" AS "account_id_right",
"t6"."user_id_shift_7d",
"t6"."account_id_shift_7d",
"t7"."user_id_shift_7d" AS "user_id_shift_7d_right",
"t7"."account_id_shift_7d" AS "account_id_shift_7d_right"
FROM "ibis_pandas_memtable_ctp55e5sc5cxfkyb7moevbl6f4" AS "t4"
INNER JOIN "ibis_pandas_memtable_7eg6rbbl2rf75a5sweeerkinuu" AS "t5"
ON "t4"."user_id" = "t5"."user_id"
LEFT OUTER JOIN "ibis_pandas_memtable_etbsqsk6rrfr5njw5yzdgnpc7e" AS "t6"
ON "t4"."user_id" = "t6"."user_id_shift_7d"
LEFT OUTER JOIN "ibis_pandas_memtable_p3vq4uorpbgppkkj24ohpsb3ti" AS "t7"
ON "t4"."user_id" = "t7"."user_id_shift_7d"
) AS "t8"
I'm going to have to do some reading, but I know ClickHouse has uncommon behavior around subqueries, and I suspect that's the core of the issue here.
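One way to check the suspicion that the failure is ClickHouse-specific, rather than a malformed query, is to run the same query shape on a reference engine. The sketch below uses Python's built-in `sqlite3` module. The table names `base`, `accounts`, `shift_a` and `shift_b` are hypothetical stand-ins for the four ibis memtables, and every table is filled with the sample rows posted earlier in the thread (`user_id` 1 to 4). If SQLite resolves `"t8"."user_id"`, that only shows the SQL is valid in general; it does not reproduce ClickHouse 22.8's subquery handling.

```python
import sqlite3

# Hypothetical sample rows, reusing the values from the table posted in the thread.
ROWS = [(i, str(i)) for i in range(1, 5)]

# Same shape as the SQL ibis generated for ClickHouse: a chain of joins wrapped
# in a subquery aliased "t8", with the outer SELECT qualifying every column by "t8".
# Table names here are hypothetical stand-ins for the ibis memtables.
QUERY = """
SELECT
  "t8"."user_id",
  "t8"."account_id",
  "t8"."account_id_right",
  "t8"."user_id_shift_7d",
  "t8"."account_id_shift_7d",
  "t8"."user_id_shift_7d_right",
  "t8"."account_id_shift_7d_right",
  "t8"."account_id" AS "id2"
FROM (
  SELECT
    "t4"."user_id",
    "t4"."account_id",
    "t5"."account_id" AS "account_id_right",
    "t6"."user_id_shift_7d",
    "t6"."account_id_shift_7d",
    "t7"."user_id_shift_7d" AS "user_id_shift_7d_right",
    "t7"."account_id_shift_7d" AS "account_id_shift_7d_right"
  FROM "base" AS "t4"
  INNER JOIN "accounts" AS "t5"
    ON "t4"."user_id" = "t5"."user_id"
  LEFT OUTER JOIN "shift_a" AS "t6"
    ON "t4"."user_id" = "t6"."user_id_shift_7d"
  LEFT OUTER JOIN "shift_b" AS "t7"
    ON "t4"."user_id" = "t7"."user_id_shift_7d"
) AS "t8"
"""


def run_reference_query():
    """Create the stand-in tables in memory, run QUERY, return (column_names, rows)."""
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE "base" (user_id INTEGER, account_id TEXT)')
    con.execute('CREATE TABLE "accounts" (user_id INTEGER, account_id TEXT)')
    for name in ("shift_a", "shift_b"):
        con.execute(
            f'CREATE TABLE "{name}" (user_id_shift_7d INTEGER, account_id_shift_7d TEXT)'
        )
    for name in ("base", "accounts", "shift_a", "shift_b"):
        con.executemany(f'INSERT INTO "{name}" VALUES (?, ?)', ROWS)
    cur = con.execute(QUERY)
    columns = [d[0] for d in cur.description]
    rows = cur.fetchall()
    con.close()
    return columns, rows


if __name__ == "__main__":
    cols, rows = run_reference_query()
    print(cols)
    for row in sorted(rows):
        print(row)
```

SQLite was chosen only because it ships with Python and needs no server; swapping in another engine would test the same question.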
What happened?
After a chain join, executing mutate raises:
DatabaseError: :HTTPDriver for http://cc-j6c9l15edn66o817k.public.clickhouse.ads.aliyuncs.com:8123/ returned response code 404)
Code: 47. DB::Exception: There's no column 't10.user_id' in table 't10': While processing t10.user_id. (UNKNOWN_IDENTIFIER) (version 22.8.5.29)
What version of ibis are you using?
9.0.0
What backend(s) are you using, if any?
clickhouse
Relevant log output
No response