Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug After a chain join, executing mutate will cause an error. #9154

Open
1 task done
stereoF opened this issue May 8, 2024 · 7 comments
Open
1 task done

bug After a chain join, executing mutate will cause an error. #9154

stereoF opened this issue May 8, 2024 · 7 comments
Labels
bug Incorrect behavior inside of ibis clickhouse The ClickHouse backend

Comments

@stereoF
Copy link

stereoF commented May 8, 2024

What happened?

image
image
image
image

after chain join

df = df_user.join(df_portrait, df_user.user_id == df_portrait.user_id, how='inner') \
            .join(df_user_shift_7d, _.user_id == df_user_shift_7d.user_id_shift_7d, how='left') \
            .join(df_portrait_shift_7d, _.user_id == df_portrait_shift_7d.user_id_shift_7d, how='left') 

image
image
image

DatabaseError: :HTTPDriver for http://cc-j6c9l15edn66o817k.public.clickhouse.ads.aliyuncs.com:8123/ returned response code 404)
Code: 47. DB::Exception: There's no column 't10.user_id' in table 't10': While processing t10.user_id. (UNKNOWN_IDENTIFIER) (version 22.8.5.29)

What version of ibis are you using?

9.0.0

What backend(s) are you using, if any?

clickhouse

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@stereoF stereoF added the bug Incorrect behavior inside of ibis label May 8, 2024
@gforsyth
Copy link
Member

gforsyth commented May 8, 2024

Thanks for reporting @stereoF !

Could you provide some sample tables as text for us to use to help debug this?
And if possible, if you can paste the tracebacks, etc, as text, it's much easier for us to read and dig into.

Thanks!

@stereoF
Copy link
Author

stereoF commented May 9, 2024

@gforsyth

df_user = ibis.memtable({"user_id": [1, 2, 3, 4], "account_id": ['1', '2', '3', '4']})
df_portrait = ibis.memtable({"user_id": [1, 2, 3, 4], "account_id": ['1', '2', '3', '4']})
df_user_shift_7d = ibis.memtable({"user_id_shift_7d": [1, 2, 3, 4], "account_id_shift_7d": ['1', '2', '3', '4']})
df_portrait_shift_7d = ibis.memtable({"user_id_shift_7d": [1, 2, 3, 4], "account_id_shift_7d": ['1', '2', '3', '4']})

df = df_user.join(df_portrait, df_user.user_id == df_portrait.user_id, how='inner') \
            .join(df_user_shift_7d, _.user_id == df_user_shift_7d.user_id_shift_7d, how='left') \
            .join(df_portrait_shift_7d, _.user_id == df_portrait_shift_7d.user_id_shift_7d, how='left') 

df_user.mutate(id2=df_user.account_id).head()

┏━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━┓
┃ user_id  account_id  id2    ┃
┡━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━┩
│ int64stringstring │
├─────────┼────────────┼────────┤
│       11         1      │
│       22         2      │
│       33         3      │
│       44         4      │
└─────────┴────────────┴────────┘

df.mutate(id2=df_user.account_id).head()

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/type 
 s/core.py:99 in __rich_console__                                                                 
                                                                                                  
    96 │   │                                                                                      
    97 │   │   try:                                                                               
    98 │   │   │   if opts.interactive:                                                           
  99 │   │   │   │   rich_object = to_rich(self, console_width=console_width)                   
   100 │   │   │   else:                                                                          
   101 │   │   │   │   rich_object = Text(self._noninteractive_repr())                            
   102 │   │   except Exception as e:                                                             
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/type 
 s/pretty.py:271 in to_rich                                                                       
                                                                                                  
   268 │   │   │   expr, max_length=max_length, max_string=max_string, max_depth=max_depth        
   269 │   │   )                                                                                  
   270 else:                                                                                  
 271 │   │   return _to_rich_table(                                                             
   272 │   │   │   expr,                                                                          
   273 │   │   │   max_rows=max_rows,                                                             
   274 │   │   │   max_columns=max_columns,                                                       
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/type 
 s/pretty.py:342 in _to_rich_table                                                                
                                                                                                  
   339 │   │   if orig_ncols > len(computed_cols):                                                
   340 │   │   │   table = table.select(*computed_cols)                                           
   341                                                                                        
 342 result = table.limit(max_rows + 1).to_pyarrow()                                        
   343 # Now format the columns in order, stopping if the console width would                 
   344 # be exceeded.                                                                         
   345 col_info = []                                                                          
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/type 
 s/core.py:486 in to_pyarrow                                                                      
                                                                                                  
   483 │   │   Table                                                                              
   484 │   │   │   A pyarrow table holding the results of the executed expression.                
   485 │   │   """                                                                                
 486 │   │   return self._find_backend(use_default=True).to_pyarrow(                            
   487 │   │   │   self, params=params, limit=limit, **kwargs                                     
   488 │   │   )                                                                                  
   489                                                                                            
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/backends/ 
 clickhouse/__init__.py:290 in to_pyarrow                                                         
                                                                                                  
   287 │   │   │   external_tables=external_tables,                                               
   288 │   │   │   **kwargs,                                                                      
   289 │   │   ) as reader:                                                                       
 290 │   │   │   table = reader.read_all()                                                      
   291 │   │                                                                                      
   292 │   │   return expr.__pyarrow_result__(table)                                              
   293                                                                                            
                                                                                                  
 in pyarrow.lib.RecordBatchReader.read_all:757                                                    
                                                                                                  
 in pyarrow.lib._datatype_to_pep3118:88                                                           
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/backends/ 
 clickhouse/__init__.py:357 in batcher                                                            
                                                                                                  
   354 │   │   │   # readonly != 1 means that the server setting is writable                      
   355 │   │   │   if self.con.server_settings["max_block_size"].readonly != 1:                   
   356 │   │   │   │   settings["max_block_size"] = chunk_size                                    
 357 │   │   │   with self.con.query_column_block_stream(                                       
   358 │   │   │   │   sql, external_data=external_data, settings=settings                        
   359 │   │   │   ) as blocks:                                                                   
   360 │   │   │   │   yield from map(                                                            
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_con 
 nect/driver/client.py:212 in query_column_block_stream                                           
                                                                                                  
   209 │   │   parameters, see the create_query_context method.                                   
   210 │   │   :return: StreamContext -- Iterable stream context that returns column oriented b   
   211 │   │   """                                                                                
 212 │   │   return self._context_query(locals(), use_numpy=False, streaming=True).column_blo   
   213                                                                                        
   214 def query_row_block_stream(self,                                                       
   215 │   │   │   │   │   │   │      query: str = None,                                          
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_con 
 nect/driver/client.py:721 in _context_query                                                      
                                                                                                  
   718 │   │   kwargs = lcls.copy()                                                               
   719 │   │   kwargs.pop('self')                                                                 
   720 │   │   kwargs.update(overrides)                                                           
 721 │   │   return self._query_with_context((self.create_query_context(**kwargs)))             
   722                                                                                        
   723 def __enter__(self):                                                                   
   724 │   │   return self                                                                        
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_con 
 nect/driver/httpclient.py:213 in _query_with_context                                             
                                                                                                  
   210 │   │   │   body = final_query                                                             
   211 │   │   │   fields = None                                                                  
   212 │   │   │   headers['Content-Type'] = 'text/plain; charset=utf-8'                          
 213 │   │   response = self._raw_request(body,                                                 
   214 │   │   │   │   │   │   │   │   │    params,                                               
   215 │   │   │   │   │   │   │   │   │    headers,                                              
   216 │   │   │   │   │   │   │   │   │    stream=True,                                          
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_con 
 nect/driver/httpclient.py:437 in _raw_request                                                    
                                                                                                  
   434 │   │   │   elif error_handler:                                                            
   435 │   │   │   │   error_handler(response)                                                    
   436 │   │   │   else:                                                                          
 437 │   │   │   │   self._error_handler(response)                                              
   438                                                                                        
   439 def ping(self):                                                                        
   440 │   │   """                                                                                
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_con 
 nect/driver/httpclient.py:361 in _error_handler                                                  
                                                                                                  
   358 │   │   if err_content:                                                                    
   359 │   │   │   err_msg = common.format_error(err_content.decode(errors='backslashreplace'))   
   360 │   │   │   err_str = f':{err_str}\n {err_msg}'                                            
 361 │   │   raise OperationalError(err_str) if retried else DatabaseError(err_str) from None   
   362                                                                                        
   363 def _raw_request(self,                                                                 
   364 │   │   │   │   │    data,                                                                 
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
DatabaseError: :HTTPDriver for http://cc-j6c9l15edn66o817k.public.clickhouse.ads.aliyuncs.com:8123 returned 
response code 404)
 Code: 47. DB::Exception: There's no column 't8.user_id' in table 't8': While processing t8.user_id. 
(UNKNOWN_IDENTIFIER) (version 22.8.5.29)

@stereoF
Copy link
Author

stereoF commented May 11, 2024

@gforsyth Not only mutate, group or other functions are also raise this kind of error after chain join

@stereoF
Copy link
Author

stereoF commented May 11, 2024

@gforsyth Hi, Is there a temporary solution before the fixing?

@gforsyth
Copy link
Member

Hi @stereoF -- it's not clear to me that there's a bug here, except that maybe we should be raising an error message?

This line:

df.mutate(id2=df_user.account_id).head()

isn't going to work because df and df_user are different tables.

@stereoF
Copy link
Author

stereoF commented May 14, 2024

Hi @stereoF -- it's not clear to me that there's a bug here, except that maybe we should be raising an error message?

This line:

df.mutate(id2=df_user.account_id).head()

isn't going to work because df and df_user are different tables.

sorry, it's a clerical error. But It raise the same error when using
df.mutate(id2=df.account_id).head()

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/type 
 s/core.py:99 in __rich_console__                                                                 
                                                                                                  
    96 │   │                                                                                      
    97 │   │   try:                                                                               
    98 │   │   │   if opts.interactive:                                                           
  99 │   │   │   │   rich_object = to_rich(self, console_width=console_width)                   
   100 │   │   │   else:                                                                          
   101 │   │   │   │   rich_object = Text(self._noninteractive_repr())                            
   102 │   │   except Exception as e:                                                             
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/type 
 s/pretty.py:271 in to_rich                                                                       
                                                                                                  
   268 │   │   │   expr, max_length=max_length, max_string=max_string, max_depth=max_depth        
   269 │   │   )                                                                                  
   270 else:                                                                                  
 271 │   │   return _to_rich_table(                                                             
   272 │   │   │   expr,                                                                          
   273 │   │   │   max_rows=max_rows,                                                             
   274 │   │   │   max_columns=max_columns,                                                       
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/type 
 s/pretty.py:342 in _to_rich_table                                                                
                                                                                                  
   339 │   │   if orig_ncols > len(computed_cols):                                                
   340 │   │   │   table = table.select(*computed_cols)                                           
   341                                                                                        
 342 result = table.limit(max_rows + 1).to_pyarrow()                                        
   343 # Now format the columns in order, stopping if the console width would                 
   344 # be exceeded.                                                                         
   345 col_info = []                                                                          
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/expr/type 
 s/core.py:486 in to_pyarrow                                                                      
                                                                                                  
   483 │   │   Table                                                                              
   484 │   │   │   A pyarrow table holding the results of the executed expression.                
   485 │   │   """                                                                                
 486 │   │   return self._find_backend(use_default=True).to_pyarrow(                            
   487 │   │   │   self, params=params, limit=limit, **kwargs                                     
   488 │   │   )                                                                                  
   489                                                                                            
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/backends/ 
 clickhouse/__init__.py:290 in to_pyarrow                                                         
                                                                                                  
   287 │   │   │   external_tables=external_tables,                                               
   288 │   │   │   **kwargs,                                                                      
   289 │   │   ) as reader:                                                                       
 290 │   │   │   table = reader.read_all()                                                      
   291 │   │                                                                                      
   292 │   │   return expr.__pyarrow_result__(table)                                              
   293                                                                                            
                                                                                                  
 in pyarrow.lib.RecordBatchReader.read_all:757                                                    
                                                                                                  
 in pyarrow.lib._datatype_to_pep3118:88                                                           
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/ibis/backends/ 
 clickhouse/__init__.py:357 in batcher                                                            
                                                                                                  
   354 │   │   │   # readonly != 1 means that the server setting is writable                      
   355 │   │   │   if self.con.server_settings["max_block_size"].readonly != 1:                   
   356 │   │   │   │   settings["max_block_size"] = chunk_size                                    
 357 │   │   │   with self.con.query_column_block_stream(                                       
   358 │   │   │   │   sql, external_data=external_data, settings=settings                        
   359 │   │   │   ) as blocks:                                                                   
   360 │   │   │   │   yield from map(                                                            
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_con 
 nect/driver/client.py:212 in query_column_block_stream                                           
                                                                                                  
   209 │   │   parameters, see the create_query_context method.                                   
   210 │   │   :return: StreamContext -- Iterable stream context that returns column oriented b   
   211 │   │   """                                                                                
 212 │   │   return self._context_query(locals(), use_numpy=False, streaming=True).column_blo   
   213                                                                                        
   214 def query_row_block_stream(self,                                                       
   215 │   │   │   │   │   │   │      query: str = None,                                          
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_con 
 nect/driver/client.py:721 in _context_query                                                      
                                                                                                  
   718 │   │   kwargs = lcls.copy()                                                               
   719 │   │   kwargs.pop('self')                                                                 
   720 │   │   kwargs.update(overrides)                                                           
 721 │   │   return self._query_with_context((self.create_query_context(**kwargs)))             
   722                                                                                        
   723 def __enter__(self):                                                                   
   724 │   │   return self                                                                        
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_con 
 nect/driver/httpclient.py:213 in _query_with_context                                             
                                                                                                  
   210 │   │   │   body = final_query                                                             
   211 │   │   │   fields = None                                                                  
   212 │   │   │   headers['Content-Type'] = 'text/plain; charset=utf-8'                          
 213 │   │   response = self._raw_request(body,                                                 
   214 │   │   │   │   │   │   │   │   │    params,                                               
   215 │   │   │   │   │   │   │   │   │    headers,                                              
   216 │   │   │   │   │   │   │   │   │    stream=True,                                          
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_con 
 nect/driver/httpclient.py:437 in _raw_request                                                    
                                                                                                  
   434 │   │   │   elif error_handler:                                                            
   435 │   │   │   │   error_handler(response)                                                    
   436 │   │   │   else:                                                                          
 437 │   │   │   │   self._error_handler(response)                                              
   438                                                                                        
   439 def ping(self):                                                                        
   440 │   │   """                                                                                
                                                                                                  
 /home/zhongminhu/.pyenv/versions/3.11.7/envs/venv311/lib/python3.11/site-packages/clickhouse_con 
 nect/driver/httpclient.py:361 in _error_handler                                                  
                                                                                                  
   358 │   │   if err_content:                                                                    
   359 │   │   │   err_msg = common.format_error(err_content.decode(errors='backslashreplace'))   
   360 │   │   │   err_str = f':{err_str}\n {err_msg}'                                            
 361 │   │   raise OperationalError(err_str) if retried else DatabaseError(err_str) from None   
   362                                                                                        
   363 def _raw_request(self,                                                                 
   364 │   │   │   │   │    data,                                                                 
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
DatabaseError: :HTTPDriver for http://cc-j6c9l15edn66o817k.public.clickhouse.ads.aliyuncs.com:8123 returned 
response code 404)
 Code: 47. DB::Exception: There's no column 't8.user_id' in table 't8': While processing t8.user_id. 
(UNKNOWN_IDENTIFIER) (version 22.8.5.29)

@gforsyth
Copy link
Member

Hey @stereoF -- sorry for the delay, we were at PyCon last week and it was pretty busy.

This is the SQL we're generating for Clickhouse for adding an id2 column:

[ins] In [62]: expr = df.mutate(id2=df.account_id)

[ins] In [63]: ibis.to_sql(expr)
Out[63]: 
SELECT
  "t8"."user_id",
  "t8"."account_id",
  "t8"."account_id_right",
  "t8"."user_id_shift_7d",
  "t8"."account_id_shift_7d",
  "t8"."user_id_shift_7d_right",
  "t8"."account_id_shift_7d_right",
  "t8"."account_id" AS "id2"
FROM (
  SELECT
    "t4"."user_id",
    "t4"."account_id",
    "t5"."account_id" AS "account_id_right",
    "t6"."user_id_shift_7d",
    "t6"."account_id_shift_7d",
    "t7"."user_id_shift_7d" AS "user_id_shift_7d_right",
    "t7"."account_id_shift_7d" AS "account_id_shift_7d_right"
  FROM "ibis_pandas_memtable_ctp55e5sc5cxfkyb7moevbl6f4" AS "t4"
  INNER JOIN "ibis_pandas_memtable_7eg6rbbl2rf75a5sweeerkinuu" AS "t5"
    ON "t4"."user_id" = "t5"."user_id"
  LEFT OUTER JOIN "ibis_pandas_memtable_etbsqsk6rrfr5njw5yzdgnpc7e" AS "t6"
    ON "t4"."user_id" = "t6"."user_id_shift_7d"
  LEFT OUTER JOIN "ibis_pandas_memtable_p3vq4uorpbgppkkj24ohpsb3ti" AS "t7"
    ON "t4"."user_id" = "t7"."user_id_shift_7d"
) AS "t8"

I'm going to have to do some reading, but I know Clickhouse has uncommon behavior around subqueries and I suspect that's the core of the issue here.

@gforsyth gforsyth added the clickhouse The ClickHouse backend label May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Incorrect behavior inside of ibis clickhouse The ClickHouse backend
Projects
Status: backlog
Development

No branches or pull requests

2 participants