-
Notifications
You must be signed in to change notification settings - Fork 592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expanding nested star selects gives an error in Bigquery syntax #3484
Comments
Also consider applicability to nested structs SELECT
user.id,
struct(
user.id,
user.name,
user.email,
user.title,
user.harness_department_c,
user.role_level_c,
user.current_role_start_date_c,
user.geo_region_c,
user.geo_subregion_c,
user.sales_segment_c,
struct(manager.id, manager.name, manager.email) AS manager,
struct(se.id, se.name, se.email) AS se,
struct(tdr.id, tdr.name, tdr.email) AS tdr,
struct(tds.id, tds.name, tds.email) AS tds
)::STRUCT<id STRING, name STRING, email STRING, title STRING, harness_department_c STRING, role_level_c STRING, current_role_start_date_c DATE, geo_region_c STRING, geo_subregion_c STRING, sales_segment_c STRING, manager STRUCT<id STRING, name STRING, email STRING>, se STRUCT<id STRING, name STRING, email STRING>, tdr STRUCT<id STRING, name STRING, email STRING>, tds STRUCT<id STRING, name STRING, email STRING>> AS emp
FROM salesforce_v4.user
LEFT JOIN salesforce_v4.user AS manager
ON user.manager_id = manager.id
LEFT JOIN salesforce_v4.user AS se
ON user.se_c = se.id
LEFT JOIN salesforce_v4.user AS tdr
ON user.tdr_c = tdr.id
LEFT JOIN salesforce_v4.user AS tds
ON user.tds_c = tds.id
|
Interesting, so it completely flattens structs. FYI @VaggelisD, an interesting edge case here is when two struct fields on different levels have the same names. BigQuery will give them names such as |
There are some quirky edge cases for sure, and they're not all documented properly unfortunately (from a quick look). For the example you mentioned: select
a, b, c.* from
(select
5 as a,
6 as b,
(select as struct
7 as a,
8 as b
) as c
) gives an unnested table with fields whereas selecting as a struct select as struct
a, b, c.* from
(select
5 as a,
6 as b,
(select as struct
7 as a,
8 as b
) as c
) leads to an unnested table with field names |
ok I see, I'm not sure if it's worth preserving all of that then. Perhaps our usual |
I agree, fully replicating the logic won't be necessary as with proper aliasing these edge cases will not occur |
Yep 👍. FYI this is a WIP on our end, should have a fix soon. |
Thanks for picking this up so quickly, I really appreciate it! |
Currently trying to unnest a struct with * causes an warning in Another consideration that causes a schema warning in sqlmesh disabling optimization:
So if CREATE TABLE activity (
id string,
user_id string,
event string,
ts timestamp
); The referencing it directly, the output type is |
So I think the implicit unnesting is probably trickier, and my hunch is that Vaggelis' PR won't solve it. Though I'd double check to be sure.
I think so! Can you share a small example to make sure I understood this correctly? |
-- Given this is your upstream model called `user`
SELECT
user.id,
struct(
user.id,
user.name,
user.email,
user.title,
user.harness_department_c,
user.role_level_c,
user.current_role_start_date_c,
user.geo_region_c,
user.geo_subregion_c,
user.sales_segment_c,
struct(manager.id, manager.name, manager.email) AS manager,
struct(se.id, se.name, se.email) AS se,
struct(tdr.id, tdr.name, tdr.email) AS tdr,
struct(tds.id, tds.name, tds.email) AS tds
)::STRUCT<id STRING, name STRING, email STRING, title STRING, harness_department_c STRING, role_level_c STRING, current_role_start_date_c DATE, geo_region_c STRING, geo_subregion_c STRING, sales_segment_c STRING, manager STRUCT<id STRING, name STRING, email STRING>, se STRUCT<id STRING, name STRING, email STRING>, tdr STRUCT<id STRING, name STRING, email STRING>, tds STRUCT<id STRING, name STRING, email STRING>> AS emp
FROM salesforce_v4.user
LEFT JOIN salesforce_v4.user AS manager
ON user.manager_id = manager.id
LEFT JOIN salesforce_v4.user AS se
ON user.se_c = se.id
LEFT JOIN salesforce_v4.user AS tdr
ON user.tdr_c = tdr.id
LEFT JOIN salesforce_v4.user AS tds
ON user.tds_c = tds.id;
-- This will cause the optimizer to fail / warn, even if upstream schema was known fully
SELECT emp.* FROM user;
-- Ditto
SELECT emp.manager.* FROM user; You can repro it with any struct |
Yep, these should work after we fix this issue :) |
Before you file an issue
parse_one(sql, read="spark")
: bigquery dialectast.sql(dialect="duckdb")
: Not applicableFully reproducible code snippet
The qualify function throws an error:
sqlglot.errors.OptimizeError: Unknown table: bar
Official Documentation
Bigquery docs
The text was updated successfully, but these errors were encountered: