-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Feature] Support Semi-Structured Data Columns in Unit Test dict fixture data #10161
Comments
Hi! I believe we do support json structured types in unit test definitions using You'll need to use quotes though like so:
|
Hey @graciegoheen , thanks for getting back to me. I've given your suggestion a try and I'm still running into difficulty. When I have this input:
it generates this SQL for the fixture data:
It seems the straight casting of the string to super isn't decomposing the data into structured values. However, if I switch the SQL to use
Is there a setting I'm missing for the Redshift adapter perhaps to make this work? Any other suggestions? Thanks! |
Hey @ronco -- @graciegoheen and I discussed this a little bit yesterday. You are spot-on with the following:
This is because dbt unit testing doesn't do any unnesting / decomposition of the In order to do a unit test for one of the nested values, then you'll need to supply expected values for all of the nested values as well. See below for a full example. If this doesn't work for you, could you share an example of your YAML for your unit test and your SQL models and we'll take another look? More detailIt looks like the key insight in dbt-redshift is that:
This is similar to a caveat in dbt-bigquery:
Example
select
1 as int_field,
2.0 as float_field,
1 as numeric_field,
'my_string' as str_field,
'my,cool''string' as str_escaped_field,
true as bool_field,
cast('2020-01-02' as date) as date_field,
cast('2013-11-03 00:00:00-0' as timestamp without time zone) as timestamp_field,
cast('2013-11-03 00:00:00-0' as timestamp with time zone) as timestamptz_field,
cast(json_parse('{"bar": "baz", "balance": 7.77, "active": false}') as super) as json_field
select * from {{ ref("stg_data_types") }} unit_tests:
- name: test_my_data_types
model: fct_data_types
given:
- input: ref('stg_data_types')
rows:
- int_field: 1
float_field: 2.0
numeric_field: 1
str_field: my_string
str_escaped_field: "my,cool'string"
bool_field: true
date_field: 2020-01-02
timestamp_field: 2013-11-03 00:00:00-0
timestamptz_field: 2013-11-03 00:00:00-0
json_field: '{"bar": "baz", "balance": 7.77, "active": false}'
expect:
rows:
- int_field: 1
float_field: 2.0
numeric_field: 1
str_field: my_string
str_escaped_field: "my,cool'string"
bool_field: true
date_field: 2020-01-02
timestamp_field: 2013-11-03 00:00:00-0
timestamptz_field: 2013-11-03 00:00:00-0
json_field: '{"bar": "baz", "balance": 7.77, "active": false}' Note how the last line in the unit test includes |
Hey @dbeatty10 , thanks for all this guidance. I am a little confused on how to pull this off. We're using our SUPER field for flexible data storage. In other words, there is no underlying defined STRUCT, the content varies from row to row. So I'm not sure what values I would need to add. As requested, here are sample files illustrating the issue:
and
and finally
When I run that test I get the following output:
I think the problem is the cast is treating the json_field as a plain string, rather than a structured field I can extract data from. Let me know if there's anything else I can provide to get to the bottom of this. Thanks! |
Any further thoughts here? Thanks! |
Is this your first time submitting a feature request?
Describe the feature
When writing unit tests that include semi-structured data columns (SUPER in Redshift for example), the fixture data can only be staged via SQL. This feature calls for understanding structured data in dict fixtures.
An example fixture definition would be something like this:
Describe alternatives you've considered
This functionality can be achieved using SQL fixtures like so:
SELECT json_parse('{"foo":"bar"}')::SUPER as structured_column
This is quite cumbersome as you have to specify all significant columns in your sql.
Who will this benefit?
Anyone making use of semi-structured data columns.
Are you interested in contributing this feature?
I am open to contributing if this is of interest.
Anything else?
No response
The text was updated successfully, but these errors were encountered: