feat: Analyse different strategies and add validation for missing fields in parquet data compared to protobuf schema #102

Open
Meghajit opened this issue Jan 20, 2022 · 1 comment

Meghajit commented Jan 20, 2022

Extra fields that are present in the parquet data but not in the protobuf schema will be ignored.
However, it is possible that:

  • some fields in the protobuf schema are missing from the parquet data
  • field names are the same but the data types differ
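
To make the cases above concrete, here is a minimal sketch of a schema comparison. It is not Dagger code: `SchemaDiff`, `diff_schemas`, the field names, and the use of plain type-name strings are all hypothetical, and a real check would first need a mapping between protobuf and parquet types.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDiff:
    """Result of comparing a protobuf schema with a parquet file schema."""
    extra_in_parquet: list = field(default_factory=list)
    missing_in_parquet: list = field(default_factory=list)
    type_mismatches: dict = field(default_factory=dict)


def diff_schemas(proto_fields, parquet_fields):
    """Compare two {field_name: type_name} mappings (hypothetical representation)."""
    proto_names = set(proto_fields)
    parquet_names = set(parquet_fields)
    diff = SchemaDiff(
        # Present only in the parquet data: these are the fields that get ignored.
        extra_in_parquet=sorted(parquet_names - proto_names),
        # Present only in the protobuf schema: the case this issue is about.
        missing_in_parquet=sorted(proto_names - parquet_names),
    )
    # Same name on both sides, but the declared types differ.
    for name in sorted(proto_names & parquet_names):
        if proto_fields[name] != parquet_fields[name]:
            diff.type_mismatches[name] = (proto_fields[name], parquet_fields[name])
    return diff


# Illustrative schemas only; the field names are made up.
proto_fields = {"order_id": "string", "amount": "int64", "created_at": "int64"}
parquet_fields = {"order_id": "string", "amount": "double", "coupon": "string"}
diff = diff_schemas(proto_fields, parquet_fields)
```

With these inputs, `coupon` is reported as an extra field, `created_at` as missing, and `amount` as a type mismatch.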

We need to answer the following questions and implement whichever behaviour is chosen:

  1. Should the Parquet Data Source set default values for fields which are present in the schema but not found in the parquet file? If yes, what should the default value be?
  2. If no defaults are to be set, should the Dagger job fail?
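
The two options in the questions above could look roughly like the sketch below. This is only an illustration under assumptions: `resolve_row`, `MissingFieldError`, and the `strategy` parameter are hypothetical names, and using the proto3 scalar defaults (zero, empty string, false, empty bytes) is just one candidate answer to the default-value question, not a decision.

```python
# proto3 scalar defaults, used here as one possible choice of default values.
PROTO3_DEFAULTS = {
    "string": "",
    "bytes": b"",
    "bool": False,
    "int32": 0,
    "int64": 0,
    "float": 0.0,
    "double": 0.0,
}


class MissingFieldError(Exception):
    """Raised when a schema field is absent from the row and defaults are disabled."""


def resolve_row(row, proto_fields, strategy="default"):
    """Build a row containing exactly the schema fields.

    row:          {field_name: value} as read from the parquet file
    proto_fields: {field_name: type_name} from the protobuf schema
    strategy:     "default" fills missing fields, "fail" raises instead
    """
    if strategy not in ("default", "fail"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    resolved = {}
    for name, type_name in proto_fields.items():
        if name in row:
            resolved[name] = row[name]
        elif strategy == "default":
            resolved[name] = PROTO3_DEFAULTS[type_name]
        else:
            raise MissingFieldError(f"field {name!r} not found in parquet data")
    # Fields in `row` that are not in the schema are dropped, i.e. ignored.
    return resolved


schema = {"order_id": "string", "amount": "int64"}
filled = resolve_row({"order_id": "a1", "coupon": "X"}, schema)
```

Calling `resolve_row` with `strategy="fail"` on the same input raises `MissingFieldError` instead, which corresponds to failing the job.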

Meghajit commented

Removing this from the "Support for Parquet Files as a Source" milestone, as it is only a nice-to-have for the first milestone.
cc: @prakharmathur82
