Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: allow users to pass schema in encrypted data-frames #676

Merged

Conversation

RomanBredehoft
Copy link
Collaborator

@RomanBredehoft RomanBredehoft commented May 7, 2024

@cla-bot cla-bot bot added the cla-signed label May 7, 2024
if column_name not in column_names:
# TODO: Is this check actually relevant ? Can't the schema provide more columns than the
# one found in the data-frame ?
raise ValueError(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we allow schema with column names that do not match the ones found in the given data-frame ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imo raising the error as you are doing here is the correct behavior.

@@ -189,10 +297,15 @@ def pre_process_dtypes(pandas_dataframe: pandas.DataFrame) -> Tuple[pandas.DataF
"supported."
)

# TODO: Should all non-integers columns be considered by the schema if not None ? Currently,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we raise an error/warning if all non-integer columns from the data-frame were not covered by the given schema ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would happen if they are missing from the given schema?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they are automatically computed

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just wondering because if could be an easy mistake to forget to put some columns, but no error will be raised

@@ -37,7 +41,9 @@ def keygen(self, keys_path: Optional[Union[Path, str]] = None):
else:
self.client.keygen(True)

def encrypt_from_pandas(self, pandas_dataframe: pandas.DataFrame) -> EncryptedDataFrame:
def encrypt_from_pandas(
self, pandas_dataframe: pandas.DataFrame, schema: Optional[Dict] = None
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a schema is optional. If set, it should follow a specific format.

if needed, we could also handle the output of get_schema (pandas data-frames) as an input here

@RomanBredehoft RomanBredehoft force-pushed the feat/allow_users_pass_schema_encrypted_dataframe_4376 branch from e81b54d to 3648432 Compare May 23, 2024 16:10
@RomanBredehoft RomanBredehoft force-pushed the feat/allow_users_pass_schema_encrypted_dataframe_4376 branch from 4473764 to 760712f Compare May 27, 2024 15:15
@RomanBredehoft RomanBredehoft force-pushed the feat/allow_users_pass_schema_encrypted_dataframe_4376 branch from 760712f to 6421bc5 Compare May 27, 2024 16:35
Copy link

Coverage passed ✅

Coverage details

---------- coverage: platform linux, python 3.8.18-final-0 -----------
Name    Stmts   Miss  Cover   Missing
-------------------------------------
TOTAL    7633      0   100%

59 files skipped due to complete coverage.

@RomanBredehoft RomanBredehoft marked this pull request as ready for review May 28, 2024 15:02
@RomanBredehoft RomanBredehoft requested a review from a team as a code owner May 28, 2024 15:02
elif column.dtype == "object":
unique_values = column.unique()

# Only take strings into account and thus avoid NaN values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also use the old x != x to detect NaNs

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah right, I keep forgetting this trick thanks

Copy link
Collaborator

@fd0r fd0r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, responded to the todos

@RomanBredehoft RomanBredehoft merged commit ccd6641 into main Jun 4, 2024
12 checks passed
@RomanBredehoft RomanBredehoft deleted the feat/allow_users_pass_schema_encrypted_dataframe_4376 branch June 4, 2024 08:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants