Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SystemAccessControl.getTableColumnMasks SPI function and OPA implementation #21997

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

mosiac1
Copy link
Contributor

@mosiac1 mosiac1 commented May 16, 2024

Description

Addresses issues raised in #21359.

Adds the getTableColumnMasks function to the SPI

Map<ColumnSchema, ViewExpression> getTableColumnMasks(ConnectorSecurityContext context, SchemaTableName tableName, List<ColumnSchema> columns);

(which defaults to calling getColumnMask repeatedly) that SystemAcessControl/ConnectorAccessControl plugins can implement to process column mask requests in batches.

Updates the StatementAnalyzer and AccessControl to use this batched method of getting column masks whenever possible (which is all cases).

The issue linked above goes into the performance implications.

This PR updates the OPA plugin to implement the new SPI function with 2 modes of operation:

  • Sending requests to fetch each column mask in parallel; This is a drop-in improvement that doesn't require any config/Rego changes.
  • Using a new batchColumnMasksUri OPA Server endpoint that delegates collection of column masks to the Rego policy code; This requires changes to the Rego code running on the OPA server and the Trino configs.

For the latter mode, the following Rego rule can be added (assuming the columnMask rule is already implemented) to quickly add support for batched column masks. This approach isn't particularly efficient (the with operator in Rego can be costly) and I haven't had the time to benchmark.

batchColumnMasks contains {
   "index": i,
   "viewExpression": cm
} if {
  some i
  cm := columnMask with input.action.resource.column as input.action.filterResources[i].
}

Later edit: did some testing on using the above "trick" - it is very slow (column masks for 1k columns took on avg. 16 seconds to compute). So DO NOT USE THIS IN PRODUCTION. The parallel mode (which this PR enables by default) is much faster.

Additional context and related issues

#21359

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
(x) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

Comment on lines 956 to 972
return columns.stream()
.map(c -> Map.entry(c, getColumnMask(context, tableName, c.getName(), c.getType())))
.filter(entry -> entry.getValue().isPresent())
.collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().get()));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this should have a default that calls getColumnMask, looking at other function in this interface none of them do this. Should this logic be left to another class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the default should be to call the old method
the engine should call the new method only
the old method should be deprecated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! do we want to remove getColumnMask from the AccessControl interface? that would be the best way to ensure the engine is not using it anymore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, remove the supposed-to-be-unused method from the interface defined in trino-main

deprecation is needed only for trino-spi's interfaces

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the function from trino-main's interfaces, let me know if missed some.

Do we want to remove getColumnMask's implementation from the FileBasedAccessControl and OpaAccessControl implementation? (or others if you have any in mind)

Comment on lines 956 to 972
return columns.stream()
.map(c -> Map.entry(c, getColumnMask(context, tableName, c.getName(), c.getType())))
.filter(entry -> entry.getValue().isPresent())
.collect(Collectors.toMap(Map.Entry::getKey, entry -> entry.getValue().get()));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, the default should be to call the old method
the engine should call the new method only
the old method should be deprecated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

None yet

2 participants