More Complete ACL Implementation #386

kevin-montrose · 2024-05-15T22:16:44Z

This sketches out a more proper ACL implementation - one with categories beyond @admin, and where individual commands can be ACL'd.

Basic idea is to use RespCommandsInfo as a mechanism to discover command ACLs, and attach a bitfield to each User to track individual commands rather than categories.

Outstanding TODOs:

Test rest of commands w.r.t. ACL categories
Implement individual command ACL'ing (ie. +get)
Implement sub-command ACL'ing (ie. -config|get)
Figure out what to do with unlisted/internal/cluster commands
Figure out what to do with custom extensions
- Introduced a @garnet ACL category that all commands not in Redis are now tagged with
(Probably) Remove CommandCategory - it is redundant
Clean up command parsing, now that subcommand bytes are a "later in the pipeline'-thing
- As part of this, probably DRY up ACL checking - it's a bit error prone to add a new command with this code so "wet"
- SET (with EX, NX, etc. but not SETEX) are still special cased as they weren't realized as subCmds
Cluster subcommands are weird, but need ACL testing too
CustomCmd, CustomTnx, CustomObjCmd are weird and need to be ACL'able in some way too
- Introduced a @Custom ACL category that blocks all of these, and the individual commands can also be ACL'd (that is +customcmd works)
- Longer term we may want to allow ACL'ing the commands by their actual registered name, but IMO that should be a separate enhancement

This is explicitly not going to implement keyspace patterns. I feel like that should be a separate enhancement, as it's pretty substantial on it's own.

The biggest incidental change here is a substantial refactoring of RespCommand and the various switches over it. This is, IMO, the cleanest way to get a canonical value to test against ACLs with. As a consequence, RespCommand has a bunch more entries as most things that were previously "sub commands" (but not in the RESP sense) are now actual enum values.

There's one additional implementation detail to call out, which is how ACL descriptions are maintained. So that ACL descriptions behave more or less like you'd expect (that is +@foo +bar -bar becomes +@foo) a greedy approach is taken where, upon update, a trial removal of each token is made to see if it impacts the effective ACL set. This is pretty allocate-y, and while that could be improved I think ACL manipulation is rare and locked down enough that clarity was a bit more important. It's worth a stringent review, however. Code is in User.RationalizeACLDescription.

An attempt has been made to cover all commands, but more remain. A number of TODOs remain, but this gives a rough shape of how it'd work and how individual Commands or SubCommands would end up ACL'd too.

As part of this, needed to actually enumerate all commands in RespCommands which implies a lot of future cleanup in parsing and command processesing. This also uncovered a number of different gaps in implemented RESP commands, which were not addressed unless doing so was necessary for ACL testing.

Garnet-specific commands are still pending.

@garnet

Introduces @garnet category. Reworks WATCH so MS and OS are proper subcommands.

… command ACL'ing.

Reworks parsing to remove the tuple with subcommand, as it'd always be 0 now.

…ands, while the common case is consolidated into an upfront check. Note that SET (and pseudo-variants SETEXNX, etc.) is a special case, as it lacks subcommands but has multiple command types. That could be cleaned up later, but has performance implications - so leaving it for future work.

…CL categories, replace slow temp code with better code, add tests to catch future additions; this involves reordering RespCommand yet again

…checks

kevin-montrose · 2024-05-21T16:43:11Z

There's still some more to do here (more testing, subcommands, some cleanup) , but I'm going on vacation for a week and this is pretty dang close to how it's going to actually look when done. So taking it off draft, and marking ready for review (if not merging).

/cc @mgravell @NickCraver - this is relevant to your interests.

lmaas

I have only made it half-way through the PR, but this implementation looks very promising! Your efforts in addressing the open to-do items in the ACL implementation are greatly appreciated. I have included some comments below.

libs/cluster/Session/ClusterSession.cs

libs/server/ACL/ACLException.cs

libs/server/ACL/ACLParser.cs

libs/server/ACL/User.cs

test/Garnet.test/CredentialManager.cs

vazois

I think this PR has too many changes to vet, maybe we need to break it into smaller PRs. I think the first is good but very hard to review because the surface area is huge. I think the idea of having a bitmap to match command permissions is good. There might I couple alternatives to tradeoff between CPU cycles or memory. We need to separate general fixes from clear ACL enchantments and avoid pushing parsing inside the execution methods.

libs/cluster/Session/MigrateCommand.cs

libs/cluster/Session/ClusterSession.cs

libs/server/Resp/RespServerSession.cs

libs/cluster/Session/RespClusterBasicCommands.cs

libs/server/Resp/Bitmap/BitmapCommands.cs

libs/server/Resp/KeyAdminCommands.cs

libs/server/Resp/Objects/HashCommands.cs

libs/server/Storage/Functions/MainStore/VarLenInputMethods.cs

kevin-montrose · 2024-05-28T17:09:41Z

I think this PR has too many changes to vet, maybe we need to break it into smaller PRs. I think the first is good but very hard to review because the surface area is huge. I think the idea of having a bitmap to match command permissions is good. There might I couple alternatives to tradeoff between CPU cycles or memory. We need to separate general fixes from clear ACL enchantments and avoid pushing parsing inside the execution methods.

Yeah, it's gigantic - ACLs touch literally everything. Excluding tests, it is trending smaller as things DRY up.

A bunch of random-ish fixes are somewhat unfortunate, as a big part of testing this is making sure that failed ACLs don't bork the command stream - and I encountered a number of cases of fairly "brittle" command implementations. I'm open to breaking those out into separate PRs once ACLs are in a good place, but I needed some confidence that the ACL design would work for all commands upfront.

Co-authored-by: Lukas Maas <contact@lukasmaas.com>

…ssion ACL validation address feedback; break CommandPermissionSet out into separate file address feedback; increment correct statistic, fixing bug address feedback; adding count is unnecessary in many places, remove that post-DRY-ing

badrishc · 2024-05-29T19:37:58Z

Great work overall. Shaping up really nicely, thanks!

…spCommand.

…s are all covered.

…lar commands; address a couple nits

This means we now document all of these in RespCommandsInfo.json, as they're just normal commands from the perspective of ACL'ing.

@Custom

…mands cannot be ACL'd, but ACL'ing all custom (with @Custom) or as part of Garnet extensions (with @garnet) works as does ACL'ing all custom non-object (with customcmd), object (with customobjcmd), and transaction (with customtxn) separately Fixup a number of test failures and nits.

…rt of this, correct some places where commands weren't tagged @fast or @slow (everything should be tagged one or the other)

…have no impact. This implementation isn't amazing, but ACL changing is hopefully rare and privileged enough that it is acceptable.

… ACL test commands that have been corrected

kevin-montrose · 2024-06-04T19:10:26Z

@lmaas @vazois @badrishc This is now in a complete state from my perspective, and ready for full review.

badrishc · 2024-06-05T00:32:35Z

Benchmark (Release mode): Embedded.perftest --op-workload PING --op-percent 100 -t 1

Framework: net8.0

Branch: main (on Kevin's fork)

Running with 1 threads...
Avg. throughput: 135719.46 Kops/sec; 135719.46 Kops/sec/thread (Max: 135719.46, Min: 135719.46)

Branch: aclImprovements (on Kevin's fork)

Running with 1 threads...
Avg. throughput: 62371.18 Kops/sec; 62371.18 Kops/sec/thread (Max: 62371.18, Min: 62371.18)

Summary

Looks like we are at around 54% performance regression at the moment. Need to drill deeper into why.

Edit: looks like there is too much variance across runs so take above with a grain of salt.

vazois

There are some issues related to code style. Please use compact switch statements where possible.
There are some protocol compatibility issues with the responses generated for invalid input in cluster and bitmap commands.
Thanks for refactoring the cluster commands to make them compatible with the improved ACL implementation.

libs/server/Resp/Objects/HashCommands.cs

libs/server/Resp/RespCommand.cs

libs/cluster/Session/ClusterCommands.cs

test/Garnet.test.cluster/ClusterNegativeTests.cs

test/Garnet.test/RespAdminCommandsTests.cs

…or messages, and cleaning up subcommand parsing

kevin-montrose · 2024-06-05T21:09:21Z

Benchmark (Release mode): Embedded.perftest --op-workload PING --op-percent 100 -t 1

Framework: net8.0

Branch: main (on Kevin's fork)

Running with 1 threads... Avg. throughput: 135719.46 Kops/sec; 135719.46 Kops/sec/thread (Max: 135719.46, Min: 135719.46)

Branch: aclImprovements (on Kevin's fork)

I reproduce this regression on my latest too, I'll spend some time digging (now that all feedback is addressed) and update when I have an explanation/fix.

TalZaccai · 2024-06-05T22:31:30Z

libs/server/Resp/BasicCommands.cs

                    return false;
+
+                while (!RespWriteUtils.WriteError($"ERR Unknown subcommand or wrong number of arguments for COMMAND.", ref dcurr, dend))


Error should be in the format of CmdStrings.GenericErrUnknownSubCommand

TalZaccai · 2024-06-05T22:32:33Z

libs/server/Resp/BasicCommands.cs


-            // Handle COMMAND COUNT
-            if (count > 0 && (subCommand.SequenceEqual(CmdStrings.COUNT) || subCommand.SequenceEqual(CmdStrings.count)))
+                while (!RespWriteUtils.WriteError($"ERR Unknown subcommand or wrong number of arguments for COMMAND COUNT.", ref dcurr, dend))


Error should be in the format of CmdStrings.GenericErrWrongNumArgs

TalZaccai · 2024-06-05T22:51:19Z

test/Garnet.test/RespCommandTests.cs

+        /// have a matching RespCommandInfo objects defined in RespCommandsInfo
+        /// </summary>
+        [Test]
+        public void CommandsInfoCoverageTest()


This test was removed from main, we decided not to have it enforced. Alternatively, we can add an Ignore attribute

badrishc · 2024-06-05T23:34:43Z

I reproduce this regression on my latest too, I'll spend some time digging (now that all feedback is addressed) and update when I have an explanation/fix.

Is it possible something is inadvertently getting boxed?
I made this into a BDN testcase here: main...badrishc/resp-parse-stress - here are the results comparing main to this PR:

main:

Method	Job	EnvironmentVariables	Runtime	Mean	Error	StdDev
InlinePing	.NET 6	Empty	.NET 6.0	4.292 us	0.0446 us	0.0395 us
InlinePing	.NET 8	DOTNET_TieredPGO=0	.NET 8.0	2.449 us	0.0052 us	0.0040 us

PR:

Method	Job	EnvironmentVariables	Runtime	Mean	Error	StdDev
InlinePing	.NET 6	Empty	.NET 6.0	4.937 us	0.0462 us	0.0409 us
InlinePing	.NET 8	DOTNET_TieredPGO=0	.NET 8.0	3.958 us	0.0443 us	0.0393 us

kevin-montrose · 2024-06-06T02:03:56Z

Is it possible something is inadvertently getting boxed?
I made this into a BDN testcase here: main...badrishc/resp-parse-stress - here are the results comparing main to this PR:

Doesn't appear to be allocations based on local profiling. Might be useful to add [MemoryDiagnoser] to that benchmark, which I haven't had a chance to poke at yet.

Doing some quick and dirty benchmarking with the Embedded.perftest example, it looks like it's mostly the cost of doing the CheckACLPermissions on every request now.

On my box, .NET 8 with PGO disabled I'm getting about...

main ~62K PINGs / s
this branch ~43K PINGs / s
this branch, but remove the call to if (...) with the CheckACLPermissions call ~58K PINGs / s

Total loss ~30%, and a solid ~25% (~15K PINGs / s) is lost in checking ACLs.

I'll noodle on speeding this up in the morning.

badrishc · 2024-06-06T02:37:35Z

Doesn't appear to be allocations based on local profiling. Might be useful to add [MemoryDiagnoser] to that benchmark, which I haven't had a chance to poke at yet.

I have updated the branch with [MemoryDiagnoser] and added Get/Set in addition to Ping, for completeness. PR for this is here: #446

kevin-montrose · 2024-06-06T03:20:26Z

Mkay, because I can't leave well enough alone despite it being after 11 PM here...

One relatively simple approach would be to slap a int[] (trading space for time) on User that's used to cache the last ACL check indexed by each command (skipping any ACL checks on a 1, only transitioning from 0 to 1 on a successful ACL check). We just allocate a new int[] and swap it out whenever the user's ACL is modified. A really terrible hacked version of this gets in the ~59K range again locally, but that's with no testing so huge grain of salt.

I need to think on if there's any point in keeping the bit-packed version around; if not, we could just initialize the int[] with the "truth" and then there's not even an update.

Another, much less simple, approach would be to lean in to RespCommand being expanded and change the various ProcessXXXCommands methods into a single lookup into an array of function pointers / delegates. The array starts initialized with some function that does the ACL check and replaces the appropriate entry with a pointer to the corresponding NetworkXXX(...) method for the command on successful ACL. Subsequent invocations would jump straight to the command's actual implementation. On any user permission change, we reset the array.

That might ultimately be a bit cleaner, but it'd require a bunch of additional changes on top of this already gigantic change. There's also an indirection in there which makes me not so confident the perf would be any better.

So I'll keep pursuing the first approach. Of course, if there are any other clever ideas, I'd be happy to hear them.

badrishc · 2024-06-06T05:17:53Z

Mkay, because I can't leave well enough alone despite it being after 11 PM here...

One relatively simple approach would be to slap a int[] (trading space for time) on User that's used to cache the last ACL check indexed by each command (skipping any ACL checks on a 1, only transitioning from 0 to 1 on a successful ACL check). We just allocate a new int[] and swap it out whenever the user's ACL is modified. A really terrible hacked version of this gets in the ~59K range again locally, but that's with no testing so huge grain of salt.

I need to think on if there's any point in keeping the bit-packed version around; if not, we could just initialize the int[] with the "truth" and then there's not even an update.

Another, much less simple, approach would be to lean in to RespCommand being expanded and change the various ProcessXXXCommands methods into a single lookup into an array of function pointers / delegates. The array starts initialized with some function that does the ACL check and replaces the appropriate entry with a pointer to the corresponding NetworkXXX(...) method for the command on successful ACL. Subsequent invocations would jump straight to the command's actual implementation. On any user permission change, we reset the array.

That might ultimately be a bit cleaner, but it'd require a bunch of additional changes on top of this already gigantic change. There's also an indirection in there which makes me not so confident the perf would be any better.

So I'll keep pursuing the first approach. Of course, if there are any other clever ideas, I'd be happy to hear them.

I'm not sure we need to do this. The bitmap is nice and tight. There might be a simpler way to get back the performance. Can look into it tomorrow.

Introducing remaining ACL categories

3739de5

An attempt has been made to cover all commands, but more remain. A number of TODOs remain, but this gives a rough shape of how it'd work and how individual Commands or SubCommands would end up ACL'd too.

lmaas self-assigned this May 15, 2024

kevin-montrose added 15 commits May 17, 2024 18:45

Cover all remainin commands actually in Redis.

e2eb16f

Garnet-specific commands are still pending.

Handle Garnet specific commands.

b7ce733

Introduces @garnet category. Reworks WATCH so MS and OS are proper subcommands.

Get us to all tests passing before moving onto cleanup and individual…

409a7b0

… command ACL'ing.

merge main

aaa1fe9

Rework BITOP to remove last subcommand byte in parsing.

d18ad7a

Reworks parsing to remove the tuple with subcommand, as it'd always be 0 now.

remove CommandCategory, it is redundant

aa5a827

handle some todos; bring read/write command checking into line with A…

acbbed0

…CL categories, replace slow temp code with better code, add tests to catch future additions; this involves reordering RespCommand yet again

lay groundwork for adding/removing individual commands from ACLs

89ce6e8

expand tests to also exercise individual commands

012063f

knock some TODOs out, get these checks into simpler branchless range …

5e8d48d

…checks

expand tests to cover individual command ACL'ing

8c64559

merge latest

9df500a

cleanup a number of style nits and analyzer/diagnostic warnings

aec976d

kevin-montrose marked this pull request as ready for review May 21, 2024 16:42

kevin-montrose changed the title ~~[SKETCH] More Complete ACL Implementation~~ More Complete ACL Implementation May 21, 2024

lmaas reviewed May 24, 2024

View reviewed changes

vazois reviewed May 26, 2024

View reviewed changes

kevin-montrose and others added 4 commits May 28, 2024 13:40

nit: typo

cc46474

Co-authored-by: Lukas Maas <contact@lukasmaas.com>

merge latest main; make some notes on restoring BITOP fast parsing

cd73753

address feedback; no reason to special case default user anymore

af1ec64

kevin-montrose added 2 commits May 30, 2024 16:25

Rework parsing and RespCommand so RESP subcommands have entries in Re…

2ea71a4

…spCommand.

DRY up testing for ACLs now that categories, commands, and subcommand…

4914728

…s are all covered.

kevin-montrose added 11 commits May 31, 2024 17:41

Restore BITOP pseudo-subcommand parsing and cleanup treatment of simi…

a4a14ec

…lar commands; address a couple nits

Test remaining "weird" commands, including all CLUSTER subcommands.

13f8df8

This means we now document all of these in RespCommandsInfo.json, as they're just normal commands from the perspective of ACL'ing.

handle some lingering TODOs and nits

16deddf

implement correct ACL reporting for the new ACL implementation; as pa…

2147fea

…rt of this, correct some places where commands weren't tagged @fast or @slow (everything should be tagged one or the other)

fixup remaining tests

0f9f2e2

Change ACL description implementation to greedily remove tokens that …

a23dac5

…have no impact. This implementation isn't amazing, but ACL changing is hopefully rare and privileged enough that it is acceptable.

lots of little cleanup nits

ce2703d

merge main; includes adding ACL tests for new commands, and fixes for…

f27b953

… ACL test commands that have been corrected

cleanup formatting

891cce3

fix formatting nits in tests

918b12a

TalZaccai self-requested a review June 5, 2024 00:48

vazois requested changes Jun 5, 2024

View reviewed changes

kevin-montrose added 3 commits June 5, 2024 14:48

Address feedback; many style nits, restoring or correctig various err…

ffe6c5a

…or messages, and cleaning up subcommand parsing

address feedback; use compact switch, fix some nits

0b4379f

merge main

652b1da

TalZaccai requested changes Jun 5, 2024

View reviewed changes

Merge branch 'main' into aclImprovements

7f1aaa4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

More Complete ACL Implementation #386

More Complete ACL Implementation #386

kevin-montrose commented May 15, 2024 •

edited

kevin-montrose commented May 21, 2024 •

edited

lmaas left a comment

vazois left a comment

kevin-montrose commented May 28, 2024

badrishc commented May 29, 2024

kevin-montrose commented Jun 4, 2024

badrishc commented Jun 5, 2024 •

edited

vazois left a comment

kevin-montrose commented Jun 5, 2024

Framework: net8.0

Branch: main (on Kevin's fork)

Branch: aclImprovements (on Kevin's fork)

TalZaccai Jun 5, 2024

TalZaccai Jun 5, 2024

TalZaccai Jun 5, 2024 •

edited

badrishc commented Jun 5, 2024

kevin-montrose commented Jun 6, 2024

badrishc commented Jun 6, 2024

kevin-montrose commented Jun 6, 2024

badrishc commented Jun 6, 2024

		return false;

		while (!RespWriteUtils.WriteError($"ERR Unknown subcommand or wrong number of arguments for COMMAND.", ref dcurr, dend))

More Complete ACL Implementation #386

Are you sure you want to change the base?

More Complete ACL Implementation #386

Conversation

kevin-montrose commented May 15, 2024 • edited

kevin-montrose commented May 21, 2024 • edited

lmaas left a comment

Choose a reason for hiding this comment

vazois left a comment

Choose a reason for hiding this comment

kevin-montrose commented May 28, 2024

badrishc commented May 29, 2024

kevin-montrose commented Jun 4, 2024

badrishc commented Jun 5, 2024 • edited

Framework: net8.0

Branch: main (on Kevin's fork)

Branch: aclImprovements (on Kevin's fork)

Summary

vazois left a comment

Choose a reason for hiding this comment

kevin-montrose commented Jun 5, 2024

Framework: net8.0

Branch: main (on Kevin's fork)

Branch: aclImprovements (on Kevin's fork)

TalZaccai Jun 5, 2024

Choose a reason for hiding this comment

TalZaccai Jun 5, 2024

Choose a reason for hiding this comment

TalZaccai Jun 5, 2024 • edited

Choose a reason for hiding this comment

badrishc commented Jun 5, 2024

kevin-montrose commented Jun 6, 2024

badrishc commented Jun 6, 2024

kevin-montrose commented Jun 6, 2024

badrishc commented Jun 6, 2024

kevin-montrose commented May 15, 2024 •

edited

kevin-montrose commented May 21, 2024 •

edited

badrishc commented Jun 5, 2024 •

edited

TalZaccai Jun 5, 2024 •

edited