Add `unprivileged` and `privileged` subcommand to Elastic Agent #4621

blakerouse · 2024-04-25T18:48:29Z

What does this PR do?

Adds a new unprivileged and privileged subcommand that converts an installed Elastic Agent from privileged to unprivileged or back again.

Why is it important?

To allow users to change the permission execution mode without having to re-install the Elastic Agent.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
~~[ ] I have made corresponding change to the default configuration files~~
~~[ ] I have added tests that prove my fix is effective or that my feature works~~ (will have integration tests)
I have added an entry in ./changelog/fragments using the changelog tool
I have added an integration test or an E2E test

Author's Checklist

can convert from privileged to unprivileged
can convert from unprivileged to privileged

How to test this PR locally

$ sudo elastic-agent unprivileged

$ sudo elastic-agent privileged

Related issues

Closes #2790

…ed mode.

elasticmachine · 2024-05-09T14:50:07Z

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

blakerouse · 2024-05-09T18:57:02Z

SonarQube you are correct about unit tests, but you are very wrong about integration tests...

This has integration tests for both privileged and unprivileged going in both directions.

pkg/control/v2/client/wait/agent.go

internal/pkg/agent/cmd/privileged.go

michel-laterman · 2024-05-14T19:08:06Z

internal/pkg/agent/cmd/unprivileged.go

+	// cannot switch to unprivileged when Elastic Defend exists in the policy
+	err = ensureNoElasticDefend()


Normally osquery is running as privileged. Quite a lot of functionality will be unavailable if running unprivileged or returning different results.

Osqueryd also checks for "safe permissions" when it loads our custom extension.

When osquery launches, each path is evaluated for "safe permissions" (extension executable files must be owned by root or Administrator) and executed as a monitored child process

https://osquery.readthedocs.io/en/stable/deployment/extensions/#autoloading-extensions

If you observe a runtime error from osquery, Extension binary has unsafe permissions, you have to lock down the filesystem permissions on the extension executable. See the steps in "Extensions Binary Permissions," above.

https://osquery.readthedocs.io/en/stable/deployment/extensions/#troubleshooting

For osquery, you should mark the integration as requiring root privileges like defend does in https://github.com/elastic/endpoint-package/blob/ce921805467ab55953adc3721e7db8581915bd35/package/endpoint/manifest.yml#L23-L25. You can also do this for individual data streams but I don't think that applies to OSquery.

You could also add a prevention to the spec file to prevent us from even starting osquery if you aren't root, but since it doesn't fundamentally not work when not root, this might not be necessary.

If you do update the spec, you would want to update the agentbeat spec.

internal/pkg/agent/cmd/unprivileged.go

internal/pkg/agent/cmd/privileged.go

internal/pkg/agent/cmd/unprivileged.go

internal/pkg/agent/cmd/privileged.go

internal/pkg/agent/cmd/unprivileged.go

cmacknz · 2024-05-16T20:39:42Z

internal/pkg/agent/cmd/privileged.go

+
+By default this command will ask for a confirmation before making this change. You can bypass the confirmation request
+using the -f flag. This will always stop the running Elastic Agent (if running) before performing the switch. It is
+possible that loss of metrics could occur during this small window of time. The command also always starts the


possible that loss of metrics could occur during this small window of time

Loss of metrics or data in general, if this is intended to be a warning that this is not a zero downtime operation you could just say that directly.

For both commands you may want to caution about what happens if their execution is interrupted.

cmacknz · 2024-05-16T20:42:35Z

internal/pkg/agent/cmd/unprivileged.go

+		return fmt.Errorf("failed to create component model from policy: %w", err)
+	}
+	for _, comp := range comps {
+		if comp.InputSpec != nil && comp.InputSpec.InputType == endpoint {


Is there a way to actually evaluate the runtime preventions and see if something has the root one?

That would be a bit more future proof. The constrain isn't really "is defend running" it's "is there something that requires root to function" running, which today is only defend.

I think we need to determine if that is the behavior that we actually want. An input specification that requires root, when switched to unprivileged mode will still allow the Elastic Agent to run and the input will just report as failed. This would then be viewable to the user in status as well as in Fleet and would allow them to adjust/remove the integration.

In the case of Endpoint it is more detrimental to switch the mode with it enabled. Endpoint being a separate piece of software that is installed, once switched it cannot be uninstalled. That leaves the Elastic Agent in a completely broken state, unless switched back to privileged and then remove the integration.

I believe that we have 4 options for handling this:

Prevent switch if any input has a runtime protection for requiring root.

Prevent switch if any service input has a runtime protection for requiring root.

Add new option to input specification that says it doesn't support unprivileged mode directly. This is a little different then a runtime protection, being that it could be used in switch to prevent the switch. Endpoint specification can set this specifically and then future inputs can as well.

Keep it how it is at the moment and hard-code it to endpoint.

That would be a bit more future proof. The constrain isn't really "is defend running" it's "is there something that requires root to function" running, which today is only defend.

Osquery as well

Add new option to input specification that says it doesn't support unprivileged mode directly.

Let's avoid building something new for information we already have, so this option is out.

This would then be viewable to the user in status as well as in Fleet and would allow them to adjust/remove the integration.

Agree with this, along the lines of not building anything new, we can rely on this to inform the user after the fact. When this is exposed from Fleet, Fleet will have information about the installed packages and will likely warn or prevent the user in the same way. Most users are going to do this en-mass at won't be reading warning text anyway.

Prevent switch if any service input has a runtime protection for requiring root.
Keep it how it is at the moment and hard-code it to endpoint.

Given the distinction that a service input may be totally broken if we do this, it makes sense to only add additional functionality for service components to deal with this distinction.

If it isn't a lot of effort to check the spec file so we don't have to hard code this case, then doing this in a general way is preferable.

Otherwise given endpoint is the only service component we expect to have for the foreseeable future, we can leave this as is and add a comment explaining why it specifically is special cased.

That would be a bit more future proof. The constrain isn't really "is defend running" it's "is there something that requires root to function" running, which today is only defend.

Osquery as well

@aleksmaus Root is required to function or it doesn't work well without root? Trying to understand the difference here.

Another option is to allow the switch with Endpoint enabled, but Elastic Agent uninstall Endpoint just like the uninstall command does. It can perform uninstall and then switch to non-root. This provides more of a way of not blocking the user from switching, we could even prompt if they want to remove it an switch. [y/N] could be the default with being a force -f switch where it always uninstalls it.

I like this option a little more because it doesn't prevent the switch just because a service component that requires root is enabled.

@aleksmaus Root is required to function or it doesn't work well without root? Trying to understand the difference here.

We could run osqueryd binary standalone without root permissions and some of the functionality will not be available and some of the queries will return different results (with no errors).

So there is no indication from the user point of view that something is wrong with permissions, the data just would be missing, which creates "blind spots" from security point of view without user being aware.

The file system searches would be missing some valid results because of the lack of the permissions, and osquery doesn't return the error in these cases, it just returns a smaller resultset.

Here are just a few examples of running osquery with sudo and without on the same query:

Also the file permissions for our extensions should be set to root or Administrator:
https://osquery.readthedocs.io/en/stable/deployment/extensions/#autoloading-extensions

(extension executable files must be owned by root or Administrator)

@aleksmaus I think it's best to move this to its own issue and discussion on osquerybeat as non-root. Its could be possible to allow it to run or not run. I believe that discussion is better to be had in a specific issue and then come to a conclusion. We can adjust in a follow up PR based on that.

Another option is to allow the switch with Endpoint enabled, but Elastic Agent uninstall Endpoint just like the uninstall command does. It can perform uninstall and then switch to non-root. This provides more of a way of not blocking the user from switching, we could even prompt if they want to remove it an switch. [y/N] could be the default with being a force -f switch where it always uninstalls it.

You would have to include and test that tamper protection still applies on this path which will be extra testing we have to keep around. I think it is much simpler if we force users to put unprivileged agents in policies that do not include Defend. There are no edge cases around Defend being conditionally uninstalled or removed that way. The composable agent policies work in the UI will help users deal with any shared content between the privileged and privileged policies.

So I'd say we should continue to refuse the conversion if defend is in the policy.

Okay. I was just worrying that it prevents issues with different personas in an organization. A persona that manages the policies and a persona that manages installation of the endpoints. That is probably rare and not something we need to worry about.

mergify · 2024-05-20T14:27:49Z

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b to-unprivileged upstream/to-unprivileged
git merge upstream/main
git push upstream to-unprivileged

internal/pkg/agent/cmd/install.go

internal/pkg/agent/cmd/privileged.go

pkg/control/v2/client/wait/agent.go

pkg/control/v2/client/wait/common.go

testing/integration/switch_privileged_test.go

cmacknz · 2024-05-22T18:49:36Z

internal/pkg/agent/cmd/unprivileged.go

+	return nil
+}
+
+func ensureNoServiceComponentIssues() error {


I don't think this has any test coverage. You could add a check in the endpoint test suite that the unprivileged command fails, or add a unit test.

True I will add an integration test, as I would prefer to see a true scenario be tested.

blakerouse · 2024-05-22T19:17:15Z

I have validated manually that this works on macOS, because we do not have integration testing coverage on macOS. I was able to switch between the modes successfully and the vaults are handled properly.

elastic-sonarqube · 2024-05-24T03:19:38Z

Quality Gate failed

Failed conditions
1.5% Coverage on New Code (required ≥ 40%)

See analysis details on SonarQube

michalpristas · 2024-05-31T11:49:16Z

internal/pkg/agent/cmd/unprivileged.go

+	return cmd
+}
+
+func unprivilegedCmd(streams *cli.IOStreams, cmd *cobra.Command) (err error) {


nit: seems like there's quite some repetition meaning this and privilegedCmd function

Work on privileged/unprivileged command.

ee26696

blakerouse added Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team backport-skip labels Apr 25, 2024

blakerouse self-assigned this Apr 25, 2024

blakerouse changed the title ~~Work on privileged/unprivileged command.~~ Add unprivileged and privileged subcommand to Elastic Agent Apr 25, 2024

blakerouse added 7 commits April 30, 2024 15:12

Add integration tests for switching between unprivileged and privileg…

df0160a

…ed mode.

Merge branch 'main' into to-unprivileged

e4946e6

Merge branch 'main' into to-unprivileged

a709d64

Merge remote-tracking branch 'upstream/main' into to-unprivileged

71e6c0b

Fix upstream rename.

2f83735

Add changelog.

bfbf202

Merge remote-tracking branch 'upstream/main' into to-unprivileged

d1b8f4f

blakerouse marked this pull request as ready for review May 9, 2024 14:50

blakerouse requested a review from a team as a code owner May 9, 2024 14:50

blakerouse requested review from andrzej-stencel and pchila May 9, 2024 14:50

ycombinator requested review from michalpristas, michel-laterman and andrzej-stencel and removed request for andrzej-stencel May 9, 2024 14:54

ycombinator removed the request for review from andrzej-stencel May 9, 2024 21:50

michel-laterman reviewed May 14, 2024

View reviewed changes

pkg/control/v2/client/wait/agent.go Show resolved Hide resolved

michel-laterman reviewed May 14, 2024

View reviewed changes

internal/pkg/agent/cmd/privileged.go Show resolved Hide resolved

michel-laterman reviewed May 14, 2024

View reviewed changes

blakerouse added 2 commits May 15, 2024 14:40

Merge branch 'main' into to-unprivileged

217632b

Merge branch 'main' into to-unprivileged

a97f017

kaanyalti reviewed May 16, 2024

View reviewed changes

internal/pkg/agent/cmd/unprivileged.go Show resolved Hide resolved

kaanyalti reviewed May 16, 2024

View reviewed changes

internal/pkg/agent/cmd/privileged.go Show resolved Hide resolved

kaanyalti reviewed May 16, 2024

View reviewed changes

internal/pkg/agent/cmd/unprivileged.go Outdated Show resolved Hide resolved

kaanyalti reviewed May 16, 2024

View reviewed changes

internal/pkg/agent/cmd/privileged.go Outdated Show resolved Hide resolved

kaanyalti reviewed May 16, 2024

View reviewed changes

internal/pkg/agent/cmd/privileged.go Show resolved Hide resolved

kaanyalti reviewed May 16, 2024

View reviewed changes

internal/pkg/agent/cmd/unprivileged.go Show resolved Hide resolved

cmacknz reviewed May 16, 2024

View reviewed changes

kaanyalti approved these changes May 16, 2024

View reviewed changes

blakerouse added 2 commits May 20, 2024 13:33

Merge remote-tracking branch 'upstream/main' into to-unprivileged

9d2dc72

Switch to new install privileged/unprivileged checks.

0986097

blakerouse force-pushed the to-unprivileged branch from d75f062 to 0986097 Compare May 20, 2024 17:38

blakerouse added 2 commits May 21, 2024 08:54

Adjust FixPermissions to take ownership back to Administrators.

2298339

Merge remote-tracking branch 'upstream/main' into to-unprivileged

fc516b7

pchila reviewed May 21, 2024

View reviewed changes

blakerouse added 5 commits May 22, 2024 10:02

Code review feedback.

b25e9bf

Fix service component check. Support switching on macOS.

ebafe9e

Fix lint.

d9a87eb

Update to constant.

6e90411

Merge remote-tracking branch 'upstream/main' into to-unprivileged

77a077b

pchila approved these changes May 22, 2024

View reviewed changes

cmacknz reviewed May 22, 2024

View reviewed changes

blakerouse added 2 commits May 23, 2024 22:47

Add tests for unprivileged switch failure with endpoint installed.

29b9f17

Merge remote-tracking branch 'upstream/main' into to-unprivileged

80fe23e

ycombinator mentioned this pull request May 30, 2024

[Fleet] Users should be able to switch Agents from privileged to unprivileged mode elastic/kibana#184516

Open

michalpristas approved these changes May 31, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `unprivileged` and `privileged` subcommand to Elastic Agent #4621

Add `unprivileged` and `privileged` subcommand to Elastic Agent #4621

blakerouse commented Apr 25, 2024 •

edited

elasticmachine commented May 9, 2024

blakerouse commented May 9, 2024

michel-laterman May 14, 2024

aleksmaus May 16, 2024

cmacknz May 17, 2024

cmacknz May 17, 2024

cmacknz May 16, 2024

cmacknz May 16, 2024

cmacknz May 16, 2024

blakerouse May 17, 2024

aleksmaus May 17, 2024

cmacknz May 17, 2024

blakerouse May 20, 2024

blakerouse May 20, 2024

aleksmaus May 20, 2024

blakerouse May 20, 2024

cmacknz May 21, 2024

blakerouse May 22, 2024

mergify bot commented May 20, 2024

cmacknz May 22, 2024

blakerouse May 22, 2024

blakerouse commented May 22, 2024

elastic-sonarqube bot commented May 24, 2024

michalpristas May 31, 2024

		// cannot switch to unprivileged when Elastic Defend exists in the policy
		err = ensureNoElasticDefend()

Add unprivileged and privileged subcommand to Elastic Agent #4621

Are you sure you want to change the base?

Add unprivileged and privileged subcommand to Elastic Agent #4621

Conversation

blakerouse commented Apr 25, 2024 • edited

What does this PR do?

Why is it important?

Checklist

Author's Checklist

How to test this PR locally

Related issues

elasticmachine commented May 9, 2024

blakerouse commented May 9, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mergify bot commented May 20, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

blakerouse commented May 22, 2024

elastic-sonarqube bot commented May 24, 2024

Quality Gate failed

Choose a reason for hiding this comment

Add `unprivileged` and `privileged` subcommand to Elastic Agent #4621

Add `unprivileged` and `privileged` subcommand to Elastic Agent #4621

blakerouse commented Apr 25, 2024 •

edited