-
Notifications
You must be signed in to change notification settings - Fork 122
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unprivileged
and privileged
subcommand to Elastic Agent
#4621
base: main
Are you sure you want to change the base?
Conversation
unprivileged
and privileged
subcommand to Elastic Agent
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane) |
SonarQube you are correct about unit tests, but you are very wrong about integration tests... This has integration tests for both privileged and unprivileged going in both directions. |
// cannot switch to unprivileged when Elastic Defend exists in the policy | ||
err = ensureNoElasticDefend() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally osquery is running as privileged. Quite a lot of functionality will be unavailable if running unprivileged or returning different results.
Osqueryd also checks for "safe permissions" when it loads our custom extension.
When osquery launches, each path is evaluated for "safe permissions"
(extension executable files must be owned by root or Administrator)
and executed as a monitored child process
https://osquery.readthedocs.io/en/stable/deployment/extensions/#autoloading-extensions
If you observe a runtime error from osquery, Extension binary has unsafe permissions,
you have to lock down the filesystem permissions on the extension executable.
See the steps in "Extensions Binary Permissions," above.
https://osquery.readthedocs.io/en/stable/deployment/extensions/#troubleshooting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For osquery, you should mark the integration as requiring root privileges like defend does in https://github.com/elastic/endpoint-package/blob/ce921805467ab55953adc3721e7db8581915bd35/package/endpoint/manifest.yml#L23-L25. You can also do this for individual data streams but I don't think that applies to OSquery.
You could also add a prevention to the spec file to prevent us from even starting osquery if you aren't root, but since it doesn't fundamentally not work when not root, this might not be necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you do update the spec, you would want to update the agentbeat spec.
internal/pkg/agent/cmd/privileged.go
Outdated
|
||
By default this command will ask for a confirmation before making this change. You can bypass the confirmation request | ||
using the -f flag. This will always stop the running Elastic Agent (if running) before performing the switch. It is | ||
possible that loss of metrics could occur during this small window of time. The command also always starts the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
possible that loss of metrics could occur during this small window of time
Loss of metrics or data in general, if this is intended to be a warning that this is not a zero downtime operation you could just say that directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For both commands you may want to caution about what happens if their execution is interrupted.
return fmt.Errorf("failed to create component model from policy: %w", err) | ||
} | ||
for _, comp := range comps { | ||
if comp.InputSpec != nil && comp.InputSpec.InputType == endpoint { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to actually evaluate the runtime preventions and see if something has the root one?
That would be a bit more future proof. The constrain isn't really "is defend running" it's "is there something that requires root to function" running, which today is only defend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to determine if that is the behavior that we actually want. An input specification that requires root, when switched to unprivileged mode will still allow the Elastic Agent to run and the input will just report as failed. This would then be viewable to the user in status
as well as in Fleet and would allow them to adjust/remove the integration.
In the case of Endpoint it is more detrimental to switch the mode with it enabled. Endpoint being a separate piece of software that is installed, once switched it cannot be uninstalled. That leaves the Elastic Agent in a completely broken state, unless switched back to privileged and then remove the integration.
I believe that we have 4 options for handling this:
- Prevent switch if any input has a runtime protection for requiring root.
- Prevent switch if any service input has a runtime protection for requiring root.
- Add new option to input specification that says it doesn't support unprivileged mode directly. This is a little different then a runtime protection, being that it could be used in switch to prevent the switch. Endpoint specification can set this specifically and then future inputs can as well.
- Keep it how it is at the moment and hard-code it to endpoint.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a bit more future proof. The constrain isn't really "is defend running" it's "is there something that requires root to function" running, which today is only defend.
Osquery as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add new option to input specification that says it doesn't support unprivileged mode directly.
Let's avoid building something new for information we already have, so this option is out.
This would then be viewable to the user in status as well as in Fleet and would allow them to adjust/remove the integration.
Agree with this, along the lines of not building anything new, we can rely on this to inform the user after the fact. When this is exposed from Fleet, Fleet will have information about the installed packages and will likely warn or prevent the user in the same way. Most users are going to do this en-mass at won't be reading warning text anyway.
Prevent switch if any service input has a runtime protection for requiring root.
Keep it how it is at the moment and hard-code it to endpoint.
Given the distinction that a service input may be totally broken if we do this, it makes sense to only add additional functionality for service components to deal with this distinction.
If it isn't a lot of effort to check the spec file so we don't have to hard code this case, then doing this in a general way is preferable.
Otherwise given endpoint is the only service component we expect to have for the foreseeable future, we can leave this as is and add a comment explaining why it specifically is special cased.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a bit more future proof. The constrain isn't really "is defend running" it's "is there something that requires root to function" running, which today is only defend.
Osquery as well
@aleksmaus Root is required to function or it doesn't work well without root? Trying to understand the difference here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is to allow the switch with Endpoint enabled, but Elastic Agent uninstall Endpoint just like the uninstall
command does. It can perform uninstall and then switch to non-root. This provides more of a way of not blocking the user from switching, we could even prompt if they want to remove it an switch. [y/N] could be the default with being a force -f
switch where it always uninstalls it.
I like this option a little more because it doesn't prevent the switch just because a service component that requires root is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aleksmaus Root is required to function or it doesn't work well without root? Trying to understand the difference here.
We could run osqueryd binary standalone without root permissions and some of the functionality will not be available and some of the queries will return different results (with no errors).
So there is no indication from the user point of view that something is wrong with permissions, the data just would be missing, which creates "blind spots" from security point of view without user being aware.
The file system searches would be missing some valid results because of the lack of the permissions, and osquery doesn't return the error in these cases, it just returns a smaller resultset.
Here are just a few examples of running osquery with sudo and without on the same query:
Also the file permissions for our extensions should be set to root or Administrator:
https://osquery.readthedocs.io/en/stable/deployment/extensions/#autoloading-extensions
(extension executable files must be owned by root or Administrator)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aleksmaus I think it's best to move this to its own issue and discussion on osquerybeat as non-root. Its could be possible to allow it to run or not run. I believe that discussion is better to be had in a specific issue and then come to a conclusion. We can adjust in a follow up PR based on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another option is to allow the switch with Endpoint enabled, but Elastic Agent uninstall Endpoint just like the uninstall command does. It can perform uninstall and then switch to non-root. This provides more of a way of not blocking the user from switching, we could even prompt if they want to remove it an switch. [y/N] could be the default with being a force -f switch where it always uninstalls it.
You would have to include and test that tamper protection still applies on this path which will be extra testing we have to keep around. I think it is much simpler if we force users to put unprivileged agents in policies that do not include Defend. There are no edge cases around Defend being conditionally uninstalled or removed that way. The composable agent policies work in the UI will help users deal with any shared content between the privileged and privileged policies.
So I'd say we should continue to refuse the conversion if defend is in the policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. I was just worrying that it prevents issues with different personas in an organization. A persona that manages the policies and a persona that manages installation of the endpoints. That is probably rare and not something we need to worry about.
This pull request is now in conflicts. Could you fix it? 🙏
|
d75f062
to
0986097
Compare
return nil | ||
} | ||
|
||
func ensureNoServiceComponentIssues() error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this has any test coverage. You could add a check in the endpoint test suite that the unprivileged command fails, or add a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True I will add an integration test, as I would prefer to see a true scenario be tested.
I have validated manually that this works on macOS, because we do not have integration testing coverage on macOS. I was able to switch between the modes successfully and the vaults are handled properly. |
Quality Gate failedFailed conditions |
return cmd | ||
} | ||
|
||
func unprivilegedCmd(streams *cli.IOStreams, cmd *cobra.Command) (err error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: seems like there's quite some repetition meaning this and privilegedCmd function
What does this PR do?
Adds a new
unprivileged
andprivileged
subcommand that converts an installed Elastic Agent from privileged to unprivileged or back again.Why is it important?
To allow users to change the permission execution mode without having to re-install the Elastic Agent.
Checklist
[ ] I have made corresponding change to the default configuration files[ ] I have added tests that prove my fix is effective or that my feature works(will have integration tests)./changelog/fragments
using the changelog toolAuthor's Checklist
How to test this PR locally
Related issues