[Fleet] Add Delete Button in Fleet Interface for Managing Diagnostic Logs #167366
Comments
@allamiro please be aware that I'm moving this issue to the Kibana repository instead.
Pinging @elastic/fleet (Team:Fleet)
This would be pretty low effort for the Fleet team to implement AFAIU - probably a quick win for us. For what it's worth, these indices are lifecycle-managed by an out-of-the-box ILM policy that does the following:
We could probably do a better job of mentioning that somewhere in the product or its associated docs 🙂
Well well, another issue that's more involved than it looks :) The Fleet API and UI changes are simple. The problem is with the actual deletion of a data stream doc, because data streams are meant to be pretty much append-only: https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html#data-streams-append-only
The errors I run into when trying to delete the docs that contain the diagnostic file and metadata are well described in this Medium blog post. There are two possible workarounds:
@nimarezainia The base list here comes from agent diagnostic actions. This is so that we can show an in-progress state while the agent is uploading the file. If a file is deleted by ILM I believe it will still show up here like this due to 1) no file found + 2) current date exceeds the action expiration time (EDIT: I need to confirm this and look at the Kibana background task that reconciles the meta information with the deleted files). If a file is deleted immediately after being generated, it will show as
## Summary

Resolves #167366. This PR introduces the ability to manually delete diagnostics files before their ILM policy kicks in.

This PR:

- Adds a new `DELETE /api/fleet/agents/files/{fileId}` route which returns `{id: string, deleted: boolean}`
- Deletes the file from `.fleet-fileds-fromhost-data-agent` data stream when a request is received, and updates the corresponding meta information in `.fleet-fileds-fromhost-meta-agent` to set it to `DELETED` status
  - This is similar to how the meta information [reconciliation via task manager](https://github.com/elastic/kibana/blob/9e5d6b3e0fa731a37374c091cd7284d1c41f116e/x-pack/plugins/fleet/server/services/files/index.ts#L148) is done when the ILM policy deletes diagnostics file
- Updates the Agent details > Diagnostics tab:
  - Surface Delete action in files table for requests that resulted in files
  - Refactors existing UI around the copy text and generate button
  - Add `Show expired requests` toggle
    - Off by default, which means the following will be shown:
      - Files that are on disk
      - Files being generated
      - File requests which errored out but are still not expired (i.e. users can see errors with recent requests)
    - When toggled on, will additionally show:
      - File requests which errored out AND are expired
      - File requests that are just expired (i.e. edge case where a file was deleted by ILM but the meta info had not yet reconciled)

FYI, the expiration threshold is currently only 3 minutes. This is a bug, see: #183692

The main reason for adding this toggle is to keep the initial list view clean. The items in this list are built from all `REQUEST_DIAGNOSTICS` agent actions that the user submits, which can be on a single agent or bulk agents. When a file is deleted manually with this new work, or by the existing ILM policy, we can correctly flag the associated action as having `DELETED` files and hide it from view. But when a request errors out or otherwise results in no files being generated, we still want to keep the history of the request (we have no precedent of deleting agent activity). Over time, this history is no longer useful for the user and just pollutes the table, so it is better to hide these items from the initial view.

<img width="1314" alt="image" src="https://github.com/elastic/kibana/assets/1965714/36a2b405-5e58-4444-a114-ab06b842a505">
<img width="1330" alt="image" src="https://github.com/elastic/kibana/assets/1965714/bd8c40b1-46d9-47e3-880f-5c831b626398">

### Testing

Use the single and bulk request diagnostics feature and test the delete functionality. Go nuts :)

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
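For readers who want a concrete picture of the route and the toggle described in the PR summary, here is a minimal TypeScript sketch. It is not the code from the PR. The type names, the `RequestState` values other than `DELETED`, and the helpers `buildDeleteRequest` and `isVisible` are all hypothetical. Only the route path and the `{id, deleted}` response shape come from the summary. The `kbn-xsrf` header is included on the assumption that the route follows the usual Kibana convention for non-GET requests. Keeping `DELETED` entries hidden even when the toggle is on is one reading of the description.

```typescript
// Response shape stated in the PR summary.
interface DeleteFileResponse {
  id: string;
  deleted: boolean;
}

// Hypothetical request descriptor; no network call is made here.
interface RequestDescriptor {
  method: "DELETE";
  path: string;
  headers: Record<string, string>;
}

// Builds a descriptor for DELETE /api/fleet/agents/files/{fileId}.
function buildDeleteRequest(fileId: string): RequestDescriptor {
  return {
    method: "DELETE",
    path: `/api/fleet/agents/files/${encodeURIComponent(fileId)}`,
    // Assumption: Kibana rejects non-GET requests without this header.
    headers: { "kbn-xsrf": "true" },
  };
}

// Hypothetical states; only DELETED is named in the PR summary.
type RequestState = "ON_DISK" | "PENDING" | "ERRORED" | "DELETED";

interface DiagnosticsRequest {
  id: string;
  state: RequestState;
  expired: boolean; // action expiration time has passed
}

// Visibility rule for the "Show expired requests" toggle, as read
// from the summary: deleted files are hidden, files on disk are always
// shown, and everything else is shown unless it has expired and the
// toggle is off.
function isVisible(req: DiagnosticsRequest, showExpired: boolean): boolean {
  if (req.state === "DELETED") return false;
  if (req.state === "ON_DISK") return true;
  return !req.expired || showExpired;
}

// Example: filter a list the way the default (toggle off) view would.
const sample: DiagnosticsRequest[] = [
  { id: "a", state: "ON_DISK", expired: false },
  { id: "b", state: "PENDING", expired: false },
  { id: "c", state: "ERRORED", expired: true },
  { id: "d", state: "DELETED", expired: true },
];
const defaultView = sample.filter((r) => isVisible(r, false)).map((r) => r.id);
const expandedView = sample.filter((r) => isVisible(r, true)).map((r) => r.id);
```

In this sketch an expired `PENDING` entry stands in for the "just expired" edge case, where ILM removed the file before the meta information was reconciled.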
Describe the enhancement:
I propose the addition of a 'Delete' button within the Elastic Fleet interface, allowing users to easily remove diagnostic logs collected through Fleet. This feature should allow users to select specific logs or select logs based on a set of criteria like date range and log source, and remove them directly from the Fleet interface without having to manually access Elasticsearch indices.
Describe a specific use case for the enhancement or feature:
In environments where a large number of logs are generated, the accumulated diagnostic data can quickly lead to increased storage costs and management overhead. Having a 'Delete' button directly in the Elastic Fleet interface will empower users to manage their storage more effectively and ensure that unnecessary or obsolete data can be promptly removed.
This enhancement is particularly crucial for organizations with stringent data retention policies or those operating in environments with limited available storage, enabling them to maintain compliance and optimize resource utilization efficiently.
For instance, a user could employ this feature to easily remove all diagnostic logs collected from a specific agent or a group of agents that are older than a certain date, or that match specific criteria, ensuring the conservation of storage space and compliance with data retention policies, without the need to navigate away from the Fleet interface or delve into Elasticsearch's Index Management.
Benefits:
- Ease of Use: Users can manage storage directly from Fleet, reducing the complexity involved in navigating to different parts of the Elastic Stack to delete data.
- Efficient Resource Management: Enables users to efficiently manage their storage resources by allowing them to promptly remove unnecessary or outdated logs.
- Compliance: Assists organizations in maintaining compliance with data retention policies by facilitating the easy removal of obsolete data.
- Reduced Overhead: Mitigates administrative overhead associated with managing storage and data lifecycle in Elasticsearch.
Implementation Suggestions:
- Selection Criteria: The delete feature should allow users to specify criteria like date ranges, log sources, or log types, to identify the logs they wish to remove.
- Confirmation Prompt: Before final deletion, users should receive a confirmation prompt detailing the logs to be deleted to prevent accidental data loss.
- Logging & Audit Trail: All deletion actions should be logged to maintain an audit trail, providing visibility into which logs were deleted, by whom, and when.
- Permission Controls: Adequate permission controls should be put in place to ensure that only authorized users can delete logs, preventing unauthorized access and potential data loss.
The introduction of a 'Delete' button within the Fleet interface for managing diagnostic logs would enhance user experience, data management, and compliance with data retention policies, fostering efficient utilization of storage resources and reducing administrative overhead.