[Fleet] Add Delete Button in Fleet Interface for Managing Diagnostic Logs #167366

Closed
allamiro opened this issue Sep 26, 2023 · 6 comments · Fixed by #183690
Labels: QA:Ready for Testing, Team:Fleet

Comments

@allamiro

allamiro commented Sep 26, 2023

fleet-enhancement
Describe the enhancement:
I propose adding a 'Delete' button to the Elastic Fleet interface so that users can easily remove diagnostic logs collected through Fleet. Users should be able to select specific logs, or select logs by criteria such as date range and log source, and remove them directly from the Fleet interface without having to access Elasticsearch indices manually.

Describe a specific use case for the enhancement or feature:

In environments where a large number of logs are generated, the accumulated diagnostic data can quickly lead to increased storage costs and management overhead. Having a 'Delete' button directly in the Elastic Fleet interface will empower users to manage their storage more effectively and ensure that unnecessary or obsolete data can be promptly removed.
This enhancement is particularly crucial for organizations with stringent data retention policies or those operating in environments with limited available storage, enabling them to maintain compliance and optimize resource utilization efficiently.
For instance, a user could employ this feature to easily remove all diagnostic logs collected from a specific agent or a group of agents that are older than a certain date, or that match specific criteria, ensuring the conservation of storage space and compliance with data retention policies, without the need to navigate away from the Fleet interface or delve into Elasticsearch's Index Management.
Benefits:

  • Ease of Use:
    Users can manage storage directly from Fleet, reducing the complexity involved in navigating to different parts of the Elastic Stack to delete data.

  • Efficient Resource Management:
    Enables users to efficiently manage their storage resources by allowing them to promptly remove unnecessary or outdated logs.

  • Compliance:
    Assists organizations in maintaining compliance with data retention policies by facilitating the easy removal of obsolete data.

  • Reduced Overhead:
    Mitigates administrative overhead associated with managing storage and data lifecycle in Elasticsearch.

Implementation Suggestions:

  • Selection Criteria:
    The delete feature should allow users to specify criteria like date ranges, log sources, or log types, to identify the logs they wish to remove.

  • Confirmation Prompt:
    Before final deletion, users should receive a confirmation prompt detailing the logs to be deleted to prevent accidental data loss.

  • Logging & Audit Trail:
    All deletion actions should be logged to maintain an audit trail, providing visibility into which logs were deleted, by whom, and when.

  • Permission Controls:
    Adequate permission controls should be put in place to ensure that only authorized users can delete logs, preventing unauthorized access and potential data loss.

The introduction of a 'Delete' button within the Fleet interface for managing diagnostic logs would enhance user experience, data management, and compliance with data retention policies, fostering efficient utilization of storage resources and reducing administrative overhead.

@jlind23
Contributor

jlind23 commented Sep 27, 2023

@allamiro please be aware that I'm moving this issue to the Kibana repository instead.
@nimarezainia could you please take a look at this request.

jlind23 transferred this issue from elastic/fleet-server Sep 27, 2023
jlind23 added the Team:Fleet label Sep 27, 2023
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@kpollich
Member

This would be pretty low effort for the Fleet team to implement AFAIU - probably a quick win for us.

For what it's worth, these indices are lifecycle-managed by an out-of-the-box ILM policy that does the following:

  • Deletes diagnostics "data" (the actual binary .zip contents) after 7 days
  • Deletes diagnostics "meta" (the other stuff related to the upload, e.g. timestamps, status flags, etc.) after 90 days

We could probably do a better job of mentioning that somewhere in the product or its associated docs 🙂
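For readers less familiar with ILM, the retention described above boils down to a `delete` phase with a `min_age`. The sketch below builds the body of a `PUT _ilm/policy/<name>` request; the two-phase layout and the `retentionPolicy` helper are illustrative assumptions, not the policies Fleet actually ships.

```typescript
// Shape of an ILM policy body, as sent to PUT _ilm/policy/<name>.
interface IlmPhase {
  min_age?: string;
  actions: Record<string, Record<string, unknown>>;
}

interface IlmPolicy {
  policy: { phases: Record<string, IlmPhase> };
}

// Hypothetical helper: keep data in the hot phase, then delete the whole
// backing index once it is older than `deleteAfter`.
function retentionPolicy(deleteAfter: string): IlmPolicy {
  return {
    policy: {
      phases: {
        hot: { actions: {} },
        delete: { min_age: deleteAfter, actions: { delete: {} } },
      },
    },
  };
}

// Retention values taken from the comment above.
const dataPolicy = retentionPolicy("7d");  // binary .zip contents
const metaPolicy = retentionPolicy("90d"); // upload metadata

console.log(JSON.stringify(dataPolicy.policy.phases.delete));
// prints {"min_age":"7d","actions":{"delete":{}}}
console.log(metaPolicy.policy.phases.delete.min_age); // prints 90d
```

Note that ILM measures `min_age` from index creation, or from rollover once an index has rolled over, so retention applies per backing index rather than per uploaded file.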

@nimarezainia
Contributor

@kpollich thanks for that info. I assume that the diagnostic file mentioned in the Fleet UI would also then be removed? (once the data is deleted).

fyi @kilfoyle

@jen-huang
Contributor

jen-huang commented May 14, 2024

Well well, another issue that's more involved than it looks :)

The Fleet API and UI changes are simple. The problem is with the actual deletion of a data stream doc, because data streams are meant to be pretty much append-only: https://www.elastic.co/guide/en/elasticsearch/reference/current/data-streams.html#data-streams-append-only

The error(s) I run into when trying to delete the docs that contain the diagnostic file and metadata are well described in this Medium blog post.

There are two possible workarounds:

  1. Perform a DELETE request against the backing index directly, which will involve an additional query to find the backing index first 🥲
  2. Perform a _delete_by_query against the data stream - I can probably target just one ID with this, so will try this workaround first
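Both workarounds map onto plain Elasticsearch REST requests. The sketch below only builds request descriptors (method, path, body) so it runs without a cluster; the `EsRequest` type and the helper names are hypothetical, and the document ID is a placeholder.

```typescript
// Hypothetical request descriptor; hand these to any Elasticsearch client
// or HTTP library.
interface EsRequest {
  method: "POST" | "DELETE";
  path: string;
  body?: unknown;
}

// Workaround 1, step 1: search the data stream for the document. Each hit's
// `_index` field names the backing index that holds it.
function findBackingIndexRequest(dataStream: string, id: string): EsRequest {
  return {
    method: "POST",
    path: `/${dataStream}/_search`,
    body: { query: { ids: { values: [id] } }, _source: false },
  };
}

// Workaround 1, step 2: delete the document from that backing index.
function deleteFromBackingIndexRequest(backingIndex: string, id: string): EsRequest {
  return { method: "DELETE", path: `/${backingIndex}/_doc/${encodeURIComponent(id)}` };
}

// Workaround 2: one delete-by-query against the data stream itself,
// narrowed to a single document with an `ids` query.
function deleteByQueryRequest(dataStream: string, id: string): EsRequest {
  return {
    method: "POST",
    path: `/${dataStream}/_delete_by_query`,
    body: { query: { ids: { values: [id] } } },
  };
}

const req = deleteByQueryRequest(".fleet-fileds-fromhost-data-agent", "abc123");
console.log(req.method, req.path);
// prints POST /.fleet-fileds-fromhost-data-agent/_delete_by_query
```

The delete-by-query variant trades the extra lookup round trip for a slightly heavier server-side operation, which is why it is the natural first thing to try.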

@jen-huang
Contributor

jen-huang commented May 14, 2024

> @kpollich thanks for that info. I assume that the diagnostic file mentioned in the Fleet UI would also then be removed? (once the data is deleted).

@nimarezainia The base list here comes from agent diagnostic actions. This is so that we can show an in-progress state while the agent is uploading the file. If a file is deleted by ILM I believe it will still show up here like this due to 1) no file found + 2) current date exceeds the action expiration time (EDIT: I need to confirm this and look at the Kibana background task that reconciles the meta information with the deleted files).

Image

If a file is deleted immediately after being generated, it will show as Generating until the action expires. Obviously not a great experience, will look to see what I can do about deleting the action "history" if the user manually triggers a delete (and they should be able to delete ones marked as Expired due to ILM):

Image
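The row-state behaviour described in the two paragraphs above can be summarized as a small pure function. This is a hypothetical sketch of the described behaviour, not the Fleet implementation; the status names and the `DiagnosticsRequest` fields are assumptions.

```typescript
// Hypothetical status names for one row of the Diagnostics tab.
type DiagnosticsStatus = "READY" | "GENERATING" | "EXPIRED" | "DELETED" | "FAILED";

interface DiagnosticsRequest {
  actionExpiration: Date; // when the REQUEST_DIAGNOSTICS action expires
  fileFound: boolean;     // a file document exists for this request
  fileDeleted: boolean;   // meta info has been flagged as DELETED
  error?: string;         // error reported by the agent, if any
}

function deriveStatus(req: DiagnosticsRequest, now: Date): DiagnosticsStatus {
  if (req.fileDeleted) return "DELETED";
  if (req.fileFound) return "READY";
  if (req.error !== undefined) return "FAILED";
  // No file and no error: the row reads as in progress until the action
  // expires, which is why a file deleted right after upload (without a
  // DELETED flag) lingers as "Generating".
  return now.getTime() > req.actionExpiration.getTime() ? "EXPIRED" : "GENERATING";
}

const expiration = new Date("2024-05-14T12:00:00Z");
const pending = { actionExpiration: expiration, fileFound: false, fileDeleted: false };
console.log(deriveStatus(pending, new Date("2024-05-14T11:59:00Z"))); // prints GENERATING
console.log(deriveStatus(pending, new Date("2024-05-14T12:01:00Z"))); // prints EXPIRED
```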

jen-huang added a commit that referenced this issue May 20, 2024
## Summary

Resolves #167366.

This PR introduces the ability to manually delete diagnostics files
before their ILM policy kicks in. This PR:

- Adds a new `DELETE /api/fleet/agents/files/{fileId}` route which returns `{id: string, deleted: boolean}`
- Deletes the file from the `.fleet-fileds-fromhost-data-agent` data stream when a request is received, and updates the corresponding meta information in `.fleet-fileds-fromhost-meta-agent` to set it to `DELETED` status
  - This is similar to how the meta information [reconciliation via task manager](https://github.com/elastic/kibana/blob/9e5d6b3e0fa731a37374c091cd7284d1c41f116e/x-pack/plugins/fleet/server/services/files/index.ts#L148) is done when the ILM policy deletes diagnostics files
- Updates the Agent details > Diagnostics tab:
  - Surfaces a Delete action in the files table for requests that resulted in files
  - Refactors existing UI around the copy text and generate button
  - Adds a `Show expired requests` toggle
    - Off by default, which means the following will be shown:
      - Files that are on disk
      - Files being generated
      - File requests which errored out but are still not expired (i.e. users can see errors with recent requests)
    - When toggled on, will additionally show:
      - File requests which errored out AND are expired
      - File requests that are just expired (i.e. edge case where a file was deleted by ILM but the meta info had not yet reconciled)
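A caller could exercise the new route roughly as follows. The route path and response shape come from the description above; the `deleteDiagnosticsFile` helper, the injected `FetchLike` type, and the omission of authentication are assumptions made to keep the sketch self-contained.

```typescript
// Response shape stated in the PR description.
interface DeleteFileResponse {
  id: string;
  deleted: boolean;
}

// Minimal fetch-like signature so the sketch can run without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

async function deleteDiagnosticsFile(
  kibanaUrl: string,
  fileId: string,
  fetchImpl: FetchLike,
): Promise<DeleteFileResponse> {
  const res = await fetchImpl(
    `${kibanaUrl}/api/fleet/agents/files/${encodeURIComponent(fileId)}`,
    {
      method: "DELETE",
      // Kibana's HTTP APIs expect this header on non-GET requests.
      headers: { "kbn-xsrf": "true" },
    },
  );
  if (!res.ok) throw new Error(`Delete failed with HTTP ${res.status}`);
  return (await res.json()) as DeleteFileResponse;
}

// Example with a stubbed fetch that logs the request it receives.
const stubFetch: FetchLike = async (url, init) => {
  console.log(init.method, url);
  return { ok: true, status: 200, json: async () => ({ id: "f1", deleted: true }) };
};

deleteDiagnosticsFile("http://localhost:5601", "f1", stubFetch).then((result) => {
  console.log(result.deleted);
});
```

Against a real deployment the request would also need credentials, for example a basic-auth or API-key `Authorization` header.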

FYI, the expiration threshold is currently only 3 minutes. This is a
bug, see: #183692

The main reason for adding this toggle is to keep the initial list view
clean. The items in this list are built from all `REQUEST_DIAGNOSTICS`
agent actions that the user submits, which can be on a single agent or
bulk agents.

When a file is deleted manually with this new work, or by the existing
ILM policy, we can correctly flag the associated action as having
`DELETED` files and hide it from view. But when a request errors out or
otherwise results in no files being generated, we still want to keep the
history of the request (we have no precedent of deleting agent
activity). Over time, this history is no longer useful for the user and
just pollutes the table, so it is better to hide these items from the
initial view.
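The visibility rules for the `Show expired requests` toggle can be sketched as a filter over the action-derived rows. The `DiagnosticsRow` fields and the `visibleRows` name are hypothetical; the rules mirror the bullet list in the summary.

```typescript
// Hypothetical row model: one entry per REQUEST_DIAGNOSTICS action.
interface DiagnosticsRow {
  hasFile: boolean;      // file is on disk
  generating: boolean;   // upload still in progress
  filesDeleted: boolean; // files removed manually or by ILM
  error?: string;        // error reported for the request, if any
  expired: boolean;      // action expiration has passed
}

function visibleRows(rows: DiagnosticsRow[], showExpired: boolean): DiagnosticsRow[] {
  return rows.filter((row) => {
    // Requests whose files were deleted are hidden regardless of the toggle.
    if (row.filesDeleted) return false;
    if (row.hasFile || row.generating) return true;
    // Errored requests stay visible while they are recent...
    if (row.error !== undefined && !row.expired) return true;
    // ...and expired ones (errored or not) only appear with the toggle on.
    return showExpired && row.expired;
  });
}

const rows: DiagnosticsRow[] = [
  { hasFile: true, generating: false, filesDeleted: false, expired: false },
  { hasFile: false, generating: false, filesDeleted: false, error: "timeout", expired: true },
];
console.log(visibleRows(rows, false).length); // prints 1
console.log(visibleRows(rows, true).length);  // prints 2
```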

<img width="1314" alt="image"
src="https://github.com/elastic/kibana/assets/1965714/36a2b405-5e58-4444-a114-ab06b842a505">

<img width="1330" alt="image"
src="https://github.com/elastic/kibana/assets/1965714/bd8c40b1-46d9-47e3-880f-5c831b626398">

### Testing
Use the single and bulk request diagnostics feature and test the delete
functionality. Go nuts :)

### Checklist

- [x] Any text added follows [EUI's writing guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses sentence case text and includes [i18n support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [x] [Documentation](https://www.elastic.co/guide/en/kibana/master/development-documentation.html) was added for features that require explanation or tutorials
- [x] [Unit or functional tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html) were updated or added to match the most common scenarios

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
jen-huang added the QA:Ready for Testing label May 21, 2024