Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add improved handling for TLS certificates for static builds. #17605

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

Ferroin
Copy link
Member

@Ferroin Ferroin commented May 6, 2024

Summary

Our current static builds include a bundled copy of the CA certificates available in the build environment, and will automatically fall back to using those if they can’t find a usable set of CA certificates on the target system.

This works in a majority of cases, but it has a couple of issues:

  • It doesn’t properly support the case of users who want to just use our bundled certificates unconditionally for some reason.
  • It doesn’t actually test that the system certificates work to connect to Netdata infrastructure, and thus it’s entirely possible that we end up with an install we can’t claim because the users are refusing to follow security best practices and keep their CA certificate store up to date.

This PR works to remedy those issues. It adds additional logic to the installation code embedded in the static builds that handles the TLS certificates, adding support both for checking the usability of the selected certificates and for skipping the system certificates and just using the bundled ones. This functionality is controlled by two new options and associated environment variables:

  • --certificates and NETDATA_CERT_MODE allow specifying the certificate handling mode. auto and system have the current behavior. check expands the current behavior by testing the detected system certificate store for usability by connecting to a well-known URL. bundled ignores any system certificates and unconditionally uses the bundled certificates. Defaults to auto if unspecified (thus preserving the existing behavior).
  • --certificate-test-url and NETDATA_CERT_TEST_URL allow overriding the URL used by check mode for testing whether the system certificates are usable. Defaults to https://app.netdata.cloud if unspecified.

The options override any value specified by the environment variables.

Additionally, the kickstart script is updated to pass the claiming URL as the certificate test URL when installing a static build, and to make the certificate handling mode default to check for installs other than offline installs and auto mode for offline installs. The values passed by the kickstart script can still be overridden by using the --static-install-options option to add the --certificates or --certificate-test-url options as needed.

Test Plan

Testing requires manually installing static builds produced from this PR on a variety of systems with various values for the --certificates and --certificate-test-url options and observing how the symlink at /opt/netdata/etc/ssl is updated.

@github-actions github-actions bot added the area/packaging Packaging and operating systems support label May 6, 2024
@Ferroin Ferroin marked this pull request as ready for review May 13, 2024 12:19
@Ferroin Ferroin requested a review from a team as a code owner May 13, 2024 12:20
@Ferroin Ferroin requested a review from a team May 13, 2024 12:20
@Ferroin
Copy link
Member Author

Ferroin commented May 13, 2024

Flagging @netdata/agent as a reviewer for help with testing.

@Ferroin Ferroin requested a review from a team May 29, 2024 11:00
Comment on lines 232 to 234
select_internal_certs() {
echo "Using bundled TLS configuration and certificates"
ln -sf /opt/netdata/share/ssl /opt/netdata/etc/ssl
Copy link
Member

@ilyam8 ilyam8 May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't work because if a directory or symlink to a directory with the target name already exists, the symlink will be created inside it.

There is -n option (fixes the problem)

-n, --no-dereference
       treat LINK_NAME as a normal file if it is a symbolic link to a directory

but POSIX ln doesn't have it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# ln -sfv /opt/netdata/share/ssl /opt/netdata/etc/ssl
'/opt/netdata/etc/ssl/ssl' -> '/opt/netdata/share/ssl

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fixed by the latest commit.

Comment on lines +241 to +242
test_certs() {
/opt/netdata/bin/curl --fail --silent --output /dev/null "${NETDATA_CERT_TEST_URL}" || return 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we check for 60 exit code only?

       60     Peer certificate cannot be authenticated with known CA
              certificates.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Failing to check at all means we run in to the same potential issue we have now, so I would prefer to just act as if the check failed in that case instead of risking claiming not working.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand, we only need to use the bundled version when the error is "The peer certificate cannot be authenticated using known CA certificates." Why doing it on any other error? If someone set wrong "test claim URL" or any other not "cant auth using known CA certs" error. Can you provide an example of other error you think we need to use bundle?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are literally hundreds of reasons that the cURL command could fail that have nothing to do with certificates. Any of those cases mean that we have no idea if the system certificates work or not.

Unless it’s an offline install though, it’s safer in that case to use our bundled certificates instead of the system certificates, because we know that will work in the common case of trying to connect to the Cloud, while we do not know if the system certificates will.

else
replace_symlink() {
target="${1}"
name="${2}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Ferroin ,

please, take a look on this warning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/packaging Packaging and operating systems support
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants