Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce redundant debug-logging on larger networks #3116

Draft
wants to merge 6 commits into
base: mainnet-staging
Choose a base branch
from

Conversation

vvp
Copy link
Contributor

@vvp vvp commented Feb 22, 2024

In larger BFT networks, some of the recurring debug-logging events are currently unnecessarily noisy which makes observing node/network operation harder. This PR improves this in two fronts.

Logging only validator set changes. Currently, each Gateway logs all connected validators (their SocketAddrs and Aleo addresses) in every heartbeat. During a stable operation in network of $n$ nodes, this means $n(n-1)$ redundant debug log lines, every 15 seconds. This PR instead only logs the validator set differences between each heartbeat, indicating connected and disconnected validators with + and - symbols . For example:

 INFO Connected to 41 validators (of 49 bonded validators)                                 
DEBUG   + 10.166.0.3:5011 - aleo1sufp275hshd7srrkxwdf7yczmc89em6e5ly933ftnaz64jlq8qysnuz88d
DEBUG   + 10.166.0.3:5003 - aleo12ux3gdauck0v60westgcpqj7v8rrcr3v346e4jtq04q7kkt22czsh808v2
DEBUG   + 10.166.0.3:5010 - aleo1s2tyzgqr9p95sxtj9t0s38cmz9pa26edxp4nv0s8mk3tmdzmqgzsctqhxg
DEBUG   - 10.166.0.3:5005 - aleo1l4z0j5cn5s6u6tpuqcj6anh30uaxkdfzatt9seap0atjcqk6nq9qnm9eqf
DEBUG   - 10.166.0.3:5008 - aleo1xh2lnryvtzxcvlz8zzgranu6yldaq5257cac44de4v0aasgu45yq3yk3yv

As this change involves using a RWLock and doing some extra allocations, it is only enabled when tracing::Level::DEBUG is enabled.

Folding batch proposal resends into a single line. Currently, when resending proposals, Primary::propose_batch() logs a single line per each non-signer SocketAddrs. This is repeated every 2.5 seconds and during network events, can result in a lot of redundant log spam. This PR folds the information content into a single debug-log line before resending the proposals.

I estimate that in larger networks these changes can reduce the total aggregated log size significantly due to combinatorial explosion - using my recent debug log of a 16-hour test run in a 50-node devnet as an example the reduction was ~50% (of a total of 7.1M lines, 3.5M lines were removed).

node/bft/src/gateway.rs Outdated Show resolved Hide resolved
node/bft/src/gateway.rs Outdated Show resolved Hide resolved
node/bft/src/gateway.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@ljedrz ljedrz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general, left a few comments 👍

@howardwu howardwu marked this pull request as draft March 9, 2024 00:00
node/bft/src/gateway.rs Outdated Show resolved Hide resolved
vvp and others added 6 commits March 15, 2024 16:01
Co-authored-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: Vesa-Ville <v-v@equilibrium.co>
Co-authored-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: Vesa-Ville <v-v@equilibrium.co>
Co-authored-by: ljedrz <ljedrz@users.noreply.github.com>
Signed-off-by: Vesa-Ville <v-v@equilibrium.co>
Co-authored-by: Raymond Chu <14917648+raychu86@users.noreply.github.com>
Signed-off-by: Vesa-Ville <v-v@equilibrium.co>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants