Add a script to gather GitHub issue stats #4951
This is the initial attempt to get some stats for #4930. The script can be run to fetch stats on GitHub issues with certain labels.
This requires the `ROCKSET_API_KEY` environment variable to query Rockset and `GITHUB_TOKEN` to fetch the issue timelines. `GITHUB_TOKEN` is not stored anywhere yet, so it still needs to be obtained from GitHub.
The Rockset query is at https://console.rockset.com/lambdas/details/commons.query_github_issues
Here are some examples:

```
python fetch_github_issue_stats.py --label "module: mps" --output module-mps-issue-stats.csv
```

This fetches all issues with the `module: mps` label together with their timelines. The data is aggregated in 2-week windows and written to a CSV. Initially, the output includes the issue count, how many of those issues are high priority, the triage time percentiles in hours (p50/p90/p100), and the count of issues that have not yet been triaged.

```
python fetch_github_issue_stats.py --label "module: mps" "high priority" --output module-mps-hi-pri-issue-stats.csv
```

```
python fetch_github_issue_stats.py --label "module: mps" --start-date 2024-02-01 --output module-mps-feb-issue-stats.csv
```
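The 2-week aggregation can be sketched roughly as follows. This is a minimal sketch, not the script's actual code. It assumes each issue has already been reduced to a dict with hypothetical `created_at`, `triaged_at` (`None` if untriaged) and `labels` keys, treats triage time as the hours between creation and triage, assumes the high-priority label is `high priority`, and uses nearest-rank percentiles. The CSV column names are also hypothetical.

```python
import csv
import math
from datetime import timedelta

WINDOW = timedelta(weeks=2)

# Hypothetical CSV columns, one row per 2-week window.
FIELDS = [
    "window_start", "count", "hi_pri_count",
    "triage_p50_hours", "triage_p90_hours", "triage_p100_hours",
    "untriaged_count",
]


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100]; None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate(issues, start, hi_pri_label="high priority"):
    """Group issues created on/after `start` into 2-week windows and summarize each."""
    buckets = {}
    for issue in issues:
        if issue["created_at"] < start:
            continue  # mirrors a --start-date filter
        index = (issue["created_at"] - start) // WINDOW  # timedelta // timedelta -> int
        buckets.setdefault(start + index * WINDOW, []).append(issue)

    rows = []
    for window_start in sorted(buckets):
        group = buckets[window_start]
        # Triage time in hours, only for issues that have been triaged.
        hours = [
            (i["triaged_at"] - i["created_at"]).total_seconds() / 3600
            for i in group
            if i["triaged_at"] is not None
        ]
        rows.append({
            "window_start": window_start.date().isoformat(),
            "count": len(group),
            "hi_pri_count": sum(hi_pri_label in i["labels"] for i in group),
            "triage_p50_hours": percentile(hours, 50),
            "triage_p90_hours": percentile(hours, 90),
            "triage_p100_hours": percentile(hours, 100),
            "untriaged_count": len(group) - len(hours),
        })
    return rows


def write_csv(rows, path):
    """Write the aggregated rows; None percentiles become empty cells."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Nearest-rank percentiles always return an observed value, so p100 is simply the slowest triage time in the window.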
To avoid querying Rockset and GitHub excessively, all the raw data is stored locally in a JSON file named `data-EPOCH.json` after the command runs. This file can be provided as the input when rerunning the script, i.e.

```
python fetch_github_issue_stats.py --label "module: mps" --input data-1708029115.json --output module-mps-feb-issue-stats.csv
```
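The caching behavior could look roughly like the sketch below. This is an illustration under assumptions, not the script's implementation: `load_or_fetch` and the `fetch` callable are hypothetical names, and the raw data is assumed to be JSON-serializable (as Rockset and GitHub API responses are).

```python
import json
import time


def load_or_fetch(input_path, fetch):
    """Return (raw_data, path).

    If input_path is given (like --input), reuse the cached file and skip the
    network entirely. Otherwise call fetch() (standing in for the Rockset and
    GitHub queries) and cache the result as data-EPOCH.json.
    """
    if input_path:
        with open(input_path) as f:
            return json.load(f), input_path
    data = fetch()
    path = f"data-{int(time.time())}.json"  # EPOCH = current Unix time in seconds
    with open(path, "w") as f:
        json.dump(data, f)
    return data, path
```

Keying the cache file on the epoch means every fresh run leaves a new snapshot, while `--input` pins a rerun to a specific earlier snapshot.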