
Add a script to gather GitHub issue stats #4951

Open
wants to merge 1 commit into
base: main
from

@huydhn (Contributor) commented Feb 15, 2024

This is the initial attempt to get some stats for #4930. The script can be run to fetch stats on GitHub issues with certain labels.

This needs the ROCKSET_API_KEY environment variable to query Rockset and GITHUB_TOKEN to fetch each issue's timeline. GITHUB_TOKEN isn't stored anywhere yet, so for now you still need to generate one on GitHub yourself.
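A minimal sketch of the setup this implies, assuming the script fails fast when either variable is missing and fetches timelines through GitHub's REST endpoint for listing an issue's timeline events. The helper names `require_env` and `build_timeline_request` are hypothetical, and the `pytorch/pytorch` owner/repo used below is an assumption about where the labeled issues live.

```python
import os
import urllib.request

# Hypothetical helper names; the PR's script may organize this differently.
REQUIRED_ENV_VARS = ("ROCKSET_API_KEY", "GITHUB_TOKEN")


def require_env(names=REQUIRED_ENV_VARS):
    """Return the required environment variables, exiting early if any is unset or empty."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in names}


def build_timeline_request(owner, repo, issue_number, token, page=1):
    """Build (but do not send) a request for one page of an issue's timeline events."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}/issues/{issue_number}/timeline"
        f"?per_page=100&page={page}"
    )
    return urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
```

To send a request, something like `events = json.load(urllib.request.urlopen(req))` works; keep incrementing `page` until an empty list comes back, since long-lived issues can have more than 100 timeline events.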

The Rockset query is at https://console.rockset.com/lambdas/details/commons.query_github_issues

Here are some examples:

  • python fetch_github_issue_stats.py --label "module: mps" --output module-mps-issue-stats.csv will fetch all issues with the module: mps label together with their timelines. The data is aggregated into 2-week windows and written to a CSV. For now, each row has the issue count, how many of those are high priority, the p50/p90/p100 time to triage in hours, and the number of issues that have not yet been triaged:
created_at,total,hi-pri,triaged_time_hour_p50,triaged_time_hour_p90,triaged_time_hour_p100,not_yet_triaged
2022-05-15 00:00:00+00:00,5,1,14,89,138,0
2022-05-29 00:00:00+00:00,50,6,13,53,119,2
2022-06-12 00:00:00+00:00,20,4,21,75,165,1
2022-06-26 00:00:00+00:00,15,1,20,74,97,0
2022-07-10 00:00:00+00:00,14,2,25,54,156,0
2022-07-24 00:00:00+00:00,6,0,70,110,115,1
2022-08-07 00:00:00+00:00,15,5,22,81,91,1
2022-08-21 00:00:00+00:00,12,1,25,43,57,2
2022-09-04 00:00:00+00:00,16,3,52,91,104,1
2022-09-18 00:00:00+00:00,13,0,29,69,81,2
2022-10-02 00:00:00+00:00,14,2,21,60,88,3
2022-10-16 00:00:00+00:00,26,1,24,38,80,7
2022-10-30 00:00:00+00:00,23,0,30,112,121,0
2022-11-13 00:00:00+00:00,12,1,48,68,72,3
2022-11-27 00:00:00+00:00,10,0,74,91,98,1
2022-12-11 00:00:00+00:00,13,0,19,73,78,1
2022-12-25 00:00:00+00:00,4,0,65,91,98,1
2023-01-08 00:00:00+00:00,7,1,29,38,47,1
2023-01-22 00:00:00+00:00,12,1,21,68,97,1
2023-02-05 00:00:00+00:00,4,0,65,77,78,0
2023-02-19 00:00:00+00:00,11,0,9,67,98,0
2023-03-05 00:00:00+00:00,13,2,40,89,91,1
2023-03-19 00:00:00+00:00,19,2,20,64,77,3
2023-04-02 00:00:00+00:00,12,0,48,83,101,0
2023-04-16 00:00:00+00:00,13,2,22,36,56,0
2023-04-30 00:00:00+00:00,10,1,18,55,61,2
2023-05-14 00:00:00+00:00,11,1,10,35,80,1
2023-05-28 00:00:00+00:00,5,1,46,74,84,1
2023-06-11 00:00:00+00:00,4,0,35,63,71,0
2023-06-25 00:00:00+00:00,2,0,13,23,26,0
2023-07-09 00:00:00+00:00,13,1,13,39,70,0
2023-07-23 00:00:00+00:00,7,0,33,76,96,0
2023-08-06 00:00:00+00:00,10,0,13,64,79,0
2023-08-20 00:00:00+00:00,5,1,15,18,20,0
2023-09-03 00:00:00+00:00,5,1,9,52,81,0
2023-09-17 00:00:00+00:00,4,0,25,35,38,0
2023-10-01 00:00:00+00:00,2,0,22,39,44,0
2023-10-15 00:00:00+00:00,6,0,6,22,25,0
2023-10-29 00:00:00+00:00,10,3,12,36,71,3
2023-11-12 00:00:00+00:00,6,1,43,77,80,0
2023-11-26 00:00:00+00:00,3,0,6,10,11,0
2023-12-10 00:00:00+00:00,5,2,7,27,32,2
2023-12-24 00:00:00+00:00,5,2,14,22,26,1
2024-01-07 00:00:00+00:00,5,1,56,99,106,1
2024-01-21 00:00:00+00:00,7,1,12,20,22,2
2024-02-04 00:00:00+00:00,9,1,10,101,130,2
2024-02-18 00:00:00+00:00,4,0,25,30,31,0
  • Multiple labels can be selected, and only issues that carry all of them are fetched, e.g. python fetch_github_issue_stats.py --label "module: mps" "high priority" --output module-mps-hi-pri-issue-stats.csv
  • A start and stop date can be selected; the stop date defaults to the current date, e.g. python fetch_github_issue_stats.py --label "module: mps" --start-date 2024-02-01 --output module-mps-feb-issue-stats.csv
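The label filtering, date range, and 2-week aggregation described in the bullets above can be sketched in plain Python. This is a minimal sketch under several assumptions: the issue record shape, measuring triage time as the hours between `created_at` and a hypothetical `triaged_at` timestamp derived from the timeline, counting hi-pri issues by the `high priority` label, nearest-rank percentiles, and windows anchored at a caller-chosen start. The actual script may anchor and label windows differently; in the sample CSV, for instance, every window label falls on a Sunday.

```python
import math
from datetime import datetime, timedelta, timezone

# Hypothetical record shape: each issue is a dict with
#   "labels": list of label names,
#   "created_at": timezone-aware datetime,
#   "triaged_at": datetime, or None if not yet triaged.
HI_PRI_LABEL = "high priority"  # assumption: hi-pri means this label is present
WINDOW = timedelta(weeks=2)


def has_all_labels(issue, labels):
    """AND semantics: keep an issue only if it carries every requested label."""
    return set(labels) <= set(issue["labels"])


def in_date_range(issue, start=None, stop=None):
    """Keep issues created in [start, stop); stop defaults to now."""
    stop = stop or datetime.now(timezone.utc)
    created = issue["created_at"]
    return (start is None or created >= start) and created < stop


def percentile(values, p):
    """Nearest-rank percentile; p=100 returns the maximum."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate(issues, window_start):
    """Bucket issues into 2-week windows counted from window_start and summarize each."""
    buckets = {}
    for issue in issues:
        index = (issue["created_at"] - window_start) // WINDOW  # timedelta // timedelta -> int
        buckets.setdefault(window_start + index * WINDOW, []).append(issue)

    rows = []
    for start in sorted(buckets):
        group = buckets[start]
        # Triage time of each triaged issue, rounded to whole hours.
        hours = [
            round((i["triaged_at"] - i["created_at"]).total_seconds() / 3600)
            for i in group
            if i["triaged_at"] is not None
        ]
        row = {
            "created_at": start,
            "total": len(group),
            "hi-pri": sum(HI_PRI_LABEL in i["labels"] for i in group),
            "not_yet_triaged": len(group) - len(hours),
        }
        for p in (50, 90, 100):
            row[f"triaged_time_hour_p{p}"] = percentile(hours, p) if hours else None
        rows.append(row)
    return rows
```

Writing the rows out is then a matter of passing them to `csv.DictWriter` with the column order from the sample output.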

To avoid querying Rockset and GitHub excessively, each run stores all the raw data locally in a JSON file named data-EPOCH.json, which can be passed as the input when rerunning the script, e.g. python fetch_github_issue_stats.py --label "module: mps" --input data-1708029115.json --output module-mps-feb-issue-stats.csv
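The caching behavior could look roughly like this. A minimal sketch: `parse_args`, `load_or_fetch`, the `fetch` callable (standing in for the Rockset and GitHub queries) and the `--stop-date` flag name are hypothetical; only `--label`, `--start-date`, `--input`, `--output` and the data-EPOCH.json naming come from the PR description.

```python
import argparse
import json
import os
import time
from datetime import date


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Gather stats on labeled GitHub issues")
    # Issues must carry every label listed here (AND semantics).
    parser.add_argument("--label", nargs="+", required=True)
    parser.add_argument("--start-date", type=date.fromisoformat)
    # Hypothetical flag name; defaults to today when omitted.
    parser.add_argument("--stop-date", type=date.fromisoformat, default=date.today())
    # Raw data saved by an earlier run; when set, Rockset and GitHub are not queried.
    parser.add_argument("--input")
    parser.add_argument("--output", required=True)
    return parser.parse_args(argv)


def load_or_fetch(args, fetch, cache_dir="."):
    """Reuse cached raw data if --input is given, otherwise fetch it and cache it."""
    if args.input:
        with open(args.input) as f:
            return json.load(f)
    data = fetch(args.label)
    # Name the cache file after the current Unix epoch, e.g. data-1708029115.json.
    path = os.path.join(cache_dir, f"data-{int(time.time())}.json")
    with open(path, "w") as f:
        # default=str serializes datetimes and other non-JSON values as strings.
        json.dump(data, f, default=str)
    print(f"Raw data cached at {path}")
    return data
```

Naming the cache after the epoch means repeated runs never overwrite each other's data; the trade-off is that old cache files have to be cleaned up by hand.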


vercel bot commented Feb 15, 2024

@huydhn is attempting to deploy a commit to the Meta Open Source Team on Vercel.

A member of the Team first needs to authorize it.

@facebook-github-bot added the CLA Signed label Feb 15, 2024
@huydhn requested a review from albanD February 15, 2024 20:34

vercel bot commented Feb 15, 2024

The latest updates on your projects.

1 Ignored Deployment
Name     Status   Updated (UTC)
torchci  Ignored  Feb 15, 2024 8:37pm

@huydhn (Contributor, Author) commented Feb 15, 2024

For the example python fetch_github_issue_stats.py --label "module: mps" --output module-mps-issue-stats.csv, I have the input JSON and the output CSV files.

Labels: CLA Signed (this label is managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed)
Projects: None yet
Development: no linked issues yet
2 participants