Various reports generated from DevStats databases
On bare metal:
- For available reports, do:
ls sql/
. Use any file without.sql
extension, for example: developers, issues, prs, commits etc. - Running single report:
PG_PASS=... ./sh/run.sh (developers|issues|prs|..|contributing_companies_list) 2016-01-01 2017-01-01
. - Some reports require passing additional parameters, example:
PG_PASS=... ./sh/run.sh company_contributions 2016-01-01 2017-01-01 {{type}} known {{company}} Google
. - Running report for multiple date ranges:
[SKIPDT=1] PG_PASS=... ./sh/rep.sh (quarters|years|join|YYYY-MM-DD) (developers|developers_count|...|contributing_companies_list) {{actor}} author
. - Running cumulative report:
CUMULATIVE=2014-01-01 PG_PASS=... ./sh/rep.sh (quarters|years|join|YYYY-MM-DD) (developers|developers_count|...|contributing_companies_list) {{actor}} author
. - Additional parameters example:
SKIPDT=1 PG_PASS=... ./sh/rep.sh years company_contributions {{type}} all {{company}} 'Red Hat'
. - Example with limits:
SKIPDT=1 PG_PASS=... CUMULATIVE=2012-01-01 ./sh/rep.sh quarters contributing_companies_list {{lim}} 8
. - Company commits percent:
SKIPDT=1 PG_PASS=... ./sh/rep.sh 2015-07-21 company_commits {{type}} known {{actor}} author {{company}} 'Red Hat'
. - Company committers list:
SKIPDT=1 PG_PASS=... ./sh/rep.sh 2016-03-10 company_committers_list {{order}} 'commits desc' {{actor}} author
. - Company committers percent:
SKIPDT=1 PG_PASS=... ./sh/rep.sh 2016-03-10 company_committers_percent {{type}} all {{actor}} author {{company}} 'Google'
. - Stats for all CNCF projects:
PG_PASS=... ./sh/all.sh ./sh/rep.sh all contributors_count
. - Example contributors stats:
PG_PASS=... ./sh/all.sh ./sh/rep.sh all company_contributors {{type}} known {{company}} Apple
. - Company committers per CNCF projects:
SKIPDT=1 PG_PASS=... ./sh/all.sh ./sh/rep.sh all company_committers_percent {{type}} known {{company}} Apple {{actor}} author
. - Company committers emails for all CNCF projects:
SKIPDT=1 PG_DB=allprj CSV=emails.csv PG_PASS=... ./sh/rep.sh all company_committers_emails {{type}} known {{company}} Apple {{actor}} author
. - List company's individual commits:
SKIPDT=1 PG_DB=allprj PG_PASS=... CSV=commits_links.csv ./sh/rep.sh all company_commits_links {{type}} known {{company}} Apple {{actor}} author
. - Some country stats:
PG_PASS=... ./sh/rep.sh years country_stats {{country}} 'United States' {{cumulative}} 'cum' {{period}} m {{repogroup}} all {{metric}} contributions
. - Yearly contributors with know country:
PG_PASS=... SKIPDT=1 CUMULATIVE=2012-01-01 ./sh/rep.sh years contributors_count_with_country
. - Yearly contributions from contributors with known country:
PG_PASS=... SKIPDT=1 CUMULATIVE=2012-01-01 ./sh/rep.sh years contributions_with_country
. - After report is processed
out.csv
file is generated - it can be used for creating charts and other data analysis. - Velocity metrics:
PG_PASS=... ./sh/rep.sh quarters velocity {{actor}} author
. - Documentation commits, committers and companies:
PG_PASS=... CUMULATIVE=2014-01-01 ./sh/rep.sh quarters documentation {{actor}} author
. - Committers stats:
PG_DB=cii PG_PASS=... ./affs/committers.sh
.
- Go to
cncf/devstats-helm
repo and follow instructions fromtest/README.md
file, especiallyCreate reports pod
section. - Once you shell into the reports pod, you can act just like on the bare metal server. Putting any CSV files generated while reporting in
/data
directory will make them available at:https://teststats.cncf.io/backups/
.
Details about running reports pod:
- Create reports pod:
helm install devstats-test-reports ./devstats-helm --set skipSecrets=1,skipPVs=1,skipBackupsPV=1,skipVacuum=1,skipBackups=1,skipBootstrap=1,skipProvisions=1,skipCrons=1,skipAffiliations=1,skipGrafanas=1,skipServices=1,skipPostgres=1,skipIngress=1,skipStatic=1,skipNamespaces=1,reportsPod=1,projectsOverride='+cncf\,+opencontainers\,+istio\,+zephyr\,+linux\,+rkt\,+sam\,+azf\,+riff\,+fn\,+openwhisk\,+openfaas\,+cii\,+prestodb\,+opentracing'
- Shell into reports pod:
../devstats-k8s-lf/util/pod_shell.sh devstats-reports
and then run some report: (seecncf/devstats-reports:README.md
: example:PG_DB=cii ./affs/unknown_committers.sh
). To shell from different namespace:k exec -it -n devstats-test devstats-reports -- /bin/bash
. - If you move any generated CSV file into the
/data
directory (which is RWX PV mount) - that file will be available at:https://teststats.cncf.io/backups/
. - Finally delete reporting pod:
helm delete devstats-test-reports
(but you can leave it running, it is just sleeping forever waiting for shell connection).
- Shell into the reports pod and execute
./contributors/update_contributors.sh
. - Follow uploading to google sheets part of this documentation.
Company contributions:
PG_DB=prometheus PG_PASS=... ./sh/run.sh contributing_companies_list 2014-01-01 2020-01-01 {{lim}} 8
.PG_DB=prometheus PG_PASS=... ./sh/rep.sh quarters company_contributions {{type}} known {{company}} 'Robust Perception'
.SKIPDT=1 PG_DB=prometheus PG_PASS=... ./sh/rep.sh quarters company_contributions {{type}} known {{company}} 'Red Hat'
.
Company percent contributions:
SKIPDT=1 PG_DB=prometheus PG_PASS=... ./sh/rep.sh quarters company_contributions {{type}} known {{company}} 'Robust Perception'
.
Contributing companies:
SKIPDT=1 CUMULATIVE=2012-01-01 PG_DB=prometheus PG_PASS=... ./sh/rep.sh quarters contributing_companies_count; cat out.csv
.PG_DB=prometheus PG_PASS=... ./sh/rep.sh prometheus_join contributing_companies_count
.
Contributors:
SKIPDT=1 CUMULATIVE=2012-01-01 PG_DB=prometheus PG_PASS=... ./sh/rep.sh quarters contributors_count; cat out.csv
.PG_DB=prometheus PG_PASS=... ./sh/rep.sh prometheus_join contributors_count
.
Countries contributors:
SKIPDT=1 PG_DB=prometheus PG_PASS=... ./sh/rep.sh months country_stats {{cumulative}} cum {{period}} m {{repogroup}} all {{metric}} contributors {{country}} Poland
.CUMULATIVE=2012-01-01 PG_DB=prometheus PG_PASS=... ./sh/rep.sh prometheus_join contributors_with_country
.CUMULATIVE=2012-01-01 SKIPDT=1 PG_DB=prometheus PG_PASS=... ./sh/rep.sh months country_contributors {{country}} 'United States'
.CUMULATIVE=2012-01-01 PG_DB=prometheus SKIPDT=1 PG_PASS=... ./sh/rep.sh months contributors_count_with_country
.
Countries contributions:
SKIPDT=1 PG_DB=prometheus PG_PASS=... ./sh/rep.sh months country_stats {{cumulative}} cum {{period}} m {{repogroup}} all {{metric}} contributions {{country}} Poland
.CUMULATIVE=2012-01-01 PG_DB=prometheus PG_PASS=... ./sh/rep.sh prometheus_join contributions_with_country
.CUMULATIVE=2012-01-01 SKIPDT=1 PG_DB=prometheus PG_PASS=... ./sh/rep.sh months country_contributions {{country}} 'United States'
.CUMULATIVE=2012-01-01 PG_DB=prometheus SKIPDT=1 PG_PASS=... ./sh/rep.sh months contributions_by_country
.
PRs:
- `` CUMULATIVE=2012-01-01 SKIPDT=1 PG_DB=prometheus PG_PASS=... ./sh/rep.sh months prs ```.
- `` CUMULATIVE=2012-01-01 SKIPDT=1 PG_DB=prometheus PG_PASS=... ./sh/rep.sh quarters prs ```.
- `` CUMULATIVE=2012-01-01 SKIPDT=1 PG_DB=prometheus PG_PASS=... ./sh/rep.sh years prs ```.
SKIPDT='' PG_DB=prometheus PG_PASS=... ./sh/rep.sh prometheus_join prs
.
Commits:
CUMULATIVE=2012-01-01 SKIPDT=1 PG_DB=prometheus PG_PASS=... ./sh/rep.sh months commits {{actor}} author
.SKIPDT='' PG_DB=prometheus PG_PASS=... ./sh/rep.sh prometheus_join commits {{actor}} author
.
Velocity:
PG_PASS=... CUMULATIVE=2010-01-01 PG_DB=prometheus ./sh/rep.sh months velocity {{actor}} author
.PG_PASS=... PG_DB=prometheus ./sh/rep.sh prometheus_join velocity {{actor}} author
.
Documentation:
PG_PASS=... CUMULATIVE=2010-01-01 SKIPDT='' PG_DB=prometheus ./sh/rep.sh months documentation {{actor}} author
.
clear; PG_DB=allprj ./affs/company_contributors.sh Apple
.clear; PG_DB=allprj ./affs/company_contributors_repo_groups.sh 'Adobe Inc.'
.- They generate files for download, like:
wget https://devstats.cncf.io/backups/allprj_Adobe_Inc__contributors.csv
.
PRs:
- Company PRs monthly chart:
PG_DB=allprj ./sh/rep.sh months company_prs {{company_name}} company_name {{company}} 'Adobe Inc.'; mv out.csv /data/adobe_monthly_prs.csv
. Thenwget https://devstats.cncf.io/backups/adobe_monthly_prs.csv
. - Company PRs total value:
PG_DB=allprj ./sh/run.sh company_prs 2014-01-01 2024-01-01 {{company_name}} company_name {{company}} 'Adobe Inc.'
. - Company PRs monthly chart (cumulative):
PG_DB=allprj CUMULATIVE=2010-01-01 ./sh/rep.sh months company_prs {{company_name}} company_name {{company}} 'Adobe Inc.'; mv out.csv /data/adobe_monthly_prs_cum.csv
. Thenwget https://devstats.cncf.io/backups/adobe_monthly_prs_cum.csv
.
Commits:
- Company commits monthly chart:
PG_DB=allprj ./sh/rep.sh months company_commits {{company_name}} company_name {{company}} 'Adobe Inc.' {{type}} known {{actor}} actor|author|committer; mv out.csv /data/adobe_monthly_commits.csv
. Thenwget https://devstats.cncf.io/backups/adobe_monthly_commits.csv
. - Company commits total value:
PG_DB=allprj ./sh/run.sh company_commits 2014-01-01 2024-01-01 {{company_name}} company_name {{company}} 'Adobe Inc.' {{type}} known {{actor}} actor
. - Company commits monthly chart (cumulative):
PG_DB=allprj CUMULATIVE=2010-01-01 ./sh/rep.sh months company_commits {{company_name}} company_name {{company}} 'Adobe Inc.' {{type}} known {{actor}} actor; mv out.csv /data/adobe_monthly_commits_cum.csv
. Thenwget https://devstats.cncf.io/backups/adobe_monthly_commits_cum.csv
.
Contributors having at least N=20 contributions in PROJ=opentelementry project during the last year:
./contributors/contributors_with_n_dtrange.sh opentelemetry "now()-'1 year'::interval" 'now()' 20
.
- SSH into
node-0
DevStats node. - Create reporting pod:
helm install devstats-prod-reports ./devstats-helm --set skipSecrets=1,skipPVs=1,skipBackupsPV=1,skipVacuum=1,skipBackups=1,skipBootstrap=1,skipProvisions=1,skipCrons=1,skipAffiliations=1,skipGrafanas=1,skipServices=1,skipPostgres=1,skipIngress=1,skipStatic=1,skipAPI=1,skipNamespaces=1,reportsPod=1,namespace='devstats-prod'
. - Shell into reports pod:
../devstats-k8s-lf/util/pod_shell.sh devstats-reports
and then run some report: (seecncf/devstats-reports:README.md
: example:PG_DB=cii ./affs/unknown_committers.sh
). To shell from different namespace:k exec -it -n devstats-prod devstats-reports -- /bin/bash
. ./sh/gdpr.sh 'Identity 1' 'name 2' 'email 3' ...
.- You don't need to downcase names, remove spaces, @, ! etc.
- Finally delete reporting pod:
helm delete devstats-prod-reports
(but you can leave it running, it is just sleeping forever waiting for shell connection).
While in reporting pod:
clear && GHA2DB_CSVOUT=/data/world_map.csv GHA2DB_QOUT=1 PG_DB=gha ./sh/run.sh world_map 2014-01-01 2023-02-04 {{repo_group}} '' {{type}} login
- contributors.clear && GHA2DB_CSVOUT=/data/world_map2.csv PG_DB=allprj ./sh/run.sh world_map 2014-01-01 2023-02-04 {{repo_group}} 'Kubernetes' {{type}} event_id
- contributions.
Locally:
wget https://devstats.cncf.io/backups/world_map.csv
- to download.
This calculates country ranks (by the number of contributors and contributions):
[EXTRA_COND="and \"Date from\" = '2023-01-01' and \"Date to\" = '2024-01-01'"] PG_DB=allprj ./sh/country_ranks.sh Poland
.- Then get the generated CSV:
wget https://devstats.cncf.io/backups/Poland_country_ranks_in_allprj.csv
. - Make a copy of this China report and replace with some other country stats.
- Example report for India.
For contributors we can have 3 situations:
- contributor without any affiliations defined - all his/her contributions are unknown.
- contributor with fully defined affiliations - all his/her contributions are affiliated to a company or marked as independent.
- contributor has partial affiliations defined (date ranges does not cover all time) and some of his/her contributions are known and others unknown - we call this mixed affiliation.
- You can run SQL to generate this data for any CNCF project for all time:
sql/percent_known_contributors.sql
. - You can also run for a specific time range and CNCF project:
PG_DB=allprj ./sh/run.sh contributions_affiliations_quality_percent 2023-01-01 2024-01-01 {{company_name}} company_name
. - Example result for all CNCF projects combined for all time:
devstats-reports:/# PG_DB=allprj ./sh/run.sh contributors_affiliations_quality_percent 2014-01-01 2024-01-01 {{company_name}} company_name
Compiled 2023-10-12_04:48:13AM, commit: 989cef61420a093e10d0a251408b6a4986133996 on Linux_darkstar_5.4.0-81-generic_#91-Ubuntu_SMP_Thu_Jul_15_19:09:17_UTC_2021_x86_64_x86_64_x86_64_GNU/Linux using go_version_go1.20.3_linux/amd64
/----------------------+------------------------+----------------------+------------------------+------------------+----------------+-------------------+-------------------+-------------------+-------------------+----------------------\
|known_contributors_max|unknown_contributors_max|known_contributors_min|unknown_contributors_min|mixed_contributors|all_contributors|percent_known_max |percent_unknown_max|percent_known_min |percent_unknown_min|percent_mixed |
+----------------------+------------------------+----------------------+------------------------+------------------+----------------+-------------------+-------------------+-------------------+-------------------+----------------------+
|40729 |186059 |40715 |186045 |14 |226774 |17.9601718010001147|82.0460017462319314|17.9539982537680686|82.0398281989998853|0.00617354723204600175|
\----------------------+------------------------+----------------------+------------------------+------------------+----------------+-------------------+-------------------+-------------------+-------------------+----------------------/
Rows: 1
Time: 1m6.445201838s
This means that we know about 17.95% - 17.96% of all contributors (note that this number will be much higer for contributions, for contributors there is a very long tail of contributors making 1-3 contributions that were not checked yet).
For contributions we can have a similar 3 situaltions:
- contribution (like commit) can have up to 3 contributors: pusher/actor, committer and author.
- if all contribution's contributors affiliations are known at the contribution's create time then contribution is known.
- if none of contribution's contributors affiliations are known at that date then contribution is unknown.
- if we know affiliation details for some contributors and don't know for others then that contribution is mixed.
- You can run SQL to generate this data for any CNCF project for all time:
sql/percent_known_contributors.sql
. - You can also run for a specific time range and CNCF project:
PG_DB=grpc ./sh/run.sh contributors_affiliations_quality_percent 2020-01-01 2022-01-01 {{company_name}} company_name
. - Example result for all CNCF projects combined for all time:
devstats-reports:/# PG_DB=allprj ./sh/run.sh contributions_affiliations_quality_percent 2014-01-01 2024-01-01 {{company_name}} company_name
Compiled 2023-10-12_04:48:13AM, commit: 989cef61420a093e10d0a251408b6a4986133996 on Linux_darkstar_5.4.0-81-generic_#91-Ubuntu_SMP_Thu_Jul_15_19:09:17_UTC_2021_x86_64_x86_64_x86_64_GNU/Linux using go_version_go1.20.3_linux/amd64
/-----------------------+-------------------------+-----------------------+-------------------------+-----------------+-------------------+-------------------+-------------------+-------------------+-------------------+----------------------\
|known_contributions_max|unknown_contributions_max|known_contributions_min|unknown_contributions_min|all_contributions|mixed_contributions|percent_known_max |percent_unknown_max|percent_known_min |percent_unknown_min|percent_mixed |
+-----------------------+-------------------------+-----------------------+-------------------------+-----------------+-------------------+-------------------+-------------------+-------------------+-------------------+----------------------+
|13333382 |1862500 |13236307 |1765425 |15098807 |97075 |88.3075199252497234|12.3354116653057424|87.6645883346942576|11.6924800747502766|0.64293159055546573978|
\-----------------------+-------------------------+-----------------------+-------------------------+-----------------+-------------------+-------------------+-------------------+-------------------+-------------------+----------------------/
Rows: 1
Time: 24.820014137s
This means that we know about 87.66% - 88.3% affilaitions of all contributions across all CNCF projects and across all time.
So as of 12/22/2023 we know company affiliations for about 88% of all contributions and we know about 18% of all contributors contributiong to all projects - and those a bit less than 20% contributors made almost 90% of all contributions across all time and all projects.
While in the reporting pod run:
PG_DB=allprj GHA2DB_CSVOUT=/data/company_project_commits_stats.csv ./sh/run.sh company_project_commits_stats 2014-01-01 2025-01-01 {{limit_companies}} 50 {{limit_projects}} 50
.
Locally:
wget https://devstats.cncf.io/backups/company_project_commits_stats.csv
- to download results.