[Feature] better performance logging for docs generate command #10158

Open
3 tasks done
dataders opened this issue May 16, 2024 · 1 comment
Labels
enhancement New feature or request performance

Comments

@dataders
Contributor

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

The user wants the following debug-level log line to be reported at INFO level, with added context about which query was run:

SQL status: SUCCESS 24005 in 239.0 seconds

Originally posted by @extrospective in dbt-labs/dbt-snowflake#1034 (comment)

[...] If I run "dbt docs generate" there is little useful information. [...] If docs generation continues to be a multi-minute query, some info could be added there.

On the other hand, when running with debug:

When you run "dbt docs generate --debug", there is a line which reads, for me:

SQL status: SUCCESS 24005 in 239.0 seconds

That line is perhaps the most important in the file, since it accounts for much of the time.

One option: make the line clearer about what it refers to.
Three seconds later in the log file is what is presumably the connection that caused it:

13:32:56  SQL status: SUCCESS 24005 in 239.0 seconds
13:32:59  On dbt_db.information_schema: Close

If there is parallelization, there is no guarantee that nothing will appear in between, so I might like a more descriptive first line.

And if the problem is not made to "go away", I could imagine accumulating some critical timing information for these important queries and, at the end of the run (with --debug), flushing that information so it stands out from the 43K-line file.
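The accumulate-and-flush idea described above could be sketched as a small helper that records only slow queries and prints a summary at the end of a run. This is a hypothetical illustration: `QueryTimingLog` and its methods are not part of dbt, and the threshold is an arbitrary assumption.

```python
class QueryTimingLog:
    """Hypothetical accumulator for slow-query timings, flushed as a
    summary at the end of a run. A sketch only, not dbt's actual API."""

    def __init__(self, threshold_seconds: float = 10.0):
        # Only queries slower than this are kept for the summary.
        self.threshold = threshold_seconds
        self.entries: list[tuple[str, float]] = []

    def record(self, description: str, elapsed: float) -> None:
        """Record one completed query; drop it if it was fast."""
        if elapsed >= self.threshold:
            self.entries.append((description, elapsed))

    def summary(self) -> str:
        """Return a slowest-first summary suitable for flushing at the
        end of the log, so it stands out from the surrounding lines."""
        lines = ["Slowest queries this run:"]
        for desc, elapsed in sorted(self.entries, key=lambda e: -e[1]):
            lines.append(f"  {elapsed:8.1f}s  {desc}")
        return "\n".join(lines)
```

A caller would wrap each metadata query, `record()` its elapsed time with a description of the relation it touched, and emit `summary()` once at the end of `docs generate`.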

Describe alternatives you've considered

Making `docs generate` faster, so that detailed timing logs matter less.

Who will this benefit?

all users of dbt docs

Are you interested in contributing this feature?

No response

Anything else?

No response

@jtcohen6
Contributor

jtcohen6 commented May 17, 2024

Perhaps some INFO-level logging to this effect:

$ dbt docs generate
12:18:42  Running with dbt=1.8.0
12:18:42  Registered adapter: snowflake=1.8.1
12:18:43  Found 1234 models, ...
12:18:43
12:18:43  Compiling project # unless --no-compile
12:19:43  Building catalog
12:19:55  Finished generating catalog for 1234 objects in database1.schema1
12:19:57  Finished generating catalog for 98765 objects in database1.schema2
12:20:01  Catalog written to /Users/jerco/dev/scratch/testy/target/catalog.json

Those specific events might make the most sense to fire in adapter.get_filtered_catalog, in which case we should move this issue to dbt-adapters.
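The per-schema INFO lines proposed above could look roughly like the following loop around catalog generation. This is a hedged sketch, not dbt's internals: `get_catalog_for_schema` and `log_info` are hypothetical placeholders standing in for the adapter's catalog query and dbt's event firing.

```python
import time

def generate_catalog(schemas, get_catalog_for_schema, log_info):
    """Sketch: build the catalog schema by schema, emitting one
    INFO-level line per schema with object count and elapsed time.
    Both callables are hypothetical stand-ins for dbt internals."""
    results = {}
    for database, schema in schemas:
        start = time.monotonic()
        catalog = get_catalog_for_schema(database, schema)
        elapsed = time.monotonic() - start
        results[(database, schema)] = catalog
        # Mirrors the proposed "Finished generating catalog for N objects
        # in database.schema" log line, with timing added.
        log_info(
            f"Finished generating catalog for {len(catalog)} objects "
            f"in {database}.{schema} in {elapsed:.1f} seconds"
        )
    return results
```

Emitting the line per schema (rather than one total) also helps users spot which `information_schema` query dominates the runtime.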

@jtcohen6 jtcohen6 removed the triage label May 17, 2024