Embed DuckDB extensions in the Rill binary #4919

Merged: 7 commits, May 22, 2024

esevastyanov
Contributor

Closes #4574

@esevastyanov marked this pull request as ready for review on May 17, 2024 11:27
Contributor

@begelundmuller left a comment

This looks excellent! Approving, but I've put a few nits below to look at first.


// Define source and destination paths
embedPath := fmt.Sprintf("embed/extensions/%s/%s", duckdbVersion, platformName)
duckdbExtensionsPath := filepath.Join(os.Getenv("HOME"), ".duckdb", "extensions", duckdbVersion, platformName)
Contributor

Could we use os.UserHomeDir() instead of Getenv?

defer output.Close()

for {
_, err = io.CopyN(output, gzipReader, 1024) // CopyN is used to prevent a warning (G110: Potential DoS vulnerability...)
Contributor

If you prefer io.Copy, adding a // nolint is also okay IMO, since we trust the source of the data

Comment on lines 17 to 25
// Since DuckDB is called from multiple packages, the extensions are installed in the init function
func init() {
	err := installExtensions()
	if err != nil {
		// If extensions cannot be installed, log the error and continue as the extensions can be downloaded
		// Should it be fatal in order to notice the issue prior to a release?
		log.Printf("Error preparing DuckDB extensions: %v", err)
	}
}
Contributor

I think this might delay process start. Can we move this to lazy evaluation, e.g. using a sync.Once global that is called when the first DuckDB handle is opened?

Contributor Author

Do you expect something similar to this?

var once sync.Once
func (c *connection) prepareExtensions() error {
	var err error
	once.Do(func() {
		err = installExtensions()
	})
	return err
}

However, DuckDB is also used in the duckdbsql package (the query function), and that function is called on app start by emitStartEvent.

Contributor

Yeah something like that. You could make it a public function, like duckdb.PrepareExtensions, and call it from the duckdbsql package before loading an extension (and other packages that load extensions).

I understand in practice it gets loaded fast for rill start, but there are other CLI commands like rill user add etc. that don't need DuckDB, and also some tests that don't need it but might still load the DuckDB driver.

Contributor Author

Moved to a separate package to resolve a circular dependency. I also decided to ignore the case where no extensions are embedded at all, as reporting it might be noisy in dev runs and tests.

}
}

//go:embed embed/extensions/*
Contributor

To avoid accidental commits, can we add the extension files here to gitignore? Might need a nested pattern like runtime/drivers/duckdb/embed/extensions/*/* or something like that to still maintain the .gitkeep file.

@begelundmuller added the "blocker" label (a release blocker issue that should be resolved before a new release) on May 20, 2024
Contributor

@begelundmuller left a comment

Looks great!

@begelundmuller merged commit 86e17c2 into main on May 22, 2024
4 checks passed
@begelundmuller deleted the embed-duckdb-ext branch on May 22, 2024 16:20