Slow performance of duckdb_stream_fetch_chunk #12105
Comments
In what way is the Python code streaming?
I didn't say that the Python code is streaming. If Python can materialize the result so fast, why is the C API slow when it isn't even materializing the result set?
Why are you comparing a streaming result set against a materialized one then? A more equivalent test would be using:
con.execute(...)
con.fetchmany(duckdb.__standard_vector_size__)
It sounds like what you're struggling with is the performance of the streaming result collector, which is a known issue.
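The comparison suggested in the comment above could be sketched roughly as follows. This is a minimal sketch, not the script from the issue: the helper names `time_chunked_fetch` and `time_materialized_fetch` are hypothetical, and the code assumes only a connection whose `execute(...)` returns an object exposing `fetchmany` and `fetchall` (which holds for a `duckdb` connection and for Python DB-API connections such as `sqlite3`).

```python
import time


def _execute(con, sql, params=None):
    # Some drivers reject params=None, so only pass parameters when given.
    return con.execute(sql, params) if params else con.execute(sql)


def time_chunked_fetch(con, sql, params=None, chunk_size=2048):
    """Run sql and drain the result in batches of chunk_size rows.

    Returns (row_count, elapsed_seconds). The timing includes execution.
    """
    start = time.perf_counter()
    cur = _execute(con, sql, params)
    rows = 0
    while True:
        batch = cur.fetchmany(chunk_size)
        if not batch:  # an empty batch means the result is exhausted
            break
        rows += len(batch)
    return rows, time.perf_counter() - start


def time_materialized_fetch(con, sql, params=None):
    """Run sql and fetch the whole result at once, for comparison.

    Returns (row_count, elapsed_seconds). The timing includes execution.
    """
    start = time.perf_counter()
    rows = len(_execute(con, sql, params).fetchall())
    return rows, time.perf_counter() - start
```

With DuckDB, one might call `time_chunked_fetch(con, query, [key], chunk_size=duckdb.__standard_vector_size__)`, where `query` and `key` stand in for the statement and lookup value being measured, so that each `fetchmany` call asks for one vector's worth of rows. Timing both helpers on the same query gives a closer like-for-like comparison than timing a fully materialized fetch against a streamed one.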
I switched the .NET implementation to a streaming result set to avoid issues like "JDBC ResultSet consumes unlimited memory", but if that comes with a 10x performance regression, I will switch it back.
I switched back to using a non-streaming result, and here are the results:
What happens?
Compared to the Python API, duckdb_stream_fetch_chunk is very slow.
Output for the Python script (the first column is the value that goes in the WHERE condition):
The time needed for duckdb_execute_prepared_streaming + duckdb_stream_fetch_chunk for one of these keys is 0.559 seconds (8MrfK3KC2, which takes 0.09 seconds in Python):
This was originally raised in the .NET client repo: Giorgi/DuckDB.NET#188
I can either share the parquet file (26GB) or you can generate it yourself based on the instructions here: Giorgi/DuckDB.NET#188 (comment) (You will get different keys, but it should behave similarly)
To Reproduce
I use this Python script to measure the Python execution time:
C code fragment:
OS:
Windows 11 x64
DuckDB Version:
0.10.2
DuckDB Client:
C, Python
Full Name:
Giorgi Dalakishvili
Affiliation:
Space International
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a stable release
Did you include all relevant data sets for reproducing the issue?
Yes
Did you include all code required to reproduce the issue?
Did you include all relevant configuration (e.g., CPU architecture, Python version, Linux distribution) to reproduce the issue?