paging for large result sets? #59

Open
jameshowison opened this issue Feb 18, 2022 · 9 comments
Labels: C++, enhancement (New feature or request)

Comments

@jameshowison

I'm running into responsiveness issues with queries that return many results (on the order of ~100,000 rows).

If it were just me, I'd add LIMIT clauses and remember to remove them later as the query is built up. But I'm working with students, and asking doesn't make it so.

Any chance that the result sets could be paged, with the interface only handling those currently being looked at? This is a common approach in tools like phpMyAdmin and other SQL clients, but I don't know whether SOCI makes that easy or not?

@marimeireles (Member)

Hey @jameshowison, thanks for opening the issue. I was reading the discussion you had on the linked issue. Even if this is not a xeus-sql issue per se, it's still an interesting question.

Any chance that the result sets could be paged, with the interface only handling those currently being looked at? This is a common approach in tools like phpMyAdmin and other SQL clients, but I don't know whether SOCI makes that easy or not?

Do you know how this is implemented under the hood?
If it's paged, does that mean they re-run the statement with a different LIMIT each time?

I don't know how we could implement something that responds to what the user is currently viewing, but we could think about some kind of pagination.

@jameshowison (Author)

Yeah, I think phpMyAdmin and others just use LIMIT and OFFSET, although I'm far from sure (it's what phpMyAdmin shows as you browse). I found this note about using a result set/cursor and the Postgres-specific FETCH. Perhaps something like that is possible with SOCI?

https://www.citusdata.com/blog/2016/03/30/five-ways-to-paginate/
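For reference, here's a minimal sketch of the cursor-style approach using SOCI's rowset, which fetches rows incrementally as you iterate rather than materializing the whole result set up front. The connection string, table name, and page size are placeholders, and I haven't checked how xeus-sql actually drives SOCI internally:

```cpp
#include <iostream>
#include <soci/soci.h>
#include <soci/sqlite3/soci-sqlite3.h>

int main()
{
    // Placeholder connection; xeus-sql would already hold an open session.
    soci::session sql(soci::sqlite3, "example.db");

    // A rowset fetches rows incrementally as the iterator advances, so we
    // can stop after one "page" without pulling all ~100,000 rows into memory.
    soci::rowset<soci::row> rs = (sql.prepare << "SELECT * FROM big_table");

    const std::size_t page_size = 100;  // hypothetical page size
    std::size_t shown = 0;
    for (auto it = rs.begin(); it != rs.end() && shown < page_size; ++it, ++shown)
    {
        const soci::row& r = *it;
        for (std::size_t i = 0; i != r.size(); ++i)
            std::cout << r.get<std::string>(i) << '\t';  // assumes text columns for brevity
        std::cout << '\n';
    }
}
```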

Perhaps some of the impact can be resolved using something like https://pypi.org/project/jupyterlab-limit-output/

@marimeireles (Member)

Perhaps some of the impact can be resolved using something like https://pypi.org/project/jupyterlab-limit-output/

Yeah, that seems like something that should work out of the box, tbh.

@jameshowison (Author)

Thinking about this more, perhaps a cross-platform approach that protects against the most common error cases, without the hassle of pagination, would be to add LIMIT 100 to each query (unless another LIMIT is already there) and include a link at the bottom, "Show all results", which removes the LIMIT 100? (Rough sketch below.)

I think needing to interactively look at more than 100 results would be exceedingly rare (I mean, 100 is too many to look through anyway :)
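Here's a rough sketch of the "append LIMIT unless one is present" rewrite I mean. It's purely illustrative: the naive substring check would be fooled by "limit" inside identifiers or string literals, so a real version would want token-level inspection:

```cpp
#include <algorithm>
#include <cctype>
#include <iostream>
#include <string>

// Hypothetical helper: append "LIMIT 100" unless the query already
// appears to contain a LIMIT clause.
std::string with_default_limit(std::string query, int default_limit = 100)
{
    std::string upper = query;
    std::transform(upper.begin(), upper.end(), upper.begin(),
                   [](unsigned char c) { return std::toupper(c); });
    if (upper.find("LIMIT") == std::string::npos)
        query += " LIMIT " + std::to_string(default_limit);
    return query;
}

int main()
{
    std::cout << with_default_limit("SELECT * FROM artist") << '\n';
    // -> SELECT * FROM artist LIMIT 100
    std::cout << with_default_limit("SELECT * FROM artist LIMIT 5") << '\n';
    // -> unchanged
}
```

The "Show all results" link would then just re-run the original, unmodified query.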

@jameshowison (Author)

Just a note that I ran into this today, with Jupyter becoming unresponsive on a 38 MB .ipynb file (resulting from large result sets from SELECT * FROM table queries).

@jameshowison (Author)

Just a note here (in case others find this in future) that https://github.com/kynan/nbstripout is able to resurrect files that have become unresponsive due to containing very large amounts of output.

jameshowison added a commit to jameshowison/utexas-image that referenced this issue Feb 28, 2022
To help students who end up with an unresponsive Jupyter because of very large output in files. A workaround for the issue in xeus-sql reported here: jupyter-xeus/xeus-sql#59
@jameshowison (Author)

What about adding a parameter to the LOAD? Something like limit_output = 100 would be a good explicit way to handle this. While adding a LIMIT 100 to the query en route to the server would be one approach, another could be simply to truncate the results so the output never fills up to a level that can make the page unresponsive. So, something truncating rows around xeus_sql_interpreter.cpp:93?
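To illustrate what I mean by truncating, here is a sketch; the row type and function are stand-ins, not the actual xeus-sql internals around xeus_sql_interpreter.cpp:93:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for one rendered result row; the real representation in
// xeus_sql_interpreter.cpp will differ.
using result_row = std::vector<std::string>;

// Hypothetical truncation step: keep at most limit_output rows and
// return a notice about how many were dropped, so notebook output
// stays bounded regardless of what the query returned.
std::string truncate_rows(std::vector<result_row>& rows,
                          std::size_t limit_output)
{
    if (rows.size() <= limit_output)
        return "";
    const std::size_t dropped = rows.size() - limit_output;
    rows.resize(limit_output);
    return "... " + std::to_string(dropped) +
           " more rows (truncated; raise limit_output to see them)";
}
```

The interpreter would then append the notice as one final line after the truncated table, instead of rendering everything.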

@marimeireles (Member)

Thinking about this more, perhaps a cross-platform approach that protects against the most common error cases, without the hassle of pagination, would be to add LIMIT 100 to each query (unless another LIMIT is already there) and include a link at the bottom, "Show all results", which removes the LIMIT 100?

I think needing to interactively look at more than 100 results would be exceedingly rare (I mean, 100 is too many to look through anyway :)

This would also work.

I'm not sure I understand your last comment. How would that differ from using LIMIT?

@jameshowison (Author)

My last comment meant that I almost never manually look at more than 100 rows of results; I load larger results into a data analysis or visualization program. But I think having this configurable via limit_output = in the load cell is a great option.

marimeireles added the enhancement (New feature or request) and C++ labels on May 23, 2023