
Integrate Bleve for relevance searching #7

Open
preslavrachev opened this issue Nov 21, 2019 · 7 comments
Comments


preslavrachev commented Nov 21, 2019

One of the features I have always been missing in Bear is relevance search. When you have thousands of notes, this starts becoming an issue. I mentioned it to the Bear developers, but they have no intention of introducing it to the app anytime soon.

@drgrib Perhaps Bleve is something we could look at (if you're interested)? I've wanted to try it out for some of my personal projects, so this would be a nice experiment.


drgrib commented Nov 23, 2019

Can you explain what a "relevance search" is? I.e. how does it differ from the current search mechanism, preferably by example? It's not obvious to me from the Bleve link.

Also, this appears to require in-memory indexing, which speeds up later searches against that in-memory structure. It may not be obvious, because Go makes the execution so fast, but the way script filters work in Alfred is that they are executed again and again, every time you type another character.

So if I type bs sand, the script filter for cmd/search/search is called from scratch 4 times, once for each character. Each time, it is restarted from nothing with the {query} values of s, sa, san, and sand, respectively. Each of these calls makes a new select query to the Bear SQLite database using each subsequent value of {query}.
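For concreteness, here is a minimal sketch of what a single script filter invocation amounts to under this model, assuming the mattn/go-sqlite3 driver and my guess at Bear's `ZSFNOTE`/`ZTITLE` schema (this is an illustration, not the workflow's actual code):

```go
// Rough shape of a single script filter invocation: open the Bear SQLite
// database, run one query for the current {query} value, print results,
// exit. Nothing survives between keystrokes.
package main

import (
	"database/sql"
	"fmt"
	"log"
	"os"

	_ "github.com/mattn/go-sqlite3" // SQLite driver
)

func main() {
	query := os.Args[1] // "s", "sa", "san", "sand" on successive calls

	db, err := sql.Open("sqlite3", os.Getenv("BEAR_DB_PATH"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// ZSFNOTE/ZTITLE are my assumption about Bear's Core Data schema.
	rows, err := db.Query(
		`SELECT ZTITLE FROM ZSFNOTE WHERE ZTITLE LIKE '%' || ? || '%'`, query)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var title string
		if err := rows.Scan(&title); err != nil {
			log.Fatal(err)
		}
		fmt.Println(title) // the real workflow emits Alfred JSON items here
	}
}
```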

This makes it impossible to keep an in-memory data structure around to support queries. Background processes that index the data and store it to disk are also impossible, because script filters can only be called when the Alfred user invokes them by typing their keyword.

In short, at least at first glance, Bleve seems impossible to use from an Alfred workflow. Only Bear itself could implement such a feature, and even then an Alfred workflow would not have access to the index it creates unless it were stored in the SQLite database somehow.


drgrib commented Nov 23, 2019

I've read and watched a little more on Bleve, and it seems like "relevance search" is just a catchall term for making searching more intelligent, less like basic DB search and more like Google search or Elasticsearch. It's possible, if index creation and updating are fast enough, that indexing could be executed and the index saved within the script filter's executable calls. But it is not clear to me exactly how the index would be constructed from the Bear SQLite database, and how it would be kept in sync with Bear note changes.
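To make that concrete, here is a minimal sketch of what building (and rebuilding) such an index could look like, assuming the 2019-era `github.com/blevesearch/bleve` import path and my guess at Bear's `ZSFNOTE` schema; none of it is taken from the actual workflow:

```go
// Minimal sketch: walk the Bear database and (re)build a persistent Bleve
// index keyed by note ID. Re-indexing an existing ID overwrites the old
// document, which is how note edits would be picked up.
package main

import (
	"database/sql"
	"log"
	"os"

	"github.com/blevesearch/bleve"
	_ "github.com/mattn/go-sqlite3"
)

type note struct {
	Title string
	Text  string
}

func main() {
	// Open the on-disk index if it exists, otherwise create it.
	idx, err := bleve.Open("bear.bleve")
	if err != nil {
		if idx, err = bleve.New("bear.bleve", bleve.NewIndexMapping()); err != nil {
			log.Fatal(err)
		}
	}
	defer idx.Close()

	db, err := sql.Open("sqlite3", os.Getenv("BEAR_DB_PATH"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// ZSFNOTE/ZUNIQUEIDENTIFIER/ZTITLE/ZTEXT are my guess at Bear's schema.
	rows, err := db.Query(
		`SELECT ZUNIQUEIDENTIFIER, ZTITLE, COALESCE(ZTEXT, '') FROM ZSFNOTE`)
	if err != nil {
		log.Fatal(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id string
		var n note
		if err := rows.Scan(&id, &n.Title, &n.Text); err != nil {
			log.Fatal(err)
		}
		if err := idx.Index(id, n); err != nil {
			log.Fatal(err)
		}
	}

	// Querying the index returns hits sorted by relevance score.
	res, err := idx.Search(bleve.NewSearchRequest(bleve.NewMatchQuery("sand")))
	if err != nil {
		log.Fatal(err)
	}
	log.Println(res)
}
```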

preslavrachev commented

Sorry, I should have elaborated a bit. By "relevance searching" I meant "sorting by relevance": put simply, sort the found notes not by recency, but by how closely they match your search query.

Bleve, just like Elasticsearch, does that very efficiently. It allows for the creation of a search index, which is persistent and super fast to query. As you pointed out, in other applications records get added to the index as soon as they get saved in a database, written to disk, etc. In our case, of course, it won't be that easy. I toyed around with the idea of having a separate command which triggers the generation (and re-generation) of the entire index. It's not the nicest user experience, but it could be some kind of a start. Another possible idea is to execute two concurrent searches for every query: one against the index, and one against the SQLite DB. The results from the index get shown to the user; the results from the DB get used to re-index the found notes (effectively making sure that recently created/updated notes make it into the index).
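A rough sketch of that two-search idea, where `searchIndex`, `searchDB` and `reindex` are hypothetical stand-ins for the Bleve query, the existing SQLite query, and the Bleve re-index step (not real code from this repo):

```go
// Sketch of the two-concurrent-searches idea.
package main

import "fmt"

// Result stands in for whatever the workflow hands back to Alfred.
type Result struct{ Title string }

func searchIndex(q string) ([]Result, error) { return nil, nil } // Bleve query (placeholder)
func searchDB(q string) ([]Result, error)    { return nil, nil } // SQLite LIKE query (placeholder)
func reindex(rows []Result)                  {}                  // feed DB rows back into the index (placeholder)

// search runs both lookups concurrently, shows the index hits to the user,
// and uses the DB rows to refresh the index so new or edited notes catch up.
func search(query string) []Result {
	indexCh := make(chan []Result, 1)
	dbCh := make(chan []Result, 1)

	go func() { hits, _ := searchIndex(query); indexCh <- hits }()
	go func() { rows, _ := searchDB(query); dbCh <- rows }()

	hits := <-indexCh
	rows := <-dbCh

	// Must complete before the process exits; script filters are short-lived,
	// so a fire-and-forget goroutine would get cut off.
	reindex(rows)

	return hits
}

func main() { fmt.Println(search("sand")) }
```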

Again, as I said in my original note, this is not a feature request of any kind, just a crazy idea I've been playing around with in my head. I'd suggest that you don't do anything about it for now. Instead, I'll try to spare some time and put together a PoC in my own fork. If you like the idea afterwards, we can think about a PR.


drgrib commented Nov 24, 2019

Interesting. Sounds like a plan. One possible way Go can help with the performance issues is to use a select statement with a timeout on the index task, so the workflow can just return basic DB results if the index isn't finished or isn't fast enough. It could possibly make incremental progress on indexing each time a search is called, cancelling the work when the timeout hits. One caveat: unless the indexing process can be broken into short spurts like that, this may not be a viable solution.
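A minimal sketch of that timeout pattern (the 50 ms budget and the `searchIndex`/`searchDB` helpers are placeholders, not anything from the actual code):

```go
// Sketch of the select-with-timeout idea: start the index search, but fall
// back to the plain DB results if it is not done within the budget.
package main

import (
	"fmt"
	"time"
)

type Result struct{ Title string }

func searchIndex(q string) []Result { return nil } // Bleve query (placeholder)
func searchDB(q string) []Result    { return nil } // existing SQLite query (placeholder)

func search(query string) []Result {
	indexCh := make(chan []Result, 1)
	go func() { indexCh <- searchIndex(query) }()

	select {
	case hits := <-indexCh:
		return hits // index answered in time: relevance-ranked results
	case <-time.After(50 * time.Millisecond):
		return searchDB(query) // index too slow or still building: basic DB results
	}
}

func main() { fmt.Println(search("sand")) }
```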

preslavrachev commented

@drgrib I am not sure which version of the SQLite libraries you are using, but when I tried building locally, I got back slices with plenty of entries, all of them empty. I attributed that to the types behind the column pointers no longer being []uint8, but regular strings. Anyway, I fixed it in my WIP branch (preslavrachev@a25548f), but if you wish, I can send you a separate PR with just this commit.
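For illustration only (this is the general shape of such a fix, not necessarily what the linked commit does): a conversion helper that tolerates both column representations:

```go
// Depending on the driver version, raw SQLite column values may come back
// as []uint8 or as string, so convert either instead of asserting one type.
package main

import "fmt"

func toString(v interface{}) string {
	switch s := v.(type) {
	case []uint8:
		return string(s)
	case string:
		return s
	case nil:
		return ""
	default:
		return fmt.Sprintf("%v", s)
	}
}

func main() {
	fmt.Println(toString([]uint8("note title")), toString("note title"))
}
```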

P.S. I would strongly advise setting up Go modules support for this package. It would help eliminate this kind of dependency version inconsistency.
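For reference, bootstrapping modules should only take a couple of commands at the repo root (the module path below is my guess at this repo's import path):

```sh
go mod init github.com/drgrib/alfred-bear   # module path is a guess at this repo's import path
go mod tidy                                 # pins the dependency versions the code actually uses
```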


drgrib commented Dec 7, 2019

That's a good point. When I first designed the workflow over a weekend, after I finally got tired of the slow speed of the Python version, I never anticipated this kind of collaboration, because I (naively :) assumed it would have all the features anyone would ever want and that people would just use the executables. But yes, dependency management is sorely missing.

So yes, could you actually send me a separate PR with just that commit, as well as your choice of dependency management, so I can match your versions? Both the separate PR and the dependency management are excellent ideas.

preslavrachev commented

No need to apologize, it's your project after all :) It has all the features you wanted it to have, which is perfect. All the rest are nice-to-haves. This is the beauty of open source: I can make changes in my fork and send you PRs, and you can feel totally free to accept or reject them (no pressure) ;)
