-
-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Consider brackets within wildcard as regular characters #19973
base: master
Are you sure you want to change the base?
Conversation
I had this PR in draft for a few months, and I had the following in sandbox, see the comment: // The order of the replacements below is important <-- this comment
// Replace "\?" (which has been regex-escaped above) with "."
wildcard.replace(u"\\?"_s, u"."_s);
// Replace "\*" (which has been regex-escaped above) with ".*"
// Consecutive wildcards are merged to avoid catastrophic backtracking
wildcard.replace(QRegularExpression(u"(?:\\\\*)+"_s), u".*"_s); On second thought, I cannot figure out any case where switching the replacement order would cause issues. But maybe I'm overlooking something I had thought of a few months ago. Does anyone think of any edge case? |
So, no one has anything to say about it? |
Mmmm, not sure about the number of backslashes here:
|
53ae771
to
ff37f4d
Compare
ff37f4d
to
095bf83
Compare
Force-pushed to (probably) fix the number of backslashes, as described above. |
There is a build failure, but it is unrelated to this PR, see here:
|
This PR is stale because it has been 60 days with no activity. This PR will be automatically closed within 7 days if there is no further activity. |
Coincidentally, I had seen the 5.0 beta release, and I was about to "bump" this PR. |
By the way, someone might be interested to submit a PR for Qt's wildcardToRegularExpression, to prevent regex catastrophic backtracking, if there are multiple consecutive It is as simple as adding the following in the case '*':
rx += settings.starEscape;
+++ while (i < wclen && wc[i] == u'*') {
+++ ++i;
+++ }
break; |
This PR is stale because it has been 60 days with no activity. This PR will be automatically closed within 7 days if there is no further activity. |
No no no dear bot, please keep this open. |
Usually two comments are guaranteed to remove the label. |
Could you provide a feature summary without reference to other topics etc.? It should contain a precise definition of how it is supposed to behave, and a description of why. IIRC, you want it to accept only |
Just a first draft for that summary:
About a way to escape the These characters are forbidden in NTFS and FAT filesystems (so Windows, basically), but are perfectly valid on almost all other filesystems (see this table). However, a special case I thought of, see the last two points:
By implementing using a "naive" code, we may get handling A, but I think handling B would be more appropriate.
So, for now I haven't really established what would be the most appropriate escaping strategy. And as I remember, that's probably why I had left out adding an escaping mechanism initially. |
The only really satisfying solution would be to implement a « plaintext / wildcards / regex » dropdown, see #9255 (comment) and following. The "wildcards" mode would continue to behave as currently, as provided by Qt. And at least it provides an unambiguous escaping mechanism: The added "plaintext" mode would become the default mode, instead of "wildcards". The former doesn't bring any surprise, and is sufficient for most users. Users then have the choice to opt in more advanced search modes. |
@vlakoff
|
It seems so natural to me to use the simplest wildcard characters (at least |
Thinking about how to implement the escaping you chose here, I'm seeing two possible approaches:
Also, as I stated above, I am very uncomfortable that the Therefore, I am bending towards closing this ticket, and keeping the current code, which is not that bad:
|
Other notes:
|
You can just report it to Qt and suggest a fix within bugreport. |
For context, in #4170 we added support for « ? » and « * » glob wildcards in search patterns. But a side-effect is that « [ » and « ] » also become special characters, which is rather unexcepted and undesirable.
This PR is a new attempt at #16965, which was broken and had to be reverted in #17820. I still don't know why #16965 was failing…
Here I'm taking a new approach: we fully regex-escape the provided pattern, then we replace
\?
and\*
with.
and.*
respectively. This approach should be more robust, and is even simpler actually. See this question on SO that led me to this approach.I'm taking the opportunity to point out that at the time of #4170 (2015), only plaintext search was available, hence the need for these wildcards. In 2018 the powerful regex search has been added: #9255. Ideally, we should be able to choose between plaintext, wildcard and regex searches, see this draft.