Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Case insensitive FastRegexMatcher doesn't check all folds #14066

Open
colega opened this issue May 8, 2024 · 8 comments · May be fixed by #14086 or #14170
Open

Case insensitive FastRegexMatcher doesn't check all folds #14066

colega opened this issue May 8, 2024 · 8 comments · May be fixed by #14086 or #14170

Comments

@colega
Copy link
Contributor

colega commented May 8, 2024

What did you do?

I ran the mimir-prometheus FuzzFastRegexMatcher_WithStaticallyDefinedRegularExpressions for a while, and found a bug.

The input string "ſſs" is not matched by the case-insensitive regular expression "(?i:(foo1|foo2|aaa|bbb|ccc|ddd|eee|fff|ggg|hhh|iii|lll|mmm|nnn|ooo|ppp|qqq|rrr|sss|ttt|uuu|vvv|www|xxx|yyy|zzz))",.

What did you expect to see?

I expected that string to be matched, as the golang's regexp package matches it.

What did you see instead? Under which circumstances?

It wasn't matched.

@colega
Copy link
Contributor Author

colega commented May 8, 2024

A very quick diff that fixes the problem in the least performant way:

diff --git model/labels/regexp.go model/labels/regexp.go
index 79e340984..7681483e7 100644
--- model/labels/regexp.go
+++ model/labels/regexp.go
@@ -791,7 +791,15 @@ func (m *equalMultiStringMapMatcher) Matches(s string) bool {
        }

        _, ok := m.values[s]
-       return ok
+       if ok || m.caseSensitive {
+               return ok
+       }
+       for k := range m.values {
+               if strings.EqualFold(s, k) {
+                       return true
+               }
+       }
+       return false
 }

@colega
Copy link
Contributor Author

colega commented May 8, 2024

I'd also consider just going through the standard regular expression engine if the matcher is case insensitive and the string contains unicode runes.

@colega
Copy link
Contributor Author

colega commented May 10, 2024

The case-insensitive regular expression "(?i:(sss))" should match the string "ſſs" but it does not.

(Edit: this comment was a response to a deleted comment).

@kushalShukla-web
Copy link
Contributor

you mean this matches function is incomplete right and it should have a check for a string to be matched .

@kushalShukla-web
Copy link
Contributor

kushalShukla-web commented May 10, 2024

i think here we should use "regexp" package instead of "string" package .
here on this line it will always return false as its not checking for a certain pattern strings.EqualFold(s, k) .

@kushalShukla-web
Copy link
Contributor

will create pr for this today .

@bboreham
Copy link
Member

I observe it fails because ToLower("ſſs") is "ſſs".
However if we flip all the ToLowers to ToUpper, this case does work.

Maybe we can use unicode.SimpleFold? (which I found used inside Go's regexp package)

@Ranveer777
Copy link

Ranveer777 commented May 30, 2024

@colega @bboreham
I'm interested in picking up this bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment