Built-in character classes are mostly table-driven.
Because of that, negated built-in classes such as \W, \S, etc. are difficult for engines to merge into a positive character class.
In this case there is evidently a bug because, as you said, it doesn't time out on some target strings.
In fact, [a-xzA-XZ\W] works with the sample string. It times out when Y is included anywhere in the class, but only for that particular string.
Let's see if we can determine whether this is a bug.
First, some tests:
Test - Fail [a-zA-Z\W]
https://rextester.com/FHUQG84843
# Test - Fail [a-zA-Z\W]
puts "Hello World!"
# u = UTF-8 encoded pattern, i = case-insensitive
regex = /(Si.ges[a-zA-Z\W]*avec\W*fonction\W*m.moires)/ui
text = "xation de 2 sièges-enfants sur la banquette AR),Pack \"Assistance\",Keyless Access avec alarme : Système de verrouillage/déverrouillage et de démarrage sans clé,Park Assist: Système d'assistance au stationnement en créneauet et en bataille,Rear Assist: Caméra de recul avec visualisation de la zone situ"
# This is the call that times out with [a-zA-Z\W]
res = text.match(regex)
# Only reached if the match returns
puts "Done"
Test - Pass [a-xzA-XZ\W]
https://rextester.com/RPV28606
Test - Pass [a-zA-Z\P{Word}]
https://rextester.com/DAMW9069
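If you want to rerun the three tests locally instead of on rextester, here is a rough sketch of a harness. The names timed_match and compare_classes are mine (hypothetical, not from the original tests), the 2 second limit is arbitrary, and it assumes the regex engine can be interrupted by Timeout while matching, which I haven't checked for every Ruby version.

```ruby
require "timeout"

# The three patterns from the tests above, keyed by the class being tried.
PATTERNS = {
  '[a-zA-Z\W]'       => /(Si.ges[a-zA-Z\W]*avec\W*fonction\W*m.moires)/ui,
  '[a-xzA-XZ\W]'     => /(Si.ges[a-xzA-XZ\W]*avec\W*fonction\W*m.moires)/ui,
  '[a-zA-Z\P{Word}]' => /(Si.ges[a-zA-Z\P{Word}]*avec\W*fonction\W*m.moires)/ui
}.freeze

# Hypothetical helper: runs one match under a wall-clock limit (seconds)
# and reports :match, :no_match, or :timeout.
def timed_match(regex, text, limit = 2)
  Timeout.timeout(limit) { regex.match(text) ? :match : :no_match }
rescue Timeout::Error
  :timeout
end

# Hypothetical helper: runs every pattern against the same text.
def compare_classes(text, limit = 2)
  PATTERNS.transform_values { |re| timed_match(re, text, limit) }
end

# Short, harmless input just to show the call; substitute the long sample
# string from the failing test to try to reproduce the timeout.
p compare_classes("Sieges avec fonction memoires")
```

A :timeout for only the first class on the sample string would be consistent with the results above; whether it reproduces depends on the Ruby version you run it on.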
Conclusion: Report this as a BUG.
IMO this is a bug in their built-in class \W, which is engine-defined,
whereas \P{Word} is resolved through a Unicode property function, not a range.
And we see that [a-zA-Z\P{Word}] works just fine.
Use \P{Word} inside classes as a temporary workaround.
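One caveat before swapping the classes: as far as I know, in Ruby 1.9 and later \W is ASCII-based while \P{Word} is Unicode-aware, so the two are not strictly interchangeable for accented letters. A small sketch to check this on your own Ruby; the helper name accepted and the sample characters are my own choices, not part of the original tests.

```ruby
# Hypothetical helper: returns the characters from `chars` that the
# given single-character class accepts.
def accepted(regex, chars)
  chars.select { |c| c.match?(regex) }
end

# "\u00E8" is the letter "è" as it appears in "sièges".
SAMPLE = ["a", "Y", "5", "_", "-", " ", "\u00E8"].freeze

with_w      = accepted(/[a-zA-Z\W]/u, SAMPLE)
with_p_word = accepted(/[a-zA-Z\P{Word}]/u, SAMPLE)

p with_w
p with_p_word
# Characters accepted by the \W version but not by the \P{Word} version.
p with_w - with_p_word
```

If the last line prints a non-empty list, the workaround changes what the class can consume on text containing those characters, so check it against your real data before relying on it.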
In reality, when modern engines were first designed, the logic of a negative class [^...] was that each item is combined as AND NOT, whereas in a positive class each item is ORed.
Putting a negated item inside a positive class mixes the two, which results in scoping errors.
Perl still had character class errors until fairly recently.
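To make the AND NOT versus OR distinction concrete, here is a minimal sketch of what a correct engine is expected to do. The sample characters and variable names are only for illustration.

```ruby
CHARS = %w[a b 5 -].freeze

# Negative class: an item list under [^...] means NOT a AND NOT digit.
neg = CHARS.select { |c| c.match?(/[^a\d]/) }

# Positive class with a negated built-in: a OR (NOT digit).
pos = CHARS.select { |c| c.match?(/[a\D]/) }

# The same two rules written out as plain boolean logic.
digit     = ->(c) { c.match?(/\d/) }
neg_logic = CHARS.select { |c| c != "a" && !digit.call(c) }
pos_logic = CHARS.select { |c| c == "a" || !digit.call(c) }

p neg   # => ["b", "-"]
p pos   # => ["a", "b", "-"]
p neg == neg_logic && pos == pos_logic
```

The engine has to keep the negation scoped to the \D item only; applying it to the whole class (or vice versa) is the kind of scoping error described above.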