some extended Korn shell globs are really slow

Bug #625164 reported by Thorsten Glaser
This bug affects 1 person
Affects: mksh
Status: In Progress
Importance: Wishlist
Assigned to: Unassigned
Milestone: (none)

Bug Description

Attaching a testcase.

The first glob is decently fast; the second one takes noticeably long (about a second) on a 3 GHz Athlon.
The third glob has to be killed with SIGKILL, of all things.

Also, I have another shell script, where replacing
[[ $foo = *@(x)* ]] with [[ $foo = *'x'* ]] and
[[ $foo = @(1|2………),* ]] with [[ $foo = 1,* || $foo = 2………,* ]]
made it noticeably faster.
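
The two rewrites above can be sketched as pairs of equivalent tests. This is only an illustration: the function names and the alternatives alpha and beta are hypothetical stand-ins, not taken from the script mentioned in the report, and the elided pattern is deliberately not reproduced.

```shell
# Illustrative only: helper names and sample alternatives are assumptions.
# bash users may need extglob; mksh understands @(...) natively.
if [ -n "${BASH_VERSION-}" ]; then shopt -s extglob; fi

# A single-alternative extglob versus a plain quoted substring.
contains_x_extglob() { [[ $1 = *@(x)* ]]; }
contains_x_plain()   { [[ $1 = *'x'* ]]; }

# A multi-alternative extglob versus one plain test per alternative.
prefix_extglob() { [[ $1 = @(alpha|beta),* ]]; }
prefix_plain()   { [[ $1 = alpha,* || $1 = beta,* ]]; }
```

Each pair should accept and reject the same strings; the plain forms simply avoid the extglob matching path that the report suspects is slow.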

The probable culprit is gmatchx, do_gmatch, and friends, mostly from misc.c.

This bug serves as documentation for now, because I have no idea how to tackle it, but if someone takes it up and submits patches, be my guest.

Tags: glob slow
Thorsten Glaser (mirabilos) wrote :

Partial fix committed:

${foo/bar/baz} turned bar into *bar*, which becomes very slow if bar = *@(foo)
mksh now partially optimises extglobs:

Pass 1 ⇒ replace all @(foo) with foo if foo doesn’t contain a pattern separator (‘|’)

Pass 2 ⇒ collapse all adjacent asterisk wildcards (‘*’)
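
The two passes can be sketched in shell as plain string rewriting. This is a rough illustration, not mksh's actual C implementation: the function names are hypothetical, groups containing a nested "(" are left untouched, and backslash escapes and bracket expressions are assumed not to occur in the pattern.

```shell
# Pass 1 (sketch): turn @(foo) into foo when foo contains no '|'.
# Assumption: groups with a nested '(' are kept unchanged.
strip_trivial_at() (
    p=$1
    out=
    while :; do
        case $p in
        *'@('*')'*)
            head=${p%%'@('*}     # text before the first "@("
            rest=${p#*'@('}      # text after the first "@("
            body=${rest%%')'*}   # group contents up to the first ")"
            tail=${rest#*')'}    # text after that ")"
            case $body in
            *'|'*|*'('*) out=$out$head'@('$body')' ;;  # keep as is
            *)           out=$out$head$body ;;         # drop the wrapper
            esac
            p=$tail
            ;;
        *) break ;;
        esac
    done
    printf '%s\n' "$out$p"
)

# Pass 2 (sketch): collapse every run of '*' into a single '*'.
collapse_stars() (
    p=$1
    while :; do
        case $p in
        *'**'*) p=${p%%'**'*}'*'${p#*'**'} ;;
        *) break ;;
        esac
    done
    printf '%s\n' "$p"
)

# Both passes in order, as described above.
optimise_glob() { collapse_stars "$(strip_trivial_at "$1")"; }
```

For the substitution case, a pattern such as **@(foo)* (bar = *@(foo) wrapped in asterisks) would come out of optimise_glob as *foo*.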

This speeds up a lot of things, to the point of preventing apparent freezes even on multi-gigahertz machines. There is still a lot to do, though: cache optimised regexps (especially for the ${foo/bar/baz} case) and optimise the cases reported above.

Changed in mksh:
status: New → In Progress
Thorsten Glaser (mirabilos) wrote :

mksh’s globbing really sucks. A way out would be to parse the patterns as a special kind of regex (an NFA, possibly also making $KSH_MATCH an array).

References:

- https://research.swtch.com/glob

- https://swtch.com/~rsc/regexp/regexp1.html
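
As a rough idea of what non-exponential matching looks like, here is a minimal sketch of a matcher that keeps a single restart point per '*' instead of backtracking recursively, in the spirit of the first reference. It is an assumption-laden illustration, not a proposal for mksh's code: glob_match is a hypothetical name, only '*' and '?' are handled (no extglobs, brackets, or escapes), and it relies on the ${var:offset:length} substring expansion available in mksh and bash.

```shell
# Sketch: match string $2 against pattern $1 ('*' and '?' only).
# Returns 0 on match, 1 otherwise.
glob_match() (
    pat=$1 str=$2
    px=0 sx=0            # current positions in pattern and string
    next_px=0 next_sx=0  # where to resume after a mismatch (0 = nowhere)
    while [ "$px" -lt "${#pat}" ] || [ "$sx" -lt "${#str}" ]; do
        if [ "$px" -lt "${#pat}" ]; then
            c=${pat:px:1}
            case $c in
            '?')
                if [ "$sx" -lt "${#str}" ]; then
                    px=$((px + 1)); sx=$((sx + 1)); continue
                fi ;;
            '*')
                # Try matching zero characters first; on a later
                # mismatch, resume here with one more character eaten.
                next_px=$px; next_sx=$((sx + 1))
                px=$((px + 1)); continue ;;
            *)
                if [ "$sx" -lt "${#str}" ] && [ "${str:sx:1}" = "$c" ]; then
                    px=$((px + 1)); sx=$((sx + 1)); continue
                fi ;;
            esac
        fi
        # Mismatch: restart from the most recent '*', if there is one.
        if [ "$next_sx" -gt 0 ] && [ "$next_sx" -le "${#str}" ]; then
            px=$next_px; sx=$next_sx; continue
        fi
        return 1
    done
    return 0
)
```

Because only the most recent '*' is ever revisited, the work is bounded by roughly pattern length times string length rather than growing exponentially with the number of asterisks. Supporting alternations such as @(a|b) would need the fuller NFA treatment mentioned above.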
