command-not-found slow for its task

Bug #598282 reported by Iain Buclaw
28
This bug affects 5 people
Affects Status Importance Assigned to Milestone
command-not-found (Ubuntu)
Confirmed
Undecided
Unassigned

Bug Description

Binary package hint: command-not-found

Considering command-not-found has a relatively simple job. It seems that it is horribly over-engineered for its task. And as such, we have runtimes such as this:

$ time sl
The program 'sl' is currently not installed. You can install it by typing:
sudo apt-get install sl

real 0m1.715s
user 0m0.680s
sys 0m0.228s

$ time sl
The program 'sl' is currently not installed. You can install it by typing:
sudo apt-get install sl

real 0m0.759s
user 0m0.592s
sys 0m0.116s

A little over a year ago I wrote a replacement script, that I've numerously lost, found, updated and forgotten about. It is about 250 LOC, and kept with one thing in mind, KISS. What surprises me most though is not the simplicity in comparison, but that it is 14x faster at executing it's job.
That script can be found here: http://ubuntuforums.org/showpost.php?p=7116396&postcount=1
And this is how well it runs in comparison.

$ time sl
The program 'sl' is currently not installed. You can install it by typing:
sudo apt-get install sl

real 0m0.079s
user 0m0.036s
sys 0m0.016s

$ time sl
The program 'sl' is currently not installed. You can install it by typing:
sudo apt-get install sl

real 0m0.062s
user 0m0.044s
sys 0m0.016s

You would be forgiven in thinking I used a compiled application to get such low times, but it is not, I tell you that now.

So the ultimate question is, why punish users? I think some act needs to be gotten together to clamp down on the speed demons in this application. As IMO, it is in an unacceptable state.

Regards

ProblemType: Bug
DistroRelease: Ubuntu 10.04
Package: command-not-found 0.2.40ubuntu5
ProcVersionSignature: Ubuntu 2.6.32-22.36-generic 2.6.32.11+drm33.2
Uname: Linux 2.6.32-22-generic i686
Architecture: i386
Date: Thu Jun 24 22:33:33 2010
PackageArchitecture: all
ProcEnviron:
 LANG=en_GB.utf8
 SHELL=/bin/bash
SourcePackage: command-not-found

Revision history for this message
Iain Buclaw (iainb) wrote :
Changed in command-not-found (Ubuntu):
status: New → Confirmed
Revision history for this message
Zygmunt Krynicki (zyga) wrote :

Feel free to help me out. The only issue why this app is slow is python startup time. A rewrite in C would make it much more responsive. NB: The app is _fast_ but not _responsive_.

More than a year ago I did an experiment with c-n-f running as a daemon with a trimmed-down C program talking to it over dbus. The initial response was roughly identical to what c-n-f does today. Each subsequent response was instant (on a very slow atom laptop with spinning disk). I did not finish that design but I think the idea is sound.

There are three things taking time in c-n-f:

1) The start up time of the environment (currently the only really slow thing)
2) The query, this is actually very fast using binary database optimized for lookups. It actually answers 10s or even 100s queries each time you miss-type a command as it searches for similar command to offer typo suggestions. I don't see the need to optimize this much but one possible solution would be to add a cache for things that people often miss-type to save on the "spellchecker" lookups.
3) The time it takes to format and print the message, this is very fast and I don't see the need to change that.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

I checked your script -- it does not seem to offer typo corrections - could you please add that feature (or disable it in c-n-f) and re-benchmark both?

Revision history for this message
rgrig (radugrigore) wrote :

See attached. Try
  tar xaf pkghint.tar.bz2
  cd pkghint
  ./makedb.sh
  make

Here are some times on my computer:

rg@rqgm:pkghint$ time /usr/lib/command-not-found foobar
foobar: command not found
real 0m0.204s
user 0m0.168s
sys 0m0.032s

rg@rqgm:pkghint$ time ./pkghint foobar
foobar: command not found
real 0m0.017s
user 0m0.004s
sys 0m0.004s

rg@rqgm:not-found$ time /usr/lib/command-not-found lesx
No command 'lesx' found, did you mean:
 Command 'les' from package 'atm-tools' (universe)
 Command 'lex' from package 'flex' (main)
 Command 'lex' from package 'flex-old' (universe)
 Command 'less' from package 'less' (main)
lesx: command not found
real 0m0.202s
user 0m0.168s
sys 0m0.032s

rg@rqgm:not-found$ time ./pkghint lesx
No command 'lesx' found. Did you mean
  command 'less' from package 'less' (main)
  command 'lex' from package 'flex' (main)
  command 'lex' from package 'flex-old' (universe)
real 0m0.018s
user 0m0.004s
sys 0m0.004s

Feel free to reformat the code, but please don't make it slow ;)

Revision history for this message
rgrig (radugrigore) wrote :

Bugfix.

Also, two warnings:
 - I didn't really think about what needs to go in the database. (In other words, makedb.sh is a complete hack.)
 - figuring out the active sources is also a hack, because I was lazy to fugure out how to use libapt from C; it may be worth using it unless it significantly slows down the problem (>50ms for one run is too much in my opinion)

and a note
 - I intentionally sort the output somewhat differently. The criteria are: priority.txt, source, name of command, name of package.

Revision history for this message
Zygmunt Krynicki (zyga) wrote :

While I appreciate your enthusiasm and desire to fix this I have a few comments:

1) Unless I missed something your implementation always reads the whole database file, I would rather avoid that.
2) The similar result search you employed is much weaker than the one used by current implementation, you do not look for any permutations of the input string, just for substrings.
3) Plan on how to migrate those features (or not) to your new implementation
4) Do not use hacks like the one with parsing apt files, in general they tend to strike back sooner than we expect.
4) Propose a merge request, start with a alternative implementation skeleton, then add new pieces one by one, I'll gladly review them.
5) Initially ensure that the output is the same, this will simplify testing. We can change the output of both implementations later if we so desire.

Thanks!

Revision history for this message
rgrig (radugrigore) wrote : Re: [Bug 598282] Re: command-not-found slow for its task

On Fri, Jul 22, 2011 at 3:03 AM, Zygmunt Krynicki
<email address hidden> wrote:
> While I appreciate your enthusiasm and desire to fix this I have a few

Actually, I don't have much of either. But thanks for your appreciation. :)

> 1) Unless I missed something your implementation always reads the whole database file, I would rather avoid that.

That's actually what makes it fast, so you should not avoid it. With
non-SSD one seek takes roughly the same amount of time as reading
sequentially a few MB; with SSD, it's fast anyway. GDBM has to
optimize behind the scenes to also read and cache the whole thing. (If
it doesn't, then that's what is slow.)

> 2) The similar result search you employed is much weaker than the one used by current implementation, you do not look for any permutations of the input string, just for substrings.

Look again.

Revision history for this message
rgrig (radugrigore) wrote :

I profiled your code (0.2.41ubuntu2) and on my computer the bottleneck is this statement:
  SourcesList()

The slow part of that is that it reads the info files in
  /usr/share/python-apt/templates/
every time.

And the slow part of that is that it tries to translate the descriptions.

Revision history for this message
Iain Buclaw (iainb) wrote :

Hi, thanks your your sudden enthusiasm of my report. I wrote the script somewhat 2 years ago now as a proof of concept, and I've never had the need for spell checking feature in command-not-found. :~)

I will see what I can do to help, though I am a bit stretched for time these days.

Regards.
Iain

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.