Comment 5 for bug 273978

Martin Pool (mbp) wrote :

I had a look at this with bialix and gz. On both Linux and Windows, the string returned by os.strerror (which ends up in OSError and similar exceptions) is a byte string in the current encoding.

We currently call locale.setlocale(locale.LC_ALL, ''), which means the locale is taken from the environment.

We would have the option to call setlocale(LC_MESSAGES, 'C'), which would always give English OS error messages. That would avoid encoding bugs and also avoid variation in tests when running in non-English locales. The only question is whether users would generally prefer English or localized error messages. Possibly we could recommend that users manually set LC_MESSAGES=C if they prefer this, but that won't work on Windows with Python 2.5 and later.
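A rough sketch of that option, in case it helps the discussion. The function name prefer_english_os_messages is made up for illustration, and the getattr guard is my assumption about how to cope with platforms (such as Windows) where the locale module has no LC_MESSAGES constant:

```python
import locale


def prefer_english_os_messages():
    """Take the locale from the environment, but keep OS messages in English.

    Hypothetical helper, not existing code. Returns True if LC_MESSAGES
    was switched to 'C', False if the platform has no LC_MESSAGES.
    """
    try:
        # Same call we make today: follow the user's environment.
        locale.setlocale(locale.LC_ALL, '')
    except locale.Error:
        # The environment names a locale this system does not support;
        # carry on with whatever locale is already active.
        pass
    category = getattr(locale, 'LC_MESSAGES', None)
    if category is None:
        return False
    # Override only the message catalogue, leaving the other categories alone.
    locale.setlocale(category, 'C')
    return True
```

Only the message category is overridden here, so number and date formatting would still follow the user's environment.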

On Unix I think what we want to do is:

import locale, os
enc = locale.getlocale(locale.LC_MESSAGES)[1] or 'ascii'  # encoding is None in the C locale
print os.strerror(4).decode(enc, 'replace')

and that should give us a safe unicode version of the message. The message text will vary across OSs and may vary across Python versions.
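The same idea can be wrapped in a small helper, sketched here under some assumptions. The names messages_encoding and decode_os_message are hypothetical, and the fallback to ASCII when no encoding can be determined is my choice rather than anything we agreed on. On Python 3, os.strerror already returns text, so the helper passes non-byte input through unchanged:

```python
import locale


def messages_encoding():
    """Best guess at the encoding of OS error messages (hypothetical helper)."""
    # LC_MESSAGES is missing on some platforms; fall back to LC_CTYPE there.
    category = getattr(locale, 'LC_MESSAGES', locale.LC_CTYPE)
    try:
        encoding = locale.getlocale(category)[1]
    except ValueError:
        # getlocale() could not parse the current locale name.
        encoding = None
    # In the C locale the encoding is None; ASCII is an assumed safe default.
    return encoding or 'ascii'


def decode_os_message(raw, encoding=None):
    """Return a text version of an OS error message without ever raising."""
    if not isinstance(raw, bytes):
        return raw  # already text
    if encoding is None:
        encoding = messages_encoding()
    try:
        return raw.decode(encoding, 'replace')
    except LookupError:
        # Python has no codec under that name; degrade to ASCII.
        return raw.decode('ascii', 'replace')
```

With 'replace', undecodable bytes become U+FFFD instead of raising, which seems preferable when the string is only going to be shown to the user.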

On Windows we can call ctypes.windll.kernel32.GetACP() to get the codepage, or use the second value returned by locale.getdefaultlocale(), which should tell us the right encoding to use for error message byte strings.
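A minimal sketch of the GetACP route follows. The function name windows_message_encoding is hypothetical, and mapping the numeric codepage to a Python codec name of the form 'cpNNNN' is an assumption that holds for the common ANSI codepages but that I have not checked for every one:

```python
import sys


def windows_message_encoding():
    """Return a codec name for the Windows ANSI codepage, or None elsewhere."""
    if sys.platform != 'win32':
        return None
    import ctypes  # imported late; windll only exists on Windows
    # GetACP() returns the ANSI codepage number, for example 1252.
    codepage = ctypes.windll.kernel32.GetACP()
    # Assumed mapping: Python names these codecs 'cp' plus the number.
    return 'cp%d' % codepage
```

On non-Windows platforms this returns None, so a caller would fall back to the locale-based lookup there.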