"could not decode image" when path contains non-latin chars

Bug #678808 reported by maximtee
74
This bug affects 11 people
Affects Status Importance Assigned to Milestone
Hugin
Confirmed
Medium
Unassigned

Bug Description

Hugin fails to add files to project if there non-latin characters (i.e. cyrillic) in path, gives "failed to decode image" message.

Also, when cyrillic letters are used in project file name, they are converted into ISO latin codepage when the file is created. I can supply the screenshots to make the point clear.

Both tested on build 2925 on WinXP

Revision history for this message
maximtee (maximtee) wrote :

Logged In: YES
user_id=2021593
Originator: YES

Tested on SVN2948 on WinXP -- the bug is still there.

Revision history for this message
Lukas Jirkovsky (l-jirkovsky) wrote :

Logged In: YES
user_id=2036447
Originator: NO

Confirming that this is windows only. This is probably more wxwindows fault than hugin, it seems that unicode characters are not properly handled.

2 maximtee: Are you sure that your wxMSW (wxwidgets) supports Unicode?

Revision history for this message
maximtee (maximtee) wrote :

Logged In: YES
user_id=2021593
Originator: YES

I'm using whatever wxwidgets get installed with Hugin build, not sure if it has (or needs to have) specific Unicode support.

I can confirm that both bugs do not exist in "beta 4" released on 2007-02-03 and on "beta 5" released on 2007-05-10 (not sure what SVN builds they are) and can be reproduced on SVN builds at least since SVN2925. All tested on same WinXP system.

Revision history for this message
maximtee (maximtee) wrote :

Logged In: YES
user_id=2021593
Originator: YES

The version is wxMSW-2.8.7, it appears to be the latest stable release and its readme claims that "Unicode is fully supported under Windows NT/2000".

Revision history for this message
Pablo d'Angelo (pablo.dangelo) wrote :

Logged In: YES
user_id=30308
Originator: NO

The problem is that the platform independent core library is written in ordinary C/C++. This means that fopen() calls etc. are used. This is not a problem on linux, where filenames are usually UTF-8 encoded, and thus can be stored safely in 8 bit strings.

But unfortunately, this doesn't work for Windows, where fopen expects 8 bit strings to be in the local codepage, which is not UTF-8. So while the hugin GUI (wxWidgets) can be translated into korean, it can handle only filenames in the local codepage, due to the restriction in the hugin base library. For example, I can use german umlauts (they are in my local codepage), but not cyrillic chars etc.

Fixing this would require using a complete platform abstraction with that respect in the hugin base and all depending libraries (mostly vigra + maybe some image libraries). This is potentially a lot of work and needs quite some thinking to get it right.

Until this happens, hugin does only support filenames that can be encoded with the local codepage.

Pablo

Revision history for this message
maximtee (maximtee) wrote :

Logged In: YES
user_id=2021593
Originator: YES

the bug is still there on SVN2978 on WinXP

In addition to my previous comment in response to Pablo's post, I would like to make a point that both bugs do not exist in "beta 4" released on 2007-02-03 and on "beta 5" released on 2007-05-10.

As I believe this is important enough to be fixed for the next release, I change group to 0.7.0

Revision history for this message
maximtee (maximtee) wrote :

Logged In: YES
user_id=2021593
Originator: YES

I have noticed that there are only 5 refrences to fopen in 3 files (see below). Am I right assuming that these would be the places to start fixing this bug?

Find all " fopen", Subfolders, Find Results 1, "Entire Solution"
  D:\huginbase\hugin\src\foreign\jhead\jpgfile.cpp(341): infile = fopen(FileName, "rb"); // Unix ignores 'b', windows needs it.
  D:\huginbase\hugin\src\foreign\jhead\jpgfile.cpp(427): outfile = fopen(FileName,"wb");
  D:\huginbase\hugin\src\foreign\lensdb\PTLensDB.c(718): file = fopen(profileFile, "r");
  D:\huginbase\hugin\src\foreign\lensdb\PTLensDB.c(764): file = fopen(fileName, "r");
  D:\huginbase\hugin\src\tools\vig_optimize.cpp(270): FILE * f = fopen(inputPointsFile.c_str(), "r");
  Matching lines: 5 Matching files: 3 Total files searched: 168

Revision history for this message
Pablo d'Angelo (pablo.dangelo) wrote :

Logged In: YES
user_id=30308
Originator: NO

Hi Maxim,

Also, all calls in the libraries I use for opening image files etc (since this is C++, ofstream, ifstream are also used, not only fopen) need to be changed. I can't make the base library UTF16 compatible so short before a release. As you mentioned that earlier versions of hugin work with cyrillic characters, I'm trying to recreate the behavior of the older version.

wxWidgets provides different convertors from unicode wxString and designed for specific tasks (separate functions for terminal, gui and filename strings) Some time after beta 4 and before svn 2925 I switched all calls that convert filenames from the generic conversion (wxConvCurrent) to the filename conversion (wxConvFileName). I hoped that this change would make support for characters in the local codepage better, but it seems it had the opposite effect.

I have reverted back to wxConvCurrent (for windows only) in SVN 3066. I hope this allows you to use the cyrillic characters which are probably supported by your local encoding.

Pablo

Revision history for this message
maximtee (maximtee) wrote :

Logged In: YES
user_id=2021593
Originator: YES

Pablo,
thanks for a follow up.

I have tested SVN 3066 and unfortunately it does not solve any of the problems: still get "could not decode image" when adding files with cyrillics in path and when project is saved with cyrillics in file name, it is converted to ISO latin codepage.

standing by for further testing,

Maxim.

Revision history for this message
Pablo d'Angelo (pablo.dangelo) wrote :

Logged In: YES
user_id=30308
Originator: NO

Very strange. I assume that all that worked in SVN 2929 or earlier?

On my german windows machine, I can use umlauts without problems.
What codepage do you use under windows, so that I can try that on my machine?

You can try different wxWidgets conversion settings by changing

#ifdef __WXMSW__
#define HUGIN_CONV_FILENAME (*wxConvCurrent)

to
#define HUGIN_CONV_FILENAME (wxConvLocal)

and see if it makes any difference?

ciao
  Pablo

Revision history for this message
maximtee (maximtee) wrote :

Logged In: YES
user_id=2021593
Originator: YES

I was able to reproduce the bug on SVN 2925 and all later builds that I tested.

I'm positive the bug was not there on "beta 4" released on 2007-02-03 and on "beta 5" released on 2007-05-10 (not sure what SVN builds they are).

I'm using WinXP, English version, Regional settings as follows:

(Control Panel/Regional Options) Locale set to: Russian;
(Control panel/Advanced) Language for non-unicode programs: Russian

I believe these settings set OEM codepage 866 and ANSI codepage 1251.

I will try wxConvCurrent patch that you suggested and report here.

I would be eager to try earlier builds to get to the point where the bug is introduced. Can someone hint what SVN was "beta 5" released 2007-05-10? Shall I expect that it will build with the current toolchain?

Maxim.

Revision history for this message
Pablo d'Angelo (pablo.dangelo) wrote :

Logged In: YES
user_id=30308
Originator: NO

A further update:

I have reproduced the bug on a Windows machine. It is not related to the conversion settings, all three variants seem to produce the same result. I have traced the error back to the vigra image import libary, which hasn't changed since ages.

The problem seems to be that the C++ io functions do not like the filenames. fopen() works, but std::ifstream (used by most vigra image decoders) doesn't.

beta 4 and beta 5 were build with MSVC 2003 using a custom MSVC solution, not the one created with CMake. So I believe there might be a problem with the MSVC 2008 build. Maybe it can be solved with different compiler flags, (special options, different runtime library) but I don't know. It would probably be best to test with a very simple c++ program that uses ifstream to open a file and see if the errors also happen there. Hugin uses the static runtime libraries (which is discouraged by microsoft, and they might not be supported that well anymore), so maybe switching to the DLL variants might help (this will also require a complete recompilation of all dependencies! ARGHHHHH!). So a simple test program would be good. I don't have time to follow this issue in more detail right now, but maybe you could test with a test program such as:

#include <ifstream>
#include <stdio.h>

int main()
{
  char * filename = "some_filename_with_cyrillic_chars"
  FILE * f = fopen(filename,"rb");
  if (f) {
    printf("OK: fopen %s worked\n", filename);
    fclose(f);
  } else {
    perror("fopen failed");
  }

  std::ifstream fin2(filename, std::ios::binary);
  if (!fin2.good()) {
     printf("FAILED: std::ifstream in binary mode\n");
  } else {
     printf("OK: std::ifstream in binary mode\n");
  }
}

Try to compile it with different runtime libs (hugin uses /MT and /MTd), and see if it makes a difference.

Pablo

Revision history for this message
nobody (nobody-users) wrote :

Logged In: NO

I did some more testing on a Windows XP machine set to Russian encoding:
(Control panel/Advanced) Language for non-unicode programs: Russian
using the simple program below, and it works nicely with russian filenames when compiled with MSVC 2003 (can open the file with both fopen and std::ifstream), but the MSVC 2008 executable fails to open the files using std::ifstream (fopen still works). Looks like a bug in MSVC 2008.

Unfortunately I can't build the windows version with MSVC 2003 right now, as CMake refuses to work properly on the machine I have installed MSVC 2003. Additionally I'll have to recompile most dependencies with MSVC 2003 again first (which is a major pain in the ass!).

Here is the simple test program to reproduce the bug:

#include <fstream>
#include <stdio.h>

int main(int argc, char * argv[])
{
 char * filename = argv[1];
 printf("Trying to open file %s\n", filename);
 FILE * f = fopen(filename,"rb");
 if (f) {
    printf("OK: fopen %s worked\n", filename);
    fclose(f);
 } else {
    perror("fopen failed");
 }

 std::ifstream fin2(filename, std::ios::binary);
 if (!fin2.good()) {
  printf("FAILED: std::ifstream in binary mode\n");
 } else {
  printf("OK: std::ifstream in binary mode\n");
 }
}

Revision history for this message
fmannan (fmannan) wrote :

Logged In: YES
user_id=1048760
Originator: NO

Attachment contains: 1) Hugin and 2) test code
Built using MSVC 2008 Professional Edition.
File Added: 1908349_test.7z.001

Revision history for this message
fmannan (fmannan) wrote :

The file 1908349_test.7z.001 was added: Built using MSVC 2008 Professional Edition

Revision history for this message
fmannan (fmannan) wrote :

Logged In: YES
user_id=1048760
Originator: NO

File Added: 1908349_test.7z.002

Revision history for this message
fmannan (fmannan) wrote :

The file 1908349_test.7z.002 was added: None

Revision history for this message
fmannan (fmannan) wrote :

Logged In: YES
user_id=1048760
Originator: NO

File Added: 1908349_test.7z.003

Revision history for this message
fmannan (fmannan) wrote :

The file 1908349_test.7z.003 was added: None

Revision history for this message
fmannan (fmannan) wrote :

Logged In: YES
user_id=1048760
Originator: NO

File Added: 1908349_test.7z.004

Revision history for this message
fmannan (fmannan) wrote :

The file 1908349_test.7z.004 was added: None

Revision history for this message
fmannan (fmannan) wrote :

Logged In: YES
user_id=1048760
Originator: NO

File Added: 1908349_test.7z.005

Revision history for this message
fmannan (fmannan) wrote :

The file 1908349_test.7z.005 was added: None

Revision history for this message
fmannan (fmannan) wrote :

Logged In: YES
user_id=1048760
Originator: NO

File Added: 1908349_test.7z.006

Revision history for this message
fmannan (fmannan) wrote :

Logged In: YES
user_id=1048760
Originator: NO

File Added: 1908349_test.7z.007

Revision history for this message
fmannan (fmannan) wrote :

The file 1908349_test.7z.006 was added: None

Revision history for this message
fmannan (fmannan) wrote :

The file 1908349_test.7z.007 was added: None

Revision history for this message
fmannan (fmannan) wrote :

Logged In: YES
user_id=1048760
Originator: NO

File Added: 1908349_test.7z.008

Revision history for this message
fmannan (fmannan) wrote :

The file 1908349_test.7z.008 was added: None

Revision history for this message
tksharpless (tksharpless-users) wrote :
Download full text (4.1 KiB)

Logged In: YES
user_id=1511901
Originator: NO

I used MSVC 2005 Standard Edition to build the test program, which eventually took the form below. It shows the same failures as one built with 2008 Express -- see comments with code.

I think this is not really a bug, though it definitely is a limitation of Windows due to the need to support codepage-based character sets. The best solution is clearly to pass only unicode file names to fstream c'tors, then the codepage issues disappear.

--Tom

/* RussianBug.cpp

  Test program to invsetigate a bug reported in hugin where
  std::ifstream fails to open files with Cyrillic names.

  Bug tracker link:
  http://sourceforge.net/tracker/index.php?func=detail&aid=1908349&group_id=77506&atid=550441

  Original by Pablo d'Angelo and MaxTee, who reported the bug
  and says..
"
 I'm using WinXP, English version, Regional settings as follows:

 (Control Panel/Regional Options) Locale set to: Russian;
 (Control panel/Advanced) Language for non-unicode programs: Russian

 I believe these settings set OEM codepage 866 and ANSI codepage 1251.
"

TKS mods:

  Report the effective codepage number.
  Try ifstream() with 4 filename flavors: argv[]; argv[] translated
    to unicode with current codepage and with Russian one; same
 argument read from commandline as unicode.

TKS findings:

  The global Windows codepage setting has no effect on this program.
  The codepage it reports is the one set in the "advanced" option for
  non-unicode programs.

  That codepage determines whether the commandline arguments are read
  correctly into argv[], and apparently also whether ifstream() can
  translate them correctly into unicode (which is the eventual format
  passed to the OS). If they are not read correctly, unknown chars get
  replaced by '?', and translating to unicode fails.

  The ifstream c'tor has some polymorphous ability to accept either ANSI
  or unicode filename arguments; however the ANSI ones must be supported
  by the effective codepage.

  When translation fails, the eventual result is a "file not found" error
  from the OS -- nobody notices the string format problem.

  If the commandline is read as unicode, then the special codepage
  is not needed, and ifstream( unicode_name ) always suceeds.

*/

#include <fstream>
#include <stdio.h>
#include <windows.h>

int main(int argc, char * argv[])
{

 unsigned codepage = GetACP();
 printf("\nThe current Windows code page is %d\n", codepage);

/* read the command line as Unicode */
 int Wargc;
 LPWSTR * Wargv = CommandLineToArgvW( GetCommandLine(), & Wargc );

 for(int i = 1; i < argc; i++ ){

  printf("\n(ANSI) argv[%d] is '%s'\n", i, argv[i]);
  printf("Targv = argv translated to Unicode with current codepage\n" );
  printf("Rargv = argv translated to Unicode with Russian codepage\n" );
  printf("Wargv = argv read as Unicode\n");
  printf("\n");

  wchar_t Targv[200];
  int k = MultiByteToWideChar(
  CP_ACP, // (use current codepage),
  0, // DWORD dwFlags,
  argv[i], // LPCSTR lpMultiByteStr,
  -1, // (is null terminated)
  Targv, // LPWSTR lpWideCharStr,
  200 // int cchWideChar
  );

  wchar_t Rargv[200];
  k = ...

Read more...

Revision history for this message
maximtee (maximtee) wrote :

Logged In: YES
user_id=2021593
Originator: YES

Tom,
Your suggestion to pass only unicode file names to fstream c'tors seems like the way to move forward.

Problem is, I'm not fluent enought with the codebase to undertake the patch all by my own.
Any suggestion on how to tackle this would be appreciated,

Max.

Revision history for this message
tksharpless (tksharpless-users) wrote :

Logged In: YES
user_id=1511901
Originator: NO

Hi Max

Although the best (and I think finally inevitable) solution is to make hugin use only Unicode strings, that will be hard, as Pablo has pointed out. There is still a lot of code that assumes ANSI strings. Newer code uses UTF-8 Unicode, which can "pass" for ANSI so long as one only uses chars in the current codepage; but in cases such as this one the masquerade breaks down.

Finding and fixing all the ANSI code would be a long, mainly thankless task needing very good debugging tools and plenty of motivation. It may be, as Pablo suggests, that on Linux we will eventually be able to build old ANSI code for UTF-8, without actually changing it; but I don't think that will ever be possible on Windows.

So actually the only practical solution for the present bug is to work around it by using ANSI filenames. However, that need not mean changing your Cyrillic names, because every NTFS file does have an ANSI name: the hidden DOS (8.3) name, which you can see with the command "dir /x". That is supposed to be a valid alternative name for the file in all situations, so it might work here.

Regards, Tom

tmodes (tmodes)
Changed in hugin:
status: New → Triaged
tags: added: codepage hugin international
Revision history for this message
Vladimír Jícha (jech) wrote :

This bug is in Hugin for a very long time, at least since I started using it 4 years ago. I just tried version 2010.4 RC1 and it fails right at the beginning when using CPFind. The root problem is, that some of the tools that Hugin uses do not support unicode filenames on Windows.

There is a quick a dirty fix which I already suggested - to use short paths. Also the user should be at least warned about the problem, if he tries to open or save such files.

I think that using .bat files on Windows is not a good idea in the first place. Windows shell is really bad and should be avoided if possible.

Revision history for this message
Yuv (yuv) wrote :

Thank you for testing 2010.4 and confirming that this is still an issue. Unfortunately it won't be fixed for this release. Please keep using the workaround (avoid non-latin chars) until a fix is contributed and applied.

Changed in hugin:
status: Triaged → Confirmed
tags: added: windows
Revision history for this message
Guy Rutenberg (guyrutenberg) wrote :

This bug also affects Linux (at least Debian Jessie, with UTF-8 locale) when path contains non-ascii characters. A possible workaround is to save the project and then stich using the Batch Processor which, surprisingly, isn't affected by it.

Revision history for this message
Guy Rutenberg (guyrutenberg) wrote :

I did some more debugging on this issue. The failure stems from the call to wxExecute(...) in PanoPanel::DoStitch(). Due to something that looks like a wxWidgets bug (I can replicate it in the wxWidgets sample), arguments with non-ascii characters get dropped, resulting in bad arguments to hugin_stitch_project.

The solution would either be to replace the call to wxExecute, or wait for wxWidgets to fix it.

Revision history for this message
Guy Rutenberg (guyrutenberg) wrote :

I've attached a patch that seems to fix the bug (at least for me). It works by avoiding the buggy argument splitting of wxWidgets, and instead relying on passing an array of arguments to wxExecute after correctly handling wide-char to multibyte char conversion if needed.

Revision history for this message
Kim C (kimflowers) wrote :

According to a comment on the wxWidgets forum, the Unicode handling of arguments for Linux in Unicode was fixed in wxWidgets version 3.1 and (https://forums.wxwidgets.org/viewtopic.php?t=43236). This I can't confirm at the moment, but that'd indicate enforcing 3.1 or higher for wxWidgets would ostensibly mark this as resolved implicitly.

That version was enforced / released as of Feb 29th, 2016 (https://www.wxwidgets.org/news/2016/02/wxwidgets-3.1.0-released/) so that should be viable as an option I'd think?

Revision history for this message
Guy Rutenberg (guyrutenberg) wrote :

A quick check using libwxgtk3.2 shows that this issue is resolved. I checked both project path using non-Latin characters and filenames containing non-Latin characters.

Tested on Debian Unstable, Hugin 2022.

To post a comment you must log in.