MonoUralic contains incorrect encoding information

Bug #73211 reported by Benjamin C. Wiley Sittler
This bug affects 2 people
Affects: ttf-uralic (Ubuntu)
Status: Triaged
Importance: Low
Assigned to: Arne Goetje

Bug Description

Binary package hint: ttf-uralic

the MonoUralic font (/usr/share/fonts/truetype/uralic/monou___.ttf)
has incorrect character map information and so displays characters in
the Unicode range U+0080 ... U+00FF incorrectly (as Cyrillic). this is
especially a problem since the package installs MonoUralic as a
replacement for "Courier", which many applications (including
e.g. Firefox) depend on for correct Unicode display.

other fonts in the ttf-uralic package appear to have the same bug.

this means that if ttf-uralic is installed, then e.g. French webpages become unreadably broken.
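
To illustrate the symptom, here is a minimal Python sketch that simulates a font whose U+0080 ... U+00FF slots carry the glyphs of the corresponding CP1251 bytes. It assumes the standard CP1251 table as a stand-in; the actual font uses a modified table, so the exact glyphs may differ. The function name is hypothetical.

```python
def render_with_cp1251_glyphs(text):
    """Simulate a font that draws CP1251 glyphs at U+0080..U+00FF.

    Assumption: the standard CP1251 table stands in for the font's
    modified one, so this only approximates what appears on screen.
    """
    shown = []
    for ch in text:
        code_point = ord(ch)
        if 0x80 <= code_point <= 0xFF:
            try:
                # The glyph drawn is the one CP1251 assigns to this byte value.
                shown.append(bytes([code_point]).decode("cp1251"))
            except UnicodeDecodeError:
                # Byte has no assignment in standard CP1251; leave it as-is.
                shown.append(ch)
        else:
            shown.append(ch)
    return "".join(shown)


if __name__ == "__main__":
    print(render_with_cp1251_glyphs("café français"))
```

Under that assumption, accented Latin letters in French text come out as Cyrillic letters, which matches the breakage described above.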

Tags: pet-bug
Benjamin C. Wiley Sittler (bsittler) wrote :

just fyi, the version: 0.0.20040829-1ubuntu1

Lionel Le Folgoc (mrpouit) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. You reported this bug a while ago and there hasn't been any activity in it recently. We were wondering whether this is still an issue for you. Thanks in advance.

Changed in ttf-uralic:
status: New → Incomplete
Arne Goetje (arnegoetje) wrote :

From the README file:
-----------------------------------------------------
Additional letters

The encoding of the fonts is based on the model of the Cyrillic Asian
encoding. The Russian alphabet occupies the place of
the Latin-1 supplement in the Western (Windows CP 1252) encoding and its
own Unicode positions in the Cyrillic (CP 1251) encoding. Additional
Uralic letters can be found in three places - instead of additional
characters in the Western encoding, instead of additional characters in
the Cyrillic encoding and in their own Unicode positions (with the
exceptions of those letters that are not found in this standard).
Existing Mari and Udmurt fonts were taken into consideration while
distributing positions, but incorporating their encodings did not prove
possible. Eventually, Udmurt fonts were used as the starting point. See
the test page for details.

----------------------------------------------------------------------------

These fonts are for Uralic languages only! So, don't use these fonts for any other purpose.
I will nevertheless contact upstream and propose a fix. IMHO the fonts have been wrongly encoded and are therefore broken. Even when used in a Unicode environment, additional Cyrillic characters should only show up in the U+00A0 - U+00BF range. At least, that's the mapping of CP1251 to Unicode.
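
For reference, a small Python sketch of how the standard CP1251 table (as implemented by Python's built-in cp1251 codec) maps the byte range 0xA0 ... 0xBF to Unicode. In that table, the Cyrillic letters in this byte range land in the Cyrillic block (U+0400 ... U+04FF), while shared symbols such as the section sign keep their U+00A0 ... U+00BF code points. The function name is hypothetical, and the modified Uralic table is not covered here.

```python
def cp1251_a0_bf_mapping():
    """Return {byte: (character, is_cyrillic)} for CP1251 bytes 0xA0..0xBF."""
    mapping = {}
    for byte in range(0xA0, 0xC0):
        char = bytes([byte]).decode("cp1251")
        # Letters of the Cyrillic block live at U+0400..U+04FF.
        is_cyrillic = 0x0400 <= ord(char) <= 0x04FF
        mapping[byte] = (char, is_cyrillic)
    return mapping


if __name__ == "__main__":
    for byte, (char, is_cyrillic) in cp1251_a0_bf_mapping().items():
        block = "Cyrillic block" if is_cyrillic else "other"
        print(f"0x{byte:02X} -> U+{ord(char):04X} ({block})")
```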

Changed in ttf-uralic:
assignee: nobody → arnegoetje
status: Incomplete → In Progress
Benjamin C. Wiley Sittler (bsittler) wrote : Re: [Bug 73211] Re: MonoUralic contains incorrect encoding information

Yes, the bug is only with the Unicode mapping. The 8-bit encoding
seems well-established and correct.

And I can confirm that this package still has this bug on my system :(

On 9/6/07, Arne Goetje <email address hidden> wrote:
> >From the README file:
> -----------------------------------------------------
> Additional letters
>
> The encoding of the fonts is based on the model of the Cyrillic Asian
> encoding. The Russian alphabet occupies the place of
> the Latin-1 supplement in the Western (Windows CP 1252) encoding and its
> own Unicode positions in the Cyrillic (CP 1251) encoding. Additional
> Uralic letters can be found in three places - instead of additional
> characters in the Western encoding, instead of additional characters in
> the Cyrillic encoding and in their own Unicode positions (with the
> exceptions of those letters that are not found in this standard).
> Existing Mari and Udmurt fonts were taken into consideration while
> distributing positions, but incorporating their encodings did not prove
> possible. Eventually, Udmurt fonts were used as the starting point. See
> the test page for details.
>
> ----------------------------------------------------------------------------
>
> These fonts are for Uralic languages only! So, don't use these fonts for any other purpose.
> I will nevertheless contact upstream and propose a fix. IMHO the fonts have been wrongly encoded and are therefore broken. Even when used in a Unicode environment, additional Cyrillic characters should only show up in the U+00A0 - U+00BF range. At least, that's the mapping of CP1251 to Unicode.
>
> ** Changed in: ttf-uralic (Ubuntu)
> Assignee: (unassigned) => Arne Goetje
> Status: Incomplete => In Progress
>
> --
> MonoUralic contains incorrect encoding information
> https://bugs.launchpad.net/bugs/73211
> You received this bug notification because you are a direct subscriber
> of the bug.
>

Arne Goetje (arnegoetje) wrote :

I have reencoded the font (attached).

Could you please check whether all the characters display correctly in CP1251 encoding?
The font uses CP1251, modified according to Uralic needs.

If this works, I will create a second font for Unicode.
However, its character mappings will be different, so webpages and documents that are in the modified CP1251 encoding will not work with it.

For testing, please remove the ttf-uralic package. Then copy the attached font to ~/.fonts/ and run
fc-cache -fv ~/.fonts/
After that:
xset fp rehash

Then test with your favorite application / document.
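
If no suitable test document is at hand, one can be generated. The following is a minimal Python sketch that builds an HTML page declared as windows-1251 and lists every assigned byte from 0x80 to 0xFF. It assumes the font's family name is "MonoUralic" and uses the standard CP1251 meanings only as labels; the function name and output file name are hypothetical.

```python
def build_cp1251_test_page():
    """Build a CP1251-encoded HTML page listing the bytes 0x80..0xFF."""
    rows = []
    for byte in range(0x80, 0x100):
        try:
            char = bytes([byte]).decode("cp1251")
        except UnicodeDecodeError:
            continue  # skip bytes that standard CP1251 leaves unassigned
        rows.append(
            f"0x{byte:02X} {char} (standard CP1251 meaning: U+{ord(char):04X})"
        )
    html = (
        '<html><head><meta http-equiv="Content-Type" '
        'content="text/html; charset=windows-1251"></head>'
        # Assumption: the font registers under the family name "MonoUralic".
        '<body><pre style="font-family: MonoUralic">\n'
        + "\n".join(rows)
        + "\n</pre></body></html>\n"
    )
    # Encoding back to CP1251 writes each character as its original byte.
    return html.encode("cp1251")


if __name__ == "__main__":
    with open("cp1251-test.html", "wb") as handle:
        handle.write(build_cp1251_test_page())
```

Opening the resulting file in a browser shows, for each byte, the glyph the font actually draws next to the code point that standard CP1251 assigns to it.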

Benjamin C. Wiley Sittler (bsittler) wrote :

i installed and tested this font (with no other uralic fonts present),
and it still does not work correctly. it does not seem to contain
correct unicode mappings, or at least it does not display the special
udmurt and mari characters in firefox (firefox falls back on a
different font, freemono perhaps, to display those characters even
when the rest of the page is shown in monouralic.)

furthermore some special characters are incorrectly displayed as
cyrillic on pages using the monouralic font in firefox. for example,
the "dagger" character (utf-8 %E2%80%A0, U+2020) is incorrectly
displayed as a modified cyrillic Ka.

note: all tests were done on utf-8 encoded pages where firefox was
correctly detecting the encoding and characters from other parts of
unicode were displayed correctly (in different fonts, of course).

On 9/20/07, Arne Goetje <email address hidden> wrote:
> I have reencoded the font (attached).
>
> Could you please check whether all the characters display correctly in CP1251 encoding?
> The font uses CP1251, modified according to Uralic needs.
>
> If this works, I will create a second font for Unicode.
> However, its character mappings will be different, so webpages and documents that are in the modified CP1251 encoding will not work with it.
>
> For testing, please remove the ttf-uralic package. Then copy the attached font to ~/.fonts/ and run
> fc-cache -fv ~/.fonts/
> After that:
> xset fp rehash
>
> Then test with your favorite application / document.
>
> ** Attachment added: "MonoUralic.ttf"
> http://launchpadlibrarian.net/9435247/MonoUralic.ttf

Benjamin C. Wiley Sittler (bsittler) wrote :

my mistake, i misread your update. the new font works correctly for the modified cp1251 encoding, but fails horribly (as before) in unicode-aware applications.

Arne Goetje (arnegoetje) wrote :

Please note that the dagger is displayed as a modified Ka because of the modified CP1251 encoding. In the original CP1251 encoding that position is indeed a dagger symbol. However, the CP1251 version we are dealing with has replaced some glyphs with Uralic ones.
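
The dagger case can be checked with a few lines of Python. This sketch uses only the standard library and the standard CP1251 codec; it shows that U+2020 is the three UTF-8 bytes quoted earlier in this report and that standard CP1251 stores it in the single byte 0x86, which is presumably the slot that the modified encoding reuses for a Uralic letter.

```python
from urllib.parse import quote

DAGGER = "\u2020"  # DAGGER, the character from the report above

utf8_bytes = DAGGER.encode("utf-8")      # three bytes in UTF-8
percent_escaped = quote(DAGGER)          # URL-style escape of those bytes
cp1251_byte = DAGGER.encode("cp1251")    # one byte in standard CP1251

if __name__ == "__main__":
    print(utf8_bytes.hex(), percent_escaped, cp1251_byte.hex())
```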

I have packaged a testing version of MonoUralic.

http://ppa.launchpad.net/arnegoetje/ubuntu/pool/main/t/ttf-uralic/

The package contains only the MonoUralic font in two flavors: one with the modified CP1251 encoding, the other with the correct ISO10646 encoding.

I'm afraid that, because the CP1251 encoding was tampered with, we have no choice but to provide two font flavors, one for each encoding.

So, if this testing package works for you, I'll modify the remaining fonts accordingly.

Also, I have prepared the same fonts as a TTC file (TrueType Collection), which should save some space on users' systems.

If the above package works for you, I will create a new one with the TTC file enabled instead. It will need to be tested with as many applications as possible to see whether or not they can use both font flavors out of the TTC.

Are you willing to test that also?

Otherwise, if the TTC doesn't work as expected, we'll need to ship two separate fonts, like we have now.
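
As a quick check of what a given file contains, the following Python sketch reads the first bytes of a font file and reports whether it is a single sfnt font or a TrueType Collection, and how many fonts the collection header declares. It relies on the documented TTC header layout (the tag "ttcf", two 16-bit version fields, a 32-bit font count, then one 32-bit offset per font); the function name is hypothetical and no real file from this report is assumed.

```python
import struct


def describe_font_container(data):
    """Classify raw font bytes as a single sfnt font or a TTC collection."""
    tag = data[:4]
    if tag == b"ttcf":
        # TTC header: tag, major version, minor version, number of fonts.
        _major, _minor, num_fonts = struct.unpack(">HHI", data[4:12])
        # One big-endian 32-bit offset per font follows the header.
        offsets = struct.unpack(f">{num_fonts}I", data[12:12 + 4 * num_fonts])
        return {"kind": "collection", "fonts": num_fonts, "offsets": list(offsets)}
    if tag in (b"\x00\x01\x00\x00", b"true", b"OTTO"):
        # A standalone TrueType or OpenType font starts directly with its sfnt tag.
        return {"kind": "single", "fonts": 1, "offsets": [0]}
    raise ValueError("not an sfnt font or TrueType Collection")
```

For a collection holding both flavors, the reported font count would be expected to be 2; whether a given application can select both is a separate question that still needs the testing described above.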

Arne Goetje (arnegoetje) wrote :

need feedback. does the proposed solution work for you?

Changed in ttf-uralic:
status: In Progress → Incomplete
Benjamin C. Wiley Sittler (bsittler) wrote :

Hi, Arne.

Both of my Ubuntu boxes are currently down and have been for some
months. I will test as soon as I have one of them working again.

Sorry for the delay!

-Ben

On Mon, Nov 10, 2008 at 8:05 PM, Arne Goetje <email address hidden> wrote:
> need feedback. does the proposed solution work for you?
>
> ** Changed in: ttf-uralic (Ubuntu)
> Status: In Progress => Incomplete

Benjamin C. Wiley Sittler (bsittler) wrote :

Hi, Arne.

Both of my Ubuntu boxes crashed (different reasons in each case) a couple months ago. I don't know when I'll be able to get them working again, but I will test this when I do.

I do apologize for the inconvenience!

-Ben

Arne Goetje (arnegoetje) wrote :

Any update on this?

Actually, I'd like to close this issue, since the fonts in question are only to be used with the modified CP1251 encoding for Uralic. Since we use Unicode (UTF-8) for all languages, these fonts would only be interesting for users who want to display Uralic documents which use the broken encoding.

All other users should refrain from installing this font package. It is simply not meant to be used for other purposes and only exists for compatibility for this special case.

Also, we simply don't fork fonts in Ubuntu; requests like this should actually go upstream.

(Since this was one of the first font modification requests I got, I considered fulfilling it, because I thought it would be a one-time issue. However, more and more users are filing similar requests for other fonts, which are actually the same case: they want to use the fonts for purposes they were not designed for. I have therefore decided simply not to do this, since users will complain "you did this for another case, why not for mine?". I simply don't have the resources (read: "time") to fulfil all kinds of wishes for different fonts, when it is actually upstream's job to do this.)

I hope you understand my point of view.

If you want to display Uralic characters in general, you can use other fonts which contain Cyrillic glyphs. Most of those fonts should also contain the necessary Uralic glyphs in Unicode encoding (if they don't, request upstream to add them ;) ). Those Unicode fonts can then also be used to display other languages.

Arne Goetje (arnegoetje) wrote :

I will fix the _package_ so that the font does not get used automatically and the replacement configurations are removed. This means you will need to explicitly choose the font if you want to display documents with the broken encoding.
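
One way to check whether the package still takes over Courier is to ask fontconfig which family it resolves for that name. This is a minimal Python sketch, assuming the fc-match tool from fontconfig is installed and that the substituted family name contains "uralic" (an assumption, not something confirmed in this report). The helper names are hypothetical.

```python
import subprocess


def matched_family(pattern, run=subprocess.run):
    """Return the family name fontconfig resolves for the given pattern."""
    result = run(
        ["fc-match", "-f", "%{family}\n", pattern],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def takes_over_courier(run=subprocess.run):
    """True if a request for Courier resolves to a Uralic family.

    Assumption: the Uralic families have "uralic" somewhere in their name.
    """
    return "uralic" in matched_family("Courier", run=run).lower()
```

After a fixed package is installed, `takes_over_courier()` would be expected to return False on a system where the font is no longer configured as a replacement.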

Benjamin C. Wiley Sittler (bsittler) wrote :

That sounds good to me. As I said, once I have access to an Ubuntu
machine again, I will test.

On Sun, Jan 18, 2009 at 8:04 PM, Arne Goetje <email address hidden> wrote:
> I will fix the _package_ in that regard, that the font does not get used
> automatically and that the replacement configurations are removed.
> Means, you will need to explicitly choose the font if you want to
> display documents with the broken encoding.
>
> ** Tags added: pet-bug

Arne Goetje (arnegoetje)
Changed in ttf-uralic (Ubuntu):
importance: Undecided → Low
status: Incomplete → Triaged
Benjamin C. Wiley Sittler (bsittler) wrote : Re: [Bug 73211] Re: MonoUralic contains incorrect encoding information

Sorry, Ubuntu box has been down for >1yr here :(

On Wed, Mar 17, 2010 at 19:51, Arne Goetje <email address hidden> wrote:
> ** Changed in: ttf-uralic (Ubuntu)
>   Importance: Undecided => Low
>
> ** Changed in: ttf-uralic (Ubuntu)
>       Status: Incomplete => Triaged

Changed in ttf-uralic (Ubuntu):
status: Triaged → Fix Committed
status: Fix Committed → Fix Released
status: Fix Released → New
Arne Goetje (arnegoetje)
Changed in ttf-uralic (Ubuntu):
status: New → Triaged