PDF import forgets math sum symbols

Bug #742364 reported by Mario Valle
This bug affects 4 people
Affects     Status    Importance   Assigned to   Milestone
Inkscape    Triaged   Medium       Unassigned

Bug Description

0.48.0 r9654 on Windows 7
I imported a PDF page with math formulas containing sum symbols (uppercase Greek sigma). Everything was imported except this symbol.
To reproduce, download http://arxiv.org/PS_cache/arxiv/pdf/1103/1103.4807v1.pdf and load the upper half of page 5.
Thanks for looking!
mario

Tags: importing pdf
Mario Valle (mvalle) wrote:
su_v (suv-lp)
tags: added: importing pdf
su_v (suv-lp) wrote:

Reproduced with Inkscape 0.47 and 0.48.1 (official packages) on OS X 10.5.8 (i386)
as well as with Inkscape 0.48+devel r10178 (with poppler 0.16.4 and cairo 1.10.2)

Notes:
1) Evince 2.30.2 displays the file correctly, but when selecting the text, it omits the sum signs (possibly indicating that those glyphs are created differently than other parts of the formulas)
2) GIMP on OS X (2.6.11) does read the uppercase sigma characters on import (using poppler as well, AFAIU)
3) the preview in the PDF import dialog correctly displays the sum signs
4) saving page 5 as SVG with Inkscape 0.47 or 0.48.0 creates invalid SVG files with improperly encoded content:

(inkscape:43131): Gtk-WARNING **: Unable to find default local directory monitor type
/Volumes/blue/img/Inkscape/test/bug/742364-1103.4807v1-p5-0480.svg:3192: parser error : Input is not proper UTF-8, indicate encoding !
Bytes: 0x80 0x3C 0x2F 0x74
             id="tspan4752">?</tspan></text>
                            ^

5) saving page 5 as SVG with Inkscape 0.48.1 or current trunk creates a valid SVG file, but omits the glyph used for the sum symbol. The text objects for the sum symbols exist, but are empty.
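The parser error in note 4 can be reproduced in isolation. The minimal Python sketch below (not Inkscape code) decodes the four bytes quoted in the log; 0x3C 0x2F 0x74 is simply the start of "</tspan>", so the offending byte is the leading 0x80. The helper name `first_invalid_utf8_offset` is hypothetical.

```python
# The four bytes quoted by the XML parser error in note 4.
bad = bytes([0x80, 0x3C, 0x2F, 0x74])

def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None

# 0x80 is a UTF-8 continuation byte (10xxxxxx); it can never start a character,
# so decoding fails at offset 0, before the "</t" that follows.
print(first_invalid_utf8_offset(bad))                       # 0
print(bad[1:].decode("utf-8"))                              # </t
# A correctly encoded N-ARY SUMMATION (U+2211) decodes without error.
print(first_invalid_utf8_offset("\u2211".encode("utf-8")))  # None
```

This only shows why the 0.47/0.48.0 output is rejected; it says nothing about which glyph code produced the stray byte.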

Seems related to
Bug #605872 “pdf to svg fails with characters from Unicode Plane 1 (SMP)”
and its fix discussed in
Bug #369861 “Unable to open previously imported pdf file”
(see all comments by Khaled Hosny who provided a patch to ensure that no invalid UTF-8 code is returned «with caveat that glyphs with no proper Unicode (unencoded glyphs) will be just omitted»).
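The policy quoted from that patch (never emit invalid UTF-8, at the cost of dropping unencoded glyphs) would also explain note 5, where the sum-symbol text objects exist but are empty. Below is a hypothetical Python sketch of that policy only; it is not Inkscape's or poppler's actual C++ code, and the input format (one optional Unicode value per glyph, `None` for an unencoded glyph) is an assumption.

```python
from typing import Iterable, Optional

def glyphs_to_text(unicode_map: Iterable[Optional[int]]) -> str:
    """Hypothetical sketch: turn per-glyph Unicode mappings into SVG text.

    None stands for an unencoded glyph (no Unicode value in the PDF font).
    Such glyphs, and values that are not Unicode scalar values, are skipped,
    so the result always encodes to valid UTF-8.
    """
    chars = []
    for cp in unicode_map:
        if cp is None:
            continue  # unencoded glyph: omitted, as in the patch's caveat
        if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
            continue  # not a Unicode scalar value: cannot be encoded as UTF-8
        chars.append(chr(cp))  # chr() handles Plane 1 (SMP) code points too
    return "".join(chars)

# A sum glyph with no Unicode mapping yields an empty string (an empty text
# object), while an SMP character such as U+1D400 survives intact.
print(repr(glyphs_to_text([None])))               # ''
print(glyphs_to_text([0x1D400]).encode("utf-8"))  # b'\xf0\x9d\x90\x80'
```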

Changed in inkscape:
importance: Undecided → Medium
status: New → Confirmed
Gary Ballantyne (gary-ballantyne-e) wrote:

Reproduced with Inkscape 0.48.1 on Ubuntu 11.04.

To generate a simple test case I used the following latex:

\documentclass{article}
\usepackage{lmodern}
\thispagestyle{empty}
\begin{document}
$\sum$
\end{document}

The output from pdflatex shows a single uppercase sigma "sum" (viewed in Evince). I used the Latin Modern font (lmodern above) because, once the lmodern package is installed in Ubuntu, the non-mathematical text imported from PDF looks very good (in Inkscape, and when the saved SVG is viewed in Firefox). In fact, some of the maths imports nicely too (subscripts and fractions, e.g.), but not the symbols (sums, integrals, large brackets/braces).

When the test PDF is imported into Inkscape the page is blank (it makes no difference whether the "Replace PDF fonts ..." checkbox is selected). The XML editor shows an empty text object with the following style:

fill:#000000;fill-opacity:1;fill-rule:nonzero;stroke:none;font-family:LMMathExtension10;font-variant:normal;font-weight:normal;font-size:9.9626;writing-mode:lr;-inkscape-font-specification:LMMathExtension10-Regular
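To check how many formula symbols were lost in a saved SVG, one can list the empty text objects that use this font family. A minimal Python sketch using the standard library's ElementTree; it assumes the style sits on the `<text>` element itself (as in the style string above) and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"

def style_props(style: str) -> dict:
    """Parse an inline CSS style attribute into a {property: value} dict."""
    props = {}
    for decl in style.split(";"):
        if ":" in decl:
            key, value = decl.split(":", 1)
            # strip whitespace and optional quotes around the value
            props[key.strip()] = value.strip().strip("'\"")
    return props

def empty_text_ids(svg_source: str, family: str = "LMMathExtension10") -> list:
    """Return ids of <text> elements using `family` that carry no characters."""
    root = ET.fromstring(svg_source)
    hits = []
    for text in root.iter(SVG_NS + "text"):
        if style_props(text.get("style", "")).get("font-family") != family:
            continue
        # itertext() yields the text of the element and all its <tspan> children
        if not "".join(text.itertext()).strip():
            hits.append(text.get("id"))
    return hits

sample = """<svg xmlns="http://www.w3.org/2000/svg">
  <text id="sum1" style="font-family:LMMathExtension10;font-size:9.9626"><tspan id="t1"></tspan></text>
  <text id="x1" style="font-family:LMRoman10-Regular"><tspan>x</tspan></text>
</svg>"""
print(empty_text_ids(sample))  # ['sum1']
```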

Using the LM fonts, my experience is that nothing with "font-family:LMMathExtension10" displays properly (I have tried sums, integrals and large braces/brackets). I am not sure whether this is significant, but looking in Text>Glyphs, with:

FontFamily: LMMathExtension10
Style: Normal
Font Size: 10
Script: All
Range: All

I see only 20 glyphs (around U+F8F0), which apparently relate to constructing large brackets/braces. In particular, I can't find a capital sigma.
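Code points around U+F8F0 fall in the Basic Multilingual Plane's Private Use Area (U+E000 to U+F8FF), which has no standard meaning, whereas a sum sign would normally map to U+2211 N-ARY SUMMATION or U+03A3 GREEK CAPITAL LETTER SIGMA. The short Python check below (helper name `describe` is hypothetical) makes that distinction visible; it is consistent with the glyphs lacking a standard Unicode mapping but does not by itself prove that this is the cause.

```python
import unicodedata

def describe(cp: int) -> str:
    """Name a code point, flagging Private Use Area values (category 'Co')."""
    ch = chr(cp)
    if unicodedata.category(ch) == "Co":
        return f"U+{cp:04X} <private use>"
    return f"U+{cp:04X} {unicodedata.name(ch, '<unnamed>')}"

for cp in (0xF8F0, 0x03A3, 0x2211):
    print(describe(cp))
# U+F8F0 <private use>
# U+03A3 GREEK CAPITAL LETTER SIGMA
# U+2211 N-ARY SUMMATION
```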

Could the problem lie in the definition of LMMathExtension10 that Inkscape sees? I believe the PDF embeds the font, possibly with a different font definition from the one Inkscape uses.

Beluga (buovjaga) wrote:

Poppler/Cairo import works.

Arch Linux 64-bit, KDE Plasma 5
Inkscape 0.92pre1 15054 (GTK3)

jazzynico (jazzynico)
Changed in inkscape:
status: Confirmed → Triaged