Extension Text > Extract cannot deal with non-ASCII characters

Bug #1810626 reported by Hachmann
This bug affects 1 person
Affects    Status    Importance    Assigned to      Milestone
Inkscape   Invalid   Low           Qantas94Heavy

Bug Description

Tested on 0.92.x as of Dec. 23rd 2018

1. Select the text that contains an ö in the attached SVG file.
2. Apply Extensions > Text > Extract...
3. Get error message:

alkjg aghalh ajgajhajhgha ahgah ahgp
Traceback (most recent call last):
  File "text_extract.py", line 164, in <module>
    e.affect()
  File "/opt/inkscape_0.92.x/share/inkscape/extensions/inkex.py", line 283, in affect
    self.effect()
  File "text_extract.py", line 148, in effect
    self.recurse(deepcopy(self.selected[item[1]]))
  File "text_extract.py", line 156, in recurse
    inkex.errormsg(inkex.etree.tostring(node, method='text').strip())
  File "src/lxml/etree.pyx", line 3342, in lxml.etree.tostring
  File "src/lxml/serializer.pxi", line 103, in lxml.etree._tostring
  File "src/lxml/serializer.pxi", line 75, in lxml.etree._textToString
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)

Revision history for this message
Hachmann (marenhachmann) wrote :
tags: added: extensions-plugins
Revision history for this message
Firoz Taverbi (firoztaverbi) wrote :

Closing because Inkscape 1.0 has a different bug with Text > Extract, currently being investigated here: https://gitlab.com/inkscape/inbox/issues/409

Closed by: https://gitlab.com/firoztaverbi

Changed in inkscape:
status: New → Invalid
Revision history for this message
Hachmann (marenhachmann) wrote :

Thanks for looking at this.

I disagree that the other issue makes this one moot, though. I added a comment to the other issue; please also verify that it works with öüä etc. once that issue is fixed.

Changed in inkscape:
status: Invalid → New
Revision history for this message
Firoz Taverbi (firoztaverbi) wrote :

Hi Hachmann,

As far as I'm aware, since Inkscape 1.0 is in active development, bugs in 0.92 are no longer being looked into, hence the migration. We've been instructed to close bugs that aren't reproducible in Inkscape 1.0. You can find more info here: http://alpha.inkscape.org/bug-migration/index.html

Changed in inkscape:
status: New → Invalid
Revision history for this message
Hachmann (marenhachmann) wrote :

Hi Firoz, I'm a long-term Inkscape project member, and I will retest this when the extension works again. I will leave this open. Thanks for your understanding.

It is untrue that bugs in 0.92.x are no longer being fixed, but what I am saying is this needs checking again with 1.0 WHEN THE EXTENSION WORKS.

Changed in inkscape:
status: Invalid → New
Revision history for this message
Hachmann (marenhachmann) wrote :

So, to be more clear: This issue isn't currently reproducible because another issue is making it untestable. That is not the same as saying that this issue is fixed now.

Revision history for this message
Hachmann (marenhachmann) wrote :

Only issues that are FIXED should be marked as such without migration.

Changed in inkscape:
status: New → Triaged
importance: Undecided → Low
milestone: none → 0.92.5
Changed in inkscape:
assignee: nobody → Qantas94Heavy (qantas94heavy)
Revision history for this message
Hachmann (marenhachmann) wrote :

(I assume this might be fixed already, due to the unicode migration that was done with extensions, but needs testing)

Revision history for this message
Qantas94Heavy (qantas94heavy) wrote :

I just tested this with 1.0alpha (56f1b1b843, 2019-04-29) and this was not fixed.

There are a few parts to it with Python 2:

1. etree.tostring fails because encoding=utf8 is not specified.
2. If you fix that, inkex.errormsg attempts to convert this utf8-encoded byte string into a Unicode string, but unicode(msg) fails because the encoding is not specified.
3. If you fix that, you quickly realise that was sort of useless, as sys.stderr.write expects a byte string and not a Unicode string. (BUT you only see this error when testing if you set LC_ALL=C!)
4. You then realise that some extension files have unicode_literals set but not others, which means whether you receive a byte string or Unicode string becomes a minefield.
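
The first three parts above can be illustrated with a small sketch. This is not the inkex code and not the fix that was eventually adopted; it is written for Python 3 (where the text/bytes split is explicit) and only shows the general pattern of decoding and encoding with a named encoding instead of relying on the ASCII default. The helper names `to_text` and `write_message` are hypothetical, and the ASCII-only stream merely stands in for a stderr under LC_ALL=C.

```python
import io


def to_text(msg, encoding="utf-8"):
    """Return msg as a text string, decoding byte strings explicitly.

    Hypothetical helper: not part of inkex.
    """
    if isinstance(msg, bytes):
        return msg.decode(encoding)
    return str(msg)


def write_message(msg, stream, encoding="utf-8"):
    """Write msg plus a newline to a binary stream as explicitly encoded bytes.

    Hypothetical helper: accepts either text or byte strings, so the caller
    does not need to know which one it received.
    """
    stream.write(to_text(msg, encoding).encode(encoding) + b"\n")


# Stand-in for a stderr that can only encode ASCII (the LC_ALL=C case).
ascii_stderr = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
try:
    ascii_stderr.write("\xf6")  # "ö" cannot be encoded as ASCII
    ascii_stderr.flush()
    ascii_write_failed = False
except UnicodeEncodeError:
    ascii_write_failed = True

# With an explicit encoding, text and byte-string inputs both work.
buffer = io.BytesIO()
write_message("\xf6ffnen", buffer)                  # text input
write_message("\xf6ffnen".encode("utf-8"), buffer)  # byte-string input
```

Writing bytes to a binary stream sidesteps whatever encoding the locale assigned to the text stream, which is why the sketch takes a binary `stream` argument rather than writing to a text-mode stderr.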

All in all, quite a fun (cough) bug! Now I've got to figure out how to fix this properly :)

Revision history for this message
Hachmann (marenhachmann) wrote :

Arghs... I had high hopes for this one to 'just work'...

Thank you for investigating and working on a fix, Qantas94Heavy!

(how the heck did you figure out the LC_ALL thing...? You must be super patient and methodical.)

Revision history for this message
Qantas94Heavy (qantas94heavy) wrote :

Hi there, I'm going to move this conversation over to our new bug tracker on GitLab. This will make it easier for us to discuss a solution. Please check there for future updates. Thank you!

Moved to: https://gitlab.com/inkscape/extensions/issues/76
Closed by: https://gitlab.com/Qantas94Heavy

tags: added: bug-migration
Changed in inkscape:
status: Triaged → Invalid