Tables not output correctly for FB2

Bug #1526370 reported by David Booth
This bug affects 1 person
Affects   Status    Importance   Assigned to     Milestone
calibre   Triaged   Undecided    John Schember

Bug Description

HTML tables are not output correctly when converting to FB2.

The following HTML:
<p><sup>9</sup>Вот их опись:<br /></p>
<table>
<tr><td>золотых блюд</td><td>30</td></tr>
<tr><td>серебряных блюд</td><td>1 000</td></tr>
<tr><td>ножей</td><td>29</td></tr>
<tr><td>
<sup>10</sup>золотых чаш</td><td>30</td></tr>
<tr><td>одинаковых серебряных чаш</td><td>410</td></tr>
<tr><td>других предметов</td><td>1 000</td></tr>
</table>
<br />

produces the following output in the FB2 file:

<p><sup>9</sup>Вот их опись:</p><empty-line /><empty-line /><p>золотых блюд</p>
<p>30</p>
<p>серебряных блюд</p>
<p>1 000</p>
<p>ножей</p>
<p>29</p>
<p><sup>10</sup></p>
<p>золотых чаш</p>
<p>30</p>
<p>одинаковых серебряных чаш</p>
<p>410</p>
<p>других предметов</p>
<p>1 000</p><empty-line />

Each cell of the table has become a paragraph.

In fact FB2 supports tables using <table>, <tr> and <td> tags, much like HTML. These tags should have appeared in the FB2 output rather than just <p> tags.
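For comparison, the FB2 markup one would expect for this data can be produced with a short Python sketch. It assumes each row is already a list of cell strings and omits the FictionBook XML namespace for brevity. `rows_to_fb2_table` is a hypothetical helper, not part of calibre.

```python
import xml.etree.ElementTree as ET


def rows_to_fb2_table(rows):
    """Build an FB2-style <table> from rows of cell strings (hypothetical helper)."""
    table = ET.Element("table")
    for row in rows:
        tr = ET.SubElement(table, "tr")
        for cell in row:
            # Cell text goes directly inside <td>; no <p> wrapper is created.
            td = ET.SubElement(tr, "td")
            td.text = cell
    return ET.tostring(table, encoding="unicode")


print(rows_to_fb2_table([["золотых блюд", "30"], ["серебряных блюд", "1 000"]]))
```

Each HTML row maps to one `<tr>` and each cell to one `<td>`, which is the structure the report says is lost.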

This problem was found in Calibre 2.45, running under Ubuntu 12.04.

Since the input was a proprietary format using a specially developed input conversion plugin, providing the input and output files would not be of use to you. The ebook-convert command line specified --debug-pipeline and the section of HTML quoted above is from one of the files in the input directory as written out during conversion.

Tags: fb2-output
Kovid Goyal (kovid) wrote : Re: calibre bug 1526370

Changing the component for this bug.

 assignee user-none
 tag fb2-output
 status triaged

Changed in calibre:
assignee: nobody → John Schember (user-none)
status: New → Triaged
David Booth (david-booth12) wrote :

Due to the lack of any progress on this bug in the last two months, I have had a look at the code myself. Attached is fb2ml.py with some changes I have made, for your consideration. I have tested my changes, and they seem to work fine. I have added handling for the HTML tags <table>, <tr>, <th> and <td>. Note that a table is not expected to be part of a paragraph, so I have had to make some changes to prevent <p> tags being generated when text in the table cells is encountered. I have not attempted to do anything more complicated like ensuring that a table is not within a paragraph, or that the table-related tags are correctly nested.
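The approach described above (mapping <table>, <tr>, <th> and <td> directly and not generating <p> for text inside a cell) can be sketched as a small recursive serializer. This is not the attached fb2ml.py: it assumes un-namespaced, well-formed markup parsed with xml.etree.ElementTree, handles only an assumed subset of inline tags, and the names `to_fb2`, `inner` and `INLINE_MAP` are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed subset of HTML inline tags and their FB2 equivalents.
INLINE_MAP = {"sup": "sup", "sub": "sub", "b": "strong",
              "strong": "strong", "i": "emphasis", "em": "emphasis"}


def inner(elem, in_table):
    """Serialize an element's text and its children (with their tails)."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(to_fb2(child, in_table))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def to_fb2(elem, in_table=False):
    tag = elem.tag
    if tag in ("table", "tr"):
        # Structural containers: drop the whitespace between rows and cells.
        return f"<{tag}>" + "".join(to_fb2(c, True) for c in elem) + f"</{tag}>"
    if tag in ("td", "th"):
        # Cell content is emitted inline; no <p> is generated inside a cell.
        return f"<{tag}>{inner(elem, True).strip()}</{tag}>"
    if tag in INLINE_MAP:
        fb2 = INLINE_MAP[tag]
        return f"<{fb2}>{inner(elem, in_table)}</{fb2}>"
    if tag == "p" and not in_table:
        return f"<p>{inner(elem, in_table)}</p>"
    # <br>, <div>, a <p> inside a cell, etc.: keep only the content.
    return inner(elem, in_table)
```

Passing `in_table` down the recursion is what suppresses paragraph generation for any text nested inside a cell, while paragraphs outside the table are still wrapped in <p>.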

Kovid Goyal (kovid) wrote :

I don't maintain the FB2 output code, so I can't review your changes in general. One thing I can say, though, is that you can make them more general by checking the value of style['display'] rather than tag names, since in HTML any tag can be made to behave like a table by setting the display property to 'table', 'table-row' or 'table-cell'.
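A minimal sketch of that suggestion, assuming `style` is a plain dict of computed CSS properties (calibre's own style objects may behave differently) and using the hypothetical helper name `fb2_table_tag`: the display value is checked first, and the tag name is the fallback.

```python
# Hypothetical mappings; only the three display values named above are handled.
DISPLAY_TO_FB2 = {"table": "table", "table-row": "tr", "table-cell": "td"}
TAG_TO_FB2 = {"table": "table", "tr": "tr", "th": "th", "td": "td"}


def fb2_table_tag(tag_name, style):
    """Return the FB2 table tag for an element, or None if it is not tabular."""
    display = style.get("display", "")
    if display in DISPLAY_TO_FB2:
        # A real <th> keeps its header role even when matched by display.
        if display == "table-cell" and tag_name == "th":
            return "th"
        return DISPLAY_TO_FB2[display]
    # No table display value: fall back to the HTML tag name.
    return TAG_TO_FB2.get(tag_name)
```

With this, a `<div style="display: table-cell">` is treated like a `<td>`, while plain `<td>` elements without a computed display value still work.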

David Booth (david-booth12) wrote : Re: [Bug 1526370] Re: Tables not output correctly for FB2

Thank you for your comment. I have now updated the code to check for the appropriate values of style['display'] as well as tag names. I have also moved the checks before the checks for bold, italics, etc., since if a table cell or row has a style that makes it bold or italic, it seems more natural to have the corresponding tag within <td> or <tr> rather than the other way round.
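The nesting order described above can be illustrated with a hypothetical helper, `wrap`, that always opens the table tag before any emphasis tags and closes them in reverse, so a bold cell becomes <td><strong>30</strong></td> rather than <strong><td>30</td></strong>. It is a sketch of the ordering only, not the code in the attached file.

```python
from xml.sax.saxutils import escape


def wrap(text, table_tag=None, inline_tags=()):
    """Wrap text so the table tag is outermost and tags close in reverse order."""
    tags = ([table_tag] if table_tag else []) + list(inline_tags)
    opening = "".join(f"<{t}>" for t in tags)
    closing = "".join(f"</{t}>" for t in reversed(tags))
    return opening + escape(text) + closing


print(wrap("30", "td", ["strong"]))
```

Closing in reverse order keeps the output well-formed no matter how many emphasis tags are applied to the cell.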

From: Kovid Goyal <email address hidden>
To: <email address hidden>
Sent: Wednesday, 24 February 2016, 3:35
Subject: [Bug 1526370] Re: Tables not output correctly for FB2

David Booth (david-booth12) wrote :

Sorry, there was an error in the copy of fb2ml.py attached to my last message: I had failed to delete some of the code I had replaced. The correct version of the file is attached.

Albus (albuspercival5)
information type: Public → Private Security
information type: Private Security → Public