Comment 8 for bug 812593

Dan Scott (denials) wrote:

As a data point, I talked with the original developer of openils-mapper. He believes that edi4r, one of the components he relies on to escape and unescape EDI content, is at least part of the problem, specifically for characters that are expected to be escaped. (He doesn't think # is normally escaped in EDI.)

<mbklein> dbs: The problem here is twofold: 1) EDIFACT is stupid about the escape character. First of all, they don't call it an escape character, they call it a release character. Which is just dumb. But the real dumb part is that it only acts as an escape character when it precedes a character that requires escaping.
<mbklein> dbs: i.e., "?:" => ":", but "?$" => "?$"
* dbs hopes he'll have time to read through the voluminous traffic on http://rubyforge.org/forum/forum.php?forum_id=12128 someday
<mbklein> dbs: 2) edi4r seems to be borked when it comes to escaping the escape character itself.
<zoia> gmcharlt` escapes the escape characters.
<mbklein> dbs: And, even though I said there were only 2 things… 3) The UNA segment at the top of the message defines the delimiters and release character for the message, which means that ? will ALMOST ALWAYS (but not necessarily) be the escape character, and [+:'] will ALMOST ALWAYS (but not necessarily) be the complete set of escapable characters.
<mbklein> dbs: Also for what it's worth, # isn't a reserved character in EDIFACT or JSON and shouldn't need special handling.
<mbklein> (Unless it's specified in the UNA segment as a delimiter)
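To make mbklein's three points concrete, here is a minimal Python sketch. It is not code from openils-mapper or edi4r. It reads the delimiters from an optional UNA segment and then releases or unreleases a single data value that has already been split out of its segment. Assumptions: the six characters after `UNA` are, in order, the component separator, element separator, decimal mark, release character, a reserved position, and the segment terminator; the defaults are `:+.? '` when no UNA is present; and the release character releases itself (`??` means a literal `?`). The names `ServiceChars`, `parse_una`, `escape`, and `unescape` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceChars:
    component_sep: str = ":"
    element_sep: str = "+"
    decimal_mark: str = "."
    release: str = "?"
    reserved: str = " "
    segment_term: str = "'"

    def escapable(self) -> frozenset:
        # Characters that must be preceded by the release character when they
        # occur as data. The release character itself is included.
        chars = {self.component_sep, self.element_sep, self.segment_term, self.release}
        # Assumption: position 5 is a reserved space before syntax version 4;
        # in version 4 it is the repetition separator and also needs releasing.
        if self.reserved != " ":
            chars.add(self.reserved)
        return frozenset(chars)


def parse_una(message: str) -> ServiceChars:
    """Read the UNA service string advice if present, else use defaults."""
    if message.startswith("UNA") and len(message) >= 9:
        c = message[3:9]
        return ServiceChars(c[0], c[1], c[2], c[3], c[4], c[5])
    return ServiceChars()


def unescape(value: str, sc: ServiceChars) -> str:
    """The release character only acts as an escape when the next character
    is escapable; otherwise it is ordinary data ("?$" stays "?$")."""
    esc = sc.escapable()
    out, i = [], 0
    while i < len(value):
        ch = value[i]
        if ch == sc.release and i + 1 < len(value) and value[i + 1] in esc:
            out.append(value[i + 1])  # drop the release char, keep the data char
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def escape(value: str, sc: ServiceChars) -> str:
    """Prefix every escapable character, including the release char itself."""
    esc = sc.escapable()
    return "".join(sc.release + ch if ch in esc else ch for ch in value)
```

With the default UNA, `unescape("?:", sc)` gives `":"`, `unescape("?$", sc)` leaves `"?$"` unchanged, and `escape("Why? #1: A Novel", sc)` gives `"Why?? #1?: A Novel"`. The `#` passes through untouched unless a UNA segment makes it a delimiter.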

Also, one of the core devs for the Evergreen EDI Perl code had this to say:

<atz> dbs: for what it's worth, at least one of my major testing partners said "we will never send you a question mark in our data"
<atz> or something like that, so the body of tests may reflect that
<dbs> atz: Fair enough on the testing partner front, but there are at least two sites with EDI providers who are not so kind :)
<atz> also the EDI version of titles has only superficial relationship to the underlying MARC records. this was a depressing realization.
<atz> dbs: point being, i think it would be OK to mangle the title/author input to the mapper based on how crappy the other data sources look :\
<mbklein> dbs: There's no reason to believe that the escaping bug would be limited to title fields, though obviously it would blow up differently depending on the consuming code.
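The following hypothetical Python sketch (not taken from the Evergreen Perl code or edi4r) illustrates why the bug would not be limited to titles. If an escaper releases `+`, `:` and `'` but not `?` itself, any value ending in `?` swallows the element separator that follows it, whichever field it sits in. It assumes the default service characters (`?`, `:`, `+`, `'`). The `release_itself=False` flag is an invented stand-in for the buggy behaviour.

```python
RELEASE, COMPONENT_SEP, ELEMENT_SEP, SEGMENT_TERM = "?", ":", "+", "'"
SPECIAL = {RELEASE, COMPONENT_SEP, ELEMENT_SEP, SEGMENT_TERM}


def escape(value: str, release_itself: bool = True) -> str:
    """Release service characters; release_itself=False mimics a buggy
    escaper that forgets to release the release character."""
    targets = SPECIAL if release_itself else SPECIAL - {RELEASE}
    return "".join(RELEASE + ch if ch in targets else ch for ch in value)


def split_elements(body: str) -> list:
    """Split a segment body on the element separator, honouring the release
    character. Released pairs are kept as-is (still escaped) in the output."""
    parts, current, i = [], [], 0
    while i < len(body):
        ch = body[i]
        if ch == RELEASE and i + 1 < len(body) and body[i + 1] in SPECIAL:
            current.append(body[i:i + 2])  # released pair: not a delimiter
            i += 2
        elif ch == ELEMENT_SEP:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts


def encode(values, release_itself: bool = True) -> str:
    return ELEMENT_SEP.join(escape(v, release_itself) for v in values)


if __name__ == "__main__":
    # A title-like value and a free-text note, each followed by another element.
    for values in (["Is it safe?", "5"], ["Rush order?", "Title"]):
        ok = split_elements(encode(values))
        bad = split_elements(encode(values, release_itself=False))
        print(len(values), len(ok), len(bad))
```

Both inputs print `2 2 1`. Two values go in. Correct escaping yields two elements back, while the buggy escaper yields one merged element, because the trailing `?` now releases the `+` that was meant to separate the fields.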