Acquisitions - certain characters in title causes EDI translation to fail

Bug #812593 reported by Ben Shum
26
This bug affects 3 people
Affects Status Importance Assigned to Milestone
Evergreen
Fix Released
Medium
Mike Rylander

Bug Description

Evergreen version: 2.0.6
OpenSRF version: 2.0.0
PostgreSQL version: 8.4
Linux distribution: Ubuntu Lucid 10.04

One of our Acq purchase orders activated today was generated with a line item entry where the title had quotes around part of the text. The specific bib record in this scenario was: http://ur1.ca/4qecs

In that bib record, the "V" is surrounded with quotations marks. The action/trigger output that was generated from the PO, using the PO JEDI action/trigger came out with the line being like this:

{"BTI":""V" is for vengeance / Sue Grafton."},

Changing it in the database to:

{"BTI":"\"V\" is for vengeance / Sue Grafton."},

allowed the EDI translator to parse the contents and create the proper message.

Tags: acq 2.0 edi
Revision history for this message
Ben Shum (bshum) wrote : Re: Acquisitions - extra characters in title causes EDI translation to fail

Confirmed to also affect brackets in addition to " quotation marks.

summary: - Acquisitions - title with quotes causes EDI translation to fail
+ Acquisitions - extra characters in title causes EDI translation to fail
Ben Shum (bshum)
summary: - Acquisitions - extra characters in title causes EDI translation to fail
+ Acquisitions - certain characters in title causes EDI translation to
+ fail
Changed in evergreen:
importance: Undecided → Medium
status: New → Confirmed
Revision history for this message
Ben Shum (bshum) wrote :

An error message from our osrferror.log that's a result of the brackets being there:

./edi_pusher.pl: [ERR :2031:EDI.pm:272:] EDI Translator json2edi failed, Error 2: Uncaught exception Input/output error in method json2edi

Revision history for this message
Ben Shum (bshum) wrote :

So, we can also add question marks to the list of characters that will kill a PO that goes through EDI translation.

                {"BTI":"Have You Ever Seen a Hippo with Sunscreen?"},

This line died horribly during EDI translation:

EDI Translator json2edi failed, Error 2: Uncaught exception Pending escape char in IMD+F+BTI+:::Have You Ever Seen a Hippo with Sun:screen? in method json2edi

Removing the ? question mark allowed the PO translation to proceed as normal.

Revision history for this message
Lebbeous Fogle-Weekley (lebbeous) wrote :

I'm pretty sure we can add # to the list. At least I think that's what this means, from edi_webrick.rb --verbose:

DE C273/7008: Illegal character: #

when passing a JEDI blob containing a lineitem title with # in it.

Also, note how I had to get the message. edi_webrick.bash does not call the ruby script with any options by default, and as far as I can tell, it does no logging otherwise, so I haven't found any way to get at meaningful translation error messages with the default setup.

Should edi_webrick.bash be adjusted so the logs can go somewhere, or do those of you fighting this in the wild already know of some other place to find such error messages?

Revision history for this message
Ben Shum (bshum) wrote :

I've only been tracking the symptoms of the problem based on the failures of the edi_pusher.pl script itself. Those usually point at problems parsing the output_template JEDI blob that you referred to. At this point, we've been working trial and error to check and fix by hand issues with translation failures. The only other hint we get is a cryptic failure line about "uncaught exception" in our osrferror.log files.

Having more verbose options for the edi_webrick to list out the specific reason for failure seems useful to me and we'd be willing to test out any suggestions for more scrutinizing logs to that effect. Not sure if EI's testing of EDI message failures has already tried this.

Revision history for this message
Michael Peters (mrpeters) wrote :

For anyone who wishes to give this a run through, bshum and I have documented the EDI configuration.

http://open-ils.org/dokuwiki/doku.php?id=acq:edi_configuration

This should get you through all of the steps to attempt to push out the PO.

Revision history for this message
Dan Scott (denials) wrote :

I think we need to look at adding more real-world tests to https://github.com/mbklein/openils-mapper/tree/master/test

Revision history for this message
Dan Scott (denials) wrote :

As a data point, I talked with the original developer of openils-mapper, and he believes that one of the components he relies on to escape / unescape the code, edi4r, is at least one problem area. At least for characters that are expected to be escaped (he doesn't think # is normally escaped in EDI).

<mbklein> dbs: The problem here is twofold: 1) EDIFACT is stupid about the escape character. First of all, they don't call it an escape character, they call it a release character. Which is just dumb. But the real dumb part is that it only acts as an escape character when it precedes a character that requires escaping.
<mbklein> dbs: i.e., "?:" => ":", but "?$" => "?$"
* dbs hopes he'll have time to read through the voluminous traffic on http://rubyforge.org/forum/forum.php?forum_id=12128 someday
<mbklein> dbs: 2) edi4r seems to be borked when it comes to escaping the escape character itself.
<zoia> gmcharlt` escapes the escape characters.
<mbklein> dbs: And, even though I said there were only 2 things… 3) The UNA segment at the top of the message defines the delimiters and release character for the message, which means that ? will ALMOST ALWAYS (but not necessarily) be the escape character, and [+:'] will ALMOST ALWAYS (but not necessarily) be the complete set of escapable characters.
<mbklein> dbs: Also for what it's worth, # isn't a reserved character in EDIFACT or JSON and shouldn't need special handling.
<mbklein> (Unless it's specified in the UNA segment as a delimiter)

Also, one of the core devs for the Evergreen EDI Perl code had this to say:

<atz> dbs: for what it's worth, at least one of my major testing partners said "we will never send you a question mark in our data"
<atz> or something like that, so the body of tests may reflect that
<dbs> atz: Fair enough on the testing partner front, but there are at least two sites with EDI providers who are not so kind :)
<atz> also the EDI version of titles has only superficial relationship to the underlying MARC records. this was a depressing realization.
<atz> dbs: point being, i think it would be OK to mangle the title/author input to the mapper based on how crappy the other data sources look :\
<mbklein> dbs: There's no reason to believe that the escaping bug would be limited to title fields, though obviously it would blow up differently depending on the consuming code.

Revision history for this message
Michael Peters (mrpeters) wrote :

Just for clarification, it's not a vendor who is shipping records with these types of characters. The problem occurs when we're using existing bib records in our system which, perfectly validly, contain these types of characters to place orders with vendors.

I'm not sure atz's first comment applies.

Revision history for this message
Lebbeous Fogle-Weekley (lebbeous) wrote :

Here's a branch I'd like somebody to test that might deal with some of the problems listed in this ticket.

Any problems with ? : ' + [ ] in titles and other lineitem fields should be fixed by this (I think from that set there were only really problems with ? and [ ], and for ? only when ? was the last character in a string).

Any problems with characters with diacritics (and really any non ASCII characters) in lineitem fields should be fixed by this.

Regarding [ and ], I could not actually reproduce a problem with those characters in a title or other lineitem attribute field. There's no reason not to go ahead and strip them out though, since the fields we're affecting in the EDI output are really there only for visual fallback for the vendor, and aren't used by machines to look anything up, as I understand it.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/senator/edi-translation-improvements

Feedback desired. Thanks!

Revision history for this message
Lebbeous Fogle-Weekley (lebbeous) wrote :

In case you're thinking about this patch, Ben Shum reports in IRC that it still has problems, and could cause disruptions in a production environment. Test carefully! Looking at it further.

Revision history for this message
Ben Shum (bshum) wrote :

Indeed, applying it to our systems led to a situation where attempting to print a hold pull list failed with an error complaining about:

robj.template_output() is null

Checking our osrferror.log entries, we found something like this lurking every time an error occurred elsewhere:

Event reacting failed with Can't call method "run" on an undefined value at /openils/lib/perl5/OpenILS/Application/Trigger/EventGroup.pm line 52.

Restoring to our original code fixed the pull list errors we were seeing. Not sure why they'd be related, but perhaps something broke in our trigger/reactor.pm file when we patched it...

Revision history for this message
Ben Shum (bshum) wrote :

Hmm, tested this patch again today and things do not seem to be completely broken. I must have copy/pasted something wrong into our Reactor.pm file last time.

Please ignore my false alarm. Sorry for the confusion!

Will test further now that it doesn't break everything else...

Revision history for this message
Mike Rylander (mrylander) wrote :

Merged into master, 2.1 and 2.0. Thanks much, folks! senator++

Changed in evergreen:
status: Confirmed → Fix Committed
Ben Shum (bshum)
Changed in evergreen:
milestone: none → 2.0.10
assignee: nobody → Mike Rylander (mrylander)
Changed in evergreen:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Duplicates of this bug

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.