marc_export: default to UTF-8

Bug #2015758 reported by Galen Charlton
This bug affects 2 people
Affects: Evergreen
Status: Fix Committed
Importance: Wishlist
Assigned to: Unassigned
Milestone: 3.12-beta

Bug Description

Fewer systems nowadays can only accept MARC records that use the MARC8 character encoding. Consequently, we should consider changing the default output encoding of marc_export to UTF-8.

Since such a change would be backwards-incompatible, I am opening this ticket as a discussion item.

Evergreen master

Galen Charlton (gmc)
Changed in evergreen:
status: New → Confirmed
importance: Undecided → Wishlist
tags: added: cat-marc needsdiscussion
Josh Stompro (u-launchpad-stompro-org) wrote :

+1 to this suggestion. UTF-8 not being the default has bitten me several times, because I forget that it always needs to be specified. I think this change would help others avoid the same mistake.

We have no consumers of our MARC records that wouldn't want them in UTF-8 format.

Josh

Josh Stompro (u-launchpad-stompro-org) wrote :

Here is a working branch that makes this change, in case this discussion moves forward.

user/stompro/lp2015758-default-encoding-utf8

https://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/user/stompro/lp2015758-default-encoding-utf8

Josh Stompro (u-launchpad-stompro-org) wrote :

Bugs somewhat related to marc_export defaulting to MARC8 instead of UTF-8:

Bug 1671845 talks about how marc_export in MARC8 format seems to create invalid output.

Bug 1940702 - MARC8 records with diacritics are exported with incorrect record length.

Maybe these are the same issue? No fixes for either, so anyone using MARC8 format with marc_export may be generating invalid data in any case.

Josh
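For anyone wanting to check an export for the record-length symptom mentioned above, here is a minimal Python sketch. It relies only on the MARC 21 / ISO 2709 rule that Leader positions 00-04 hold the record length in bytes, including the record terminator. The function names are hypothetical, and this does not claim to reproduce the cause of either linked bug; it only shows why byte counts and character counts diverge once diacritics are present.

```python
def declared_length(record: bytes) -> int:
    """Return the record length stated in Leader positions 00-04."""
    return int(record[:5].decode("ascii"))


def length_matches(record: bytes) -> bool:
    """True if the stated length equals the actual number of bytes."""
    return declared_length(record) == len(record)


# A string with a diacritic has more UTF-8 bytes than characters, so a
# length computed from characters instead of bytes would be too small.
title = "Caf\u00e9"
char_count = len(title)                  # counts characters
byte_count = len(title.encode("utf-8"))  # counts encoded bytes
print(char_count, byte_count)
```

`length_matches` expects the bytes of exactly one record, terminator included.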

Changed in evergreen:
assignee: nobody → Jason Stephenson (jstephenson)
Changed in evergreen:
assignee: Jason Stephenson (jstephenson) → nobody
milestone: none → 3.12-beta
status: Confirmed → Fix Committed
tags: removed: needsdiscussion
Jason Stephenson (jstephenson) wrote :

I agree that UTF-8 should be the default.

I have tested Josh's changes, and they work for me. Exporting a batch of records with no encoding specified produces the same output as exporting the same batch with the encoding set to UTF-8.

MARC8 records look different, as expected, when diacritics are present.

I removed the "needsdiscussion" tag because I'm not sure what there is to discuss. The release note indicates that users may need to modify their scripts and/or workflows.
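For scripts that relied on the old default, one way to confirm which encoding an export file declares is to inspect Leader position 09 of each record. In MARC 21, a blank there means MARC-8 and "a" means UCS/Unicode. The sketch below is a simplified illustration, not part of marc_export: the function name is hypothetical, and it assumes binary MARC data whose records are separated by the 0x1D record terminator.

```python
from typing import List

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def leader_encodings(data: bytes) -> List[str]:
    """Report the character coding scheme declared by each record's Leader/09."""
    results = []
    for chunk in data.split(RECORD_TERMINATOR):
        if len(chunk) < 24:  # shorter than a Leader: skip trailing or empty chunks
            continue
        flag = chunk[9:10]
        if flag == b"a":
            results.append("UTF-8")   # Leader/09 'a' = UCS/Unicode
        elif flag == b" ":
            results.append("MARC-8")  # Leader/09 blank = MARC-8
        else:
            results.append("unknown")
    return results
```

Running this over the bytes of a file exported before and after the upgrade should show whether downstream scripts now receive a different declared encoding. It reads only the declared flag and does not validate the record contents.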

I signed off and pushed the changes to the main branch for the 3.12 release.

Thanks, Josh, Galen, and Susan for contributing to this bug.

tags: added: signedoff