Pg 9.6 unaccent() changes how certain characters are normalized

Bug #1719986 reported by Galen Charlton
This bug affects 1 person
Affects    Status  Importance  Assigned to  Milestone
Evergreen  New     Low         Unassigned

Bug Description

The following test in t/lp1501781-unaccent_and_squash.pg fails when run against PostgreSQL 9.6:

SELECT is(evergreen.unaccent_and_squash('Œuvres'),
          'euvres', 'oe ligature');

This is because Pg 9.6's unaccent() function was corrected so that unaccent('Œuvres') will now return 'OEuvres' rather than 'Euvres'.

The test case is easy enough to adjust, but it is worth investigating further to identify other cases where the normalization changed. If patron names contain any of the affected ligatures, some indexes on columns in actor.usr may need a REINDEX.

Evergreen master

Tags: database
Galen Charlton (gmc)
Changed in evergreen:
milestone: none → 2.12.7
milestone: 2.12.7 → 3.next
importance: Undecided → Low
summary: - Test case in need of adjustment under Pg 9.6
+ Pg 9.6 unaccent() changes how certain characters are normalized
description: updated
Jeff Godin (jgodin) wrote :

This would potentially affect sites using pg_upgrade, but not those using pg_dumpall to perform the upgrade.

In addition to the changes to the default contrib/unaccent mappings in PostgreSQL 9.6, there are further changes currently committed to master and likely to appear in PostgreSQL 11:

https://commitfest.postgresql.org/14/1161/

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=ec0a69e49bf41a37b5c2d6f6be66d8abae00ee05

It should be possible to write a check for potentially-affected values, but the changes are numerous (between 915 and 1029 mappings added/changed):

$ git diff --shortstat REL9_5_STABLE...REL9_6_STABLE -- contrib/unaccent/unaccent.rules
 1 file changed, 915 insertions(+), 53 deletions(-)

$ git diff --shortstat REL9_5_STABLE...master -- contrib/unaccent/unaccent.rules
 1 file changed, 1029 insertions(+), 53 deletions(-)
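As a starting point for such a check, here is a minimal sketch that lists the rules present in a newer unaccent.rules file but not in an older one. The function name changed_rules is hypothetical. It assumes each rule line is "source target" separated by whitespace, and that both versions of the file have already been extracted from the PostgreSQL tree (for example with git show); it has not been run against the real files.

```shell
# changed_rules OLD NEW
# Print every rule (as "source<TAB>target") that appears in NEW but not in OLD.
# Rules are compared byte-wise after whitespace is normalized to a single tab,
# so a rule whose target changed shows up with its new target.
changed_rules() {
    cr_old=$(mktemp) || return 1
    cr_new=$(mktemp) || { rm -f "$cr_old"; return 1; }

    # Normalize separators, drop blank lines, and sort in the C locale so
    # that sort and comm agree on ordering for multi-byte characters.
    LC_ALL=C awk 'NF > 0 { print $1 "\t" $2 }' "$1" | LC_ALL=C sort -u > "$cr_old"
    LC_ALL=C awk 'NF > 0 { print $1 "\t" $2 }' "$2" | LC_ALL=C sort -u > "$cr_new"

    # -13 suppresses lines unique to OLD and lines common to both files.
    LC_ALL=C comm -13 "$cr_old" "$cr_new"

    rm -f "$cr_old" "$cr_new"
}

# Usage (file names are placeholders):
#   changed_rules unaccent.rules.95 unaccent.rules.96 | cut -f1
# prints just the source characters, which could then be searched for in
# patron names or other indexed values.
```

The output only says which mappings differ; whether any stored value actually contains one of those characters still has to be checked against the database.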

Worth noting: with the fix for bug 1671150 we'll be doing a drop/create on the affected indexes in upcoming 2.12 and 3.0 point releases, as well as 3.1.

At a minimum, it might be helpful to document which indexes an admin should consider REINDEXing if using pg_upgrade to move to PostgreSQL 9.6.
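One possible way to produce that list, sketched here as an assumption rather than a tested procedure: query the pg_indexes view for index definitions that mention "unaccent" and emit a REINDEX statement for each. The function name unaccent_reindex_sql and the database name evergreen are hypothetical. The text match will miss any index built on a function that calls unaccent() internally without "unaccent" in its name, so the output should be treated as a list to review, not a complete answer.

```shell
# Print SQL that generates one REINDEX statement per index whose definition
# text mentions "unaccent" (this also matches evergreen.unaccent_and_squash).
# format() with %I quotes schema and index names as identifiers where needed.
unaccent_reindex_sql() {
    cat <<'SQL'
SELECT format('REINDEX INDEX %I.%I;', schemaname, indexname)
  FROM pg_indexes
 WHERE indexdef ILIKE '%unaccent%'
 ORDER BY schemaname, indexname;
SQL
}

# Usage (database name is an assumption; review the output before running it):
#   unaccent_reindex_sql | psql -At -d evergreen > reindex_unaccent.sql
```

Generating the statements into a file rather than executing them directly leaves the admin free to schedule the REINDEX for a quiet period.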

Perhaps a general "things to keep in mind when upgrading PostgreSQL" admin document or section of the release notes?

I propose we fix the test and document the concern, either in an existing section of the docs or release notes, or in a new section if no suitable one exists.

Jeff Godin (jgodin) wrote :

Another option would be to fork the upstream unaccent.rules file and distribute it with Evergreen, where we would then control if and when it changed.

At least worth mentioning, but I'm not sure that I can advocate for it at this point.

tags: added: database
Jason Stephenson (jstephenson) wrote :

This one should be closed, I think. We've addressed the test failures. We still have the open question of distributing a custom unaccent.rules with Evergreen.

This issue also comes up again with Pg 12:

t/lp1501781-unaccent_and_squash.pg (Wstat: 0 Tests: 18 Failed: 1)
  Failed test: 12
