Search with accents

Bug #553133 reported by Frédéric (Ferme du Sart)
This bug affects 5 people
Affects: Odoo Server (MOVED TO GITHUB)
Status: Fix Released
Importance: Wishlist
Assigned to: OpenERP's Framework R&D

Bug Description

Scenario:
- All users using french language

1. Searching for a resource with translation
- Creation of a product named "test A" (product.template.name = ir.translation.value = "test A")
- Writing the name as "test B" (ir.translation.value = "test B")
- Search for products whose name contains 'A'
     => returns "test B" (SELECT name FROM product_template WHERE name ILIKE '%A%')

2. Searching for a resource containing accents
- Trying to find a product named "Bouchée à la reine"
- Search "Bouchee a la reine" returns nothing
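
Item 2 comes down to a character-by-character comparison: "é" and "e" are different characters, so a plain substring match fails. A minimal Python sketch of accent folding with the standard library follows; the helper name `fold_accents` is hypothetical, and this is an illustration only, not the patch proposed for osv/expression.py.

```python
import unicodedata

def fold_accents(text):
    """Return text with combining marks removed (hypothetical helper)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

stored = "Bouchée à la reine"
query = "Bouchee a la reine"

# Plain comparison fails because the accented letters differ.
plain_match = query.lower() in stored.lower()
# Folding both sides makes the comparison accent-insensitive.
folded_match = fold_accents(query).lower() in fold_accents(stored).lower()
```

Letters that have no Unicode decomposition (for example "ø" or "ß") pass through unchanged, so this kind of folding is not a complete solution for every language.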

=> Both BUGs fixed with a patch. Please patch osv/expression.py as soon as possible

Revision history for this message
Frédéric (Ferme du Sart) (frederic-declercq) wrote :

I pressed submit twice:

Bug #553133 = Bug #553135

Please drop one

Revision history for this message
Christophe CHAUVET (christophe-chauvet) wrote :

Hi

Your patch only works for European languages and is not a global solution. I have tested many solutions, but none of them satisfies me.

This patch may cause a regression for Asian countries; don't apply it.

Regards,

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

To me item 1. seems like a feature, not a bug... (I like being able to find Product A by searching for 'B' *or* 'A' in this case.)

The proposed patch for item 2. seems quite brutal and not very clean. Can't we find a solution based on postgres' indexing features?

Anyway we can't change anything about this in stable, but we can discuss this for trunk.

Changed in openobject-server:
assignee: nobody → Anup (Open ERP) (ach-openerp)
milestone: none → 5.0.9
Revision history for this message
Frédéric (Ferme du Sart) (frederic-declercq) wrote :

Thanks, Christophe for the comment.

Something has to be done. This patch should be revisited by the OpenERP team to make it useful internationally.

Olivier: no.
Two real-life scenarios as examples:

1.
An employee has to create many products.
He creates the same product twice (or nearly the same).
When he notices his mistake, he overwrites the name.
The translation has changed, but not product_template.name.
He cannot understand why the product keeps turning up in searches, because he cannot SEE the reason.

2.
The employee has created the right product but, in use, sees that the product name is too long and impractical for any output (labels, reports, tags...).
He decides to overwrite it with a shorter one.
Any search containing part of the old name will return this product for as long as the product exists.

He tries to avoid this by duplicating the product, thinking that starting from scratch will avoid the error. But the error is duplicated too!
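
The mismatch described in these scenarios can be reproduced in miniature. The sketch below uses SQLite from the Python standard library as a stand-in for PostgreSQL, with a deliberately simplified two-table schema (the real ir_translation table has more columns); it is an assumption-laden illustration, not the server's actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_template (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE ir_translation (res_id INTEGER, lang TEXT, value TEXT);
    -- Creation in French: source column and translation both get "test A".
    INSERT INTO product_template VALUES (1, 'test A');
    INSERT INTO ir_translation VALUES (1, 'fr_FR', 'test A');
    -- Renaming in French only touches the translation.
    UPDATE ir_translation SET value = 'test B'
     WHERE res_id = 1 AND lang = 'fr_FR';
""")

# The search filters on the source column (SQLite's LIKE stands in for ILIKE),
# but the user is shown the French translation of each hit.
shown = [row[0] for row in conn.execute(
    "SELECT t.value FROM product_template p "
    "JOIN ir_translation t ON t.res_id = p.id AND t.lang = 'fr_FR' "
    "WHERE p.name LIKE ?", ("%A%",))]
```

Searching for 'A' finds the row through the stale source value, while the user only ever sees "test B".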

In most cases, companies don't need internationalization, and modifying a translation is not intuitive for them.

Revision history for this message
Numérigraphe (numerigraphe) wrote :

"In most cases, societies don't need internationalization, and modifying a traduction is not instinctive for them."
I totally agree with this.
This seems out of the scope of your initial bug report. Maybe you'll want to subscribe to bug 400256 and give your opinion.

Concerning the issue of searches with accents, I agree it's needed on the trunk, but not to be pushed to 5.0 - though it would be nice to have the final patch posted for 5.0 here for those interested.
Lionel

Changed in openobject-server:
status: New → Confirmed
Revision history for this message
Christophe CHAUVET (christophe-chauvet) wrote :

It's a PostgreSQL issue, not an OpenERP one

Regards,

Revision history for this message
Frédéric (Ferme du Sart) (frederic-declercq) wrote :

Christophe> It's a PostgreSQL issue, not OpenERP one

??? PostgreSQL doesn't have to be user-friendly. An ERP, by definition, has to be user-friendly, even if the chosen technology has unresolved problems.

Searching without accents is a normal need.

Please stop answering that this is not an ERP bug when the need is normal and already solved by all the other good competitors.

My proposal isn't the best one because, as you say, it could be a disaster for Asian countries.
Still, we shouldn't have to get used to results that differ from the search criteria, or have to know whether products were created with or without accents.

Revision history for this message
Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote : Re: [Bug 553133] Re: Search with accents

Next version of PostgreSQL which will be 9.0 will include an unaccent module
which will include the "unaccent()" function that can be used for this
purpose:

http://developer.postgresql.org/pgdocs/postgres/unaccent.html#AEN123210

I would not reinvent the wheel; I would wait for the next PostgreSQL release.
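
For illustration, this is roughly how a search condition could be wrapped in `unaccent()` once the module is installed in the database. It is a sketch only: the helper `unaccent_ilike` and the table and column names are assumptions, and the code just builds the SQL text and parameters without connecting to a server.

```python
def unaccent_ilike(table, column, needle):
    """Build a parameterized accent-insensitive ILIKE query (hypothetical helper).

    Table and column names are treated as trusted identifiers here;
    only the search term is passed as a bound parameter.
    """
    sql = (
        "SELECT id FROM {table} "
        "WHERE unaccent({column}) ILIKE unaccent(%s)"
    ).format(table=table, column=column)
    params = ("%" + needle + "%",)
    return sql, params

sql, params = unaccent_ilike("product_template", "name", "Bouchee a la reine")
```

The `%s` placeholder is the style accepted by psycopg2; applying `unaccent()` to both sides keeps the matching inside the database rather than in Python.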

On Thursday, 1 April 2010, Frederic D. wrote:
> Christophe> It's a PostgreSQL issue, not OpenERP one
>
> ??? PostgreSQL doesn't have to be user-friendly. An ERP, by definition,
> has to be user-friendly, even if the chosen technology has unresolved
> problems.
>
> Searching without accents is a normal need.
>
> Please stop answering that this is not an ERP bug when the need is
> normal and already solved by all the other good competitors.
>
> My proposal isn't the best one because, as you say, it could be a disaster
> for Asian countries. Still, we shouldn't have to get used to results
> that differ from the search criteria, or have to know whether products
> were created with or without accents.
>

--

Albert Cervera i Areny
http://www.NaN-tic.com
OpenERP Partner
Mobile: +34 669 40 40 18

Revision history for this message
Cloves Almeida (cjalmeida) wrote :

For me, field content translation per se is a bug ;)

About non accented search, a more elegant solution would be to provide a module and overload search method when querying the specific columns. This way you make it more flexible and don't "force" the feature on unwanted audiences.

You could even use Whoosh for better full-text search.

Revision history for this message
Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote :

On Thursday, 1 April 2010, Cloves Almeida wrote:
> For me, field content translation per se is a bug ;)
>
> About non accented search, a more elegant solution would be to provide a
> module and overload search method when querying the specific columns.
> This way you make it more flexible and don't "force" the feature on
> unwanted audiences.
>
> You could even use Whoosh for better full-text search.
>

Koo already provides full text search functionalities using Postgres' own Full
Text Search capabilities which are already integrated and quite powerful.

Revision history for this message
Ferdinand (office-chricar) wrote :

There is definitely an issue with copying resources (products), IMHO for all languages other than English, as the user does not realize that he has to manually correct both the English value and the one in his own language.
Correcting only one causes a real headache for those accessing the data in another language.

we have discussed this in the usability group - see also
https://blueprints.launchpad.net/openobject-server/+spec/translation-update

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

Ferdinand @ ChriCar wrote:
> there is definitively an issue with copying resources (products) IMHO for all other languages than English

Yes, and this is the real issue with item 1 I think. If a field is translated and people don't use the translation, we should provide an unobtrusive solution to it, rather than crippling the search feature, which could be quite useful when people *do* use the translations. That's what I meant in my first reply.

Frederic, one cleaner workaround if you don't use the translations is to make a module that removes the "translate" flag on the product.template.name, or automatically synchronizes the translations when you change one of them.
But there is still the general usability issue with copying objects with translated fields, so I guess we need a global solution to make this intuitive.

In any case, this must be discussed for trunk and not for stable (and the same goes for item 2)

Revision history for this message
Frédéric (Ferme du Sart) (frederic-declercq) wrote :

Thanks to all for your replies and advice.

Agree with Olivier:
- forcing fields.char and fields.text to translate=False in a module can resolve all translation problems when we don't need translations.

Then I can wait for a more standard solution (PostgreSQL or OpenERP).

Revision history for this message
Numérigraphe (numerigraphe) wrote :

Frederic, I agree that it should be possible to turn "off" translations (maybe with a server command-line option), or to change the main locale to that of the company.
Please, I beg you to join in on bug 400256; I'm having a hard time getting Tiny to accept this.

I humbly suggest we concentrate this bug report on the issue of searching with accented characters.
Lionel

Revision history for this message
xrg (xrg) wrote :

On Thursday 01 April 2010, you wrote:
> It's a PostgreSQL issue, not OpenERP one
>
Totally agree. The pg team has already admitted this missing feature from the
db (it has to do with the libs used by pg 8.x for international conversions).

Any attempt to do that in python will just be an ugly, sub-optimal hack. That
is, when we search in the db, there can be no python intervention in the
matching (ilike op) loop. And, of course, partially solving a few iso8859-1
characters is useless for other languages.

Changed in openobject-server:
milestone: 5.0.9 → 5.0.10
Changed in openobject-server:
status: Confirmed → In Progress
Revision history for this message
Anup(SerpentCS) (anup-serpent) wrote :

Hello Everyone,

For the first problem we have a solution similar to the patch suggested by Frederic. It's a generalized solution and will not be harmful for any language.
For the second, it is not possible, as there is no general way to convert "é" to "e". That mapping only holds for French; for other non-English characters, one cannot search with English letters and have them converted to the particular language. So I suggest the second item is not a problem at all.

You'll get the records whose translations in ir_translation match the search letters, as well as the records that have no translation and match in the particular model.

Here I have attached a clearer patch. Would you please check it and notify?

Thanks.

Revision history for this message
Anup(SerpentCS) (anup-serpent) wrote :
Changed in openobject-server:
importance: Undecided → Medium
Revision history for this message
xrg (xrg) wrote :

On Friday 30 April 2010, you wrote:
> ** Changed in: openobject-server
> Importance: Undecided => Medium
>
>Bug description:
>Scenario:
 > - All users using french language

Scenario 2:
- Some user uses a language other than French...

This is where this patch only adds some ugly overhead.

Changed in openobject-server:
milestone: 5.0.10 → 5.0.11
Revision history for this message
Jay Vora (Serpent Consulting Services) (jayvora) wrote :

Hello Panos,

Would you please check with this patch?

If a non-English language preference is set, it will always search for the translated value.

And for accented characters, it's working all right.

Thanks.

Changed in openobject-server:
milestone: 5.0.11 → 5.0.12
Revision history for this message
xrg (xrg) wrote :

Let's see what behaviour we consider ideal:

Say we have a table tbl1(id=4, 'Belgium'), and translations ir_translation(res_id=4, lang='fr_FR', value='Belgique'), (res_id=4, lang='nl_NL', value='België')

We would want that:
search(['name' = 'Belgique'], context.lang='fr_FR') == True
search(['name' = 'Belgium'], context.lang='fr_FR') == False # !
search(['name' = 'België'], context.lang='fr_FR') == False # no cross-lang
search(['name' = 'België'], context.lang='nl_NL') == True
search(['name' = 'Belgie'], context.lang='nl_NL') == True # ignore accent
search(['name' = 'Belgium'], context.lang='el_GR') == True # no translation, so default
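
The rules listed above can be written down as a small, self-contained Python model. This is a sketch of the desired semantics only, with in-memory dictionaries standing in for tbl1 and ir_translation; the `search(name, lang)` signature and the `fold` helper are simplifications invented for the example, not the framework's API.

```python
import unicodedata

def fold(text):
    """Strip combining accents so that 'België' and 'Belgie' compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# Source values (tbl1) and per-language translations (ir_translation).
TBL1 = {4: "Belgium"}
TRANSLATIONS = {(4, "fr_FR"): "Belgique", (4, "nl_NL"): "België"}

def search(name, lang):
    """Return True if some record's name, as seen in `lang`, equals `name`."""
    for res_id, source in TBL1.items():
        # Use the translation for this language, else fall back to the source.
        visible = TRANSLATIONS.get((res_id, lang), source)
        if fold(visible) == fold(name):
            return True
    return False
```

Each record is compared only against the value the user would actually see in the current language, which is what rules out both the stale source value and cross-language hits.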

Revision history for this message
Ferdinand (office-chricar) wrote :

we just had the case today to convert utf-8 to ascii and found this

http://fraggod.net/oss/projects/unicode2ascii.py

IMHO a possible search workflow would be
* search exact matching - default
* search fuzzy matching - extra button in the search window (using postgres features or unicode2ascii.py)

and/or an attribute at database/company/group/user level specifying which of the methods should be the default.

BTW I agree with Albert
https://bugs.launchpad.net/openobject-server/+bug/553133/comments/11

Revision history for this message
Ferdinand (office-chricar) wrote :

Another reason for needing fuzzy matching in languages like French:
* the keys for certain letters like ¢, æ, Ç are not easily available on the user's keyboard (international companies, cross-language support)
* non-native speakers might not know the correct spelling - of course this goes beyond just replacing single characters and calls for more sophisticated phonetic matching.

Revision history for this message
xrg (xrg) wrote :

On Thursday 05 August 2010, you wrote:
> we just had the case today to convert utf-8 to ascii and found this
>
> http://fraggod.net/oss/projects/unicode2ascii.py

That's still *only* for latin letters. Here, we have a huge problem with non-
matching Greek accented ones. And, to make things worse, people don't input
the accents right most of the time. So, our search is broken.
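
To illustrate the Latin-only limitation, here is a small Python sketch. A hand-written Latin table (the hypothetical `LATIN_MAP`) leaves Greek untouched, whereas Unicode decomposition through the standard `unicodedata` module treats both scripts the same way. It is only an illustration of the character-level problem; it does not make the matching run inside the database.

```python
import unicodedata

# Hypothetical hand-written table covering a few Latin letters only.
LATIN_MAP = str.maketrans({"é": "e", "è": "e", "à": "a", "ç": "c"})

def fold_latin_only(text):
    """Replace only the accented letters listed in LATIN_MAP."""
    return text.translate(LATIN_MAP)

def fold_unicode(text):
    """Remove combining marks after canonical decomposition, for any script."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

greek = "Αθήνα"  # "Athens", with an accent (tonos) on the eta
latin_result = fold_latin_only(greek)   # unchanged: the table knows no Greek
unicode_result = fold_unicode(greek)    # accent removed from the eta
```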

> IMHO a possible search workflow would be
> * search exact matching - default
> * search fuzzy matching - extra button in the search window (using postgres
> features oder unicode2ascii.py)
> and/or an attributed on database/company/group/user level which of the
> methods should be default.
I believe that case-insensitive and accent-insensitive search could be the
default. As an exception, in "code" fields, strings should match exactly. That
would be a framework enhancement.

>
> BTW I agree with Albert
> https://bugs.launchpad.net/openobject-server/+bug/553133/comments/11

Revision history for this message
Borja López Soilán (NeoPolus) (borjals) wrote :

By the way, have you tried the "full text search" of the Koo (KDE OpenObject) client? It uses the Postgres text indexing capabilities (nothing we should use on every field, but useful for partner names), so if you search for "Lopez" it will match both "López" and "Lopez", but also if you search for "Eléctricos" (plural) it will match "eléctrico" (singular)...

Maybe we should allow (from the framework side) specifying that a (text/char) field should be indexed like that (maybe something like "field.char('Name', size=128, human_search=True)"). And the OpenERP clients should be nice enough to use such 'human' search by default for these fields (otherwise the standard exact search should be used).
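
As a rough idea of what such a per-field switch could look like, here is a toy Python model. Everything in it is hypothetical: `CharField` is a stand-in written for this example, not the OpenERP fields.char class, and `human_search` is only the attribute name suggested above. It folds case and accents but does no stemming, so it would not match plurals the way full text search does.

```python
import unicodedata
from dataclasses import dataclass

def fold(text):
    """Lower-case the text and strip combining accents."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()

@dataclass
class CharField:
    """Toy stand-in for a framework field definition."""
    label: str
    size: int = 128
    human_search: bool = False  # hypothetical attribute from the proposal

    def matches(self, stored, query):
        if self.human_search:
            return fold(query) in fold(stored)
        return query in stored  # exact: case- and accent-sensitive

name = CharField("Name", human_search=True)
code = CharField("Code", size=16)
```

With these definitions, `name.matches("López", "lopez")` succeeds while `code.matches("López", "lopez")` does not, which keeps exact matching for fields where it matters.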

Revision history for this message
Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote :

On Friday, 6 August 2010, Borja López Soilán (Pexego) wrote:
> By the way, have you tried the "full text search" of the Koo (KDE
> OpenObject) client? It uses the Postgres text indexing capabilities
> (nothing we should use on every field, but useful for partner names), so
> if you search for "Lopez" it will match both "López" and "Lopez", but
> also if you search for "Eléctricos" (plural) it will match "eléctrico"
> (singular)...

Borja, I don't think that's true. This capability is not available in
PostgreSQL yet, though it will be in the upcoming 9.0 release.

Revision history for this message
xrg (xrg) wrote :

On Friday 06 August 2010, you wrote:
> By the way, have you tried the "full text search" of the Koo (KDE
> OpenObject) client? It uses the Postgres text indexing capabilities
> (nothing we should use on every field, but useful for partner names), so
> if you search for "Lopez" it will match both "López" and "Lopez", but
> also if you search for "Eléctricos" (plural) it will match "eléctrico"
> (singular)...

Interesting

> Maybe we should allow (from the framework side) to specify that a
> (text/char) field should be indexed like that (maybe something like
> "field.char('Name', size=128, human_search=True)"). And the OpenERP
> clients should be nice enough to use such 'human' search by default for
> this fields (otherwise the standard exact search should be used).

Yes, I was thinking of a field attribute (like the "human_search" you say),
too.

Revision history for this message
Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote :

On Sunday, 8 August 2010, xrg wrote:
> > Maybe we should allow (from the framework side) to specify that a
> > (text/char) field should be indexed like that (maybe something like
> > "field.char('Name', size=128, human_search=True)"). And the OpenERP
> > clients should be nice enough to use such 'human' search by default for
> > this fields (otherwise the standard exact search should be used).
>
> Yes, I was thinking of a field attribute (like the "human_search" you
> say), too.

I wouldn't personally go that way because I prefer to configure that for each
customer as needs may change, but something similar already exists. It cannot
be used in the source code, but it's possible to configure FTS from the fields
view in the administration menu. There you set, not only if the field has to be
indexed, but also which priority (A, B, C or D) should be used. Using Koo's
massive updates, configuring FTS takes just a few minutes.

Revision history for this message
Borja López Soilán (NeoPolus) (borjals) wrote :

Albert Cervera i Areny - http://www.NaN-tic.com wrote:
> On Friday, 6 August 2010, Borja López Soilán (Pexego) wrote:
>
>> By the way, have you tried the "full text search" of the Koo (KDE
>> OpenObject) client? It uses the Postgres text indexing capabilities
>> (nothing we should use on every field, but useful for partner names), so
>> if you search for "Lopez" it will match both "López" and "Lopez", but
>> also if you search for "Eléctricos" (plural) it will match "eléctrico"
>> (singular)...
>>
>
> Borja, I don't think that's true. This capability is not available in
> PostgreSQL yet, though it will be in upcomming 9.0 release.
>
>
Albert, we are using Postgres 8.4 and it actually works that way for us!
You can search for a plural or singular and it will match both (though
with different scores)!

See the attached image: we search for "tunel" (mistyped way of writing
tunnel in Spanish, as it should be "túnel") and it matched "túneles"
(plural for tunnel in Spanish) too! :D

--
Borja López Soilán
<email address hidden>

Revision history for this message
Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote :

On Monday, 9 August 2010, Borja López Soilán (Pexego) wrote:
> Albert, we are using Postgres 8.4 and it actually works that way for us!
> You can search for a plural or singular and it will match both (though
> with different scores)!

I think it's the Spanish version of snowball (the library that looks for the root
of a word) that is "simulating" the unaccent functionality here. For
example, if you try to write the Catalan word "diferència", you won't find it
simply because the Spanish version of snowball cannot understand the 'è'
character.

Something more robust, such as the upcoming 'unaccent' module, which won't be
available until postgres 9.0, should be used:

http://www.postgresql.org/docs/9.0/static/unaccent.html

Changed in openobject-server:
milestone: 5.0.12 → none
assignee: Anup (OpenERP) (ach-openerp) → nobody
Changed in openobject-server:
assignee: nobody → OpenERP's Framework R&D (openerp-dev-framework)
importance: Medium → Wishlist
status: In Progress → Triaged
Revision history for this message
Numérigraphe (numerigraphe) wrote :

Hasn't this been fixed recently?
Lionel.

Revision history for this message
Olivier Dony (Odoo) (odo-openerp) wrote :

On 03/20/2012 09:01 AM, Numérigraphe wrote:
> Hasn't this been fixed recently?

You're right, support for the 'unaccent' module of Postgres was added in 6.1 at revision [1] and can be turned on with the --unaccent startup parameter. Thanks for spotting it!
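
As a rough picture of what an opt-in switch like this amounts to, the Python sketch below parses a `--unaccent` flag and uses it to decide whether SQL expressions get wrapped in `unaccent()`. It is an assumption-based illustration, not the server's actual code; `make_wrapper` is a name made up for the example.

```python
import argparse

def make_wrapper(enabled):
    """Return a function that wraps a SQL expression in unaccent() when enabled."""
    if enabled:
        return lambda expr: "unaccent(%s)" % expr
    return lambda expr: expr

parser = argparse.ArgumentParser()
parser.add_argument("--unaccent", action="store_true",
                    help="wrap text comparisons in PostgreSQL's unaccent()")

# With the flag, expressions are wrapped; without it, they are left alone.
on = make_wrapper(parser.parse_args(["--unaccent"]).unaccent)
off = make_wrapper(parser.parse_args([]).unaccent)
```

Keeping the behaviour behind a flag means databases without the module installed are unaffected.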

[1] server rev. 3642 rev-id: <email address hidden>

Changed in openobject-server:
milestone: none → 6.1
status: Triaged → Fix Released