Odoo Server (MOVED TO GITHUB)

Search with accents

Bug #553133 reported by Frédéric (Ferme du Sart) on 2010-04-01

This bug affects 5 people

Affects: Odoo Server (MOVED TO GITHUB)
Status: Fix Released
Importance: Wishlist
Assigned to: OpenERP's Framework R&D
Milestone: Odoo Server (MOVED TO GITHUB) 6.1

Bug Description

Scenario:
- All users use the French language

1. Searching for a resource with translation
- Creation of a product named "test A" (product.template.name = ir.translation.value = "test A")
- Writing the name as "test B" (ir.translation.value = "test B")
- Search for products whose name contains 'A'
=> returns "test B" (SELECT name FROM product_template WHERE name ILIKE '%A%')

2. Searching for a resource containing accents
- Trying to find a product named "Bouchée à la reine"
- Search "Bouchee a la reine" returns nothing

=> Both bugs are fixed by the attached patch. Please patch osv/expression.py as soon as possible.
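
For illustration only (this is not the attached patch): item 2 amounts to comparing strings after folding accents away. A minimal Python sketch, assuming that Unicode decomposition is an acceptable folding rule; the helper names strip_accents and accent_insensitive_contains are hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Return text with combining marks (accents) removed."""
    # NFD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark and keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_insensitive_contains(haystack, needle):
    """Case- and accent-insensitive substring test, in the spirit of ILIKE '%needle%'."""
    return strip_accents(needle).lower() in strip_accents(haystack).lower()


print(strip_accents("Bouchée à la reine"))
print(accent_insensitive_contains("Bouchée à la reine", "bouchee a la reine"))
```

With this folding, "Bouchée à la reine" becomes "Bouchee a la reine", so the second search in the scenario would match. A comparison done in Python only applies to values already fetched; it does not change what the SQL ILIKE itself matches.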

Frédéric (Ferme du Sart) (frederic-declercq) wrote on 2010-04-01:

expression.diff (2.0 KiB, text/plain)

Frédéric (Ferme du Sart) (frederic-declercq) wrote on 2010-04-01:

I pressed submit twice:

Bug #553133 = Bug #553135

Please drop one

Christophe CHAUVET (christophe-chauvet) wrote on 2010-04-01:

Your patch only works for European languages and is not a global solution. I have tested many solutions, but none of them satisfies me.

This patch may create a regression for Asian countries; don't apply it.

Regards,

Olivier Dony (Odoo) (odo-openerp) wrote on 2010-04-01:

To me item 1 seems like a feature, not a bug... (I like being able to find Product A by searching for 'B' *or* 'A' in this case.)

The proposed patch for item 2 seems quite brutal and not very clean. Can't we find a solution based on PostgreSQL's indexing features?

Anyway we can't change anything about this in stable, but we can discuss this for trunk.

Jay Vora (Serpent Consulting Services) (jayvora) on 2010-04-01

Changed in openobject-server:
assignee:	nobody → Anup (Open ERP) (ach-openerp)
milestone:	none → 5.0.9

Frédéric (Ferme du Sart) (frederic-declercq) wrote on 2010-04-01:

Thanks for the comment, Christophe.

Something has to be done. This patch should be revisited by the OpenERP team so that it is useful internationally.

Olivier: no.
Here are two scenarios we actually lived through:

1.
An employee has to create many products.
He creates the same one twice (or nearly the same).
When he notices his error, he overwrites the name.
The translation has changed, but not product_template.name.
He cannot understand why the product still shows up, because he never SEES the reason.

2.
The employee has created the right product but, in use, sees that its name is too long and impractical for any output (labels, reports, tags...).
He decides to overwrite it with a shorter one.
Any query containing part of the old name will return this product for as long as the product exists.

He tries to avoid this by duplicating the product, thinking that starting from scratch will avoid the error. But the error is duplicated too!

In most cases, companies don't need internationalization, and modifying a translation is not intuitive for them.

Numérigraphe (numerigraphe) wrote on 2010-04-01:

"In most cases, companies don't need internationalization, and modifying a translation is not intuitive for them."
I totally agree with this.
This seems out of the scope of your initial bug report. Maybe you'll want to subscribe to bug 400256 and give your opinion.

Concerning the issue of searches with accents, I agree it's needed on the trunk, but not to be pushed to 5.0 - though it would be nice to have the final patch posted for 5.0 here for those interested.
Lionel

Changed in openobject-server:
status:	New → Confirmed

Christophe CHAUVET (christophe-chauvet) wrote on 2010-04-01:

It's a PostgreSQL issue, not an OpenERP one.

Regards,

Frédéric (Ferme du Sart) (frederic-declercq) wrote on 2010-04-01:

Christophe> It's a PostgreSQL issue, not an OpenERP one

??? PostgreSQL doesn't have to be user-friendly. An ERP, by definition, has to be user-friendly, even if the technology chosen has unresolved problems.

Searching without accents is a normal need.

Please stop answering that this is not an ERP bug when the need is normal and already met by all the other good competitors.

My proposal isn't the best one because, as you say, it could be a disaster for Asian countries.
Anyway, we should not have to get used to results that differ from the search criteria, or have to know whether products were created with or without accents.

Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote on 2010-04-01:

The next version of PostgreSQL, 9.0, will include an unaccent module
that provides the "unaccent()" function, which can be used for this
purpose:

http://developer.postgresql.org/pgdocs/postgres/unaccent.html#AEN123210

I would not reinvent the wheel; I would wait for the next PostgreSQL release.
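
A sketch of how the query from the bug description might look with that function, assuming the unaccent module is installed in the database. The helper name and the %s placeholder style (as used by DB-API drivers such as psycopg2) are illustrative choices, not OpenERP code.

```python
def build_unaccent_search(table, column):
    """Build a parameterized, accent-insensitive ILIKE query (hypothetical helper).

    `table` and `column` are interpolated directly, so they must be trusted
    identifiers; only the search pattern is passed as a query parameter.
    """
    # unaccent() is applied to both the stored value and the pattern, so
    # "Bouchee" and "Bouchée" compare equal inside PostgreSQL itself.
    return (
        'SELECT id FROM "%s" WHERE unaccent("%s") ILIKE unaccent(%%s)'
        % (table, column)
    )


query = build_unaccent_search("product_template", "name")
params = ("%Bouchee a la reine%",)
print(query)
# With a DB-API cursor this would be run as: cursor.execute(query, params)
```

The point of doing it this way is that the matching stays in the database, with no Python involved in the comparison loop.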

On Thursday, 1 April 2010, Frederic D. wrote:
> Christophe> It's a PostgreSQL issue, not an OpenERP one
>
> ??? PostgreSQL doesn't have to be user-friendly. An ERP, by definition,
> has to be user-friendly, even if the technology chosen has unresolved
> problems.
>
> Searching without accents is a normal need.
>
> Please stop answering that this is not an ERP bug when the need is
> normal and already met by all the other good competitors.
>
> My proposal isn't the best one because, as you say, it could be a disaster
> for Asian countries. Anyway, we should not have to get used to results
> that differ from the search criteria, or have to know whether products
> were created with or without accents.
>

Albert Cervera i Areny
http://www.NaN-tic.com
OpenERP Partner
Mòbil: +34 669 40 40 18

Cloves Almeida (cjalmeida) wrote on 2010-04-01:

#10

For me, field content translation per se is a bug ;)

As for non-accented search, a more elegant solution would be to provide a module that overloads the search method when querying the specific columns. This way you make it more flexible and don't "force" the feature on audiences that don't want it.

You could even use Whoosh for better full-text search.

Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote on 2010-04-01:

#11

On Thursday, 1 April 2010, Cloves Almeida wrote:
> For me, field content translation per se is a bug ;)
>
> As for non-accented search, a more elegant solution would be to provide a
> module that overloads the search method when querying the specific columns.
> This way you make it more flexible and don't "force" the feature on
> audiences that don't want it.
>
> You could even use Whoosh for better full-text search.
>

Koo already provides full text search functionalities using Postgres' own Full
Text Search capabilities which are already integrated and quite powerful.

Ferdinand (office-chricar) wrote on 2010-04-01:

#12

There is definitely an issue with copying resources (products), IMHO, for all languages other than English, as the user does not realize that he has to manually correct both the English value and the one in his own language.
Correcting only one causes real headaches for those accessing the data in another language.

We have discussed this in the usability group - see also
https://blueprints.launchpad.net/openobject-server/+spec/translation-update

Olivier Dony (Odoo) (odo-openerp) wrote on 2010-04-02:

#13

Ferdinand @ ChriCar wrote:
> There is definitely an issue with copying resources (products), IMHO, for all languages other than English

Yes, and this is the real issue with item 1, I think. If a field is translated and people don't use the translation, we should provide an unobtrusive solution to it, rather than crippling the search feature, which could be quite useful when people *do* use the translations. That's what I meant in my first reply.

Frederic, one cleaner workaround if you don't use the translations is to make a module that removes the "translate" flag on product.template.name, or automatically synchronizes the translations when you change one of them.
But there is still the general usability issue with copying objects with translated fields, so I guess we need a global solution to make this intuitive.

In any case, this must be discussed for trunk and not for stable (and the same goes for item 2).

Frédéric (Ferme du Sart) (frederic-declercq) wrote on 2010-04-02:

#14

Thanks to all for your replies and advice.

I agree with Olivier:
- forcing fields.char and fields.text to translate=False in a module can resolve all translation problems when we don't need it.

Then I can wait for a more standard solution (PostgreSQL or OpenERP).

Numérigraphe (numerigraphe) wrote on 2010-04-02:

#15

Frederic, I agree that it should be possible to turn "off" translations (maybe with a server command-line option), or to change the main locale to that of the company.
Please, I beg you, do join in on bug 400256 - I'm having a hard time getting Tiny to accept this.

I humbly suggest we concentrate this bug report on the issue of searching with accented characters.
Lionel

xrg (xrg) wrote on 2010-04-03:

#16

On Thursday 01 April 2010, you wrote:
> It's a PostgreSQL issue, not OpenERP one
>
Totally agree. The pg team has already acknowledged that this feature is missing
from the db (it has to do with the libs used by pg 8.x for international conversions).

Any attempt to do that in python will just be an ugly, sub-optimal hack. That
is, when we search in the db, there can be no python intervention in the
matching (ilike op) loop. And, of course, partially solving a few iso8859-1
characters is useless for other languages.

Stephane Wirtel (OpenERP) (stephane-openerp) on 2010-04-06

Changed in openobject-server:
milestone:	5.0.9 → 5.0.10

Anup(SerpentCS) (anup-serpent) on 2010-04-16

Changed in openobject-server:
status:	Confirmed → In Progress

Anup(SerpentCS) (anup-serpent) wrote on 2010-04-19:

#17

Hello Everyone,

For the first problem we have a solution similar to the patch suggested by Frederic. It's a generalized solution and will not be harmful for any language.
For the second, it's not possible to do so, as there is no general way to convert "é" to "e". That mapping holds only for French; for other non-English characters, one cannot search using English letters directly converted to a particular language. So I suggest the second item is not a problem at all.

You'll get the results whose translations in ir_translation match the search letters, and also the records that have no translation and match the letters in the particular model.

Here I have attached a clearer patch. Would you please check it and notify?

Thanks.

Anup(SerpentCS) (anup-serpent) wrote on 2010-04-19:

#18

accent_expression_patch.diff (1.6 KiB, text/plain)

Jay Vora (Serpent Consulting Services) (jayvora) on 2010-04-30

Changed in openobject-server:
importance:	Undecided → Medium

xrg (xrg) wrote on 2010-04-30:

#19

On Friday 30 April 2010, you wrote:
> ** Changed in: openobject-server
> Importance: Undecided => Medium
>
>Bug description:
>Scenario:
> - All users using french language

Scenario 2:
- Some user uses a language other than French...

This is where this patch only adds some ugly overhead.

Stephane Wirtel (OpenERP) (stephane-openerp) on 2010-05-05

Changed in openobject-server:
milestone:	5.0.10 → 5.0.11

Jay Vora (Serpent Consulting Services) (jayvora) wrote on 2010-05-06:

#20

search_parsing_improved.patch (5.9 KiB, text/plain)

Hello Panos,

Would you please check with this patch?

If a non-English language preference is set, it will always search for the translated value.

And for accented characters, it's working all right.

Thanks.

Stephane Wirtel (OpenERP) (stephane-openerp) on 2010-06-08

Changed in openobject-server:
milestone:	5.0.11 → 5.0.12

xrg (xrg) wrote on 2010-08-05:

#21

Let's see what behaviour we consider ideal:

Say we have a table tbl1(id=4, name='Belgium'), and translations ir_translation(res_id=4, lang='fr_FR', value='Belgique'), (res_id=4, lang='nl_NL', value='België').

We would want that:
    search(['name' = 'Belgique'], context.lang='fr_FR') == True
    search(['name' = 'Belgium'], context.lang='fr_FR') == False # !
    search(['name' = 'België'], context.lang='fr_FR') == False # no cross-lang
    search(['name' = 'België'], context.lang='nl_NL') == True
    search(['name' = 'Belgie'], context.lang='nl_NL') == True # ignore accent
    search(['name' = 'Belgium'], context.lang='el_GR') == True # no translation, so default
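
The rules listed above can be sketched in plain Python. This is only an in-memory illustration under stated assumptions: the dictionaries stand in for tbl1 and ir_translation, and the function names are hypothetical, not the framework's search().

```python
import unicodedata

# Hypothetical in-memory stand-ins for tbl1 and ir_translation.
RECORDS = {4: "Belgium"}
TRANSLATIONS = {
    (4, "fr_FR"): "Belgique",
    (4, "nl_NL"): "België",
}


def unaccent(text):
    """Strip combining accents, e.g. "België" -> "Belgie"."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def search(name, lang):
    """Return ids whose name, as seen in `lang`, equals `name` ignoring accents."""
    matches = []
    for res_id, source in RECORDS.items():
        # Only the translation for the requested language is considered;
        # the source value is used when that language has no translation.
        value = TRANSLATIONS.get((res_id, lang), source)
        if unaccent(value) == unaccent(name):
            matches.append(res_id)
    return matches


print(search("Belgie", "nl_NL"))
print(search("Belgium", "fr_FR"))
```

The key design choice is that the source value is a fallback, not an extra candidate: once a translation exists for the user's language, the untranslated 'Belgium' no longer matches.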

Ferdinand (office-chricar) wrote on 2010-08-05:

#22

We just had a case today where we needed to convert UTF-8 to ASCII, and found this:

http://fraggod.net/oss/projects/unicode2ascii.py

IMHO a possible search workflow would be
* search exact matching - default
* search fuzzy matching - extra button in the search window (using postgres features or unicode2ascii.py)

and/or an attribute at the database/company/group/user level defining which of the methods should be the default.

BTW I agree with Albert
https://bugs.launchpad.net/openobject-server/+bug/553133/comments/11

Ferdinand (office-chricar) wrote on 2010-08-06:

#23

Another reason for needing fuzzy matching in languages like French:
* the keys for certain letters like ¢, æ, Ç are not easily available on the keyboard of the user (international companies, cross-language support)
* non-native speakers might not know the correct spelling - of course this goes beyond just replacing single characters and calls for more sophisticated phonetic matching.

xrg (xrg) wrote on 2010-08-06:

#24

On Thursday 05 August 2010, you wrote:
> We just had a case today where we needed to convert UTF-8 to ASCII, and found this:
>
> http://fraggod.net/oss/projects/unicode2ascii.py

That's still *only* for Latin letters. Here, we have a huge problem with
non-matching Greek accented ones. And, to make things worse, people don't
input the accents right most of the time. So, our search is broken.

> IMHO a possible search workflow would be
> * search exact matching - default
> * search fuzzy matching - extra button in the search window (using postgres
> features or unicode2ascii.py)
> and/or an attribute at the database/company/group/user level defining which
> of the methods should be the default.
I believe that case-insensitive and accent-insensitive search could be the
default. As an exception, in "code" fields, strings should match exactly. That
would be a framework enhancement.

>
> BTW I agree with Albert
> https://bugs.launchpad.net/openobject-server/+bug/553133/comments/11

Borja López Soilán (NeoPolus) (borjals) wrote on 2010-08-06:

#25

By the way, have you tried the "full text search" of the Koo (KDE OpenObject) client? It uses the Postgres text indexing capabilities (nothing we should use on every field, but useful for partner names), so if you search for "Lopez" it will match both "López" and "Lopez", but also if you search for "Eléctricos" (plural) it will match "eléctrico" (singular)...

Maybe the framework should allow specifying that a (text/char) field be indexed like that (maybe something like "field.char('Name', size=128, human_search=True)"). And the OpenERP clients should be nice enough to use such 'human' search by default for these fields (otherwise the standard exact search should be used).
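
To make the idea concrete, here is a minimal sketch of a per-field switch. Everything in it is hypothetical: CharField is a stand-in class, not the framework's field type, and human_search is only the attribute name floated above. It folds case and accents in memory and does not reproduce the stemming (singular/plural matching) that a full text index provides.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class CharField:
    """Hypothetical field descriptor carrying the proposed human_search flag."""
    label: str
    size: int = 128
    human_search: bool = False


def fold(text):
    """Strip combining accents, then fold case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.casefold()


def field_matches(field, stored, term):
    """Substring match: folded for human_search fields, character-exact otherwise."""
    if field.human_search:
        return fold(term) in fold(stored)
    return term in stored


name = CharField("Name", human_search=True)
code = CharField("Code")
print(field_matches(name, "Borja López", "lopez"))
print(field_matches(code, "López", "Lopez"))
```

Keeping the flag on the field lets code-like columns stay exact while names get the forgiving comparison.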

Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote on 2010-08-07:

#26

On Friday, 6 August 2010, Borja López Soilán (Pexego) wrote:
> By the way, have you tried the "full text search" of the Koo (KDE
> OpenObject) client? It uses the Postgres text indexing capabilities
> (nothing we should use on every field, but useful for partner names), so
> if you search for "Lopez" it will match both "López" and "Lopez", but
> also if you search for "Eléctricos" (plural) it will match "eléctrico"
> (singular)...

Borja, I don't think that's true. This capability is not available in
PostgreSQL yet, though it will be in the upcoming 9.0 release.

xrg (xrg) wrote on 2010-08-08:

#27

On Friday 06 August 2010, you wrote:
> By the way, have you tried the "full text search" of the Koo (KDE
> OpenObject) client? It uses the Postgres text indexing capabilities
> (nothing we should use on every field, but useful for partner names), so
> if you search for "Lopez" it will match both "López" and "Lopez", but
> also if you search for "Eléctricos" (plural) it will match "eléctrico"
> (singular)...

Interesting

> Maybe we should allow (from the framework side) to specify that a
> (text/char) field should be indexed like that (maybe something like
> "field.char('Name', size=128, human_search=True)"). And the OpenERP
> clients should be nice enough to use such 'human' search by default for
> these fields (otherwise the standard exact search should be used).

Yes, I was thinking of a field attribute (like the "human_search" you say),
too.

Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote on 2010-08-08:

#28

On Sunday, 8 August 2010, xrg wrote:
> > Maybe we should allow (from the framework side) to specify that a
> > (text/char) field should be indexed like that (maybe something like
> > "field.char('Name', size=128, human_search=True)"). And the OpenERP
> > clients should be nice enough to use such 'human' search by default for
> > these fields (otherwise the standard exact search should be used).
>
> Yes, I was thinking of a field attribute (like the "human_search" you
> say), too.

I wouldn't personally go that way because I prefer to configure that for each
customer as needs may change, but something similar already exists. It cannot
be used in the source code, but it's possible to configure FTS from the fields
view in the administration menu. There you set not only whether the field has
to be indexed, but also which priority (A, B, C or D) should be used. Using
Koo's massive updates, configuring FTS takes just a few minutes.

Borja López Soilán (NeoPolus) (borjals) wrote on 2010-08-09:

#29

Full text search human search example.png (25.7 KiB, image/png; name="Full text search human search example.png")

Albert Cervera i Areny - http://www.NaN-tic.com wrote:
> On Friday, 6 August 2010, Borja López Soilán (Pexego) wrote:
>
>> By the way, have you tried the "full text search" of the Koo (KDE
>> OpenObject) client? It uses the Postgres text indexing capabilities
>> (nothing we should use on every field, but useful for partner names), so
>> if you search for "Lopez" it will match both "López" and "Lopez", but
>> also if you search for "Eléctricos" (plural) it will match "eléctrico"
>> (singular)...
>>
>
> Borja, I don't think that's true. This capability is not available in
> PostgreSQL yet, though it will be in the upcoming 9.0 release.
>
>
Albert, we are using Postgres 8.4 and it actually works that way for us!
You can search for a plural or singular and it will match both (though
with different scores)!

See the attached image: we search for "tunel" (mistyped way of writing
tunnel in Spanish, as it should be "túnel") and it matched "túneles"
(plural for tunnel in Spanish) too! :D

Albert Cervera i Areny - http://www.NaN-tic.com (albert-nan) wrote on 2010-08-09:

#30

On Monday, 9 August 2010, Borja López Soilán (Pexego) wrote:
> Albert, we are using Postgres 8.4 and it actually works that way for us!
> You can search for a plural or singular and it will match both (though
> with different scores)!

I think it's the Spanish version of Snowball (the library that looks for the
root of a word) that is "simulating" the unaccent functionality here. For
example, if you try to write the Catalan word "diferència", you won't find it,
simply because the Spanish version of Snowball cannot understand the 'è'
character.

Something more robust, such as the upcoming 'unaccent' module, which won't be
available until PostgreSQL 9.0, should be used:

http://www.postgresql.org/docs/9.0/static/unaccent.html

Anup(SerpentCS) (anup-serpent) on 2010-12-31

Changed in openobject-server:
milestone:	5.0.12 → none
assignee:	Anup (OpenERP) (ach-openerp) → nobody

Vinay Rana (OpenERP) (vra-openerp) on 2011-01-04

Changed in openobject-server:
assignee:	nobody → OpenERP's Framework R&D (openerp-dev-framework)
importance:	Medium → Wishlist
status:	In Progress → Triaged

Numérigraphe (numerigraphe) wrote on 2012-03-20:

#31

Hasn't this been fixed recently?
Lionel.

Olivier Dony (Odoo) (odo-openerp) wrote on 2012-03-20:

#32

On 03/20/2012 09:01 AM, Numérigraphe wrote:
> Hasn't this been fixed recently?

You're right, support for the 'unaccent' module of Postgres was added in 6.1 at revision [1] and can be turned on with the --unaccent startup parameter. Thanks for spotting it!

[1] server rev. 3642 rev-id: <email address hidden>
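
A rough sketch of what such a switch implies for query generation, assuming a boolean setting that mirrors the --unaccent parameter. The function names are hypothetical and this is not the code from the revision cited above.

```python
def unaccent_wrapper(enabled):
    """Return a function that wraps an SQL expression in unaccent() when enabled."""
    if enabled:
        return lambda expr: "unaccent(%s)" % (expr,)
    # Feature off: leave the expression untouched.
    return lambda expr: expr


def ilike_clause(column, enabled):
    """Render a WHERE fragment with one %s placeholder for the search pattern."""
    wrap = unaccent_wrapper(enabled)
    # Both the column and the placeholder are wrapped, so the stored value
    # and the user's pattern are unaccented the same way.
    return "%s ILIKE %s" % (wrap('"%s"' % column), wrap("%s"))


print(ilike_clause("name", True))
print(ilike_clause("name", False))
```

Making the wrapper a no-op when the option is off keeps the generated SQL unchanged for databases where the unaccent module is not available.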