Implicit ANDs should have higher precedence than explicit ORs

Bug #1040740 reported by Jared Camins-Esakov
This bug affects 1 person
Affects    Status   Importance  Assigned to  Milestone
Evergreen  Triaged  Wishlist    Unassigned

Bug Description

When the implicit ANDs added by QueryParser between terms are combined
with explicit ORs without any explicit grouping, the results can be
unexpected and undesirable. For example, the following query:

  harry potter and the chamber of secrets || sorcerer's stone

is translated into a query with two branches:
1. harry potter and the chamber of secrets stone
2. sorcerer's stone

This is of course nothing like what the user was expecting, which were
the following two branches:
1. harry potter and the chamber of secrets
2. sorcerer's stone

(Note: of course the user probably wanted to search for "harry potter
and the chamber of secrets" or "harry potter and the sorcerer's stone"
but we have to draw the line on implicit grouping somewhere, and where
it requires reading minds seems like a good place)
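
The requested precedence can be illustrated with a minimal sketch. This is not the Evergreen QueryParser code (which is written in Perl) and not the patch itself; `parse_query` and `to_explicit` are hypothetical names, and the sketch assumes that terms are whitespace-separated, that `||` appears as its own token, and that `&&` is how an explicit AND is written. Splitting on the explicit OR first and ANDing the terms inside each piece is what gives implicit AND the higher precedence:

```python
def parse_query(query: str) -> list[list[str]]:
    """Return the OR branches of a query.

    Each branch is a list of terms joined by implicit AND. Splitting on
    "||" first means implicit AND binds tighter than explicit OR.
    """
    branches: list[list[str]] = [[]]
    for token in query.split():
        if token == "||":
            # An explicit OR closes the current branch and opens a new one.
            branches.append([])
        else:
            # Any other token is a term, implicitly ANDed with its neighbours.
            branches[-1].append(token)
    # Drop empty branches left by a leading, trailing, or doubled "||".
    return [branch for branch in branches if branch]


def to_explicit(branches: list[list[str]]) -> str:
    """Render the branches with explicit grouping and operators."""
    return " || ".join("(" + " && ".join(branch) + ")" for branch in branches)


if __name__ == "__main__":
    query = "harry potter and the chamber of secrets || sorcerer's stone"
    print(to_explicit(parse_query(query)))
```

For the example query this yields the two branches the user expected: one with the seven terms of "harry potter and the chamber of secrets" and one with "sorcerer's stone".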

A patch modifying the QueryParser can be found in working/user/jcamins/queryparser. Unfortunately, this patch causes problems with the Pg QueryParser driver: the search "code || law" returns fewer results after the patch than before.

tags: added: wishlist
Jared Camins-Esakov (jcamins) wrote :

With the patch, the driver generates the following SQL:

2012-08-21 16:01:36 EDT LOG: statement: SELECT * -- bib search: #CD_documentLength #CD_meanHarmonic #CD_uniqueWords keyword: code || law depth(0) estimation_strategy(inclusion) limit(1000) core_limit(10000)
           FROM search.query_parser_fts(
                     1::INT,
                     0::INT,
                     $core_query_20111$
 WITH x9411690_keyword_xq AS (SELECT
       to_tsquery('keyword', COALESCE(NULLIF( '(' || btrim(regexp_replace(search_normalize(split_date_range($_20111$code$_20111$)),E'(?:\\s+|:)','&','g'),'&|') || ')', '()'), '')) AS tsq ), x941bc30_keyword_xq AS (SELECT
       to_tsquery('keyword', COALESCE(NULLIF( '(' || btrim(regexp_replace(search_normalize(split_date_range($_20111$law$_20111$)),E'(?:\\s+|:)','&','g'),'&|') || ')', '()'), '')) AS tsq )
 SELECT m.source AS id,
         ARRAY[m.source] AS records,
         1.0/((AVG(
     (COALESCE(ts_rank_cd(x9411690_keyword.index_vector, x9411690_keyword.tsq, 14) * x9411690_keyword.weight, 0.0)
          * /* word_order */ COALESCE(NULLIF( (search_normalize(x9411690_keyword.value) ~ (search_normalize($_20111$code$_20111$))), FALSE )::INT * 10, 1))+
     (COALESCE(ts_rank_cd(x941bc30_keyword.index_vector, x941bc30_keyword.tsq, 0) * x941bc30_keyword.weight, 0.0)
          * /* word_order */ COALESCE(NULLIF( (search_normalize(x941bc30_keyword.value) ~ (search_normalize($_20111$law$_20111$))), FALSE )::INT * 10, 1))
   )+1 * COALESCE( NULLIF( FIRST(mrd.attrs @> hstore('item_lang', $_20111$eng$_20111$)), FALSE )::INT * 5, 1)))::NUMERIC AS rel,
         1.0/((AVG(
     (COALESCE(ts_rank_cd(x9411690_keyword.index_vector, x9411690_keyword.tsq, 14) * x9411690_keyword.weight, 0.0)
          * /* word_order */ COALESCE(NULLIF( (search_normalize(x9411690_keyword.value) ~ (search_normalize($_20111$code$_20111$))), FALSE )::INT * 10, 1))+
     (COALESCE(ts_rank_cd(x941bc30_keyword.index_vector, x941bc30_keyword.tsq, 0) * x941bc30_keyword.weight, 0.0)
          * /* word_order */ COALESCE(NULLIF( (search_normalize(x941bc30_keyword.value) ~ (search_normalize($_20111$law$_20111$))), FALSE )::INT * 10, 1))
   )+1 * COALESCE( NULLIF( FIRST(mrd.attrs @> hstore('item_lang', $_20111$eng$_20111$)), FALSE )::INT * 5, 1)))::NUMERIC AS rank,
         FIRST(mrd.attrs->'date1') AS tie_break
   FROM metabib.metarecord_source_map m
   LEFT JOIN (
     SELECT fe.*, fe_weight.weight, x9411690_keyword_xq.tsq /* search */
       FROM metabib.keyword_field_entry AS fe
       JOIN config.metabib_field AS fe_weight ON (fe_weight.id = fe.field)
       JOIN x9411690_keyword_xq ON (fe.index_vector @@ x9411690_keyword_xq.tsq)
   ) AS x9411690_keyword ON (m.source = x9411690_keyword.source)
   LEFT JOIN (
     SELECT fe.*, fe_weight.weight, x941bc30_keyword_xq.tsq /* search */
       FROM metabib.keyword_field_entry AS fe
       JOIN config.metabib_field AS fe_weight ON (fe_weight.id = fe.field)
       JOIN x941bc30_keyword_xq ON (fe.index_vector @@ x941bc30_keyword_xq.tsq)
   ) AS x941bc30_keyword ON (m.source = x941bc30_keyword.source)
         INNER JOIN metabib.record_attr mrd ON (m.source = mrd.id AND ((x941bc30_keywo...


Changed in evergreen:
importance: Undecided → Wishlist
Steven Chan (schan2)
summary: - Implicit and should have higher precedence than explicit or
+ Implicit ANDs should have higher precedence than explicit ORs
Mike Rylander (mrylander) wrote :

I've been working on a branch that incorporates the most important bits of Jared's work on the core and extends it further. Specifically, in addition to properly supporting explicit and implicit bool precedence as stated above, it provides stable canonicalization and a mechanism for floating query components to the top level of the query while keeping them syntactically distinct. This last is primarily meant to allow a single query string to have separate user- and system-supplied components, particularly of the sort that Evergreen uses for per-query configuration information.

http://git.evergreen-ils.org/?p=working/Evergreen.git;a=shortlog;h=refs/heads/collab/miker/QP-bool-pushdown

Changed in evergreen:
status: New → Triaged
Terran McCanna (tmccanna) wrote :

Looks like this one was never tested - might be worth dusting off.

tags: added: search
tags: added: needsrepatch
tags: removed: wishlist
tags: added: needsrebse
removed: needsrepatch
tags: added: needsrebase
removed: needsrebse