Evergreen

ISBN searching - mixed results..

Bug #833045 reported by George Duimovich on 2011-08-24

This bug affects 1 person

Affects: Evergreen
Status: Confirmed
Importance: Wishlist
Assigned to: Unassigned
Milestone: none

Bug Description

EG 2.0.8
OpenSRF 2.0.1

For our bib record ID 7798179, we have this ISBN:
020 . ‡a0-8412-1632-0

We can find this ISBN successfully these ways:

0-84121632-0 works
0841216320 works

But not these ways:
0-8412 1632-0
0 8412 1632 0

It's not uncommon for book jackets to print the ISBN space-delimited without dashes ("0 8412 1632 0"), and of course data entry practices are mixed among both cataloguers and searchers (dash vs. no dash, etc.).

And don't get me started on the text strings that sometimes follow ISBNs in our data, although those ISBNs seem to be perfectly findable. For example:
<datafield tag="020" ind1=" " ind2=" "><subfield code="a">0803109288 (soft)</subfield></datafield>
<datafield tag="020" ind1=" " ind2=" "><subfield code="a">3540656049 (softcover : alk. paper)</subfield></datafield>

Tags: search

Mike Rylander (mrylander) wrote on 2011-08-24:

Does adding a "remove spaces" normalizer, in addition to the "remove dashes" one, help? If so, it's just configuration, which might be worth adding to the stock data.
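A minimal sketch of the suggestion, assuming each normalizer is a plain string-to-string function applied in order to both the indexed 020 ‡a value and the search term, and treating a match as equality of the two normalized strings (a simplification of how the search index actually compares terms). The names `remove_dashes`, `remove_spaces`, `normalize`, and `matches` are hypothetical stand-ins, not Evergreen's normalizer implementation.

```python
# Hypothetical stand-ins for index normalizers; not Evergreen's implementation.

def remove_dashes(value: str) -> str:
    """Drop every hyphen, like a 'remove dashes' normalizer."""
    return value.replace("-", "")


def remove_spaces(value: str) -> str:
    """Drop all whitespace, like the proposed 'remove spaces' normalizer."""
    return "".join(value.split())


def normalize(value: str, normalizers) -> str:
    """Apply each normalizer in order, as a configured chain would."""
    for fn in normalizers:
        value = fn(value)
    return value


def matches(query: str, stored: str, normalizers) -> bool:
    """Simplified match: both sides must normalize to the same string."""
    return normalize(query, normalizers) == normalize(stored, normalizers)


STORED = "0-8412-1632-0"  # 020 ‡a from bib record 7798179
QUERIES = ["0-84121632-0", "0841216320", "0-8412 1632-0", "0 8412 1632 0"]
DASHES_ONLY = [remove_dashes]
DASHES_AND_SPACES = [remove_dashes, remove_spaces]

if __name__ == "__main__":
    for q in QUERIES:
        print(q, matches(q, STORED, DASHES_ONLY), matches(q, STORED, DASHES_AND_SPACES))
```

Run as a script, it prints one line per query: the dash-only chain gives True for the first two forms and False for the two space-delimited ones (the same split George reports), while the two-normalizer chain gives True for all four.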

George Duimovich (george-duimovich) wrote on 2011-08-24:

re: normalizer - I would think so.
FWIW, Robot Librarian had some normalizer code posted somewhere, as well as a good read on ISBN data: http://robotlibrarian.billdueber.com/?s=isbn

Dan Scott (denials) wrote on 2011-08-25: Re: [Bug 833045] [NEW] ISBN searching - mixed results..

Wow. I have not run into ISBNs with spaces before, either in MARC
records or in search queries. I guess it's possible. But as George
suggests, simply adding a "remove spaces" normalizer would screw up the
"### (pbk.)" values in the MARC records. (If people enter "####
(pbk.)" into a search query, then they deserve what they get.)
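The concern can be illustrated under one more assumption: that the index splits each normalized value on whitespace into separate terms, and a query matches when all of its terms are present. Under that model (a simplification, with hypothetical function names), a trailing qualifier such as the one in `0803109288 (soft)` is harmless while spaces are kept, because the ISBN stays a term of its own; removing spaces first glues the qualifier onto the number, and a plain ISBN query no longer finds it.

```python
# Hypothetical model of a whitespace-tokenized index; not Evergreen's implementation.

def remove_dashes(value: str) -> str:
    return value.replace("-", "")


def remove_spaces(value: str) -> str:
    return "".join(value.split())


def index_terms(value: str, normalizers) -> list:
    """Normalize in order, then split on whitespace into index terms."""
    for fn in normalizers:
        value = fn(value)
    return value.split()


def found(query: str, stored: str, normalizers) -> bool:
    """A query is found when every one of its terms is among the stored terms."""
    stored_terms = set(index_terms(stored, normalizers))
    return all(term in stored_terms for term in index_terms(query, normalizers))


STORED = "0803109288 (soft)"  # 020 ‡a value quoted in the bug description

if __name__ == "__main__":
    print(index_terms(STORED, [remove_dashes]))                 # ['0803109288', '(soft)']
    print(index_terms(STORED, [remove_dashes, remove_spaces]))  # ['0803109288(soft)']
    print(found("0803109288", STORED, [remove_dashes]))                 # True
    print(found("0803109288", STORED, [remove_dashes, remove_spaces]))  # False
```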

I suppose we could add a 3rd normalizer, which, after all spaces and
hyphens have been removed, would then try the following matches in
order and take the first successful match:

a. 13 digits
b. 12 digits followed by an X
c. 10 digits
d. 10 digits followed by an X

*sigh*
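The third normalizer can be sketched as a regular expression applied to the squeezed value. As written, the list includes "12 digits followed by an X" and "10 digits followed by an X"; since an ISBN-13 check digit is always 0 to 9 and an ISBN-10 ending in X has nine digits before the X, this sketch assumes the shapes are 13 digits, or 9 digits followed by a digit or X, tried in that order so an ISBN-13 is never cut down to 10 digits. The function name `extract_isbn` is hypothetical, and the sketch does not validate check digits.

```python
import re
from typing import Optional

# Assumed ISBN shapes: 13 digits (ISBN-13), or 9 digits plus a digit or X
# (ISBN-10). Alternatives are tried left to right, so the 13-digit form wins
# when both could match; the lookahead rejects a run that keeps going with
# more digits (e.g. a 12-digit typo).
ISBN_SHAPE = re.compile(r"(\d{13}|\d{9}[\dX])(?![\dX])")


def extract_isbn(raw: str) -> Optional[str]:
    """Hypothetical third normalizer: remove spaces and hyphens, then keep the
    leading ISBN-shaped run, or return None if the value does not start with one."""
    # upper() so a lowercase x check digit still matches the pattern
    squeezed = re.sub(r"[\s-]", "", raw).upper()
    match = ISBN_SHAPE.match(squeezed)
    return match.group(1) if match else None


if __name__ == "__main__":
    for raw in ["0-8412 1632-0", "0 8412 1632 0", "0803109288 (soft)",
                "3540656049 (softcover : alk. paper)", "(pbk.)"]:
        print(repr(raw), "->", extract_isbn(raw))
```

Both space-delimited query forms reduce to the same key as the stored 0-8412-1632-0, and the qualified values from the bug description reduce to their bare ISBNs. A value with no leading ISBN gives None, which a real normalizer would still need to handle, for example by passing the value through unchanged.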

George Duimovich (george-duimovich) wrote on 2011-08-25:

I guess at some point a line has to be drawn as to how much "bad data" can be anticipated and accommodated versus library shops just fixing their data.

Here's another example from poking around a bit more. I found over 5,400 MARC 020 fields with data in this format (i.e., a trailing colon), which might (?) present a problem for a remove-spaces normalizer:

020 . ‡a1551050420 : ‡c24.95

This ISBN is perfectly findable in EG right now, but it's definitely a good, easy target for cleanup, I think. But wait, and grrr: looking at sample records, it's clear that in many or most cases the embedded ":" is there for display purposes!

But the big data boss in the sky doesn't like that, so he commands that we change the standard, even if the standard won't change itself. IMHO, that colon should be display-only in our shop: if ‡c is present, add the colon, etc.
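The display-only idea could look something like the following sketch, assuming the stored 020 ‡a carries only the ISBN and any qualifier, and the " : " separator is generated only when ‡c (terms of availability, here the price) is present. Both function names are hypothetical; this is not Evergreen's display code.

```python
from typing import Optional


def clean_020a(value: str) -> str:
    """Drop trailing spaces and colons (display punctuation) from a stored 020 ‡a."""
    return value.rstrip(" :")


def display_020(subfield_a: str, subfield_c: Optional[str] = None) -> str:
    """Build the display string, adding ' : ' before ‡c only when ‡c is present."""
    isbn = clean_020a(subfield_a)
    return f"{isbn} : {subfield_c}" if subfield_c else isbn


if __name__ == "__main__":
    print(display_020("1551050420 :", "24.95"))  # 1551050420 : 24.95
    print(display_020("1551050420 :"))           # 1551050420
```

Because `rstrip` only touches the end of the string, colons inside a parenthesized qualifier such as `(softcover : alk. paper)` are left alone.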

Also, FWIW, only a small number of our ISBNs have spaces instead of dashes.

thx

Jason Stephenson (jstephenson) wrote on 2012-07-18:

I am setting this to Incomplete because I am not certain whether this is a bug report at this point.

Changed in evergreen:
status:	New → Incomplete

Jane Sandberg (sandbergja) wrote on 2018-05-19:

I moved this to Wishlist, since it's a feature request.

Changed in evergreen:
status:	Incomplete → Confirmed
importance:	Undecided → Wishlist
tags:	added: search