Issue372

Title Default search missed matching issue title for "capi"
Priority bug Status chatting
Superseder Nosy List ezio.melotti, loewis, ncoghlan, rmtew, techtonik
Assigned To Topics

Created on 2011-01-30.12:43:46 by ncoghlan, last changed 2012-04-22.05:17:31 by ncoghlan.

Messages
msg1899 (view) Author: ncoghlan Date: 2011-01-30.12:43:45
From http://posted-stuff.blogspot.com/2011/01/bug-reporting-energy-depleted.html

The author of that blog post tried to search for "capi" to see if a particular crash had already been reported. The default search and a search in all fields failed to find it, but a search specifically in the title text found it.

Default search:
http://bugs.python.org/issue?%40columns=id,activity,title,creator,assignee,status,type&%40sort=-activity&%40filter=status&%40action=searchid&ignore=file:content&%40search_text=capi&submit=search&status=-1,1,2,3

All text:
http://bugs.python.org/issue?%40search_text=capi&ignore=file:content&title=&%40columns=title&id=&%40columns=id&stage=&creation=&creator=&activity=&%40columns=activity&%40sort=activity&actor=&nosy=&type=&components=&versions=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&nosy_count=&message_count=&%40pagesize=50&%40startwith=0&%40queryname=&%40old-queryname=&%40action=search

Title only:
http://bugs.python.org/issue?%40search_text=&ignore=file:content&title=capi&%40columns=title&id=&%40columns=id&stage=&creation=&creator=&activity=&%40columns=activity&%40sort=activity&actor=&nosy=&type=&components=&versions=&dependencies=&assignee=&keywords=&priority=&%40group=priority&status=1&%40columns=status&resolution=&nosy_count=&message_count=&%40pagesize=50&%40startwith=0&%40queryname=&%40old-queryname=&%40action=search
msg1900 (view) Author: techtonik Date: 2011-01-31.16:46:40
Google Search for roundup?
msg1901 (view) Author: ezio.melotti Date: 2011-02-01.16:17:35
My guess is that the matching algorithm used in the "title" search is more permissive than the one used for the "All text" search.
For the title search the algorithm seems to be something like:
  for term in title_search_query.split():
    if term in issue_title:
      pass  # match
(that's why "urllib/httplib header capitalization" is included in the results),
whereas in the normal search all the text is split into words in order to match whole words only.
The search is handled in http://svn.python.org/view/tracker/roundup-src/roundup/cgi/actions.py?view=markup (in the SearchAction class), but I couldn't find anything useful there.
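To make the difference between the two behaviours concrete, here is a minimal sketch. It is not Roundup's code: the function names substring_match and whole_word_match are hypothetical, and the word splitting is a simplified assumption about how a whole-word index would behave.

```python
import re


def substring_match(query, title):
    """Permissive match: every query term is a substring of the title."""
    title = title.lower()
    return all(term in title for term in query.lower().split())


def whole_word_match(query, text):
    """Strict match: every query term must equal a whole word of the text."""
    # Assumption: words are maximal runs of \w characters.
    words = set(re.findall(r"\w+", text.lower()))
    return all(term in words for term in query.lower().split())


title = "urllib/httplib header capitalization"
print(substring_match("capi", title))   # True: "capi" is inside "capitalization"
print(whole_word_match("capi", title))  # False: no whole word equals "capi"
```

Under these assumptions, the title search behaves like the first function and the default search like the second, which would explain the differing result lists.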
msg1906 (view) Author: rmtew Date: 2011-02-04.03:15:09
Note also that case sensitivity seems to be getting enforced.  Today, I searched for "UDP" and did not find any matches, then I searched for "udp" and found matches.  This hinders location of relevant matches, and should also be addressed.
msg2482 (view) Author: loewis Date: 2012-04-22.05:00:38
rmtew: search is definitely case insensitive. I cannot reproduce your report; the search for "UDP" yields plenty of results.

The issue here is that the search only looks for whole words indeed, with words being defined by r'(?u)\b\w{%d,%d}\b'. As the underscore counts as a word letter, search for "capi" doesn't find "test_capi".

I don't think that having a complete substring search is desirable. However, IMO, _ should not be considered as a letter; alternatively each word should be split around underscores, and these should be added to the words of a property as well.
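A rough sketch of the behaviour described above and of the second suggested fix (splitting each word around underscores and indexing the pieces as well). The length bounds MIN_LEN and MAX_LEN are assumed values for illustration, and index_words and index_words_split are hypothetical helpers, not functions from Roundup's indexer.

```python
import re

# Assumed bounds for illustration; the real values are filled into %d,%d by the indexer.
MIN_LEN, MAX_LEN = 2, 25
WORD_RE = re.compile(r'(?u)\b\w{%d,%d}\b' % (MIN_LEN, MAX_LEN))


def index_words(text):
    """Current behaviour: '_' is matched by \\w, so 'test_capi' is one word."""
    return WORD_RE.findall(text.lower())


def index_words_split(text):
    """Suggested fix: also index the pieces of each word split around '_'."""
    words = []
    for word in WORD_RE.findall(text.lower()):
        words.append(word)
        if "_" in word:
            # Keep only pieces that satisfy the same length bounds.
            words.extend(p for p in word.split("_")
                         if MIN_LEN <= len(p) <= MAX_LEN)
    return words


title = "test_capi crashes"
print(index_words(title))        # ['test_capi', 'crashes']
print(index_words_split(title))  # ['test_capi', 'test', 'capi', 'crashes']
```

With the split variant, a whole-word search for "capi" would find "test_capi" without turning the search into a full substring match.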
msg2483 (view) Author: ncoghlan Date: 2012-04-22.05:17:30
+1 for Martin's suggested fix - treating '_' as a word separator when building the word index.

Unfortunately, throwing Google at the problem doesn't return especially useful results, because it doesn't understand the logical structure of Roundup issues and returns direct links to individual comments instead of the issues that they relate to: "https://www.google.com/search?q=site%3Abugs.python.org%20capi#hl=en&sclient=psy-ab&q=site:bugs.python.org+capi+-ext%3Adiff+-ext%3Apatch&oq=site:bugs.python.org+capi+-ext%3Adiff+-ext%3Apatch"
History
Date User Action Args
2012-04-22 05:17:31 ncoghlan set messages: + msg2483
2012-04-22 05:00:38 loewis set nosy: + loewis
messages: + msg2482
2011-02-04 03:15:09 rmtew set nosy: + rmtew
messages: + msg1906
2011-02-01 16:17:35 ezio.melotti set priority: urgent -> bug
status: unread -> chatting
messages: + msg1901
nosy: + ezio.melotti
2011-01-31 17:09:13 techtonik set status: chatting -> unread
2011-01-31 16:46:40 techtonik set status: unread -> chatting
messages: + msg1900
2011-01-31 16:46:06 techtonik set nosy: + techtonik
2011-01-30 12:43:46 ncoghlan create