Documentation for the Textpattern plugin smd_fuzzy_find by Stef Dawson.
Ever wanted TXP’s search facility to be a little more, well, inaccurate? Help people with fat fingers find what they were actually looking for with this tool, which can find misspelled or like-sounding words from search results.
Results are ordered by approximate relevancy and the plugin is automatically context-sensitive because the pool of words it compares against comes from your site. On a zoo website, someone typing in “lino” won’t get articles about flooring but more likely articles about lions, which is what they really wanted. We hope.
- Search for similarly spelled, or similar sounding words (may be switched on/off)
- Limit the search to article category to speed up proceedings
- Tweak sensitivity to give better results (more specific, less likely to find a match) or general results (less specific, may match stuff you don’t expect)
- Can offer links to exact search terms if you wish (a bit like Google’s “Did you mean …”)
- Display matching articles using a form/container. Default is the built-in
- Unless overridden using the match_with attribute, the Title and Body will be searched. Alternatively, if you have set the search locations using wet_haystack, those places will be searched instead
Installation / Uninstallation
Requires Textpattern 4.0.7; smd_lib v0.33 must also be installed and activated.
Download the plugin from either textpattern.org or the software page, paste the code into the TXP Admin -> Plugins pane, then install and enable the plugin. Visit the forum thread for more info or to report on the success or otherwise of the plugin.
To uninstall, simply delete from the Admin -> Plugins page.
The plugin is not a replacement for the built-in TXP search; it should be used to augment it, like this:
    <txp:if_search>
      <dl class="results">
        <txp:chh_if_data>
          <txp:article limit="8" searchform="excerpts" />
        <txp:else />
          <txp:smd_fuzzy_find form="excerpts" />
        </txp:chh_if_data>
      </dl>
    </txp:if_search>
Exact matches will be processed as normal but mismatches will be handled by smd_fuzzy_find. If you try to use smd_fuzzy_find on its own, you will likely receive a warning about a missing <txp:article /> tag.
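Because the plugin may also be used as a container, the fallback branch can carry its own markup instead of pointing at a form. The following is a minimal sketch, not a confirmed recipe: it assumes that standard article tags such as txp:permlink, txp:title and txp:excerpt behave inside the container the same way they would in the "excerpts" form.

```html
<txp:if_search>
  <dl class="results">
    <txp:chh_if_data>
      <txp:article limit="8" searchform="excerpts" />
    <txp:else />
      <!-- Assumption: the container body is applied to each fuzzy-matched article -->
      <txp:smd_fuzzy_find>
        <dt><txp:permlink><txp:title /></txp:permlink></dt>
        <dd><txp:excerpt /></dd>
      </txp:smd_fuzzy_find>
    </txp:chh_if_data>
  </dl>
</txp:if_search>
```

Keeping the markup in a form is probably tidier if the same layout is reused for exact matches; the container variant is mainly handy for one-off tweaks to how fuzzy results look.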
|Attribute||Default||Values||Description|
|search_term|| || ||You may use a fixed string here but it’s rather pointless|
|match_with|| || || Which article fields you would like to match. Define the object you want to look in (currently only |
|show|| || ||Whether to list the closest matching articles, the closest matching search terms, or both|
|section||unset (i.e. search the whole site)||any valid section containing articles|| Limit the search to one or more sections; give a comma-separated list. You can use |
|category||unset (i.e. search all categories)||any valid article category|| Limit the search to one or more categories; give a comma-separated list. You can use |
|sublevel (formerly subcats)|| || integer / || Number of subcategory levels to traverse. |
|status|| || ||Restricts the search to particular types of document status|
|tolerance|| ||0 – 5||How fuzzy the search is and the minimum length a search term is allowed to be. 0 requires a very close match and allows short search words. 5 is quite relaxed and is likely to return results nothing like what you searched for; search words must then be longer, roughly more than 7 characters. Practical values are 0-3|
|refine|| || ||Switch on soundex and/or metaphone support for potentially better matching (though it’s usually only of use in English)|
|case_sensitive|| || ||Does what it says on the tin|
|min_word_length|| ||integer||The minimum word length allowed in the search results|
|limit|| || integer / || The maximum number of words and/or articles to display in the results. Use a single integer to limit both to the same value. Specify |
|form|| ||any valid form name|| The TXP form with which to process each matching article. You may also use the plugin as a container. Note that |
|delim|| ||any characters||Change the delimiter for all options that take a comma-separated list|
|no_match_label|| MLP: ||any text or MLP string|| The phrase to display when no matches (maybe not even fuzzy ones) are found. Use |
|suggest_label|| MLP: ||any text or MLP string|| The phrase to display immediately prior to showing close-matching articles/words. Use |
|too_short_label|| MLP: ||any text or MLP string|| Searches of under about 3 characters (sometimes more depending on your content) are too short for any reasonable fuzziness to be applied. This message is displayed in that circumstance. Use |
|labeltag|| ||any valid tag name, without its brackets||The (X)HTML tag in which to wrap any labels|
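To show how several of these attributes combine, here is a hedged sketch of a tag tuned for the zoo-style site mentioned earlier. The category names (mammals, reptiles), the label text and the specific numbers are made up for illustration; only the attribute names and their value types come from the table above. The tag sits wherever the plain smd_fuzzy_find tag would go inside the if_search block.

```html
<!-- Hypothetical values: a fairly strict match, restricted to two example categories -->
<txp:smd_fuzzy_find form="excerpts" category="mammals,reptiles" tolerance="1" min_word_length="4" limit="5" no_match_label="Nothing found, not even a near miss." labeltag="p" />
```

A low tolerance with a category restriction should keep both the result quality and the speed reasonable; raising tolerance towards 3 trades precision for a better chance of finding something at all.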
Tips and tricks
- If you can, limit the search criteria using category to improve performance
- Tweak the refine options to see if you get better or worse results for your content / language
- Most of the default values are optimal for good results, but for scientific or specialist sites you may wish to increase the min_word_length to avoid false positives
- For offering an advanced search facility, write an HTML form that allows people to customise the search criteria (e.g. check boxes for soundex / case_sensitive; text boxes for limit; select lists of categories to search; etc). Then use a series of smd_if statements inside your <txp:if_search /> to check for the existence of each URL variable, check they have acceptable values and then plug them into the smd_fuzzy_find tag using replacements such as
- Slow with large article sets
- Searching for a word with an apostrophe in it may cause odd character encoding or incomplete results
- Searching for multiple (space-separated) words can lead to odd results
- Sometimes it makes you laugh and picks something that seems totally unrelated
This plugin wouldn’t have existed without the original Fuzzy Find algorithm by Jarno Elonen, as noted above. All kudos goes in that direction. Also, extended thanks to the beta testers, especially Els Lepelaars for feedback and unending patience during development.