Finding nicknames and diminutives using PG text search

by Jacob   Last Updated August 12, 2018 10:06 AM

I'm trying to create a text search dictionary to match English given names. e.g. "Bob" and "Robert"

It looks like I need to use the dict_xsyn extension because it will match both ways?

I've created a new dictionary:

ALTER TEXT SEARCH DICTIONARY xsyn (RULES='given_name', KEEPORIG=true, MATCHORIG=true, KEEPSYNONYMS=true, MATCHSYNONYMS=true);

And filled the given_name.rules file:

bob rob robert bobby robby
bill billy will willy william
dick rich richard ricardo rick

At this point ts_lexize seems to indicate this all worked:

SELECT ts_lexize('xsyn', 'robert');
-- returns: {bob,rob,robert,bobby,robby}

Next it looks like I probably need a custom configuration if I don't want the regular dictionaries interfering:

CREATE TEXT SEARCH CONFIGURATION english_names (COPY = simple)

ALTER TEXT SEARCH CONFIGURATION english_names ALTER MAPPING FOR asciiword, asciihword, hword_asciipart WITH xsyn;

SELECT * FROM ts_debug('english_names', 'robert');
-- returns: asciiword   Word, all ASCII robert  {xsyn}  xsyn    {bob,rob,robert,bobby,robby}

Now I seem to be able to search using

SELECT first_name FROM contact WHERE  to_tsvector('english_names', first_name) @@ to_tsquery('robert') limit 100

So my question is. Is this the best way to do this? Should I be doing anything else besides caching the ts_vector of the first_name in a separate column? Am I doing anything unnecessary?



Related Questions


Performing search on a column by query

Updated May 22, 2015 22:02 PM

Fuzzy Matching with Postgresql 9.3

Updated April 06, 2017 14:06 PM