Soundex
来自 PostgreSQL 维基
跳转到导航跳转到搜索
Soundex 是一种简单的算法,用于根据语音相似性比较/规范化英语单词。
PostgreSQL 8.3 及更高版本在 fuzzystrmatch 扩展 中包含一个 soundex(text) 函数。请注意,此函数的行为与其他一些实现不同,因为它不会跳过由 'w' 和 'h' 分隔的类似辅音。
PL/pgSQL 版本
如果您无法安装 fuzzystrmatch 扩展(由于权限有限或其他原因),这里有一个 PL/pgSQL 中的替代版本。此函数的行为与 fuzzystrmatch 实现完全相同。(在 PostgreSQL 9.0 和 9.1 上测试,使用多种不同语言的词典)。
CREATE OR REPLACE FUNCTION soundex(input text) RETURNS text
IMMUTABLE STRICT COST 500 LANGUAGE plpgsql
AS $$
DECLARE
soundex text = '';
char text;
symbol text;
last_symbol text = '';
pos int = 1;
BEGIN
WHILE length(soundex) < 4 LOOP
char = upper(substr(input, pos, 1));
pos = pos + 1;
CASE char
WHEN '' THEN
-- End of input string
IF soundex = '' THEN
RETURN '';
ELSE
RETURN rpad(soundex, 4, '0');
END IF;
WHEN 'B', 'F', 'P', 'V' THEN
symbol = '1';
WHEN 'C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z' THEN
symbol = '2';
WHEN 'D', 'T' THEN
symbol = '3';
WHEN 'L' THEN
symbol = '4';
WHEN 'M', 'N' THEN
symbol = '5';
WHEN 'R' THEN
symbol = '6';
ELSE
-- Not a consonant; no output, but next similar consonant will be re-recorded
symbol = '';
END CASE;
IF soundex = '' THEN
-- First character; only accept strictly English ASCII characters
IF char ~>=~ 'A' AND char ~<=~ 'Z' THEN
soundex = char;
last_symbol = symbol;
END IF;
ELSIF last_symbol != symbol THEN
soundex = soundex || symbol;
last_symbol = symbol;
END IF;
END LOOP;
RETURN soundex;
END;
$$;
示例
db=# select column1, soundex(column1) from (values ('Robert'), ('Rupert'), ('Rubin'), ('Ashcroft')) data; column1 | soundex ----------+--------- Robert | R163 Rupert | R163 Rubin | R150 Ashcroft | A226 db=# create table words(word text); db=# copy words from '/usr/share/dict/words'; db=# create index on words(soundex(word)); db=# select * from words where soundex(word)=soundex('hotel'); word -------- hotel hotly huddle