MySQL 5.6 Full-Text Stopwords

12.9.4 Full-Text Stopwords

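The stopword list is loaded and searched for full-text queries using the server character set and collation (the values of the character_set_server and collation_server system variables). False hits or misses might occur for stopword lookups if the stopword file, or the columns used for full-text indexing or searches, have a character set or collation different from character_set_server or collation_server.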
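The following Python sketch illustrates how a character set mismatch can make a stopword lookup miss. It is a simplified model, not MySQL code: Python's `utf-8` and `latin-1` codecs stand in for server character sets, and the stopword `café` and the helper `load_stopwords` are hypothetical.

```python
def load_stopwords(raw, charset):
    """Decode raw stopword bytes with the given codec and split on whitespace.

    Hypothetical helper: a simplified stand-in for loading a stopword list.
    """
    return set(raw.decode(charset).split())


# Stopword file content saved as UTF-8 bytes (hypothetical example).
raw_file = "café".encode("utf-8")

# Loaded with the matching character set: the lookup finds the word.
matching = load_stopwords(raw_file, "utf-8")

# Loaded with a different character set: the same bytes decode to other
# characters, so the lookup for the indexed word misses.
mismatched = load_stopwords(raw_file, "latin-1")

print("café" in matching)    # True
print("café" in mismatched)  # False
print(sorted(mismatched))    # ['cafÃ©']
```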

Case sensitivity of stopword lookups depends on the server collation. For example, lookups are case-insensitive if the collation is latin1_swedish_ci, whereas lookups are case-sensitive if the collation is latin1_general_cs or latin1_bin.
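A minimal Python model of the two behaviors, assuming that a `_ci` collation can be approximated by case-folded comparison and that `_cs` and `_bin` collations compare exactly. Real collations also define accent equivalence and ordering rules that this sketch ignores; `is_stopword` is a hypothetical helper.

```python
def is_stopword(word, stopwords, case_sensitive):
    """Simplified stopword lookup.

    case_sensitive=False approximates a _ci collation (e.g. latin1_swedish_ci);
    case_sensitive=True approximates a _cs or _bin collation
    (e.g. latin1_general_cs, latin1_bin).
    """
    if case_sensitive:
        return word in stopwords
    folded = {w.casefold() for w in stopwords}
    return word.casefold() in folded


stopwords = {"Ishmael"}
print(is_stopword("ishmael", stopwords, case_sensitive=False))  # True
print(is_stopword("ishmael", stopwords, case_sensitive=True))   # False
```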

Stopwords for InnoDB Search Indexes

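InnoDB has a relatively short list of default stopwords, because documents from technical, literary, and other sources often use short words as keywords or in significant phrases. For example, you might search for "to be or not to be" and expect to get a sensible result, rather than having all those words ignored.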

To see the default InnoDB stopword list, query the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table:

mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
+-------+
| value |
+-------+
| a     |
| about |
| an    |
| are   |
| as    |
| at    |
| be    |
| by    |
| com   |
| de    |
| en    |
| for   |
| from  |
| how   |
| i     |
| in    |
| is    |
| it    |
| la    |
| of    |
| on    |
| or    |
| that  |
| the   |
| this  |
| to    |
| was   |
| what  |
| when  |
| where |
| who   |
| will  |
| with  |
| und   |
| the   |
| www   |
+-------+
36 rows in set (0.00 sec)

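To define your own stopword list for all InnoDB tables, define a table with the same structure as the INNODB_FT_DEFAULT_STOPWORD table, populate it with stopwords, and set the innodb_ft_server_stopword_table option to a value of the form db_name/table_name before creating the full-text index. The stopword table must have a single VARCHAR column named value. The following example demonstrates creating and configuring a new global stopword table for InnoDB.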

-- Create a new stopword table

mysql> CREATE TABLE my_stopwords(value VARCHAR(30)) ENGINE = INNODB;
Query OK, 0 rows affected (0.01 sec)

-- Insert stopwords (for simplicity, a single stopword is used in this example)

mysql> INSERT INTO my_stopwords(value) VALUES ('Ishmael');
Query OK, 1 row affected (0.00 sec)

-- Create the table

mysql> CREATE TABLE opening_lines (
id INT UNSIGNED AUTO_INCREMENT NOT NULL PRIMARY KEY,
opening_line TEXT(500),
author VARCHAR(200),
title VARCHAR(200)
) ENGINE=InnoDB;
Query OK, 0 rows affected (0.01 sec)

-- Insert data into the table

mysql> INSERT INTO opening_lines(opening_line,author,title) VALUES
('Call me Ishmael.','Herman Melville','Moby-Dick'),
('A screaming comes across the sky','Thomas Pynchon','Gravity\'s Rainbow'),
('I am an invisible man','Ralph Ellison','Invisible Man'),
('Where now? Who now? When now?','Samuel Beckett','The Unnamable'),
('It was love at first sight','Joseph Heller','Catch-22'),
('All this happened, more or less','Kurt Vonnegut','Slaughterhouse-Five'),
('Mrs. Dalloway said she would buy the flowers herself.','Virginia Woolf','Mrs. Dalloway'),
('It was a pleasure to burn','Ray Bradbury','Fahrenheit 451');
Query OK, 8 rows affected (0.00 sec)
Records: 8  Duplicates: 0  Warnings: 0

-- Set the innodb_ft_server_stopword_table option to the new stopword table
-- (db_name/table_name form; this example uses the test database)

mysql> SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';
Query OK, 0 rows affected (0.00 sec)

-- Create the full-text index (which rebuilds the table if no FTS_DOC_ID column is defined)

mysql> CREATE FULLTEXT INDEX idx ON opening_lines(opening_line);
Query OK, 0 rows affected, 1 warning (1.17 sec)
Records: 0  Duplicates: 0  Warnings: 1
Set innodb_ft_aux_table to the indexed table, then query the words in INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE to verify that the specified stopword ('Ishmael') does not appear.

Note

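By default, words shorter than 3 characters or longer than 84 characters do not appear in an InnoDB full-text search index. The minimum and maximum word lengths are configurable using the innodb_ft_min_token_size and innodb_ft_max_token_size variables. This is why two-letter words such as "me" (from "Call me Ishmael.") do not appear in the output below.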

mysql> SET GLOBAL innodb_ft_aux_table='test/opening_lines';
Query OK, 0 rows affected (0.00 sec)
  
mysql> SELECT word FROM INFORMATION_SCHEMA.INNODB_FT_INDEX_TABLE LIMIT 15;
+-----------+
| word      |
+-----------+
| across    |
| all       |
| burn      |
| buy       |
| call      |
| comes     |
| dalloway  |
| first     |
| flowers   |
| happened  |
| herself   |
| invisible |
| less      |
| love      |
| man       |
+-----------+
15 rows in set (0.00 sec)
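As a cross-check, the following Python sketch models which words of the opening_line column reach the index. It assumes simplified tokenization (runs of ASCII letters and digits, lowercased), case-insensitive stopword matching as under a `_ci` server collation, the default token size limits of 3 and 84, and that the custom stopword table replaces the default InnoDB list; `indexed_words` is a hypothetical helper, not an InnoDB function. Under these assumptions it reproduces the 15 words above, and without the custom stopword, `ishmael` would sort between `invisible` and `less`.

```python
import re

# The eight opening_line values inserted in the example above.
OPENING_LINES = [
    "Call me Ishmael.",
    "A screaming comes across the sky",
    "I am an invisible man",
    "Where now? Who now? When now?",
    "It was love at first sight",
    "All this happened, more or less",
    "Mrs. Dalloway said she would buy the flowers herself.",
    "It was a pleasure to burn",
]


def indexed_words(docs, stopwords, min_token_size=3, max_token_size=84):
    """Rough model of the distinct words an InnoDB FULLTEXT index keeps.

    Assumptions: tokens are runs of ASCII letters/digits, words are
    lowercased, stopwords match case-insensitively, and only the given
    stopword list applies (it replaces the default list).
    """
    stop = {w.lower() for w in stopwords}
    words = set()
    for doc in docs:
        for token in re.findall(r"[A-Za-z0-9]+", doc):
            word = token.lower()
            if min_token_size <= len(word) <= max_token_size and word not in stop:
                words.add(word)
    return sorted(words)


print(indexed_words(OPENING_LINES, ["Ishmael"])[:15])
# ['across', 'all', 'burn', 'buy', 'call', 'comes', 'dalloway', 'first',
#  'flowers', 'happened', 'herself', 'invisible', 'less', 'love', 'man']
```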

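To create stopword lists on a table-by-table basis, create other stopword tables and use the innodb_ft_user_stopword_table option to specify the stopword table that you want to use before you create the full-text index.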
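The selection order can be sketched in Python as follows. This is a hypothetical model that assumes the precedence described for the innodb_ft_enable_stopword option: when stopwords are enabled, innodb_ft_user_stopword_table takes precedence over innodb_ft_server_stopword_table, which takes precedence over the built-in default list. The function name and the table name test/opening_lines_stopwords are illustrative only.

```python
DEFAULT_LIST = "built-in default list (INNODB_FT_DEFAULT_STOPWORD)"


def stopword_source(enable_stopword=True, user_table="", server_table=""):
    """Return the stopword list a newly created InnoDB FULLTEXT index would use.

    Assumed precedence: no stopwords if innodb_ft_enable_stopword is OFF;
    otherwise innodb_ft_user_stopword_table, then
    innodb_ft_server_stopword_table, then the built-in default list.
    Empty strings mean the option is not set.
    """
    if not enable_stopword:
        return None
    if user_table:
        return user_table
    if server_table:
        return server_table
    return DEFAULT_LIST


print(stopword_source(server_table="test/my_stopwords"))
# test/my_stopwords
print(stopword_source(user_table="test/opening_lines_stopwords",
                      server_table="test/my_stopwords"))
# test/opening_lines_stopwords
```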

Stopwords for MyISAM Search Indexes

In MySQL 5.6, the stopword file is loaded and searched using latin1 if character_set_server is ucs2, utf16, utf16le, or utf32.

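To override the default stopword list for MyISAM tables, set the ft_stopword_file system variable. (See Section 5.1.4, "Server System Variables".) The variable value should be the path name of the file containing the stopword list, or the empty string to disable stopword filtering. The server looks for the file in the data directory unless an absolute path name is given to specify a different directory. After changing the value of this variable or the contents of the stopword file, restart the server and rebuild your FULLTEXT indexes.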
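A small Python sketch of how the variable value is interpreted, following the rules above. Assumptions: POSIX-style paths and /var/lib/mysql as a hypothetical data directory; `resolve_stopword_file` is an illustrative helper, not a server function.

```python
from pathlib import PurePosixPath


def resolve_stopword_file(ft_stopword_file, datadir):
    """Return the file the server would read, or None if filtering is disabled.

    Simplified model: an empty string disables stopword filtering, an
    absolute path is used as given, and a relative path is resolved
    against the data directory.
    """
    if ft_stopword_file == "":
        return None
    path = PurePosixPath(ft_stopword_file)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(datadir) / path)


print(resolve_stopword_file("my_stopwords.txt", "/var/lib/mysql"))
# /var/lib/mysql/my_stopwords.txt
print(resolve_stopword_file("/etc/mysql/stopwords.txt", "/var/lib/mysql"))
# /etc/mysql/stopwords.txt
print(resolve_stopword_file("", "/var/lib/mysql"))
# None
```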

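The stopword list is free-form, separating stopwords with any nonalphanumeric character such as newline, space, or comma. Exceptions are the underscore character (_) and a single apostrophe ('), which are treated as part of a word. The character set of the stopword list is the server's default character set; see Section 10.1.3.1, "Server Character Set and Collation".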
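The word-splitting rule can be approximated with a regular expression. In this Python sketch, `\w` (letters, digits, and underscore) stands in for word characters, and "a single apostrophe" is read as: a word may contain apostrophes, but not two in a row. Both are assumptions about the MyISAM parser, and `parse_stopword_list` is a hypothetical helper.

```python
import re

# A word is a run of word characters (letters, digits, underscore) that may
# contain apostrophes, but never two in a row. Any other character
# (newline, space, comma, semicolon, ...) separates stopwords.
WORD = re.compile(r"\w+(?:'\w+)*")


def parse_stopword_list(text):
    """Split free-form stopword file content into individual stopwords."""
    return WORD.findall(text)


sample = "a's, c'mon\nfoo_bar;baz don''t"
print(parse_stopword_list(sample))
# ["a's", "c'mon", 'foo_bar', 'baz', 'don', 't']
```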

The following table shows the default list of stopwords for MyISAM search indexes. In a MySQL source distribution, you can find this list in the storage/myisam/ft_static.c file.

a's	able	about	above	according
accordingly	across	actually	after	afterwards
again	against	ain't	all	allow
allows	almost	alone	along	already
also	although	always	am	among
amongst	an	and	another	any
anybody	anyhow	anyone	anything	anyway
anyways	anywhere	apart	appear	appreciate
appropriate	are	aren't	around	as
aside	ask	asking	associated	at
available	away	awfully	be	became
because	become	becomes	becoming	been
before	beforehand	behind	being	believe
below	beside	besides	best	better
between	beyond	both	brief	but
by	c'mon	c's	came	can
can't	cannot	cant	cause	causes
certain	certainly	changes	clearly	co
com	come	comes	concerning	consequently
consider	considering	contain	containing	contains
corresponding	could	couldn't	course	currently
definitely	described	despite	did	didn't
different	do	does	doesn't	doing
don't	done	down	downwards	during
each	edu	eg	eight	either
else	elsewhere	enough	entirely	especially
et	etc	even	ever	every
everybody	everyone	everything	everywhere	ex
exactly	example	except	far	few
fifth	first	five	followed	following
follows	for	former	formerly	forth
four	from	further	furthermore	get
gets	getting	given	gives	go
goes	going	gone	got	gotten
greetings	had	hadn't	happens	hardly
has	hasn't	have	haven't	having
he	he's	hello	help	hence
her	here	here's	hereafter	hereby
herein	hereupon	hers	herself	hi
him	himself	his	hither	hopefully
how	howbeit	however	i'd	i'll
i'm	i've	ie	if	ignored
immediate	in	inasmuch	inc	indeed
indicate	indicated	indicates	inner	insofar
instead	into	inward	is	isn't
it	it'd	it'll	it's	its
itself	just	keep	keeps	kept
know	known	knows	last	lately
later	latter	latterly	least	less
lest	let	let's	like	liked
likely	little	look	looking	looks
ltd	mainly	many	may	maybe
me	mean	meanwhile	merely	might
more	moreover	most	mostly	much
must	my	myself	name	namely
nd	near	nearly	necessary	need
needs	neither	never	nevertheless	new
next	nine	no	nobody	non
none	noone	nor	normally	not
nothing	novel	now	nowhere	obviously
of	off	often	oh	ok
okay	old	on	once	one
ones	only	onto	or	other
others	otherwise	ought	our	ours
ourselves	out	outside	over	overall
own	particular	particularly	per	perhaps
placed	please	plus	possible	presumably
probably	provides	que	quite	qv
rather	rd	re	really	reasonably
regarding	regardless	regards	relatively	respectively
right	said	same	saw	say
saying	says	second	secondly	see
seeing	seem	seemed	seeming	seems
seen	self	selves	sensible	sent
serious	seriously	seven	several	shall
she	should	shouldn't	since	six
so	some	somebody	somehow	someone
something	sometime	sometimes	somewhat	somewhere
soon	sorry	specified	specify	specifying
still	sub	such	sup	sure
t's	take	taken	tell	tends
th	than	thank	thanks	thanx
that	that's	thats	the	their
theirs	them	themselves	then	thence
there	there's	thereafter	thereby	therefore
therein	theres	thereupon	these	they
they'd	they'll	they're	they've	think
third	this	thorough	thoroughly	those
though	three	through	throughout	thru
thus	to	together	too	took
toward	towards	tried	tries	truly
try	trying	twice	two	un
under	unfortunately	unless	unlikely	until
unto	up	upon	us	use
used	useful	uses	using	usually
value	various	very	via	viz
vs	want	wants	was	wasn't
way	we	we'd	we'll	we're
we've	welcome	well	went	were
weren't	what	what's	whatever	when
whence	whenever	where	where's	whereafter
whereas	whereby	wherein	whereupon	wherever
whether	which	while	whither	who
who's	whoever	whole	whom	whose
why	will	willing	wish	with
within	without	won't	wonder	would
wouldn't	yes	yet	you	you'd
you'll	you're	you've	your	yours
yourself	yourselves	zero