This chapter describes a lot of things that you need to know when working on the MySQL code. If you plan to contribute to MySQL development, want to have access to the bleeding-edge in-between versions code, or just want to keep track of development, follow the instructions in See section 4.8 Installing from development source tree. If you are interested in MySQL internals, you should also subscribe to internals@lists.mysql.com. This is a relatively low traffic list, in comparison with mysql@lists.mysql.com.
The MySQL server creates the following threads:
process_alarm()
to force timeouts on connections
that have been idle too long.
mysqld
is compiled with -DUSE_ALARM_THREAD
, a dedicated
thread that handles alarms is created. This is only used on some systems where
there are problems with sigwait()
or if one wants to use the
thr_alarm()
code in ones application without a dedicated signal
handling thread.
--flush_time=#
option, a dedicated thread is created
to flush all tables at the given interval.
INSERT DELAYED
gets its
own thread.
--master-host
, a slave replication thread will be
started to read and apply updates from the master.
mysqladmin processlist
only shows the connection, INSERT DELAYED
,
and replication threads.
Since Version 3.23.23, MySQL has support for full-text indexing
and searching. Full-text indexes in MySQL are an index of type
FULLTEXT
. FULLTEXT
indexes can be created from VARCHAR
and TEXT
columns at CREATE TABLE
time or added later with
ALTER TABLE
or CREATE INDEX
. For large datasets, adding
FULLTEXT
index with ALTER TABLE
(or CREATE INDEX
) would
be much faster than inserting rows into the empty table with a FULLTEXT
index.
Full-text search is performed with the MATCH
function.
mysql> CREATE TABLE t (a VARCHAR(200), b TEXT, FULLTEXT (a,b)); Query OK, 0 rows affected (0.00 sec) mysql> INSERT INTO t VALUES -> ('MySQL has now support', 'for full-text search'), -> ('Full-text indexes', 'are called collections'), -> ('Only MyISAM tables','support collections'), -> ('Function MATCH ... AGAINST()','is used to do a search'), -> ('Full-text search in MySQL', 'implements vector space model'); Query OK, 5 rows affected (0.00 sec) Records: 5 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM t WHERE MATCH (a,b) AGAINST ('MySQL'); +---------------------------+-------------------------------+ | a | b | +---------------------------+-------------------------------+ | MySQL has now support | for full-text search | | Full-text search in MySQL | implements vector-space-model | +---------------------------+-------------------------------+ 2 rows in set (0.00 sec) mysql> SELECT *,MATCH a,b AGAINST ('collections support') as x FROM t; +------------------------------+-------------------------------+--------+ | a | b | x | +------------------------------+-------------------------------+--------+ | MySQL has now support | for full-text search | 0.3834 | | Full-text indexes | are called collections | 0.3834 | | Only MyISAM tables | support collections | 0.7668 | | Function MATCH ... AGAINST() | is used to do a search | 0 | | Full-text search in MySQL | implements vector space model | 0 | +------------------------------+-------------------------------+--------+ 5 rows in set (0.00 sec)
The function MATCH
matches a natural language query AGAINST
a text collection (which is simply the columns that are covered by a
FULLTEXT index). For every row in a table it returns relevance -
a similarity measure between the text in that row (in the columns that are
part of the collection) and the query. When it is used in a WHERE
clause (see example above) the rows returned are automatically sorted with
relevance decreasing. Relevance is a non-negative floating-point number.
Zero relevance means no similarity. Relevance is computed based on the
number of words in the row, the number of unique words in that row, the
total number of words in the collection, and the number of documents (rows)
that contain a particular word.
MySQL uses a very simple parser to split text into words. A ``word'' is any sequence of letters, numbers, `'', and `_'. Any ``word'' that is present in the stopword list or just too short (3 characters or less) is ignored.
Every correct word in the collection and in the query is weighted, according to its significance in the query or collection. This way, a word that is present in many documents will have lower weight (and may even have a zero weight), because it has lower semantic value in this particular collection. Otherwise, if the word is rare, it will receive a higher weight. The weights of the words are then combined to compute the relevance of the row.
Such a technique works best with large collections (in fact, it was carefully tuned this way). For very small tables, word distribution does not reflect adequately their semantical value, and this model may sometimes produce bizarre results.
For example, search for the word "search" will produce no results in the above example. Word "search" is present in more than half of rows, and as such, is effectively treated as a stopword (that is, with semantical value zero). It is, really, the desired behavior - a natural language query should not return every other row in 1GB table.
A word that matches half of rows in a table is less likely to locate relevant documents. In fact, it will most likely find plenty of irrelevant documents. We all know this happens far too often when we are trying to find something on the Internet with a search engine. It is with this reasoning that such rows have been assigned a low semantical value in a particular dataset.
Unfortunately, full-text search has no user-tunable parameters yet, although adding some is very high on the TODO. However, if you have a MySQL source distribution (See section 4.7 Installing a MySQL Source Distribution.), you can somewhat alter the full-text search behavior.
Note that full-text search was carefully tuned for the best searching effectiveness. Modifying the default behavior will, in most cases, only make the search results worse. Do not alter the MySQL sources unless you know what you are doing!
myisam/ftdefs.h
file by the line
#define MIN_WORD_LEN 4Change it to the value you prefer, recompile MySQL, and rebuild your
FULLTEXT
indexes.
myisam/ft_static.c
Modify it to your taste, recompile MySQL and rebuild
your FULLTEXT
indexes.
myisam/ftdefs.h
:
#define GWS_IN_USE GWS_PROBto
#define GWS_IN_USE GWS_FREQand recompile MySQL. There is no need to rebuild the indexes in this case.
This section includes a list of the fulltext features that are already implemented in the 4.0 tree. It explains More functions for full-text search entry of section H.1 Things that should be in 4.0.
REPAIR TABLE
with FULLTEXT
indexes,
ALTER TABLE
with FULLTEXT
indexes, and
OPTIMIZE TABLE
with FULLTEXT
indexes are now
up to 100 times faster.
MATCH ... AGAINST
now supports the following
boolean operators:
+
word means the that word must be present in every
row returned.
-
word means the that word must not be present in every
row returned.
<
and >
can be used to decrease and increase word
weight in the query.
~
can be used to assign a negative weight to a noise
word.
*
is a truncation operator.
ft_dump
added for low-level FULLTEXT
index operations (querying/dumping/statistics).
FULLTEXT
index faster.
()
in boolean fulltext search.
MERGE
tables.
FULLTEXT
in CREATE/ALTER TABLE
).
Until recently, our main full-coverage test suite was based on proprietary
customer data and for that reason has not been publicly available. The only
publicly available part of our testing process consisted of the crash-me
test, a Perl DBI/DBD benchmark found in the sql-bench
directory, and
miscellaneous tests located in tests
directory. The lack of a
standardized publicly available test suite has made it difficult for our users,
as well developers, to do regression tests on the MySQL code. To address this
problem, we have created a new test system that is included in the source
and binary distributions starting in Version 3.23.29.
The test system consist of a test language interpreter (mysqltest
),
a shell script to run all tests(mysql-test-run
), the actual test cases
written in a special test language, and their expected results. To run the
test suite on your system after a build, type mysql-test/mysql-test-run
from the source root. If you have installed a binary distribution, cd
to the install root (eg. /usr/local/mysql
), and do
scripts/mysql-test-run
. All tests should succeed. If they do not,
use mysqlbug
to send a bug report to bugs@lists.mysql.com.
Make sure to include the output of mysql-test-run
, as well as
contents of all .reject
files in mysql-test/r
directory.
If you have a copy of mysqld
running on the machine where you want to
run the test suite you do not have to stop it, as long as it is not using
ports 9306
and 9307
. If one of those ports is taken, you should
edit mysql-test-run
and change the values of the master and/or slave
port to one that is available.
The current set of test cases is far from comprehensive, as we have not yet converted all of our private tests to the new format. However, it should already catch most obvious bugs in the SQL processing code, OS/library issues, and is quite thorough in testing replication. Our eventual goal is to have the tests cover 100% of the code. We welcome contributions to our test suite. You may especially want to contribute tests that examine the functionality critical to your system, as this will ensure that all future MySQL releases will work well with your applications.
You can use the mysqltest
language to write your own test cases.
Unfortunately, we have not yet written full documentation for it - we plan to
do this shortly. You can, however, look at our current test cases and use
them as an example. The following points should help you get started:
mysql-test/t/*.test
mysql-test/mysql-test-run test_name
removing .test
extension from the file name
;
terminated statements and is similar to the
input of mysql
command line client. A statement by default is a query
to be sent to MySQL server, unless it is recognized as internal
command ( eg. sleep
).
SELECT
, SHOW
,
EXPLAIN
, etc., must be preceded with @/path/to/result/file
. The
file must contain the expected results. An easy way to generate the result
file is to run mysqltest -r < t/test-case-name.test
from
mysql-test
directory, and then edit the generated result files, if
needed, to adjust them to the expected output. In that case, be very careful
about not adding or deleting any invisible characters - make sure to only
change the text and/or delete lines. If you have to insert a line, make sure
the fields are separated with a hard tab, and there is a hard tab at the end.
You may want to use od -c
to make sure your text editor has not messed
anything up during edit. We, of course, hope that you will never have to edit
the output of mysqltest -r
as you only have to do it when you find a
bug.
mysql-test/r
directory and name them test_name.result
. If the
test produces more than one result, you should use test_name.a.result
,
test_name.b.result
, etc.
.reject
extension. If your test case is
failing, you should do a diff on the two files. If you cannot see how
they are different, examine both with od -c
and also check their
lengths.
!
if the test can continue after that query
returns an error.
source include/master-slave.inc;
. To switch between
master and slave, use connection master;
and connection slave;
.
If you need to do something on an alternate connection, you can do
connection master1;
for the master, and connection slave1;
for
the slave.
let $1=1000; while ($1) { # do your queries here dec $1; }
sleep
command. It supports fractions
of a second, so you can do sleep 1.3;
, for example, to sleep 1.3
seconds.
mysql-test/t/test_name-slave.opt
. For
the master, put them in mysql-test/t/test_name-master.opt
.
Go to the first, previous, next, last section, table of contents.