Sphinx

I use Sphinx search on a lot of my sites because it indexes and searches quickly and has advanced indexing and querying tools. Its configuration can be rather complicated, depending on what you intend to search (forums, blogs, etc.). If you search around you will find various Sphinx configuration files, which are built around modified MySQL queries, for your specific platform.

Note that SphinxSearch requires at least the MySQL client libraries to be installed. If you are using MySQL Server (or a MySQL variant), that will also install the client libraries.

Install

The version of Sphinx that comes with your distribution may be out of date, so you may want to add the PPA repository for the version you want to run. Below is version 2.2 for Ubuntu (use -rel21 for 2.1, or -rel23 for the 2.3 beta).

# Add the ppa of your choice
sudo add-apt-repository ppa:builds/sphinxsearch-rel22
sudo apt-get update
sudo apt-get install sphinxsearch

# Stop Sphinx until we get our configuration in place
sudo service sphinxsearch stop

That's all that is involved in installing Sphinx. Below are the locations of the log, data, and configuration files. You will need these paths for your config files.

File Locations

/var/log/sphinxsearch # log
/var/lib/sphinxsearch/data # data
/etc/sphinxsearch # configuration directory

Configuration Files

Sphinx configuration consists of three main blocks: source, index, and searchd. Each of these blocks is described below.

Source Block

The source block contains the type of source and the hostname, username, and password for the MySQL server. The first column of the SQL query should be a unique id. The SQL query runs every time the index is built and dumps its results into the Sphinx index file. The fields are described below.

  • sql_host: Hostname for the MySQL host.
  • sql_user: Username for the MySQL login.
  • sql_pass: Password for the MySQL user.
  • sql_db: Name of the database that contains the data to index.
  • sql_query: The query that produces the data for the index; it will be specific to the platform (blog, forum) that you are running.
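
As a rough sketch, a source block built from these fields might look like the one below. The source name blog_posts, the credentials, the database name, and the posts table with its columns are all placeholders I made up for illustration; use the query for your own platform. The snippet writes to a scratch file (sphinx.conf.example, also a made-up name) rather than the live configuration, so you can review it first.

```shell
# Scratch file for the example (hypothetical name); copy the result into
# /etc/sphinxsearch/sphinx.conf once you have reviewed it.
CONF="${CONF:-./sphinx.conf.example}"

# Quoted heredoc delimiter so the shell does not expand anything inside.
cat > "$CONF" <<'EOF'
source blog_posts
{
    type      = mysql
    sql_host  = localhost
    sql_user  = sphinx_user
    sql_pass  = change_me
    sql_db    = blog

    # Placeholder query; the first column must be a unique id.
    sql_query = SELECT id, title, body FROM posts
}
EOF
```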

Index Block

The index block names the source to index and the path where the index data is stored.

  • source: Name of the source block.
  • path: Path where the index files are saved.
  • charset_type: Character set of the index, such as utf-8.
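
A minimal sketch of an index block follows, assuming a source block named blog_posts exists in the same file. The index name blog_posts_index and the scratch file sphinx.conf.example are hypothetical names chosen for illustration. I believe charset_type is deprecated on the 2.2 branch because indexes are UTF-8 only, so it is left commented out; treat that as an assumption and check the documentation for your version.

```shell
# Scratch file for the example (hypothetical name); this appends to it.
CONF="${CONF:-./sphinx.conf.example}"

cat >> "$CONF" <<'EOF'
index blog_posts_index
{
    # Must match the name of a source block in the same file.
    source = blog_posts

    # Base path and file name; Sphinx adds its own file extensions.
    path   = /var/lib/sphinxsearch/data/blog_posts_index

    # Assumption: only needed on older releases.
    # charset_type = utf-8
}
EOF
```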

Searchd Block

The searchd block contains the settings used to run the Sphinx daemon.

  • listen: Port on which the Sphinx daemon listens, generally 9312.
  • query_log: Path where the query log is saved.
  • pid_file: Path to the PID file of the Sphinx daemon.
  • max_matches: Maximum number of matches to return per query.
  • seamless_rotate: Prevents searchd stalls while rotating indexes.
  • preopen_indexes: Whether to forcibly preopen all indexes on startup.
  • unlink_old: Whether to unlink old index copies on successful rotation.
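
Putting those settings together, a searchd block could look roughly like this. The values are illustrative rather than tuned recommendations, the query_log and pid_file file names are my assumptions based on the directories listed earlier, and sphinx.conf.example is a hypothetical scratch file.

```shell
# Scratch file for the example (hypothetical name); this appends to it.
CONF="${CONF:-./sphinx.conf.example}"

cat >> "$CONF" <<'EOF'
searchd
{
    listen          = 9312
    # Assumed file names; adjust to your layout.
    query_log       = /var/log/sphinxsearch/query.log
    pid_file        = /var/run/sphinxsearch/searchd.pid
    max_matches     = 1000
    seamless_rotate = 1
    preopen_indexes = 1
    unlink_old      = 1
}
EOF
```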

Add data

Add data to the index using the config we created.

sudo indexer --all

Start Sphinx

By default, the Sphinx daemon is turned off on system startup. To have it start on boot, edit /etc/default/sphinxsearch, find the line START=no, and change it to START=yes.

sudo vi /etc/default/sphinxsearch

Save and close the file, and start the Sphinx daemon.

sudo service sphinxsearch start
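
If you would rather not open an editor, the START change in /etc/default/sphinxsearch can also be scripted before starting the daemon. The helper below is only a sketch: enable_on_boot is a function name I made up, and it assumes the file contains a line that reads exactly START=no. It keeps a .bak copy of the original file.

```shell
# enable_on_boot FILE: change a line that is exactly START=no to START=yes,
# leaving the original in FILE.bak. (Hypothetical helper for illustration.)
enable_on_boot() {
    file="$1"
    if [ -w "$file" ]; then
        sed -i.bak 's/^START=no$/START=yes/' "$file"
    else
        # The real defaults file is owned by root, so fall back to sudo.
        sudo sed -i.bak 's/^START=no$/START=yes/' "$file"
    fi
}
```

Call it as enable_on_boot /etc/default/sphinxsearch, then confirm the result with grep '^START=' /etc/default/sphinxsearch.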

Keeping the index up to date

Since data will keep being added to your site, the index needs to be updated on a recurring schedule. To do this, create a cron job. How often it runs depends on how much data is being added and how long the command takes to run. The entry below updates the index every hour.

sudo crontab -e
0 * * * * /usr/bin/indexer --config /etc/sphinxsearch/sphinx.conf --rotate --all
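
As an alternative to editing root's crontab interactively, the same hourly job can be kept in a file under /etc/cron.d. This is a sketch under a few assumptions: the file names sphinxsearch-reindex.cron and /etc/cron.d/sphinxsearch-reindex are ones I picked, and unlike a personal crontab, the cron.d format needs a user field (root here) before the command. The snippet only writes a scratch file; the install step is left as a comment.

```shell
# Scratch file for the example (hypothetical name).
CRON_FILE="${CRON_FILE:-./sphinxsearch-reindex.cron}"

cat > "$CRON_FILE" <<'EOF'
# m h dom mon dow user command
0 * * * * root /usr/bin/indexer --config /etc/sphinxsearch/sphinx.conf --rotate --all
EOF

# To install it (the target name is an assumption; avoid dots in cron.d names):
#   sudo install -m 644 -o root -g root "$CRON_FILE" /etc/cron.d/sphinxsearch-reindex
```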