Rails Free Text Search: Sphinx or Ferret

May 2nd, 2008

Posted by Steve Butterworth

I’ve now had the chance to give both Ferret and Sphinx a good flogging on recent Rails projects and think it’s time to put them head to head.

Ferret

Ferret, along with Acts As Ferret, offers a developer-friendly, Rails-style way of adding free text search. It closely follows the Lucene query syntax, so if you want to go beyond the normal keyword search and add filters, date ranges, fuzzy matching or flags, it’s all possible with a bit of swotting up on Lucene. The reason it is so developer friendly is that it uses ActiveRecord callbacks to update the indexes. That gives two key advantages:
  1. Indexes are always up to date, provided all your relevant database changes happen through your Rails app.
  2. The indexer calls ActiveRecord attributes and methods rather than looking directly in the database. Where you have complex associations that you want to index in some way, you can write a method that converts the relevant data into a comma separated string or something like that, and index the method’s result.
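
To make the two points above concrete, here is a minimal sketch in plain Ruby. It is not Ferret or Acts As Ferret code: TinyIndex, Article, tag_list and retag are hypothetical names invented for illustration, and the save method merely stands in for an ActiveRecord after_save callback. It shows an index that is refreshed on every save, fed by a method that flattens an association-like collection into a comma separated string.

```ruby
# Toy illustration only: a plain-Ruby stand-in, not Ferret or acts_as_ferret.
class TinyIndex
  def initialize
    # term => list of document ids containing that term
    @postings = Hash.new { |hash, term| hash[term] = [] }
  end

  # Drop any existing entry for this id, then index the new text.
  def update(id, text)
    remove(id)
    text.downcase.scan(/\w+/).uniq.each { |term| @postings[term] << id }
  end

  def remove(id)
    @postings.each_value { |ids| ids.delete(id) }
  end

  def search(term)
    @postings.fetch(term.downcase, []).sort
  end
end

class Article
  INDEX = TinyIndex.new

  attr_reader :id, :title, :tags

  def initialize(id, title, tags)
    @id = id
    @title = title
    @tags = tags
  end

  # Flatten an association-like collection into one comma separated string.
  def tag_list
    tags.join(", ")
  end

  # Hypothetical stand-in for an ActiveRecord after_save callback:
  # the index is refreshed as part of every save.
  def save
    INDEX.update(id, "#{title} #{tag_list}")
    self
  end

  def retag(new_tags)
    @tags = new_tags
    save
  end
end

article = Article.new(1, "Rails search options", %w[ferret sphinx]).save
hits_before = Article::INDEX.search("sphinx")  # found straight after the save
article.retag(%w[ferret lucene])
hits_after = Article::INDEX.search("sphinx")   # gone straight after the update
puts hits_before.inspect, hits_after.inspect
```

The index never lags behind the model because the update happens inside save, and the indexer only ever sees what the model's own methods return.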

So Ferret is pretty straightforward to use thanks to a great plugin, a great tutorial and a background server, which is vital in a production environment. But, and it seems to be a big BUT, does it work in busy production environments? Ezra from Engine Yard, who knows more about these things than most, says they have had constant trouble with Ferret on deployed applications and have helped their clients move to other solutions. I, on the other hand, have had no problems in production environments and have found it works fine as long as Capistrano correctly stops and starts the server between code updates to avoid index corruption. Looking around the web, plenty of other people seem to have success with Ferret too. Perhaps it’s just high traffic sites where it becomes a problem, or perhaps recent Ferret releases have fixed the issues, but I’m still not convinced that this solution is dead.

Sphinx

Sphinx is the new kid on the block in the Rails world and already has three well-used plugins. I hope they converge eventually, as it seems wrong to have three people or teams working separately towards the same goal. Anyway, for reference, the plugins are ThinkingSphinx, Sphincter and finally my chosen companion, UltraSphinx.

I found Sphinx a lot harder to set up, probably because the tutorials out there are not as clear as the Rails Envy Ferret tutorial (writing one is on my to-do list), and perhaps because its relationship with Rails, even with a well-developed plugin, seems less natural than Ferret’s. Sphinx is powerful, fast and robust, several things that Ferret supposedly isn’t. But I have a couple of issues with it.
  1. The code required to get associations indexed involves some lengthy SQL joinery.
  2. A bigger issue is that you need to reindex periodically with a cron job. This means that in the period between reindexes the search results may be out of date. For many sites this isn’t an issue, especially traditional publishing-style sites. But for web applications it can be: if you add data, you expect to be able to find it again seconds later. Thanks to delta indexing we can run frequent delta indexes to keep the indexes more up to date without having to reindex the whole database, but there are still likely to be a couple of minutes of lag, and that’s not always acceptable depending on the application.
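
The lag described in the second point can be sketched with a toy model in plain Ruby. This is not Sphinx or UltraSphinx code: ScheduledIndex, full_reindex and delta_reindex are hypothetical names, timestamps are plain integers, and the "cron runs" are just explicit method calls. The only aim is to show that a row saved between runs is invisible to search until the next delta run picks it up.

```ruby
# Toy model only: not Sphinx or UltraSphinx, just the scheduling idea.
class ScheduledIndex
  def initialize
    @main = {}          # id => text, rebuilt by the full reindex
    @delta = {}         # id => text, rebuilt by the frequent delta run
    @main_built_at = 0  # time of the last full reindex
  end

  # Full rebuild: every row goes into the main index, the delta is cleared.
  def full_reindex(rows, now)
    @main = rows.map { |row| [row[:id], row[:text]] }.to_h
    @delta = {}
    @main_built_at = now
  end

  # Delta run: index only rows changed since the last full rebuild.
  def delta_reindex(rows)
    changed = rows.select { |row| row[:updated_at] > @main_built_at }
    @delta = changed.map { |row| [row[:id], row[:text]] }.to_h
  end

  # Search both indexes; a delta entry overrides the main entry for the same id.
  def search(term)
    @main.merge(@delta).select { |_id, text| text.include?(term) }.keys.sort
  end
end

rows = [{ id: 1, text: "ferret tutorial", updated_at: 0 }]
index = ScheduledIndex.new
index.full_reindex(rows, 0)

# A user saves a new row at t=5, between scheduled runs.
rows << { id: 2, text: "sphinx tutorial", updated_at: 5 }
before_delta = index.search("sphinx")  # the delta run has not happened yet

index.delta_reindex(rows)              # the scheduled delta run fires
after_delta = index.search("sphinx")
puts before_delta.inspect, after_delta.inspect
```

Running the delta more often shrinks the window in which before_delta is the answer a user sees, but it never removes it, which is the trade-off against callback-driven indexing.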

Conclusion

Recently in the Rails world I have read a lot of “Sphinx good, Ferret bad” posts, and I’m not sure it’s that simple. If you need a fast and robust solution for high traffic sites then Sphinx may be best, but if you don’t have huge traffic, or you have web apps where instant indexing is a real usability need, then I still think Ferret has legs.
