SQLite FTS5 for (Drupal) Search API

There is a predictable sequence of events that unfolds whenever a new project requires genuinely good search.

You start with the default database solution—in Drupal, that’s typically the Search API Database Search backend. You quickly hit its performance wall and recognize that it won’t deliver the necessary relevance or speed. The industry-standard pivot point then looms large: Solr.

Image

Solr is the benchmark for enterprise search. It offers incredible power (NLP, MLT, advanced tokenization), but this power comes at a cost: a separate Java (JVM) process, dedicated resources, network latency, and the overhead of managing a whole new piece of infrastructure. For many small-to-medium-sized Drupal sites, this complexity can feel like a disproportionate tax on solving a relatively simple problem.

I recently found myself in this exact situation while architecting a new content model for a CakePHP-based application. The requirement was high-quality, relevant search for several hundred thousand records. My background is heavily rooted in complex ORM and SQL database design—I know my way around a query planner, but I’ll readily admit I’m no dedicated database guru, so it’s possible I haven’t accounted for every optimization technique available in alternative backends. I initially braced myself for the complexity of deploying Solr, but as an experiment, I explored SQLite’s FTS5 extension. When I saw the speed and BM25 relevance it delivered—all without touching a JVM or configuring a single external port—a powerful question emerged: If a single, well-optimized file on disk can deliver this level of performance and relevance, why are we automatically reaching for the complexity of the Solr/Elasticsearch paradigm? This realization, rather than a predefined mandate, led me to pursue a viable alternative.


🧐 FTS5: The Technical Core

My initial exploration led me back to a tool many of us take for granted: SQLite. Specifically, its modern Full-Text Search (FTS5) extension.

FTS5 is a serious, highly-optimized C implementation designed for complex text retrieval. The feature that solidified my decision was its native support for the BM25 ranking algorithm.

BM25, or Best Match 25, is the same probability-based ranking model used by major search engines and, critically, by Solr and Elasticsearch. It provides intelligent relevance scoring by factoring in term frequency and document length, avoiding the low-quality ranking typical of basic SQL solutions.

Architecture: Inverted Index & Snippets

The efficiency stems from FTS5’s structure. Queries aren’t running against the main database tables; they run against a highly specialized virtual table that manages the inverted index.

A search query leveraging FTS5 operates as an index lookup, retrieving the BM25 rank directly.

SELECT
    title,
    snippet(docs_fts, 0, '<b>', '</b>', '...', 5) AS body_snippet,
    rank
FROM
    docs_fts
WHERE
    docs_fts MATCH 'performance AND Solr'
ORDER BY
    rank DESC
LIMIT
    20;

Notice the native snippet() function for term highlighting. This functionality, which typically requires careful configuration in Solr, is built into the SQLite FTS5 extension. The module itself uses a Dual-Table Architecture—separating the FTS5 Virtual Table (for relevance) from the Field Data Table (for filtering/faceting)—to ensure both speed and flexibility.


🧪 Local Benchmark Caveats & Insights

To get a sense of raw performance, I ran a simplistic benchmark using the wrk tool. These results should be treated as initial insights only; they are not fully reliable production-grade metrics. The tests were conducted on a local, unoptimized ddev environment running within Docker containers on my development machine. Importantly, the machine runs Linux, which at least delivers near-native filesystem and CPU performance for the container layer, minimizing OS-level virtualization overhead.

The index contained roughly 7,000 items (across two content types), and the search page included two faceted filters with “OR” logic.

The test command used was:

wrk -t8 -c50 -d30s "https://[obfuscated-url]/products-index"

The results, obtained under significant simulated load (8 threads and 50 concurrent connections over 30 seconds):

MetricResultImplication
Average Latency (Query Time)57.09 msExcellent for a local solution and provides a very fast user experience, even without production caching layers.
Requests per Second (Total)839.53Nearly 840 searches per second. FTS5 is clearly not the bottleneck.
Max Latency83.26 msConsistent performance under stress.

Understanding the Performance Profile

The speed we’re seeing is a structural advantage. Because FTS5 is embedded, we eliminate the primary performance bottlenecks associated with external search engines like Solr. The key factors are: Zero Network Latency (the request is processed right where PHP is running), excellent Built-in Concurrency (WAL Mode allows simultaneous reads without blocking), and Smart Index Management (Memory-Mapped I/O keeps hot index data in memory for rapid, repeated queries).


🏗️ The Search API Integration

The logical step was to create the Search API SQLite FTS5 module. It’s the pragmatic, reliable middle ground developers have needed.

The module is well-tested and actively developed. You can find the module page, installation instructions, and documentation at www.drupal.org/project/search_api_sqlite. The source code is available for review and contribution at git.drupalcode.org/project/search_api_sqlite.

The biggest historical knock against using a file-based database like SQLite in a web environment is concurrency. We tackled this head-on using Write-Ahead Logging (WAL) Mode, which is the cornerstone of its high performance. WAL ensures that multiple users can perform search queries concurrently—reads never block reads—eliminating the most common cause of slowdowns. Furthermore, by giving each Search API index its own dedicated file and opening search connections in read-only mode, we maximize safety and concurrent optimization.

Beyond concurrency, the module delivers on feature parity. It fully supports Faceted Search (even complex “OR” logic) and fast Autocomplete using native prefix matching. Developers can choose powerful tokenizers like Unicode61 for language-aware searching—all configurable via the familiar Search API interface, but without the headache of managing external services.


📊 Solr vs. FTS5: The Comparison

The decision to use Search API SQLite FTS5 is one of scope and trade-offs. It requires a pragmatic assessment of whether your project genuinely needs the advanced features of a distributed system.

BackendInfrastructureRelevance RankingBest ForNLPSpellcheck
SQLite FTS5NoneBM25 (High)Small/Medium Sites (<500K items)Limited (via tokenizers)
Database SearchNoneBasic Frequency (Low)Very Simple Search
SolrServer (JVM)BM25 (High)Large Scale, Advanced Data Models✓ (Robust)
ElasticsearchServer (JVM/API)BM25 (High)Large Scale, Analytics, Logging✓ (Robust)

FTS5: The Right Scope

Consider this module when:

  1. Your site has a manageable number of searchable items (comfortably up to 500,000 documents).
  2. Your priority is high relevance (BM25) and low latency, but you cannot or will not manage external services like Solr.
  3. Your search needs are primarily centered on full-text retrieval, prefix matching, and standard faceting.

When Solr is Necessary

Solr remains the undisputed choice when:

  • You require features only available in a distributed search engine, such as true language-specific stemming.
  • You need “More Like This” (MLT) features.
  • You must handle millions of documents and require sharding or horizontal scaling capabilities.
  • You need integrated advanced analytics or logging alongside your search engine.

The Search API SQLite FTS5 module is a necessary corrective against the tendency to over-engineer. It is a highly optimized, low-cost solution that delivers enterprise-grade relevance (BM25) where the complexity of a full search stack is simply not justified.


The path to efficient software often involves choosing the least complicated tool that can successfully meet 95% of the requirements. Have you found other tools that redefine the complexity ceiling for seemingly ’enterprise’ problems?