Galileo: Implementing Full Text Search without Elasticsearch

Summary
In order to allow quick access to all the different pages on the Galileo UI, we built our own full text search. Instead of navigating through pages, users are now able to search for relevant documents, including attributes, experiments, metrics, rollouts and segments.

Team
Data & AI

Author(s)
Ahmed Wael, Mohamed Abdo, Roman Atachiants

About the Author(s)
Ahmed, Mohamed and Roman are software engineers at Careem, working in the Data & AI team and helping to build a state-of-the-art experimentation and feature toggles platform as well as a machine learning platform.

In the process of building Galileo, our experimentation platform, users needed to go through several pages in order to access the details of a specific document (experiment, rollout, etc.). For instance, if a user wants to access the details of an experiment, they need to navigate to the experiments page, apply view filters to the listed experiments and then look up the target experiment in the result list.
As the number of documents grew, this process became increasingly tedious and time-consuming for our users, negatively impacting user experience.

To improve this, we decided to add a full text search feature to our portal to reduce loading time and maintain an appealing user experience. Using relevant keywords, users can now find relevant documents directly, without manually looking through all the listed documents.

In this section we cover the main structure of our portal and the base components of our solution for the search feature.
Galileo UI (Portal)
Our UI is divided into pages, where each page represents a document type such as attributes, experiments, metrics, etc. Each of these document types is stored as a separate database table in a document model. When a user navigates to one of these pages, they see a paginated table of documents of that kind.

Kepler (Backend)
Our backend, named Kepler, is a microservice written in Go that allows our users to manage metadata around experiments, configurations and metrics. It exposes an HTTP(S) API to our front end and command-line tools, and it both publishes to and consumes from a Kafka topic that is used to communicate with other microservices in our system.
Most importantly, each modification of a document generates a separate event that can be acted upon. In order to implement our decentralized search, we have leveraged this mechanism for data replication.

Publish / Subscribe
All micro-services within our dynamic configuration & experimentation platform (Galileo) communicate through a common message bus, employing a publish/subscribe pattern as shown in the diagram below. The message bus itself is built on Kafka, and we use CloudEvents as the envelope for all of our messages on the bus.
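For illustration, the snippet below shows roughly how such an envelope can be produced with the CloudEvents Go SDK; the event source, type and payload fields are assumptions, and the Kafka producer itself is omitted.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"time"

	cloudevents "github.com/cloudevents/sdk-go/v2"
)

func main() {
	// Wrap a document change in a CloudEvents envelope before publishing
	// it to the Kafka message bus (the Kafka producer itself is omitted).
	event := cloudevents.NewEvent()
	event.SetID("00000000-0000-0000-0000-000000000001") // unique event ID
	event.SetSource("kepler")                           // hypothetical source name
	event.SetType("galileo.document.updated")           // hypothetical event type
	event.SetTime(time.Now())
	if err := event.SetData(cloudevents.ApplicationJSON, map[string]string{
		"id":   "experiment-42",
		"kind": "experiment",
	}); err != nil {
		log.Fatal(err)
	}

	// The JSON-encoded envelope is what ends up on the Kafka topic.
	payload, err := json.Marshal(event)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(string(payload))
}
```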

We wanted users to be able to search across all documents they can see. Search therefore has to cover the different tables and stay synchronized with all changes happening to each table.
When searching with a keyword, users expect to find all documents that have that keyword in one of the text fields, including the following (see the sketch after this list):

Document name, project name and kind as a single string (e.g. “project/experiment”)
Description as a free text paragraph
Tags as a list of string tags (e.g. “mobile”, “pricing”)
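For concreteness, a document carrying these searchable fields might look like the minimal sketch below; the field and type names are assumptions, and the real Kepler document model carries more attributes.

```go
package search

// Document is a minimal, hypothetical sketch of the searchable fields
// listed above; the actual Kepler model has additional attributes.
type Document struct {
	ID          string   `json:"id"`
	Name        string   `json:"name"`        // document name
	Project     string   `json:"project"`     // displayed as "project/experiment"
	Kind        string   `json:"kind"`        // attribute, experiment, metric, rollout, segment
	Description string   `json:"description"` // free text paragraph
	Tags        []string `json:"tags"`        // e.g. ["mobile", "pricing"]
}
```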

In addition, there are some goals that should be fulfilled by our proposed solution:

Simplicity. We wanted to keep things simple: avoiding standalone services and dedicated centralized caching layers reduces the effort needed to maintain and operate the service in the long term. Handling the availability and integrity of additional instances would add more complexity to our system and increase operational effort and cost.
Low response time. Users should get search results near-instantly, as if they were using Google.
Minimize machine resources. We need to keep in mind the memory and disk footprint of the new solution. Search is good to have, but we don’t want to overload our running instances with something used only occasionally by our users.
Cost effectiveness. In addition to the previous point, this new feature should operate with minimal dollar cost. For example, running a separate Elasticsearch cluster just for this use case would have increased our server costs substantially.
Easy integration. The solution should be implemented in Go (Golang) so that it integrates seamlessly with our backend and stays consistent across our tech stack.

The Bleve text search engine was our first choice: it is an open source text indexing library written in Go, it fulfills the majority of our goals, and it can be integrated into our backend easily without any extra components, keeping the system simple. Note that whether we had chosen Elasticsearch or Bleve, events would still need to be triggered on every update to keep the search index in sync, since neither is our main database.
How Bleve works
Documents are ingested into Bleve, which takes care of parsing the text fields and saving the index to memory or to a directory on disk in its own format. The index can then be queried through the APIs the library provides.
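As a minimal sketch (assuming Bleve v2 and an in-memory index), ingesting and querying a document looks roughly like this:

```go
package main

import (
	"fmt"
	"log"

	"github.com/blevesearch/bleve/v2"
)

func main() {
	// Create an in-memory index using the default mapping.
	index, err := bleve.NewMemOnly(bleve.NewIndexMapping())
	if err != nil {
		log.Fatal(err)
	}

	// Ingest a document; Bleve parses its text fields automatically.
	doc := map[string]interface{}{
		"Name":        "pricing/surge-experiment", // hypothetical document
		"Description": "Experiment testing surge pricing in selected cities",
		"Tags":        []string{"mobile", "pricing"},
	}
	if err := index.Index("doc-1", doc); err != nil {
		log.Fatal(err)
	}

	// Query the index through the library's search API.
	result, err := index.Search(bleve.NewSearchRequest(bleve.NewMatchQuery("pricing")))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result)
}
```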
Our Approach

The diagram above summarizes how the service interacts with the different components:

When a new instance starts up:

Read all rows from the tables we want to index with Bleve
Create Bleve batches from these rows
Index those batches with Bleve

When a document is created, updated or deleted in the database, the instance that made the change publishes a Kafka event.
When an instance consumes such a Kafka event (a document updated in the database by any instance), it checks whether the document belongs to one of the tables we index and, if so, updates the corresponding entry in Bleve.

A search request returns results sorted by relevance from the local Bleve index.

Index Mapping
Indexing in a search engine is the process of feeding your data to the engine so that it can later be queried with string queries to fetch results.
The Bleve engine has two ways of understanding object mappings:

Default mapping: the engine tries to figure out the type of each field in the object and attaches the proper analyzer and tokenization to it.
Custom mapping: you define a mapping for each field yourself, specifying its type and analyzers.

Default mapping is good for indexing simple objects, but if objects are a bit more complex, defining a custom mapping is crucial.
Analyzers description
Analyzers transform input text into a stream of tokens for indexing; picking the right analyzer for each field type helps produce better search results. We also compared the output of different Bleve analyzers, captured in this link. Below is a description of the analyzers we used:

Bleve simple analyzer: tokenizer for single-word fields such as Name, Project and Kind.
English analyzer: applies Compact Language Detection and lowercase filtering, which is useful for free-text fields that support edit distances, such as Description.
Keyword analyzer: applies no transformation and requires a complete match, useful for the ID field.
Standard analyzer: used for our Tags field; it applies Unicode tokenization, removes English stop words and lowercases the text.

Configuring Bleve with such a mapping helps us avoid reinventing the wheel every time we index a document. In addition, we used Bleve batches to index multiple documents in one go, which is much faster than indexing documents one by one.
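Below is a sketch of how such per-field analyzers and batch indexing can be wired up with Bleve v2; the field names and the "document" type name are illustrative rather than our exact schema.

```go
package search

import (
	"github.com/blevesearch/bleve/v2"
	"github.com/blevesearch/bleve/v2/analysis/analyzer/keyword"
	"github.com/blevesearch/bleve/v2/analysis/analyzer/simple"
	"github.com/blevesearch/bleve/v2/analysis/analyzer/standard"
	"github.com/blevesearch/bleve/v2/analysis/lang/en"
	"github.com/blevesearch/bleve/v2/mapping"
)

// buildMapping assigns one analyzer per field as described above.
func buildMapping() mapping.IndexMapping {
	docMapping := bleve.NewDocumentMapping()

	nameField := bleve.NewTextFieldMapping()
	nameField.Analyzer = simple.Name // single-word fields: Name, Project, Kind
	docMapping.AddFieldMappingsAt("Name", nameField)
	docMapping.AddFieldMappingsAt("Project", nameField)
	docMapping.AddFieldMappingsAt("Kind", nameField)

	descField := bleve.NewTextFieldMapping()
	descField.Analyzer = en.AnalyzerName // free-text Description field
	docMapping.AddFieldMappingsAt("Description", descField)

	idField := bleve.NewTextFieldMapping()
	idField.Analyzer = keyword.Name // exact matches on the ID field
	docMapping.AddFieldMappingsAt("ID", idField)

	tagsField := bleve.NewTextFieldMapping()
	tagsField.Analyzer = standard.Name // Unicode tokens, stop words, lowercase
	docMapping.AddFieldMappingsAt("Tags", tagsField)

	indexMapping := bleve.NewIndexMapping()
	indexMapping.AddDocumentMapping("document", docMapping)
	return indexMapping
}

// indexBatch indexes many documents in one go, which is much faster
// than calling Index for each document individually.
func indexBatch(index bleve.Index, docs map[string]interface{}) error {
	batch := index.NewBatch()
	for id, doc := range docs {
		if err := batch.Index(id, doc); err != nil {
			return err
		}
	}
	return index.Batch(batch)
}
```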
Querying description
The last step in the Bleve integration was to adjust queries to our needs. We have two types of query:

General Queries for all document types.
Queries for documents of a specific type.

The difference between the two is how we treat the document kind in the search query:

General queries can be represented as a Disjunction (Union) of:

WildCard(“*%v*”): for mid-word matches.
Fuzzy: with edit distance 3, to tolerate typos.
Prefix: matching the beginning of a word.

Kind-specific queries are a Conjunction (Intersection) of the general query with an additional Kind condition, as sketched below.
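A sketch of how these two query shapes can be built with Bleve's query API; the "Kind" field name and the exact fuzziness value are assumptions.

```go
package search

import (
	"fmt"

	"github.com/blevesearch/bleve/v2"
	"github.com/blevesearch/bleve/v2/search/query"
)

// generalQuery is the disjunction described above: wildcard (mid-word),
// fuzzy and prefix matching on the raw keyword.
func generalQuery(keyword string) query.Query {
	wildcard := bleve.NewWildcardQuery(fmt.Sprintf("*%v*", keyword))

	fuzzy := bleve.NewFuzzyQuery(keyword)
	fuzzy.SetFuzziness(2) // tolerate small typos (the exact distance is tunable)

	prefix := bleve.NewPrefixQuery(keyword)

	return bleve.NewDisjunctionQuery(wildcard, fuzzy, prefix)
}

// kindQuery narrows the general query down to a single document kind.
func kindQuery(keyword, kind string) query.Query {
	kindTerm := bleve.NewTermQuery(kind)
	kindTerm.SetField("Kind") // assumed field name

	return bleve.NewConjunctionQuery(generalQuery(keyword), kindTerm)
}
```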

Search UI component
We use React in our front end with Ant Design, which works really well for building internal platforms. It offers a variety of out-of-the-box React components that are compatible and performant with modern browsers.
On the UI side, we added the search component to the navigation bar, as shown below. The search box lets the user navigate through results using the up and down arrow keys. The Ant Design components were a bit limited here, so we implemented a few pieces from scratch, such as the autocomplete functionality, styled after the GitHub search component.

To provide users with search scopes, we built a custom React component that lists the document types we have, in addition to a wildcard type selected by default, as shown in the image below. This lets users narrow search results down to a specific type of document.
After submitting the search query, results are displayed as shown in the image below. Search results are ranked by relevance and presented to the user in a similar way to other documents across the Galileo platform.

On any document create, update or delete, we need to apply the change to the corresponding entry in the Bleve search index.
Since each Kepler instance keeps its own Bleve search index, we have to make sure that the state of all indexes across all backend instances stays the same. Building on the Publish / Subscribe mechanism described above, whenever an instance completes a storage-layer operation successfully, it broadcasts an index event to the message bus. The index event has two main fields:

Document: the new Document structure after applying the storage change. This field contains the ID of the document and the fields necessary for the indexing operation.
Action: the operation type (create, update, delete, archive, etc.)

Hence, any Index event triggered on one instance will be visible to all other instances.

When an instance consumes an index event, it first checks the action attribute and then passes the document to the corresponding Bleve client method, as sketched below.
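The sketch below shows one way such an index event could be modeled and applied to the local Bleve index; the field names, action values and JSON layout are assumptions for illustration.

```go
package search

import (
	"encoding/json"

	"github.com/blevesearch/bleve/v2"
)

// IndexEvent mirrors the event described above; field names and action
// values are assumptions for this sketch.
type IndexEvent struct {
	Action   string          `json:"action"`   // create, update, delete, archive, ...
	Document json.RawMessage `json:"document"` // document state after the storage change
}

// applyIndexEvent updates the local Bleve index when a replicated
// storage event is consumed from the message bus.
func applyIndexEvent(index bleve.Index, raw []byte) error {
	var ev IndexEvent
	if err := json.Unmarshal(raw, &ev); err != nil {
		return err
	}

	// Extract the document ID used as the Bleve entry key.
	var doc struct {
		ID string `json:"id"`
	}
	if err := json.Unmarshal(ev.Document, &doc); err != nil {
		return err
	}

	switch ev.Action {
	case "delete":
		// Remove the entry so stale documents no longer show up in results.
		return index.Delete(doc.ID)
	default:
		// create / update / archive: re-index the full document, which
		// overwrites any existing entry (the exact handling of "archive"
		// is an assumption here).
		var fields map[string]interface{}
		if err := json.Unmarshal(ev.Document, &fields); err != nil {
			return err
		}
		return index.Index(doc.ID, fields)
	}
}
```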

We benchmarked query times for different index sizes on the following machine: MacBook Pro 2019, goos: darwin, goarch: amd64, CPU: Intel(R) Core(TM) i7-9750H @ 2.60GHz, 16GB memory.

Type of query         Number of documents    Query Time
No Result             100                    ~0.60 ms
No Result             1000                   ~5.2 ms
No Result             10000                  ~68.6 ms
No Result             100000                 ~695.2 ms
1 Result Query        100                    ~0.5 ms
1 Result Query        1000                   ~4.4 ms
1 Result Query        10000                  ~57.9 ms
1 Result Query        100000                 ~666.1 ms
All Documents Match   100                    ~1.2 ms
All Documents Match   1000                   ~7.6 ms
All Documents Match   10000                  ~85.2 ms
All Documents Match   100000                 ~1087.7 ms
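For reference, numbers of this kind can be reproduced with a standard Go benchmark along the following lines; the document shape and query here are illustrative and not our exact benchmark code.

```go
package search

import (
	"fmt"
	"testing"

	"github.com/blevesearch/bleve/v2"
)

// BenchmarkQuery populates an in-memory index with synthetic documents
// and measures a single keyword query against it.
func BenchmarkQuery(b *testing.B) {
	index, err := bleve.NewMemOnly(bleve.NewIndexMapping())
	if err != nil {
		b.Fatal(err)
	}

	// Populate the index with n synthetic documents in one batch.
	const n = 10000
	batch := index.NewBatch()
	for i := 0; i < n; i++ {
		_ = batch.Index(fmt.Sprintf("doc-%d", i), map[string]interface{}{
			"Name": fmt.Sprintf("project/experiment-%d", i),
			"Tags": []string{"mobile", "pricing"},
		})
	}
	if err := index.Batch(batch); err != nil {
		b.Fatal(err)
	}

	// Measure the query time against the populated index.
	req := bleve.NewSearchRequest(bleve.NewMatchQuery("experiment-42"))
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, err := index.Search(req); err != nil {
			b.Fatal(err)
		}
	}
}
```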

In our quest to simplify the process of using experimentation, Bleve has proven to be a good tool for the job. We were able to integrate it easily enough and synchronize document indexing across all of our backend services.
However, Bleve does not seem to scale well to millions of documents, at least in our synthetic benchmarks (shown above). As the table shows, searching for a single document in a 100K collection takes over 600 milliseconds, which would negatively impact user experience. That being said, if your corpus of documents is relatively small, under 10-20 thousand documents, Bleve is easy to use and provides good enough performance for most use cases.
