SOLR search engine and Elasticsearch – comparison

27.06.2019 Angelika Siczek

In the field of open source search engines, SOLR and Elasticsearch are the leaders. What they have in common is Lucene, the library on which both search engines are built. There are many differences between them, so the right choice depends on the project. To choose correctly, you need to know your needs and match them to the right search engine.

A global or partial cache

Both search engines rely on a fairly complex caching system. Lucene stores its index in files that cannot be modified once they are written. The index is divided into segments. During indexing, new segments are created, and at the same time Lucene can merge small segments into larger ones.

Elasticsearch developers decided to exploit the relationship between caches and segments: caches are kept per segment, so when a segment changes, only a small part of the cached data needs to be refreshed. SOLR, on the other hand, uses a global cache, meaning that even the smallest change forces the cache for the entire index to be refreshed. This approach is more time-consuming and also puts a heavier load on the hardware.
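The difference between the two caching strategies can be sketched with a toy model. This is only an illustration under simplifying assumptions: the classes `SegmentedIndex`, `PerSegmentCache` and `GlobalCache` are hypothetical names invented for this example and do not correspond to real Lucene, Elasticsearch or SOLR classes.

```python
from typing import Dict, List, Tuple


class SegmentedIndex:
    """Toy index: every segment is a list of documents that is never modified."""

    def __init__(self) -> None:
        self.segments: Dict[int, List[str]] = {}
        self._next_id = 0

    def add_segment(self, docs: List[str]) -> int:
        seg_id = self._next_id
        self.segments[seg_id] = list(docs)
        self._next_id += 1
        return seg_id


def count_in_segment(docs: List[str], term: str) -> int:
    """Count the documents in one segment that contain the term."""
    return sum(term in doc.split() for doc in docs)


class PerSegmentCache:
    """Caches results per (segment, term); a new segment leaves old entries valid."""

    def __init__(self, index: SegmentedIndex) -> None:
        self.index = index
        self.cache: Dict[Tuple[int, str], int] = {}
        self.scans = 0  # how many segment scans were actually performed

    def count(self, term: str) -> int:
        total = 0
        for seg_id, docs in self.index.segments.items():
            key = (seg_id, term)
            if key not in self.cache:
                self.cache[key] = count_in_segment(docs, term)
                self.scans += 1
            total += self.cache[key]
        return total


class GlobalCache:
    """Caches one result per term for the whole index; any change clears everything."""

    def __init__(self, index: SegmentedIndex) -> None:
        self.index = index
        self.cache: Dict[str, int] = {}
        self.known_segments = 0
        self.scans = 0

    def count(self, term: str) -> int:
        if self.known_segments != len(self.index.segments):
            self.cache.clear()  # the index changed, so all cached results are dropped
            self.known_segments = len(self.index.segments)
        if term not in self.cache:
            total = 0
            for docs in self.index.segments.values():
                total += count_in_segment(docs, term)
                self.scans += 1
            self.cache[term] = total
        return self.cache[term]


index = SegmentedIndex()
index.add_segment(["solr search", "elastic search"])
index.add_segment(["solr cache"])
per_segment, global_cache = PerSegmentCache(index), GlobalCache(index)
per_segment.count("solr"), global_cache.count("solr")   # both caches are now warm
index.add_segment(["lucene segments", "solr merge"])     # a small change to the index
print(per_segment.count("solr"), per_segment.scans)
print(global_cache.count("solr"), global_cache.scans)
```

After the third segment is added, the per-segment cache scans only the new segment, while the global cache has to rescan all three.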

Performing queries

Range queries, which narrow the results to a span of values, are not always good for search engine performance. That is no surprise, since in the worst case the entire index must be scanned to find the records that meet the requirements. Doc values can reduce search time. In Elasticsearch they are used automatically: the search engine decides whether it must search all documents or can just iterate over a subset. SOLR has no such mechanism, so a range query will take longer there.
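For reference, here is what a query that narrows results to a range of values looks like in each engine. This is a minimal sketch: the helper functions and the `price` field are hypothetical and chosen only for illustration. The Elasticsearch body uses the query DSL `range` clause, and the SOLR string uses the standard `field:[low TO high]` range syntax.

```python
import json
from typing import Any, Dict


def es_range_query(field: str, low: int, high: int) -> Dict[str, Any]:
    """Request body for an inclusive range query in the Elasticsearch query DSL."""
    return {"query": {"range": {field: {"gte": low, "lte": high}}}}


def solr_range_filter(field: str, low: int, high: int) -> str:
    """The same inclusive range written as a SOLR filter query (fq) value."""
    return f"{field}:[{low} TO {high}]"


# "price" is a made-up field name used only for this example.
print(json.dumps(es_range_query("price", 10, 100)))
print(solr_range_filter("price", 10, 100))
```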

Node discovery

Each of the compared search engines takes a different approach to node discovery. This process is responsible for taking action when a new cluster is created, when a node goes down or when a new node is added to the cluster. SOLR uses Apache ZooKeeper for this, and a deployment needs at least three ZooKeeper instances. Elasticsearch, in turn, uses Zen. To achieve full fault tolerance, there must be three master nodes.
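The figure of three follows from majority voting. Assuming that both coordination mechanisms need a strict majority of voting nodes to agree (the function names below are illustrative only), the arithmetic can be sketched as:

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


for n in (1, 2, 3, 5):
    print(f"nodes={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")
```

With one or two nodes, no failure can be tolerated. Three is the smallest cluster that survives the loss of a single node.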

High-performance, high-precision analytics

If the data you work with does not change, and high performance and accuracy are your priorities, SOLR is the better choice. It handles this case better than Elasticsearch because its results do not degrade through loss of precision.

Technical and development support

Here the contrast is sharp. Elasticsearch is controlled by a commercial company, while SOLR belongs to the Apache Software Foundation. Apache is a community-based open source organization. Developers can modify the code, and anyone who proves their abilities can contribute to the common cause. The community is vigilant, and anyone familiar with Apache's philosophy and approach knows how high the level of cooperation between contributors is.

Elasticsearch, on the other hand, also allows the code to be modified, but the final decision about what gets included belongs to Elastic. Community members cannot shape the product on their own, because the key decision-makers are Elastic employees.

Configuration and installation

At the moment, the Elasticsearch starter package is one sixth the size of SOLR's. Thanks to JSON configuration, installation is also easier and much faster with Elasticsearch. However, JSON has some limitations, such as the inability to add comments explaining a change in a file. SOLR developers removed the troublesome complications that existed in previous versions by introducing a REST API. So if your project does not use JSON, SOLR will be the better choice.
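The comment limitation is a property of the JSON format itself and is easy to demonstrate with a strict parser. The sketch below uses Python's standard `json` module, and the setting names are made up for the example. It shows the format's behaviour, not how either search engine loads its configuration.

```python
import json

# Hypothetical settings, used only to illustrate the format.
plain = '{"cluster_name": "demo", "replicas": 1}'
commented = '{"cluster_name": "demo", // changed for testing\n "replicas": 1}'


def parses_as_json(text: str) -> bool:
    """Return True if the text is valid JSON for a strict parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses_as_json(plain))
print(parses_as_json(commented))
```

The first document parses, while the second is rejected because of the `//` comment.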

Machine learning

For machine learning, Elasticsearch uses X-Pack. This is a commercial plugin that works with Kibana and provides machine learning algorithms. Its advantage is a number of additional tools, but the downside is the high cost. Some alternatives for Elasticsearch are available through cloud services, commercial software houses or open tools. SOLR provides machine learning for free. It is implemented as classification based on logistic regression.
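To show what classification with logistic regression means in practice, here is a generic, minimal sketch of the technique in plain Python. It is not SOLR's implementation. The one-dimensional training data, the learning rate and the number of epochs are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train(xs: List[float], ys: List[int], lr: float = 0.1, epochs: int = 1000) -> Tuple[float, float]:
    """Fit weight w and bias b with batch gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def classify(x: float, w: float, b: float) -> int:
    """Assign class 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Toy data: negative feature values belong to class 0, positive ones to class 1.
xs = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
ys = [0, 0, 0, 1, 1, 1]
w, b = train(xs, ys)
print(classify(-1.0, w, b), classify(1.0, w, b))
```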

Tools and ecosystem

In this area, Elasticsearch is the leader. It has a huge number of tools, such as the aforementioned Kibana. However, if you want to write SQL, you can do so by connecting SOLR with Apache Zeppelin. Unfortunately, none of the tools provided by SOLR meets modern requirements. The Elasticsearch ecosystem is newer, updated more frequently and regularly enriched with new features. This is also reflected in the number of companies that use Elasticsearch, including in the field of data shipping.

Developer and IT cooperation (DevOps)

In this area, SOLR fares worse in the eyes of DevOps practitioners. Its diagnostic information is limited and fragmented. Only part of it is exposed through JMX MBeans and the newer Metrics API. Elasticsearch goes further. Diagnostics are made easier by access to data such as job statistics, disk usage, memory, storage and caching. In addition, it has a better API and is easier to install.

Speed and full-text search

It is difficult to say clearly which of these search engines is faster. Elasticsearch copes better in a rapidly changing environment. SOLR, on the other hand, will be better for unchanging data because of the way it uses its cache. SOLR is strong in full-text search because it has many specialized features for it, including spell checking of typed terms, query parsers and search suggestions. Elasticsearch has one dedicated suggester. This makes it easy to implement, but leaves no room for changing how it works, which reduces its flexibility. SOLR offers more search configuration options.

The final verdict

It is difficult to determine which of these search engines is better. It all depends on the developer's requirements and the project they are working on. So you have to ask yourself a few questions: what kind of environment do you prefer? How easy are installation and configuration? But most importantly, what type of data should the engine handle? Each product has different strengths, and only on the basis of specific requirements can you make the right choice.