
Index refresh issue in Elasticsearch

·298 words·2 mins·
Note Elasticsearch

Recently, while working with Elasticsearch, I ran into a weird issue: the document count for a newly created index was not correct. Reindexing has a similar problem: if you reindex a source index into a destination (dst) index, the dst index can end up empty even though no errors are reported.

Here is a short script to reproduce the issue:

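from elasticsearch import Elasticsearch

# set up client
es_client = Elasticsearch(...)

n = 100
index_name = "my_index"

# start from a clean state: drop the index if it already exists
if es_client.indices.exists(index=index_name):
    es_client.indices.delete(index=index_name)

# build n small test documents
docs = []
for i in range(n):
    doc = {"title": f"this is document {i}"}
    docs.append(doc)

# index the documents one by one (the index is created on the first write)
for doc in docs:
    es_client.index(index=index_name, document=doc)

# count right away, before the next refresh has necessarily happened
response = es_client.count(index=index_name)
print("count: ", response)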

In the code above, we create an index named my_index and index 100 documents into it. The printed count for my_index, however, is not 100.

However, if you wait a few seconds and then run the count API again, either from Kibana or on its own from a script, the number of documents in my_index is correct.

This has to do with the inner workings of Elasticsearch. Immediately after documents are indexed, they are not yet available for search. Elasticsearch refreshes the index every index.refresh_interval, which defaults to 1 second. Only after this refresh can the documents be searched.
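To watch this delay from a script, one option is to poll the count API until the expected number of documents shows up. The sketch below is only an illustration: wait_until_visible, timeout and poll_every are names I made up, and client is assumed to be an Elasticsearch instance like es_client above (anything with a compatible count method works).

```python
import time


def wait_until_visible(client, index_name, expected, timeout=10.0, poll_every=0.2):
    """Poll the count API until `expected` documents are searchable.

    Hypothetical helper, not part of the Elasticsearch client.
    Returns the seconds waited, or None if the timeout is reached first.
    """
    start = time.monotonic()
    while True:
        # count only sees documents that a refresh has already exposed
        visible = client.count(index=index_name)["count"]
        elapsed = time.monotonic() - start
        if visible >= expected:
            return elapsed
        if elapsed >= timeout:
            return None
        time.sleep(poll_every)
```

Calling wait_until_visible(es_client, index_name, n) right after the indexing loop should return once a refresh has made all n documents visible. It is meant for observing the behavior, not as a fix.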

There are two solutions for this:

  • After the indexing operation, call the refresh API to refresh the index manually.
  • The refresh operation may be expensive, though, so we can instead wait for Elasticsearch's periodic refresh. With the index or bulk API, set the refresh parameter to wait_for. The request then returns only after a refresh has made its changes visible, so a search that follows will see the documents.
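Both options can be sketched roughly as follows. The function names index_then_refresh and index_visible are made up for illustration, and client is assumed to be an Elasticsearch instance from a client version that accepts the document= keyword, as in the reproduction script.

```python
def index_then_refresh(client, index_name, docs):
    """Option 1: index all documents, then force one explicit refresh."""
    for doc in docs:
        client.index(index=index_name, document=doc)
    # a single manual refresh makes everything written so far searchable
    client.indices.refresh(index=index_name)
    return client.count(index=index_name)["count"]


def index_visible(client, index_name, doc):
    """Option 2: index one document and block until a refresh exposes it."""
    # refresh="wait_for" does not force a refresh; the call simply waits
    # for one to happen, so each call can take up to the refresh interval
    return client.index(index=index_name, document=doc, refresh="wait_for")
```

Since every wait_for call waits for a refresh, using it on each document in a long loop would be slow. For many documents, one refresh at the end or a single bulk request with the same parameter is probably the better fit.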
