While working with Elasticsearch recently, I ran into a strange issue: the document count for a newly created index was wrong. Reindexing is affected too: if you reindex a newly populated source index into a destination index, the destination index can end up empty even though no errors are reported.
Here is a short script to reproduce the issue:
from elasticsearch import Elasticsearch

# set up client
es_client = Elasticsearch(...)

n = 100
index_name = "my_index"

# start from a clean index
if es_client.indices.exists(index=index_name):
    es_client.indices.delete(index=index_name)

docs = []
for i in range(n):
    doc = {"title": f"this is document {i}"}
    docs.append(doc)

# index the documents one at a time
for doc in docs:
    es_client.index(index=index_name, document=doc)

# count immediately after indexing
response = es_client.count(index=index_name)
print("count: ", response)
The code above creates an index my_index and indexes 100 documents into it. The printed count, however, is not 100.
If you wait a few seconds and then call the count API again, either from Kibana or from a script that only runs the count, the number of documents in my_index is correct.
This comes down to how Elasticsearch works internally.
Documents are not available for search immediately after they are indexed.
Elasticsearch refreshes the index periodically, every index.refresh_interval, which defaults to 1s.
Only after a refresh can searches (including the count API) see the new documents.
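To check which refresh interval actually applies to an index, you can read it from the index settings. The helper below is a minimal sketch, not an official recipe: the function name get_refresh_interval is made up for this post, and it assumes the get-settings response is keyed by index name with "settings" and "defaults" sections when flat_settings and include_defaults are enabled.

```python
def get_refresh_interval(es_client, index_name):
    """Return the effective index.refresh_interval for index_name.

    Hypothetical helper. Assumes the response looks like
    {index_name: {"settings": {...}, "defaults": {...}}} with flat keys.
    """
    response = es_client.indices.get_settings(
        index=index_name,
        flat_settings=True,     # keys come back as "index.refresh_interval"
        include_defaults=True,  # also return settings left at their defaults
    )
    index_info = response[index_name]
    key = "index.refresh_interval"
    explicit = index_info.get("settings", {})
    defaults = index_info.get("defaults", {})
    # An explicitly configured value wins; otherwise use the default.
    return explicit.get(key, defaults.get(key))
```

Called with the es_client and index_name from the script above, this should return the default interval unless the index overrides it.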
There are two solutions for this:
- After the indexing operation, call the refresh API to refresh the index manually. Note that a refresh can be expensive.
- Wait for Elasticsearch's periodic refresh instead. If you use the index or bulk API, set the refresh parameter to wait_for. The request then returns only after a refresh has made its changes visible, so a search issued afterwards will find the documents.
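Both options can be sketched as small helpers that take the client as an argument. This is an illustration under assumptions rather than a definitive implementation: the function names are made up for this post, and the bulk call assumes a recent (8.x) Python client in which bulk() accepts an operations list.

```python
def index_then_refresh(es_client, index_name, docs):
    """Option 1: index normally, then force one refresh via the refresh API."""
    for doc in docs:
        es_client.index(index=index_name, document=doc)
    # Make everything indexed so far visible to search.
    es_client.indices.refresh(index=index_name)
    return es_client.count(index=index_name)["count"]


def bulk_index_wait_for(es_client, index_name, docs):
    """Option 2: one bulk request that returns only after a refresh."""
    operations = []
    for doc in docs:
        operations.append({"index": {}})  # action line; target comes from index=
        operations.append(doc)            # document source line
    # refresh="wait_for" blocks until a refresh has made the changes visible.
    es_client.bulk(index=index_name, operations=operations, refresh="wait_for")
    return es_client.count(index=index_name)["count"]
```

With the es_client and docs from the script above, either helper should report a count of 100. The second option uses a single bulk request because passing refresh="wait_for" to 100 separate index calls could make each call wait for the next refresh.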
References
- near real-time search: https://www.elastic.co/guide/en/elasticsearch/reference/current/near-real-time.html
- the refresh param for index, delete, update, bulk API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html