While working with Elasticsearch recently, I ran into a strange issue: the document count for a newly created index was wrong. Reindexing was affected too: if you reindex the source index into a destination index right after indexing, the destination index ends up empty even though no errors are reported.
Here is a short snippet to reproduce the issue:
from elasticsearch import Elasticsearch

# set up the client (connection arguments elided)
es_client = Elasticsearch(...)

n = 100
index_name = "my_index"

# start from a clean state: delete the index if it already exists
if es_client.indices.exists(index=index_name):
    es_client.indices.delete(index=index_name)

# build n small documents
docs = []
for i in range(n):
    doc = {"title": f"this is document {i}"}
    docs.append(doc)

# index the documents one by one
for doc in docs:
    es_client.index(index=index_name, document=doc)

# count immediately after indexing
response = es_client.count(index=index_name)
print("count: ", response)
The code above creates an index named my_index and tries to index 100 documents into it. The printed count shows that the number of documents in my_index is not 100. However, if you wait a few seconds and then run the count API from Kibana, or rerun only the count call, the number of documents in my_index is correct.
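This comes from how Elasticsearch works internally. Documents are not available for search immediately after they are indexed. Elasticsearch refreshes the index every index.refresh_interval, which is 1 second by default. Only after this refresh can you search the documents, and the count API is subject to the same delay. This also explains the empty destination index: reindex reads the source index through a search, so documents that have not been refreshed yet are not copied.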
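The delay can be observed directly by polling the count until it reaches the expected value. The sketch below is only an illustration: wait_for_count is a hypothetical helper, not part of the Elasticsearch client, and it assumes that es_client.count returns a mapping with a "count" key.

```python
import time


def wait_for_count(es_client, index_name, expected, timeout=5.0, poll=0.2):
    """Poll the count API until `expected` documents are searchable.

    Hypothetical helper for illustration only.
    Returns (last observed count, seconds waited).
    """
    start = time.monotonic()
    while True:
        count = es_client.count(index=index_name)["count"]
        elapsed = time.monotonic() - start
        # stop once enough documents are visible, or give up after `timeout`
        if count >= expected or elapsed >= timeout:
            return count, elapsed
        time.sleep(poll)
```

Called right after the indexing loop, for example as wait_for_count(es_client, index_name, n), it should return shortly after the next periodic refresh has made the documents visible.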
There are two solutions for this:
- After the indexing operation, use the refresh API to refresh the index manually.
- A manual refresh can be expensive, though, so we can instead wait for the periodic refresh. With the index or bulk API, set the refresh parameter to wait_for. The request then returns only after a refresh has made its changes searchable, so a search that follows will see them.
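The two options could be sketched as follows. This is a minimal sketch that assumes the 8.x Python client (where bulk takes an operations argument); the function names are made up for illustration, and es_client is expected to be an already configured Elasticsearch client.

```python
def index_then_refresh(es_client, index_name, docs):
    """Option 1: index everything, then force one manual refresh."""
    for doc in docs:
        es_client.index(index=index_name, document=doc)
    # one explicit refresh makes the documents indexed so far searchable
    es_client.indices.refresh(index=index_name)
    return es_client.count(index=index_name)["count"]


def bulk_index_wait_for(es_client, index_name, docs):
    """Option 2: send one bulk request that waits for a refresh."""
    operations = []
    for doc in docs:
        operations.append({"index": {}})  # action line; target comes from index=
        operations.append(doc)            # document source
    # refresh="wait_for" does not force a refresh; the request blocks until
    # a refresh has made its changes visible to search
    es_client.bulk(index=index_name, operations=operations, refresh="wait_for")
    return es_client.count(index=index_name)["count"]
```

A single bulk request is used for the second option because passing refresh="wait_for" to es_client.index for each of the 100 documents would make every call wait for a refresh, which could slow the loop down considerably.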
References
- near real-time search: https://www.elastic.co/guide/en/elasticsearch/reference/current/near-real-time.html
- the refresh param for index, delete, update, bulk API: https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-refresh.html