Elasticsearch Dynamic Mapping Pitfalls

In Elasticsearch, when you index a document and a field has no explicit mapping, Elasticsearch uses dynamic mapping to infer the data type of that field.

This is a convenient feature, but it sometimes has unintended behavior. If the values of a field do not have a consistent type across documents, the inferred data type depends on which document happens to be indexed first. Let’s check an example:

DELETE my-index

PUT my-index/_doc/1
{
  "field1": 12,
  "field2": 12.99
}

PUT my-index/_doc/2
{
  "field1": 4.99,
  "field2": 4
}

PUT my-index/_doc/3
{
  "field1": 11.99,
  "field2": 2.9
}

If we check the mapping for field1 and field2:

GET my-index/_mapping

The mappings for these two fields are different, even though both fields hold the same kind of values:

{
  "my-index": {
    "mappings": {
      "properties": {
        "field1": {
          "type": "long"
        },
        "field2": {
          "type": "float"
        }
      }
    }
  }
}

So if a field has no explicit mapping, its inferred type is whichever type Elasticsearch sees first for that field. This has a real impact on search results. For example, when you aggregate over field1, the result is not correct:

POST my-index/_search
{
  "size": 0,
  "aggs": {
    "avg_price": {
      "avg": {
        "field": "field1"
      }
    }
  }
}

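The average value is returned as 9, which is not correct: the actual average of 12, 4.99, and 11.99 is 9.66. Because field1 is mapped as long, the decimal values are truncated to 4 and 11 when they are indexed, so the aggregation computes (12 + 4 + 11) / 3 = 9.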

To ensure correctness, it is better to specify the field mapping yourself, either explicitly for known fields or through a dynamic mapping template.
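
As a minimal sketch of both options, the request below recreates the index with an explicit double mapping for field1 and field2, plus a dynamic template that maps any other field detected as long to double. The template name whole_numbers_as_double and the choice of double are assumptions made for illustration; pick the numeric type that fits your data. The type of an existing field cannot be changed in place, so the index has to be created with this mapping before any documents are indexed.

```json
DELETE my-index

PUT my-index
{
  "mappings": {
    "dynamic_templates": [
      {
        "whole_numbers_as_double": {
          "match_mapping_type": "long",
          "mapping": {
            "type": "double"
          }
        }
      }
    ],
    "properties": {
      "field1": {
        "type": "double"
      },
      "field2": {
        "type": "double"
      }
    }
  }
}
```

After indexing the same three documents again, the avg aggregation over field1 should return a value of about 9.66 instead of 9.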
