
object vs nested type in data mapping in Elasticsearch


In this post, I compare the object and nested field types used for data mapping in Elasticsearch.

object type mapping

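By default, if a field's value is a list of dictionaries (JSON objects), Elasticsearch indexes the field as the object type. The structure of each individual dictionary under the field is not preserved: the values from all the dictionaries are merged per inner key, so Elasticsearch no longer knows which values belonged together.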

Let’s look at a concrete example:

DELETE new_index
PUT new_index/_doc/1
{
  "name": [
    {
      "first": "alice",
      "last": "smith"
    },
    {
      "first": "john",
      "last": "white"
    }
  ]
}

Internally, the document is flattened to something like this:

{
  "name.first": ["alice", "john"],
  "name.last": ["smith", "white"]
}
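To make the flattening concrete, here is a minimal Python sketch that mimics it for one level of objects. The function name `flatten_object_array` is hypothetical. It only models the idea; Elasticsearch does this internally, and you never call anything like it yourself.

```python
def flatten_object_array(doc):
    """Simplified model of how an object-type field holding dicts is flattened.

    Assumes each field value is a dict or a list of dicts (one level only).
    Values are kept in source order here purely for readability.
    """
    flat = {}
    for field, value in doc.items():
        # Treat a single dict like a one-element list of dicts.
        items = value if isinstance(value, list) else [value]
        for item in items:
            for key, inner in item.items():
                # Every inner key becomes a dotted path holding all its values.
                flat.setdefault(f"{field}.{key}", []).append(inner)
    return flat


doc1 = {
    "name": [
        {"first": "alice", "last": "smith"},
        {"first": "john", "last": "white"},
    ]
}

print(flatten_object_array(doc1))
# {'name.first': ['alice', 'john'], 'name.last': ['smith', 'white']}
```

Once the arrays are separated like this, nothing records that "alice" was paired with "smith" rather than "white".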

To verify this, let’s add a second document:

PUT new_index/_doc/2
{
  "name": [
    {
      "first": "alice",
      "last": "white"
    },
    {
      "first": "john",
      "last": "smith"
    }
  ]
}

Then we search the index for documents where “name.first” is alice and “name.last” is white:

GET new_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "name.first": "alice"
          }
        },
        {
          "term": {
            "name.last": "white"
          }
        }
      ]
    }
  }
}

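You might expect only the document with id=2 to be returned. However, both documents 1 and 2 are returned, because document 1 also contains alice in name.first and white in name.last, just in two different dictionaries.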
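The false match follows directly from the flattening. The sketch below is a simplified model of a `bool`/`must` query with two `term` clauses. Its helper names (`flatten`, `bool_must_terms`) are hypothetical. Each clause is checked against the flattened arrays independently, so both documents pass:

```python
def flatten(doc):
    # Simplified object-type flattening: dotted path -> list of all values.
    flat = {}
    for field, items in doc.items():
        for item in items:
            for key, value in item.items():
                flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def bool_must_terms(flat, terms):
    # Each term clause only asks "does this array contain the value?",
    # and the clauses are evaluated independently of one another.
    return all(value in flat.get(path, []) for path, value in terms.items())


docs = {
    1: {"name": [{"first": "alice", "last": "smith"}, {"first": "john", "last": "white"}]},
    2: {"name": [{"first": "alice", "last": "white"}, {"first": "john", "last": "smith"}]},
}
query = {"name.first": "alice", "name.last": "white"}

hits = [doc_id for doc_id, doc in docs.items() if bool_must_terms(flatten(doc), query)]
print(hits)  # [1, 2]
```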

nested type mapping

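To preserve the structure of each inner dictionary under the field, we need to define the “name” field as the nested type. By default, dynamic mapping maps inner JSON objects as the object type, so we have to set the mapping for the “name” field explicitly before adding documents. The mapping below also defines a second nested field, “attribute”, with two keyword subfields.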

DELETE my_index

PUT my_index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "nested"
      },
      "attribute": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "keyword"
          },
          "value": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Then let’s add two documents to this index:

PUT my_index/_doc/1
{
  "name": [
    {
      "first": "alice",
      "last": "smith"
    },
    {
      "first": "john",
      "last": "white"
    }
  ],
  "attribute": [
    {
      "name": "size",
      "value": "23"
    },
    {
      "name": "color",
      "value": "blue"
    }
  ]
}

PUT my_index/_doc/2
{
  "name": [
    {
      "first": "alice",
      "last": "white"
    },
    {
      "first": "john",
      "last": "smith"
    }
  ]
}

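Now let’s search again for documents where name.first is alice and name.last is white. Note, however, that you need to wrap the conditions in a nested query instead of using the plain bool query above. The name.* fields now live in hidden nested documents, so a plain query on them matches nothing.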

GET my_index/_search
{
  "query": {
    "nested": {
      "path": "name",
      "query": {
        "bool": {
          "must": [
            {
              "term": {
                "name.first": "alice"
              }
            },
            {
              "term": {
                "name.last": "white"
              }
            }
          ]
        }
      }
    }
  }
}

Now only document 2 is returned in the result.
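Conceptually, the `nested` query evaluates its inner query against each hidden nested document on its own. It returns the root document if at least one of them matches. Below is a minimal Python model of that behaviour. The helper `nested_match` is hypothetical, and the terms are keyed by inner field names without the `name.` prefix:

```python
def nested_match(doc, path, terms):
    # Each dict under `path` is checked on its own, so every condition
    # must hold within the same dict (the same hidden nested document).
    return any(
        all(item.get(key) == value for key, value in terms.items())
        for item in doc.get(path, [])
    )


docs = {
    1: {"name": [{"first": "alice", "last": "smith"}, {"first": "john", "last": "white"}]},
    2: {"name": [{"first": "alice", "last": "white"}, {"first": "john", "last": "smith"}]},
}
query = {"first": "alice", "last": "white"}

hits = [doc_id for doc_id, doc in docs.items() if nested_match(doc, "name", query)]
print(hits)  # [2]
```

Document 1 fails because no single dictionary in it contains both alice and white.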

nested vs object type

If you define a field as the nested type, each dictionary under that field is stored internally as a separate, hidden Lucene document, in addition to the root document. Only on the surface do you see a single document when you run a normal search.
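Here is a rough sketch of what one source document turns into at the Lucene level. This is a simplified model: the function `to_lucene_docs` and the `"<root>"` label are hypothetical. The model places hidden nested documents before their root document, in the block-join style Lucene uses to keep children next to their parent. The exact internal layout is a Lucene detail.

```python
def to_lucene_docs(source, nested_fields):
    """Split one source document into (path, fields) pairs (simplified model)."""
    docs = []
    for field in nested_fields:
        for item in source.get(field, []):
            # One hidden document per dict under a nested field.
            docs.append((field, item))
    # The root document keeps the non-nested fields and comes last,
    # right after its hidden children.
    root = {k: v for k, v in source.items() if k not in nested_fields}
    docs.append(("<root>", root))
    return docs


doc1 = {
    "name": [{"first": "alice", "last": "smith"}, {"first": "john", "last": "white"}],
    "attribute": [{"name": "size", "value": "23"}, {"name": "color", "value": "blue"}],
}

print([path for path, _ in to_lucene_docs(doc1, ["name", "attribute"])])
# ['name', 'name', 'attribute', 'attribute', '<root>']
```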

The output of the cat indices API includes a docs.count column, which shows the number of Lucene documents in the index. In the example above, the name field in new_index is the object type, and we indexed 2 documents into this index. If you run the cat indices API for new_index, docs.count is 2:

GET _cat/indices/new_index?v
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size dataset.size
yellow open   new_index dtWNUHroQ_OQkIRbQj8Bvw   1   1          2            0     10.9kb         10.9kb       10.9kb

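In my_index, both name and attribute are defined as the nested type, and we again indexed 2 documents. This time the cat indices API shows that docs.count is 8, because every dictionary under a nested field is counted as its own Lucene document, on top of the root documents.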
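The total of 8 is simple arithmetic: one root document per indexed document, plus one hidden document per dictionary under each nested field. The sketch below (hypothetical helper `lucene_doc_count`) reproduces both counts. It ignores deleted documents and multi-level nesting.

```python
def lucene_doc_count(source, nested_fields):
    # One root document plus one hidden document per dict under each nested field.
    return 1 + sum(len(source.get(field, [])) for field in nested_fields)


doc1 = {
    "name": [{"first": "alice", "last": "smith"}, {"first": "john", "last": "white"}],
    "attribute": [{"name": "size", "value": "23"}, {"name": "color", "value": "blue"}],
}
doc2 = {
    "name": [{"first": "alice", "last": "white"}, {"first": "john", "last": "smith"}],
}

nested = ["name", "attribute"]
print(lucene_doc_count(doc1, nested), lucene_doc_count(doc2, nested))  # 5 3
print(sum(lucene_doc_count(d, nested) for d in (doc1, doc2)))  # 8 (my_index)
print(sum(lucene_doc_count(d, []) for d in (doc1, doc2)))  # 2 (no nested fields, as in new_index)
```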

If you are only interested in the number of documents you indexed, use the count API instead. It counts top-level documents only, so it returns 2 for both indices:

GET my_index/_count
GET new_index/_count
