Select fields in Elasticsearch: _source, fields and stored_fields

In Elasticsearch, when you index a document, its original JSON body is stored by default in the _source metadata field. When you search the index, each matching document (hit) in the response carries this _source field.

_source and stored_fields

Storing _source is the default behavior. If you want to disable it and store only a few individual fields, that is also possible [1]. You can disable the _source field like this:

PUT movies
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "name": {
        "type": "text",
        "store": true
      },
      "plot": {
        "type": "text",
        "store": false
      }
    }
  }
}

The request above creates the index with _source disabled and stores only the name field, using the store mapping parameter. Now we can index a document and search the index:

POST movies/_doc/1
{
  "name": "name1",
  "plot": "exciting plot hello"
}


GET movies/_search
{
  "query": {
    "match": {
      "name": "name1"
    }
  }
}

Notice that the hits contain no _source field. Adding "_source": true to the request does not help, because the source was never stored in the first place.

The search API has a stored_fields parameter, which lets you list the stored fields you want returned:

GET movies/_search
{
  "stored_fields": ["name", "plot"],
  "query": {
    "match": {
      "name": "name1"
    }
  }
}

This search request explicitly names the fields we want returned. However, only name is a stored field, so each hit contains a value for name (returned as an array under the hit's fields key) and nothing for plot.
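The selection rule can be sketched in a few lines of Python. This is only a toy model of the behavior described above, not Elasticsearch code: the helper select_stored_fields is hypothetical, and the mapping and document are copied from the movies example.

```python
# Toy model of stored_fields selection; this is NOT Elasticsearch code.
# Mapping and document mirror the movies example from this post.
mapping = {
    "name": {"type": "text", "store": True},
    "plot": {"type": "text", "store": False},
}

document = {"name": "name1", "plot": "exciting plot hello"}


def select_stored_fields(requested, mapping, document):
    """Hypothetical helper: keep only requested fields mapped with store=True."""
    result = {}
    for field in requested:
        is_stored = mapping.get(field, {}).get("store", False)
        if is_stored and field in document:
            # Stored values come back as arrays, so wrap the value in a list.
            result[field] = [document[field]]
    return result


hit_fields = select_stored_fields(["name", "plot"], mapping, document)
print(hit_fields)  # {'name': ['name1']}
```

Asking for plot is not an error in this model; the field is simply left out of the result, which matches what the search response shows.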

source filtering and field selection

You can read a field's value either from _source or through the fields parameter. The difference is that _source returns the raw value exactly as it was sent at index time, while the fields parameter returns the value after the index mapping has been applied, so it may be parsed and reformatted.

PUT my_index/
{
  "mappings": {
    "runtime": {
      "calculated_count": {
        "type": "long",
        "script": {
          "source": "emit(doc['count'].value + 1)"
        }
      }
    },
    "properties": {
      "created": {
        "type": "date"
      }
    }
  }
}

POST my_index/_doc/1
{
  "count": 100,
  "name": "hello",
  "created": "2024-05-06"
}

GET my_index/_search
{
  "fields": [
    "created", "calculated_count"
  ],
  "_source": true
}

In the mapping above, the created field has type date. If you look at the search response, you will find that the date value differs between fields and _source:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "count": 100,
          "name": "hello",
          "created": "2024-05-06"
        },
        "fields": {
          "created": [
            "2024-05-06T00:00:00.000Z"
          ],
          "calculated_count": [
            101
          ]
        }
      }
    ]
  }
}

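The fields parameter can also return runtime fields, such as calculated_count above, which is not possible with _source because runtime fields are computed at query time and are never part of the stored document. See the Elasticsearch documentation for a more detailed discussion of fields vs. _source.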
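To make the difference concrete, here is a rough Python model of the response above. It is a sketch under assumptions, not how Elasticsearch is implemented: format_date and calculated_count are hypothetical helpers that imitate the default date output format and the runtime script for this one document.

```python
from datetime import datetime, timezone

# The raw document exactly as it was indexed; this is what _source returns.
source = {"count": 100, "name": "hello", "created": "2024-05-06"}


def format_date(raw):
    """Hypothetical helper: render a yyyy-MM-dd string as a UTC timestamp
    with millisecond precision, like the date value shown under fields."""
    parsed = datetime.strptime(raw, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    millis = parsed.microsecond // 1000
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def calculated_count(doc):
    """Hypothetical stand-in for the runtime script emit(doc['count'].value + 1)."""
    return doc["count"] + 1


# Values under fields are always arrays and reflect the mapping.
fields = {
    "created": [format_date(source["created"])],
    "calculated_count": [calculated_count(source)],
}

print(source["created"])   # 2024-05-06
print(fields["created"])   # ['2024-05-06T00:00:00.000Z']
print(fields["calculated_count"])  # [101]
```

The source dictionary is never modified: the formatted date and the computed count exist only in the fields view, just as in the search response.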

References