
Select fields in Elasticsearch: _source, fields and stored_fields


In Elasticsearch, when you index a document, its original JSON body is stored by default in the _source metadata field. When you search the index, each hit includes this _source field.

_source and stored_fields

This is the default behavior. If you want to disable storing _source and store only a few fields instead, that is also possible. You can disable the _source field like this:

PUT movies
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "name": {
        "type": "text",
        "store": true
      },
      "plot": {
        "type": "text",
        "store": false
      }
    }
  }
}

In the request above, which creates the index with its mapping, we disabled _source and enabled storage of the name field with the store mapping parameter. We can then index a document and search the index:

POST movies/_doc/1
{
  "name": "name1",
  "plot": "exciting plot hello"
}


GET movies/_search
{
  "query": {
    "match": {
      "name": "name1"
    }
  }
}

Notice that the hits contain no _source field. Adding "_source": true to the request does not help, because the source was never stored.

The search API has a stored_fields parameter, where you can list the stored fields you want returned:

GET movies/_search
{
  "stored_fields": ["name", "plot"],
  "query": {
    "match": {
      "name": "name1"
    }
  }
}

In the search request above, we explicitly list the fields we want returned. However, only name is a stored field, so each hit contains a value for name but not for plot.
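
To anticipate which requested fields a stored_fields search can return, you can look for "store": true in the mapping. The following is a minimal Python sketch, not an Elasticsearch API: the helper name returnable_stored_fields is hypothetical, and it only handles top-level properties like those in the movies mapping.

```python
# Mapping copied from the movies index created earlier in this post.
MOVIES_MAPPING = {
    "_source": {"enabled": False},
    "properties": {
        "name": {"type": "text", "store": True},
        "plot": {"type": "text", "store": False},
    },
}


def returnable_stored_fields(mapping, requested):
    """Return the requested fields that are mapped with "store": true.

    Hypothetical helper: it handles only top-level properties and
    ignores metadata fields.
    """
    properties = mapping.get("properties", {})
    return [
        field
        for field in requested
        if properties.get(field, {}).get("store", False)
    ]


print(returnable_stored_fields(MOVIES_MAPPING, ["name", "plot"]))
```

For the movies mapping, only name passes the check, which matches what the search response shows.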

source filtering and field selection

You can get the value of a field from _source or through the fields parameter. However, _source gives you the raw value exactly as it was indexed, while the fields parameter gives you the value after the mapping has been applied.

PUT my_index/
{
  "mappings": {
    "runtime": {
      "calculated_count": {
        "type": "long",
        "script": {
          "source": "emit(doc['count'].value + 1)"
        }
      }
    },
    "properties": {
      "created": {
        "type": "date"
      }
    }
  }
}

POST my_index/_doc/1
{
  "count": 100,
  "name": "hello",
  "created": "2024-05-06"
}

GET my_index/_search
{
  "fields": [
    "created", "calculated_count"
  ],
  "_source": true
}

In the mapping above, we set the created field to the date type. If you check the search response, you will find that the date value differs between fields and _source:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "count": 100,
          "name": "hello",
          "created": "2024-05-06"
        },
        "fields": {
          "created": [
            "2024-05-06T00:00:00.000Z"
          ],
          "calculated_count": [
            101
          ]
        }
      }
    ]
  }
}
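
The difference for created comes from the date mapping: fields returns the value rendered in the field's date format, while _source echoes what was indexed. The Python sketch below only mimics that rendering for a plain yyyy-MM-dd input, assuming the default date format and UTC. The function name render_like_fields is hypothetical, and this is not how Elasticsearch implements it internally.

```python
from datetime import datetime, timezone


def render_like_fields(raw):
    """Render a yyyy-MM-dd string the way the response above shows it.

    Assumption: the field uses the default date format and the input
    carries no time zone, so it is treated as UTC.
    """
    parsed = datetime.fromisoformat(raw).replace(tzinfo=timezone.utc)
    millis = parsed.microsecond // 1000
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


print(render_like_fields("2024-05-06"))
```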

The fields parameter can also return runtime fields, such as calculated_count above, which is not possible with _source. See the Elasticsearch documentation for a more detailed discussion of fields vs. _source.
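
When reading hits in client code, it can help to put the two views side by side. The sketch below is a hedged illustration in Python that works on the hit from the response above, copied in as a plain dict. The helper name compare_hit is hypothetical and it only looks at top-level _source keys. Note that values under fields always arrive as arrays.

```python
# Hit copied from the search response shown earlier in this post.
HIT = {
    "_index": "my_index",
    "_id": "1",
    "_score": 1,
    "_source": {"count": 100, "name": "hello", "created": "2024-05-06"},
    "fields": {
        "created": ["2024-05-06T00:00:00.000Z"],
        "calculated_count": [101],
    },
}


def compare_hit(hit):
    """Map each name under "fields" to (raw _source value, fields values).

    The raw value is None when the name is absent from _source, which is
    the case for runtime fields such as calculated_count.
    """
    source = hit.get("_source", {})
    return {
        name: (source.get(name), values)
        for name, values in hit.get("fields", {}).items()
    }


for name, (raw, mapped) in compare_hit(HIT).items():
    print(f"{name}: _source={raw!r} fields={mapped!r}")
```

The created entry shows the raw string next to its formatted form, while calculated_count has no _source counterpart at all.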
