In Elasticsearch, when you index a document, its original JSON body is stored by default in the _source metadata field.
When you search the index, each hit includes this _source field.
source and stored_fields
This is the default behavior. It is also possible to disable storing _source and store only a few selected fields [1].
You can disable the _source field like this:
PUT movies
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "name": {
        "type": "text",
        "store": true
      },
      "plot": {
        "type": "text",
        "store": false
      }
    }
  }
}
The request above creates the index with _source disabled and stores the name field separately through the store mapping parameter.
Now index a document and search the index:
POST movies/_doc/1
{
  "name": "name1",
  "plot": "exciting plot hello"
}

GET movies/_search
{
  "query": {
    "match": {
      "name": "name1"
    }
  }
}
Notice that none of the hits contains a _source field. Adding "_source": true to the request does not help, because the source was never stored.
The search API has a stored_fields parameter, where you list the stored fields you want returned:
GET movies/_search
{
  "stored_fields": ["name", "plot"],
  "query": {
    "match": {
      "name": "name1"
    }
  }
}
The search request above explicitly asks for two fields, name and plot.
However, only name is a stored field, so each hit contains a value for name but nothing for plot.
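To make the rule concrete, here is a toy Python model of the selection logic. It is only a sketch, not how Elasticsearch implements it: the function name stored_field_values is made up for illustration, and it assumes the mapping is passed in as a plain dict shaped like the movies mapping above. It returns a value only for requested fields that are mapped with "store": true, wrapped in a list because stored field values come back as arrays.

```python
from typing import Any, Dict, List


def stored_field_values(
    mapping: Dict[str, Any], doc: Dict[str, Any], requested: List[str]
) -> Dict[str, List[Any]]:
    """Toy model (hypothetical helper): pick the requested fields that are stored."""
    properties = mapping.get("properties", {})
    return {
        name: [doc[name]]  # stored values are returned as arrays
        for name in requested
        # "store" defaults to false, so unmapped or unstored fields are skipped
        if properties.get(name, {}).get("store", False) and name in doc
    }


# Same shape as the movies mapping used in this post.
movies_mapping = {
    "_source": {"enabled": False},
    "properties": {
        "name": {"type": "text", "store": True},
        "plot": {"type": "text", "store": False},
    },
}
movie = {"name": "name1", "plot": "exciting plot hello"}

result = stored_field_values(movies_mapping, movie, ["name", "plot"])
print(result)  # {'name': ['name1']}
```

Asking for plot alone yields an empty dict in this model, which mirrors the search result: the field is indexed and searchable, but there is no stored copy to return.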
source filtering and field selection
You can get the value of a field both from _source and through the fields parameter.
However, _source gives you the raw, untransformed value exactly as it was indexed.
If you request a field through the fields parameter, you get the value after it has been processed according to the mapping.
PUT my_index
{
  "mappings": {
    "runtime": {
      "calculated_count": {
        "type": "long",
        "script": {
          "source": "emit(doc['count'].value + 1)"
        }
      }
    },
    "properties": {
      "created": {
        "type": "date"
      }
    }
  }
}

POST my_index/_doc/1
{
  "count": 100,
  "name": "hello",
  "created": "2024-05-06"
}

GET my_index/_search
{
  "fields": [
    "created", "calculated_count"
  ],
  "_source": true
}
In the mapping above, the created field has the date type.
If you check the output of the search request, you will find that the date value differs between fields and _source:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "count": 100,
          "name": "hello",
          "created": "2024-05-06"
        },
        "fields": {
          "created": [
            "2024-05-06T00:00:00.000Z"
          ],
          "calculated_count": [
            101
          ]
        }
      }
    ]
  }
}
The fields parameter can also include runtime fields, such as calculated_count above, which is not possible with _source.
See the Elasticsearch documentation for a more detailed discussion of fields vs _source.
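The difference between the two views can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Elasticsearch code: format_date and build_hit are hypothetical helpers, the date handling only covers the yyyy-MM-dd input used in this post, and the runtime field is modelled as a plain Python function instead of a Painless script.

```python
from datetime import datetime, timezone
from typing import Any, Callable, Dict, List


def format_date(raw: str) -> str:
    """Render a yyyy-MM-dd string in the ISO form seen in the sample response."""
    parsed = datetime.strptime(raw, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    millis = parsed.microsecond // 1000
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.") + f"{millis:03d}Z"


def build_hit(
    source: Dict[str, Any],
    date_fields: List[str],
    runtime_fields: Dict[str, Callable[[Dict[str, Any]], Any]],
    requested: List[str],
) -> Dict[str, Any]:
    """Toy model: _source is passed through as-is, fields values are derived."""
    fields: Dict[str, List[Any]] = {}
    for name in requested:
        if name in runtime_fields:
            # Runtime fields are computed at query time; they never appear in _source.
            fields[name] = [runtime_fields[name](source)]
        elif name in date_fields:
            # Mapped date fields are parsed and re-formatted.
            fields[name] = [format_date(source[name])]
        elif name in source:
            fields[name] = [source[name]]
    return {"_source": source, "fields": fields}


doc = {"count": 100, "name": "hello", "created": "2024-05-06"}
hit = build_hit(
    doc,
    date_fields=["created"],
    runtime_fields={"calculated_count": lambda d: d["count"] + 1},
    requested=["created", "calculated_count"],
)
print(hit["_source"]["created"])  # 2024-05-06
print(hit["fields"])  # {'created': ['2024-05-06T00:00:00.000Z'], 'calculated_count': [101]}
```

The _source entry is returned untouched, while every value under fields passes through a mapping-aware step first, which is why the two views of created differ in the response above.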
References
- stored fields vs _source: https://stackoverflow.com/q/28678296/6064933
[1] Disabling _source can have undesired effects, so make sure you understand them first: https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-source-field#disable-source-field