In Elasticsearch, when we index a document, its original source is stored by default in the metadata field _source.
When you search the index, each hit includes this _source field.
source and stored_fields
This is the default behavior. If you want to disable _source and store only a few fields instead, that is also possible [1].
You can disable the _source field like this:
PUT movies
{
  "mappings": {
    "_source": {
      "enabled": false
    },
    "properties": {
      "name": {
        "type": "text",
        "store": true
      },
      "plot": {
        "type": "text",
        "store": false
      }
    }
  }
}
The request above creates the index with _source disabled and stores the name field separately by setting its store mapping parameter to true. The plot field is indexed but not stored.
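If you need the same kind of mapping for other indices, the request body can be generated rather than written by hand. The following is a minimal sketch, not part of any Elasticsearch client: the helper name build_mappings and its parameters are hypothetical, and it only produces the JSON body that you would send with the PUT request.

```python
import json


def build_mappings(field_types: dict[str, str], stored: set[str]) -> dict:
    """Build an index-creation body that disables _source and stores selected fields.

    field_types maps field name -> mapping type; stored lists the fields
    that should get "store": true. (Hypothetical helper for illustration.)
    """
    unknown = stored - set(field_types)
    if unknown:
        raise ValueError(f"stored fields not in mapping: {sorted(unknown)}")
    return {
        "mappings": {
            "_source": {"enabled": False},
            "properties": {
                name: {"type": ftype, "store": name in stored}
                for name, ftype in field_types.items()
            },
        }
    }


# Reproduces the body of the PUT movies request shown above.
body = build_mappings({"name": "text", "plot": "text"}, stored={"name"})
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after PUT movies or passed to whichever client you use to create the index.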
Then we can index a document and search the index:
POST movies/_doc/1
{
  "name": "name1",
  "plot": "exciting plot hello"
}

GET movies/_search
{
  "query": {
    "match": {
      "name": "name1"
    }
  }
}
Notice that the hits contain no _source field. Even adding "_source": true to the request does not help, because the source was never stored.
The search API has a stored_fields parameter, where you can list the stored fields you want returned:
GET movies/_search
{
  "stored_fields": ["name", "plot"],
  "query": {
    "match": {
      "name": "name1"
    }
  }
}
In the search request above, we explicitly list the fields we want returned.
However, only name is a stored field, so each hit in the result contains a value for name but not for plot.
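Stored fields come back under each hit's fields key, and their values are always returned as arrays. The sketch below shows one way to pull them out of a search response in client code. The function name stored_field_values is hypothetical, and the sample response is an abridged illustration of what the request above returns, not captured output.

```python
from typing import Any


def stored_field_values(response: dict[str, Any]) -> list[dict[str, list[Any]]]:
    """Return the stored-field values of every hit in a search response.

    Each hit contributes the dict found under its "fields" key; hits
    without that key contribute an empty dict.
    """
    hits = response.get("hits", {}).get("hits", [])
    return [hit.get("fields", {}) for hit in hits]


# Abridged, illustrative response for the stored_fields request above.
sample_response = {
    "hits": {
        "hits": [
            {
                "_index": "movies",
                "_id": "1",
                "fields": {"name": ["name1"]},
            }
        ]
    }
}

values = stored_field_values(sample_response)
print(values)  # [{'name': ['name1']}]
```

Because plot is not stored, it never appears in the returned dict even though the request asked for it.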
source filtering and field selection
You can get the value of a field either from _source or through the fields parameter.
However, _source gives you the raw value exactly as it was indexed, whereas a field requested through the fields parameter is returned after the mapping has been applied, for example a date formatted according to the field's mapping.
PUT my_index/
{
  "mappings": {
    "runtime": {
      "calculated_count": {
        "type": "long",
        "script": {
          "source": "emit(doc['count'].value + 1)"
        }
      }
    },
    "properties": {
      "created": {
        "type": "date"
      }
    }
  }
}

POST my_index/_doc/1
{
  "count": 100,
  "name": "hello",
  "created": "2024-05-06"
}

GET my_index/_search
{
  "fields": [
    "created", "calculated_count"
  ],
  "_source": true
}
In the requests above, we mapped the created field as type date.
If you check the search response, you will find that the value of created differs between fields and _source: _source keeps the original string, while fields returns the formatted date.
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "count": 100,
          "name": "hello",
          "created": "2024-05-06"
        },
        "fields": {
          "created": [
            "2024-05-06T00:00:00.000Z"
          ],
          "calculated_count": [
            101
          ]
        }
      }
    ]
  }
}
The fields parameter can also include runtime fields, such as calculated_count above, which is not possible with _source.
See the Elasticsearch documentation for a more detailed discussion of fields vs _source.
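To see the difference side by side, you can compare the two parts of a hit in client code. This is a minimal sketch: the function name compare_source_and_fields is hypothetical, the hit is copied from the response shown above, and only top-level field names are handled (dotted paths into nested objects are not resolved).

```python
from typing import Any


def compare_source_and_fields(hit: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Pair every entry under "fields" with the raw value from "_source".

    Only top-level field names are looked up in _source.
    """
    source = hit.get("_source", {})
    report = {}
    for name, values in hit.get("fields", {}).items():
        report[name] = {
            "in_source": name in source,        # False for runtime fields
            "source_value": source.get(name),   # raw value as indexed, or None
            "fields_value": values,             # always a list
        }
    return report


# The hit from the response shown above.
hit = {
    "_index": "my_index",
    "_id": "1",
    "_score": 1,
    "_source": {"count": 100, "name": "hello", "created": "2024-05-06"},
    "fields": {
        "created": ["2024-05-06T00:00:00.000Z"],
        "calculated_count": [101],
    },
}

for name, info in compare_source_and_fields(hit).items():
    print(name, info)
```

For created, the raw string and the formatted date differ; for calculated_count, in_source is False because the runtime field exists only at query time.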
References
- stored fields vs _source: https://stackoverflow.com/q/28678296/6064933
[1] Disabling _source may have undesired effects; make sure you understand them first: https://www.elastic.co/docs/reference/elasticsearch/mapping-reference/mapping-source-field#disable-source-field