OpenSearch: Do some stats on your indices

Dejanu Alex
3 min read · Apr 8, 2024

For a smooth introduction to OpenSearch concepts like shards, indices, and more, I recommend starting with the OpenSearch for Humans article.

I’m going to use Dev Tools to access the OpenSearch API. First, get the cluster statistics to gain insight into the number of indices, shards, and documents.

GET _cluster/stats
cluster stats

A document is a JSON object stored in an index. Documents are the basic unit of information that can be indexed and searched within OpenSearch. An index is partitioned across shards, which come in two types: primary and replica.

When a document is indexed, OpenSearch determines which shard it should be routed to, based on the document's routing value (by default, its ID) and the index configuration (such as the number of primary shards).
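
As a rough illustration of the routing step, the sketch below picks a shard by hashing a routing value and taking the result modulo the number of primary shards. This is a simplification: OpenSearch uses its own hash function (Murmur3) internally, so `zlib.crc32` is only a stand-in here, and `pick_shard` is a hypothetical helper name, not part of any OpenSearch client.

```python
import zlib


def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Return the primary shard number for a routing value (simplified model)."""
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    # Stand-in hash: OpenSearch hashes with Murmur3, not CRC32.
    hashed = zlib.crc32(routing_value.encode("utf-8"))
    return hashed % number_of_primary_shards


if __name__ == "__main__":
    # The same ID always lands on the same shard.
    for doc_id in ("doc-1", "doc-2", "doc-3"):
        print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the shard is derived from a modulo over the primary shard count, changing that count would send existing IDs to different shards.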

To get an overview of the indices, we can use the _cat/indices endpoint and sort the output by creation date or size:

# indices sorted by size or date
GET _cat/indices?v&s=store.size:desc
GET _cat/indices?v&s=creation.date:desc

# or for a specific index pattern
GET _cat/indices/<index>?v=true&s=creation.date:desc

We can drill down to a specific index for stats like the number of documents (docs.count) or the size of the index (store.size):

GET _cat/indices/<index>?v=true
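
If you want to post-process these numbers, the `_cat` APIs can return JSON (`format=json`) and raw byte counts (`bytes=b`). The sketch below works on a hard-coded sample with made-up index names and values, so it runs without a cluster; `summarize_indices` is a hypothetical helper.

```python
import json

# Made-up sample of what `GET _cat/indices?format=json&bytes=b` might return.
SAMPLE_RESPONSE = """
[
  {"index": "logs-2024.04.01", "docs.count": "1200", "store.size": "734003200"},
  {"index": "logs-2024.04.02", "docs.count": "800",  "store.size": "524288000"},
  {"index": "metrics",         "docs.count": "50",   "store.size": "1048576"}
]
"""


def summarize_indices(cat_indices_json: str) -> dict:
    """Total up documents and bytes, and name the largest index."""
    rows = json.loads(cat_indices_json)

    def as_int(value) -> int:
        # _cat returns values as strings; treat missing values (None) as 0.
        return int(value or 0)

    by_size = sorted(rows, key=lambda r: as_int(r.get("store.size")), reverse=True)
    return {
        "total_docs": sum(as_int(r.get("docs.count")) for r in rows),
        "total_bytes": sum(as_int(r.get("store.size")) for r in rows),
        "largest_index": by_size[0]["index"] if by_size else None,
    }


if __name__ == "__main__":
    print(summarize_indices(SAMPLE_RESPONSE))
```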

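It’s important to understand that an OpenSearch index is composed of shards, because a cluster can only keep a limited number of shards open. Once that limit is reached, any request that would create more shards fails with an error like this: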

this action would add [x] total shards, but this cluster currently has [x]/[x] maximum shards open;

When you add or search data within an index, that index is in an open state. Every open index counts its shards against the limit, so the more indices you keep open, the more shards you use.

  • Each node can accommodate a limited number of shards. To check the limit, look at the cluster.max_shards_per_node setting: GET /_cluster/settings?include_defaults=true
  • The number of data nodes determines the maximum number of shards for the cluster. To see how many are currently in use, check the active_shards value (the total number of active shards, both primary and replica): GET _cluster/health
Active shards
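
The "maximum shards open" error quoted earlier comes down to simple arithmetic: the cluster-wide limit is the per-node limit multiplied by the number of data nodes. Here is a minimal sketch, assuming the default of 1000 for `cluster.max_shards_per_node`. The function names are hypothetical, and `active_shards` is used as an approximation of the shards the cluster counts as open.

```python
def cluster_shard_limit(max_shards_per_node: int, data_nodes: int) -> int:
    """Maximum number of open shards the cluster allows."""
    return max_shards_per_node * data_nodes


def can_add_shards(
    active_shards: int,
    new_shards: int,
    max_shards_per_node: int = 1000,  # assumed default; check your cluster settings
    data_nodes: int = 1,
) -> bool:
    """Return True if adding `new_shards` stays within the cluster-wide limit."""
    limit = cluster_shard_limit(max_shards_per_node, data_nodes)
    return active_shards + new_shards <= limit


if __name__ == "__main__":
    # Three data nodes with the assumed default give room for 3000 shards.
    print(cluster_shard_limit(1000, 3))
    print(can_add_shards(active_shards=2999, new_shards=2, data_nodes=3))
```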

Potential solutions to reduce the number of shards:

  • Reduce the number of replicas (redundant copies of the data) for the index:
PUT /<index>/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}

# check index setting "number_of_replicas" afterward
GET <index>/_settings
  • Close a specific index (once an index is closed, you cannot add data to it or search it): POST /<index>/_close
  • Last but not least, if possible, add another data node to the cluster.
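
To estimate what lowering the replica count buys you, note that an index uses its primaries plus one full copy of them per replica. A small sketch with hypothetical helper names:

```python
def total_shards(primaries: int, replicas: int) -> int:
    """Shards used by one index: the primaries plus one copy per replica."""
    return primaries * (1 + replicas)


def shards_freed(primaries: int, old_replicas: int, new_replicas: int) -> int:
    """Shards released when the replica count is lowered."""
    return total_shards(primaries, old_replicas) - total_shards(primaries, new_replicas)


if __name__ == "__main__":
    # An index with 5 primaries going from 1 replica to 0.
    print(shards_freed(primaries=5, old_replicas=1, new_replicas=0))
```

Keep in mind that with 0 replicas there is no redundant copy, so losing a node means losing the data on its shards until the node comes back.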

Closing notes

Use the aforementioned API endpoints to get index sizes, document counts, and shard sizes. The number of primary shards is configured when an index is created and, unlike the number of replicas, cannot be changed afterward with a settings update.

Shard sizes are very important because they impact both search latency and write performance. Too many small shards exhaust memory (JVM heap), while too few large shards prevent OpenSearch from properly distributing requests. A good rule of thumb is to keep shard size between 10 and 50 GB.
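
The rule of thumb can be turned into a quick sizing estimate. The sketch below assumes you already know (or can forecast) the total primary data size of an index. The 30 GB default target is an arbitrary midpoint of the 10 to 50 GB range, and `suggest_primary_shards` is a hypothetical helper.

```python
import math


def suggest_primary_shards(index_size_gb: float, target_shard_gb: float = 30.0) -> int:
    """Estimate a primary shard count that keeps shards near the target size."""
    if not 10 <= target_shard_gb <= 50:
        raise ValueError("target shard size should stay within 10-50 GB")
    if index_size_gb < 0:
        raise ValueError("index size cannot be negative")
    # Round up so no shard exceeds the target; always keep at least one shard.
    return max(1, math.ceil(index_size_gb / target_shard_gb))


if __name__ == "__main__":
    for size_gb in (5, 100, 300):
        print(f"{size_gb} GB -> {suggest_primary_shards(size_gb)} primary shard(s)")
```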

Photo by Marcin Jozwiak on Unsplash
