Monitoring capabilities
AWS provides:
- Built-in monitoring and health checks
- Native monitoring tools
- Log collection, publishing, and alarm capabilities
You are responsible for:
- Enabling features: profiler, Performance Insights (preview), auditing
- Monitoring/alarming on relevant events
- Custom metrics
- Log analysis
Instance level monitoring

BufferCacheHitRatio: The percentage of requests that are served by the buffer cache.
– When data is requested, Amazon DocumentDB first checks whether it is available in the buffer cache (in memory).
– If the data is available in the cache, the request is served from there, which is much faster. This is considered a “cache hit.”
– If the data is not in the cache, it has to be fetched from disk, which is slower. This is a “cache miss.”
==> High BufferCacheHitRatio (close to 100%): This indicates that most read requests are being served from the cache, which results in better performance.
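The hit/miss definition above can be illustrated with a small calculation. This is only a sketch of the arithmetic: CloudWatch publishes BufferCacheHitRatio directly, and the function name and the hit/miss counts below are hypothetical.

```python
def buffer_cache_hit_ratio(cache_hits: int, cache_misses: int) -> float:
    """Return the percentage of requests served from the buffer cache."""
    total_requests = cache_hits + cache_misses
    if total_requests == 0:
        # Assumption: report 0% when there were no requests at all.
        return 0.0
    return 100.0 * cache_hits / total_requests


# Hypothetical sample: 950 requests served from memory, 50 fetched from disk.
print(buffer_cache_hit_ratio(950, 50))  # 95.0
```

A value that stays well below 100% suggests the working set may not fit in memory.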
IndexBufferCacheHitRatio: The percentage of index requests that are served by the buffer cache. You might see a spike greater than 100% for this metric right after you drop an index, collection, or database. The value corrects itself automatically after 60 seconds. This limitation will be fixed in a future patch update.
DatabaseConnections: The number of connections open on an instance taken at a one-minute frequency.
DatabaseCursors: The number of cursors open on an instance taken at a one-minute frequency.
When you run a query in DocumentDB, the database engine generates a cursor that points to the beginning of the result set. You can then use this cursor to fetch the data incrementally. For example, when dealing with pagination, a cursor can return the first 100 documents, and then you can request the next 100 documents in subsequent batches.
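The batching behaviour described above can be sketched without a database. The generator below is a pure-Python simulation, not the DocumentDB driver API. The function name and the sample documents are made up for illustration.

```python
from typing import Iterator, List


def fetch_in_batches(documents: List[dict], batch_size: int = 100) -> Iterator[List[dict]]:
    """Yield successive batches, the way a cursor returns results incrementally."""
    position = 0  # the cursor starts at the beginning of the result set
    while position < len(documents):
        yield documents[position:position + batch_size]
        position += batch_size  # advance to the start of the next batch


# Hypothetical result set of 250 documents.
result_set = [{"_id": i} for i in range(250)]
batch_sizes = [len(batch) for batch in fetch_in_batches(result_set)]
print(batch_sizes)  # [100, 100, 50]
```

Each batch after the first corresponds to a follow-up request on a cursor that is still open, which is why long-running pagination keeps DatabaseCursors elevated.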
FreeableMemory: The amount of available random access memory, in bytes.
CPUUtilization: The percentage of CPU used by an instance.
DatabaseConnections and DatabaseCursors
Each instance type has a suggested limit on the number of connections allowed.
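As a rough sketch, a per-instance connection limit could be compared against DatabaseConnections to decide when to alarm. The limit of 1700, the 80% threshold, and both function names are assumptions for illustration. Look up the actual limit for your instance type in the Amazon DocumentDB documentation.

```python
def connection_utilization(open_connections: int, connection_limit: int) -> float:
    """Return open connections as a percentage of the instance's limit."""
    if connection_limit <= 0:
        raise ValueError("connection_limit must be positive")
    return 100.0 * open_connections / connection_limit


def should_alarm(open_connections: int, connection_limit: int,
                 threshold_pct: float = 80.0) -> bool:
    """Return True when utilization reaches the (assumed) alarm threshold."""
    return connection_utilization(open_connections, connection_limit) >= threshold_pct


# Hypothetical limit of 1700 connections for the instance type in use.
print(connection_utilization(1360, 1700))  # 80.0
print(should_alarm(1360, 1700))            # True
print(should_alarm(500, 1700))             # False
```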

Best Practices: Instance Monitoring

Cluster level monitoring

DBClusterReplicaLagMaximum: The maximum amount of lag, in milliseconds, between the primary instance and each Amazon DocumentDB instance in the cluster.
– It shows the delay in time for the replica instances to catch up with the changes made to the primary instance. This is essentially how much time it takes for data written to the primary instance to be available on the replicas.
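The relationship between per-replica lag and the cluster-level maximum can be sketched as follows. The replica names, lag values, and the 20 ms threshold are hypothetical. In practice the metric is read from CloudWatch rather than computed by hand.

```python
from typing import Dict, List


def max_replica_lag_ms(lag_by_replica_ms: Dict[str, float]) -> float:
    """Return the largest lag across all replicas (0.0 if there are none)."""
    return max(lag_by_replica_ms.values(), default=0.0)


def lagging_replicas(lag_by_replica_ms: Dict[str, float], threshold_ms: float) -> List[str]:
    """Return the names of replicas whose lag exceeds the threshold, sorted."""
    return sorted(name for name, lag in lag_by_replica_ms.items() if lag > threshold_ms)


# Hypothetical lag samples, in milliseconds.
lags = {"replica-1": 12.0, "replica-2": 48.5, "replica-3": 7.25}
print(max_replica_lag_ms(lags))      # 48.5
print(lagging_replicas(lags, 20.0))  # ['replica-2']
```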
DatabaseCursorsTimedOut: The number of cursors that timed out in a one-minute period.
VolumeWriteIOPs: The average number of billed write I/O operations from a cluster volume, reported at 5-minute intervals.
VolumeReadIOPs: The average number of billed read I/O operations from a cluster volume, reported at 5-minute intervals.
Opcounters:
– OpcountersUpdate: Tracks update operations.
– OpcountersCommand: Tracks all commands issued.
– OpcountersDelete: Tracks delete operations.
– OpcountersGetmore: Tracks operations retrieving the next batch of documents.
– OpcountersInsert: Tracks insert operations.
– OpcountersQuery: Tracks query (read) operations.
Volume IOPS and IOPS

WriteIOPS (count/second)
ReadIOPS (count/second)
Best Practices: Cluster Monitoring

Storage and backup monitoring
VolumeBytesUsed: The amount of storage used by your cluster, in bytes. This value affects the cost of the cluster. For pricing information, see the Amazon DocumentDB product page.
BackupRetentionPeriodStorageUsed: The total amount of backup storage, in GiB, used to support the point-in-time restore feature within the Amazon DocumentDB retention window. Included in the total reported by the TotalBackupStorageBilled metric. Computed separately for each Amazon DocumentDB cluster.
SnapshotStorageUsed: The total amount of backup storage, in GiB, consumed by all snapshots for a given Amazon DocumentDB cluster outside its backup retention window. Included in the total reported by the TotalBackupStorageBilled metric. Computed separately for each Amazon DocumentDB cluster.
TotalBackupStorageBilled: The total amount of backup storage, in GiB, for which you are billed for a given Amazon DocumentDB cluster. Includes the backup storage measured by the BackupRetentionPeriodStorageUsed and SnapshotStorageUsed metrics. Computed separately for each Amazon DocumentDB cluster.
Billed backup storage (today) ≈ (BackupRetentionPeriodStorageUsed + SnapshotStorageUsed) – VolumeBytesUsed
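The expression above mixes units: the two backup metrics are reported in GiB, while VolumeBytesUsed is in bytes. A minimal sketch of the arithmetic follows, assuming the result is clamped at zero when backup usage is below the volume size. The function name and sample values are hypothetical, and the actual bill is whatever TotalBackupStorageBilled reports.

```python
GIB = 1024 ** 3  # bytes per GiB


def estimated_billed_backup_gib(retention_gib: float, snapshot_gib: float,
                                volume_bytes_used: int) -> float:
    """Estimate billed backup storage in GiB from the three metrics."""
    volume_gib = volume_bytes_used / GIB  # convert bytes to GiB before subtracting
    billed = (retention_gib + snapshot_gib) - volume_gib
    # Assumption: never report a negative amount.
    return max(billed, 0.0)


# Hypothetical cluster: 120 GiB of retention backups, 80 GiB of snapshots,
# and a 150 GiB cluster volume.
print(estimated_billed_backup_gib(120.0, 80.0, 150 * GIB))  # 50.0
```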
