#sre #monitoring
# Latency:
![[latency.png]]
- The time it takes to process a request.
- Applies to both successful and failed requests; track the two separately, because fast errors can skew the overall numbers
- Measured as the mean, the median, or the 95th percentile (the value that 95% of all measurements do not exceed)
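
A minimal sketch of the three latency measures above, assuming latencies arrive as a plain list of milliseconds. The helper names (`percentile`, `latency_summary`) and the sample data are hypothetical, and the percentile uses the simple nearest-rank method; monitoring systems often interpolate or estimate from histograms instead.

```python
import math
import statistics


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that
    at least pct% of all samples are less than or equal to it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(latencies_ms):
    """Summarize request latencies (hypothetical helper)."""
    return {
        "mean": statistics.mean(latencies_ms),
        "median": statistics.median(latencies_ms),
        "p95": percentile(latencies_ms, 95),
    }


# Made-up sample: 18 fast requests and 2 slow ones.
sample = [100] * 18 + [2000] * 2
summary = latency_summary(sample)
print(summary)
```

In this sample the median stays at the fast value while the 95th percentile lands on the slow requests, which is why the tail percentile is usually watched alongside the mean or median.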
# Traffic:
![[traffic.png]]
- The amount of work performed by the system.
- Can be measured in requests per second, page visits per second, or other metrics
- For a database, it can be the number of queries per second (QPS) or transactions per second (TPS).
- For the network, it can be the number of bytes per second or packets per second.
- It shows how much load the system is handling
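
As an illustration of measuring traffic in requests per second, here is a small sketch that assumes each request is logged with a Unix timestamp. The function names and the timestamps are made up for the example.

```python
from collections import Counter


def requests_per_second(timestamps):
    """Count requests in each one-second bucket, keyed by whole second."""
    return Counter(int(ts) for ts in timestamps)


def average_rps(timestamps, window_seconds):
    """Average request rate over an observation window."""
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    return len(timestamps) / window_seconds


# Made-up request timestamps covering a 3-second window.
timestamps = [1000.1, 1000.5, 1000.9, 1001.2, 1002.0, 1002.7]
per_second = requests_per_second(timestamps)
avg = average_rps(timestamps, window_seconds=3)
print(dict(per_second), avg)
```

The per-second buckets reveal bursts that the window average smooths out.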
# Errors:
![[errors.png]]
- The number (or rate) of requests that fail.
- These can be HTTP errors, database errors, network errors, or application errors.
- It shows which kinds of problems occur in the system
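
A rough sketch of turning failed requests into an error rate, assuming the input is a list of HTTP status codes. Counting only 5xx responses as failures is an assumption made for this example; whether 4xx responses count depends on the service.

```python
def error_rate(status_codes):
    """Fraction of requests that failed.

    Assumption for this sketch: only 5xx responses count as failures.
    """
    if not status_codes:
        return 0.0
    failed = sum(1 for code in status_codes if code >= 500)
    return failed / len(status_codes)


# Made-up sample: 10 requests, two of them server errors.
codes = [200, 200, 500, 404, 503, 200, 200, 200, 200, 200]
rate = error_rate(codes)
print(f"error rate: {rate:.1%}")
```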
# Saturation:
![[saturation.png]]
- How close the system is to its resource limits
- This can be the number of simultaneous requests, CPU usage, memory usage, and disk space usage.
- It shows how much capacity the system needs
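
One way to express "how close to the limit" is utilization: used amount divided by capacity. The sketch below assumes the usage and capacity numbers have already been collected; the resource names and values are hypothetical.

```python
def utilization(used, capacity):
    """Fraction of a resource in use (0.0 = idle, 1.0 = at the limit)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return used / capacity


def most_saturated(resources):
    """Return (name, utilization) for the resource closest to its limit.

    `resources` maps a name to a (used, capacity) pair.
    """
    name = max(resources, key=lambda n: utilization(*resources[n]))
    return name, utilization(*resources[name])


# Made-up readings for a single host.
resources = {
    "cpu_cores": (6, 8),
    "memory_gb": (28, 32),
    "disk_gb": (200, 500),
    "concurrent_requests": (90, 200),
}
bottleneck = most_saturated(resources)
print(bottleneck)
```

The most saturated resource is usually the first one to limit the system, so it is a natural value to chart and alert on.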
# How to apply SRE Golden Signals:
- Measure these metrics in real time
- Set thresholds for each metric
- Set alerts for each metric.
- Set up monitoring for each metric.
- Analyze these metrics to identify problems.
- Solve problems that arise
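
The threshold and alert steps can be sketched as a simple comparison of current values against limits. The metric names, threshold values, and the `breached` helper are all assumptions for illustration; a real setup would define these in the alerting rules of a monitoring system.

```python
def breached(metrics, thresholds):
    """Return the names of metrics whose current value exceeds its threshold.

    Metrics without a threshold are ignored; thresholds without a
    current reading are skipped.
    """
    return sorted(
        name
        for name, limit in thresholds.items()
        if name in metrics and metrics[name] > limit
    )


# Hypothetical thresholds, one per signal that has an upper limit.
thresholds = {
    "latency_p95_ms": 500,
    "error_rate": 0.01,
    "saturation": 0.8,
}

# Hypothetical current readings.
metrics = {
    "latency_p95_ms": 620,
    "error_rate": 0.002,
    "saturation": 0.875,
    "traffic_rps": 120,
}

alerts = breached(metrics, thresholds)
for name in alerts:
    print(f"ALERT: {name} = {metrics[name]} exceeds {thresholds[name]}")
```

Traffic has no threshold in this sketch because a good limit for it depends on the expected load pattern; it is still worth recording to explain changes in the other three signals.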