Envoy Log Cost Reduction
The video commentary section is where I keep notes on videos I enjoyed watching.
Like many posts on this blog, these are my thoughts at the time, potentially written before the video finishes and left unedited.
- Double logging - The same or similar data being logged multiple times. Cutting these is a no-brainer.
- Filter details - What was the result of a filter? This doesn't sound like data worth logging on every request. Could it be sampled or aggregated instead? Sampling ended up being the solution (see the filter and sampling sketch after this list).
- Double cost
  - Emitting the logs was itself a significant cost factor (presumably resource-wise: CPU and egress?)
  - Storing the logs was the second cost factor
- Expression convenience - The CEL extraction method was convenient, but carried a 17% CPU cost on the real workload, versus 3.8% of a core for direct extraction of a limited set of fields (sketched after this list).
- Logging filters - switching to logging filters gave roughly a 1:600 improvement in CPU usage (also sketched after this list).
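
To make the extraction comparison concrete, here's a rough sketch of what the convenient CEL-based approach could look like in an Envoy access log config. This assumes the stdout access logger and the `envoy.formatter.cel` formatter extension; the field names and expressions are my own illustrative picks, not the ones from the video.

```yaml
# Sketch only: CEL expressions evaluated per request for each logged field.
# Lives under the HttpConnectionManager's access_log; names are illustrative.
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      json_format:
        status: "%CEL(response.code)%"
        path: "%CEL(request.path)%"
        method: "%CEL(request.method)%"
      formatters:
      - name: envoy.formatter.cel
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.formatter.cel.v3.Cel
```

The direct-extract alternative drops the formatter and uses built-in command operators instead (%RESPONSE_CODE%, %REQ(:PATH)%, and so on), which is what the tier sketches below use; per the numbers above, that is roughly the difference between 17% and 3.8% of a core.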
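And a sketch of the filter side: an access log that only fires for errors, or for a small random sample of everything else. The runtime keys and percentages are placeholders I've picked for illustration, not values from the video.

```yaml
# Sketch: emit a log line only for 5xx responses, or for a 1% random sample.
# Runtime keys are illustrative placeholders.
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
  filter:
    or_filter:
      filters:
      - status_code_filter:
          comparison:
            op: GE                        # response status >= 500
            value:
              default_value: 500
              runtime_key: access_log.min_status
      - runtime_filter:                   # ...or 1 request in 100
          runtime_key: access_log.sample
          percent_sampled:
            numerator: 1
            denominator: HUNDRED
```

Since the filter decision happens before the line is formatted and emitted, filtered-out requests cost almost nothing, which is presumably where the ~1:600 CPU improvement comes from.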
Recommendations
Tier 1 - Essential (Always on)
- 100% sampling
- fields:
  - status
  - duration
  - bytes
  - path
  - method
- Use cases: Compliance, basic debugging
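
As a sketch, Tier 1 could look something like this in Envoy, assuming the stdout access logger; the JSON keys mirror the field list above and the operator choices are my own mapping.

```yaml
# Tier 1 sketch: always on (no filter), minimal directly-extracted fields.
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      json_format:
        status: "%RESPONSE_CODE%"
        duration: "%DURATION%"
        bytes: "%BYTES_SENT%"
        path: "%REQ(:PATH)%"
        method: "%REQ(:METHOD)%"
```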
Tier 2 - Debug (Errors only)
- 1-5% sampling
- fields:
  - (Tier 1 fields)
  - response_flags
  - upstream_service
  - request_id
- Use cases: Full error context
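
A Tier 2 sketch, reading "errors only" as a status-code filter combined with a low sampling rate; the runtime keys and the operator-to-field mapping (e.g. upstream_service as %UPSTREAM_CLUSTER%) are my assumptions.

```yaml
# Tier 2 sketch: errors only, sampled at ~1%, with extra error context.
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      json_format:
        # Tier 1 fields
        status: "%RESPONSE_CODE%"
        duration: "%DURATION%"
        bytes: "%BYTES_SENT%"
        path: "%REQ(:PATH)%"
        method: "%REQ(:METHOD)%"
        # error context
        response_flags: "%RESPONSE_FLAGS%"
        upstream_service: "%UPSTREAM_CLUSTER%"
        request_id: "%REQ(X-REQUEST-ID)%"
  filter:
    and_filter:
      filters:
      - status_code_filter:
          comparison:
            op: GE
            value:
              default_value: 500
              runtime_key: access_log.tier2.min_status
      - runtime_filter:
          runtime_key: access_log.tier2.sample
          percent_sampled:
            numerator: 1                  # somewhere in the 1-5% range; tunable via the runtime key
            denominator: HUNDRED
```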
Tier 3 - Deep diagnostics
- 10% sampling
- fields:
  - (Tier 1 fields)
  - (Tier 2 fields)
  - metadata
  - timings
- Use cases: Deep diagnostics
- Alternative - Tracing?
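
If tracing doesn't end up covering it, a Tier 3 sketch at 10% sampling with the heavier fields might look like this; the metadata namespace (envoy.lb) and the choice of timing operators are illustrative assumptions on my part.

```yaml
# Tier 3 sketch: 10% sample, adds dynamic metadata and per-phase timings.
access_log:
- name: envoy.access_loggers.stdout
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
    log_format:
      json_format:
        # Tier 1 + Tier 2 fields would be repeated here, plus:
        metadata: "%DYNAMIC_METADATA(envoy.lb)%"        # illustrative namespace
        request_duration: "%REQUEST_DURATION%"          # time spent receiving the request
        response_duration: "%RESPONSE_DURATION%"        # time to first upstream response byte
        response_tx_duration: "%RESPONSE_TX_DURATION%"  # time spent sending the response downstream
  filter:
    runtime_filter:
      runtime_key: access_log.tier3.sample
      percent_sampled:
        numerator: 10
        denominator: HUNDRED
```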