Automated rightsizing with CostGraph

CostGraph Operator is a Kubernetes operator that automates rightsizing of your Kubernetes workloads.
Rightsizing is not enabled for all workloads by default; you opt in on a workload-by-workload basis.
To allow the operator to make changes to a workload, add the following annotation to the workload manifest:
annotations:
  costgraph.baselinehq.cloud/enable-rightsizing: "true"
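Kubernetes annotations are a map of strings, so the value must be the quoted string "true" rather than a bare YAML boolean. The following is a minimal Go sketch that builds a JSON merge patch adding the opt-in annotation to an existing object. It assumes the annotation belongs on the object's own metadata (for example the Deployment); the helper name optInPatch is hypothetical and not part of the operator.

```go
package main

import "encoding/json"

// enableRightsizingKey is the opt-in annotation from the manifest snippet above.
const enableRightsizingKey = "costgraph.baselinehq.cloud/enable-rightsizing"

// optInPatch (hypothetical helper) builds a JSON merge patch that adds the
// opt-in annotation to an object's metadata. The value is the string "true"
// because annotation values must be strings.
func optInPatch() (string, error) {
	patch := map[string]any{
		"metadata": map[string]any{
			"annotations": map[string]string{
				enableRightsizingKey: "true",
			},
		},
	}
	b, err := json.Marshal(patch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}
```

The resulting string can be passed to `kubectl patch deployment my-app --type merge -p '<patch>'`, where my-app is a placeholder workload name.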

Patching Strategy

Currently, recommendations applied in the cluster are not backfilled to other sources. We may support this in the future.
By default, the operator patches the owner resource of the workload (for example, a Deployment) with the new resource requirements. On Kubernetes 1.33+, the operator also supports in-place pod updates in addition to updating the owner resource.
The operator only updates resources when the recommendation drifts by at least 10% from the current resource requirements. The recommendation is also compared against the last applied update using a consistent hash, so an unchanged recommendation is not applied again.
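The exact decision rule is not published. The Go sketch below illustrates the two checks described above under stated assumptions: drift is measured relative to the current value and triggers at 10% or more, and the hash is a SHA-256 over the recommended values (the kind of value the read-only rightsizing-hash annotation would hold). The Recommendation type and all function names are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math"
)

// Recommendation is a hypothetical, simplified container resource plan.
type Recommendation struct {
	CPURequestCores float64
	MemRequestGiB   float64
}

// driftThreshold reflects the 10% drift rule described above.
const driftThreshold = 0.10

// drifted reports whether proposed differs from current by at least the
// threshold, measured relative to the current value (assumption).
func drifted(current, proposed float64) bool {
	if current == 0 {
		return proposed != 0
	}
	return math.Abs(proposed-current)/current >= driftThreshold
}

// planHash deterministically hashes a recommendation so that the same values
// always produce the same short hex string.
func planHash(r Recommendation) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("cpu=%g;mem=%g", r.CPURequestCores, r.MemRequestGiB)))
	return hex.EncodeToString(sum[:8])
}

// shouldApply skips a recommendation that matches the last applied hash,
// and otherwise applies it only if either resource drifted enough.
func shouldApply(current, proposed Recommendation, lastHash string) bool {
	if planHash(proposed) == lastHash {
		return false
	}
	return drifted(current.CPURequestCores, proposed.CPURequestCores) ||
		drifted(current.MemRequestGiB, proposed.MemRequestGiB)
}
```

Checking the hash first means an already-applied plan is skipped even if the live values have since changed, which is one plausible reading of "compared against the last applied update".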

Annotations

| Annotation Name | Key | Description | Example Value |
| --- | --- | --- | --- |
| AnnotationEnable | costgraph.baselinehq.cloud/enable-rightsizing | Enables rightsizing for the resource | "true" |
| AnnotationPlan | costgraph.baselinehq.cloud/rightsizing-plan | Rightsizing plan identifier | "p99" |
| AnnotationLastHash | costgraph.baselinehq.cloud/rightsizing-hash | Hash of the last applied rightsizing plan (read-only) | "abc123" |
| AnnotationLastUpdate | costgraph.baselinehq.cloud/rightsizing-last-update | Timestamp of the last rightsizing update (read-only) | "2024-06-01T12:00:00Z" |
| AnnotationCPURequestRange | costgraph.baselinehq.cloud/rightsizing-cpu-request-range | Allowed CPU request range (cores) | "1,5" |
| AnnotationCPULimitRange | costgraph.baselinehq.cloud/rightsizing-cpu-limit-range | Allowed CPU limit range (cores) | "2,10" |
| AnnotationMemoryRequestRange | costgraph.baselinehq.cloud/rightsizing-memory-request-range | Allowed memory request range (GiB) | "0.1,1" |
| AnnotationMemoryLimitRange | costgraph.baselinehq.cloud/rightsizing-memory-limit-range | Allowed memory limit range (GiB) | "0.5,2" |
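For tooling that reads these annotations, the keys can be collected as Go constants. This sketch reuses the "Annotation Name" column as identifiers, but whether the operator declares them this way is an assumption. The lastUpdate helper is hypothetical and relies only on the example timestamp being in RFC 3339 format.

```go
package main

import "time"

// Annotation keys from the table above (identifier names assumed).
const (
	AnnotationEnable             = "costgraph.baselinehq.cloud/enable-rightsizing"
	AnnotationPlan               = "costgraph.baselinehq.cloud/rightsizing-plan"
	AnnotationLastHash           = "costgraph.baselinehq.cloud/rightsizing-hash"
	AnnotationLastUpdate         = "costgraph.baselinehq.cloud/rightsizing-last-update"
	AnnotationCPURequestRange    = "costgraph.baselinehq.cloud/rightsizing-cpu-request-range"
	AnnotationCPULimitRange      = "costgraph.baselinehq.cloud/rightsizing-cpu-limit-range"
	AnnotationMemoryRequestRange = "costgraph.baselinehq.cloud/rightsizing-memory-request-range"
	AnnotationMemoryLimitRange   = "costgraph.baselinehq.cloud/rightsizing-memory-limit-range"
)

// lastUpdate reads the read-only timestamp annotation. It reports false when
// the annotation is missing or not a valid RFC 3339 timestamp.
func lastUpdate(annotations map[string]string) (time.Time, bool) {
	raw, ok := annotations[AnnotationLastUpdate]
	if !ok {
		return time.Time{}, false
	}
	t, err := time.Parse(time.RFC3339, raw)
	if err != nil {
		return time.Time{}, false
	}
	return t, true
}
```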

Ranges for recommendation values

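Although ranges are supported, we advise you to audit recommendations before specifying maximum values for memory. Unlike CPU, which is compressible, memory is not: a container that exceeds its CPU limit is throttled, but one that exceeds its memory limit is OOM-killed, so capped updates can affect the stability of your workload.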
As part of the rightsizing process, you can specify an override range that sets a minimum and maximum value for each recommendation. This is configured using the Annotation(CPU|Memory)(Request|Limit)Range annotations, each specified as a comma-separated pair in the format min,max, where both values are floats expressed as absolute amounts of the resource (cores for CPU, GiB for memory). For example, to allow a CPU request of at least 100m (0.1 cores) and at most 200m (0.2 cores), you would specify costgraph.baselinehq.cloud/rightsizing-cpu-request-range: "0.1,0.2".
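The min,max format lends itself to a small parse-and-clamp step. The Go sketch below assumes the operator clamps an out-of-range recommendation to the nearest bound and rejects malformed or inverted ranges; parseRange and clamp are hypothetical names, not operator functions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRange parses a "min,max" annotation value such as "0.1,0.2" into
// absolute bounds (cores for CPU, GiB for memory).
func parseRange(value string) (lo, hi float64, err error) {
	parts := strings.Split(value, ",")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("range %q must have the form min,max", value)
	}
	if lo, err = strconv.ParseFloat(strings.TrimSpace(parts[0]), 64); err != nil {
		return 0, 0, fmt.Errorf("invalid min in %q: %w", value, err)
	}
	if hi, err = strconv.ParseFloat(strings.TrimSpace(parts[1]), 64); err != nil {
		return 0, 0, fmt.Errorf("invalid max in %q: %w", value, err)
	}
	if lo > hi {
		return 0, 0, fmt.Errorf("range %q has min greater than max", value)
	}
	return lo, hi, nil
}

// clamp keeps a recommendation inside [lo, hi] (assumed behaviour).
func clamp(recommended, lo, hi float64) float64 {
	if recommended < lo {
		return lo
	}
	if recommended > hi {
		return hi
	}
	return recommended
}
```

With "0.1,0.2", a recommendation of 0.05 cores would be raised to 0.1 and one of 0.35 cores capped at 0.2.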

Allowed fields for the plan annotation

Our operator takes an unconventional approach to recommendations by performing audits on a node-by-node basis, which gives you control over both the recommendations and the auditing process. As part of the rightsizing process, you can specify a plan (using the costgraph.baselinehq.cloud/rightsizing-plan annotation) that defines how the values obtained from the per-node audits are aggregated to determine the right size of the workload. The following percentile and arithmetic-mean aggregates are supported:
  • mean, avg
  • p50, median
  • p90
  • p95
  • p99
  • p100, max
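To make the plan names concrete, here is a Go sketch that reduces per-node usage values with each aggregate listed above, falling back to p99 when no plan is set. The percentile method used by the operator is not documented; this sketch assumes the nearest-rank method, and the aggregate function is hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// aggregate reduces per-node usage values to a single figure using one of
// the plan names above. An empty plan falls back to the default, p99.
func aggregate(plan string, perNode []float64) (float64, error) {
	if len(perNode) == 0 {
		return 0, fmt.Errorf("no per-node samples to aggregate")
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]float64(nil), perNode...)
	sort.Float64s(sorted)

	// nearestRank returns the smallest sample such that at least p% of
	// samples are at or below it.
	nearestRank := func(p float64) float64 {
		rank := int(math.Ceil(p / 100 * float64(len(sorted))))
		if rank < 1 {
			rank = 1
		}
		return sorted[rank-1]
	}

	switch plan {
	case "mean", "avg":
		sum := 0.0
		for _, v := range sorted {
			sum += v
		}
		return sum / float64(len(sorted)), nil
	case "p50", "median":
		return nearestRank(50), nil
	case "p90":
		return nearestRank(90), nil
	case "p95":
		return nearestRank(95), nil
	case "p99", "":
		return nearestRank(99), nil
	case "p100", "max":
		return sorted[len(sorted)-1], nil
	default:
		return 0, fmt.Errorf("unsupported plan %q", plan)
	}
}
```

Note that nearest-rank returns an observed value, so for an even number of nodes "median" picks the lower middle sample rather than averaging the two middle ones; an interpolating percentile would give slightly different results.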
By default, the operator uses the p99 aggregate of container usage across nodes to determine the right size of the workload, and the recommendation uses the same values as the usage data it is derived from. Therefore, if you want a wider or narrower gap between usage and the recommendation, tune the expected_utilisation_percent field in the configuration:
expected_utilisation_percent:
  cpu: 50
  memory: 50
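How expected_utilisation_percent maps usage to a recommendation is not spelled out here. The Go sketch below assumes it acts like a target utilisation: the recommendation is sized so that the aggregated usage is that percentage of it. Under this assumption, 50 doubles usage and 100 leaves the recommendation equal to usage. The ExpectedUtilisation type and its methods are hypothetical.

```go
package main

import "fmt"

// ExpectedUtilisation mirrors the expected_utilisation_percent block above.
type ExpectedUtilisation struct {
	CPU    float64 // percent, e.g. 50
	Memory float64 // percent, e.g. 50
}

// withHeadroom sizes a recommendation so that the aggregated usage equals
// utilisationPercent of it (assumed semantics).
func withHeadroom(usage, utilisationPercent float64) (float64, error) {
	if utilisationPercent <= 0 {
		return 0, fmt.Errorf("expected utilisation must be positive, got %g", utilisationPercent)
	}
	return usage / (utilisationPercent / 100), nil
}

// Recommend applies the per-resource utilisation targets to aggregated
// usage (CPU in cores, memory in GiB).
func (e ExpectedUtilisation) Recommend(cpuCores, memGiB float64) (cpu, mem float64, err error) {
	if cpu, err = withHeadroom(cpuCores, e.CPU); err != nil {
		return 0, 0, err
	}
	if mem, err = withHeadroom(memGiB, e.Memory); err != nil {
		return 0, 0, err
	}
	return cpu, mem, nil
}
```

With the configuration above, an aggregated usage of 0.25 cores and 1.5 GiB would yield recommendations of 0.5 cores and 3 GiB; lowering the percentage widens the gap between usage and the recommendation.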