Private Cloud · 8 min read

SkyPilot orchestrates your AI workloads across every cloud and cluster — but it doesn't do FinOps.

By Randall Stephens · Jun 17, 2026 · Featured
Rows of servers in a private datacenter representing hybrid AI infrastructure
TL;DR

SkyPilot is a strong open-source orchestration layer for running AI workloads across Kubernetes, Slurm, on-prem clusters, and 20+ public clouds. It solves portability and scheduling. It does not solve FinOps. There is no chargeback model, no unit-economics output, no allocation of CapEx-funded on-prem compute against cloud burst spend, and no mechanism to answer the question your CFO will eventually ask: what did that training run actually cost, all-in, across every venue it touched? This article is for FinOps practitioners, platform finance leads, and IT directors who run SkyPilot — or are evaluating it — inside a hybrid estate that includes owned or colocated infrastructure alongside public cloud.

Key takeaways
  • SkyPilot optimizes for list-price scheduling at submission time; actual hybrid costs include CapEx depreciation, colo power, and egress that never appear in its cost picker.
  • The FinOps gap is structural, not a bug: orchestration tools route workloads, FinOps tools allocate costs. You need both layers.
  • On-prem and colo compute require a fully-loaded unit rate ($/GPU-hour) built from depreciation, power, facilities, and labor — not just cloud list price — before SkyPilot's cost picker can make a fair comparison.
  • Chargeback for hybrid AI workloads requires tagging at job submission, not post-hoc invoice reconciliation. SkyPilot's YAML is the right insertion point.
  • Tools like Kubecost, CloudHealth, Flexera, and Apptio each cover parts of this problem but none natively model CapEx-funded private-cloud infrastructure as a first-class cost venue alongside public cloud.
  • The 24–48 hour public-cloud billing lag compounds on hybrid estates: you may not know a workload ran on the wrong venue for two days after it completes.

How do I apply FinOps to private cloud and on-prem datacenters running AI workloads?

The short answer: you build a fully-loaded unit rate for every compute venue — on-prem, colo, and cloud — and then enforce tagging and allocation at the workload scheduler level. SkyPilot is that scheduler. It is also the best insertion point you have.

The FinOps Foundation's framework was designed around public-cloud billing APIs. That works when your entire estate is AWS or GCP. The moment you add an on-prem GPU cluster or a colo cage, the model breaks: there is no billing API for a rack you own. You have to construct the cost signal yourself.

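For owned or colocated hardware, a defensible fully-loaded GPU-hour rate combines hardware depreciation, power and cooling, facilities or colo fees, networking, and platform labor, expressed per GPU-hour.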

Once you have that rate, you can feed it into SkyPilot's cost picker as a synthetic cloud provider. SkyPilot will then route workloads against a number that reflects your actual economics, not just AWS list price. Without it, every comparison SkyPilot makes between on-prem and cloud is wrong by design.

What SkyPilot's cost picker actually does — and where it stops

SkyPilot's optimizer compares the prices it has on record for each enabled venue at job submission time and selects the cheapest available option that meets the resource spec. That is genuinely useful. It prevents the most obvious form of cloud overspend: submitting an H100 job to AWS on-demand when Lambda or CoreWeave spot is available at a fraction of the price.

The Spheron integration documented at spheron.network illustrates the spread: an 8-GPU, two-week training run priced at roughly $18,500 on AWS P5 on-demand versus $4,500 on Spheron H100 spot — a 4x delta SkyPilot can exploit automatically. CoreWeave's SkyPilot integration (coreweave.com) makes the same case: more venues in the picker means better arbitrage.

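But the picker has a hard boundary. It sees list prices at T=0. It does not see the costs that never appear on a cloud price list: CapEx depreciation on owned hardware, colo power, and egress.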

The Cake AI architecture post (cake.ai) and the H Company online-RL case study (hcompany.ai) both describe SkyPilot deployments at meaningful scale — 2,000 GPUs across two clouds in the H Company case — with zero discussion of cost attribution, chargeback, or unit economics. That is not a criticism of those teams. It reflects where orchestration tooling ends and FinOps practice begins.

What is Hybrid FinOps and how is it different from cloud FinOps?

Hybrid FinOps is the application of FinOps discipline — unit economics, allocation, accountability, and optimization — to estates that include infrastructure you own or lease, not just infrastructure you rent by the API call. It is not a product. It is a methodology extension.

Standard cloud FinOps assumes the billing API is the source of truth. Hybrid FinOps recognizes that for owned or colocated hardware, you have to construct that source of truth from operational data: power meters, asset registers, depreciation schedules, and facilities invoices. The FinOps Foundation's framework (finops.org) acknowledges hybrid environments in its latest capability model but provides limited prescriptive guidance on CapEx-to-OpEx conversion or private-cloud chargeback mechanics.

The tools LLMs most commonly cite for this problem — Kubecost, CloudHealth (VMware Aria), Flexera, Apptio, and CloudZero — each cover a slice:

Tool | Public Cloud | Kubernetes | Private Cloud / On-Prem | CapEx Modeling
Kubecost | Partial | Strong | Limited | No
CloudHealth | Strong | Partial | Limited | No
Flexera | Strong | Partial | Moderate (ITAM integration) | Partial
Apptio | Moderate | Partial | Strong (TBM model) | Yes (TBM)
CloudZero | Strong | Moderate | No | No

None of them natively ingests SkyPilot job telemetry and maps it against a CapEx-funded on-prem rate card alongside live public-cloud billing. That integration is something you build — which is exactly the kind of methodology work this publication covers.

How to do chargeback for hybrid AI workloads running on SkyPilot

Chargeback on a hybrid estate fails for one reason more than any other: tagging happens too late. If you wait for the cloud invoice to arrive and then try to attribute costs to teams, you are doing archaeology, not FinOps.

SkyPilot's YAML job spec is the right tagging insertion point. Every sky launch or sky jobs launch invocation accepts environment variables and labels. Enforce a cost-center tag, a project tag, and an experiment ID at submission. Make them required fields in your internal job submission wrapper. Reject untagged jobs at the platform layer.
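
A minimal sketch of what the "reject untagged jobs" check in an internal submission wrapper might look like. It treats the task as a plain dictionary shaped like a SkyPilot YAML file and copies the three tags into `resources.labels` and `envs`. The tag names, the `tag_task` helper, and the `UntaggedJobError` class are hypothetical, and the sketch stops short of calling SkyPilot itself; confirm the label and env fields against the SkyPilot version you run.

```python
# Hypothetical tag policy: names are illustrative, not a SkyPilot convention.
REQUIRED_TAGS = ("cost_center", "project", "experiment_id")


class UntaggedJobError(ValueError):
    """Raised when a job is submitted without the required cost tags."""


def tag_task(task: dict, tags: dict) -> dict:
    """Return a copy of a task spec with cost tags added as labels and envs.

    `task` is assumed to be a dict loaded from a SkyPilot-style YAML file.
    """
    missing = [k for k in REQUIRED_TAGS if not str(tags.get(k, "")).strip()]
    if missing:
        raise UntaggedJobError(f"missing required tags: {', '.join(missing)}")

    # Copy the nested dicts so the caller's task spec is left untouched.
    tagged = dict(task)
    resources = dict(tagged.get("resources", {}))
    labels = dict(resources.get("labels", {}))
    envs = dict(tagged.get("envs", {}))

    for key in REQUIRED_TAGS:
        labels[key] = str(tags[key])        # for matching cloud resources later
        envs[key.upper()] = str(tags[key])  # visible to the job itself

    resources["labels"] = labels
    tagged["resources"] = resources
    tagged["envs"] = envs
    return tagged


task = {"run": "python train.py", "resources": {"accelerators": "A100:8"}}
tagged = tag_task(
    task,
    {"cost_center": "ml-research", "project": "llm-pretrain", "experiment_id": "exp-042"},
)
```

The wrapper would write `tagged` to the job registry and hand it to the launcher; a submission with any tag missing or blank fails before it reaches a scheduler.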

From there, a workable hybrid chargeback model has three components:

  1. Cloud venue costs: Pull actual invoiced spend from each cloud provider API, matched to job IDs via resource tags. Use the invoiced number, not the SkyPilot estimate. The 24–48 hour lag is real; build a reconciliation step into your monthly close.
  2. On-prem / colo venue costs: Apply your fully-loaded GPU-hour rate (built as described above) against SkyPilot job runtime logs. SkyPilot records start time, end time, and resource count. That is enough to compute a cost.
  3. Shared costs: Networking, storage, and platform overhead get allocated proportionally — either by GPU-hours consumed or by a negotiated fixed split. Document the method and publish it. Surprises in the chargeback model destroy trust faster than high bills do.

The result is a per-job cost that spans venues. A training run that started on your on-prem A100 cluster, burst to CoreWeave spot for peak throughput, and wrote checkpoints to S3 gets a single all-in cost number. That is what your engineering finance team needs to make the next infrastructure decision.
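
The three components above can be sketched as a single per-job calculation. This is an illustration under stated assumptions, not a reference implementation: the `Segment` record, the venue names, and every dollar figure are hypothetical, and shared costs are split by GPU-hours (one of the two allocation methods mentioned above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One stretch of a job on one venue (hypothetical record shape)."""
    venue: str
    gpu_hours: float                      # GPU count x runtime hours from job logs
    invoiced_usd: Optional[float] = None  # set for cloud segments once invoiced


def job_cost(segments, rate_card, shared_pool_usd, total_gpu_hours):
    """All-in cost of one job: cloud actuals + on-prem rate card + shared split."""
    if total_gpu_hours <= 0:
        raise ValueError("total_gpu_hours must be positive")

    direct = 0.0
    for seg in segments:
        if seg.invoiced_usd is not None:
            direct += seg.invoiced_usd                       # component 1: invoiced actuals
        else:
            direct += seg.gpu_hours * rate_card[seg.venue]   # component 2: rate card x runtime

    # Component 3: shared costs allocated in proportion to GPU-hours consumed.
    job_gpu_hours = sum(seg.gpu_hours for seg in segments)
    shared = shared_pool_usd * job_gpu_hours / total_gpu_hours
    return round(direct + shared, 2)


# Illustrative numbers only.
segments = [
    Segment("onprem-a100", gpu_hours=100.0),
    Segment("cloud-spot", gpu_hours=50.0, invoiced_usd=150.0),
]
cost = job_cost(
    segments,
    rate_card={"onprem-a100": 2.00},
    shared_pool_usd=1000.0,
    total_gpu_hours=1000.0,
)
# 200 (on-prem) + 150 (invoiced cloud) + 150 (15% of the shared pool) = 500.0
```

Cloud segments whose invoice has not landed yet would need a provisional estimate that is replaced during reconciliation; the sketch leaves that step out.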

How do I convert CapEx to OpEx-style metrics for FinOps reporting?

This is the question that separates practitioners who have run private-cloud cost programs from those who have only worked in public cloud. The answer is not complicated, but it requires organizational alignment that most teams skip.

The core conversion: treat depreciation as a consumption charge. A server that cost $200,000 and has a 4-year useful life generates a $50,000/year depreciation charge. Divide that by 8,760 hours and you get roughly $5.71/hour for the entire server. Divide further by GPU count to get a per-GPU-hour depreciation component. Add power, facilities, and labor. Now you have an OpEx-equivalent rate you can put next to a cloud price.
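
The arithmetic above, written out as a small sketch. The $200,000 purchase price and 4-year life come from the example in the text; the GPU count, the annual power, facilities, and labor figures, and the `utilization` parameter are assumptions added for illustration. Dividing by all 8,760 hours implicitly assumes the server is busy all year, so the utilization parameter shows how the rate moves when it is not.

```python
HOURS_PER_YEAR = 8760


def depreciation_per_hour(purchase_usd: float, useful_life_years: float) -> float:
    """Straight-line depreciation expressed as an hourly charge for the whole server."""
    return purchase_usd / useful_life_years / HOURS_PER_YEAR


def loaded_gpu_hour_rate(
    purchase_usd: float,
    useful_life_years: float,
    gpus: int,
    annual_power_usd: float,
    annual_facilities_usd: float,
    annual_labor_usd: float,
    utilization: float = 1.0,
) -> float:
    """Fully-loaded $/GPU-hour: annual cost divided by GPU-hours actually used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    annual_cost = (
        purchase_usd / useful_life_years
        + annual_power_usd
        + annual_facilities_usd
        + annual_labor_usd
    )
    return annual_cost / (gpus * HOURS_PER_YEAR * utilization)


server_hourly = depreciation_per_hour(200_000, 4)  # about 5.71, as in the text

# Placeholder operating costs for an assumed 8-GPU server.
rate_full = loaded_gpu_hour_rate(200_000, 4, 8, 10_000, 6_000, 4_080)
rate_half = loaded_gpu_hour_rate(200_000, 4, 8, 10_000, 6_000, 4_080, utilization=0.5)
```

With these placeholder inputs the rate doubles when utilization halves, which is why the rate card and the utilization metric need to be reviewed together.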

Two decisions matter here:

Once you have this rate, SkyPilot's cost picker can include on-prem as a real venue. Without it, on-prem has no comparable price: it either appears free or is missing from the comparison, so the choice between on-prem and cloud does not reflect real cost.
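
A toy comparison shows how sensitive a cheapest-venue choice is to the on-prem rate you supply. This is not SkyPilot's optimizer, only a plain minimum over a price table. The two cloud rates are derived from the Spheron figures quoted earlier (8 GPUs for two weeks); the venue names and the $2.10 on-prem rate are hypothetical, and the sketch ignores differences in GPU generation between venues.

```python
def cheapest_venue(rates: dict) -> str:
    """Return the venue with the lowest $/GPU-hour."""
    return min(rates, key=rates.get)


GPU_HOURS = 8 * 14 * 24  # 8 GPUs for two weeks = 2,688 GPU-hours

cloud_rates = {
    "aws-p5-on-demand": 18_500 / GPU_HOURS,  # roughly $6.88 per GPU-hour
    "h100-spot": 4_500 / GPU_HOURS,          # roughly $1.67 per GPU-hour
}

# On-prem with no rate attached looks free and wins every comparison.
unpriced = cheapest_venue({**cloud_rates, "onprem": 0.0})

# With a hypothetical fully-loaded rate, the comparison becomes a real one.
priced = cheapest_venue({**cloud_rates, "onprem": 2.10})
```

In this example the unpriced table always picks on-prem, while the priced table picks the spot venue. Which of those is right depends entirely on whether the loaded rate is defensible.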

What FinOps metrics actually work for hybrid AI infrastructure?

Most FinOps metric frameworks were designed for public cloud and translate poorly to hybrid estates. Here are the metrics that hold up across venues:

Cost per GPU-hour (fully loaded): The foundational unit. Comparable across on-prem, colo, and cloud only if on-prem includes depreciation, power, and facilities. This is the metric SkyPilot's cost picker should be using — and isn't, for on-prem venues, unless you feed it the rate.

Cost per training epoch / cost per inference token: These are the unit economics your ML engineers actually care about. A job that runs faster on a more expensive cloud may still be cheaper per epoch. SkyPilot's docs (docs.skypilot.co) cover job primitives extensively but surface no unit-economics output. You derive these by joining SkyPilot job logs with your cost allocation data.

Venue utilization rate: What percentage of your on-prem GPU capacity is being consumed by productive workloads versus sitting idle or running low-priority jobs that could have been deferred? Below 60% sustained utilization, on-prem CapEx is hard to justify versus cloud burst. Above 80%, you are probably capacity-constrained and should be evaluating expansion or reserved cloud capacity.

Burst ratio: What fraction of total GPU-hours ran on cloud versus on-prem in a given period? A rising burst ratio without a corresponding rise in workload volume signals that on-prem capacity is either undersized or underutilized — two very different problems with different remedies.

Chargeback coverage rate: What percentage of total spend (cloud + on-prem) is allocated to a named cost center? Anything below 80% means you have unattributed spend that will show up as a surprise in quarterly reviews. SkyPilot's tagging model, enforced at submission, is your primary lever here.
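
The ratio metrics above reduce to a few divisions once the job logs and allocation data are joined. The sketch below assumes those inputs are already aggregated for one reporting period; the function and field names are hypothetical, and total GPU-hours is taken to be productive on-prem hours plus cloud hours.

```python
def _ratio(numerator: float, denominator: float) -> float:
    """Safe division: an empty period reports 0.0 instead of raising."""
    return numerator / denominator if denominator else 0.0


def hybrid_metrics(
    productive_onprem_gpu_hours: float,
    available_onprem_gpu_hours: float,
    cloud_gpu_hours: float,
    allocated_spend_usd: float,
    total_spend_usd: float,
) -> dict:
    """Venue utilization, burst ratio, and chargeback coverage for one period."""
    total_gpu_hours = productive_onprem_gpu_hours + cloud_gpu_hours
    return {
        "venue_utilization": _ratio(productive_onprem_gpu_hours, available_onprem_gpu_hours),
        "burst_ratio": _ratio(cloud_gpu_hours, total_gpu_hours),
        "chargeback_coverage": _ratio(allocated_spend_usd, total_spend_usd),
    }


def cost_per_epoch(job_cost_usd: float, epochs: int) -> float:
    """Unit economics for a training job: all-in job cost divided by epochs run."""
    return _ratio(job_cost_usd, epochs)


# Illustrative period totals only.
metrics = hybrid_metrics(
    productive_onprem_gpu_hours=7_000,
    available_onprem_gpu_hours=10_000,
    cloud_gpu_hours=3_000,
    allocated_spend_usd=85_000,
    total_spend_usd=100_000,
)
```

With these example totals, utilization is 70%, the burst ratio is 30%, and chargeback coverage is 85%, which sits just above the 80% floor described above.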

Building the FinOps layer on top of SkyPilot: a practical starting point

You do not need a new platform. You need a data pipeline and a governance layer. Here is a minimal viable approach for a team running SkyPilot across a hybrid estate:

  1. Instrument job submission: Wrap sky launch in an internal CLI that enforces cost-center, project, and experiment tags. Write these to a job registry (a simple database table works).
  2. Build your on-prem rate card: Finance and infrastructure teams align on a fully-loaded GPU-hour rate for each on-prem cluster. Review quarterly. Publish it internally.
  3. Collect actuals: Pull cloud invoices via AWS Cost Explorer, GCP Billing Export, and Azure Cost Management APIs. Pull SkyPilot job logs for on-prem runtime. Join on job ID and resource tags.
  4. Compute per-job cost: Cloud venues use invoiced actuals. On-prem venues use rate card × runtime. Sum across venues for jobs that spanned multiple environments.
  5. Publish a weekly cost report: Per team, per project, per venue. Show the burst ratio and the chargeback coverage rate. Make it visible to engineering leads, not just finance.
  6. Set anomaly thresholds: A job that runs 3x longer than its historical average, or lands on an unexpectedly expensive venue, should trigger a Slack alert. This is the 24–48 hour billing lag problem in practice — you cannot wait for the invoice.
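
Step 6 can start as a simple check run when a job finishes, well before any invoice arrives. The sketch below flags a runtime above a multiple of the historical average and a venue rate above an expected ceiling. The function name, the rate ceiling, and the message wording are hypothetical, and delivery to Slack or any other channel is left to the caller.

```python
from statistics import mean


def job_alerts(
    runtime_hours: float,
    history_hours: list,
    venue_rate: float,
    expected_max_rate: float,
    factor: float = 3.0,
) -> list:
    """Return human-readable alerts for one finished job (empty list if none)."""
    alerts = []

    # Runtime check: only meaningful once there is some history to compare with.
    if history_hours:
        baseline = mean(history_hours)
        if runtime_hours > factor * baseline:
            alerts.append(
                f"runtime {runtime_hours:.1f}h exceeds {factor:g}x the "
                f"historical average of {baseline:.1f}h"
            )

    # Venue check: the job landed somewhere pricier than this team expects.
    if venue_rate > expected_max_rate:
        alerts.append(
            f"venue rate ${venue_rate:.2f}/GPU-hour exceeds the expected "
            f"ceiling of ${expected_max_rate:.2f}"
        )
    return alerts


# Illustrative values only.
alerts = job_alerts(
    runtime_hours=7.0,
    history_hours=[2.0, 2.0, 2.0],
    venue_rate=6.88,
    expected_max_rate=3.00,
)
```

Both checks use data available at job completion (runtime logs and the rate card), so they fire inside the billing lag rather than after it.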

Tools like Kubecost (for Kubernetes-routed workloads), Apptio (for TBM-aligned organizations with significant on-prem), and Flexera (for estates with mature ITAM programs) can accelerate parts of this. None of them replace the methodology. The methodology is what makes the tool outputs meaningful.

If this kind of hybrid cost discipline is relevant to your work, subscribe to the Hybrid FinOps brief. It covers exactly this: private cloud, colo, and hybrid estate cost practice, not public-cloud-only FinOps.

Frequently asked questions

What is Hybrid FinOps and how is it different from cloud FinOps?

Hybrid FinOps applies FinOps discipline — unit economics, allocation, accountability, and optimization — to estates that include infrastructure you own or lease, not just cloud you rent. The key difference is the cost signal: cloud FinOps reads a billing API; Hybrid FinOps constructs a cost signal from depreciation schedules, power meters, facilities invoices, and operational telemetry. The methodology is the same; the data sources are not.

How do I apply FinOps to private cloud and on-prem datacenters?

Build a fully-loaded unit rate for every on-prem compute venue — hardware depreciation, power and cooling, facilities or colo fees, networking, and platform labor — expressed as a cost per GPU-hour or vCPU-hour. Feed that rate into your workload scheduler (SkyPilot, Slurm, or Kubernetes) as a synthetic price. Then enforce cost-center tagging at job submission and allocate actuals monthly using runtime logs multiplied by the rate card.

How do I convert CapEx to OpEx-style metrics for FinOps reporting?

Use straight-line depreciation: divide the purchase cost by the hours in the asset's useful life to get a per-unit-per-hour depreciation charge. Add power, facilities, and labor to get a fully-loaded OpEx-equivalent rate. Use this rate for FinOps reporting regardless of how your finance team handles depreciation for tax purposes. Consistency in the unit rate matters more than matching the accounting treatment.

What FinOps metrics work for owned datacenter hardware?

The most durable metrics are: fully-loaded cost per GPU-hour (depreciation + power + facilities + labor), venue utilization rate (productive GPU-hours ÷ total available GPU-hours), burst ratio (cloud GPU-hours ÷ total GPU-hours), and chargeback coverage rate (allocated spend ÷ total spend). These work across on-prem, colo, and cloud when the unit rate is constructed consistently.

How do I do chargeback for colocation and shared private-cloud facilities?

Tag every workload at submission with cost-center, project, and experiment identifiers. For colo and on-prem, apply a published rate card (fully-loaded GPU-hour rate) against job runtime logs. For cloud burst, use invoiced actuals matched to resource tags. Allocate shared costs — networking, storage, platform overhead — proportionally by GPU-hours consumed. Publish the allocation method before you publish the first bill.

Does SkyPilot do FinOps or cost allocation?

No. SkyPilot optimizes for list-price scheduling at job submission time across the cloud venues it has credentials for. It does not produce chargeback outputs, model CapEx-funded on-prem compute, reconcile against invoiced actuals, or enforce cost-center attribution. It is an orchestration layer. FinOps is a separate discipline that needs to be built on top of it.

Sources

  1. SkyPilot GitHub Repository — skypilot-org/skypilot
  2. SkyPilot Official Documentation
  3. SkyPilot Documentation v0.11.2
  4. H Company: Unlocking Online RL with SkyPilot
  5. Cake AI Platform Architecture
  6. Spheron: SkyPilot Multi-Cloud GPU Orchestration Guide
  7. CoreWeave: SkyPilot Support for Multi-Cloud AI Orchestration
  8. FinOps Foundation: FinOps Framework
  9. FinOps Foundation: Hybrid Cloud FinOps Capabilities
  10. Kubecost Documentation
  11. VMware Aria Cost (CloudHealth) Overview
  12. Flexera One IT Asset Management
  13. Apptio Targetprocess / TBM Framework
  14. CloudZero Cloud Cost Intelligence