OpenAI Service – Consider Using Preferred SKUs

Ensure that OpenAI deployment SKUs meet your organization’s specific requirements. These can be based on your organization’s data processing location compliance or usage (e.g., Standard for variable workloads, ProvisionedManaged for high volume).

When deploying OpenAI services in Azure, selecting the appropriate SKU (Stock Keeping Unit) is a critical decision that impacts cost efficiency, performance, and compliance. Different SKUs offer varying levels of computational resources, pricing models, and geographical availability. Making informed choices about these deployments can lead to significant cost savings while maintaining the performance levels your applications require.

Azure OpenAI Service offers multiple deployment options, each designed for specific use cases:

  • Standard SKUs: Pay-per-token pricing model ideal for variable workloads

  • ProvisionedManaged SKUs: Fixed capacity with predictable pricing for high-volume scenarios

  • Regional SKUs: Variations based on geographic data processing requirements

Organizations that don’t standardize their OpenAI SKU selection often experience unnecessary cost overruns, performance issues, and potential compliance violations.

Cost Impact Assessment

Selecting non-optimal SKUs can lead to substantial unnecessary expenditures. Here’s how the wrong choices impact your cloud budget:

  • Overprovisioning: Using ProvisionedManaged SKUs for variable or low-volume workloads results in paying for unused capacity

  • Regional price variations: Costs for the same workload can differ by as much as 15-20% between regions

  • Newer model versions: Often more cost-effective than older generations for the same capabilities
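
Whether a ProvisionedManaged deployment is overprovisioned comes down to a break-even volume. The following is a minimal sketch, assuming a flat hourly provisioned rate and a single blended per-token price. Both prices and the function names are hypothetical placeholders, not Azure list prices.

```python
HOURS_PER_MONTH = 24 * 30  # 30-day month, billed around the clock


def provisioned_monthly_cost(hourly_rate: float) -> float:
    """Fixed monthly cost of capacity that is billed every hour, used or not."""
    return hourly_rate * HOURS_PER_MONTH


def break_even_tokens(hourly_rate: float, price_per_1k_tokens: float) -> float:
    """Monthly token volume at which pay-per-token cost equals the fixed cost."""
    return provisioned_monthly_cost(hourly_rate) / price_per_1k_tokens * 1000


# Hypothetical prices, for illustration only.
hourly_rate = 10.0          # USD per hour of provisioned capacity
price_per_1k_tokens = 0.03  # USD per 1,000 tokens on pay-per-token

threshold = break_even_tokens(hourly_rate, price_per_1k_tokens)
print(f"Break-even volume: {threshold:,.0f} tokens/month")
```

Below the break-even volume, pay-per-token is the cheaper option under these assumptions. Above it, fixed capacity starts to pay off, provided the provisioned deployment is sized to serve the load.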

Potential Savings

Consider these real-world examples of cost optimization through proper SKU selection:

Example 1: Workload-Appropriate SKU Selection

  • Organization using ProvisionedManaged SKU ($10/hour) for sporadic workloads

  • Monthly cost: $7,200 (24×7 availability)

  • After switching to Standard SKU (pay-per-token): $1,800/month

  • Monthly savings: $5,400 (75% reduction)

Example 2: Regional Optimization

  • 10 million tokens processed daily in higher-cost region: $8,000/month

  • Same workload in optimized region: $6,800/month

  • Monthly savings: $1,200 (15% reduction)

Example 3: Multiple Small Deployments Consolidation

  • Five separate small ProvisionedManaged deployments: $3,600/month each ($18,000 total)

  • Consolidated to two optimized deployments: $7,200/month

  • Monthly savings: $10,800 (60% reduction)
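
The savings figures in the three examples above follow from simple arithmetic. The sketch below reproduces them, assuming a 30-day month for the hourly rate; the helper name `monthly_savings` is illustrative.

```python
def monthly_savings(before: float, after: float) -> tuple[float, float]:
    """Return (absolute savings, percentage reduction) for two monthly costs."""
    saved = before - after
    return saved, saved / before * 100


# (monthly cost before, monthly cost after), using the figures from the examples
examples = {
    "Workload-appropriate SKU": (10 * 24 * 30, 1_800),  # $10/hour, 24x7, 30 days
    "Regional optimization": (8_000, 6_800),
    "Consolidation": (5 * 3_600, 7_200),                # five deployments to two
}

for name, (before, after) in examples.items():
    saved, pct = monthly_savings(before, after)
    print(f"{name}: ${saved:,.0f}/month saved ({pct:.0f}% reduction)")
```

Running the same function against your own before and after figures gives a quick check on any projected saving.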

Implementation Guide

Infrastructure-as-Code Implementation (Terraform Example)

When defining OpenAI deployments in Terraform, ensure you’re selecting the appropriate SKU based on your usage patterns and compliance requirements.

Non-Compliant Example:

resource "azurerm_cognitive_account" "example" {
  name                = "example-openai"
  resource_group_name = azurerm_resource_group.example.name
  location            = "West US"  # Region picked without a compliance or cost review
  kind                = "OpenAI"
  sku_name            = "S0"
}

resource "azurerm_cognitive_deployment" "example" {
  name                 = "example-deployment"
  cognitive_account_id = azurerm_cognitive_account.example.id

  model {
    format  = "OpenAI"
    name    = "gpt-4"
    version = "0613"  # Older model version
  }

  # azurerm 3.x block; azurerm 4.x uses sku { name = ..., capacity = ... } instead
  scale {
    type     = "Standard"
    capacity = 120  # Not sized against measured usage
  }
}

Compliant Example:

resource "azurerm_cognitive_account" "example" {
  name                = "example-openai"
  resource_group_name = azurerm_resource_group.example.name
  location            = "East US"  # Choose region based on compliance and cost
  kind                = "OpenAI"
  sku_name            = "S0"
}

resource "azurerm_cognitive_deployment" "example" {
  name                 = "example-deployment"
  cognitive_account_id = azurerm_cognitive_account.example.id

  model {
    format  = "OpenAI"
    name    = "gpt-4"
    version = "1106-preview"  # Use newer versions when appropriate
  }

  # azurerm 3.x block; azurerm 4.x uses sku { name = ..., capacity = ... } instead
  scale {
    type     = "ProvisionedManaged"  # Only use for consistent high-volume workloads
    capacity = 60                    # Right-sized based on actual usage patterns
  }
}

Step-by-Step Implementation

Audit existing deployments: Use Infracost to scan your infrastructure code and identify non-compliant OpenAI SKUs. Infracost includes this policy check, enabling you to quickly identify optimization opportunities.

Analyze usage patterns:

  • Review token consumption and API call patterns over 30-60 days

  • Identify peak usage and baseline requirements

  • Determine if usage is predictable or variable

Define SKU selection criteria:

  • For variable or unpredictable workloads: Use Standard SKUs

  • For high-volume, consistent workloads: Consider ProvisionedManaged SKUs

  • For regulated workloads: Ensure regional selection meets compliance requirements

Implement SKU standards in IaC:

  • Update Terraform/ARM/Bicep templates with standardized SKU configurations

  • Implement automated validation using Infracost to prevent deployment of non-compliant SKUs

  • Document exceptions with appropriate justification

Monitor and optimize:

  • Regularly review usage metrics to ensure SKU selections remain appropriate

  • Adjust capacity or SKU type as usage patterns evolve
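
The usage analysis and selection criteria in these steps can be turned into a rough first-pass rule. The sketch below uses the coefficient of variation of daily token counts as a stand-in for predictability. The function name and both thresholds are assumptions for illustration and would need tuning against real pricing and usage data.

```python
from statistics import mean, pstdev


def suggest_sku(daily_tokens: list[float],
                max_variation: float = 0.25,
                min_daily_tokens: float = 5_000_000) -> str:
    """Suggest a deployment SKU from a window of daily token counts.

    The coefficient of variation (stddev / mean) approximates how
    predictable the workload is. Both thresholds are illustrative.
    """
    baseline = mean(daily_tokens)
    if baseline == 0:
        return "Standard"
    variation = pstdev(daily_tokens) / baseline
    if variation <= max_variation and baseline >= min_daily_tokens:
        return "ProvisionedManaged"  # high volume and steady
    return "Standard"                # variable or low volume


steady = [9_800_000, 10_100_000, 10_000_000, 9_900_000, 10_200_000]
spiky = [200_000, 9_000_000, 150_000, 50_000, 12_000_000]

print(suggest_sku(steady))  # ProvisionedManaged
print(suggest_sku(spiky))   # Standard
```

In practice the input would cover the full 30-60 day window, and the result is a starting point for review rather than a final decision.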

Best Practices

Create a SKU selection framework based on:

  • Monthly token volume

  • Request pattern predictability

  • Budget constraints

  • Compliance requirements

  • Performance needs

Implement guardrails:

  • Use Infracost policies to prevent deployment of non-preferred SKUs

  • Create approval workflows for exceptions

  • Document justifications for non-standard selections

Establish regular review cycles:

  • Quarterly assessment of SKU appropriateness

  • Alignment with model version updates from OpenAI

  • Cost vs. performance optimization

Centralize model deployment management:

  • Use shared services approach where possible

  • Consolidate deployments to reduce overhead

  • Standardize deployment patterns
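
To make the guardrail idea concrete, here is a minimal sketch of an allow-list check with documented exceptions. It is not how Infracost implements its policy. The `Deployment` type, the preferred SKU set, and the region list are all hypothetical and stand in for your organization's own standards.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    sku: str
    region: str
    exception_reason: str = ""  # documented justification, if any


# Hypothetical organization policy; adjust to your own standards.
PREFERRED_SKUS = {"Standard", "ProvisionedManaged"}
ALLOWED_REGIONS = {"eastus", "westeurope"}


def violations(deployments: list[Deployment]) -> list[str]:
    """Return a message for each policy breach not covered by a documented exception."""
    found = []
    for d in deployments:
        if d.exception_reason:
            continue  # approved exception, skip the checks
        if d.sku not in PREFERRED_SKUS:
            found.append(f"{d.name}: SKU '{d.sku}' is not preferred")
        if d.region not in ALLOWED_REGIONS:
            found.append(f"{d.name}: region '{d.region}' is not allowed")
    return found


plan = [
    Deployment("chat", "Standard", "eastus"),
    Deployment("batch", "GlobalBatch", "westus"),
    Deployment("legacy", "GlobalBatch", "westus", exception_reason="vendor dependency"),
]

for message in violations(plan):
    print(message)
```

Only the `batch` deployment is reported here: `chat` meets the policy and `legacy` carries a documented exception.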

Example Scenarios

Example 1: Enterprise AI Development Platform

Before Policy Implementation:

  • Multiple teams deploying individual OpenAI instances

  • Mix of SKUs across regions with no standardization

  • Inconsistent versioning and unnecessary duplications

  • Monthly spend: $42,000

After Policy Implementation:

  • Standardized deployments based on workload type

  • Consolidated to three regional deployments

  • Optimized SKU selection based on usage patterns

  • Monthly spend: $23,000 (45% reduction)

Example 2: AI-Powered Customer Service System

Before Policy Implementation:

  • ProvisionedManaged SKU deployed for 24/7 availability

  • Actual usage concentrated in business hours

  • 70% of capacity unused during nights and weekends

  • Monthly spend: $21,600

After Policy Implementation:

  • Switched to Standard SKU with pay-per-token model

  • Maintained smaller ProvisionedManaged instance for baseline operations

  • Implemented auto-scaling for peak periods

  • Monthly spend: $8,900 (59% reduction)

Example 3: Regulatory Compliance Scenario

Before Policy Implementation:

  • All AI workloads deployed in US regions by default

  • EU data processing requirements not consistently met

  • Risk of non-compliance with GDPR

  • Unnecessary data transfer costs

After Policy Implementation:

  • Region-specific deployment strategy

  • EU data processed in EU regions

  • Reduced latency for regional users

  • Eliminated compliance risks

  • Reduced data transfer costs by 22%

Considerations and Caveats

When This Policy May Not Apply

  • Prototype or POC environments: During initial testing phases, standard deployments may be acceptable for short durations

  • Specialized model requirements: Some specific models may only be available in certain regions or SKUs

  • Integration constraints: Some legacy systems may have dependencies requiring specific deployment configurations

Implementation Challenges

  • Usage forecasting complexity: Accurately predicting token consumption patterns can be difficult, especially for new applications

  • Model version transitions: Changing model versions may require recalibration of capacity requirements

  • Regional availability limitations: Not all models are available in all regions, potentially forcing trade-offs between locality and model capability

Performance Considerations

  • Cold start impacts: Standard SKUs may show higher latency on the first requests after a period of inactivity

  • Quota limitations: Be aware of subscription and regional quota constraints when planning deployments

  • Burst capacity requirements: Some workloads may have extreme peak demands that justify oversizing

Monitoring and Maintenance

To ensure ongoing optimization:

Implement usage dashboards tracking:

  • Token consumption by deployment

  • Request patterns and peak usage

  • Cost per model version and deployment

Set up alerting for:

  • Sustained high utilization (>80%)

  • Extended periods of low utilization (<20%)

  • Cost anomalies or sudden changes in usage patterns

Regular optimization reviews:

  • Quarterly assessment of SKU appropriateness

  • Adjustment based on changing usage patterns

  • Evaluation of new SKU options as they become available
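
The utilization thresholds above can be expressed as a small check. This sketch treats "sustained" as every sample in a recent window breaching the threshold. The window length and the function name are assumptions, and a real setup would read utilization from your monitoring system.

```python
def sustained_alert(utilization: list[float], window: int = 3,
                    high: float = 0.80, low: float = 0.20) -> str:
    """Return 'high' or 'low' if the last `window` samples all breach a threshold, else 'ok'."""
    recent = utilization[-window:]
    if len(recent) < window:
        return "ok"  # not enough data to call it sustained
    if all(u > high for u in recent):
        return "high"  # candidate for more capacity
    if all(u < low for u in recent):
        return "low"   # candidate for downsizing or a pay-per-token SKU
    return "ok"


print(sustained_alert([0.55, 0.85, 0.90, 0.88]))  # high
print(sustained_alert([0.40, 0.15, 0.10, 0.12]))  # low
print(sustained_alert([0.85, 0.30, 0.90]))        # ok
```

Requiring the whole window to breach the threshold keeps a single spike or quiet hour from triggering an alert.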

Infracost’s policy scanning capabilities can help you continuously monitor your infrastructure code for compliance with this policy, identifying opportunities for optimization even as your deployment grows and evolves. The free trial allows you to scan your existing codebase and identify potential savings opportunities.

This policy is supported in Infracost and available in the free trial. Sign up today and scan your code using our entire library of FinOps policies.

Get started
with Infracost

© 2026 Infracost Inc

