OpenAI Service – Consider Using Preferred SKUs

Mar 10, 2025

Ensure that OpenAI deployment SKUs meet your organization’s specific requirements. These requirements may be driven by data-processing location (for compliance) or by usage patterns (e.g., Standard for variable workloads, ProvisionedManaged for high-volume workloads).

When deploying OpenAI services in Azure, selecting the appropriate SKU (Stock Keeping Unit) is a critical decision that impacts cost efficiency, performance, and compliance. Different SKUs offer varying levels of computational resources, pricing models, and geographical availability. Making informed choices about these deployments can lead to significant cost savings while maintaining the performance levels your applications require.

Azure OpenAI Service offers multiple deployment options, each designed for specific use cases:

Standard SKUs: Pay-per-token pricing model ideal for variable workloads
ProvisionedManaged SKUs: Fixed capacity with predictable pricing for high-volume scenarios
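Regional and global variants: Deployment types that differ in where data is processed (for example, Standard vs. GlobalStandard), which matters for data-residency requirements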
Organizations that don’t standardize their OpenAI SKU selection often experience unnecessary cost overruns, performance issues, and potential compliance violations.

Cost Impact Assessment

Selecting non-optimal SKUs can lead to substantial unnecessary expenditures. Here’s how the wrong choices impact your cloud budget:

Overprovisioning: Using ProvisionedManaged SKUs for variable or low-volume workloads results in paying for unused capacity
Regional price variations: Costs can vary by up to 15-20% between regions
Newer model versions: Often more cost-effective than older generations for the same capabilities
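The overprovisioning trade-off can be framed as a break-even volume. As a simplified sketch that ignores quotas, latency, and discounts, let r be a provisioned deployment's hourly cost, p the blended pay-per-token price, and T the monthly token volume. These symbols are illustrative, not Azure price-sheet terms, and the 720-hour month matches 24×7 availability over 30 days:

```latex
\begin{aligned}
C_{\text{provisioned}} &= 720\,r \\
C_{\text{standard}} &= p\,T \\
C_{\text{provisioned}} < C_{\text{standard}} &\iff T > \frac{720\,r}{p}
\end{aligned}
```

Below that volume, part of the provisioned capacity is paid for but unused; above it, the fixed price can come out ahead, provided usage is steady enough to keep the capacity busy.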

Potential Savings

Consider these real-world examples of cost optimization through proper SKU selection:

Example 1: Workload-Appropriate SKU Selection
Organization using ProvisionedManaged SKU ($10/hour) for sporadic workloads
Monthly cost: $7,200 (24×7 availability)
After switching to Standard SKU (pay-per-token): $1,800/month
Monthly savings: $5,400 (75% reduction)
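Example 1's figures follow from simple arithmetic, assuming a 30-day month of continuous (24×7) provisioning:

```latex
\begin{aligned}
\$10/\text{h} \times 24\,\text{h/day} \times 30\,\text{days} &= \$7{,}200/\text{month} \\
\$7{,}200 - \$1{,}800 &= \$5{,}400/\text{month} \\
\frac{\$5{,}400}{\$7{,}200} &= 0.75 = 75\%
\end{aligned}
```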

Example 2: Regional Optimization

10 million tokens processed daily in a higher-cost region: $8,000/month
Same workload in a lower-cost region: $6,800/month
Monthly savings: $1,200 (15% reduction)

Example 3: Consolidating Multiple Small Deployments
Five separate small ProvisionedManaged deployments: $3,600/month each ($18,000 total)
Consolidated to two optimized deployments: $7,200/month
Monthly savings: $10,800 (60% reduction)
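The percentage reductions in these examples use the same ratio, monthly savings divided by the original monthly cost:

```latex
\begin{aligned}
\text{reduction} &= \frac{C_{\text{before}} - C_{\text{after}}}{C_{\text{before}}} \\
\text{Example 2: } \frac{\$8{,}000 - \$6{,}800}{\$8{,}000} &= \frac{\$1{,}200}{\$8{,}000} = 15\% \\
\text{Example 3: } \frac{5 \times \$3{,}600 - \$7{,}200}{5 \times \$3{,}600} &= \frac{\$10{,}800}{\$18{,}000} = 60\%
\end{aligned}
```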

Implementation Guide

Infrastructure-as-Code Implementation (Terraform Example)
When defining OpenAI deployments in Terraform, ensure you’re selecting the appropriate SKU based on your usage patterns and compliance requirements.

Non-Compliant Example:

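# The azurerm provider models Azure OpenAI as azurerm_cognitive_account (kind = "OpenAI")
# plus azurerm_cognitive_deployment.
resource "azurerm_cognitive_account" "example" {
  name                = "example-openai"
  resource_group_name = azurerm_resource_group.example.name
  location            = "West US" # Region not chosen against cost or compliance criteria
  kind                = "OpenAI"
  sku_name            = "S0"
}

resource "azurerm_cognitive_deployment" "example" {
  name                 = "example-deployment"
  cognitive_account_id = azurerm_cognitive_account.example.id

  model {
    format  = "OpenAI"
    name    = "gpt-4"
    version = "0613" # Older model version
  }

  # azurerm 4.x uses a sku block (older 3.x releases used scale { type = ... })
  sku {
    name     = "Standard"
    capacity = 120 # Capacity not sized from observed usage
  }
}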

Compliant Example:

resource "azurerm_cognitive_account" "example" {
  name                = "example-openai"
  resource_group_name = azurerm_resource_group.example.name
  location            = "East US" # Choose region based on compliance and cost
  kind                = "OpenAI"
  sku_name            = "S0"
}

resource "azurerm_cognitive_deployment" "example" {
  name                 = "example-deployment"
  cognitive_account_id = azurerm_cognitive_account.example.id

  model {
    format  = "OpenAI"
    name    = "gpt-4"
    version = "1106-preview" # Use newer versions when appropriate
  }

  sku {
    name     = "ProvisionedManaged" # Only use for consistent high-volume workloads
    capacity = 60                   # Right-sized based on actual usage patterns
  }
}

Step-by-Step Implementation

Audit existing deployments: Use Infracost to scan your infrastructure code and identify non-compliant OpenAI SKUs. Infracost includes this policy check, enabling you to quickly identify optimization opportunities.

Analyze usage patterns:

Review token consumption and API call patterns over 30-60 days

Identify peak usage and baseline requirements

Determine if usage is predictable or variable

Define SKU selection criteria:

For variable or unpredictable workloads: Use Standard SKUs

For high-volume, consistent workloads: Consider ProvisionedManaged SKUs

For regulated workloads: Ensure regional selection meets compliance requirements

Implement SKU standards in IaC:

Update Terraform/ARM/Bicep templates with standardized SKU configurations

Implement automated validation using Infracost to prevent deployment of non-compliant SKUs

Document exceptions with appropriate justification

Monitor and optimize:

Regularly review usage metrics to ensure SKU selections remain appropriate

Adjust capacity or SKU type as usage patterns evolve
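The selection criteria and IaC standardization steps above can be expressed directly in Terraform so that teams pick a workload profile instead of a raw SKU. This is a minimal sketch, not an Infracost feature: the `workload_profile` and `deployment_capacity` variables, their allowed values, and the profile-to-SKU map are hypothetical. It assumes the Azure OpenAI account is declared as `azurerm_cognitive_account.example` (the azurerm provider models Azure OpenAI accounts as `azurerm_cognitive_account` with `kind = "OpenAI"`) and uses the `sku` block of `azurerm_cognitive_deployment` from azurerm 4.x.

```hcl
# Hypothetical workload profiles; each maps to one preferred deployment SKU.
variable "workload_profile" {
  type        = string
  description = "Workload profile: variable or consistent_high_volume (hypothetical names)."

  validation {
    condition     = contains(["variable", "consistent_high_volume"], var.workload_profile)
    error_message = "workload_profile must be \"variable\" or \"consistent_high_volume\"."
  }
}

variable "deployment_capacity" {
  type        = number
  description = "Capacity sized from 30-60 days of observed usage."
}

locals {
  # Preferred SKU per profile, following the selection criteria above.
  preferred_sku = {
    variable               = "Standard"
    consistent_high_volume = "ProvisionedManaged"
  }
}

resource "azurerm_cognitive_deployment" "standardized" {
  name                 = "standardized-deployment"
  cognitive_account_id = azurerm_cognitive_account.example.id

  model {
    format  = "OpenAI"
    name    = "gpt-4"
    version = "1106-preview"
  }

  sku {
    name     = local.preferred_sku[var.workload_profile]
    capacity = var.deployment_capacity
  }
}
```

Because the `validation` block rejects unknown profiles at plan time, a new SKU can only be introduced by editing the map itself, which is a natural place to apply review and exception handling.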

Best Practices

Create a SKU selection framework based on:

Monthly token volume

Request pattern predictability

Budget constraints

Compliance requirements

Performance needs

Implement guardrails:

Use Infracost policies to prevent deployment of non-preferred SKUs
Create approval workflows for exceptions
Document justifications for non-standard selections

Establish regular review cycles:

Quarterly assessment of SKU appropriateness

Alignment with model version updates from OpenAI

Cost vs. performance optimization

Centralize model deployment management:

Use a shared-services approach where possible
Consolidate deployments to reduce overhead

Standardize deployment patterns
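The guardrail and exception practices can also be approximated natively in Terraform, complementing policy scanning. In this sketch (Terraform 1.2 or later, which supports `lifecycle` preconditions), a deployment whose SKU is not on a preferred list fails at plan time unless a justification is supplied. The variable names and the contents of `preferred_skus` are hypothetical placeholders for your own standard, and `azurerm_cognitive_account.example` is assumed to exist.

```hcl
variable "deployment_sku" {
  type        = string
  description = "Deployment SKU name, e.g. Standard or ProvisionedManaged."
  default     = "Standard"
}

variable "sku_exception_justification" {
  type        = string
  description = "Required when deployment_sku is not on the preferred list."
  default     = ""
}

locals {
  # Hypothetical standard: provisioned capacity needs a documented justification.
  preferred_skus = ["Standard"]
}

resource "azurerm_cognitive_deployment" "guarded" {
  name                 = "guarded-deployment"
  cognitive_account_id = azurerm_cognitive_account.example.id

  model {
    format  = "OpenAI"
    name    = "gpt-4"
    version = "1106-preview"
  }

  sku {
    name     = var.deployment_sku
    capacity = 10
  }

  lifecycle {
    # Fails the plan for a non-preferred SKU unless a justification is given.
    precondition {
      condition     = contains(local.preferred_skus, var.deployment_sku) || trimspace(var.sku_exception_justification) != ""
      error_message = "Non-preferred SKU: set sku_exception_justification to document the exception."
    }
  }
}
```

The justification string lives in version-controlled variable files, so the exception and its reason are reviewed in the same pull request as the SKU change.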

Example Scenarios

Example 1: Enterprise AI Development Platform

Before Policy Implementation:

Multiple teams deploying individual OpenAI instances

Mix of SKUs across regions with no standardization

Inconsistent versioning and unnecessary duplications

Monthly spend: $42,000

After Policy Implementation:

Standardized deployments based on workload type
Consolidated to three regional deployments
Optimized SKU selection based on usage patterns
Monthly spend: $23,000 (45% reduction)
Example 2: AI-Powered Customer Service System

Before Policy Implementation:

ProvisionedManaged SKU deployed for 24/7 availability

Actual usage concentrated in business hours

70% of capacity unused during nights and weekends

Monthly spend: $21,600

After Policy Implementation:

Switched to the Standard SKU with pay-per-token pricing
Maintained a smaller ProvisionedManaged instance for baseline operations
Implemented auto-scaling for peak periods
Monthly spend: $8,900 (59% reduction)
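The hybrid layout from Example 2, a small provisioned baseline plus pay-per-token capacity, might look like the following. This is a sketch under stated assumptions: the deployment names and the `baseline_ptu` and `overflow_capacity` variables are hypothetical, `azurerm_cognitive_account.example` is assumed to exist, and splitting traffic between the two deployments (for example, retrying on the Standard deployment when the provisioned one returns HTTP 429) is left to the application or gateway layer, which is not shown.

```hcl
variable "baseline_ptu" {
  type        = number
  description = "Provisioned capacity sized to steady business-hours load (hypothetical)."
}

variable "overflow_capacity" {
  type        = number
  description = "Standard deployment capacity for peaks and off-hours traffic (hypothetical)."
}

# Baseline: small provisioned deployment for predictable, steady traffic.
resource "azurerm_cognitive_deployment" "baseline" {
  name                 = "support-baseline"
  cognitive_account_id = azurerm_cognitive_account.example.id

  model {
    format  = "OpenAI"
    name    = "gpt-4"
    version = "1106-preview"
  }

  sku {
    name     = "ProvisionedManaged"
    capacity = var.baseline_ptu
  }
}

# Overflow: pay-per-token deployment that absorbs peaks and quiet periods.
resource "azurerm_cognitive_deployment" "overflow" {
  name                 = "support-overflow"
  cognitive_account_id = azurerm_cognitive_account.example.id

  model {
    format  = "OpenAI"
    name    = "gpt-4"
    version = "1106-preview"
  }

  sku {
    name     = "Standard"
    capacity = var.overflow_capacity
  }

  # Serialize creation; Azure can reject concurrent deployment operations on one account.
  depends_on = [azurerm_cognitive_deployment.baseline]
}
```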
Example 3: Regulatory Compliance Scenario

Before Policy Implementation:

All AI workloads deployed in US regions by default

EU data processing requirements not consistently met

Risk of non-compliance with GDPR

Unnecessary data transfer costs

After Policy Implementation:

Region-specific deployment strategy

EU data processed in EU regions
Reduced latency for regional users

Eliminated compliance risks

Reduced data transfer costs by 22%
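A region-specific strategy like the one in Example 3 can be expressed with `for_each`, creating one account and deployment per data-residency zone. The zone keys, region choices, and resource names below are hypothetical, and model and deployment-type availability differs by region, so each pairing should be checked before use. The sketch uses the `Standard` deployment type because it processes requests in the deployment's own region, whereas global deployment types may process data in other regions.

```hcl
# Hypothetical mapping from data-residency zone to Azure region.
variable "residency_regions" {
  type = map(string)
  default = {
    us = "East US"
    eu = "West Europe"
  }
}

# One Azure OpenAI account per residency zone.
resource "azurerm_cognitive_account" "regional" {
  for_each            = var.residency_regions
  name                = "openai-${each.key}"
  resource_group_name = azurerm_resource_group.example.name
  location            = each.value
  kind                = "OpenAI"
  sku_name            = "S0"
}

# One deployment per account; each.value is the matching account object.
resource "azurerm_cognitive_deployment" "regional" {
  for_each             = azurerm_cognitive_account.regional
  name                 = "chat-${each.key}"
  cognitive_account_id = each.value.id

  model {
    format  = "OpenAI"
    name    = "gpt-4"
    version = "1106-preview"
  }

  # Standard keeps request processing in the account's region.
  sku {
    name     = "Standard"
    capacity = 10
  }
}
```

Applications then route EU users to the `eu` endpoint; that routing decision sits outside Terraform.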

Considerations and Caveats

When This Policy May Not Apply
Prototype or POC environments: During initial testing phases, deployments that deviate from the preferred SKUs may be acceptable for short durations
Specialized model requirements: Some specific models may only be available in certain regions or SKUs
Integration constraints: Some legacy systems may have dependencies requiring specific deployment configurations

Implementation Challenges

Usage forecasting complexity: Accurately predicting token consumption patterns can be difficult, especially for new applications
Model version transitions: Changing model versions may require recalibration of capacity requirements
Regional availability limitations: Not all models are available in all regions, potentially forcing trade-offs between locality and model capability

Performance Considerations

Cold start impacts: Standard SKUs may experience latency during periods of inactivity
Quota limitations: Be aware of subscription and regional quota constraints when planning deployments
Burst capacity requirements: Some workloads may have extreme peak demands that justify oversizing

Monitoring and Maintenance

To ensure ongoing optimization:

Implement usage dashboards tracking:

Token consumption by deployment

Request patterns and peak usage
Cost per model version and deployment

Set up alerting for:

Sustained high utilization (>80%)

Extended periods of low utilization (<20%)
Cost anomalies or sudden changes in usage patterns
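The two utilization thresholds above can be wired up as Azure Monitor metric alerts. This is a sketch with loudly stated assumptions: the metric name `AzureOpenAIProvisionedManagedUtilizationV2` (a percentage for provisioned deployments) is assumed and should be verified against the Azure Monitor supported-metrics list for `Microsoft.CognitiveServices/accounts`, the `action_group_id` input is hypothetical, and `azurerm_cognitive_account.example` and `azurerm_resource_group.example` are assumed to exist.

```hcl
variable "action_group_id" {
  type        = string
  description = "ID of an existing Azure Monitor action group (hypothetical input)."
}

locals {
  # ASSUMPTION: utilization metric name; verify before use.
  utilization_metric = "AzureOpenAIProvisionedManagedUtilizationV2"
}

# Sustained high utilization: hourly average above 80%, evaluated every 15 minutes.
resource "azurerm_monitor_metric_alert" "high_utilization" {
  name                = "openai-sustained-high-utilization"
  resource_group_name = azurerm_resource_group.example.name
  scopes              = [azurerm_cognitive_account.example.id]
  description         = "Average utilization above 80% over one hour."
  frequency           = "PT15M"
  window_size         = "PT1H"

  criteria {
    metric_namespace = "Microsoft.CognitiveServices/accounts"
    metric_name      = local.utilization_metric
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  action {
    action_group_id = var.action_group_id
  }
}

# Extended low utilization: daily average below 20%, evaluated hourly.
resource "azurerm_monitor_metric_alert" "low_utilization" {
  name                = "openai-extended-low-utilization"
  resource_group_name = azurerm_resource_group.example.name
  scopes              = [azurerm_cognitive_account.example.id]
  description         = "Average utilization below 20% over one day."
  frequency           = "PT1H"
  window_size         = "P1D"

  criteria {
    metric_namespace = "Microsoft.CognitiveServices/accounts"
    metric_name      = local.utilization_metric
    aggregation      = "Average"
    operator         = "LessThan"
    threshold        = 20
  }

  action {
    action_group_id = var.action_group_id
  }
}
```

Cost anomalies are usually better caught with budgets or anomaly alerts in Azure Cost Management than with resource metrics.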

Regular optimization reviews:

Quarterly assessment of SKU appropriateness

Adjustment based on changing usage patterns

Evaluation of new SKU options as they become available

Infracost’s policy scanning capabilities can help you continuously monitor your infrastructure code for compliance with this policy, identifying opportunities for optimization even as your deployment grows and evolves. The free trial allows you to scan your existing codebase and identify potential savings opportunities.

This policy is supported in Infracost and available in the free trial. Sign up today and scan your code using our entire library of FinOps policies.
