
Bicep Infrastructure

Infrastructure Overview - Bicep

Deployment Scope: Subscription & RG

  • The Bicep deployment defines specific scopes—either at the subscription level or at the RG (Resource Group) level.
  • The deployment begins at the subscription level to create resource groups.
  • If no scope is defined in a file, it defaults to the RG level.
  • Each RG is deployed with its own scope to ensure clean separation of resources:
    • Network Resources: VNet, subnets, NSGs, and Application Gateway
    • Private DNS Zones: DNS resolution for private endpoints
    • Custom Models: AKS cluster, Container Registry, SQL Server, and supporting services
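
As a sketch of how these scopes fit together (module paths, RG names, and parameters here are illustrative, not the actual ones):

```bicep
// main.bicep is deployed at subscription scope so it can create the RGs.
targetScope = 'subscription'

param location string = 'westeurope' // placeholder

// Resource groups are created at subscription scope.
resource networkRg 'Microsoft.Resources/resourceGroups@2022-09-01' = {
  name: 'rg-network' // illustrative name
  location: location
}

// A module file without an explicit targetScope defaults to 'resourceGroup';
// main.bicep pins each module to a specific RG via the scope property.
module network 'network/main.bicep' = {
  name: 'networkDeployment'
  scope: networkRg
  params: {
    location: location
  }
}
```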

Key Design Principles

  • All resources are deployed with private endpoints -> Nothing is exposed to the public internet
  • Each RG has its own private DNS zone for the custom model services; pods use it to resolve those services by name.
  • Network isolation between pods and nodes
  • Role-based access control (RBAC) for all resources
  • Web Application Firewall (WAF) protection
  • Managed identities for secure service-to-service communication

Deployment Structure

infrastructure/
├── customModels/
├── network/
├── privateLinkDnsZones/
├── main.bicep
├── pipeline.yaml
└── tst.bicepparam

Network Architecture

Core Components

  • VNet is peered with the hub network to enable access to core resources
  • Application Gateway, managed by AGIC, serves as the entry point and provides WAF protection
  • Cilium CNI manages pod networking; pods consume IP addresses directly from the VNet, so traffic can reach them without being routed through the node.
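
A hedged fragment of what the cluster's network profile looks like in this mode (the property values are the standard ones for Azure CNI powered by Cilium, but the actual template may differ):

```bicep
// Inside the Microsoft.ContainerService/managedClusters resource:
networkProfile: {
  networkPlugin: 'azure'      // Azure CNI
  networkDataplane: 'cilium'  // Cilium eBPF dataplane
  networkPolicy: 'cilium'     // Cilium-enforced network policies
}
// Each agent pool then splits nodes and pods across subnets:
//   vnetSubnetID: <nodes subnet resource ID>
//   podSubnetID:  <pods subnet resource ID>
```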

Pod and Node Architecture

  1. Container Structure
    • Containers run inside pods
    • Each pod runs on a node
    • Each pod gets a unique IP address from the VNet
    • Pods can communicate directly with Azure services via private endpoints

  2. Subnet Separation
    • Nodes and pods use separate subnets for enhanced security
    • Nodes are deployed on a dedicated subnet with an NSG that allows communication between the nodes and the pods
    • The pod subnet is delegated to Microsoft.ContainerService/managedClusters for automated management
    • The gateway subnet provides access to the cluster from other networks

  3. Network Security
    • Dedicated subnets for nodes, pods, the gateway, and the bastion host
    • Each subnet has dedicated NSGs with specific security rules
    • Rules control inbound access to the nodes and pods, and are applied independently to the node and pod subnets
    • The Application Gateway reaches pods directly, bypassing nodes, which improves security
    • All internal communication uses private endpoints
    • Inbound traffic is controlled exclusively through the Application Gateway

  4. Scalability and Performance
    • Pods scale independently of nodes, benefiting from direct VNet IP assignment for better performance and isolation
    • Each pod requires its own VNet IP address (e.g., scaling to 100 pods consumes 100 unique IPs from the pod subnet)
    • Direct pod-to-gateway communication eliminates node routing overhead
    • Cilium CNI enables efficient network routing and security policies
    • Pod IP addresses are automatically managed by AKS

  5. Communication Flow
    • External traffic → Application Gateway → Pods (direct)
    • Pod-to-pod communication within the VNet
    • Pod-to-Azure services via private endpoints
    • No direct external access to nodes
    • To expose the cluster via the Application Gateway, nodes and pods are separated into distinct subnets: the gateway has direct access to pods for efficient traffic management while node-level access stays locked down

Subnet Design

The infrastructure uses four main subnets:

  1. Nodes Subnet: Hosts AKS nodes
  2. Pods Subnet: Dedicated to pod IP allocation
    • Delegated to Microsoft.ContainerService/managedClusters
    • Direct IP allocation from the VNet using Cilium CNI
  3. Gateway Subnet: Hosts the Application Gateway
  4. Bastion Subnet: Hosts the Bastion host
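
The pods subnet delegation could be declared roughly as follows (the name, address range, and the `vnet` parent reference are placeholders):

```bicep
// Child subnet of the VNet; the delegation hands subnet-level management
// of pod IP allocation over to AKS.
resource podsSubnet 'Microsoft.Network/virtualNetworks/subnets@2023-09-01' = {
  parent: vnet // the parent virtual network resource
  name: 'snet-pods' // illustrative name
  properties: {
    addressPrefix: '10.0.2.0/23' // placeholder range
    delegations: [
      {
        name: 'aksDelegation'
        properties: {
          serviceName: 'Microsoft.ContainerService/managedClusters'
        }
      }
    ]
  }
}
```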

AKS Configuration

Agent Pools

  1. System Pool (Default)
    • Scale: 1-3 nodes
    • Purpose: System workloads and lightweight applications
    • VM Size: Standard_DS2_v2

  2. User Pool (GPU)
    • Scale: 0-1 nodes
    • Purpose: GPU-intensive workloads
    • VM Size: Standard_NC24ads_A100_v4

  • The default (system) pool manages system workloads with auto-scaling enabled across 1-3 nodes (e.g., workload identity, in-cluster DNS, AGIC).
  • Applications that don't need many IPs or a GPU, such as the API, can also be hosted on the system pool.
  • Workloads inside the cluster scale independently based on application needs.
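
The two pools described above map onto agentPoolProfiles roughly like this (pool names and subnet ID variables are placeholders):

```bicep
// Inside the Microsoft.ContainerService/managedClusters resource:
agentPoolProfiles: [
  {
    name: 'system'
    mode: 'System'
    vmSize: 'Standard_DS2_v2'
    enableAutoScaling: true
    minCount: 1
    maxCount: 3
    vnetSubnetID: nodesSubnetId // nodes subnet resource ID
    podSubnetID: podsSubnetId   // pods subnet resource ID
  }
  {
    name: 'gpu'
    mode: 'User'
    vmSize: 'Standard_NC24ads_A100_v4'
    enableAutoScaling: true
    minCount: 0 // scales to zero when no GPU workload is scheduled
    maxCount: 1
    vnetSubnetID: nodesSubnetId
    podSubnetID: podsSubnetId
  }
]
```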

Application Gateway Ingress Controller (AGIC)

Azure Application Gateway Overview
  • Infrastructure configuration
  • Frontend IP address configuration: the Application Gateway can have a public IP address, a private IP address, or both
  • Listener configuration: checks incoming connection requests by port, protocol, host, and IP address
  • Request routing rules: bind the default listener (appGatewayHttpListener) to the default backend pool (appGatewayBackendPool) and the default backend HTTP settings (appGatewayBackendHttpSettings)
  • HTTP settings

  • Layer 7 Load Balancing

  • Functions as a web traffic load balancer at OSI layer 7
  • Differs from traditional load balancers (layer 4 - TCP/UDP) which only route based on source/destination IP and ports
  • Enables sophisticated traffic management for web applications

  • Advanced Routing Capabilities

  • Makes routing decisions based on HTTP request attributes:
    • URI paths
    • Host headers
    • Request parameters
  • Example: Routes /images traffic to specific server pools configured for image handling
  • Enables granular, flexible control over traffic distribution

Ingress Architecture

  1. Ingress Controllers
    • Operate at layer 7 of the OSI model
    • Route HTTP traffic based on inbound URLs
    • Can direct traffic to different microservices based on URL paths
    • Provide dynamic routing capabilities

  2. Ingress Objects in Kubernetes
    • Manage external traffic routing
    • Enforce security settings through:
      • Hostname specifications
      • Protocol definitions
      • Certificate management
    • Always use hostnames instead of IP addresses when accessing workloads within the cluster; hostnames stay stable as the cluster scales, while pod IPs do not
    • Provide a configuration framework for traffic management
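
A minimal Ingress object targeting AGIC might look like this (the hostname, resource names, and Service are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-ingress          # hypothetical name
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
spec:
  rules:
    - host: inference.example.com  # route by hostname, never by IP
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inference-svc  # hypothetical Service
                port:
                  number: 80
```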

AGIC Implementation

AGIC keeps the Application Gateway in sync with the cluster state, ensuring that any changes in pod IPs are dynamically reflected in the Gateway configuration without manual intervention.

  1. Integration with Kubernetes
    • Runs as a pod within the AKS cluster
    • Interacts directly with the Kubernetes API
    • Deployed as an AKS add-on for seamless integration

  2. Synchronization Mechanism
    • Continuously monitors Kubernetes Ingress resources and processes configuration changes
    • Applies Ingress configurations to the Application Gateway via ARM deployments
    • Keeps the Gateway updated with the latest pod IPs, ensuring real-time synchronization between the cluster state and the Gateway configuration

  3. Service Integration
    • The Ingress targets a Kubernetes Service, which:
      • acts as a load balancer
      • distributes traffic across multiple pods
      • has a fixed IP within the cluster, simplifying traffic management through service abstraction

  4. Direct Pod Communication
    • The App Gateway communicates directly with pod private IPs
    • Bypasses NodePorts and kube-proxy, enhancing performance through direct routing
    • Reduces network latency and improves overall communication efficiency

AKS Integration
  1. Core Functionality
    • Manages external HTTP traffic access
    • Provides load balancing services
    • Handles SSL termination
    • Supports name-based virtual hosting
    • Enables seamless service discovery

  2. Operational Benefits
    • Automated deployment as an AKS add-on
    • Continuous monitoring of Kubernetes resources
    • Dynamic Gateway configuration updates
    • Simplified management through Kubernetes-native resources
    • Seamless integration with Azure services

Identity and Access Management

  • Azure built-in role definitions guide: Reference
  • Role definitions for identities should be managed through Azure RBAC, with Key Vault configured for RBAC-based access to secrets.
  • Pods in the cluster use federated credentials to act as if they are using an Azure identity, ensuring seamless access to Azure resources.

Why is the AKS identity in one RG while its permissions and resources are in another? Why is aksIdentity created at the root (subscription) level while its scope is customModelsRG?

  • The AKS identity is set up at the subscription level to avoid cyclic dependencies across Resource Groups (RGs) and modules, even though its scope remains customModelsRG.
  • The cluster identity requires permissions within the VNet Resource Group to deploy nodes and pods properly, so the role assignment lives in VnetRG: the identity needs Contributor rights there to place the nodes and pods inside the VNet.
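
Sketched in Bicep at subscription scope, the split looks roughly like this (module paths, RG names, and outputs are illustrative):

```bicep
// The identity lives in the custom models RG...
module aksIdentity 'customModels/identity.bicep' = {
  name: 'aksIdentityDeployment'
  scope: resourceGroup('customModelsRG') // placeholder RG name
}

// ...but its role assignment is scoped to the VNet RG, where the
// cluster needs rights to place nodes and pods into the subnets.
module aksVnetRole 'network/roleAssignment.bicep' = {
  name: 'aksVnetRoleDeployment'
  scope: resourceGroup('vnetRG') // placeholder RG name
  params: {
    principalId: aksIdentity.outputs.principalId
  }
}
```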

Managed Identities

  1. AKS Cluster Identity (Needs Key Vault access - KV secrets user)
  2. Application Gateway Identity & AGIC Identity (needs access to Application Gateway)
  3. Inference Model Identity
  4. Ingestion Model Identity
  5. Workflow Model Identity

Key Vault Access

  • RBAC-based access control (preferred over access policies)
  • Secrets management through Azure DevOps pipelines
  • Workload Identity federation for pod access
  • API keys are stored securely in Key Vault; it is better to retrieve them from the pipeline than to access them directly from within the application.
  • To access secrets during deployment, use Azure DevOps pipeline tasks that integrate with Key Vault: Reference
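
A hedged example of pulling a secret from Key Vault in the pipeline (service connection, vault, and secret names are placeholders):

```yaml
steps:
  # Maps the listed secrets to pipeline variables for subsequent tasks.
  - task: AzureKeyVault@2
    inputs:
      azureSubscription: 'my-service-connection' # placeholder
      KeyVaultName: 'kv-custom-models'           # placeholder
      SecretsFilter: 'inferenceApiKey'           # placeholder secret name
      RunAsPreJob: false
```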

Federated Credentials

  • Federated Credentials for kube pods access
  • Both EntraID and Kubernetes serve as identity providers but have different types of identities:
  • Kubernetes: Uses Service Accounts to manage permissions within the cluster.
  • EntraID: Uses Service Principals, User Managed Identities, …
  • To link the two, a federated credential is established so that a Kubernetes service account is treated as equivalent to an EntraID identity.

  • Service Account and Identity Integration:

  • Establish federated credentials between the identity provided by Azure EntraID and the Kubernetes service account identity. Pods running under that service account can then access Azure resources as though they were using the EntraID identity directly.
  • Create a dedicated service account in Kubernetes rather than using the default one. This ensures that pods access Azure resources like Key Vault using their own identity instead of the cluster identity, maintaining security and proper access control.

  • To create the federated credential, we need:

    • Service account details
    • The namespace in which the service account will be defined
    • The permissions required for the service account

  • In Kubernetes, each application needs at least one identity.

  • Workload Identities
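
The inputs above come together in a federated credential resource, sketched here in Bicep (identity name, namespace, and service account are placeholders):

```bicep
param aksOidcIssuerUrl string // the cluster's OIDC issuer URL

resource inferenceIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' existing = {
  name: 'id-inference' // placeholder identity name
}

resource federatedCred 'Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials@2023-01-31' = {
  parent: inferenceIdentity
  name: 'inference-federated-credential'
  properties: {
    issuer: aksOidcIssuerUrl
    // subject = system:serviceaccount:<namespace>:<service account name>
    subject: 'system:serviceaccount:inference:inference-sa'
    audiences: [
      'api://AzureADTokenExchange'
    ]
  }
}
```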

Deployment

Prerequisites

  1. Azure subscription with required permissions
  2. Azure DevOps environment
  3. Service Principal with necessary permissions

Deployment Steps

  1. Configure environment parameters in *.bicepparam files
  2. Update pipeline variables in pipeline.yaml
  3. Run the Azure DevOps pipeline, or deploy manually with:
    az deployment sub create \
      --name CustomModels \
      --location <location> \
      --parameters <environment>.bicepparam

Security Considerations

  1. All resources use private endpoints
  2. Network isolation between pods and nodes
  3. WAF protection for incoming traffic
  4. RBAC for all resource access
  5. Managed identities for service authentication

Scaling Considerations

  • Pods scale independently with direct IP allocation
  • Agent pools auto-scale based on demand
  • Application Gateway scales automatically (WAF_v2 SKU)
  • Each pod requires a unique IP from the pods subnet

Monitoring and Management (TODO)

  • Container insights enabled on AKS cluster
  • Application Gateway metrics and logging
  • Private DNS zone monitoring
  • Network security group flow logs

Troubleshooting

  • Delete the RG ati-custom-models and redeploy; also delete the nodes and pods in the VNet
  • Go to the subscription → Deployments and look for CustomModels

Topics to be addressed

  • Create a service account inside the cluster (to be deployed by the inference model application)
  • Ensure workload scaling down to zero when not in use, particularly for inference models, to optimize cost
  • Federated credentials between the identity (inference, ingestion, workflow, ...) from EntraID and the Identity from Kubernetes (need to know Service Account, namespace, permission)