
Bicep Infrastructure

Infrastructure Overview - Bicep

Deployment Scope: Subscription & RG

  • The Bicep deployment defines specific scopes—either at the subscription level or at the RG (Resource Group) level.
  • The deployment begins at the subscription level to create resource groups.
  • If no scope is defined in a file, it defaults to the RG level.
  • Each RG is deployed with its own scope to ensure clean separation of resources:
    • Network Resources: VNet, subnets, NSGs, and Application Gateway
    • Private DNS Zones: DNS resolution for private endpoints
    • Custom Models: AKS cluster, Container Registry, SQL Server, and supporting services
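
As a sketch of how these scopes fit together (module paths, RG names, and parameters here are illustrative, not the actual ones):

```bicep
// main.bicep is deployed at subscription scope so it can create the RGs.
targetScope = 'subscription'

param location string = 'westeurope' // placeholder

// Resource groups are created at subscription scope.
resource networkRg 'Microsoft.Resources/resourceGroups@2022-09-01' = {
  name: 'rg-network' // illustrative name
  location: location
}

// A module file without an explicit targetScope defaults to 'resourceGroup';
// main.bicep pins each module to a specific RG via the scope property.
module network 'network/main.bicep' = {
  name: 'networkDeployment'
  scope: networkRg
  params: {
    location: location
  }
}
```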

Key Design Principles

  • All resources are deployed with private endpoints -> Nothing is exposed to the public internet
  • Each RG has its own private DNS zone for the custom model services; pods use it to resolve those services by name.
  • Network isolation between pods and nodes
  • Role-based access control (RBAC) for all resources
  • Web Application Firewall (WAF) protection
  • Managed identities for secure service-to-service communication

Deployment Structure

infrastructure/
├── customModels/
├── network/
├── privateLinkDnsZones/
├── main.bicep
├── pipeline.yaml
└── tst.bicepparam

Network Architecture

Core Components

  • VNet is peered with the hub network to enable access to core resources
  • Application Gateway, managed by AGIC, serves as the entry point and provides WAF protection
  • Cilium CNI manages pod networking; pods consume IP addresses directly from the VNet, so traffic can reach them without being routed through the node.
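
A hedged fragment of what the cluster's network profile looks like in this mode (the property values are the standard ones for Azure CNI powered by Cilium, but the actual template may differ):

```bicep
// Inside the Microsoft.ContainerService/managedClusters resource:
networkProfile: {
  networkPlugin: 'azure'      // Azure CNI
  networkDataplane: 'cilium'  // Cilium eBPF dataplane
  networkPolicy: 'cilium'     // Cilium-enforced network policies
}
// Each agent pool then splits nodes and pods across subnets:
//   vnetSubnetID: <nodes subnet resource ID>
//   podSubnetID:  <pods subnet resource ID>
```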

Pod and Node Architecture

  1. Container Structure
    • Containers run inside pods
    • Each pod runs on a node
    • Each pod gets a unique IP address from the VNet
    • Pods can communicate directly with Azure services via private endpoints

  2. Subnet Separation
    • Nodes and pods use separate subnets for enhanced security
    • Nodes are deployed on a dedicated subnet with an NSG that allows communication between the nodes and the pods
    • The pod subnet is delegated to Microsoft.ContainerService/managedClusters for automated management
    • The gateway subnet provides access to the cluster from other networks

  3. Network Security
    • Dedicated subnets for nodes, pods, the gateway, and the bastion host
    • Each subnet has dedicated NSGs with specific security rules
    • Rules control inbound access to the nodes and pods, and are applied independently to the node and pod subnets
    • The Application Gateway reaches pods directly, bypassing nodes, which improves security
    • All internal communication uses private endpoints
    • Inbound traffic is controlled exclusively through the Application Gateway

  4. Scalability and Performance
    • Pods scale independently of nodes, benefiting from direct VNet IP assignment for better performance and isolation
    • Each pod requires its own VNet IP address (e.g., scaling to 100 pods consumes 100 unique IPs from the pod subnet)
    • Direct pod-to-gateway communication eliminates node routing overhead
    • Cilium CNI enables efficient network routing and security policies
    • Pod IP addresses are automatically managed by AKS

  5. Communication Flow
    • External traffic → Application Gateway → Pods (direct)
    • Pod-to-pod communication within the VNet
    • Pod-to-Azure services via private endpoints
    • No direct external access to nodes
    • To expose the cluster via the Application Gateway, nodes and pods are separated into distinct subnets: the gateway has direct access to pods for efficient traffic management while node-level access stays locked down

Subnet Design

The infrastructure uses four main subnets:

  1. Nodes Subnet: Hosts AKS nodes
  2. Pods Subnet: Dedicated to pod IP allocation
    • Delegated to Microsoft.ContainerService/managedClusters
    • Direct IP allocation from the VNet using Cilium CNI
  3. Gateway Subnet: Hosts the Application Gateway
  4. Bastion Subnet: Hosts the Bastion host
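
The pods subnet delegation could be declared roughly as follows (the name, address range, and the `vnet` parent reference are placeholders):

```bicep
// Child subnet of the VNet; the delegation hands subnet-level management
// of pod IP allocation over to AKS.
resource podsSubnet 'Microsoft.Network/virtualNetworks/subnets@2023-09-01' = {
  parent: vnet // the parent virtual network resource
  name: 'snet-pods' // illustrative name
  properties: {
    addressPrefix: '10.0.2.0/23' // placeholder range
    delegations: [
      {
        name: 'aksDelegation'
        properties: {
          serviceName: 'Microsoft.ContainerService/managedClusters'
        }
      }
    ]
  }
}
```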

AKS Configuration

Agent Pools

  1. System Pool (Default)
    • Scale: 1-3 nodes
    • Purpose: System workloads and lightweight applications
    • VM Size: Standard_DS2_v2

  2. User Pool (GPU)
    • Scale: 0-1 nodes
    • Purpose: GPU-intensive workloads
    • VM Size: Standard_NC24ads_A100_v4

  • The default (system) pool manages system workloads with auto-scaling enabled across 1-3 nodes (e.g., workload identity, in-cluster DNS, AGIC).
  • Applications that don't need many IPs or a GPU, such as the API, can also be hosted on the system pool.
  • Workloads inside the cluster scale independently based on application needs.
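
The two pools described above map onto agentPoolProfiles roughly like this (pool names and subnet ID variables are placeholders):

```bicep
// Inside the Microsoft.ContainerService/managedClusters resource:
agentPoolProfiles: [
  {
    name: 'system'
    mode: 'System'
    vmSize: 'Standard_DS2_v2'
    enableAutoScaling: true
    minCount: 1
    maxCount: 3
    vnetSubnetID: nodesSubnetId // nodes subnet resource ID
    podSubnetID: podsSubnetId   // pods subnet resource ID
  }
  {
    name: 'gpu'
    mode: 'User'
    vmSize: 'Standard_NC24ads_A100_v4'
    enableAutoScaling: true
    minCount: 0 // scales to zero when no GPU workload is scheduled
    maxCount: 1
    vnetSubnetID: nodesSubnetId
    podSubnetID: podsSubnetId
  }
]
```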

Application Gateway Ingress Controller (AGIC)

Azure Application Gateway Overview
  • Infrastructure configuration
  • Frontend IP address configuration: the Application Gateway can have a public IP address, a private IP address, or both
  • Listener configuration: checks incoming connection requests by port, protocol, host, and IP address
  • Request routing rules: bind the default listener (appGatewayHttpListener) to the default backend pool (appGatewayBackendPool) and the default backend HTTP settings (appGatewayBackendHttpSettings)
  • HTTP settings

  • Layer 7 Load Balancing

  • Functions as a web traffic load balancer at OSI layer 7
  • Differs from traditional load balancers (layer 4 - TCP/UDP) which only route based on source/destination IP and ports
  • Enables sophisticated traffic management for web applications

  • Advanced Routing Capabilities

  • Makes routing decisions based on HTTP request attributes:
    • URI paths
    • Host headers
    • Request parameters
  • Example: Routes /images traffic to specific server pools configured for image handling
  • Enables granular, flexible control over traffic distribution

Ingress Architecture

  1. Ingress Controllers
    • Operate at layer 7 of the OSI model
    • Route HTTP traffic based on inbound URLs
    • Can direct traffic to different microservices based on URL paths
    • Provide dynamic routing capabilities

  2. Ingress Objects in Kubernetes
    • Manage external traffic routing
    • Enforce security settings through:
      • Hostname specifications
      • Protocol definitions
      • Certificate management
    • Always use hostnames instead of IP addresses when accessing workloads within the cluster; hostnames stay stable as the cluster scales, while pod IPs do not
    • Provide a configuration framework for traffic management
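
A minimal Ingress object targeting AGIC might look like this (the hostname, resource names, and Service are hypothetical):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: inference-ingress          # hypothetical name
  annotations:
    kubernetes.io/ingress.class: azure/application-gateway
spec:
  rules:
    - host: inference.example.com  # route by hostname, never by IP
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: inference-svc  # hypothetical Service
                port:
                  number: 80
```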

AGIC Implementation

AGIC keeps the Application Gateway in sync with the cluster state, ensuring that any changes in pod IPs are dynamically reflected in the Gateway configuration without manual intervention.

  1. Integration with Kubernetes
    • Runs as a pod within the AKS cluster
    • Interacts directly with the Kubernetes API
    • Deployed as an AKS add-on for seamless integration

  2. Synchronization Mechanism
    • Continuously monitors Kubernetes Ingress resources and processes configuration changes
    • Applies Ingress configurations to the Application Gateway via ARM deployments
    • Keeps the Gateway updated with the latest pod IPs, ensuring real-time synchronization between the cluster state and the Gateway configuration

  3. Service Integration
    • The Ingress targets a Kubernetes Service, which:
      • acts as a load balancer
      • distributes traffic across multiple pods
      • has a fixed IP within the cluster, simplifying traffic management through service abstraction

  4. Direct Pod Communication
    • The App Gateway communicates directly with pod private IPs
    • Bypasses NodePorts and kube-proxy, enhancing performance through direct routing
    • Reduces network latency and improves overall communication efficiency

AKS Integration
  1. Core Functionality
    • Manages external HTTP traffic access
    • Provides load balancing services
    • Handles SSL termination
    • Supports name-based virtual hosting
    • Enables seamless service discovery

  2. Operational Benefits
    • Automated deployment as an AKS add-on
    • Continuous monitoring of Kubernetes resources
    • Dynamic Gateway configuration updates
    • Simplified management through Kubernetes-native resources
    • Seamless integration with Azure services

Identity and Access Management

  • Azure built-in role definitions guide: Reference
  • Role definitions for identities should be managed through Azure RBAC, with Key Vault configured for RBAC-based access to secrets.
  • Pods in the cluster use federated credentials to act as if they are using an Azure identity, ensuring seamless access to Azure resources.

Why is the AKS identity in one RG while its permissions and resources are in another? Why is aksIdentity created at the root (subscription) level while its scope is customModelsRG?

  • The AKS identity is set up at the subscription level to avoid cyclic dependencies across Resource Groups (RGs) and modules, even though its scope remains customModelsRG.
  • The cluster identity requires permissions within the VNet Resource Group to deploy nodes and pods properly, so the role assignment lives in VnetRG: the identity needs Contributor rights there to place the nodes and pods inside the VNet.
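
Sketched in Bicep at subscription scope, the split looks roughly like this (module paths, RG names, and outputs are illustrative):

```bicep
// The identity lives in the custom models RG...
module aksIdentity 'customModels/identity.bicep' = {
  name: 'aksIdentityDeployment'
  scope: resourceGroup('customModelsRG') // placeholder RG name
}

// ...but its role assignment is scoped to the VNet RG, where the
// cluster needs rights to place nodes and pods into the subnets.
module aksVnetRole 'network/roleAssignment.bicep' = {
  name: 'aksVnetRoleDeployment'
  scope: resourceGroup('vnetRG') // placeholder RG name
  params: {
    principalId: aksIdentity.outputs.principalId
  }
}
```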

Managed Identities

  1. AKS Cluster Identity (Needs Key Vault access - KV secrets user)
  2. Application Gateway Identity & AGIC Identity (needs access to Application Gateway)
  3. Inference Model Identity
  4. Ingestion Model Identity
  5. Workflow Model Identity

Key Vault Access

  • RBAC-based access control (preferred over access policies)
  • Secrets management through Azure DevOps pipelines
  • Workload Identity federation for pod access
  • API keys are stored securely in Key Vault; it is better to retrieve them from the pipeline than to access them directly from within the application.
  • To access secrets during deployment, use Azure DevOps pipeline tasks that integrate with Key Vault: Reference
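
A hedged example of pulling a secret from Key Vault in the pipeline (service connection, vault, and secret names are placeholders):

```yaml
steps:
  # Maps the listed secrets to pipeline variables for subsequent tasks.
  - task: AzureKeyVault@2
    inputs:
      azureSubscription: 'my-service-connection' # placeholder
      KeyVaultName: 'kv-custom-models'           # placeholder
      SecretsFilter: 'inferenceApiKey'           # placeholder secret name
      RunAsPreJob: false
```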

Federated Credentials

  • Federated Credentials for kube pods access
  • Both EntraID and Kubernetes serve as identity providers but have different types of identities:
  • Kubernetes: Uses Service Accounts to manage permissions within the cluster.
  • EntraID: Uses Service Principals, User Managed Identities, …
  • To link the two, a federated credential is established so that a Kubernetes service account is treated as equivalent to an EntraID identity.

  • Service Account and Identity Integration:

  • Establish federated credentials between the identity provided by Azure EntraID and the Kubernetes service account identity. Pods running under that service account can then access Azure resources as though they were using the EntraID identity directly.
  • Create a dedicated service account in Kubernetes rather than using the default one. This ensures that pods access Azure resources like Key Vault using their own identity instead of the cluster identity, maintaining security and proper access control.

  • To create the federated credential, we need:

    • Service account details
    • The namespace in which the service account will be defined
    • The permissions required for the service account

  • In Kubernetes, each application needs at least one identity.

  • Workload Identities
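
The inputs above come together in a federated credential resource, sketched here in Bicep (identity name, namespace, and service account are placeholders):

```bicep
param aksOidcIssuerUrl string // the cluster's OIDC issuer URL

resource inferenceIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' existing = {
  name: 'id-inference' // placeholder identity name
}

resource federatedCred 'Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials@2023-01-31' = {
  parent: inferenceIdentity
  name: 'inference-federated-credential'
  properties: {
    issuer: aksOidcIssuerUrl
    // subject = system:serviceaccount:<namespace>:<service account name>
    subject: 'system:serviceaccount:inference:inference-sa'
    audiences: [
      'api://AzureADTokenExchange'
    ]
  }
}
```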

Deployment

Prerequisites

  1. Azure subscription with required permissions
  2. Azure DevOps environment
  3. Service Principal with necessary permissions

Deployment Steps

  1. Configure environment parameters in *.bicepparam files
  2. Update pipeline variables in pipeline.yaml
  3. Run the Azure DevOps pipeline, or deploy manually with:
    az deployment sub create \
      --name CustomModels \
      --location <location> \
      --parameters <environment>.bicepparam

Security Considerations

  1. All resources use private endpoints
  2. Network isolation between pods and nodes
  3. WAF protection for incoming traffic
  4. RBAC for all resource access
  5. Managed identities for service authentication

Scaling Considerations

  • Pods scale independently with direct IP allocation
  • Agent pools auto-scale based on demand
  • Application Gateway scales automatically (WAF_v2 SKU)
  • Each pod requires a unique IP from the pods subnet

Monitoring and Management (TODO)

  • Container insights enabled on AKS cluster
  • Application Gateway metrics and logging
  • Private DNS zone monitoring
  • Network security group flow logs

Troubleshooting

  • Delete the RG ati-custom-models and redeploy; also delete the nodes and pods in the VNet
  • Go to the subscription → Deployments and look for CustomModels

Topics to be addressed

  • Create a service account inside the cluster (to be deployed by the inference model application)
  • Ensure workload scaling down to zero when not in use, particularly for inference models, to optimize cost
  • Federated credentials between the identity (inference, ingestion, workflow, ...) from EntraID and the Identity from Kubernetes (need to know Service Account, namespace, permission)