Q1. Can we roll back an AKS/Kubernetes upgrade?
"No, AKS upgrades are one-way. Once the control plane and node pools are upgraded, downgrading the Kubernetes version isn't supported. That's why, before upgrading production, we validate compatibility in lower environments, review deprecated APIs, and check the ingress controller, monitoring agents, CSI drivers, and application readiness. If an issue occurs after the upgrade, we usually roll back the application version rather than the cluster version."
Q2. During a cluster upgrade, requests are failing. Why?
"The first thing I would check is whether enough healthy replicas are available during the upgrade. Common reasons are missing Pod Disruption Budgets, incorrect readiness probes, single replica deployments, slow application startup, or insufficient cluster capacity causing delays in pod scheduling."
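The replica and Pod Disruption Budget reasoning above can be sketched as simple arithmetic. This is a simplified model, not the Kubernetes implementation: the function names are hypothetical, and percentage values are assumed to be rounded up against the desired replica count, as the Kubernetes documentation describes.

```python
def required_available(desired_replicas, min_available=None, max_unavailable=None):
    """Return how many pods must stay healthy under a PDB-like rule."""

    def resolve(value):
        # Percentages such as "50%" are resolved against the desired
        # replica count and rounded up (integer math avoids float error).
        if isinstance(value, str) and value.endswith("%"):
            percent = int(value[:-1])
            return (desired_replicas * percent + 99) // 100
        return int(value)

    if min_available is not None:
        return resolve(min_available)
    if max_unavailable is not None:
        return max(desired_replicas - resolve(max_unavailable), 0)
    # No budget at all: nothing limits voluntary evictions during a drain.
    return 0


def allowed_disruptions(healthy, desired_replicas, **budget):
    """Return how many healthy pods a node drain may evict right now."""
    return max(healthy - required_available(desired_replicas, **budget), 0)


# Single replica without a PDB: the only pod can be evicted, so requests fail.
single_no_pdb = allowed_disruptions(1, 1)
# Single replica with minAvailable=1: the drain is blocked instead.
single_with_pdb = allowed_disruptions(1, 1, min_available=1)
# Three replicas with minAvailable=2: one pod at a time can be evicted.
three_with_pdb = allowed_disruptions(3, 3, min_available=2)
```

The single-replica cases show why a PDB alone is not enough: with one replica the choice is between an outage and a blocked drain, so the fix is more replicas plus a budget.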
Q3. Hundreds of requests are failing during the upgrade. What else would you check besides PDBs?
"Apart from PDB, I would investigate readiness probes, application startup time, connection draining, ingress behavior, resource constraints, and whether traffic is being routed to pods before they become ready. I've seen failures caused by applications taking several minutes to initialize after node replacement."
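The interaction between slow startup, readiness, and connection draining can be illustrated with a minimal sketch of the state an application might expose to its readiness probe. The class and function names are hypothetical, and wiring the status into an actual HTTP endpoint is left out.

```python
import signal
import threading


class Lifecycle:
    """Tracks whether the pod should receive traffic."""

    def __init__(self):
        self._started = threading.Event()
        self._draining = threading.Event()

    def mark_started(self):
        # Call this only after caches, connections, and so on are initialized.
        self._started.set()

    def begin_drain(self, signum=None, frame=None):
        # Kubernetes sends SIGTERM before stopping a container; failing
        # readiness from this point lets in-flight requests finish while
        # new traffic is routed elsewhere.
        self._draining.set()

    def readiness_status(self):
        # HTTP status a readiness endpoint would return.
        if self._started.is_set() and not self._draining.is_set():
            return 200
        return 503


def install_sigterm_handler(lifecycle):
    # Must be called from the main thread of the application process.
    signal.signal(signal.SIGTERM, lifecycle.begin_drain)
```

An application would call `mark_started()` at the end of initialization and `install_sigterm_handler()` at startup. The readiness probe then fails both during slow startup and during shutdown, which covers the two windows where traffic most often reaches a pod that cannot serve it.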
Q4. How does StatefulSet work?
"StatefulSet is used when applications require stable identity and persistent storage. Unlike Deployments, pod names remain consistent such as database-0, database-1, database-2. Each pod gets its own persistent volume and DNS identity. It's commonly used for databases, Kafka, and other stateful workloads."
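The naming scheme can be sketched as follows. The helper is hypothetical; the headless Service name, the namespace, the volume claim template name `data`, and the default cluster domain `cluster.local` are assumptions for illustration.

```python
def statefulset_identities(name, service, namespace, replicas,
                           claim_template="data",
                           cluster_domain="cluster.local"):
    """List the stable pod name, DNS name, and PVC name for each replica."""
    identities = []
    for ordinal in range(replicas):
        pod = f"{name}-{ordinal}"
        identities.append({
            "pod": pod,
            # Per-pod DNS record served through the headless Service.
            "dns": f"{pod}.{service}.{namespace}.svc.{cluster_domain}",
            # PVCs are named <claim template>-<pod name>, so a rescheduled
            # pod reattaches to the same volume.
            "pvc": f"{claim_template}-{pod}",
        })
    return identities


for identity in statefulset_identities("database", "database-headless", "prod", 3):
    print(identity["pod"], identity["dns"], identity["pvc"])
```

Because the names are derived only from the StatefulSet name and the ordinal, a replacement pod after a node upgrade comes back with the same identity and the same volume.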
Q5. Which Ingress Controller are you using?
"In my recent project we were using NGINX Ingress Controller."
Q6. Is it community supported?
"Yes, we used the community ingress-nginx project, which is maintained under the Kubernetes project and is distinct from the NGINX Ingress Controller maintained by F5 NGINX. Whenever I use it in production, I verify the project's current maintenance status, the supported Kubernetes versions, security advisories, and the version lifecycle."
Q7. Why do we use Ingress?
"Ingress provides a centralized way to manage HTTP and HTTPS traffic into Kubernetes. Instead of exposing every service separately, we define routing rules that direct traffic to the appropriate backend services."
Q8. Why not just use Application Gateway?
"Application Gateway and Ingress complement each other. Application Gateway manages external traffic, SSL offloading, and WAF capabilities. Ingress handles Kubernetes-native routing inside the cluster. In many environments both work together."
Q9. How does traffic flow?
"The client request reaches the load balancer or Application Gateway, then goes to the Ingress Controller. The Ingress Controller evaluates the routing rules and selects the matching Kubernetes Service. Traffic then reaches one of the ready pods listed in that Service's endpoints."
Client
↓
Load Balancer / App Gateway
↓
Ingress Controller
↓
Service
↓
Endpoints
↓
Pods
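The last three hops of this flow can be sketched in a few lines. This is a toy model, not how any particular controller is implemented: the `Rule` type, the host and service names, and the IP addresses are hypothetical, and path matching follows the element-wise Prefix semantics of the Ingress API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    host: str
    path_prefix: str
    service: str


def path_matches(path, prefix):
    # Prefix matching is per path element: "/api" matches "/api" and
    # "/api/orders" but not "/apiary".
    if prefix == "/":
        return True
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def route(rules, endpoints, host, path):
    """Return (service, ready pod IPs) for a request, or None if no rule matches."""
    matches = [r for r in rules if r.host == host and path_matches(path, r.path_prefix)]
    if not matches:
        return None
    # The longest matching prefix wins.
    service = max(matches, key=lambda r: len(r.path_prefix)).service
    # Only ready endpoints are eligible to receive traffic.
    ready = [ip for ip, is_ready in endpoints.get(service, []) if is_ready]
    return service, ready


rules = [
    Rule("shop.example.com", "/", "web"),
    Rule("shop.example.com", "/api", "api"),
]
endpoints = {
    "web": [("10.0.0.1", True)],
    "api": [("10.0.1.1", True), ("10.0.1.2", False)],  # second pod not ready
}
result = route(rules, endpoints, "shop.example.com", "/api/orders")
```

Here the request matches the `/api` rule and only the ready pod is returned, which is the same readiness filtering that protects traffic during a node upgrade.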
Q10. What configuration do you do in Ingress?
"Typically host-based routing, path-based routing, TLS certificates, redirects, rewrite rules, authentication integration, rate limiting, timeouts, and backend service mappings."
Q11. How do you secure Terraform state?
"State files can contain sensitive information, so we store them in a remote backend with encryption enabled. Access is controlled through RBAC and least privilege principles. We also enable versioning and auditing to track changes."
Q12. How do you ensure Terraform state isn't deleted?
"We use remote backends with versioning enabled. Even if a state file is accidentally modified or deleted, previous versions can be recovered. Access controls also prevent unauthorized deletion."
Q13. How do you structure Terraform code?
"I prefer a modular approach. Reusable components such as networking, storage, IAM, and Kubernetes resources are maintained as modules. Environment-specific configurations consume those modules rather than duplicating code."
Q14. How do you structure Terraform state?
"I prefer separating state files by environment and sometimes by workload. This reduces blast radius and prevents unrelated infrastructure changes from affecting each other."
Example:
dev.tfstate
qa.tfstate
prod.tfstate
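When state is also split by workload, the layout can be expressed as a naming convention for the backend key. The helper below is a hypothetical sketch; the environment and workload names are assumptions, and the `<environment>/<workload>.tfstate` pattern is only one possible convention.

```python
def state_key(environment, workload=None):
    """Build the backend key for a state file."""
    if workload is None:
        # One state file per environment, as in the example above.
        return f"{environment}.tfstate"
    # One state file per environment and workload to reduce blast radius.
    return f"{environment}/{workload}.tfstate"


environments = ["dev", "qa", "prod"]
workloads = ["networking", "aks"]
keys = [state_key(env, wl) for env in environments for wl in workloads]
```

With this split, a change to networking in dev plans and applies against its own state only, so it cannot touch prod or the cluster resources.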
Q15. What is an SLA?
"An SLA is the commitment made to customers regarding service availability or performance. For example, 99.9% availability."
Q16. What is an SLO?
"An SLO is the internal reliability target we set in order to stay within the SLA. If the SLA is 99.9%, we might operate with a stricter SLO of 99.95% so there is a safety margin before the customer commitment is breached."
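The safety margin is easier to see as allowed downtime. The sketch below converts an availability percentage into minutes of downtime; the function name is hypothetical and the 30-day window is an assumption, since SLAs define their own measurement periods.

```python
def allowed_downtime_minutes(availability_percent, window_days=30):
    """Return the downtime budget in minutes for an availability target."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


sla_budget = allowed_downtime_minutes(99.9)    # about 43.2 minutes in 30 days
slo_budget = allowed_downtime_minutes(99.95)   # about 21.6 minutes in 30 days
margin = sla_budget - slo_budget               # buffer before the SLA is breached

print(f"SLA budget: {sla_budget:.1f} min, SLO budget: {slo_budget:.1f} min, "
      f"margin: {margin:.1f} min")
```

Operating to the 99.95% target means alerts and release freezes trigger after roughly half the downtime the SLA permits, leaving the rest as headroom.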
Q17. Do you know Python or Shell?
"Yes. I primarily use Bash scripting for operational automation and have also used Python for automation tasks, API integrations, and reporting."
Q18. Have you architected an entire system landscape?
"I've participated in designing platform components, CI/CD architecture, cloud infrastructure, Kubernetes platforms, monitoring, and automation frameworks. While architecture is always collaborative, I've been involved in evaluating requirements, identifying dependencies, designing solutions, and implementing platform standards."
Q19. What parameters do you consider while designing a platform?
"I look at scalability, reliability, security, maintainability, cost optimization, observability, disaster recovery, automation, governance, and developer experience."
Q20. What parameters do you consider for sustainability?
"Operational sustainability is important. I focus on automation, standardization, reusable modules, reducing manual effort, documentation, monitoring, and supportability so the platform remains manageable as it grows."
Q21. What parameters do you consider for security?
"Security should be integrated from the beginning. I focus on identity and access management, least privilege access, secrets management, vulnerability scanning, encryption, auditability, network segmentation, and compliance requirements."
Q22. Did you have a separate security team?
"Yes, we had a dedicated security team. However, security was a shared responsibility. Platform, DevOps, cloud, and application teams all worked together to address vulnerabilities and implement security controls."
Q23. DevOps and Security overlap. What's your view?
"I think security should be embedded into the delivery process rather than treated as a separate stage. DevOps teams enable secure delivery through automation, while security teams provide governance, standards, and risk management. Both need to work closely together."