Staff/Principal Cloud Infrastructure Engineer specializing in Kubernetes and OpenStack
Infrastructure Engineer SRE · 14 years experience · Staff · Cloud Computing · Technology · E-commerce
Screened · Identity Verified
About
Platform/backend engineer focused on Kubernetes at scale. Built a Java control-plane service for multi-region cluster provisioning, monitoring, and upgrades using Kafka-driven async workers, and resolved peak-load provisioning failures by eliminating blocking I/O and dynamically scaling consumers. Also shipped an LLM-assisted Kubernetes troubleshooting and remediation feature that pulls Prometheus logs/metrics into prompts and uses guardrails (confidence thresholds plus human-in-the-loop review) to prevent risky actions.
Experience
Infrastructure Engineer SRE, TikTok
Member of Technical Staff II, eBay
Software Engineering Manager & Senior Tech Expert, Ant Group
Built and operated a Java control-plane service managing Kubernetes clusters across multiple cloud regions
Debugged and resolved high-load reliability failures (thread pool exhaustion + blocking I/O) by moving to non-blocking async calls and dynamically scaling workers/consumers
Designed event-driven provisioning/upgrade workflows using Kafka with asynchronous execution
Shipped an LLM-powered cluster troubleshooting feature integrating Prometheus logs/metrics into context-rich prompts with remediation recommendations
Designed multi-step automated cluster maintenance/upgrade workflows with step tracking in SQL + etcd and typed error handling (timeouts, retries, escalation/stop rules)
Improved slow relational queries by adding composite indexes aligned to real query patterns