Vetted Observability Professionals

Pre-screened and vetted.

RS

Staff Software Engineer specializing in distributed payments and streaming systems

Kirkland, WA | 14y exp
Amazon | University of Illinois Urbana-Champaign

MM

Senior Software Engineer specializing in data platforms, APIs, and healthcare systems

9y exp
Broad Institute of MIT and Harvard | University of Oxford

Bola Omoniyi

Screened | References | Strong rec.

Mid-level Cloud/DevOps Engineer specializing in AWS platform automation and CI/CD

Austin, TX | 6y exp
Amazon | Georgia State University

Senior infrastructure/platform engineer with deep IBM Power/AIX (Power9, VIOS, HMC, LPAR/DLPAR) and PowerHA production ownership at scale (40 frames / ~300 LPARs), including hands-on outage recovery and performance tuning. Also delivers modern DevOps/IaC capabilities: CI/CD for Kubernetes microservices and Terraform-based multi-account AWS (EKS/VPC/IAM/RDS) with drift detection and safe rollout controls.

Arif Shaikh

Screened | References | Strong rec.

Executive engineering leader specializing in AI-native platforms and distributed systems

Torrance, CA | 22y exp
Sony Pictures

Engineering leader and hands-on architect from Sony Pictures Entertainment who has led large-scale platform modernization, digital supply chain delivery, and multiple GenAI/RAG initiatives in media and rights-management domains. Notable for combining people leadership with deep AWS/LLM architecture work, including a 1M+ record GenAI title search system and AI-driven contract insights that reportedly helped unlock a 20% revenue increase.

Shahla Almasri-Hafez

Director-level Engineering Leader specializing in mobile platforms and digital transformation

Remote, USA | 20y exp
Glassdoor | Tufts University

Mobile engineering leader who built Glassdoor's mobile org from 5 to 35 engineers while overseeing both iOS and Android. Started as a lead iOS engineer at CVS Health, moved into management, and combines hands-on technical depth with org building, talent development, and customer-driven product prioritization.

CK

Senior Backend Engineer specializing in Node.js, Java, and regulated SaaS platforms

Seattle, WA | 11y exp
Mpathic | Brown University

Built a production LLM-powered root cause analysis agent for supply chain alerts that helped operations managers avoid manual dashboard investigation. Demonstrates unusually strong depth in agent reliability, orchestration, and observability, with concrete production practices like hallucination blocking, shadow testing on 500 cases, and data-driven improvements that raised user agreement to 94% while cutting GPT-4 usage by 60%.

Kishan H

Screened

Senior Full-Stack Engineer specializing in Java microservices and cloud-native platforms

San Jose, CA | 5y exp
Walmart | Saint Louis University

Backend-focused engineer with Walmart Global Tech experience building shipment and seller workflow systems using Spring Boot, GraphQL, Kafka, and async processing. Stands out for improving bulk label API performance by 60-75%, designing item-level partial-failure workflows that improved user clarity, and also exploring AI-powered debugging/RCA platforms with Java, Python, LangChain, and LLM integrations.

Fritzie Alex

Screened

Executive engineering leader specializing in AI-native transformation and cloud modernization

San Francisco, CA | 20y exp
Amazon | Dayananda Sagar College of Engineering

Senior software engineering leader who owned the architecture and delivery of a major VerizonWireless.com transformation, migrating an ATG monolith to Spring Boot/Java microservices with an event-driven architecture across 13 domains. Combines hands-on system design depth with business-facing architecture leadership and team scaling in complex enterprise environments.

CC

Senior Backend Engineer specializing in distributed systems and cloud microservices

Beaverton, Oregon | 11y exp
Nike | UC San Diego

Backend/data engineer with experience at Nike building high-volume order orchestration and validation APIs using FastAPI microservices on AWS EKS with Kafka, Redis, and Postgres. Strong in production reliability (timeouts/retries/idempotency), GitOps (Argo CD) + Terraform deployments, and data pipelines (AWS Glue/S3), with hands-on incident ownership and legacy modernization into API-driven services.

Pranshu Patel

Screened

Director-level Software Development Manager specializing in cloud DDoS protection

Santa Clara, CA | 12y exp
Amazon Web Services | University of Maryland, College Park

AWS Software Development Manager leading globally deployed, production-critical DDoS protection (L3/L4) across AWS. Known for scaling teams and driving cross-org tiger-team initiatives from concept through worldwide rollout, including performance-focused Python architecture changes and a major JDK 8→21 migration while maintaining strict backward compatibility. Also led an internal SDK-like integration framework improving APIs, documentation, and onboarding for major AWS service teams.

RB

Senior Infrastructure Engineer specializing in cloud, Kubernetes, and MLOps

San Francisco, USA | 6y exp
ATLANTIA Spa | University of Bologna

LLMOps-focused technical leader who took an LLM use case from prototype to production for a non-technical customer by combining trust-building and structured enablement with a robust AWS/Kubernetes-based MLOps stack. Built observability and rollback mechanisms (Grafana + MLflow) to troubleshoot in real time, and scaled delivery by hiring a 5-person team while partnering with sales to manage expectations and drive adoption across departments.

Shiva Arcot

Screened

Director of Security & Data Platform Engineering specializing in AI-driven cloud security

Sunnyvale, CA | 24y exp
Proofpoint | Santa Clara University

Player-coach engineering leader focused on scalable data security scanning and risk detection in hybrid cloud, owning architecture and core implementation of an incremental/parallel DSPM scanning engine. Shipped production improvements including 60% lower scan latency and 30% fewer false positives, with strong emphasis on correctness under concurrency, multi-tenant observability (SLOs/burn-rate alerts), and disciplined rollout practices (feature flags, shadow scans, canaries).

Sujit Singh

Screened

Engineering Director specializing in backend & data platforms for enterprise SaaS and cybersecurity

San Jose, CA | 21y exp
Splunk | Harvard Extension School

Backend/data engineering player-coach on a UEBA cloud security analytics platform who standardized MLOps and detection development for 180+ detections, cutting ship time from 6–7 weeks to ~3 weeks while reducing false positives. Proven at operating large-scale streaming + Spark systems (200K+ events/sec, 100+ TB/day), driving major reliability/cost improvements, and leading incident response and team execution through GA.

Shant Mardigian

Executive Engineering Leader specializing in scalable streaming, media supply chain, and AI operations

19y exp
Disney | UCLA

Tech executive with Disney experience who has repeatedly scaled and restructured engineering organizations (from 4 to 30 and up to 100+), using OKRs/KPIs to drive business-aligned roadmaps. Hands-on with architecture and platform strategy, including adopting MongoDB Atlas to centralize transactional data and building shared core services (security/permissions, auditing, compliance) to increase product velocity across distributed teams.

Ravikanth Kasamsetty

Executive engineering leader specializing in AI, data platforms, and cloud SaaS

California, USA | 23y exp
Plume | Penn State University

Senior engineering executive with a rare mix of VP-level people leadership and deep hands-on architecture across cloud-native and AI platforms. He led secure hybrid SaaS design at Promethium, built GenAI-powered analytics with LLMs/RAG, and translated customer needs into product wins including the largest deal in company history and a PLG motion that generated 100+ leads.

Kusumita Dasgupta

Director-level Engineering Leader specializing in data platforms, cloud systems, and LLM products

United States | 22y exp
Intuition Intelligence | USC

Engineering leader/player-coach with recent hands-on work delivering an agentic AI MVP on Amazon Bedrock (conversational UI + supervisor agent routing between internal knowledge and external sources). Previously drove large-scale data platform cost optimization at Twitter, saving ~$3M–$5M annually, and has owned production incidents end-to-end with a focus on analytics/monitoring improvements and team coaching.

Sanket Patel

Screened

Director-level Front-End Engineering Leader specializing in scalable web and mobile apps

Palo Alto, CA | 17y exp
Amazon | San Diego State University

Amazon engineer/leader who drove a major modernization of the AWS Database Migration Service Console, migrating a monolithic UI to a micro-frontend architecture while improving performance, reliability, and engineering standards. Operates as a player-coach (80/20 hands-on/management), with demonstrated incident ownership and process improvements across Amazon and Walmart Labs.

Chinedu Enwere

Mid Software Engineer specializing in distributed backend systems

Redmond, WA | 4y exp
Microsoft | University of Texas at Austin

Engineering candidate deeply embedded in AI-native development, currently using tools like Cursor and Claude Code to generate most of their code and building internal agents for on-call monitoring, anomaly detection, and automated incident mitigation. Particularly interesting for teams exploring AI-first engineering workflows, multi-agent development setups, and operational automation at scale.

KV

Executive platform engineering leader specializing in multi-cloud SaaS and data platforms

USA | 29y exp
WP Engine | City University of Seattle

Senior engineering leader with a rare blend of deep platform/infrastructure expertise and large-scale people leadership, spanning hands-on cloud-native architecture through management of globally distributed organizations of up to roughly 180 engineers. Particularly strong in Kubernetes platform modernization, reliability engineering, and scaling high-throughput SaaS systems while aligning platform investments to customer and business outcomes.

Pavithra Kollu

Mid-level Machine Learning Engineer specializing in LLMs, RAG, and search systems

San Francisco, CA | 5y exp
Perplexity | Northeastern University

Backend/ML infrastructure engineer with experience at Perplexity and Meta building production evaluation, monitoring, and retrieval systems for AI search, autonomous agents, and LLM-powered workflows. Particularly strong in turning messy manual quality-review processes into reusable Python/FastAPI automation with measurable impact, including major gains in search relevance, latency, and grounded answer quality.

Harish Kasu

Screened

Mid-level AI/ML Engineer specializing in Generative AI, RAG, and MLOps

San Francisco, CA | 5y exp
NVIDIA | Texas A&M University-Kingsville

AI/LLM engineer with production experience at NVIDIA and Microsoft, including building a RAG-based enterprise knowledge assistant that improved accuracy by 42% and scaled to thousands of queries. Deep in inference optimization (TensorRT-LLM, Triton, quantization, speculative decoding) and MLOps/observability (Prometheus/Grafana, MLflow, LangSmith), plus orchestration with Kubeflow/Airflow across multi-cloud.

GU

Engineering executive specializing in production ML systems and enterprise SaaS

San Francisco, CA | 26y exp
FLYR | Carnegie Mellon University

Engineering/data platform leader from FLYR (airline ML forecasting and automated pricing) who built scalable ingestion/ETL and a canonical data model to onboard airlines with highly heterogeneous source systems. Created a golden-metrics layer for airline KPIs and implemented monitoring/backfill capabilities, cutting onboarding time by 50%+ while improving SLA performance and controlling cloud/ML training costs through stronger data quality gates.

Yishi Wang

Screened

Junior Machine Learning & Data Science professional specializing in LLMs and analytics

Chicago, IL | 3y exp
Mintel | Northwestern University

Amazon internship experience building production GenAI analytics for the returns organization: a multi-agent LLM+RAG system that let analysts query multiple heterogeneous data sources in natural language without hand-written SQL. Also built and operationalized four Apache Airflow DAGs for large-scale ETL, emphasizing observability and freshness-aware metadata to keep outputs accurate and up to date.

Jun Ouyang

Screened

Principal Software Engineer / Tech Lead specializing in distributed systems, payments, and reliability

San Francisco, CA | 20y exp
DoorDash | Zhejiang University

Backend engineer with DoorDash experience building production-critical systems spanning LLM-based real-time safety moderation (SendBird callbacks + ChatGPT risk scoring with automated actions) and large-scale payments data pipelines (Kafka to CockroachDB with aggregation APIs). Also led cross-team reliability work to standardize SLOs and drove an incident redesign from batch pull to real-time push callbacks to eliminate critical-event latency.
