Vetted Observability Professionals

Pre-screened and vetted.

Tianyi Wang

Screened

Entry-Level Backend/Cloud Engineer specializing in distributed systems and AI platforms

Seattle, WA · 1y exp
Amazon · University of Michigan

Full-stack engineer with deep serverless AWS experience who built VidToNote, an AI video analysis platform, end-to-end using Next.js App Router/TypeScript and an event-driven pipeline (API Gateway, Lambda, DynamoDB, S3, Step Functions, SQS). Strong on production reliability and observability (CloudWatch, X-Ray, structured logging), plus data/analytics work in Postgres with measurable query optimizations and durable LLM evaluation workflows. Amazon background; integrated 22 AWS services and completed AWS Solutions Architect Professional certification within a month.

TH

Mid-level Software Engineer specializing in backend systems, IoT, and AI security

Pittsburgh, PA · 3y exp
Naptic · Carnegie Mellon University

Full-stack engineer in the investment tracking/financial reporting space who built an automated reporting dashboard and compliance/reporting pipeline end-to-end using Next.js (App Router, server/client components), REST, and Postgres. Demonstrated measurable performance wins (~30% faster loads) through caching and query optimization, and built durable orchestrated workflows in n8n with retries, idempotency, and reconciliation checks.

PY

Mid-level Software Development Engineer specializing in AWS telemetry and DDoS mitigation

Seattle, WA · 3y exp
Amazon Web Services · Texas A&M University-Commerce

Amazon engineer who built an Amazon Bedrock-powered summarization layer over large-scale network/service telemetry (“top talker” insights) to help security engineers triage anomalies faster. Emphasizes production-grade design patterns for LLM features—non-blocking enrichment, deterministic fallbacks, strict structured outputs, and monitoring to preserve trust in source-of-truth telemetry.

Andrew Liang

Screened

Intern Software Engineer specializing in full-stack and AI/ML systems

2y exp
Amazon · UCLA

Software engineer with experience at Amazon and Agora building end-to-end systems: a knowledge-base AI chatbot (React/TypeScript UI + retrieval/response backend + Docker deployment) and an internal approval governance platform using AWS Step Functions and DynamoDB. Emphasizes fast iteration without sacrificing trust via feature-flag rollouts, citation-required answers, abstention on low-confidence retrieval, regression query sets, and strong observability (request IDs, structured logs, latency/error monitoring).

Elizabeth Xu

Screened

Entry-Level Software Engineer specializing in ML/NLP and security

Evanston, IL · 1y exp
Rakuten · Northwestern University

Early-career engineer (internship background) who built a production-style notes product using Next.js App Router with Server Components/Server Actions and a Postgres-backed analytics model. Demonstrates strong performance and reliability instincts—measured DB latency improvements via indexing and cursor pagination, plus durable orchestration with Temporal using idempotency and deterministic workflows.

Jacqueline Zhang

Mid-level Machine Learning Engineer specializing in LLMs, fairness, and healthcare ML

Illinois, USA · 4y exp
iSchool Statistical ML & AI Lab · University of Illinois Urbana-Champaign

ML/NLP practitioner with a master’s thesis focused on domain-adaptive knowledge distillation for LLMs (LLaMA2/sheared LLaMA), showing improved perplexity and ROUGE-L on biomedical data. Also built real-world data linking and search systems: integrated ClinicalTrials.gov with FAERS using fuzzy matching + embeddings, and delivered an LLM-powered FAQ recommender at Hyperledger using sentence-transformers, FAISS, and fine-tuning to mitigate embedding drift.

Yeshwanth Sai Pala

Mid-level Full-Stack Developer specializing in cloud microservices and AI-driven FinTech

Remote, USA · 4y exp
Stripe · Southern Arkansas University

Stripe engineer who shipped an end-to-end merchant fraud insights dashboard, spanning Spring Boot/Kafka risk-scoring services and a React+TypeScript UI. Focused on low-latency, high-volume transaction processing and production operations on AWS (EKS/CloudWatch), including handling a real traffic-spike latency incident via query optimization, indexing, and rate limiting.

Likhitha Bethi

Mid-level Software Engineer specializing in backend systems, distributed systems, and applied AI

Stony Brook, NY · 4y exp
Stony Brook University · Stony Brook University

Goldman Sachs engineer who owned end-to-end features for an internal onboarding and case management platform, spanning React/TypeScript UI, a GraphQL gateway, and Node + Spring WebFlux microservices. Built and operated a Kafka-based ingestion and search pipeline with DLQs, retries, idempotency, and strong observability, and improved developer experience via backward-compatible GraphQL API design and schema-driven documentation.

Xicheng Liang

Screened

Intern AI/Full-Stack Engineer specializing in backend systems and applied machine learning

Chicago, IL · 1y exp
Becker’s Healthcare · University of Pennsylvania

Built and shipped a production agentic RAG system for healthcare analysts that automated compliance/operations knowledge retrieval across PDFs, reports, and databases. Emphasizes production reliability (monitoring, retries, fallbacks, async queues), strong evaluation/iteration loops, and measurable impact (3–10s responses and ~98% top-k retrieval accuracy).

Steven Schoen

Screened

Staff Android Engineer specializing in mobile platform and design systems

Berkeley, CA · 12y exp
Reddit · University of Central Florida

Built and shipped a production internal framework-adoption agent for design system leadership, using Temporal, Google ADK, and a Slack bot interface. Appears to be an early internal builder of agentic systems at the company, with practical experience in prompt/process design, lightweight orchestration, and reliability tradeoffs for real-world LLM workflows.

SK

Intern Software Engineer specializing in developer productivity and data/AI systems

Los Angeles, California · 1y exp
Intuit · UC Berkeley

Internship experience at Intuit building an LLM-grounded QA system for internal microservice data across 100+ microservices, using a graph database approach (evaluated Neo4j and selected AWS Neptune for production alignment). Also has UC Berkeley research experience (including work with Prof. Dawn Song / Berkeley Eye Research Lab) and cross-functional collaboration with bioinformatics/biology teams to deploy software systems on research servers.

Yue Yang

Screened

Intern Data Scientist specializing in GenAI (LLMs, RAG) and ML model optimization

Sunnyvale, CA · 1y exp
Synopsys · Columbia University

Built and deployed a production LLM-powered risk assistant for KPMG and Freddie Mac that lets analysts query a confidential Neo4j risk graph in natural language (no Cypher), turning multi-day analysis into minutes with traceable, cited answers. Implemented rigorous guardrails, deterministic verification, RBAC/security controls, and a full eval/observability stack, cutting query error rate by ~50% and iterating through weekly UAT with non-technical risk analysts.

JR

Senior Software Engineer specializing in distributed systems and AI workflow orchestration

Austin, TX · 5y exp
Apple · University of Central Missouri

Backend owner at Apple for an AI workflow orchestration service, with hands-on experience stabilizing peak-traffic production systems using OpenTelemetry-style tracing, bounded async concurrency, and database performance tuning. Built and shipped a Python LLM-agent orchestration layer to automate multi-step operational workflows, emphasizing guardrails, auditability, and deterministic fallbacks to keep non-deterministic AI behavior production-safe.

SB

Mid-level Backend & Reliability Engineer specializing in AWS, Kubernetes, and automation

New Mexico, US · 5y exp
Meta · University of North Carolina at Charlotte

Meta engineer focused on reliability/operations tooling who built a unified real-time health dashboard and scalable telemetry pipelines (AWS + Datadog) for thousands of devices. Also shipped an internal LLM-powered knowledge assistant using RAG over wikis/runbooks/logs with strong guardrails and a rigorous eval loop that drove measurable accuracy improvements via automated doc ingestion and embedding updates.

Pankaj Gautam

Screened

Senior Cloud Infrastructure & TechOps Leader specializing in AWS, Kubernetes, and SRE

San Francisco, CA · 27y exp
Amazon · Cal State East Bay

Infrastructure/platform engineer with hands-on experience running production and non-production Amazon EKS clusters, including upgrade processes and reliability monitoring via Prometheus/Grafana. Also administered on-prem VMware vSphere/vCloud Director and handled a significant vSwitch/VLAN outage, and uses Terraform + Terragrunt with S3 remote state and release-based drift detection across dev/stage/prod.

YR

Senior Data Engineer specializing in cloud-native data pipelines and lakehouse platforms

6y exp
Microsoft · University of North Texas

Data engineer at Microsoft who owned an end-to-end subscription analytics platform processing 7TB+ daily across 40+ pipelines, combining ADF batch ingestion with Kafka/Spark streaming and rigorous Great Expectations quality gates. Built a Fabric-based self-service ingestion platform with CI/CD and observability, plus a Databricks feature store serving near-real-time ML inference with Delta Lake reliability and versioning.

Dheeraj Kumar

Screened

Intern Data Scientist specializing in marketing analytics and data engineering

Tucson, Arizona · 2y exp
Roche · Purdue University

AI/LLM practitioner with internships at Dell Technologies and Roche who built and deployed a healthcare-focused "Doctor LLM" by fine-tuning Meta Llama 3.2 on healthcaremagic.json, emphasizing safety guardrails to prevent harmful medical advice. Experienced in productionizing AI workflows with monitoring, testing, and orchestration (Airflow, Kubernetes), and in delivering AI-agent-driven competitive landscape insights to non-technical business stakeholders.

Zheng Wu

Screened

Junior Software Engineer specializing in backend systems and cloud messaging

Mountain View, CA · 1y exp
NewsBreak · Rice University

Data/ML engineer who has owned end-to-end systems across email deliverability/segmentation and production LLM apps. Built a Spark+Airflow segmentation engine that materially improved deliverability (99.9%) and open rates (>50%), and shipped a PDF-to-quiz RAG product using LangChain/Vertex AI/Chroma with strong guardrails and an eval loop that cut hallucinations to <5%.

Asrith Velireddy

Mid-level AI/ML Engineer specializing in MLOps, LLMs, and scalable ML systems

Harrison, NJ · 4y exp
Adobe · NJIT

ML/LLM engineer at Adobe who deployed a transformer-based personalization and campaign-targeting recommender system end-to-end, including PySpark/Airflow pipelines processing 12M+ events/day and containerized inference on AWS SageMaker (Docker/Kubernetes). Also has hands-on LLM workflow experience (RAG, semantic search, prompt optimization, hallucination mitigation) with a metrics-driven approach to reliability, drift monitoring, and reproducible retraining via MLflow.

Jehanzeb Khan

Screened

Director-level Engineering Manager specializing in large-scale data and compute platforms

Sunnyvale, CA · 20y exp
Amazon · Institute of Business Administration

Platform and distributed-systems leader (player-coach) who owned architecture and reliability for an Amazon analytics/data platform serving ~100K internal users at exabyte scale. Built an ML-driven “Lakeflow” optimization layer that cut pipeline completion times ~20–25% and reduced compute waste >15%, and led major incident response/redesign efforts (e.g., deletion storm) with strong rollout/observability/rollback practices.

Jingyao Chen

Screened

Junior Backend/Platform Engineer specializing in AI microservices and cloud-native systems

Pittsburgh, PA · 2y exp
MeowyAI · Carnegie Mellon University

Cofounder at MeowyAI who shipped a production multimodal (vision/voice/text) AI task manager using Gemini, tackling real-world issues like hallucinations, tool-calling safety, and RAG-based preference memory. Also built a production multi-agent RAG system orchestrated with LangGraph (and contributes to LangChain), with strong emphasis on latency optimization, observability (OpenTelemetry), and rigorous testing/evaluation including A/B tests and adversarial prompting.

Meredith Ma

Screened

Entry-level AI/ML Software Engineer specializing in generative AI and computer vision

Pittsburgh, PA · 2y exp
Magna International · Carnegie Mellon University

Built and owned a production RAG coding assistant at Magna International used by 200 engineers, with hands-on work across React/TypeScript, retrieval infrastructure, and Postgres observability. Also brings an unusual blend of product UX thinking from AR game onboarding work, showing strength in both technical systems reliability and user activation.
