Vetted Site Reliability Engineering Professionals

Pre-screened and vetted.

Mid-level Software Engineer specializing in AWS serverless and AI support automation

New York, NY · 2y exp
Project LoadGuard Inc. · NYU

Senior Software Engineer specializing in distributed systems and cloud platforms

Plano, TX · 8y exp
JPMorgan Chase · Rochester Institute of Technology

Senior DevSecOps & Observability Engineer specializing in cloud automation and STIG compliance

Orlando, FL · 7y exp
IBM · Mercer University

Executive Engineering Leader specializing in cloud, DevSecOps, and large-scale platform modernization

Tampa, FL · 17y exp
PwC · Oregon Institute of Technology

Co-founded a Digital Loss Prevention (DLP) startup and raised $6M in seed funding by showcasing a controlled, laptop-based technology demo. After the raise, drove MVP planning and execution, sequencing operations and assembling a team to build an appliance MVP through an iterative build/evaluate/visualize approach.

Senior Director of Software Engineering specializing in cloud-native microservices for streaming platforms

San Jose, CA · 20y exp
Xperi · Anna University

Engineering leader who drove TiVo IPTV’s client-facing API modernization from a monolith to AWS-based microservices (API Gateway, Lambda, EKS, Kafka, DynamoDB/RDS), including phased/blue-green production routing of millions of calls. Emphasizes org scaling through skill-based hiring, mentorship, and a you-build-you-run ownership culture, while balancing technical leadership with executive stakeholder communication and budgeting.

Alexander Goldberg

Executive Technology & Product Leader specializing in AI/AR SaaS and cybersecurity

Los Angeles, CA · 30y exp
Zugara · Lomonosov Moscow State University

Engineering/technology leader with mission-critical experience at NASA JPL on the Mars Curiosity Rover, delivering an AI-driven navigation system built with zero tolerance for error and reportedly operating without failure for 15+ years. Also led a monolith-to-microservices, cloud-native migration that improved scalability by 300% and cut deployments from days to hours, and is comfortable switching between executive fundraising/stakeholder communication and deep technical leadership.

Aaron Li

Screened

Junior AI/ML Engineer specializing in production LLM systems and RAG

Atlanta, GA · 2y exp
Georgia Institute of Technology · University of Chicago

LLM/document AI engineer who owned a production-grade contract extraction pipeline at CORAMA.AI, ingesting PDFs and dynamic JavaScript sites from 1,000+ government sources. Built a hybrid deterministic+LLM system with two-phase prompting, Pydantic guardrails, confidence scoring, and human-in-the-loop review—cutting error rates from ~35% to <5% and processing 50k+ documents at ~95% accuracy. Also built clinician-in-the-loop orchestration in research, reducing manual labeling time from 3–4 hours to ~50 minutes.

Mid-level Backend Software Engineer specializing in FinTech platforms

Jersey City, NJ · 3y exp
JPMorgan Chase · NYU

Frontend-leaning full-stack engineer with hands-on experience building financial operations and transaction monitoring products from 0→1 through production scale. They stand out for owning React UI architecture, backend/API integration, and data-layer performance decisions while making pragmatic startup tradeoffs and improving features post-launch based on latency, error, and user feedback.

Krishi Jain

Screened

Junior Implementation Manager / Solution Engineer specializing in AI, ERP integrations, and predictive maintenance

Chicago, IL · 2y exp
Continuum AI · Westcliff University

LLM/agentic workflow practitioner (Continuum AI) who productionized an LLM system for manufacturing RMA intake and warranty claims by moving from a brittle prompt to a modular pipeline with RAG, function-calling extraction, deterministic validation, and strong observability. Also diagnosed and fixed an agentic ticket-triage misrouting issue by tracing failures to retrieval timeouts, adding guardrails/fallbacks, and implementing retries plus continuous evaluation, bringing misroutes to near zero while creating a repeatable debugging playbook.

Manoj Bagul

Screened

Executive Engineering & AI Platform Leader in Enterprise SaaS

New York, NY · 25y exp
Qlaws.ai · Savitribai Phule Pune University

Healthcare data platform builder with experience at Aetion delivering a rule-based EMR/EHR ingestion and validation framework that cut onboarding from 8–10 weeks to hours and unlocked $30M+ in revenue over ~3 years. Motivated to found an AI/agent-driven healthcare solution, with a specific interest in using PET scans, doctor notes, and treatment data with LLMs to help predict cancer progression and guide next-step treatments.

Nathan Moore

Screened

Principal Architect specializing in SRE, DevOps, and large-scale cloud/CDN platforms

Dallas, TX · 14y exp
Inertia Labs · UCLA

Engineering leader who drove the conception, PRD, architecture, and delivery of MaxCDN’s next-generation CDN platform ("E2"), including control plane work, hardware deployment planning, and observability/billing data processing. Also built Krypton Labs’ engineering team from the first hires, using a flat Agile structure and emphasizing constructive conflict, strong documentation, and remote-team accountability.

Alex Vo

Screened

Staff Backend Software Engineer specializing in telemetry pipelines and observability

San Jose, CA · 3y exp
VMware · UC Irvine

Backend engineer from VMware focused on proprietary enterprise systems (monitoring tools, data pipelines, and APIs). Drove a ClickHouse migration POC (local to remote host) using a dual-write/cutover approach, and did source-level debugging of Node/driver differences during a Node 12→20 upgrade. Delivered measurable performance gains (~20% CPU/memory improvement) through batching and streaming ingestion.

Shruti Krishnagiri

Executive Engineering Leader & Technical Founder specializing in AI automation platforms

San Francisco Bay Area, California · 20y exp
Bundled · Stanford University

Founder/CTO who built and shipped a consumer subscription-bundling platform end-to-end (architecture, implementation, testing) and scaled it to thousands of customers and major partners. Previously led a major reliability overhaul at the Chan Zuckerberg Initiative for a Google Docs-like ed-tech product: boosted observability, introduced incident management, and migrated to a Docker-based scalable architecture. Heavy user of AI tools (Cursor/Claude) for development, testing, and code review, with a strong bias toward lightweight, fast-moving execution.

Harsh Sanas

Screened

Intern-level Software Engineer specializing in GenAI, RAG, and backend systems

San Francisco, CA · 2y exp
Scale AI · USC

AI/LLM engineer focused on shipping production-grade agents that automate support, sales intake, and ERP-connected workflows. Stands out for combining strong orchestration and guardrails with measurable business outcomes, including 45% faster support handling, ~$1.2M annual savings, 18% higher customer satisfaction, and 99.5%+ reliability in production.

Ruby Medeiros

Screened

Staff SRE and Software Engineer specializing in distributed systems and cloud reliability

11y exp
Arena · NOVA University Lisbon

Built a production B2C behavioral interview system for job seekers using LangGraph/LangChain on AWS Bedrock with Nova models, plus a FastAPI backend and Vercel AI SDK frontend. Stands out for practical agent reliability work: local stress testing, OpenTelemetry-to-Datadog observability, token/cost monitoring, and guardrails to keep conversations on track and resistant to instruction override.

Mid-level AI/ML Engineer specializing in telematics, embedded systems, and MLOps

Mossville, IL · 5y exp
Caterpillar · Georgia Tech

Built and deployed a retail customer review intelligence platform by fine-tuning BERT for sentiment/topic extraction and pairing it with a recommendation component. Demonstrates strong production ML rigor (error analysis, relabeling/active sampling, thresholding/guardrails, OOD checks) and AWS-based orchestration at scale (Lambda + SageMaker with batching and concurrency controls), plus proven ability to align non-technical stakeholders on measurable outcomes.

Palak Siroya

Screened

Senior Site Reliability Engineer specializing in Azure cloud reliability and data analytics

Renton, WA · 10y exp
Microsoft · Central Washington University

AppSec-focused customer advisor with hands-on experience integrating SAST/DAST/SCA into production CI/CD (Azure DevOps) and designing secure agent/scanning deployments in AWS (least-privilege IAM, private subnets, VPC endpoints). Demonstrates strong incident troubleshooting using logs/metrics/traces to diagnose load-related failures (timeouts/retry storms) and drive durable fixes, while tailoring risk/tradeoff communication across engineering, security, and leadership stakeholders.

Lamar Petty

Screened

Mid-level Full-Stack Product Engineer specializing in data-driven web apps and healthcare systems

San Francisco, CA · 13y exp
Wikimedia Foundation · Georgia Tech

Full-stack engineer with production experience shipping a healthcare-focused web app (Pregnancy-Pal) using Next.js/TypeScript on GCP, integrating a Python/Flask middleware and FHIR server for patient/practitioner dashboards and messaging. Former Wikimedia Foundation Android engineer who led the end-to-end 'Year in Review' feature and built robust automated testing/CI practices (Espresso, GitHub Actions matrix). Strong emphasis on reliability via rigorous validation, comprehensive Postman testing, and detailed API documentation.

Senior Software Engineer specializing in backend infrastructure, cloud automation, and reliability

Mountain View, CA · 8y exp
Oracle · Stony Brook University

End-to-end deployment owner for Oracle document delivery/print services in a hospital-like production environment, focused on reliability/performance at scale (thousands of systems). Also describes implementing event-driven RAG/agentic LLM workflows with attention to embeddings/index consistency, latency, and measurable improvements in response relevance and operational efficiency.

Mid-level Software Engineer specializing in LLM agents and real-time data streaming

8y exp
Amazon · Rutgers University–New Brunswick

Software engineer with experience at Striim and Amazon who ships end-to-end production systems across UI, backend, ML, and operations. Built a real-time PII detection capability for a streaming data platform by integrating Python ML inference into a Java monolith via gRPC sidecars, achieving ~3M events/hour throughput and ~93% accuracy, and helped drive enterprise adoption (Fiserv, CVS). Also modernized internal Amazon tooling for multi-region scale with modularization and fully automated deployments.

Mid-level Backend Software Engineer specializing in Python APIs and payment systems

USA · 6y exp
Stripe · Southern Illinois University Carbondale

Backend/ML systems engineer with Stripe payments experience who built an asynchronous processing upgrade handling millions of API requests, cutting peak latency ~20–25% while preserving strict financial consistency via idempotency-safe retries and robust validation/fallbacks. Also built scalable ETL pipelines for messy CSV/Excel/API data with strong observability (structured logging/monitoring) and reliability mechanisms.

Deepika Gotla

Screened

Senior Technical Support Engineer specializing in Azure Cloud & Generative AI

Bellevue, WA · 7y exp
Microsoft · SUNY New Paltz

Microsoft cloud/infra engineer with 5+ years supporting enterprise Azure environments, specializing in security-focused networking (private endpoints, DNS) and production troubleshooting across Azure Front Door, App Gateway WAF, and AKS. Has implemented posture improvements via Defender for Cloud, Azure Policy, and RBAC tightening. Also designs secure AWS agent/scanner integrations and observability-enabled SDK rollouts built on EKS, GitHub Actions, and Secrets Manager.

Shriya Bannikop

Mid-level Software Engineer specializing in cloud platforms, data engineering, and distributed systems

Seattle, WA · 5y exp
Amazon Web Services · KLE Technological University

Full-stack engineer who built and owned an AI-assisted job-matching dashboard in Next.js App Router/TypeScript, keeping LLM logic server-side and improving performance via deduplication, caching/revalidation, and streaming (35% fewer duplicate LLM calls; 40% faster first render). Also has strong data/backend chops: designed Postgres models and optimized queries at million-record scale (1.8s to 120ms) and built durable AWS multi-region telemetry workflows with idempotency, retries, and monitoring.
