Vetted PySpark Professionals

Pre-screened and vetted.

PySpark, Python, SQL, Docker, AWS, CI/CD

Tate Mara

Senior Data Engineer specializing in cloud data platforms and big data pipelines

Austin, TX · 11y exp

Accenture

Agile, Amazon CloudFront, Amazon CloudWatch, Amazon DynamoDB, Amazon EC2, Amazon ECS (+208 more)

Leela Tikkisetty

Screened

Mid-level Software Engineer specializing in ML platforms and cloud-native backend systems

San Francisco, CA · 5y exp

City and County of San Francisco · San Francisco State University

“Software engineer with experience at Google and the City and County of San Francisco building production AI systems, including a RAG-based internal support chatbot and ML-driven ticket priority tagging. Has scaled data/ML platforms with Airflow on GCP (1M+ records/day, 99.9% SLA) and deployed multi-component systems with Docker and Kubernetes (GKE), using modern LLM tooling (LangChain/CrewAI, Claude/OpenAI, Pinecone/ChromaDB, Bedrock/Ollama).”

A/B Testing, Agile, Amazon Bedrock, Amazon EKS, Amazon Redshift, Authentication (+198 more)

Yuqi Lei

Screened

Mid-level Software Engineer specializing in financial data platforms and quantitative research tooling

New York City, NY · 3y exp

Bloomberg · Washington University in St. Louis

“Owned and built Bloomberg’s end-to-end bitemporal dividend & dividend-forecast data platform powering BQL for 400k+ terminal users. Architected real-time Kafka ingestion (5k–10k msgs/sec) across 100k+ tickers with strong correctness guarantees (PIT/bitemporal time-travel, immutable history to avoid look-ahead bias) and achieved sub-100ms p95 query latency through indexing and caching, deployed with Kubernetes + DLQ and robust monitoring.”

Python, SQL, Java, JavaScript, C++, Pandas (+60 more)

Jisvitha Athaluri

Screened

Mid-level AI/ML Engineer specializing in NLP, RAG, and MLOps

McKinney, TX · 6y exp

Globe Life · Texas A&M University

“Built a production LLM/RAG-based “model excellence scoring” system at Uber to automatically evaluate hundreds of ML models, standardizing quality assessment and cutting evaluation time from days to minutes on GCP. Also delivered an NLP document classification solution for insurance claims at Globe Life, partnering closely with compliance/operations and improving routing accuracy from ~85% manual to 93% with the model.”

A/B Testing, Apache Spark, BERT, ChromaDB, Data Engineering, Data Pipelines (+90 more)

Sahithi K

Screened

Mid-level Data Engineer specializing in cloud data platforms and streaming pipelines

Boston, MA · 4y exp

Moderna · University of Massachusetts Dartmouth

“Data engineer with experience at Moderna and Block owning high-volume (≈10TB/day) production pipelines on AWS, using Kafka/S3/Glue/dbt/Snowflake with strong data quality and observability practices (schema validation, anomaly detection, CloudWatch monitoring). Also built external financial API ingestion with Airflow retries, throttling/token rotation, and schema versioning, and helped stand up an early-stage biomedical data platform with CI/CD and incident debugging.”

Python, SQL, PySpark, Apache Spark, Apache Kafka, Amazon Kinesis (+94 more)

Lalithya Manasa Patri

Screened

Senior Data Engineer specializing in cloud ETL and real-time streaming pipelines

Austin, TX · 5y exp

eBay · Texas Tech University

“Data engineer with eBay experience owning end-to-end pipelines for real-time order and user behavior analytics at 10M+ records/day. Strong in PySpark/SQL transformations, Airflow reliability patterns, and production observability (CloudWatch), with measurable outcomes including improved data quality and 30–40% query performance gains. Also built Python data APIs for analytics/ML consumers with versioning and backward compatibility.”

Python, SQL, Java, Scala, R, Apache Spark (+97 more)

Travoy Spelling

Screened

Senior Data Scientist / ML Engineer specializing in GenAI, LLMs, and NLP

Texarkana, TX · 10y exp

Tredence · University of Texas at Austin

“ML/NLP engineer focused on production GenAI and data linking systems: built a large-scale RAG pipeline over millions of support docs using LangChain/Pinecone and added a LangGraph-based validation layer to cut hallucinations ~40%. Also built scalable PySpark entity resolution (95%+ accuracy) and fine-tuned Sentence-BERT embeddings with contrastive learning for ~30% relevance lift, with strong CI/CD and observability practices (OpenTelemetry, Prometheus/Grafana).”

A/B Testing, API Development, AWS, AWS Lambda, AWS Step Functions, Azure Data Factory (+247 more)

Byron Pineda

Screened

Staff/Lead Data Scientist specializing in Generative AI, NLP/LLMs, and MLOps

Pascagoula, MS · 10y exp

Turing · Mississippi State University

“Lead Data Scientist (10+ years) with recent work in healthcare data: built production pipelines that unify EHR, genomics, and clinical notes using NLP (spaCy/BERT/BioBERT) and scalable Spark-based processing. Also led development of domain-specific LLM/NLP systems for chatbots and semantic search, deploying models via FastAPI/Flask and improving retrieval with FAISS-backed, fine-tuned clinical embeddings and RAG-style workflows.”

Python, R, SQL, Pandas, NumPy, Scikit-learn (+132 more)

Rishitha Madipelli

Screened

Mid-level Software Engineer specializing in cloud-native distributed systems and streaming data

Austin, TX · 7y exp

Tesla · George Mason University

“Backend/product engineer with Tesla experience building and operating a real-time OTA update monitoring and fleet analytics platform at massive scale (telemetry from 3M+ vehicles). Delivered end-to-end systems across Kafka-based ingestion, TimescaleDB/Postgres analytics modeling, FastAPI/GraphQL APIs, and React/TypeScript dashboards, and handled production scaling incidents on AWS EKS during major rollout spikes.”

Python, Java, TypeScript, SQL, Angular, Spring Boot (+114 more)

Saiteja Gaddam

Screened

Mid-level Data Engineer specializing in cloud data platforms and streaming analytics

3y exp

Intuit · University at Buffalo

“Data engineer (Intuit) who owned an end-to-end telemetry and subscription analytics platform processing ~22M events/day, built on Kinesis/S3/Glue/Spark/Airflow/Redshift. Strong focus on reliability and data quality (schema drift controls, quarantine layers, idempotent reruns) and performance tuning, achieving a reporting latency reduction from ~15 minutes to under 4 minutes while enabling revenue and churn analytics for business teams.”

Scala, Hibernate, JDBC, JSON, HTML, CSS (+120 more)

Hari Kiran Reddy Rommala

Screened

Mid-level Full-Stack Software Engineer specializing in cloud and data platforms

Boston, MA · 5y exp

Northeastern University · Penn State University

“Full-stack engineer with experience spanning Amazon IMDb and Northeastern’s NeuroJSON portal, combining consumer product work with complex scientific data applications. Built IMDb’s streaming providers feature—described as the company’s most impactful feature of 2023—and has hands-on experience with React/Angular, GraphQL, AWS, Python services, and production monitoring.”

React, TypeScript, SQL, PostgreSQL, Docker, Kubernetes (+283 more)

Prafull Prajapati

Screened

Senior Backend Software Engineer specializing in cloud, microservices, and AI systems

Richardson, TX · 8y exp

The University of Texas at Dallas · University of Texas at Dallas

“Built an AI-powered job outreach application for his own job search and took it from idea to production use, owning architecture, FastAPI backend, retrieval/generation pipeline, frontend workflow, deployment, and iteration. Especially compelling for teams needing a pragmatic full-stack engineer who can turn LLM-based product ideas into usable, maintainable tools with measurable workflow impact.”

C, C++, JavaScript, Java, Python, TypeScript (+162 more)

Timothy Yeav

Screened

Senior AI/ML Engineer specializing in Generative AI and FinTech

Bronx, NY · 8y exp

Insitro · New York City College of Technology (CUNY)

“Built end-to-end LLM/RAG systems for biological data and scientific literature analysis in a drug discovery setting, helping researchers explore disease insights and treatment hypotheses faster. Combines applied GenAI product work with strong production engineering, including monitoring, retrieval optimization, reusable Python services, and scalable deployment on AWS/Kubeflow.”

Generative AI, LLaMA, GPT, Agentic AI, BERT, Transformers (+204 more)

Manoj Bagul

Screened

Executive Engineering & AI Platform Leader in Enterprise SaaS

New York, NY · 25y exp

Qlaws.ai · Savitribai Phule Pune University

“Healthcare data platform builder with experience at Aetion delivering a rule-based EMR/EHR ingestion and validation framework that cut onboarding from 8–10 weeks to hours and unlocked $30M+ in revenue over ~3 years. Motivated to found an AI/agent-driven healthcare solution, with a specific interest in using PET scans, doctor notes, and treatment data with LLMs to help predict cancer progression and guide next-step treatments.”

AI Agents, Analytics, AWS, Budget Management, Campaign Management, CI/CD (+98 more)

Yukta Kulkarni

Screened

Junior AI/ML Engineer specializing in applied LLMs, security, and reinforcement learning

New York, USA · 2y exp

New York University · NYU

“Built and shipped a production LLM-powered investor research feature for a fintech product, focused on grounded answers and minimizing hallucinations. Implemented retrieval-quality and evidence-coverage gating with clear refusal fallbacks, and evaluates systems with regression tests and metrics like correct-refusal rate, hallucination rate, and latency. Comfortable orchestrating workflows with LangChain or custom Python depending on production needs.”

Python, C, C++, SQL, TypeScript, JavaScript (+82 more)

Venkata Sai Pavan Dema

Screened

Mid-level Data Scientist/ML Engineer specializing in GenAI agents and MLOps

5y exp

Capital One · University of the Cumberlands

“AI/LLM engineer at Capital One who deployed a production RAG-powered fraud analysis and document intelligence platform using LangChain, OpenAI, Pinecone, Kafka, and AWS. Focused on reliability in real-time investigations via hybrid retrieval, schema-validated outputs, and LLM verification loops, reporting review-time reduction from hours to minutes and ~99% fraud detection precision.”

A/B Testing, Amazon EC2, Amazon Redshift, Amazon S3, Amazon SageMaker, Azure App Service (+163 more)

Lakshmi Kiranmayi Chelluboyina

Screened

Junior Full-Stack & Data Engineer specializing in cloud platforms and cybersecurity ML

New York, NY · 2y exp

Accenture · NYU

“Built a hackathon "Patient Summary Assistant" backend focused on healthcare workflows, combining RAG-based summarization with HIPAA-minded privacy controls (NER redaction + encryption). Demonstrated strong infra skills by deploying on Kubernetes with Helm/HPA and GitOps (ArgoCD), plus migrating from OpenAI to an on-prem Llama 3 stack (vLLM, quantization, shadow-mode testing) and adding real-time Kafka ingestion for patient vitals/anomaly alerts.”

Agile, Apache Spark, C, C#, C++, CI/CD (+93 more)

John Joji Melel

Screened

Intern Generative AI Engineer specializing in RAG and multi-agent systems

Chicago, IL · 2y exp

NeuraFlash · University of Chicago

“Built and deployed a production RAG-based multi-agent chatbot during an internship to help consultants answer client questions and guide users through new IT systems with step-by-step instructions. Demonstrates hands-on experience with LangGraph/LangChain/Google ADK, unstructured document parsing and chunking for RAG, and a reliability-first approach to agent workflows (metrics, fallbacks, human-in-the-loop, guardrails).”

Python, SQL, R, C++, Kubernetes, Docker (+87 more)

Yeshwanth Pulapa

Screened

Mid-level AI/ML Engineer specializing in Databricks, MLOps, and real-time fraud detection

The Colony, TX · 4y exp

Databricks · University of North Texas

“ML/LLM engineer building production, real-time fraud detection for financial transactions using a two-tier architecture (fast ML + GPT) to deliver both low-latency decisions and analyst-friendly risk explanations. Experienced orchestrating end-to-end retraining, drift monitoring, and automated model promotion with Databricks Jobs/Workflows and MLflow, and partnering closely with fraud analysts to tune alerts, thresholds, and dashboards.”

A/B Testing, Apache Airflow, Apache Kafka, Apache Spark, AWS, AWS Lambda (+93 more)

Zufeshan Imran

Screened

Senior Machine Learning Engineer specializing in LLMs, RAG, and computer vision

San Diego, CA · 10y exp

SOTER AI · UC San Diego

“Built an "AskMyVideo" system that turns YouTube videos into queryable knowledge graphs by transcribing audio (Whisper), chunking and embedding content, and enabling traceable answers back to exact timestamps. Strong in entity resolution (rules + fuzzy matching + TF-IDF/cosine with PR-curve thresholding) and modern retrieval stacks (FAISS, hybrid dense/sparse, domain fine-tuning with ~12% precision gain), with a production mindset using Airflow/Prefect, Docker/FastAPI, and LangSmith/Prometheus/Grafana observability.”

Machine Learning, Deep Learning, Generative AI, Transformers, Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) (+120 more)

sai venkata

Screened

Senior Data Engineer specializing in cloud lakehouse and real-time streaming pipelines

Texas, USA · 6y exp

CVS Health · University of Central Missouri

“Senior data engineer with experience in both healthcare (CVS Health) and financial services (Bank of America), building large-scale Azure lakehouse pipelines (30+ EHR sources, ~5TB) and real-time streaming services (Event Hubs/Kafka) for patient vitals. Strong focus on reliability and data quality (Great Expectations, monitoring/alerting, schema drift automation), with measurable outcomes like 50% runtime reduction and 99%+ uptime for regulatory reporting pipelines.”

Python, SQL, Scala, Java, Shell Scripting, Apache Spark (+117 more)

jahnavi Vasala

Screened

Mid-level Data Engineer specializing in cloud data platforms and streaming pipelines

San Diego, CA · 6y exp

Intuit · Cleveland State University

“Data engineer with Intuit experience owning end-to-end, high-volume financial data pipelines (API/S3 ingestion, Airflow orchestration, Spark/PySpark + SQL transforms, Snowflake marts). Strong focus on reliability and data quality—achieved 99.8% SLA and cut discrepancies by 35% using Great Expectations, reconciliation, schema versioning, and automated backfills; also built near real-time Kafka/API data services with CI/CD and observability.”

Python, SQL, PySpark, Scala, Shell scripting, Apache Spark (+87 more)

Rohit Kumar

Screened

Mid-level Data Engineer specializing in large-scale analytics platforms

San Jose, CA · 5y exp

Nutanix · USC

“Data/Backend engineer with experience at Naukri building large-scale analytics products over a 130M+ user base, including Spark/Airflow pipelines and Kafka-based clickstream validation with Confluent Schema Registry. Also built an audience segmentation backend (Athena/S3 + Spring Boot APIs) for non-technical internal teams and recently shipped a GenAI customer data audit system (FastAPI/Postgres/Llama) that cut sales-planning validation from ~3 months to ~1 week.”

Algorithms, Amazon Athena, Amazon S3, Apache Hadoop, Apache Hive, Apache Kafka (+95 more)