Vetted PySpark Professionals

Pre-screened and vetted.

TM

Senior Data Engineer specializing in cloud data platforms and big data pipelines

Austin, TX11y exp
Accenture
View profile
LT

Mid-level Software Engineer specializing in ML platforms and cloud-native backend systems

San Francisco, CA5y exp
City and County of San FranciscoSan Francisco State University

Software engineer with experience at Google and the City and County of San Francisco building production AI systems, including a RAG-based internal support chatbot and ML-driven ticket priority tagging. Has scaled data/ML platforms with Airflow on GCP (1M+ records/day, 99.9% SLA) and deployed multi-component systems with Docker and Kubernetes (GKE), using modern LLM tooling (LangChain/CrewAI, Claude/OpenAI, Pinecone/ChromaDB, Bedrock/Ollama).

View profile
YL

Yuqi Lei

Screened

Mid-level Software Engineer specializing in financial data platforms and quantitative research tooling

New York City, NY3y exp
BloombergWashington University in St. Louis

Owned and built Bloomberg’s end-to-end bitemporal dividend & dividend-forecast data platform powering BQL for 400k+ terminal users. Architected real-time Kafka ingestion (5k–10k msgs/sec) across 100k+ tickers with strong correctness guarantees (PIT/bitemporal time-travel, immutable history to avoid look-ahead bias) and achieved sub-100ms p95 query latency through indexing and caching, deployed with Kubernetes + DLQ and robust monitoring.

View profile
JA

Mid-level AI/ML Engineer specializing in NLP, RAG, and MLOps

McKinney, TX6y exp
Globe LifeTexas A&M University

Built a production LLM/RAG-based “model excellence scoring” system at Uber to automatically evaluate hundreds of ML models, standardizing quality assessment and cutting evaluation time from days to minutes on GCP. Also delivered an NLP document classification solution for insurance claims at Globe Life, partnering closely with compliance/operations and improving routing accuracy from ~85% manual to 93% with the model.

View profile
SK

Sahithi K

Screened

Mid-level Data Engineer specializing in cloud data platforms and streaming pipelines

Boston, MA4y exp
ModernaUniversity of Massachusetts Dartmouth

Data engineer with experience at Moderna and Block owning high-volume (≈10TB/day) production pipelines on AWS, using Kafka/S3/Glue/dbt/Snowflake with strong data quality and observability practices (schema validation, anomaly detection, CloudWatch monitoring). Also built external financial API ingestion with Airflow retries, throttling/token rotation, and schema versioning, and helped stand up an early-stage biomedical data platform with CI/CD and incident debugging.

View profile
LM

Senior Data Engineer specializing in cloud ETL and real-time streaming pipelines

Austin, TX5y exp
eBayTexas Tech University

Data engineer with eBay experience owning end-to-end pipelines for real-time order and user behavior analytics at 10M+ records/day. Strong in PySpark/SQL transformations, Airflow reliability patterns, and production observability (CloudWatch), with measurable outcomes including improved data quality and 30–40% query performance gains. Also built Python data APIs for analytics/ML consumers with versioning and backward compatibility.

View profile
Travoy Spelling - Senior Data Scientist / ML Engineer specializing in GenAI, LLMs, and NLP in Texarkana, TX

Senior Data Scientist / ML Engineer specializing in GenAI, LLMs, and NLP

Texarkana, TX10y exp
TredenceUniversity of Texas at Austin

ML/NLP engineer focused on production GenAI and data linking systems: built a large-scale RAG pipeline over millions of support docs using LangChain/Pinecone and added a LangGraph-based validation layer to cut hallucinations ~40%. Also built scalable PySpark entity resolution (95%+ accuracy) and fine-tuned Sentence-BERT embeddings with contrastive learning for ~30% relevance lift, with strong CI/CD and observability practices (OpenTelemetry, Prometheus/Grafana).

View profile
Byron Pineda - Staff/Lead Data Scientist specializing in Generative AI, NLP/LLMs, and MLOps in Pascagoula, MS

Byron Pineda

Screened

Staff/Lead Data Scientist specializing in Generative AI, NLP/LLMs, and MLOps

Pascagoula, MS10y exp
TuringMississippi State University

Lead Data Scientist (10+ years) with recent work in healthcare data: built production pipelines that unify EHR, genomics, and clinical notes using NLP (spaCy/BERT/BioBERT) and scalable Spark-based processing. Also led development of domain-specific LLM/NLP systems for chatbots and semantic search, deploying models via FastAPI/Flask and improving retrieval with FAISS-backed, fine-tuned clinical embeddings and RAG-style workflows.

View profile
Rishitha Madipelli - Mid-level Software Engineer specializing in cloud-native distributed systems and streaming data in Austin, TX

Mid-level Software Engineer specializing in cloud-native distributed systems and streaming data

Austin, TX7y exp
TeslaGeorge Mason University

Backend/product engineer with Tesla experience building and operating a real-time OTA update monitoring and fleet analytics platform at massive scale (telemetry from 3M+ vehicles). Delivered end-to-end systems across Kafka-based ingestion, TimescaleDB/Postgres analytics modeling, FastAPI/GraphQL APIs, and React/TypeScript dashboards, and handled production scaling incidents on AWS EKS during major rollout spikes.

View profile
Saiteja Gaddam - Mid-Level Data Engineer specializing in cloud data platforms and streaming analytics

Mid-Level Data Engineer specializing in cloud data platforms and streaming analytics

3y exp
IntuitUniversity at Buffalo

Data engineer (Intuit) who owned an end-to-end telemetry and subscription analytics platform processing ~22M events/day, built on Kinesis/S3/Glue/Spark/Airflow/Redshift. Strong focus on reliability and data quality (schema drift controls, quarantine layers, idempotent reruns) and performance tuning, achieving a reporting latency reduction from ~15 minutes to under 4 minutes while enabling revenue and churn analytics for business teams.

View profile
HK

Mid-level Full-Stack Software Engineer specializing in cloud and data platforms

Boston, MA5y exp
Northeastern UniversityPenn State University

Full-stack engineer with experience spanning Amazon IMDb and Northeastern’s NeuroJSON portal, combining consumer product work with complex scientific data applications. Built IMDb’s streaming providers feature—described as the company’s most impactful feature of 2023—and has hands-on experience with React/Angular, GraphQL, AWS, Python services, and production monitoring.

View profile
PP

Senior Backend Software Engineer specializing in cloud, microservices, and AI systems

Richardson, TX8y exp
The University of Texas at DallasUniversity of Texas at Dallas

Built an AI-powered job outreach application for his own job search and took it from idea to production use, owning architecture, FastAPI backend, retrieval/generation pipeline, frontend workflow, deployment, and iteration. Especially compelling for teams needing a pragmatic full-stack engineer who can turn LLM-based product ideas into usable, maintainable tools with measurable workflow impact.

View profile
TY

Timothy Yeav

Screened

Senior AI/ML Engineer specializing in Generative AI and FinTech

Bronx, NY8y exp
InsitroNew York City College of Technology (CUNY)

Built end-to-end LLM/RAG systems for biological data and scientific literature analysis in a drug discovery setting, helping researchers explore disease insights and treatment hypotheses faster. Combines applied GenAI product work with strong production engineering, including monitoring, retrieval optimization, reusable Python services, and scalable deployment on AWS/Kubeflow.

View profile
MB

Manoj Bagul

Screened

Executive Engineering & AI Platform Leader in Enterprise SaaS

New York, NY25y exp
Qlaws.aiSavitribai Phule Pune University

Healthcare data platform builder with experience at Aetion delivering a rule-based EMR/EHR ingestion and validation framework that cut onboarding from 8–10 weeks to hours and unlocked $30M+ in revenue over ~3 years. Motivated to found an AI/agent-driven healthcare solution, with a specific interest in using PET scans, doctor notes, and treatment data with LLMs to help predict cancer progression and guide next-step treatments.

View profile
YK

Junior AI/ML Engineer specializing in applied LLMs, security, and reinforcement learning

New York, USA2y exp
New York UniversityNYU

Built and shipped a production LLM-powered investor research feature for a fintech product, focused on grounded answers and minimizing hallucinations. Implemented retrieval-quality and evidence-coverage gating with clear refusal fallbacks, and evaluates systems with regression tests and metrics like correct-refusal rate, hallucination rate, and latency. Comfortable orchestrating workflows with LangChain or custom Python depending on production needs.

View profile
VS

Mid-level Data Scientist/ML Engineer specializing in GenAI agents and MLOps

5y exp
Capital OneUniversity of the Cumberlands

AI/LLM engineer at Capital One who deployed a production RAG-powered fraud analysis and document intelligence platform using LangChain, OpenAI, Pinecone, Kafka, and AWS. Focused on reliability in real-time investigations via hybrid retrieval, schema-validated outputs, and LLM verification loops, reporting review-time reduction from hours to minutes and ~99% fraud detection precision.

View profile
LK

Junior Full-Stack & Data Engineer specializing in cloud platforms and cybersecurity ML

New York, NY2y exp
AccentureNYU

Built a hackathon "Patient Summary Assistant" backend focused on healthcare workflows, combining RAG-based summarization with HIPAA-minded privacy controls (NER redaction + encryption). Demonstrated strong infra skills by deploying on Kubernetes with Helm/HPA and GitOps (ArgoCD), plus migrating from OpenAI to an on-prem Llama 3 stack (vLLM, quantization, shadow-mode testing) and adding real-time Kafka ingestion for patient vitals/anomaly alerts.

View profile
JJ

Intern Generative AI Engineer specializing in RAG and multi-agent systems

Chicago, IL2y exp
NeuraFlashUniversity of Chicago

Built and deployed a production RAG-based multi-agent chatbot during an internship to help consultants answer client questions and guide users through new IT systems with step-by-step instructions. Demonstrates hands-on experience with LangGraph/LangChain/Google ADK, unstructured document parsing and chunking for RAG, and a reliability-first approach to agent workflows (metrics, fallbacks, human-in-the-loop, guardrails).

View profile
YP

Mid-level AI/ML Engineer specializing in Databricks, MLOps, and real-time fraud detection

The Colony, TX4y exp
DatabricksUniversity of North Texas

ML/LLM engineer building production, real-time fraud detection for financial transactions using a two-tier architecture (fast ML + GPT) to deliver both low-latency decisions and analyst-friendly risk explanations. Experienced orchestrating end-to-end retraining, drift monitoring, and automated model promotion with Databricks Jobs/Workflows and MLflow, and partnering closely with fraud analysts to tune alerts, thresholds, and dashboards.

View profile
ZI

Senior Machine Learning Engineer specializing in LLMs, RAG, and computer vision

San Diego, CA10y exp
SOTER AIUC San Diego

Built an "AskMyVideo" system that turns YouTube videos into queryable knowledge graphs by transcribing audio (Whisper), chunking and embedding content, and enabling traceable answers back to exact timestamps. Strong in entity resolution (rules + fuzzy matching + TF-IDF/cosine with PR-curve thresholding) and modern retrieval stacks (FAISS, hybrid dense/sparse, domain fine-tuning with ~12% precision gain), with a production mindset using Airflow/Prefect, Docker/FastAPI, and LangSmith/Prometheus/Grafana observability.

View profile
SV

sai venkata

Screened

Senior Data Engineer specializing in cloud lakehouse and real-time streaming pipelines

Texas, USA6y exp
CVS HealthUniversity of Central Missouri

Senior data engineer with experience in both healthcare (CVS Health) and financial services (Bank of America), building large-scale Azure lakehouse pipelines (30+ EHR sources, ~5TB) and real-time streaming services (Event Hubs/Kafka) for patient vitals. Strong focus on reliability and data quality (Great Expectations, monitoring/alerting, schema drift automation), with measurable outcomes like 50% runtime reduction and 99%+ uptime for regulatory reporting pipelines.

View profile
JV

Mid-level Data Engineer specializing in cloud data platforms and streaming pipelines

San Diego, CA6y exp
IntuitCleveland State University

Data engineer with Intuit experience owning end-to-end, high-volume financial data pipelines (API/S3 ingestion, Airflow orchestration, Spark/PySpark + SQL transforms, Snowflake marts). Strong focus on reliability and data quality—achieved 99.8% SLA and cut discrepancies by 35% using Great Expectations, reconciliation, schema versioning, and automated backfills; also built near real-time Kafka/API data services with CI/CD and observability.

View profile
RK

Rohit Kumar

Screened

Mid-level Data Engineer specializing in large-scale analytics platforms

San Jose, CA5y exp
NutanixUSC

Data/Backend engineer with experience at Naukri building large-scale analytics products over a 130M+ user base, including Spark/Airflow pipelines and Kafka-based clickstream validation with Confluent Schema Registry. Also built an audience segmentation backend (Athena/S3 + Spring Boot APIs) for non-technical internal teams and recently shipped a GenAI customer data audit system (FastAPI/Postgres/Llama) that cut sales-planning validation from ~3 months to ~1 week.

View profile

Need someone specific?

AI Search