Vetted PySpark Professionals

Pre-screened and vetted.

NR

Mid-level AI Engineer specializing in LLMs, RAG, and MLOps

5y exp
Wells Fargo · Southern Methodist University

Built and deployed a production RAG-based internal knowledge assistant that let analysts query company documents in natural language, using LangChain/LangGraph with Pinecone and a FastAPI service for integration. Emphasizes reliability in production through hallucination mitigation (retrieval tuning + prompt guardrails) and measurable evaluation/monitoring (accuracy, latency, task completion, hallucination rate), iterating based on user feedback.

DB

Mid-level AI/ML Engineer specializing in LLMs, RAG, and enterprise AI

Fairfax, VA · 5y exp
Freddie Mac · George Mason University

Built an enterprise RAG-based document intelligence system at Freddie Mac for regulatory and financial documents, helping analysts cut search time from hours to minutes while improving retrieval accuracy by ~30%. Stands out for combining LLM product delivery with compliance-grade auditability, production monitoring, and scalable Python/FastAPI service design.

Sai Sri Kolanu

Mid-level AI Engineer specializing in LLMs, RAG, and production ML systems

Dearborn, MI · 4y exp
Ford · University at Buffalo

Built and shipped an AI-powered RAG diagnostic assistant at Ford for EV technicians, integrating GPT-based models with LangChain, FAISS, and SageMaker into real technician workflows. Stands out for combining strong production LLM architecture with practical safety guardrails, monitoring, and measurable impact: 45% better diagnostic accuracy and roughly 30 minutes saved per case.

Adit Shah

Screened

Mid-level AI/ML Engineer specializing in computer vision, NLP, and LLM systems

USA · 4y exp
Omnic.AI · Northeastern University

AI/full-stack engineer in gaming analytics who joined Omnic.ai at the two-person stage, grew with the company, and built both the backend and frontend for real-time gameplay analysis products. He combines production computer vision experience with LLM/RAG systems work, and has led a team of 4 employees while shipping 12 models in a fast-moving startup environment.

Dev Parikh

Screened

Mid-level Software Engineer specializing in backend systems and applied AI

Baltimore, MD · 4y exp
Qualcomm · University of Maryland, Baltimore County

Backend/full-stack engineer at Qualcomm who built and operated a drift monitoring platform for 10k+ edge AI models. Stands out for combining strong TypeScript/React/Node execution with production-grade systems thinking across PostgreSQL tuning, Redis caching, ECS deployments, and Kafka-based architectural changes that measurably improved reliability and release speed.

Manali Shetye

Screened

Mid-level Applied AI & Data Engineer specializing in automation and enterprise analytics

Irving, Texas · 4y exp
Trend Micro · University of Texas at Arlington

Backend engineer with experience evolving a high-volume agricultural loan processing platform (APMS) at HDFC Bank, emphasizing transactional integrity, auditability, and modularity while integrating with credit bureaus, document management, and risk engines. Also improved automation/reporting robustness at Trend Micro by catching duplicate-event retry edge cases and adding idempotency safeguards.

KP

Mid-level Data Analytics & ML Engineer specializing in NLP, LLMs, and cloud data platforms

Dallas, TX · 5y exp
Mattel · Kennesaw State University

At KPMG, built and productionized a secure RAG-based LLM assistant that lets business and risk stakeholders query data warehouses in natural language, reducing dependence on data engineers for ad-hoc analysis. Demonstrates strong production rigor (Airflow orchestration, CI/CD, containerization), retrieval/embedding tuning (rechunking, semantic abstraction for structured data), and reliability controls (confidence thresholds, refusal behavior, monitoring and canary evals).

Sharath Bandi

Screened

Mid-level Generative AI Engineer specializing in LLMs, RAG, and multimodal generation

Saint Louis, Missouri · 4y exp
LSEG · Avila University

Open-source JavaScript contributor focused on performance and maintainability in data visualization libraries: refactored legacy ES5 into modular ES6, added tests/docs, and delivered ~30% faster load times with positive community adoption. Also optimized a React dashboard (~40% load-time reduction) and took ownership of an ambiguous AI product initiative by setting milestones, standing up an initial ML pipeline, and shipping a prototype in ~6 weeks that became the basis for the production version.

Manav Bhasin

Screened

Junior Full-Stack Machine Learning Engineer specializing in production ML systems

San Jose, CA · 2y exp
AgroFocal Technologies Inc · San José State University

Software engineer who owned end-to-end delivery of customer-facing agricultural forecast reporting (crop yield/health) and iterated quickly via rigorous edge-case testing and customer feedback. Also built an internal ML training platform (TypeScript/React + Flask/Python + MongoDB) used by every developer, with architecture designed to stay responsive under heavy compute load.

SK

Mid-level Data Scientist / ML Engineer specializing in streaming ML systems for healthcare and IoT

Urbandale, IA · 4y exp
John Deere · Auburn University at Montgomery

ML/GenAI engineer with production experience building an LLM-powered governance layer that summarizes verified drift/performance signals into validation reports and release notes, designed for regulated environments with de-identification and non-blocking fallbacks. Strong Airflow-based orchestration background across healthcare and finance, integrating Databricks/Spark and MLflow for scalable retraining/monitoring. Demonstrated ability to partner with non-technical healthcare operations teams to deliver actionable risk-scoring outputs via dashboards and automated reporting.

Sowmya Sree

Screened

Mid-level Machine Learning Engineer specializing in LLM agents, RAG, and MLOps

Dallas, TX · 5y exp
Bank of America · University of North Texas

Built production LLM systems including a real-time customer feedback analysis and workflow automation platform using RAG and multi-agent orchestration with confidence-based human escalation, addressing privacy and legacy integration challenges. Also automated ML operations with Airflow/Kubernetes (e.g., daily churn model retraining), cutting retraining time to under 30 minutes, and demonstrates a rigorous approach to testing and monitoring plus strong collaboration with non-technical stakeholders.

HK

Mid-level Data Scientist specializing in Generative AI and NLP

USA · 6y exp
CVS Health · University of Central Missouri

ML/GenAI engineer with recent CVS Health experience building a production RAG system over unstructured financial/research documents using LangChain, FAISS, and Pinecone, plus LoRA/PEFT fine-tuning of GPT/LLaMA for domain-aware summarization. Demonstrates strong applied MLOps and data engineering skills (Airflow/Prefect, Docker/Kubernetes, CI/CD, MLflow) and measurable impact (sub-second retrieval, ~40% better context retrieval, ~25% entity matching improvement).

SL

Mid-level Data Engineer specializing in cloud ETL/ELT and healthcare analytics

Dallas, TX · 5y exp
Lightbeam Health Solutions · Syracuse University

Healthcare-focused data engineer/ML practitioner with experience at Lightbeam Health Solutions and Humana building production entity-resolution and semantic similarity pipelines across EMR, lab, and claims data. Uses NLP/ML (spaCy, scikit-learn, BioBERT/LightGBM) plus Snowflake/Airflow and vector search (Pinecone) to improve linkage accuracy (reported 90%) and semantic match quality (reported +12–15%), while reducing manual cleanup by 40%+.

MK

Senior Data Analyst specializing in data pipelines, web scraping, and legal data enrichment

Illinois, USA · 5y exp
The Hartford · Indiana Wesleyan University

Data engineer focused on reliable, scalable analytics pipelines and external data collection. Has owned end-to-end pipelines processing 5–10M records/day, serving Snowflake data marts to Power BI/Tableau, and reports ~99% reliability through strong validation/monitoring. Also shipped versioned REST APIs for curated data with query optimization and caching.

Ansh Harjai

Screened

Junior Software Engineer specializing in AI, RAG systems, and backend development

Brooklyn, NY · 1y exp
New York University · NYU

Built an NYU software engineering capstone called “Smart Cash AI,” a multi-agent LLM-powered web app that curates offline-ready podcasts/articles/videos/news based on user preferences and commute schedules. Architected agent orchestration (discovery/downloader/summarizer), real-time progress via WebSockets, and an ETL normalization layer across RSS/YouTube and other sources with GUID-based deduplication, retries, and failure isolation to keep the system predictable.

Rishitha Reddy Katamareddy

Mid-level Generative AI & Machine Learning Engineer specializing in agentic LLM systems

USA · 4y exp
Optum · University at Buffalo

Built and deployed a production agentic LLM knowledge assistant that answers complex questions over internal documents, APIs, and databases using a RAG architecture (FAISS/Pinecone) and LangChain/LangGraph orchestration. Emphasizes production-grade reliability and hallucination control through grounding, confidence thresholds, validation, retries/fallbacks, and full observability (logging/metrics/traces) with continuous evaluation and feedback loops.

Daniel Jin

Screened

Intern Site Reliability Engineer specializing in Kubernetes, AWS, and observability

New York, NY · 1y exp
Woori America Bank · NYU

Backend/data engineering candidate specializing in Python/Flask services and ML-enabled systems, deploying containerized workloads on AWS ECS/EKS with strong observability (Prometheus/Grafana) and PostgreSQL performance tuning. Built multi-tenant architectures with row- and schema-level isolation and optimized a Kubernetes-based Airflow + Spark nightly ETL pipeline for an e-commerce client, improving performance by 250%+ and reliably beating morning reporting deadlines; also contributed to Apache Airflow (SQLAlchemy/PostgreSQL area).

Sai Nekkanti

Screened

Mid-level Data Scientist / ML Engineer specializing in secure GenAI and financial compliance

Mount Laurel, NJ · 4y exp
MetLife · Rowan University

Built a production "sentinel insight engine" to tame information overload from millions of product reviews and support transcripts, combining Azure OpenAI (GPT-3.5) zero-shot classification with a fine-tuned T5 summarizer to generate weekly actionable product insights. Demonstrated strong MLOps/production engineering by adding drift monitoring with embedding-based detection, integrating REST with legacy SOAP/queue-based CRM via FastAPI middleware, and scaling reliably on Kubernetes with HPA.

Nishad Kane

Screened

Mid-level Data Scientist & AI Engineer specializing in RAG, agentic AI, and production ML

5y exp
Xtrium AI · Arizona State University

AI/data engineer who built a production LLM-powered schema drift detection system (LangChain/LangGraph) to catch semantic data changes before they break downstream analytics/ML. Deployed on AWS with Docker/S3 and implemented an LLM-as-a-judge evaluation framework to improve trust, reduce hallucinations, and control false positives/alert fatigue. Collaborated with non-technical risk/business analytics stakeholders at EY by delivering human-readable drift explanations that improved confidence in financial analytics dashboards.

Revanth Goli

Screened

Senior Data & Backend Engineer specializing in cloud data pipelines and LLM/RAG systems

Morrisville, NC · 6y exp
Syneos Health · University of Alabama at Birmingham

Data engineer with end-to-end ownership of large-scale retail and clinical data ingestion/processing on AWS, including real-time streaming and batch pipelines. Delivered measurable outcomes: 20M daily transactions processed, latency cut from 4 hours to 5 minutes, ~70% fewer failures, and 120+ pipelines running at 99.8% reliability with full audit compliance.

BK

Mid-level Data Engineer specializing in big data pipelines and real-time streaming

Dallas, TX · 6y exp
Johnson & Johnson · University of North Texas

Data engineer who has owned end-to-end production pipelines processing a few million records/day, using Python/Airflow/SQL/PySpark with Snowflake serving to BI (Power BI). Built resilient external web data collection systems (anti-bot, schema-change detection, backfills) and shipped versioned REST APIs for internal consumers, improving pipeline success rates to 99% through monitoring, retries, and idempotent design.

SV

Mid-level Data Engineer specializing in cloud data platforms and governed analytics

5y exp
Optum · University of Central Missouri

Data engineer with Optum experience building end-to-end healthcare data pipelines for HL7/FHIR, processing millions of records daily across Kafka streaming and Databricks/Spark batch. Strong focus on data quality (schema enforcement/validations), reliability (Airflow monitoring/alerts), and analytics-ready serving in Snowflake powering Power BI/Tableau, with CI/CD via Git and Jenkins.

TD

Mid-level Cloud Data Engineer specializing in Azure/AWS pipelines and medallion architecture

USA · 4y exp
UnitedHealth Group · Southern Illinois University Carbondale

Data engineer focused on reliability and data quality, owning end-to-end pipelines processing ~100k–300k records/day. Implemented robust validation and monitoring that cut reporting issues by ~30%, and built stable external data collection with anti-bot measures, backfills, and schema-change detection while maintaining backward-compatible internal data services.

PS

Senior Data Analyst specializing in marketing, BI, and financial analytics

Illinois, USA · 6y exp
WPP · DePaul University

Marketing analytics candidate with experience at WPP and on a global Coca-Cola campaign, focused on turning messy multi-platform media data into trusted reporting and decision systems. They combine hands-on SQL/Python pipeline building with stakeholder KPI alignment, and cite a 22% improvement in media effectiveness plus faster budget reallocation through daily automated reporting.
