Vetted PySpark Professionals

Pre-screened and vetted.

PS

Mid-level Data Engineer specializing in AWS lakehouse platforms and scalable ETL/ELT

Texas, USA · 4y exp
Humana · University of Texas at Dallas

Data engineer focused on reliable, production-grade pipelines and data services: has owned end-to-end ingestion-to-serving workflows processing millions of records/day, using Airflow, Python/SQL, and PySpark. Demonstrates strong operational rigor (monitoring, retries, idempotency, backfills) and measurable outcomes (98% stability, ~30% faster processing), plus experience exposing curated warehouse data via versioned REST APIs.

Varshitha K

Screened

Mid-level Data Engineer specializing in cloud data platforms and lakehouse architectures

Lakewood, CO · 4y exp
First Bank · University of Central Missouri

Data engineer in a banking context who has owned end-to-end Azure lakehouse pipelines ingesting financial/vendor data from APIs, Azure SQL, and flat files into Databricks/Delta (bronze-silver-gold). Emphasizes production reliability via schema-drift validation, data quality controls, monitoring/alerting, retries/checkpointing, and Spark/Delta performance tuning, with outputs served to BI/reporting teams (e.g., Tableau).

Chandan Chalumuri

Mid-level Data Scientist specializing in ML, NLP, and Generative AI

Tempe, AZ · 4y exp
MetLife · Arizona State University

Data engineering / ML practitioner with experience at MetLife building transformer-based sentiment analysis over large unstructured datasets and productionizing pipelines with Airflow/PySpark/Hadoop (reported 52% efficiency gain). Also implemented embedding-based semantic search using Pinecone/Weaviate to improve retrieval relevance and enable RAG for customer support and document matching use cases.

Manpreet Kour

Screened

Senior Data Scientist specializing in Generative AI and NLP

Seattle, USA · 6y exp
SOTI · Dr. B. R. Ambedkar National Institute of Technology, Jalandhar

ML/NLP engineer with recent Scotiabank experience building production-grade indexing automation over large-scale emails and customer databases, combining LLM fine-tuning (Mistral, XLM-R) with fuzzy matching to exceed 95% accuracy under strict banking constraints. Also built a RAG-based chat agent using Gecko embeddings, Vertex AI Search, Gemini, and cross-encoder reranking, and delivered a text-to-SQL chatbot at SOTI through iterative fine-tuning and benchmark-driven experimentation.

Vedang Jadhav

Screened

Mid-Level Software Engineer specializing in cloud-native microservices on AWS

New York City, NY · 5y exp
Citigroup · Indiana University Bloomington

Backend engineer with experience across healthcare and fintech platforms (Anthem, Citi) building high-throughput Python microservices with strong compliance/security focus (HIPAA, tenant isolation). Has integrated ML workflows into production systems (ResNet embedding-based image similarity) using async pipelines (Celery/Redis) and AWS (Lambda/S3/ECS), delivering measurable performance and fraud/content-integrity improvements at scale.

Vineetha Pulipati

Mid-level Software Engineer specializing in backend microservices and cloud data pipelines

MO, USA · 4y exp
Morgan Stanley · Webster University

Backend engineer with Morgan Stanley experience building and owning an end-to-end Python FastAPI microservice for high-volume market data used by trading and risk systems. Strong in performance tuning and reliability (PySpark, Redis caching, async APIs), real-time streaming with Kafka, and production operations (Docker/Kubernetes, GitOps-style CI/CD, monitoring). Has led cloud/on-prem migration work across AWS and Azure, including fixing Azure Synapse performance issues via query and pipeline redesign.

Sharanya Rao

Screened

Mid-level AI/ML Engineer specializing in NLP, LLMs, and RAG for finance and healthcare

Remote, USA · 3y exp
Ally Financial · University of Maryland, Baltimore County

Built an AI lending assistant (RAG + DeBERTa) used by credit analysts to retrieve policies and past loan decisions, tackling real production issues like hallucinations, document quality, and sub-second latency. Deployed a modular, Dockerized AWS architecture (ECS/EMR + load balancer) with load testing, caching/precomputed embeddings, and CloudWatch monitoring, and used Airflow to automate scheduled data/embedding/vector DB refresh pipelines with retries and alerts.

Srilekha Pothula

Mid-level Data Engineer specializing in cloud data pipelines for healthcare and financial services

Bloomfield, CT · 4y exp
Cigna · Pace University

Data engineer with ~4 years of experience (Cigna) building and operating Azure Data Factory pipelines for healthcare claims/member/provider data at 2–3M records/day. Emphasizes reliability and downstream safety via schema/data-quality validation, quarantine workflows, idempotent processing, and backfills; also improved runtime ~20% through SQL optimization and served curated datasets through versioned views and well-documented, analyst-friendly interfaces.

FM

Senior AI/ML Engineer specializing in healthcare AI and MLOps

Mansfield, TX · 16y exp
McKesson · Sam Houston State University

Healthcare AI engineer with hands-on ownership of production ML and LLM systems at McKesson, spanning clinical risk prediction and RAG-based documentation tools. Stands out for combining deep clinical-data experience, HIPAA-aware deployment practices, and measurable impact through reduced readmissions, clinician workflow gains, and 20% to 30% faster ML delivery for engineering teams.

Sri Teja

Screened

Mid-level AI Engineer specializing in LLM systems and enterprise data platforms

Phoenix, AZ · 5y exp
AAA The Auto Club Group · University of Arizona

Built and owned key parts of Ripley, an AI-powered multi-agent operations platform for roadside assistance that automates high-volume customer service workflows at production scale. They designed the orchestration, evaluation, monitoring, and enterprise integrations, helping drive 70-80% automation and ~99% reliability across thousands of weekly interactions and millions of annual requests.

NG

Mid-level Data Scientist specializing in MLOps and Generative AI

Illinois, IL · 4y exp
BNY Mellon · Illinois Institute of Technology

Robotics software/ML engineer who built perception and navigation-related ML systems for autonomous supermarket carts, including object detection, shelf recognition, and obstacle avoidance. Strong ROS/ROS2 practitioner who optimized real-time performance (reported 50% latency reduction) and deployed containerized ROS/ML pipelines at scale using Docker, Kubernetes, and CI/CD.

TN

Junior Software Engineer specializing in Cloud, Full-Stack, and Data Engineering

Virginia, USA · 2y exp
Strategy INC · University of Texas at Dallas

Software engineer with experience across data engineering and backend/platform work: owned a Databricks/PySpark real-time pipeline powering customer dashboards with a 15-minute SLA, and helped modernize an investor web app from JSP to React/TypeScript with API + SQL/materialized-view performance improvements. Also contributed to breaking a Java monolith into microservices (Redis + gRPC on AWS EKS) and built an EC2-deployed Play Store/App Store crawler that reduced third-party data costs.

SK

Mid-level Data Scientist specializing in real-time fraud detection and MLOps

San Francisco, CA · 5y exp
Charles Schwab · CUNY Graduate Center

ML/NLP engineer with experience at Charles Schwab building an NLP + graph (Neo4j) entity-resolution system to unify fragmented user/device/transaction data and improve downstream model quality and analyst querying. Has applied embeddings (SentenceTransformers + FAISS) with domain fine-tuning to boost hard-case matching recall by ~12% while maintaining precision, and has a track record of hardening scalable Python/Spark pipelines and productionizing fraud models via A/B tests and shadow-mode monitoring.

AB

Senior Data & Platform Engineer specializing in cloud-native streaming and distributed systems

USA · 10y exp
JPMorgan Chase · New York Institute of Technology

Financial data engineer who has built and operated high-volume batch + streaming pipelines (200–300 GB/day; 5–10k events/sec) using AWS, Spark/Delta, Airflow, Kafka, and Snowflake, with strong emphasis on data quality and reliability. Demonstrated measurable impact via 99.9% SLA adherence, major reductions in bad records/nulls, MTTR improvements, and significant latency/runtime/query performance gains; also built a distributed web-scraping system processing 5–10M records/day with anti-bot and schema-drift defenses.

MS

Mid-level Data Engineer specializing in multi-cloud data platforms for healthcare and finance

USA · 6y exp
Cigna · University of Cincinnati

Data engineer with Cigna experience building and operating an end-to-end AWS-based healthcare claims pipeline processing ~2TB/day, using Glue/Kafka/PySpark/SQL into Redshift. Strong focus on data quality and reliability (schema validation, monitoring/alerting, retries/checkpointing/backfills), reporting improved accuracy (~99%) and reduced latency, plus experience serving real-time Kafka/Spark data to downstream analytics with documented data contracts.

AG

Mid-level Data Engineer specializing in cloud ETL and real-time streaming

New York, NY · 6y exp
PNC · Rochester Institute of Technology

Data engineer focused on AWS + Spark/Databricks pipelines, including an end-to-end nightly loan-data ingestion flow (~2.2M records) from Postgres/S3 through Glue and Databricks into a DWH with layered validation and alerting. Also built real-time streaming with Kafka + Spark Structured Streaming and a master’s project streaming Reddit data for sentiment analysis under ambiguous requirements and tight budget constraints.

Harideep Balusa

Mid-level AI/ML Engineer specializing in FinTech risk, fraud detection, and GenAI/RAG systems

USA · 6y exp
Freddie Mac · University of Wisconsin

Built and productionized Azure-based LLM/RAG systems for regulatory/compliance use cases, including automating analyst research and compliance report generation across large unstructured document sets. Demonstrates strong practical depth in hallucination mitigation, hybrid retrieval tuning (BM25 + embeddings), and production MLOps (Databricks, Cognitive Search, AKS, Airflow/MLflow), plus proven ability to deliver auditable, explainable solutions with non-technical compliance teams.

Srilekha Jakkula

Senior Data Engineer specializing in scalable data pipelines and API-driven data services

Chicago, IL · 5y exp
Northern Trust · Northern Illinois University

Data engineer focused on building scalable, reliable end-to-end data pipelines and backend REST data services, spanning API ingestion plus batch/stream processing with Airflow, Kafka, Spark/PySpark, and SQL. Emphasizes strong data quality validation, monitoring/fault tolerance, and performance tuning for large datasets, with experience deploying in cloud environments using containerization and CI/CD.

Alekya Battu

Screened

Mid-level Data Scientist specializing in machine learning, MLOps, and cloud analytics

USA · 5y exp
Wells Fargo · Wilmington University

Senior data scientist with ~5 years’ experience building production ML/NLP systems in finance (Wells Fargo) and deep learning for sensor analytics in connected vehicles (Medtronic). Has delivered end-to-end platforms combining time-series forecasting with transformer-based NLP, including automated drift monitoring/retraining (MLflow + Airflow) and standardized Docker/CI/CD deployments; achieved a reported 22% precision improvement after domain fine-tuning.

Tarun Majhi

Screened

Mid-level AI Software Engineer specializing in FinTech and LLM systems

Massachusetts, USA · 4y exp
State Street · Clark University

Engineer with hands-on experience designing and leading multi-agent AI development workflows, including a LangGraph-based system that automated parts of a RAG pipeline and significantly reduced development time. Stands out for treating AI agents like an engineering team, with clear architecture, handoff schemas, validation, and supervisor-driven conflict resolution.

Hard Parikh

Screened

Mid-level Software Engineer specializing in data platforms, distributed systems, and applied AI

Austin, TX · 3y exp
Compass Group · UC Riverside

AI/full-stack product engineer currently owning Fleck Intelligent Survey Chatbot at E15, a production RAG analytics assistant embedded in Compass Group dashboards for 300+ field operators. Stands out for combining LLM orchestration, analytics engineering, and strong systems thinking—cutting hallucinated numeric answers from 14% to 2%, reducing backlog 62%, and previously delivering a low-level protocol redesign at Amadeus that cut P99 latency by 56%.

MK

Mid-level Full-Stack Engineer specializing in cloud-native microservices and AI automation

USA · 5y exp
Fuel AI · California State University

Software engineer/product owner who has led end-to-end delivery of AI and content-management platforms, including building RAG-based reliability improvements and migrating fragile systems to containerized AWS ECS/Kubernetes with Terraform-managed CI/CD. Experienced designing event-driven microservices (SQS/SNS/RabbitMQ), scaling queue consumers with autoscaling, and creating internal Python tooling to standardize data connectors (e.g., BigQuery/Airtable/internal APIs) to speed iteration.

BS

Mid-level Data Engineer specializing in Lakehouse, Streaming, and ML/LLM data systems

Remote, USA · 3y exp
Discover · University of South Dakota

Built and productionized an enterprise retrieval-augmented generation platform for internal knowledge over large unstructured corpora, emphasizing trust via strict citation/grounding and hybrid retrieval (BM25 + FAISS + cross-encoder re-ranking). Demonstrates strong scaling and cost/latency optimization through incremental indexing/embedding and index partitioning, plus disciplined evaluation/observability practices. Has experience operationalizing pipelines with Airflow/Databricks/GitHub Actions and partnering closely with risk & compliance stakeholders on auditability requirements.

TT

Mid-level AI/ML Engineer specializing in MLOps and LLM applications

New York, NY · 4y exp
BNY Mellon · University at Albany

BNY Mellon engineer who has built and operated production AI systems end-to-end: a LangChain/Pinecone RAG platform scaled via FastAPI + Kubernetes to 1000 RPM with 99.9% uptime, supported by monitoring and data-drift detection. Also deep in data/infra orchestration (Airflow, Dagster, Terraform on AWS/EMR/EC2), processing 500GB+ daily and delivering measurable reliability and performance gains, plus strong compliance-facing model explainability using SHAP and Tableau.
