Vetted PySpark Professionals

Pre-screened and vetted.

PySpark Python SQL Docker AWS CI/CD

Parvinder Singh

Screened

Mid-level Data Engineer specializing in AWS lakehouse platforms and scalable ETL/ELT

Texas, USA | 4y exp

Humana | University of Texas at Dallas

“Data engineer focused on reliable, production-grade pipelines and data services: has owned end-to-end ingestion-to-serving workflows processing millions of records/day, using Airflow, Python/SQL, and PySpark. Demonstrates strong operational rigor (monitoring, retries, idempotency, backfills) and measurable outcomes (98% stability, ~30% faster processing), plus experience exposing curated warehouse data via versioned REST APIs.”

Data Engineering, Data Pipelines, AWS, Databricks, Snowflake, ETL, +88 more

Varshitha K

Screened

Mid-level Data Engineer specializing in cloud data platforms and lakehouse architectures

Lakewood, CO | 4y exp

First Bank | University of Central Missouri

“Data engineer in a banking context who has owned end-to-end Azure lakehouse pipelines ingesting financial/vendor data from APIs, Azure SQL, and flat files into Databricks/Delta (bronze-silver-gold). Emphasizes production reliability via schema-drift validation, data quality controls, monitoring/alerting, retries/checkpointing, and Spark/Delta performance tuning, with outputs served to BI/reporting teams (e.g., Tableau).”

Python, Scala, Java, C++, SQL, PL/SQL, +173 more

Chandan Chalumuri

Screened

Mid-level Data Scientist specializing in ML, NLP, and Generative AI

Tempe, AZ | 4y exp

MetLife | Arizona State University

“Data engineering / ML practitioner with experience at MetLife building transformer-based sentiment analysis over large unstructured datasets and productionizing pipelines with Airflow/PySpark/Hadoop (reported 52% efficiency gain). Also implemented embedding-based semantic search using Pinecone/Weaviate to improve retrieval relevance and enable RAG for customer support and document matching use cases.”

A/B Testing, Agile, Apache Airflow, Apache Hadoop, Apache Kafka, Apache Spark, +170 more

Manpreet Kour

Screened

Senior Data Scientist specializing in Generative AI and NLP

Seattle, USA | 6y exp

SOTI | Dr. B. R. Ambedkar National Institute of Technology, Jalandhar

“ML/NLP engineer with recent Scotiabank experience building production-grade indexing automation over large-scale emails and customer databases, combining LLM fine-tuning (Mistral, XLM-R) with fuzzy matching to exceed 95% accuracy under strict banking constraints. Also built a RAG-based chat agent using Gecko embeddings, Vertex AI Search, Gemini, and cross-encoder reranking, and delivered a text-to-SQL chatbot at SOTI through iterative fine-tuning and benchmark-driven experimentation.”

Machine Learning, Deep Learning, Generative AI, Computer Vision, PyTorch, PySpark, +92 more

Vedang Jadhav

Screened

Mid-Level Software Engineer specializing in cloud-native microservices on AWS

New York City, NY | 5y exp

Citigroup | Indiana University Bloomington

“Backend engineer with experience across healthcare and fintech platforms (Anthem, Citia) building high-throughput Python microservices with strong compliance/security focus (HIPAA, tenant isolation). Has integrated ML workflows into production systems (ResNet embedding-based image similarity) using async pipelines (Celery/Redis) and AWS (Lambda/S3/ECS), delivering measurable performance and fraud/content-integrity improvements at scale.”

Python, Java, JavaScript, TypeScript, C++, SQL, +116 more

Vineetha Pulipati

Screened

Mid-level Software Engineer specializing in backend microservices and cloud data pipelines

MO, USA | 4y exp

Morgan Stanley | Webster University

“Backend engineer with Morgan Stanley experience building and owning an end-to-end Python FastAPI microservice for high-volume market data used by trading and risk systems. Strong in performance tuning and reliability (PySpark, Redis caching, async APIs), real-time streaming with Kafka, and production operations (Docker/Kubernetes, GitOps-style CI/CD, monitoring). Has led cloud/on-prem migration work across AWS and Azure, including fixing Azure Synapse performance issues via query and pipeline redesign.”

Python, SQL, Bash, Shell Scripting, TypeScript, C++, +129 more

Sharanya Rao

Screened

Mid-level AI/ML Engineer specializing in NLP, LLMs, and RAG for finance and healthcare

Remote, USA | 3y exp

Ally Financial | University of Maryland, Baltimore County

“Built an AI lending assistant (RAG + DeBERTa) used by credit analysts to retrieve policies and past loan decisions, tackling real production issues like hallucinations, document quality, and sub-second latency. Deployed a modular, Dockerized AWS architecture (ECS/EMR + load balancer) with load testing, caching/precomputed embeddings, and CloudWatch monitoring, and used Airflow to automate scheduled data/embedding/vector DB refresh pipelines with retries and alerts.”

Python, PySpark, SQL, Pandas, NumPy, Scikit-learn, +133 more

Srilekha Pothula

Screened

Mid-level Data Engineer specializing in cloud data pipelines for healthcare and financial services

Bloomfield, CT | 4y exp

Cigna | Pace University

“Data engineer with ~4 years of experience (Cigna) building and operating Azure Data Factory pipelines for healthcare claims/member/provider data at 2–3M records/day. Emphasizes reliability and downstream safety via schema/data-quality validation, quarantine workflows, idempotent processing, and backfills; also improved runtime ~20% through SQL optimization and served curated datasets through versioned views and well-documented, analyst-friendly interfaces.”

Apache Airflow, Apache Kafka, Apache Spark, AWS, AWS Glue, AWS Lambda, +71 more

Fernando Mosqueda

Screened

Senior AI/ML Engineer specializing in healthcare AI and MLOps

Mansfield, TX | 16y exp

McKesson | Sam Houston State University

“Healthcare AI engineer with hands-on ownership of production ML and LLM systems at McKesson, spanning clinical risk prediction and RAG-based documentation tools. Stands out for combining deep clinical-data experience, HIPAA-aware deployment practices, and measurable impact through reduced readmissions, clinician workflow gains, and 20% to 30% faster ML delivery for engineering teams.”

Python, R, JavaScript, Go, PyTorch, TensorFlow, +193 more

Sri Teja

Screened

Mid-level AI Engineer specializing in LLM systems and enterprise data platforms

Phoenix, AZ | 5y exp

AAA The Auto Club Group | University of Arizona

“Built and owned key parts of Ripley, an AI-powered multi-agent operations platform for roadside assistance that automates high-volume customer service workflows at production scale. They designed the orchestration, evaluation, monitoring, and enterprise integrations, helping drive 70-80% automation and ~99% reliability across thousands of weekly interactions and millions of annual requests.”

Generative AI, Retrieval-Augmented Generation, Prompt Engineering, LoRA, Hugging Face, OpenAI API, +156 more

Nishchal Gante

Screened

Mid-level Data Scientist specializing in MLOps and Generative AI

Illinois, IL | 4y exp

BNY Mellon | Illinois Institute of Technology

“Robotics software/ML engineer who built perception and navigation-related ML systems for autonomous supermarket carts, including object detection, shelf recognition, and obstacle avoidance. Strong ROS/ROS2 practitioner who optimized real-time performance (reported 50% latency reduction) and deployed containerized ROS/ML pipelines at scale using Docker, Kubernetes, and CI/CD.”

A/B Testing, Agile, Amazon API Gateway, Amazon Bedrock, Amazon EC2, Amazon RDS, +133 more

Tanishq Nimale

Screened

Junior Software Engineer specializing in Cloud, Full-Stack, and Data Engineering

Virginia, USA | 2y exp

Strategy INC | University of Texas at Dallas

“Software engineer with experience across data engineering and backend/platform work: owned a Databricks/PySpark real-time pipeline powering customer dashboards with a 15-minute SLA, and helped modernize an investor web app from JSP to React/TypeScript with API + SQL/materialized-view performance improvements. Also contributed to breaking a Java monolith into microservices (Redis + gRPC on AWS EKS) and built an EC2-deployed Play Store/App Store crawler that reduced third-party data costs.”

AWS, AWS Lambda, Apache Kafka, API Development, Authentication, C#, +84 more

Sai Krishna Chittanuri

Screened

Mid-level Data Scientist specializing in real-time fraud detection and MLOps

San Francisco, CA | 5y exp

Charles Schwab | CUNY Graduate Center

“ML/NLP engineer with experience at Charles Schwab building an NLP + graph (Neo4j) entity-resolution system to unify fragmented user/device/transaction data and improve downstream model quality and analyst querying. Has applied embeddings (SentenceTransformers + FAISS) with domain fine-tuning to boost hard-case matching recall by ~12% while maintaining precision, and has a track record of hardening scalable Python/Spark pipelines and productionizing fraud models via A/B tests and shadow-mode monitoring.”

Python, R, SQL, Pandas, NumPy, PySpark, +120 more

Ankush Banthia

Screened

Senior Data & Platform Engineer specializing in cloud-native streaming and distributed systems

USA | 10y exp

JPMorgan Chase | New York Institute of Technology

“Financial data engineer who has built and operated high-volume batch + streaming pipelines (200–300 GB/day; 5–10k events/sec) using AWS, Spark/Delta, Airflow, Kafka, and Snowflake, with strong emphasis on data quality and reliability. Demonstrated measurable impact via 99.9% SLA adherence, major reductions in bad records/nulls, MTTR improvements, and significant latency/runtime/query performance gains; also built a distributed web-scraping system processing 5–10M records/day with anti-bot and schema-drift defenses.”

Team Building, Onboarding, Mentoring, Agile, Scrum, Jira, +150 more

Madhupal Singu

Screened

Mid-level Data Engineer specializing in multi-cloud data platforms for healthcare and finance

USA | 6y exp

Cigna | University of Cincinnati

“Data engineer with Cigna experience building and operating an end-to-end AWS-based healthcare claims pipeline processing ~2TB/day, using Glue/Kafka/PySpark/SQL into Redshift. Strong focus on data quality and reliability (schema validation, monitoring/alerting, retries/checkpointing/backfills), reporting improved accuracy (~99%) and reduced latency, plus experience serving real-time Kafka/Spark data to downstream analytics with documented data contracts.”

Python, Pandas, PySpark, SQL, Scala, Java, +88 more

Abhishek Gawali

Screened

Mid-level Data Engineer specializing in cloud ETL and real-time streaming

New York, NY | 6y exp

PNC | Rochester Institute of Technology

“Data engineer focused on AWS + Spark/Databricks pipelines, including an end-to-end nightly loan-data ingestion flow (~2.2M records) from Postgres/S3 through Glue and Databricks into a DWH with layered validation and alerting. Also built real-time streaming with Kafka + Spark Structured Streaming and a master’s project streaming Reddit data for sentiment analysis under ambiguous requirements and tight budget constraints.”

SDLC, Agile, Waterfall, Python, SQL, R, +105 more

Harideep Balusa

Screened

Mid-level AI/ML Engineer specializing in FinTech risk, fraud detection, and GenAI/RAG systems

USA | 6y exp

Freddie Mac | University of Wisconsin

“Built and productionized Azure-based LLM/RAG systems for regulatory/compliance use cases, including automating analyst research and compliance report generation across large unstructured document sets. Demonstrates strong practical depth in hallucination mitigation, hybrid retrieval tuning (BM25 + embeddings), and production MLOps (Databricks, Cognitive Search, AKS, Airflow/MLflow), plus proven ability to deliver auditable, explainable solutions with non-technical compliance teams.”

Python, R, SQL, Scala, Machine Learning, Deep Learning, +125 more

Srilekha Jakkula

Screened

Senior Data Engineer specializing in scalable data pipelines and API-driven data services

Chicago, IL | 5y exp

Northern Trust | Northern Illinois University

“Data engineer focused on building scalable, reliable end-to-end data pipelines and backend REST data services, spanning API ingestion plus batch/stream processing with Airflow, Kafka, Spark/PySpark, and SQL. Emphasizes strong data quality validation, monitoring/fault tolerance, and performance tuning for large datasets, with experience deploying in cloud environments using containerization and CI/CD.”

Python, SQL, REST APIs, API Integration, JSON, XML, +51 more

Alekya Battu

Screened

Mid-level Data Scientist specializing in machine learning, MLOps, and cloud analytics

USA | 5y exp

Wells Fargo | Wilmington University

“Senior data scientist with ~5 years’ experience building production ML/NLP systems in finance (Wells Fargo) and deep learning for sensor analytics in connected vehicles (Medtronic). Has delivered end-to-end platforms combining time-series forecasting with transformer-based NLP, including automated drift monitoring/retraining (MLflow + Airflow) and standardized Docker/CI/CD deployments; achieved a reported 22% precision improvement after domain fine-tuning.”

Python, SQL, R, Classification, XGBoost, Random Forest, +171 more

Tarun Majhi

Screened

Mid-level AI Software Engineer specializing in FinTech and LLM systems

Massachusetts, USA | 4y exp

State Street | Clark University

“Engineer with hands-on experience designing and leading multi-agent AI development workflows, including a LangGraph-based system that automated parts of a RAG pipeline and significantly reduced development time. Stands out for treating AI agents like an engineering team, with clear architecture, handoff schemas, validation, and supervisor-driven conflict resolution.”

Python, Java, SQL, JavaScript, FastAPI, Flask, +97 more

Hard Parikh

Screened

Mid-level Software Engineer specializing in data platforms, distributed systems, and applied AI

Austin, TX | 3y exp

Compass Group | UC Riverside

“AI/full-stack product engineer currently owning Fleck Intelligent Survey Chatbot at E15, a production RAG analytics assistant embedded in Compass Group dashboards for 300+ field operators. Stands out for combining LLM orchestration, analytics engineering, and strong systems thinking—cutting hallucinated numeric answers from 14% to 2%, reducing backlog 62%, and previously delivering a low-level protocol redesign at Amadeus that cut P99 latency by 56%.”

Python, SQL, C++, Java, TypeScript, JavaScript, +113 more

Mrunal Kakirwar

Screened

Mid-level Full-Stack Engineer specializing in cloud-native microservices and AI automation

USA | 5y exp

Fuel AI | California State University

“Software engineer/product owner who has led end-to-end delivery of AI and content-management platforms, including building RAG-based reliability improvements and migrating fragile systems to containerized AWS ECS/Kubernetes with Terraform-managed CI/CD. Experienced designing event-driven microservices (SQS/SNS/RabbitMQ), scaling queue consumers with autoscaling, and creating internal Python tooling to standardize data connectors (e.g., BigQuery/Airtable/internal APIs) to speed iteration.”

Python, JavaScript, TypeScript, Shell Scripting, Java, SQL, +108 more

Bheema Sabilla

Screened

Mid-level Data Engineer specializing in Lakehouse, Streaming, and ML/LLM data systems

Remote, USA | 3y exp

Discover | University of South Dakota

“Built and productionized an enterprise retrieval-augmented generation platform for internal knowledge over large unstructured corpora, emphasizing trust via strict citation/grounding and hybrid retrieval (BM25 + FAISS + cross-encoder re-ranking). Demonstrates strong scaling and cost/latency optimization through incremental indexing/embedding and index partitioning, plus disciplined evaluation/observability practices. Has experience operationalizing pipelines with Airflow/Databricks/GitHub Actions and partnering closely with risk & compliance stakeholders on auditability requirements.”

Python, PySpark, SQL, Scala, Pandas, NumPy, +157 more

Thrinesh Thode

Screened

Mid-level AI/ML Engineer specializing in MLOps and LLM applications

New York, NY | 4y exp

BNY Mellon | University at Albany

“BNY Mellon engineer who has built and operated production AI systems end-to-end: a LangChain/Pinecone RAG platform scaled via FastAPI + Kubernetes to 1000 RPM with 99.9% uptime, supported by monitoring and data-drift detection. Also deep in data/infra orchestration (Airflow, Dagster, Terraform on AWS/EMR/EC2), processing 500GB+ daily and delivering measurable reliability and performance gains, plus strong compliance-facing model explainability using SHAP and Tableau.”

A/B Testing, Agentic AI, Apache Kafka, Apache Spark, AWS, AWS Lambda, +86 more
