Vetted PySpark Professionals

Pre-screened and vetted.

Anvesh Reddy Narra

Screened

Mid-level AI/ML Engineer specializing in Generative AI, RAG, and MLOps

3y exp

State Farm

Cleveland State University

“Built a secure, on-prem/private GPT assistant to replace manual SharePoint-style search across thousands of policies/SOPs/engineering docs, using a production RAG stack (LangChain/LangGraph, FAISS/Chroma, PyMuPDF+OCR, vLLM). Implemented layout-aware ingestion (including table-to-JSON) and a multi-agent retrieval/generation/verification workflow with strong observability and compliance guardrails, delivering ~70% reduction in search time.”

Anomaly Detection, Ansible, Apache Kafka, Apache Spark, AWS, BERT, +184 more

Rahul Karanam

Screened

Senior Computer Vision & Robotics Engineer specializing in perception and warehouse automation

San Jose, CA

5y exp

Roboteon

University of Maryland, College Park

“Robotics engineer with hands-on experience scaling a multi-vendor heterogeneous warehouse robot fleet, building a distributed ‘traffic manager’ for collision avoidance and real-time rerouting using CBS/MAPF and DCOP-style negotiation. Strong real-time/safety-critical systems background (RTOS, deterministic lock-free multithreading) plus modern perception and simulation tooling (CNN-LSTM/transformers, CARLA/Isaac Sim, VIO/GTSAM, camera-IMU calibration). Startup-oriented and comfortable moving quickly from prototype to production.”

Angular, AWS, AWS Lambda, C++, CI/CD, Computer Vision, +147 more

Manpreet Kour

Screened

Senior Data Scientist specializing in Generative AI and NLP

Seattle, USA

6y exp

SOTI

Dr. B. R. Ambedkar National Institute of Technology, Jalandhar

“ML/NLP engineer with recent Scotiabank experience building production-grade indexing automation over large-scale emails and customer databases, combining LLM fine-tuning (Mistral, XLM-R) with fuzzy matching to exceed 95% accuracy under strict banking constraints. Also built a RAG-based chat agent using Gecko embeddings, Vertex AI Search, Gemini, and cross-encoder reranking, and delivered a text-to-SQL chatbot at SOTI through iterative fine-tuning and benchmark-driven experimentation.”

Machine Learning, Deep Learning, Generative AI, Computer Vision, PyTorch, PySpark, +92 more

Sri Niyati Kompella

Screened

Senior Data Engineer specializing in cloud data platforms and ML pipelines

Atlanta, GA

8y exp

Berkshire Hathaway

University of Alabama at Birmingham

“Data engineer focused on AWS-based enterprise data platforms, owning end-to-end pipelines from multi-source batch/stream ingestion (Glue/Kinesis/StreamSets/Airflow) through PySpark transformations into curated datasets for Redshift/Snowflake. Emphasizes production reliability with strong monitoring/observability and data quality gates, and reports ~30% performance improvement plus improved SLAs and latency after optimization.”

Amazon DynamoDB, Amazon EMR, Amazon EKS, Amazon Kinesis, Amazon Redshift, Amazon S3, +138 more

Nishchal Gante

Screened

Mid-level Data Scientist specializing in MLOps and Generative AI

Illinois, IL

4y exp

BNY Mellon

Illinois Institute of Technology

“Robotics software/ML engineer who built perception and navigation-related ML systems for autonomous supermarket carts, including object detection, shelf recognition, and obstacle avoidance. Strong ROS/ROS2 practitioner who optimized real-time performance (reported 50% latency reduction) and deployed containerized ROS/ML pipelines at scale using Docker, Kubernetes, and CI/CD.”

A/B Testing, Agile, Amazon API Gateway, Amazon Bedrock, Amazon EC2, Amazon RDS, +133 more

Tanishq Nimale

Screened

Junior Software Engineer specializing in Cloud, Full-Stack, and Data Engineering

Virginia, USA

2y exp

Strategy INC

University of Texas at Dallas

“Software engineer with experience across data engineering and backend/platform work: owned a Databricks/PySpark real-time pipeline powering customer dashboards with a 15-minute SLA, and helped modernize an investor web app from JSP to React/TypeScript with API + SQL/materialized-view performance improvements. Also contributed to breaking a Java monolith into microservices (Redis + gRPC on AWS EKS) and built an EC2-deployed Play Store/App Store crawler that reduced third-party data costs.”

AWS, AWS Lambda, Apache Kafka, API Development, Authentication, C#, +84 more

Harideep Balusa

Screened

Mid-level AI/ML Engineer specializing in FinTech risk, fraud detection, and GenAI/RAG systems

USA

6y exp

Freddie Mac

University of Wisconsin

“Built and productionized Azure-based LLM/RAG systems for regulatory/compliance use cases, including automating analyst research and compliance report generation across large unstructured document sets. Demonstrates strong practical depth in hallucination mitigation, hybrid retrieval tuning (BM25 + embeddings), and production MLOps (Databricks, Cognitive Search, AKS, Airflow/MLflow), plus proven ability to deliver auditable, explainable solutions with non-technical compliance teams.”

Python, R, SQL, Scala, Machine Learning, Deep Learning, +125 more

Alekya Battu

Screened

Mid-level Data Scientist specializing in ML, NLP, and MLOps

USA

5y exp

Wells Fargo

Wilmington University

“Senior data scientist with ~5 years’ experience building production ML/NLP systems in finance (Wells Fargo) and deep learning for sensor analytics in connected vehicles (Medtronic). Has delivered end-to-end platforms combining time-series forecasting with transformer-based NLP, including automated drift monitoring/retraining (MLflow + Airflow) and standardized Docker/CI/CD deployments; achieved a reported 22% precision improvement after domain fine-tuning.”

Agile, Scrum, Kanban, SDLC, CI/CD, Waterfall, +144 more

Sai Krishna Chittanuri

Screened

Mid-level Data Scientist specializing in real-time fraud detection and MLOps

San Francisco, CA

5y exp

Charles Schwab

CUNY Graduate Center

“ML/NLP engineer with experience at Charles Schwab building an NLP + graph (Neo4j) entity-resolution system to unify fragmented user/device/transaction data and improve downstream model quality and analyst querying. Has applied embeddings (SentenceTransformers + FAISS) with domain fine-tuning to boost hard-case matching recall by ~12% while maintaining precision, and has a track record of hardening scalable Python/Spark pipelines and productionizing fraud models via A/B tests and shadow-mode monitoring.”

Python, R, SQL, Pandas, NumPy, PySpark, +120 more

Ankush Banthia

Screened

Senior Data & Platform Engineer specializing in cloud-native streaming and distributed systems

USA

10y exp

JPMorgan Chase

New York Institute of Technology

“Financial data engineer who has built and operated high-volume batch + streaming pipelines (200–300 GB/day; 5–10k events/sec) using AWS, Spark/Delta, Airflow, Kafka, and Snowflake, with strong emphasis on data quality and reliability. Demonstrated measurable impact via 99.9% SLA adherence, major reductions in bad records/nulls, MTTR improvements, and significant latency/runtime/query performance gains; also built a distributed web-scraping system processing 5–10M records/day with anti-bot and schema-drift defenses.”

Onboarding, Mentoring, Agile, Scrum, Jira, Confluence, +150 more

Hritvik Gupta

Screened

Mid-level AI Engineer specializing in LLMs, RAG, and healthcare AI

San Francisco, CA

3y exp

Penn Medicine

UC Riverside

“Built and scaled an AI-powered voice/chat patient engagement platform at Penn Medicine from early prototype into production clinical workflows, focusing on latency, edge cases, and user trust. Strong in LLM reliability engineering (structured prompts, validation/fallbacks), real-time troubleshooting with observability, and cross-functional enablement through pilots, demos, and sales/customer partnership.”

AWS, AWS Lambda, C++, CI/CD, Communication, Data Engineering, +78 more

Mrunal Kakirwar

Screened

Mid-level Full-Stack Engineer specializing in cloud-native microservices and AI automation

USA

5y exp

Fuel AI

California State University

“Software engineer/product owner who has led end-to-end delivery of AI and content-management platforms, including building RAG-based reliability improvements and migrating fragile systems to containerized AWS ECS/Kubernetes with Terraform-managed CI/CD. Experienced designing event-driven microservices (SQS/SNS/RabbitMQ), scaling queue consumers with autoscaling, and creating internal Python tooling to standardize data connectors (e.g., BigQuery/Airtable/internal APIs) to speed iteration.”

Python, JavaScript, TypeScript, Shell Scripting, Java, SQL, +108 more

BHEEMA SABILLA

Screened

Mid-level Data Engineer specializing in Lakehouse, Streaming, and ML/LLM data systems

Remote, USA

3y exp

Discover

University of South Dakota

“Built and productionized an enterprise retrieval-augmented generation platform for internal knowledge over large unstructured corpora, emphasizing trust via strict citation/grounding and hybrid retrieval (BM25 + FAISS + cross-encoder re-ranking). Demonstrates strong scaling and cost/latency optimization through incremental indexing/embedding and index partitioning, plus disciplined evaluation/observability practices. Has experience operationalizing pipelines with Airflow/Databricks/GitHub Actions and partnering closely with risk & compliance stakeholders on auditability requirements.”

Python, PySpark, SQL, Scala, Pandas, NumPy, +157 more

Thrinesh Thode

Screened

Mid-level AI/ML Engineer specializing in MLOps and LLM applications

New York, NY

4y exp

BNY Mellon

University at Albany

“BNY Mellon engineer who has built and operated production AI systems end-to-end: a LangChain/Pinecone RAG platform scaled via FastAPI + Kubernetes to 1000 RPM with 99.9% uptime, supported by monitoring and data-drift detection. Also deep in data/infra orchestration (Airflow, Dagster, Terraform on AWS/EMR/EC2), processing 500GB+ daily and delivering measurable reliability and performance gains, plus strong compliance-facing model explainability using SHAP and Tableau.”

A/B Testing, Apache Kafka, Apache Spark, AWS, AWS Lambda, BERT, +86 more

Nikshitha Aella

Screened

Mid-level Full-Stack Software Engineer specializing in AI platforms and microservices

Mooresville, NC

6y exp

Lowe's

University of North Carolina at Charlotte

“Backend engineer currently building an AWS Lambda/FastAPI inventory recommendation system using a LangChain + GPT-4 RAG pipeline and MongoDB vector search; drove major cost optimization via Redis caching (60% reduction) while sustaining 10k+ daily requests under 2s latency. Previously deployed Node.js microservices on AWS OpenShift with Jenkins/Helm at UnitedHealth Group and led a zero-downtime monolith-to-microservices migration at Verizon, including RabbitMQ-based real-time messaging with DLQs and idempotency.”

Agile, Angular, API Gateway, AWS, AWS Lambda, CI/CD, +83 more

Varun Kumar Kota

Screened

Mid-level Software Engineer specializing in cloud, data engineering, and AI/ML

Remote

3y exp

Handshake

University at Buffalo

“Backend/platform engineer who owned an AI-powered resume optimization service end-to-end (FastAPI + Celery + Redis/Postgres) and optimized it for unpredictable LLM task latency. Strong Kubernetes/GitOps practitioner (Helm, autoscaling, probes, ArgoCD rollbacks) with experience in on-prem-to-cloud migrations using Terraform and CDC-based replication, plus real-time Kafka pipelines monitored via Prometheus/Grafana.”

Python, SQL, R, Java, JavaScript, Jira, +125 more

Koushik Gunjala

Screened

Senior AI Engineer specializing in Agentic AI and distributed systems

Charlotte, NC

4y exp

UnitedHealth Group

University of North Carolina at Charlotte

“LLM/agentic workflow engineer with healthcare domain experience who built a HIPAA-compliant multi-agent RAG system for clinical review automation at UnitedHealth Group, achieving 92% precision and cutting latency 40% through async orchestration and Redis semantic caching. Also has strong data engineering orchestration background (Airflow on AWS EMR with Great Expectations) and a proven clinician-in-the-loop feedback process that improved model faithfulness by 18%.”

Distributed Systems, Retrieval-Augmented Generation (RAG), GPT-4, LangChain, LangGraph, Hugging Face, +95 more

Bala Venkateswarlu K

Screened

Mid-level Data Scientist specializing in Generative AI, NLP, and MLOps

USA

5y exp

MetLife

Harrisburg University of Science and Technology

“Built and deployed an LLM-powered claims-document summarization system (insurance domain) that cut agent review time from 4–5 minutes to under 2 minutes and saved 1,200+ hours per quarter. Hands-on across orchestration and production infrastructure (Airflow retraining DAGs, Kubernetes, SageMaker endpoints, FastAPI) and recent RAG workflows using n8n + Pinecone, with a strong focus on reliability, cost, and explainability for non-technical stakeholders.”

A/B Testing, Agile, Apache Kafka, Apache Spark, Auto Scaling, AWS, +148 more

Hema Edavalapati

Screened

Mid-level AI/ML Engineer specializing in cloud data engineering and GenAI

Florida, USA

6y exp

LexisNexis

University of South Florida

“AI/LLM engineer with production experience in legal tech: built a GPT-4 + LangChain RAG summarization system at Govpanel that reduced legal case-file review time by 50%+. Previously at LexisNexis, orchestrated end-to-end Airflow data/AI pipelines processing 5M+ legal documents daily, improving ETL runtime by 35% with robust validation, monitoring, and SLAs.”

SQL, SQL query optimization, Python, Pandas, NumPy, PySpark, +159 more

Vardhan Addakattu

Screened

Mid-level Data Scientist specializing in Generative AI and NLP for financial risk

Glassboro, NJ

4y exp

S&P Global

Rowan University

“Built and shipped production generative AI/RAG assistants in regulated financial contexts (S&P Global), automating compliance-oriented Q&A over earnings reports/filings with grounded answers and citations. Experienced across the full stack—AWS-based ingestion (PySpark/Glue), vector retrieval + LangChain agents, GPT-4/Claude model selection, and production reliability (monitoring, caching, retries) plus rigorous evaluation and regression testing.”

Python, R, SQL, PySpark, Pandas, Apache Spark, +111 more

Sridharan Kairmaknoda

Screened

Mid-level Data Engineer specializing in cloud data platforms and real-time analytics

Saint Louis, MO

5y exp

Cigna

Saint Louis University

“Customer-facing data engineering professional who builds and deploys real-time reporting/dashboard solutions, gathering reporting and compliance requirements through direct stakeholder engagement. Experienced with Google Cloud IAM governance, secure integrations (encryption, audit logging), and fast production troubleshooting of ETL/pipeline failures with follow-on monitoring and automated recovery improvements; motivated by hands-on, travel-oriented customer work.”

SDLC, Agile, Waterfall, Python, SQL, Jupyter Notebook, +137 more

Chethan Thimapuram

Screened

Mid-level AI/ML Engineer specializing in LLM systems, RAG, and MLOps

5y exp

HCA Healthcare

University of South Florida

“Built a production, real-time clinical documentation system at HCA that converts doctor–patient conversations into structured clinical summaries using speech-to-text, LLM summarization, and RAG. Demonstrated measurable gains from medical-domain fine-tuning (clinical concept recall +18%, ROUGE-L 0.62 to 0.74) while meeting HIPAA constraints via PHI anonymization and encryption, and deployed via Docker/FastAPI with CI/CD and monitoring.”

Amazon CloudWatch, Apache Airflow, Apache Kafka, Apache Spark, AWS Glue, AWS IAM, +125 more

Niranjaan Munuswamy

Screened

Mid-level Full-Stack & Data Engineer specializing in AWS cloud and real-time streaming

Chicago, IL

4y exp

Cigna

Illinois Institute of Technology

“Backend engineer with experience at Cigna evolving REST API services backed by PostgreSQL, emphasizing reliability/correctness, scalability, and observability. Has hands-on production experience with FastAPI (contract-first design, Pydantic schemas), performance tuning (indexes, caching), and secure auth patterns (OAuth/JWT, RBAC, row-level security via Supabase), plus low-risk incremental rollouts using feature flags and dual writes.”

Python, JavaScript, TypeScript, SQL, Java, Redux, +105 more

Aishwarya Thorat

Screened

Intern Data Scientist specializing in ML engineering and LLM agentic workflows

San Francisco, CA

6y exp

Contentstack

San José State University

“Built an agentic, multi-step LLM system that generates full-stack code for API integrations using LangChain orchestration, Pinecone/SentenceBERT RAG, and a human-in-the-loop feedback loop for iterative code refinement. Also collaborated with non-technical content writers and PMs during a Contentstack internship to deliver a Slack-based AI workflow that generates and brand-checks articles with one-click approvals.”

A/B Testing, Amazon Redshift, Amazon S3, API Integration, AWS, AWS Glue, +129 more
