Vetted PySpark Professionals

Pre-screened and vetted.

Nishad Kane

Screened

Mid-level Data Scientist & AI Engineer specializing in RAG, agentic AI, and production ML

5y exp

Xtrium AI

Arizona State University

“AI/data engineer who built a production LLM-powered schema drift detection system (LangChain/LangGraph) to catch semantic data changes before they break downstream analytics/ML. Deployed on AWS with Docker/S3 and implemented an LLM-as-a-judge evaluation framework to improve trust, reduce hallucinations, and control false positives/alert fatigue. Collaborated with non-technical risk/business analytics stakeholders at EY by delivering human-readable drift explanations that improved confidence in financial analytics dashboards.”

A/B Testing, Amazon EC2, Amazon EKS, Amazon Redshift, Amazon S3, Amazon SageMaker (+104 more)

Revanth Goli

Screened

Senior Data & Backend Engineer specializing in cloud data pipelines and LLM/RAG systems

Morrisville, NC

6y exp

Syneos Health

University of Alabama at Birmingham

“Data engineer with end-to-end ownership of large-scale retail and clinical data ingestion/processing on AWS, including real-time streaming and batch pipelines. Delivered measurable outcomes: 20M daily transactions processed, latency cut from 4 hours to 5 minutes, ~70% fewer failures, and 120+ pipelines running at 99.8% reliability with full audit compliance.”

Python, Pandas, PySpark, FastAPI, LangChain, SQL (+97 more)

Keerthi Kalluri

Screened

Senior Full-Stack & GenAI Engineer specializing in healthcare and financial services

6y exp

Kaiser Permanente

Texas Tech University

“Built and deployed a production LLM-powered customer support assistant using a RAG backend in Python, focused on deflecting repetitive Tier-1 tickets and reducing resolution time. Demonstrates strong production engineering instincts around reliability (confidence scoring + human fallback), scalability/cost optimization (multi-stage pipelines), and workflow orchestration/observability (LangChain, custom DAGs, structured logging, step metrics).”

Agile, AJAX, Amazon EC2, Amazon EKS, Amazon RDS, Amazon Redshift (+220 more)

Farhath Banu

Screened

Senior Software Engineer specializing in AI-driven marketing and data platforms

Boston, MA

7y exp

Postscript

Shadan College of Engineering and Technology

“Backend/data engineer who builds production FastAPI microservices and AWS serverless/Glue pipelines for SMS analytics and marketing segmentation. Led a legacy batch modernization into modular services (FastAPI + Glue/Athena + ClickHouse) using shadow-mode parity checks, feature flags, and incremental rollout. Demonstrated measurable performance wins (12s to sub-second SQL; ~40% CPU reduction) and strong incident ownership with proactive schema-drift prevention.”

Python, TypeScript, Java, C, C++, FastAPI (+127 more)

Shanmukha Jayavarapu

Screened

Mid-level AI/ML Engineer specializing in fraud detection and healthcare predictive analytics

Missouri, USA

4y exp

KPMG

University of Central Missouri

“Built and deployed a production LLM-powered calorie-counting chatbot that turns plain-English meal descriptions into normalized food entities, quantities, and calorie estimates using a hybrid transformer + rule-engine pipeline. Emphasizes reliability with schema/constraint guardrails, confidence-based routing (including embedding similarity search fallbacks), and strong observability/metrics (hallucination rate, calibration, latency, cost). Partnered closely with nutritionists to encode domain standards into mappings and validation logic.”

Python, PyTorch, TensorFlow, Scikit-learn, XGBoost, LightGBM (+97 more)

Pravalika Kasojjala

Screened

Mid-level AI/ML Engineer specializing in LLM, RAG/GraphRAG, and fraud analytics

Charlotte, NC

5y exp

Bank of America

University of Wisconsin–Milwaukee

“LLM/agent engineer who has deployed a production internal assistant to reduce employee inquiry resolution time while maintaining regulatory compliance. Experienced with RAG, hallucination risk triage, and graph-based orchestration (LangGraph) for enterprise/banking-style workflows, emphasizing schema-validated, citation-backed, tool-constrained agent designs and tight collaboration with non-technical business/compliance stakeholders.”

A/B Testing, Agile, Amazon Bedrock, Amazon CloudWatch, Amazon EC2, Amazon ECS (+190 more)

Saniya Shinde

Screened

Mid-level Data Scientist specializing in NLP, LLMs, and RAG systems

Washington, DC

4y exp

World Bank

George Washington University

“Built and deployed a production-style vision-language pipeline that generates structured medical reports from chest X-rays using BioViLT embeddings, an image-text alignment module, and BiGPT fine-tuned with LoRA, delivered via Streamlit and hosted on AWS EC2. Also has collaborative experience presenting EDA findings, feature importance, and model performance to Ford managers while working with vehicle parts data at Bimcon.”

Python, SQL, R, C++, PyTorch, TensorFlow (+93 more)

Alex Manni

Screened

Senior Agile/Product Delivery Leader specializing in enterprise transformation, data and cybersecurity

London, UK

39y exp

Ofcom

Politecnico di Milano

“Built a web-based online Sudoku game in JavaScript (multiplayer format supporting up to 6 teams with up to 5 players each) and demonstrates strong product/analytics orientation. Uses a KPI-driven approach (DAU/WAU, ARPU, session duration, LTV) and structured prioritization methods (MoSCoW, story mapping, cost of delay, DFV) to iterate toward targets; seeking a remote role around $70k/year.”

Agile, Scrum, Kanban, Change Management, Stakeholder Management, Vendor Management (+331 more)

Gowthami Chilukuru

Screened

Mid-Level Full-Stack Software Engineer specializing in healthcare, cloud, and data platforms

Sunnyvale, CA

5y exp

Intuitive Surgical

Stevens Institute of Technology

“Backend/platform engineer who owned a real-time customer analytics microservice stack in Python/FastAPI with Kafka streaming into PostgreSQL, including schema enforcement (Avro) and high-throughput optimizations. Strong Kubernetes + GitOps practitioner (EKS/GKE, Helm, Argo CD) who has handled CI/CD reliability issues with automated pre-deploy checks and rollbacks, and supported major migrations (on-prem to AWS; VM to EKS) with blue-green cutover planning.”

Python, R, Java, C, JavaScript, TypeScript (+200 more)

Sharan Raj Sivakumar

Screened

Senior Software Developer specializing in AI/ML automation and cloud-native systems

New York City, NY

6y exp

Ericsson

University at Buffalo

“ML/MLOps practitioner who built production systems for telecom network analytics, including an automated labeling + multi-label Random Forest solution that cut labeling effort by 90% and sped up RCA. Led an Ericsson auto-deployment platform using Airflow, Azure IoT Hub, Docker, and Celery to orchestrate 120+ containerized ML/rule-based deployments, saving ~80 hours of setup per deployment.”

Python, SQL, MongoDB, Redis, MySQL, SQLite (+86 more)

Saloni Patadia

Screened

Mid-level Machine Learning Engineer specializing in LLM systems and healthcare data automation

California, USA

2y exp

Prime Healthcare

USC

“React performance-focused engineer who contributed performance patches back to an open-source context+reducer state helper after profiling and fixing excessive re-renders in an enterprise project management platform at Easley Dunn Productions. Also built an end-to-end LLM-driven pipeline at Prime Healthcare to normalize millions of supply-chain records, reducing defects by 80% and saving 160+ hours/month.”

LangChain, LlamaIndex, FAISS, Vector Search, Semantic Search, Prompt Engineering (+100 more)

Neeraj Jawahirani

Screened

Mid-level Data & AI Engineer specializing in healthcare data pipelines and MLOps

FL, USA

4y exp

Humana

Florida State University

“Built and deployed a production LLM-powered clinical note summarization system used by care managers to speed review of 5–20 page unstructured medical records. Implemented safety-focused validation (prompt constraints, rule-based and section-level checks, human-in-the-loop) to reduce hallucinations while maintaining low latency and meeting privacy/regulatory constraints, integrating via APIs into existing clinical tools.”

Agile, Amazon CloudWatch, Amazon EMR, Amazon Redshift, Amazon S3, Amazon SageMaker (+122 more)

Siva Manikanta Lakumarapu

Screened

Mid-level AI/ML Engineer specializing in Generative AI and NLP

Dallas, TX

5y exp

Gilead Sciences

University of North Texas

“AI/LLM engineer with production experience building secure, scalable compliance-focused generative AI systems (GPT-3/4, BERT) including RAG over internal regulatory document bases. Has delivered end-to-end pipelines on AWS with PySpark/Airflow/Kubernetes/FastAPI, emphasizing privacy controls, monitoring, and iterative evaluation (A/B testing). Also partnered closely with bank compliance officers using prototypes to refine NLP summarization/classification and reduce document review time.”

A/B Testing, Agile, Amazon EC2, Amazon Redshift, Amazon S3, Apache Airflow (+164 more)

Hrishikesh Raghunath

Screened

Mid-level Data Engineer specializing in scalable ETL, streaming analytics, and cloud data platforms

Remote, USA

7y exp

Dreamline AI

California State University, Fullerton

“At Dreamline AI, built and productionized an AWS-based incentive intelligence platform that uses Llama-2/GPT-4 to extract eligibility rules from unstructured state policy documents into structured JSON, then processes them with Glue/PySpark and serves results via Lambda/SageMaker/API Gateway. Designed state-specific ingestion connectors plus schema validation and automated checks/alerts to handle frequent policy/format changes without breaking the pipeline, and partnered with business/analytics stakeholders to deliver interpretable eligibility decisions via explanations and dashboards.”

A/B Testing, Amazon CloudWatch, Amazon Kinesis, Amazon Redshift, Amazon S3, Amazon SageMaker (+114 more)

Rakesh Kolagani

Screened

Mid-level AI/ML Engineer specializing in MLOps and LLM-powered applications

Mountain View, CA

5y exp

Intuit

University of Central Missouri

“AI/ML engineer with production experience building a RAG-based internal analytics assistant (Databricks + ADF ingestion, Pinecone vector store, LangChain orchestration) deployed via Docker on AWS SageMaker with CI/CD and MLflow. Strong focus on real-world constraints—latency/cost optimization (LoRA ~60% compute reduction), hallucination control with citation grounding, and enterprise security/governance. Previously at Intuit, delivered an interpretable churn prediction system (PySpark/Databricks, Airflow/Azure ML) that improved retention targeting ~12%.”

A/B Testing, Amazon S3, Apache Airflow, AWS Glue, AWS Lambda, AWS Step Functions (+126 more)

Pooja Murigappa

Screened

Mid-level AI/ML Engineer specializing in NLP, Generative AI, and MLOps in Financial Services

Austin, TX

5y exp

Charles Schwab

University of Central Missouri

“ML/LLM engineer at Charles Schwab who built a production loan-advisor chatbot integrated with internal knowledge and loan-calculator APIs, adding strict numeric validation to prevent rate hallucinations and optimizing context to control costs. Also runs ~40 Airflow DAGs orchestrating retraining/ETL/drift monitoring with an automated Snowflake→SageMaker→auto-deploy pipeline, and uses rigorous testing plus canary rollouts tied to business metrics and compliance constraints.”

Amazon DynamoDB, Apache Airflow, Apache Kafka, Apache Spark, AWS, AWS Glue (+183 more)

Sumit Mamtani

Screened

Mid-level Data Scientist specializing in ML, MLOps, and customer analytics

Tempe, AZ

4y exp

Qlik

Arizona State University

“ML/NLP practitioner focused on insurance/claims analytics for a large financial firm, working with millions of fragmented structured and unstructured records. Built production-grade pipelines for entity extraction, entity resolution, and semantic search using Sentence-BERT + vector DB, including fine-tuning with contrastive learning (reported ~15% recall lift) and scalable ETL/containerized deployment on Kubernetes.”

Python, Pandas, NumPy, Scikit-learn, TensorFlow, PyTorch (+117 more)

Nandini Kalita

Screened

Senior Data Scientist / ML Engineer specializing in NLP, anomaly detection, and cloud ML platforms

Remote, CA

10y exp

Emotionall

NMIMS University

“ML/NLP practitioner who built customer-feedback topic modeling (NMF + TF-IDF) to diagnose chatbot-to-agent handovers and drove product/ops changes that reduced operational costs by 20%. Also developed LSTM-based intent recognition using Word2Vec/GloVe embeddings for semantic linking, and deployed an LSTM autoencoder for fraud anomaly detection that cut false positives by 25% while capturing 15% more fraud in A/B testing.”

A/B Testing, Agile, Anomaly Detection, AWS, BigQuery, Bitbucket (+116 more)

Molli Dinesh

Screened

Mid-level AI/ML Engineer specializing in NLP, LLMs, and MLOps

Remote, USA

4y exp

Marsh McLennan

Illinois Institute of Technology

“Built an AI-driven insurance policy summarization platform at Marsh, taking it end-to-end from messy PDF ingestion/OCR and custom extraction through LLM fine-tuning and AWS SageMaker deployment. Delivered measurable impact (25% reduction in manual review time, 99% uptime) and demonstrated strong production MLOps/LLMOps practices with Airflow/Step Functions orchestration, rigorous evaluation (ROUGE + human review), and continuous monitoring for drift, latency, and hallucinations.”

Python, Pandas, NumPy, Scikit-learn, R, SQL (+132 more)

Shweta Gupta

Screened

Senior Backend Software Engineer specializing in Java microservices, Kafka, and AWS

Seattle, WA

6y exp

EasyBee AI

UC Irvine

“AI engineer who shipped a production chat assistant for a storage company by building the underlying RAG-style knowledge base (document ingestion, chunking/embeddings, FAISS vector store) and an admin update interface to keep content current. Also has full-stack delivery experience (Python REST APIs + React/TypeScript UI) and AWS operations using Terraform/Jenkins, including handling a real production performance incident by optimizing DB queries and adding auto-scaling.”

A/B Testing, Agile, API Testing, AWS, Bash, Batch Processing (+111 more)

Supriya Mattapelly

Screened

Mid-level AI/ML Engineer specializing in GenAI agents, RAG pipelines, and MLOps

USA

6y exp

UnitedHealthcare

Kent State University

“AI/ML engineer who built a production RAG-based internal document intelligence assistant (LangChain + Pinecone) to let employees query enterprise reports in natural language. Demonstrated hands-on pipeline orchestration with Apache Airflow and tackled real production issues like retrieval grounding and latency using tuning, caching, and token optimization, while partnering closely with non-technical business stakeholders through iterative demos.”

A/B Testing, Amazon CloudWatch, Amazon EC2, Amazon EMR, Amazon Redshift, Amazon S3 (+152 more)

Ruijing Wang

Screened

Intern Data Scientist specializing in healthcare AI and experimentation

Boulder, CO

1y exp

EchoPlus AI

Stevens Institute of Technology

“Human-AI Design Lab practitioner who productionized a wearable-health anomaly detection system by evolving a standalone autoencoder into a hybrid autoencoder + GPT-based approach, backed by PySpark ETL and MLOps on AWS SageMaker/MLflow. Also has applied LLM troubleshooting experience (fine-tuned FLAN-T5 summarization) and partnered with BI teams to run A/B tests and improve retention via feature stores and experimentation.”

Python, Pandas, Scikit-Learn, PyTorch, TensorFlow, SQL (+97 more)

Deepika Dhanajayan

Screened

Mid-level Data Scientist specializing in Generative AI, RAG systems, and ML engineering

Amherst, MA

6y exp

University of Massachusetts Amherst

University of Massachusetts Amherst

“AI/LLM engineer who built a production QA RAG for a University of Massachusetts faculty success initiative, cutting service tickets by 70%. Strong end-to-end RAG implementation skills (LangChain, Qdrant, hybrid/HyDE retrieval, FastAPI) with rigorous evaluation (RAGAS, LLM-as-judge) and practical handling of constraints like API rate limits and cost. Prior cross-functional delivery experience collaborating with SMEs and business owners at TCS and IBM.”

AWS, Azure Blob Storage, BERT, ChromaDB, CI/CD, Computer Vision (+125 more)

HarshaSree Gudapati

Screened

Senior Data Engineer specializing in cloud-native data platforms for finance and healthcare

Charlotte, NC

4y exp

Bank of America

University of Cincinnati

“Data engineer/backend data services practitioner with Bank of America experience building real-time and batch transaction-monitoring pipelines and APIs (Kafka + databases, REST/GraphQL). Highlights include a reported 45% response-time improvement through performance optimizations and use of Delta Lake schema evolution plus CI/CD (GitHub Actions/Jenkins) and operational reliability patterns like CloudWatch monitoring and dead-letter queues.”

Azure Data Factory, AWS, Amazon S3, AWS Glue, Amazon Redshift, Amazon EMR (+125 more)
