Vetted Data Cleaning Professionals

Pre-screened and vetted.

HC

Junior economics and statistics analyst specializing in healthcare and market research

Berkeley, CA · 2y exp
Berkeley Economic Review · UC Berkeley

The candidate brings a cross-functional mix of early-stage startup consulting, marketing analytics, and outbound/go-to-market (GTM) experience, including work with a radiology startup on market positioning and investor-facing materials. They stand out for combining research and data analysis with clear communication, and they have a strong, self-driven interest in B2B SaaS, workflow automation, and scalable outbound systems.

YD

Senior Project Manager specializing in Healthcare IT and SaaS implementations

South Brunswick, NJ · 13y exp
CaseWorthy · Savitribai Phule Pune University

PMP-certified implementation project manager with 7+ years leading enterprise SaaS rollouts in government-funded healthcare and human services environments. At CaseWorthy, they have owned full-lifecycle deployments of ClientTrack and the Azure OpenAI-based CARA AI copilot, managing complex data migration, compliance, and stakeholder governance challenges across multiple concurrent projects.

KP

Mid-level Data Analytics & ML Engineer specializing in NLP, LLMs, and cloud data platforms

Dallas, TX · 5y exp
Mattel · Kennesaw State University

At KPMG, built and productionized a secure RAG-based LLM assistant that lets business and risk stakeholders query data warehouses in natural language, reducing dependence on data engineers for ad-hoc analysis. Demonstrates strong production rigor (Airflow orchestration, CI/CD, containerization), retrieval/embedding tuning (rechunking, semantic abstraction for structured data), and reliability controls (confidence thresholds, refusal behavior, monitoring and canary evals).

RR

Mid-level Data Scientist & Machine Learning Engineer specializing in fraud and forecasting

USA · 5y exp
JPMorgan Chase · University of Texas at Dallas

ML/LLM practitioner who has shipped production RAG systems (summarization and Q&A) and end-to-end, Airflow-orchestrated demand forecasting pipelines at NEON IT. Strong focus on reliability: uses evaluation scripts, retrieval/chunking tuning, validation/retries/alerts, and stakeholder-driven iteration to make AI workflows consistent and usable.

IK

Junior ML Engineer specializing in energy forecasting and battery optimization

San Carlos, CA · 3y exp
ElectricFish · University of Michigan

Backend/ML engineer working on a battery energy storage system operations dashboard: built a Flask backend integrated with OAuth and a separate FastAPI optimization/simulation service, deployed via Docker CI/CD to Azure Container Apps. Strong in productionizing ML (deploying AzureML models to batch endpoints) and in performance/scalability patterns (Postgres indexing/JSONB, per-unit data isolation, async throttling and caching for year-long, CPU-intensive simulations across 40+ scenarios).


Neha P

Screened

Mid-level Full-Stack Java Developer specializing in cloud-native microservices

Texas, USA · 4y exp
Bank of America · University of Central Missouri

Full-stack engineer with Bank of America experience modernizing a large-scale financial reporting platform. Built React frontends and Java/Spring Boot microservice APIs end-to-end, optimized data-heavy SQL performance (indexing, caching, pagination), and implemented an AI feature for forecasting and anomaly detection using Python/scikit-learn, with deployments on AWS.

HK

Mid-level Data Scientist specializing in Generative AI and NLP

USA · 6y exp
CVS Health · University of Central Missouri

ML/GenAI engineer with recent CVS Health experience building a production RAG system over unstructured financial/research documents using LangChain, FAISS, and Pinecone, plus LoRA/PEFT fine-tuning of GPT/LLaMA for domain-aware summarization. Demonstrates strong applied MLOps and data engineering skills (Airflow/Prefect, Docker/Kubernetes, CI/CD, MLflow) and measurable impact (sub-second retrieval, ~40% better context retrieval, ~25% entity matching improvement).

SL

Mid-level Data Engineer specializing in cloud ETL/ELT and healthcare analytics

Dallas, TX · 5y exp
Lightbeam Health Solutions · Syracuse University

Healthcare-focused data engineer/ML practitioner with experience at Lightbeam Health Solutions and Humana building production entity-resolution and semantic similarity pipelines across EMR, lab, and claims data. Uses NLP/ML (spaCy, scikit-learn, BioBERT/LightGBM) plus Snowflake/Airflow and vector search (Pinecone) to improve linkage accuracy (reported 90%) and semantic match quality (reported +12–15%), while reducing manual cleanup by 40%+.

VA

Senior AI/ML Engineer specializing in Generative AI, RAG, and agentic systems

6y exp
Wellmark Blue Cross and Blue Shield · Indiana Wesleyan University

GenAI/LLM ML engineer (currently at Webprobo) building an enterprise GenAI platform with document intelligence and automation on AWS and blockchain. Has hands-on experience with RAG, LLM evaluation tooling, and orchestrating production LLM workflows with Apache Airflow, plus deep exposure to reliability challenges in globally distributed/edge deployments. Also partnered with business/marketing stakeholders at a banking client to deliver an AI-driven customer retention insights solution.

MK

Senior Data Analyst specializing in data pipelines, web scraping, and legal data enrichment

Illinois, USA · 5y exp
The Hartford · Indiana Wesleyan University

Data engineer focused on reliable, scalable analytics pipelines and external data collection. Has owned end-to-end pipelines processing 5–10M records/day, serving Snowflake data marts to Power BI/Tableau, and reports ~99% reliability through strong validation/monitoring. Also shipped versioned REST APIs for curated data with query optimization and caching.

Fangjian Xiong

Junior Machine Learning Engineer specializing in NLP and biomedical entity extraction

Boston, MA · 2y exp
Northeastern University · Northeastern University

Built and deployed a production LLM-powered biomedical knowledge extraction pipeline that processed millions of papers to identify tools/techniques and produce a unified knowledge graph via active learning NER (Prodigy + spaCy transformers) and entity linking (Bio-tools/Wikidata). Addressed hard NLP engineering challenges like WordPiece span-offset alignment and scaled inference over ~1.5M documents using batching/caching, containerized services, async workers, and orchestration with Prefect/Airflow.

SV

Mid-level Data Engineer specializing in cloud data platforms and governed analytics

5y exp
Optum · University of Central Missouri

Data engineer with Optum experience building end-to-end healthcare data pipelines for HL7/FHIR, processing millions of records daily across Kafka streaming and Databricks/Spark batch. Strong focus on data quality (schema enforcement/validations), reliability (Airflow monitoring/alerts), and analytics-ready serving in Snowflake powering Power BI/Tableau, with CI/CD via Git and Jenkins.

TD

Mid-level Cloud Data Engineer specializing in Azure/AWS pipelines and medallion architecture

USA · 4y exp
UnitedHealth Group · Southern Illinois University Carbondale

Data engineer focused on reliability and data quality, owning end-to-end pipelines processing ~100k–300k records/day. Implemented robust validation and monitoring that cut reporting issues by ~30%, and built stable external data collection with anti-bot measures, backfills, and schema-change detection while maintaining backward-compatible internal data services.


Sriraj Samala

Screened

Mid-level Data Analyst specializing in business analytics and BI

Dayton, OH · 3y exp
University of Dayton · University of Dayton

Analytics professional with higher education experience at the University of Dayton, focused on turning inconsistent operational data into standardized metrics and recurring dashboards. They combine SQL, Python, and Power BI to automate reporting, improve data integrity, and reduce manual reporting by 30%, with outputs adopted in semester planning and cross-department performance tracking.

NT

Intern-level Software Engineer specializing in AI/ML systems

Frankfort, KY · 2y exp
UPS · Purdue University

Built production LLM/RAG systems during a UPS internship, including a shipment knowledge agent used across 15+ hubs worldwide and a multi-agent PDF RAG workflow. Stands out for combining hands-on enterprise integration with rigorous evaluation, hallucination reduction, and efficient fine-tuning techniques like LoRA.


Dhruv Pandoh

Screened

Junior Full-Stack Software Engineer specializing in AI, FinTech, and e-commerce

New York, USA · 2y exp
MIO Partners · NYU

Built both traditional internal tooling and LLM-powered systems during an internship, including a React/Python/AWS calculator onboarding platform and a production-style ROS2 RAG assistant over 10K+ documents. Stands out for combining full-stack delivery, stakeholder coordination, and practical AI reliability work like retrieval tuning, source-grounded answers, and low-confidence fallbacks.


Polam Srija

Screened

Mid-level AI/ML Engineer specializing in Generative AI and FinTech

Texas, USA · 3y exp
Fidelity Investments · University of North Carolina at Charlotte

AI Engineer with hands-on ownership of a production multi-agent RAG platform in financial services, spanning experimentation, architecture, deployment, monitoring, and iterative optimization. Stands out for measurable impact: 35% retrieval relevance improvement and nearly 50% reduction in manual operational analysis effort, plus strong experience making enterprise LLM systems safer and more reliable in production.

Supreet Purthpli

Mid-level AI/ML Software Engineer specializing in cloud-native MLOps and FinTech

San Francisco, CA · 4y exp
JPMorgan Chase · University of Kansas

Software engineer with JPMorgan Chase experience delivering end-to-end fintech features (Next.js/React/Node/Postgres on AWS) and measurable performance gains. Built and productionized an AI-native credit decisioning workflow combining LLMs, vector retrieval, and a rules engine with strong governance (bias checks, auditability, human-in-the-loop review), improving precision and cutting underwriting turnaround time by 40%.

VB

Entry-level Data Scientist specializing in data engineering and automotive analytics

Bangalore, India · 1y exp
Tata Elxsi · University of Cincinnati

Frontend-focused candidate with hands-on experience building React and TypeScript dashboards for searching, filtering, and analyzing large datasets in real time. Demonstrates practical performance tuning skills using React DevTools, memoization, debouncing, and pagination, and has also built a Mapbox-based location data dashboard with interactive markers and popups.


Soham Kukkar

Screened

Mid-level Software Engineer specializing in AI and FinTech backend systems

Oakland, CA · 4y exp
Capital One · Clark University

Full-stack and AI engineer with Capital One experience spanning real-time customer dashboards and production fraud-analysis systems. They combine TypeScript/Next.js/Node.js product engineering with LangChain-based RAG architecture over a 400 GB credit-report corpus, delivering measurable impact including 35% lower frontend latency and 45% faster analyst workflows.

KP

Senior AI Engineer specializing in Generative AI and RAG applications

8y exp
Keurig Dr Pepper · George Mason University

AI engineer who has shipped production LLM systems across customer service and marketing use cases, including a RAG app on Azure OpenAI whose retrieval was sped up with Redis caching tied to Okta sessions. Also implemented a LangGraph multi-agent workflow that pulls image context from Figma to generate structured HTML marketing emails, adding a verification agent to improve image-selection accuracy while optimizing solution cost for business stakeholders.

JL

Junior Machine Learning Engineer specializing in LLMs, NLP, and computer vision

Bengaluru, Karnataka · 2y exp
PwC · Arizona State University

Built a production, agentic multi-agent pharmaceutical intelligence system for US oncology (breast cancer) conference/news intelligence, automating MSL-style information gathering and summarization for pharma and healthcare stakeholders. Uses CrewAI + LangChain orchestration, custom scraping across ~15 pharma newsrooms, and a grounding-score evaluation approach (sentence transformers/cosine similarity) to mitigate hallucinations.

NM

Mid-level Data Scientist/ML Engineer specializing in healthcare AI and MLOps

USA · 4y exp
CVS Health · University at Buffalo

Designed and deployed an enterprise LLM-powered clinical/pharmacy policy knowledge assistant at CVS Health, replacing manual searches across PDFs/Word/SharePoint with a HIPAA-compliant RAG system. Built end-to-end ingestion and orchestration (Airflow + Azure ML/Data Lake + vector index) with PHI masking, versioned re-embedding, and production monitoring (Prometheus/Grafana), and partnered closely with clinicians/compliance to ensure policy-grounded, auditable answers.


Ramiz Qudsi

Screened

Principal Data Scientist & Software Engineer specializing in space mission data systems

Boston, MA · 13y exp
Boston University · University of Delaware

Space/heliophysics ML engineer who built a PyTorch GRU model to propagate solar wind from L1 to the magnetopause with probabilistic outputs for uncertainty quantification, achieving ~25% better CRPS than standard approaches. Also developed production-grade Python ETL and an open-source telemetry processing package for a mission (LEXI), using Docker and GitHub Actions CI/CD and iterating with scientist/engineer stakeholders.

