Vetted Data Cleaning Professionals

Pre-screened and vetted.

Kangjie Lu

Screened

Intern Full-Stack Software Engineer specializing in data pipelines and AI/ML systems

Beijing, China | 1y exp
Shanghai Wanwu Zhiyun Industrial Technology Co., Ltd. | Carnegie Mellon University

Software engineer with experience building a Vue.js/TypeScript internal component library (with Jest testing standards) and improving JS runtime performance via profiling, code splitting, and lazy loading. Also led documentation and community support for a Python ML utility library, diagnosing metric-calculation bugs for imbalanced datasets and driving large reductions in support inquiries through targeted docs, tests, and rapid hotfixes in a startup environment.

UK

Mid-level Generative AI Engineer specializing in LLM agents and RAG systems

4y exp
Capital One | Lindsey Wilson College

Built and deployed a production LLM/RAG knowledge assistant integrating internal docs, wikis, and ticket histories to reduce tribal-knowledge dependency and repetitive questions. Emphasizes reliability via grounding + a validation layer, and achieved major latency gains (>50%) through vector index optimization, caching, quantization, and selective re-validation. Comfortable orchestrating end-to-end LLM/data workflows with Airflow, Prefect, and Dagster, including monitoring and alerting.

BG

Senior Data Scientist / ML Engineer specializing in cloud ML pipelines and GenAI

Baltimore, MD | 17y exp
Intel | Illinois Institute of Technology

ML/NLP practitioner with experience building a transformer-failure prediction system that combines sensor signals with unstructured maintenance comments using LLM-based extraction and similarity validation. Strong emphasis on production readiness—data leakage controls, SQL-driven data quality tiers, and rigorous bias/fairness validation (including contract/spec evaluation across diverse company profiles).

SM

Mid-level Data Scientist specializing in NLP/LLMs, time series forecasting, and MLOps

New York, NY | 6y exp
Citigroup | Kent State University

Data/ML practitioner with hands-on experience building NLP systems from prototype to production: delivered a Twitter sentiment classifier with robust preprocessing, SVM modeling, and Power BI reporting, and built entity-resolution pipelines for messy multi-source customer data (reporting ~95% improvement in unique entity identification). Also implemented semantic linking/search using SBERT embeddings with FAISS vector retrieval and domain fine-tuning (reported ~15% precision lift), and applies production workflow best practices (Airflow/Prefect, Docker, Azure ML/Databricks, Great Expectations).

Angelo Bianchi

Senior HR Business Partner specializing in organizational design, labor relations, and people analytics

New York, NY | 16y exp
Liberty Coca-Cola Beverages | University of Miami

HR/people analytics and transformation leader with experience at Royal Caribbean and Baptist Hospital, combining a graduate degree in business analytics with hands-on org design and change execution. Built HR analytics capabilities (predictive/causal approaches, data quality, storytelling) and partnered directly with a CFO to run a 2+ year finance reorg across a 26,000-employee, 14-hospital system, including writing 124 job descriptions and implementing a CoE/business partnering model.

AR

Mid-level Business Analyst specializing in BI, reporting, and data insights

5y exp
Coca-Cola | University of Massachusetts Boston

Healthcare analytics professional with experience at UnitedHealth Group, focused on turning messy claims, eligibility, and provider data into clean reporting datasets and Power BI dashboards. Combines SQL and Python automation with strong stakeholder alignment around KPI definitions, helping operations teams improve claim turnaround visibility and cost efficiency.

SB

Mid-level Data Analyst specializing in financial and telecom analytics

Remote, USA | 5y exp
AT&T | Lewis University

Analytics candidate with hands-on experience at AT&T building SQL/Python pipelines for churn, usage, billing, and network-performance data at multi-million-row scale. Stands out for combining strong data quality and reconciliation practices with measurable operational impact, including a 30% query runtime improvement and ~8 hours/week of reporting automation savings.

Saisureshreddy Challa

Mid-level Data Scientist specializing in AI/ML, LLMs, and domain analytics

California, USA | 6y exp
BlackRock | Northeastern University

BlackRock AI/ML engineer who built and owned a production LLM document intelligence system for regulatory and investment analysis end-to-end. They combined RAG, multi-agent validation, strong evaluation/monitoring, and reusable Python services to process 50K+ documents, cut review time 40-50%, and improve decision accuracy by about 25%.

PD

Mid-level Full-Stack Engineer specializing in enterprise SaaS and optimization platforms

Redwood City, CA | 5y exp
C3 AI | Northeastern University

Full-stack engineer with strong enterprise delivery experience across manufacturing and semiconductor use cases, owning deployments from discovery through post-launch support. Stands out for combining traditional product engineering with applied GenAI workflows and data pipeline reliability work, including a manufacturing app that reportedly saved a Fortune 500 customer about $6M and an AI chat panel adopted by 70% of pricing analysts.

Yannick Matia

Screened

Mid-level Marketing Operations professional specializing in RevOps and growth systems

New York, NY | 3y exp
Orchestra | USC

Candidate appears to be a marketing operations and demand generation professional rather than a cold-calling/account management seller. They highlighted experience with CRM and growth operations, list sourcing and cleaning, and creating outreach collateral, and explicitly positioned themselves as someone who can improve growth operations from a technical marketing angle.

SB

Mid-level Data Engineer specializing in scalable pipelines, Spark, and cloud data warehousing

Boston, USA | 3y exp
Fidelity Investments | Northeastern University

Backend/data platform engineer who recently owned an end-to-end large-scale financial data platform delivering real-time decision support for finance and operations. Has hands-on experience modernizing legacy batch pipelines into AWS cloud-native ELT with parallel-run cutovers, strong data quality controls (dbt-style tests, reconciliation), and measurable improvements in runtime, cost, and SLA compliance. Also builds scalable, secure FastAPI microservices using Docker, ALB-based horizontal scaling, Redis caching, and managed auth with Cognito/Supabase plus Postgres RLS.

SN

Intern Full-Stack Software Engineer specializing in AI/ML and AWS cloud platforms

Birmingham, AL | 1y exp
Yuva Biosciences | Tufts University

Full-stack engineer who built an LLM-powered productivity web app (LifeOS) end-to-end with TypeScript/Next.js, Prisma, and Postgres, emphasizing fast iteration with stable API contracts and an isolated AI service boundary. Also built a security/compliance login-verification workflow at Medpace used within an internal admin portal for thousands of employees, and has AWS experience orchestrating batch GPU workloads with robust retry/idempotency patterns.

SM

Mid-level AI/ML Engineer specializing in GenAI, NLP, and MLOps

Connecticut, USA | 5y exp
Pfizer | University of New Haven

Built and deployed an enterprise GenAI knowledge assistant over thousands of internal PDFs/reports using a RAG stack (GPT-4 + Hugging Face embeddings + vector DB) to reduce manual search and SME escalations. Uses LangGraph/LangChain to orchestrate modular agent workflows with relevance filtering and fallback handling, and applies rigorous evaluation (golden datasets, edge cases, A/B tests) with production monitoring metrics.

SV

Mid-level AI/ML Engineer specializing in Generative AI and Conversational AI

Remote | 5y exp
Infosys | University at Buffalo

GenAI Engineer at Infosys who built and deployed a production multi-agent RAG system for a top-tier bank, scaling to ~50,000 queries/day with 99.9% uptime. Drove measurable gains (45% accuracy improvement, 30% API cost reduction) through open-source LLM fine-tuning, Pinecone indexing/retrieval optimization, and AWS-based MLOps/monitoring, and has experience enabling adoption via developer workshops and customer-facing collaboration.

Mihir Trivedi

Screened

Junior Machine Learning & Quant Research Engineer specializing in low-latency data and trading systems

New York, NY | 3y exp
Astera Holdings | Columbia University

Applied ML to physical EV fleet systems at ST Labs, building a real-time CNN-LSTM fault prediction pipeline from streaming vehicle telemetry and addressing live data alignment issues via resampling/interpolation and buffered inference. Also developed a V2G/G2V energy transfer algorithm to automate charging/discharging for profit optimization, and made high-impact low-latency pipeline decisions at Astera Holdings using profiling, replay testing, and live A/B validation.

RK

Principal Software Engineer specializing in AI/ML and cloud-native backend systems

New York, NY | 16y exp
McKinsey & Company | NJIT

McKinsey data/ML practitioner who led production deployment of an entity resolution + semantic search platform for unstructured finance and healthcare data, integrating with legacy systems under HIPAA constraints. Deep hands-on stack across transformers (spaCy/HF BERT), embeddings + FAISS, and production MLOps/workflow tooling (Airflow, Docker, CI/CD, Prometheus/Grafana), with reported gains of +30% decision speed and +25% search relevance.

SR

Senior Data Scientist specializing in machine learning and customer analytics

Illinois, USA | 7y exp
Northern Trust | Bradley University

Data/ML practitioner with experience applying NLP and classical ML to large-scale customer data (2B+ records) for segmentation, prediction, and survey-text classification, delivering measurable business impact (~18% engagement efficiency). Has hands-on entity resolution across multi-source datasets and has built embedding-based semantic search using SentenceBERT + a vector database with domain fine-tuning (~20% relevance improvement), plus production workflow experience with Spark/Airflow and cloud tooling (AWS/Azure).

Irfan Shaik

Screened

Mid-level AI Software Engineer specializing in risk and fraud detection

Los Angeles, California | 4y exp
Visa | George Mason University

AI/software engineer with experience at Visa building a real-time transaction fraud/risk scoring microservice in the card authorization path (Python, Kafka, Kubernetes on AWS) with strict 120–150ms latency constraints and reason-code outputs for downstream decisioning. Owns ML backend end-to-end (data/feature engineering, model training, deployment) and has demonstrated production reliability work including latency spike mitigation, SLO-based observability, drift monitoring, and safe fallbacks to rule-based decisions.

DM

Mid-level Generative AI Engineer specializing in decision intelligence and RAG for regulated enterprises

5y exp
JPMorgan Chase | Saint Louis University

Healthcare GenAI engineer who built a HIPAA-compliant, auditable RAG-based claims decision support system at Molina Healthcare, processing 3M claims and delivering major impact (48% faster manual reviews, 43% higher decision accuracy). Deep hands-on experience with LangChain orchestration, vector search (ChromaDB/FAISS), embedding fine-tuning, and safety controls (confidence scoring, rule validation, human-in-the-loop escalation) for clinical workflows.

Belal Beydoun

Screened

Intern Full-Stack Software Engineer specializing in AI and data analytics

Detroit, MI | 2y exp
DTE Energy | University of Michigan

Software engineer focused on real-time, low-latency AI pipelines: built an end-to-end mobile-to-backend image classification system using React Native/Expo, Node.js, gRPC, MySQL, and Google Vision AI, optimizing throughput and latency. Also integrated an AI model into a real-time field workflow at DTE via Node.js + Azure Databricks, adding data cleaning/validation and safe fallback logic for reliability in operations.

Bhavyasree Chinthala

Mid-level Data Engineer specializing in cloud data pipelines and real-time streaming

USA | 5y exp
PNC | Saint Peter's University

Data engineer with PNC Bank experience owning high-volume financial transaction pipelines end-to-end (Kafka/REST ingestion through Spark/Glue transformations to Redshift serving) for risk and fraud analytics. Built strong reliability and data quality practices (Great Expectations, reconciliation, Airflow alerting, idempotent retries, incremental/windowed processing), reporting 40% ingestion efficiency gains and ~99.9% data accuracy.

Binghan Zhang

Screened

Intern Data Analyst specializing in business intelligence and financial analytics

San Francisco, CA | 1y exp
Innova AI Tech | UCLA

Analytics candidate with hands-on experience in both fraud and churn use cases, including SQL-based preparation of 6.5M transaction records and reproducible Python modeling workflows. Stands out for combining technical rigor in data quality, feature engineering, and imbalance handling with strong stakeholder alignment, metric definition, and dashboard adoption.

AM

Mid-level analytics professional specializing in AI, strategy, and business intelligence

Seattle, WA | 5y exp
Dell Technologies | University of Washington

Analytics-focused candidate with hands-on experience using SQL and Python to clean messy business data, automate reporting, and build practical customer analytics solutions. Notable examples include a 70% reduction in reporting time through Python-based Excel automation at Shell and stakeholder-friendly retention/RFM segmentation work for small business clients in freight and winery contexts.
