Pre-screened and vetted.
Intern Data Scientist specializing in robotics localization and SLAM
“Robotics/embodied-AI practitioner who built a TurtleBot3 LiDAR-fingerprint localization pipeline end-to-end (autonomous data collection + multi-head NN), achieving ~30 cm error in a 10 × 10 m space. Also has industry experience at Infineon building large-scale production data/AI pipelines and rapidly fixing a deployed recommendation system by correcting upstream data normalization, improving accuracy by 20%+.”
Intern AI/ML Software Engineer specializing in RAG and medical AI
“ML/LLM engineer with production experience building medical RAG systems to automate chart review, including retrieval + re-ranking and rigorous evaluation. Notably uncovered errors/bias in physician-curated ground truth by tracing answers back to source note chunks and presented evidence to an academic partner, accelerating deployment. Also built a RAG-based FAQ chatbot for a health insurance company and delivered it to non-technical stakeholders via demos.”
Mid-level AI/ML Engineer specializing in GenAI and cloud MLOps
“Applied LLMs to high-stakes domains (wildfire risk for emergency teams and loan approval via a fine-tuned IBM Granite model), with a strong focus on reliability—using RAG-based cross-validation to reduce hallucinations and continuous ingestion pipelines (MODIS satellite imagery via AWS Lambda) to keep data current. Experienced in production orchestration and MLOps-style workflows using Airflow, AWS Step Functions, and SageMaker Pipelines, and collaborates closely with analysts on KPI-driven evaluation.”
Senior Data Scientist specializing in healthcare ML, LLMs, and responsible AI
“Clinical data scientist who has built an agentic LLM-powered literature review assistant (with RAG-style storage/retrieval) to identify predictors for downstream predictive modeling. Also delivered a patient-focused progression analysis model using Databricks + Airflow orchestration, partnering closely with clinicians to define targets and validate that model insights aligned with clinical expectations.”
Principal Data Scientist specializing in cybersecurity ML and MLOps
“ML/NLP engineer (Beyond Identity) who built production semantic search and entity-resolution systems over internal security documentation, using LDA + BERT embeddings with FAISS/Pinecone to cut search time by 30%. Also scaled a real-time anomaly detection pipeline to millions of events/day with Spark and AWS Lambda, with strong emphasis on measurable validation (Precision@k, MRR, F1, ARI).”
Mid-level AI & Data Scientist specializing in LLMs, RAG, and healthcare NLP
“Built a production LLM/RAG solution for healthcare operations teams to query large policy and care-guideline repositories in natural language. Improved domain alignment using vector retrieval plus parameter-efficient fine-tuning and prompt optimization, validated through internal user testing and metrics, cutting manual lookup time by ~40%. Also has hands-on experience orchestrating automated ML pipelines with Apache Airflow.”
Intern Data Analyst specializing in data pipelines and LLM/RAG applications
“Built and deployed LLM-powered analytics and reporting systems, including a RAG-based assistant over Snowflake that let business users ask questions in plain English instead of writing SQL. Experienced orchestrating LLM agents (LangChain) and serverless reporting pipelines (AWS Lambda/S3/RDS), with a strong focus on grounded outputs, monitoring/evaluation, and data quality—used daily by non-technical finance and operations teams at Cigna.”
Junior Data Scientist specializing in ML, geospatial analytics, and LLM applications
“Built and deployed a production AI ‘term explainer’ agent that adapts explanations to beginner/intermediate/expert users by combining multi-step LLM reasoning with grounded Wikipedia retrieval. Owns end-to-end agent orchestration (smolagents/Python), reliability patterns (fallback across LLM providers, retries, guardrails), and observability/metrics-driven evaluation; also partnered with a non-technical researcher to deliver a plain-language research assistant agent.”
Backend/Data Engineer specializing in Python API services and Azure data pipelines
“Backend/data engineer who builds Python (FastAPI) data-processing API services for internal analytics/reporting, emphasizing modular architecture, async performance tuning, and reliability patterns (health checks, retries, observability). Also migrated legacy on-prem ETL pipelines to Azure using ADF/Data Lake/Functions and implemented a near-real-time ingestion flow with Event Hubs plus watermarking to handle late events and deduplication.”
Junior Data Scientist and Robotics Perception Engineer specializing in GenAI and autonomous systems
“Robotics software architect who built an automated pick-and-place palletizing prototype at BLACK-I-ROBOTICS, spanning perception (multi-RealSense fusion, segmentation, 6D pose, ICP), GPU-accelerated motion planning (MoveIt 2 + NVIDIA CuRobo), grasp generation, and safety (human detection + safe mode). Also brings cloud/CI/CD depth from VERIDIX AI (AWS Cognito/Lambda/ECS and CodePipeline stack) and demonstrated strong debugging chops by reducing outdoor rover EKF drift to ~5 cm via Allan variance-based IMU tuning.”
Mid-level AI Engineer & Data Scientist specializing in LLMs, RAG, and multimodal systems
“LLM/GenAI engineer who built a production AI-powered credit risk policy summarization and compliance alerting platform at HCL Tech, focused on factual accuracy and auditability for a financial client. Implemented a multi-retriever LangChain RAG architecture with citations-only prompting, fallback agents, and human-in-the-loop legal review—cutting manual review time by 35% and scaling to 12 teams.”
Mid-level Data Scientist specializing in ML, NLP, and LLM-powered solutions
“AI/NLP-focused practitioner who built a zero-/few-shot LLM event extraction system on the long-tail MAVEN dataset, combining prompt-structured outputs with LoRA/QLoRA fine-tuning and rigorous F1 evaluation. Also implemented entity resolution/data cleaning pipelines and embedding-based semantic search using Sentence-BERT + FAISS, and has healthcare experience delivering a multilingual speech/translation mobile prototype using HIPAA-compliant Azure Cognitive Services.”
Mid-level Data Scientist specializing in GenAI, LLM-to-SQL, and analytics platforms
“LLM/agentic AI builder who led end-to-end integration of an LLM system into a business intelligence product, creating a scalable, metadata-driven RAG/agent pipeline with an orchestrator that routes queries to specialized agents (including DB-backed quantitative querying). Also built an LLM-to-SQL chatbot and partnered with non-technical stakeholders to capture domain context and improve SQL generation, using automated LLM-based testing to evaluate reliability.”
Mid-level Data Scientist specializing in NLP and predictive modeling
“AI/ML practitioner in healthcare/insurance (Blue Cross Blue Shield) who built and deployed a production NLP system to classify patient risk from unstructured clinical notes. Experienced in end-to-end pipeline orchestration (Airflow, AWS Step Functions/Lambda/SageMaker) and real-time inference optimization (swapping BERT for DistilBERT on AWS GPUs), with strong clinician collaboration to drive adoption.”
Junior Data & Insights Analyst specializing in BI, dashboards, and automation
“Worked on taking an LLM-based system at Soundmakr from prototype to production by adding prompt constraints, validation/guardrails, deterministic ranking, and robust logging/monitoring with feedback loops. Also partnered with product/marketing during an internship on Thea: Study Smart to analyze onboarding drop-offs and run A/B tests on AI-driven flows, translating results into actions that improved retention and conversion.”
Mid-level Data Scientist / AI-ML Engineer specializing in RAG, MLOps, and real-time analytics
“Software/ML engineer who built a production system automating job discovery and cold-email personalization for Fortune 500 outreach, using JobSpy for dynamic scraping, LangChain orchestration, and LLM+vector DB semantic search with grounding/relevance metrics and guardrails. Also delivered a predictive investment analytics platform for financial advisors, communicating results via Tableau dashboards and portfolio KPIs like Sharpe ratio and drawdowns.”
Senior Data Scientist specializing in ML, NLP, and production AI systems
“Machine learning/NLP engineer with deep Azure stack experience (Data Factory, Databricks/Spark, Delta Lake, Azure OpenAI, Azure AI Search) who built end-to-end production systems for semantic clustering, entity resolution, and hybrid search. Demonstrated measurable gains from embedding fine-tuning (~15% retrieval precision, ~10–12% nDCG@10) and designed scalable, quality-checked pipelines with MLOps best practices.”
Senior Data Engineer specializing in ETL/ELT pipelines and data integration platforms
“Data engineer/software engineer who led an end-to-end ETL/ELT pipeline at Pearson processing millions of rows of student data nightly, including client-side data prep/validation, SFTP/API ingestion, staging-based SQL validation/transforms, and production loading. Built reliability features like configurable per-client validation thresholds, detailed reporting, concurrency throttling via a custom queue, and multi-source merge/backfill logic to keep nightly loads running even when sources fail.”
Junior Data Scientist specializing in agentic AI and RAG pipelines
“LLM/agentic systems builder who shipped production workflows at Angel Flight West and Eureka AI, combining LangGraph + RAG (Postgres/pgvector) with strong observability (LangSmith/Langfuse). Delivered large operational gains (address lookup cut from 10 minutes to 60 seconds; accuracy raised to 92%) and has a track record of quickly stabilizing customer-critical pipelines (Pydantic-enforced JSON for ETL) while partnering with sales/ops to drive adoption.”
Mid-level Data Scientist specializing in healthcare ML and GenAI
“Healthcare data/NLP practitioner with experience at UnitedHealthcare building production ML systems that connect unstructured call center transcripts and medical notes to structured claims data. Has delivered measurable impact (25% classification accuracy lift; ~30% relevance improvement) using classical NLP, embeddings (Sentence-BERT + FAISS), and AWS SageMaker deployments with robust validation and drift monitoring.”
Mid-level Data Engineer specializing in cloud data pipelines and machine learning
“Experience spans AWS-hosted Python/Flask web apps built in college and enterprise data work at General Motors, including PostgreSQL query optimization on millions of records and multi-tenant-style data isolation using group-based, column-level permission grants. Also built an AWS-hosted meat price prediction dashboard using Dash/Plotly and ran large nightly data pipelines orchestrated with Apache Airflow.”
Mid-level Data Scientist/Data Analyst specializing in ML, BI dashboards, and ETL pipelines
“Data/ML practitioner with experience at Humana and Hexaware, focused on turning messy, semi-structured datasets into production-ready pipelines. Built an age-prediction model from book ratings using heavy feature engineering and multiple regression models, and has hands-on entity resolution (deterministic + fuzzy matching) plus embeddings/vector DB approaches for linking and search relevance.”
Mid-level Data Engineer specializing in multi-cloud real-time data pipelines
“Data engineer with healthcare/clinical trial domain experience who owned a 100 TB+/month AWS pipeline end-to-end (Glue/S3/Redshift/Airflow) and drove measurable outcomes (20% lower latency, 99.9% reliability, 40% less manual reporting). Also built production data services and API-based ingestion on GCP (Cloud Run/Functions/BigQuery) with strong validation, versioning, and safe migration practices, and launched an early-stage RAG solution (LangChain + GPT-4) for researchers.”
Mid-level Data Engineer specializing in Azure, Spark, and scalable ETL/ELT pipelines
“Data engineer with banking FP&A experience who led an end-to-end migration of 10+ TB from Teradata to Azure (ADF + Data Lake + Databricks/PySpark + Synapse). Emphasizes reliability (multi-stage validation, monitoring/alerts) and performance (Spark tuning, incremental loads, autoscaling), reporting ~99.5% pipeline reliability while supporting downstream consumers with stable schemas and clear change management.”