Pre-screened and vetted.
Mid-level Data Analytics & ML Engineer specializing in NLP, LLMs, and cloud data platforms
“At KPMG, built and productionized a secure RAG-based LLM assistant that lets business and risk stakeholders query data warehouses in natural language, reducing dependence on data engineers for ad-hoc analysis. Demonstrates strong production rigor (Airflow orchestration, CI/CD, containerization), retrieval/embedding tuning (rechunking, semantic abstraction for structured data), and reliability controls (confidence thresholds, refusal behavior, monitoring and canary evals).”
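The confidence-threshold and refusal behavior described above typically sits as a gate between retrieval and generation. A minimal sketch, assuming cosine-similarity scores in [0, 1] and illustrative threshold values rather than the candidate's actual KPMG code:

```python
from dataclasses import dataclass

# Illustrative sketch of a retrieval-confidence gate; the score scale,
# thresholds, and message are assumptions, not the candidate's implementation.

@dataclass
class RetrievedChunk:
    text: str
    score: float  # similarity score from the vector store, assumed in [0, 1]

REFUSAL = ("Not enough grounded context to answer reliably; "
           "please narrow or rephrase the question.")

def grounded_prompt_or_refusal(question: str, chunks: list[RetrievedChunk],
                               min_score: float = 0.75, min_hits: int = 2) -> str:
    """Refuse when too few chunks clear the similarity threshold;
    otherwise return a context-grounded prompt for the LLM call."""
    confident = [c for c in chunks if c.score >= min_score]
    if len(confident) < min_hits:
        return REFUSAL
    context = "\n\n".join(c.text for c in confident)
    return (f"Answer strictly from the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")
```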
Mid-level Data Scientist / ML Engineer specializing in streaming ML systems for healthcare and IoT
“ML/GenAI engineer with production experience building an LLM-powered governance layer that summarizes verified drift/performance signals into validation reports and release notes, designed for regulated environments with de-identification and non-blocking fallbacks. Strong Airflow-based orchestration background across healthcare and finance, integrating Databricks/Spark and MLflow for scalable retraining/monitoring. Demonstrated ability to partner with non-technical healthcare operations teams to deliver actionable risk-scoring outputs via dashboards and automated reporting.”
“ML/GenAI engineer with recent CVS Health experience building a production RAG system over unstructured financial/research documents using LangChain, FAISS, and Pinecone, plus LoRA/PEFT fine-tuning of GPT/LLaMA for domain-aware summarization. Demonstrates strong applied MLOps and data engineering skills (Airflow/Prefect, Docker/Kubernetes, CI/CD, MLflow) and measurable impact (sub-second retrieval, ~40% better context retrieval, ~25% entity matching improvement).”
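LoRA/PEFT fine-tuning of a GPT/LLaMA-style model generally amounts to attaching low-rank adapters to the attention projections so only a small fraction of parameters train. A minimal sketch using the Hugging Face peft API, with an assumed base checkpoint and hyperparameters rather than the candidate's actual configuration:

```python
# Illustrative LoRA/PEFT sketch; the base model, target modules, and
# hyperparameters are assumptions, not the candidate's actual setup.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, TaskType

base_model_name = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Low-rank adapters on the attention projections keep the trainable
# parameter count small, which is the usual appeal of LoRA for
# domain-aware summarization fine-tunes.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total params
# From here the wrapped model plugs into a standard transformers Trainer loop.
```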
Mid-level Data Engineer specializing in cloud ETL/ELT and healthcare analytics
“Healthcare-focused data engineer/ML practitioner with experience at Lightbeam Health Solutions and Humana building production entity-resolution and semantic similarity pipelines across EMR, lab, and claims data. Uses NLP/ML (spaCy, scikit-learn, BioBERT/LightGBM) plus Snowflake/Airflow and vector search (Pinecone) to improve linkage accuracy (reported 90%) and semantic match quality (reported +12–15%), while reducing manual cleanup by 40%+.”
Senior Data Analyst specializing in data pipelines, web scraping, and legal data enrichment
“Data engineer focused on reliable, scalable analytics pipelines and external data collection. Has owned end-to-end pipelines processing 5–10M records/day, serving Snowflake data marts to Power BI/Tableau, and reports ~99% reliability through strong validation/monitoring. Also shipped versioned REST APIs for curated data with query optimization and caching.”
Mid-level Data Scientist / ML Engineer specializing in secure GenAI and financial compliance
“Built a production ‘sentinel insight engine’ to tame information overload from millions of product reviews and support transcripts, combining Azure OpenAI (GPT-3.5) zero-shot classification with a fine-tuned T5 summarizer to generate weekly actionable product insights. Demonstrated strong MLOps/production engineering by adding drift monitoring with embedding-based detection, integrating REST with legacy SOAP/queue-based CRM via FastAPI middleware, and scaling reliably on Kubernetes with HPA.”
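Embedding-based drift detection of the kind cited usually compares a baseline window of text embeddings against the current window and alerts when they diverge. A minimal sketch, assuming pre-computed embeddings and an illustrative alert threshold:

```python
# Minimal sketch of embedding-based drift detection, assuming each text batch
# has already been embedded (e.g., with a sentence-embedding model). The
# distance metric and alert threshold are illustrative assumptions.
import numpy as np

def centroid(embeddings: np.ndarray) -> np.ndarray:
    """Mean embedding of a batch, L2-normalized."""
    c = embeddings.mean(axis=0)
    return c / np.linalg.norm(c)

def drift_score(baseline: np.ndarray, current: np.ndarray) -> float:
    """Cosine distance between the baseline and current-window centroids."""
    return float(1.0 - centroid(baseline) @ centroid(current))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    baseline = rng.normal(size=(1000, 384))          # e.g., last quarter's reviews
    current = rng.normal(loc=0.3, size=(200, 384))   # this week's reviews, shifted
    score = drift_score(baseline, current)
    ALERT_THRESHOLD = 0.15  # assumed; tuned on historical weekly windows
    print(f"drift score = {score:.3f}", "ALERT" if score > ALERT_THRESHOLD else "ok")
```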
Senior Data & Backend Engineer specializing in cloud data pipelines and LLM/RAG systems
“Data engineer with end-to-end ownership of large-scale retail and clinical data ingestion/processing on AWS, including real-time streaming and batch pipelines. Delivered measurable outcomes: 20M daily transactions processed, latency cut from 4 hours to 5 minutes, ~70% fewer failures, and 120+ pipelines running at 99.8% reliability with full audit compliance.”
Junior Data Engineer specializing in Snowflake and investment data platforms
“Private markets/private credit data engineer owning core Snowflake/AWS data infrastructure (S3 → ActiveBatch → Snowflake) with automated iceDQ quality checks and curated datasets for internal Power BI/React reporting. Drove major reliability and delivery improvements, including a 50% cut in DB CI/CD deploy time and a 90%+ reduction in downstream table errors, and built an internal React/FastAPI app to visualize the team’s data infrastructure in an ambiguous early-stage environment.”
Mid-level Data Scientist specializing in Generative AI, MLOps, and cloud data platforms
“GenAI/ML engineer (CitiusTech) who has deployed production RAG systems for compliance/operations document Q&A, using Pinecone + FastAPI microservices on Kubernetes with strong monitoring and guardrails. Also built a GenAI-powered incident triage/routing solution in collaboration with non-technical stakeholders, achieving 35% faster response times and 40% fewer misclassified tickets, and has hands-on orchestration experience with Airflow and AutoSys.”
Mid-level Data Engineer specializing in big data pipelines and real-time streaming
“Data engineer who has owned end-to-end production pipelines processing a few million records/day, using Python/Airflow/SQL/PySpark with Snowflake serving to BI (Power BI). Built resilient external web data collection systems (anti-bot, schema-change detection, backfills) and shipped versioned REST APIs for internal consumers, improving pipeline success rates to 99% through monitoring, retries, and idempotent design.”
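The retry and idempotency practices behind that 99% success rate rest on familiar patterns: deterministic record keys so re-runs and backfills never double-count, plus retries with exponential backoff on flaky external sources. A minimal sketch with illustrative names, not the candidate's production code:

```python
# Sketch of the retry + idempotency pattern; all names are illustrative.
import hashlib
import time

def record_key(source: str, url: str, observed_date: str) -> str:
    """Deterministic key so re-processing the same page never double-counts."""
    return hashlib.sha256(f"{source}|{url}|{observed_date}".encode()).hexdigest()

def fetch_with_retries(fetch, url: str, attempts: int = 5, base_delay: float = 1.0):
    """Retry a flaky fetch with exponential backoff before failing the task."""
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)

def upsert(store: dict, key: str, row: dict) -> None:
    """Idempotent write: last write wins, duplicates collapse to one row."""
    store[key] = row
```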
Mid-level Data Engineer specializing in cloud data platforms and governed analytics
“Data engineer with Optum experience building end-to-end healthcare data pipelines for HL7/FHIR, processing millions of records daily across Kafka streaming and Databricks/Spark batch. Strong focus on data quality (schema enforcement/validations), reliability (Airflow monitoring/alerts), and analytics-ready serving in Snowflake powering Power BI/Tableau, with CI/CD via Git and Jenkins.”
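Schema enforcement on healthcare feeds typically validates each record against a declared shape and quarantines failures rather than dropping them. A minimal sketch with pydantic over a simplified, FHIR-like record; the field names and rules are assumptions, not Optum's actual schemas:

```python
# Illustrative schema-enforcement sketch: a simplified, FHIR-like record shape
# validated before loading, with failures routed aside rather than dropped.
from datetime import date
from pydantic import BaseModel, ValidationError

class PatientEncounter(BaseModel):
    patient_id: str
    encounter_id: str
    encounter_date: date
    icd10_codes: list[str] = []

def validate_batch(raw_records: list[dict]) -> tuple[list[PatientEncounter], list[dict]]:
    """Split a batch into clean rows and quarantined failures for review."""
    clean, quarantined = [], []
    for rec in raw_records:
        try:
            clean.append(PatientEncounter(**rec))
        except ValidationError as err:
            quarantined.append({"record": rec, "errors": err.errors()})
    return clean, quarantined
```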
Mid-level Cloud Data Engineer specializing in Azure/AWS pipelines and medallion architecture
“Data engineer focused on reliability and data quality, owning end-to-end pipelines processing ~100k–300k records/day. Implemented robust validation and monitoring that cut reporting issues by ~30%, and built stable external data collection with anti-bot measures, backfills, and schema-change detection while maintaining backward-compatible internal data services.”
Senior Data Analyst specializing in marketing, BI, and financial analytics
“Marketing analytics candidate with experience at WPP and on a global Coca-Cola campaign, focused on turning messy multi-platform media data into trusted reporting and decision systems. Combines hands-on SQL/Python pipeline building with stakeholder KPI alignment, citing a 22% improvement in media effectiveness plus faster budget reallocation through daily automated reporting.”
Mid-level Marketing Analytics & Performance Marketing Analyst specializing in paid media and attribution
“Performance creative/growth marketer with hands-on experience running full-funnel paid social for e-commerce and other brands, focused on combating creative fatigue and scaling efficiently. Uses structured A/B testing and modular creative systems across Meta, TikTok, and YouTube; recently delivered a 22% CPA reduction and 28% ROAS lift by shifting to problem-solution and social-proof storytelling.”
Principal Data Scientist & Software Engineer specializing in space mission data systems
“Space/heliophysics ML engineer who built a PyTorch GRU model to propagate solar wind from L1 to the magnetopause with probabilistic outputs for uncertainty quantification, achieving ~25% better CRPS than standard approaches. Also developed production-grade Python ETL and an open-source telemetry processing package for a mission (LEXI), using Docker and GitHub Actions CI/CD and iterating with scientist/engineer stakeholders.”
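A probabilistic GRU of the kind described predicts a distribution (here a Gaussian mean and variance) rather than a point value, and CRPS scores how well that distribution covers the observed solar-wind value. A minimal PyTorch sketch with assumed layer sizes and a single-target setup, not the mission's actual model:

```python
# Sketch of a GRU with a probabilistic (Gaussian) output head and the
# closed-form CRPS used to score it; sizes are illustrative assumptions.
import math
import torch
import torch.nn as nn

class ProbabilisticGRU(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # predicts mean and log-variance

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        _, h = self.gru(x)                      # h: (num_layers, batch, hidden)
        mu, log_var = self.head(h[-1]).chunk(2, dim=-1)
        return mu.squeeze(-1), log_var.squeeze(-1)

def gaussian_crps(mu: torch.Tensor, sigma: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Closed-form CRPS for a Gaussian forecast (lower is better)."""
    std_normal = torch.distributions.Normal(0.0, 1.0)
    z = (y - mu) / sigma
    pdf = torch.exp(std_normal.log_prob(z))
    cdf = std_normal.cdf(z)
    return (sigma * (z * (2 * cdf - 1) + 2 * pdf - 1 / math.sqrt(math.pi))).mean()

# Training would minimize gaussian_crps(mu, torch.exp(0.5 * log_var), target).
```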
Mid-level Data Engineer specializing in scalable ETL, streaming analytics, and cloud data platforms
“At Dreamline AI, built and productionized an AWS-based incentive intelligence platform that uses Llama-2/GPT-4 to extract eligibility rules from unstructured state policy documents into structured JSON, then processes them with Glue/PySpark and serves results via Lambda/SageMaker/API Gateway. Designed state-specific ingestion connectors plus schema validation and automated checks/alerts to handle frequent policy/format changes without breaking the pipeline, and partnered with business/analytics stakeholders to deliver interpretable eligibility decisions via explanations and dashboards.”
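Turning LLM extractions into structured JSON only holds up if outputs are validated before they enter the Glue/PySpark stage. A minimal sketch of that gate using jsonschema; the rule fields are illustrative assumptions, not the actual eligibility-rule format:

```python
# Sketch of the "validate before you trust the LLM" step: model output is
# parsed and checked against a JSON Schema, with failures flagged for review.
import json
from jsonschema import validate, ValidationError

ELIGIBILITY_RULE_SCHEMA = {
    "type": "object",
    "required": ["state", "program", "income_limit_pct_fpl", "citation"],
    "properties": {
        "state": {"type": "string", "pattern": "^[A-Z]{2}$"},
        "program": {"type": "string"},
        "income_limit_pct_fpl": {"type": "number", "minimum": 0},
        "citation": {"type": "string"},
    },
    "additionalProperties": False,
}

def parse_llm_extraction(raw_text: str) -> dict | None:
    """Return a validated rule dict, or None so the record can be routed to review."""
    try:
        rule = json.loads(raw_text)
        validate(instance=rule, schema=ELIGIBILITY_RULE_SCHEMA)
        return rule
    except (json.JSONDecodeError, ValidationError):
        return None
```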
Intern Data Scientist specializing in healthcare AI and experimentation
“Human-AI Design Lab practitioner who productionized a wearable-health anomaly detection system by evolving a standalone autoencoder into a hybrid autoencoder + GPT-based approach, backed by PySpark ETL and MLOps on AWS SageMaker/MLflow. Also brings applied LLM troubleshooting experience (a fine-tuned FLAN-T5 summarizer) and has partnered with BI teams to run A/B tests and improve retention via feature stores and experimentation.”
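The autoencoder half of that hybrid approach flags anomalies when reconstruction error on wearable signals exceeds a threshold calibrated on normal data. A minimal PyTorch sketch with assumed layer sizes and threshold handling, not the lab's actual model:

```python
# Minimal sketch of reconstruction-error anomaly detection; sizes and the
# threshold rule are illustrative assumptions.
import torch
import torch.nn as nn

class WearableAutoencoder(nn.Module):
    def __init__(self, n_features: int = 8, latent: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 16), nn.ReLU(), nn.Linear(16, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 16), nn.ReLU(), nn.Linear(16, n_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

@torch.no_grad()
def anomaly_flags(model: WearableAutoencoder, x: torch.Tensor, threshold: float) -> torch.Tensor:
    """Per-sample boolean flags: True where reconstruction error is unusually high."""
    errors = ((model(x) - x) ** 2).mean(dim=1)
    return errors > threshold
```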
Senior Data Engineer specializing in cloud-native data platforms for finance and healthcare
“Data engineer/backend data services practitioner with Bank of America experience building real-time and batch transaction-monitoring pipelines and APIs (Kafka + databases, REST/GraphQL). Highlights include a reported 45% response-time improvement from performance optimizations, use of Delta Lake schema evolution, CI/CD (GitHub Actions/Jenkins), and operational reliability patterns such as CloudWatch monitoring and dead-letter queues.”
Mid-level Data Scientist specializing in ML, MLOps, and customer analytics
“ML/NLP practitioner focused on insurance/claims analytics for a large financial firm, working with millions of fragmented structured and unstructured records. Built production-grade pipelines for entity extraction, entity resolution, and semantic search using Sentence-BERT + vector DB, including fine-tuning with contrastive learning (reported ~15% recall lift) and scalable ETL/containerized deployment on Kubernetes.”
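Contrastive fine-tuning of a Sentence-BERT-style encoder for entity resolution typically trains on known-match pairs with in-batch negatives. A minimal sketch with sentence-transformers; the base checkpoint and example pairs are illustrative assumptions, not the candidate's training data:

```python
# Sketch of contrastive fine-tuning with in-batch negatives; names and data
# are illustrative assumptions.
from sentence_transformers import SentenceTransformer, InputExample, losses
from torch.utils.data import DataLoader

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed base encoder

# Positive pairs: two mentions known to refer to the same claimant/entity.
train_examples = [
    InputExample(texts=["Jon A. Smith, policy 4471", "Jonathan Smith (pol. #4471)"]),
    InputExample(texts=["Acme Ins. Group LLC", "ACME Insurance Group"]),
]
train_loader = DataLoader(train_examples, shuffle=True, batch_size=2)

# MultipleNegativesRankingLoss treats the other pairs in each batch as
# negatives, the usual contrastive setup for lifting retrieval recall.
loss = losses.MultipleNegativesRankingLoss(model)
model.fit(train_objectives=[(train_loader, loss)], epochs=1, warmup_steps=10)
```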
Mid-level Data Scientist specializing in NLP, LLMs, and RAG systems
“Built and deployed a production-style vision-language pipeline that generates structured medical reports from chest X-rays using BioViLT embeddings, an image-text alignment module, and BiGPT fine-tuned with LoRA, delivered via Streamlit and hosted on AWS EC2. Also has collaboration experience presenting EDA findings, feature importance, and model performance to Ford managers while working with vehicle parts data at Bimcon.”
Senior Data Engineer specializing in cloud data platforms and big data pipelines
“Data engineer focused on building reliable, production-grade pipelines and external data collection systems on AWS (S3/Lambda/SQS/Glue/EMR) using PySpark/SQL, serving curated datasets to Snowflake/Redshift for finance and fraud teams. Has operated a large-scale crawler ingesting millions of records/day with anti-bot tactics, schema versioning/quarantine, and CloudWatch/Datadog monitoring, and also shipped a versioned REST API with caching and query optimization.”
Mid-level Data Engineer specializing in cloud ETL/ELT and big data pipelines
“Data engineer focused on production-grade pipelines and data services: ingests millions of records/day into S3, performs SQL/Python quality validation and PySpark/SQL transformations, and serves curated datasets via Athena/Redshift. Has experience hardening external data collection with retries/rate-limit handling and shipping versioned internal data APIs with backward compatibility, monitoring, and CI/CD in early-stage environments.”
Mid-level Data Engineer specializing in cloud lakehouse, streaming, and MLOps
“Data engineer at AT&T focused on large-scale telecom (5G/IoT) data platforms, owning end-to-end pipelines from Kafka/Azure ingestion through Databricks/Delta Lake transformations to serving analytics and ML. Has operated at very high volumes (~50+ TB/day) and delivered measurable performance gains (25–30% faster processing) plus improved reliability via Airflow monitoring, robust data quality checks, and resilient external data collection patterns (rate limiting, retries, dynamic schemas).”
Mid-level Data Engineer specializing in cloud data platforms and streaming pipelines
“Data engineer currently at American Airlines who built and owned end-to-end flight operations and booking data pipelines (batch + real-time) using Azure Data Factory, Kafka, Spark/Databricks, Synapse, and Snowflake—processing hundreds of GBs/day. Strong focus on reliability and data quality (idempotency, checkpointing, retries, validation/alerts) and delivered near-real-time analytics powering Power BI dashboards; previously helped stand up an early-stage data platform at Sysco on AWS (Glue/S3/Redshift) with Airflow and Jenkins CI/CD.”
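The checkpointing and idempotency practices cited here map onto Spark Structured Streaming's checkpoint location, which lets a Kafka-to-Delta job resume cleanly after failures. A minimal PySpark sketch with assumed topic names, paths, and columns, not the airline's actual job:

```python
# Sketch of a checkpointed Kafka-to-Delta streaming job; all names are
# illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("flight-events-stream").getOrCreate()

event_schema = StructType([
    StructField("flight_id", StringType()),
    StructField("status", StringType()),
    StructField("event_time", TimestampType()),
])

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "flight-ops-events")
    .load()
    .select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

query = (
    events.writeStream.format("delta")
    .option("checkpointLocation", "/chk/flight_ops_events")  # resume point after failure
    .outputMode("append")
    .trigger(processingTime="1 minute")
    .start("/tables/flight_ops_events")
)
```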