Pre-screened and vetted.
Mid-level Data Scientist & Generative AI Engineer specializing in LLMs and RAG
“ML/NLP practitioner who built a retrieval-augmented generation (RAG) system for large financial and operational document sets using Sentence-Transformers (all-mpnet-base-v2) and a vector DB (e.g., Pinecone), with a strong focus on retrieval evaluation and chunking strategy optimization. Experienced in entity resolution (rules + embedding similarity with type-specific thresholds) and in productionizing scalable Python data workflows using Airflow/Dagster and Spark.”
Mid-level AI/ML Engineer specializing in LLMs, GenAI, and NLP
“AI/ML Engineer who built a production RAG-based LLM system for insurance policy documents, turning thousands of messy PDFs into a searchable index using LangChain, Azure AI Search vectors, hybrid retrieval, and FastAPI. Strong focus on evaluation (MRR/precision@k/recall@k, REGAS) and performance optimization (vLLM), with prior clinical NLP experience using BERT-based NER validated on ground-truth datasets.”
Senior Software Engineer specializing in data pipelines and legal data systems
“Data/analytics engineer who owned Angi’s service-request funnel event pipeline end-to-end, routing events server-side to bypass ad blockers and recovering ~15% lost tracking at millions of events/day. Built Snowflake/dbt reporting tables powering Looker dashboards, with strong emphasis on validation, monitoring/alerting, and safe schema evolution. Also shipped a reusable flow state management backend service with TTL storage, CI/CD, and developer-friendly APIs.”
Mid-level Data Engineer specializing in AWS lakehouse platforms and scalable ETL/ELT
“Data engineer focused on reliable, production-grade pipelines and data services: has owned end-to-end ingestion-to-serving workflows processing millions of records/day, using Airflow, Python/SQL, and PySpark. Demonstrates strong operational rigor (monitoring, retries, idempotency, backfills) and measurable outcomes (98% stability, ~30% faster processing), plus experience exposing curated warehouse data via versioned REST APIs.”
Mid-level Data Scientist specializing in ML, NLP, and Generative AI
“Data engineering / ML practitioner with experience at MetLife building transformer-based sentiment analysis over large unstructured datasets and productionizing pipelines with Airflow/PySpark/Hadoop (reported 52% efficiency gain). Also implemented embedding-based semantic search using Pinecone/Weaviate to improve retrieval relevance and enable RAG for customer support and document matching use cases.”
Mid-level Software Engineer specializing in backend microservices and cloud data pipelines
“Backend engineer with Morgan Stanley experience building and owning an end-to-end Python FastAPI microservice for high-volume market data used by trading and risk systems. Strong in performance tuning and reliability (PySpark, Redis caching, async APIs), real-time streaming with Kafka, and production operations (Docker/Kubernetes, GitOps-style CI/CD, monitoring). Has led cloud/on-prem migration work across AWS and Azure, including fixing Azure Synapse performance issues via query and pipeline redesign.”
Director-level Data Science & Analytics Leader specializing in cloud data platforms and AI/ML
“Candidate states they are very familiar with the venture capital/studio/accelerator landscape and expresses strong willingness to pursue entrepreneurship "at all costs," but did not provide details on a current startup, business plan, fundraising, or prior accelerator/VC involvement during the interview.”
Mid-level Data Engineer specializing in cloud data pipelines for healthcare and financial services
“Data engineer with ~4 years of experience (Cigna) building and operating Azure Data Factory pipelines for healthcare claims/member/provider data at 2–3M records/day. Emphasizes reliability and downstream safety via schema/data-quality validation, quarantine workflows, idempotent processing, and backfills; also improved runtime ~20% through SQL optimization and served curated datasets through versioned views and well-documented, analyst-friendly interfaces.”
Mid-level Data Engineer specializing in multi-cloud data platforms for healthcare and finance
“Data engineer with Cigna experience building and operating an end-to-end AWS-based healthcare claims pipeline processing ~2TB/day, using Glue/Kafka/PySpark/SQL into Redshift. Strong focus on data quality and reliability (schema validation, monitoring/alerting, retries/checkpointing/backfills), reporting improved accuracy (~99%) and reduced latency, plus experience serving real-time Kafka/Spark data to downstream analytics with documented data contracts.”
Mid-level AI/ML Engineer specializing in FinTech risk, fraud detection, and GenAI/RAG systems
“Built and productionized Azure-based LLM/RAG systems for regulatory/compliance use cases, including automating analyst research and compliance report generation across large unstructured document sets. Demonstrates strong practical depth in hallucination mitigation, hybrid retrieval tuning (BM25 + embeddings), and production MLOps (Databricks, Cognitive Search, AKS, Airflow/MLflow), plus proven ability to deliver auditable, explainable solutions with non-technical compliance teams.”
Mid-level AI/ML Engineer specializing in cloud data engineering and GenAI
“AI/LLM engineer with production experience in legal tech: built a GPT-4 + LangChain RAG summarization system at Govpanel that reduced legal case-file review time by 50%+. Previously at LexisNexis, orchestrated end-to-end Airflow data/AI pipelines processing 5M+ legal documents daily, improving ETL runtime by 35% with robust validation, monitoring, and SLAs.”
Senior Data Engineer specializing in data infrastructure and marketing/CRM analytics
“Salesforce-focused implementation/solutions engineer from Full Circle Insights who owned end-to-end campaign attribution and reporting deployments for multiple customers at once (3–5 concurrently), including sandbox testing, KPI monitoring, and rollback-safe migrations from legacy reporting. Also builds personal multi-agent workflows and uses Claude Code to rapidly scaffold data/analytics scripts like an advertising optimization parser over CSV/XLSX inputs.”
Junior Machine Learning Engineer specializing in Generative AI and analytics automation
“AI/LLM engineer who built a production intelligent support system using RAG over a vectorized documentation library, addressing real-world issues like lost-in-the-middle context failures and doc freshness via automated GitHub-driven re-embedding pipelines. Emphasizes rigorous agent evaluation (component/E2E/ops) and prefers lightweight, decoupled workflow automation using message brokers (Redis/RabbitMQ) over heavyweight orchestration frameworks.”
Mid-level GenAI Engineer specializing in LLM fine-tuning, RAG, and MLOps
“Healthcare-focused LLM engineer who deployed a production triage and clinical knowledge retrieval assistant using RAG and LangGraph-orchestrated multi-agent workflows. Emphasizes clinical safety and compliance with robust hallucination controls, HIPAA/PHI protections (tokenization, encryption, audit logging, zero-retention), and human-in-the-loop escalation; reports a 75% latency reduction in a healthcare agent system.”
Senior AI/ML Engineer specializing in Generative AI, LLMs, and MLOps
“Telecom (Verizon) AI/ML practitioner who built a production multimodal system that ingests messy customer issue reports (calls, chats, emails, screenshots, videos) and turns them into confidence-scored incident summaries with reproducible steps and evidence links. Also built KPI/alarm-to-ticket correlation to rank likely root-cause domains (RAN/Core/Transport), cutting triage from hours to minutes and improving MTTR.”
Mid-level AI/ML Engineer specializing in Generative AI and data engineering
“IBM engineer who built and deployed a production RAG-based LLM assistant using LangChain/FAISS with a fine-tuned LLaMA model, served via FastAPI microservices on Kubernetes, achieving 99%+ uptime. Demonstrates strong practical expertise in reducing hallucinations (semantic chunking + metadata-driven retrieval) and managing latency, plus mature MLOps practices (Airflow/dbt pipelines, MLflow tracking, monitoring, A/B and shadow deployments) and effective collaboration with non-technical stakeholders.”
Mid-level AI Engineer specializing in LLM orchestration, RAG, and multi-agent systems
“Research Assistant at the University of Houston who built and live-deployed a production RAG system for 1000+ research documents, using hybrid retrieval (dense+BM25+RRF) with cross-encoder reranking and RAGAS-based evaluation; reported 66% MRR, 0.85+ faithfulness, and 68% lower LLM inference costs. Also built a deployed LangGraph multi-agent research system (Researcher/Critic/Writer) with tool integrations (Tavily, arXiv) and dual memory (ChromaDB + Neo4j), plus freelance automation work delivering a WhatsApp chatbot and n8n workflows for a wholesale clothing business.”
Mid-level Data Engineer specializing in cloud ETL/ELT and lakehouse architecture
“Data engineer focused on sales/marketing analytics pipelines, owning ingestion from CRMs/ad platforms through warehouse serving and dashboards at ~hundreds of thousands of records/day. Built reliability-focused systems including dbt/SQL/Python data quality gates with alerting, a resilient web-scraping pipeline (retries/backoff, anti-bot tactics, schema-change detection, backfills), and a versioned internal REST API with caching and strong developer usability.”
Mid-level Data Engineer specializing in real-time streaming and cloud data platforms
“Data engineer with Wells Fargo experience owning an end-to-end lakehouse ETL pipeline on Databricks/Azure Data Factory, processing ~480GB daily and implementing robust data quality/reconciliation across 40+ tables to reach ~99.3% reliability. Strong in performance optimization (cut runtime 5.5h→3.8h), CI/CD and monitoring, and resilient external/API ingestion with retries, schema validation, and backfills.”
Intern Data Scientist specializing in ML engineering and LLM agentic workflows
“Built an agentic, multi-step LLM system that generates full-stack code for API integrations using LangChain orchestration, Pinecone/SentenceBERT RAG, and a human-in-the-loop feedback loop for iterative code refinement. Also collaborated with non-technical content writers and PMs during a Contentstack internship to deliver a Slack-based AI workflow that generates and brand-checks articles with one-click approvals.”
Mid-level Data Analyst specializing in business intelligence and cloud data platforms
“Healthcare analytics professional with TCS/Humana experience turning messy claims and eligibility data into reliable reporting assets using SQL and Python. They combine strong data engineering and analytics execution with stakeholder management, including automating monthly claims reporting from half a day to under 5 minutes and driving a provider outreach effort that reduced claim rejection rates by about 20%.”
Mid-level Data Analyst specializing in cloud ETL, BI, and machine learning
“Data/ML practitioner with experience at UnitedHealth Group building a fraud claims detection solution combining structured claims data and unstructured notes, validated with compliance stakeholders to improve actionable accuracy. Also applied embeddings, vector databases, and fine-tuned language models in a Bank of America capstone to detect threats/anomalies in financial documents, with production-minded Python ETL workflows using Airflow.”
Mid-level AI Engineer specializing in healthcare claims analytics and RAG copilots
“Built a production "appeals co-pilot" for a healthcare claims appeals team, combining an XGBoost/logistic ranking model with a Python/LangChain RAG stack (FAISS + Mistral 7B) to surface high-probability appeal wins and speed policy-grounded drafting. Emphasizes reliability and trust: hybrid retrieval with metadata routing, citation/eval scripts, guardrails, and an explainability layer that non-technical stakeholders could understand and override.”
Mid-level Data Engineer specializing in cloud lakehouse/warehouse pipelines
“Data engineer with HCA Healthcare experience building and operating end-to-end AWS-based pipelines for clinical and operational reporting (50–100 GB/day), serving curated data into Redshift/Snowflake for Power BI/Tableau. Emphasizes production reliability (Airflow SLAs/retries/alerting, logging/observability) and strong data quality controls (reconciliations, schema/null/duplicate checks), and has shipped versioned REST APIs to expose warehouse data to downstream systems.”