Pre-screened and vetted.
Mid-level Data Engineer specializing in cloud data pipelines and streaming
“Data engineer with experience at Wells Fargo and Accenture owning end-to-end production pipelines processing hundreds of millions of transactional/risk records daily. Strong focus on data quality and reliability (reconciliation checks, schema drift detection, CloudWatch alerting) plus Spark performance tuning and idempotent backfills using Delta Lake/merge logic across AWS (S3/EMR/Databricks/Redshift) and Azure (ADF/Azure DevOps/Azure Monitor).”
Mid-level Data Engineer specializing in AWS/Azure pipelines and streaming analytics
“Data engineer with experience across healthcare and geospatial risk systems, owning end-to-end pipelines from ingestion through serving on AWS/Azure stacks. Built HIPAA-compliant data quality gates and CDC for millions of daily claims, and delivered a real-time wildfire risk platform with 20-minute refresh cycles and a 60% data accuracy lift. Strong in streaming (Kafka), Spark performance tuning, and production-grade orchestration/CI/CD (Airflow, Docker, Jenkins, GitHub Actions, Terraform).”
Senior Data Engineer specializing in cloud data platforms and automated data quality
“Data engineer at CenterPoint Energy who built and operated multiple production-grade GCP data systems: a daily Snowflake→BigQuery replication framework (150+ tables) with Monte Carlo/Atlan-driven observability and schema-drift protection, plus a FastAPI metrics service for pipeline health. Demonstrated measurable impact (40% faster dashboard queries, 70% less manual refresh work, zero data loss) and strong operational rigor (scaling Cloud Run jobs, SAP SLT reconciliation, quarantine patterns, CI/CD via GitHub Actions + Terraform).”
Mid-level Data Engineer specializing in cloud ETL and financial data platforms
“Data engineer with experience at Capital One and HSBC building and operating GCP-based data platforms. Led an end-to-end Oracle-to-BigQuery migration processing ~200–300GB/day using Dataflow/Beam, Airflow, Dataproc/PySpark, and Looker, achieving ~99.5% pipeline success and ~30% fewer data quality issues. Strong in production reliability, schema drift handling for external APIs, and BigQuery performance/serving patterns (materialized views, authorized views, versioned datasets).”
Mid-level Applied AI & Data Engineer specializing in automation and enterprise analytics
“Backend engineer with experience evolving a high-volume agricultural loan processing platform (APMS) at HDFC Bank, emphasizing transactional integrity, auditability, and modularity while integrating with credit bureaus, document management, and risk engines. Also improved automation/reporting robustness at Trend Micro by catching duplicate-event retry edge cases and adding idempotency safeguards.”
Mid-level Data Engineer specializing in cloud ETL/ELT and healthcare analytics
“Healthcare-focused data engineer/ML practitioner with experience at Lightbeam Health Solutions and Humana building production entity-resolution and semantic similarity pipelines across EMR, lab, and claims data. Uses NLP/ML (spaCy, scikit-learn, BioBERT/LightGBM) plus Snowflake/Airflow and vector search (Pinecone) to improve linkage accuracy (reported 90%) and semantic match quality (reported +12–15%), while reducing manual cleanup by 40%+.”
Senior Data & Backend Engineer specializing in cloud data pipelines and LLM/RAG systems
“Data engineer with end-to-end ownership of large-scale retail and clinical data ingestion/processing on AWS, including real-time streaming and batch pipelines. Delivered measurable outcomes: 20M daily transactions processed, latency cut from 4 hours to 5 minutes, ~70% fewer failures, and 120+ pipelines running at 99.8% reliability with full audit compliance.”
Junior Data Engineer specializing in Snowflake and investment data platforms
“Private markets/private credit data engineer owning core Snowflake/AWS data infrastructure (S3 → ActiveBatch → Snowflake) with automated iceDQ quality checks and curated datasets for internal Power BI/React reporting. Drove major reliability and delivery improvements, including cutting DB CI/CD deploy time by 50% and reducing downstream table errors by 90%+, and built an internal React/FastAPI app to visualize the team’s data infrastructure in an ambiguous early-stage environment.”
Mid-level Data Engineer specializing in big data pipelines and real-time streaming
“Data engineer who has owned end-to-end production pipelines processing a few million records/day, using Python/Airflow/SQL/PySpark with Snowflake serving to BI (Power BI). Built resilient external web data collection systems (anti-bot, schema-change detection, backfills) and shipped versioned REST APIs for internal consumers, improving pipeline success rates to 99% through monitoring, retries, and idempotent design.”
Mid-level Data Engineer specializing in cloud data platforms and governed analytics
“Data engineer with Optum experience building end-to-end healthcare data pipelines for HL7/FHIR, processing millions of records daily across Kafka streaming and Databricks/Spark batch. Strong focus on data quality (schema enforcement/validations), reliability (Airflow monitoring/alerts), and analytics-ready serving in Snowflake powering Power BI/Tableau, with CI/CD via Git and Jenkins.”
Mid-level Cloud Data Engineer specializing in Azure/AWS pipelines and medallion architecture
“Data engineer focused on reliability and data quality, owning end-to-end pipelines processing ~100k–300k records/day. Implemented robust validation and monitoring that cut reporting issues by ~30%, and built stable external data collection with anti-bot measures, backfills, and schema-change detection while maintaining backward-compatible internal data services.”
Junior Data Scientist / Big Data Engineer specializing in ML, LLMs, and analytics platforms
“Backend/data platform engineer who led a major redesign of a hybrid streaming+batch analytics platform processing 10+ TB/day (Airflow/Hive/BigQuery) with strong data-quality automation. Also built a production RAG PDF assistant with concrete mitigations for hallucinations and prompt injection (re-ranking, grounding, verifier step) and has deep experience executing low-risk migrations (dual-write, blue-green, rapid rollback) and implementing JWT-based row-level security.”
Mid-level Data & AI Engineer specializing in healthcare data pipelines and MLOps
“Built and deployed a production LLM-powered clinical note summarization system used by care managers to speed review of 5–20 page unstructured medical records. Implemented safety-focused validation (prompt constraints, rule-based and section-level checks, human-in-the-loop) to reduce hallucinations while maintaining low latency and meeting privacy/regulatory constraints, integrating via APIs into existing clinical tools.”
Mid-level Data Engineer specializing in scalable ETL, streaming analytics, and cloud data platforms
“At Dreamline AI, built and productionized an AWS-based incentive intelligence platform that uses Llama-2/GPT-4 to extract eligibility rules from unstructured state policy documents into structured JSON, then processes them with Glue/PySpark and serves results via Lambda/SageMaker/API Gateway. Designed state-specific ingestion connectors plus schema validation and automated checks/alerts to handle frequent policy/format changes without breaking the pipeline, and partnered with business/analytics stakeholders to deliver interpretable eligibility decisions via explanations and dashboards.”
Mid-level AI/ML Engineer specializing in MLOps and LLM-powered applications
“AI/ML engineer with production experience building a RAG-based internal analytics assistant (Databricks + ADF ingestion, Pinecone vector store, LangChain orchestration) deployed via Docker on AWS SageMaker with CI/CD and MLflow. Strong focus on real-world constraints: latency/cost optimization (LoRA ~60% compute reduction), hallucination control with citation grounding, and enterprise security/governance. Previously at Intuit, delivered an interpretable churn prediction system (PySpark/Databricks, Airflow/Azure ML) that improved retention targeting ~12%.”
Mid-level AI/ML Engineer specializing in GenAI agents, RAG pipelines, and MLOps
“AI/ML engineer who built a production RAG-based internal document intelligence assistant (LangChain + Pinecone) to let employees query enterprise reports in natural language. Demonstrated hands-on pipeline orchestration with Apache Airflow and tackled real production issues like retrieval grounding and latency using tuning, caching, and token optimization, while partnering closely with non-technical business stakeholders through iterative demos.”
Senior Data Engineer specializing in cloud-native data platforms for finance and healthcare
“Data engineer/backend data services practitioner with Bank of America experience building real-time and batch transaction-monitoring pipelines and APIs (Kafka + databases, REST/GraphQL). Highlights include a reported 45% response-time improvement through performance optimizations, use of Delta Lake schema evolution, CI/CD (GitHub Actions/Jenkins), and operational reliability patterns such as CloudWatch monitoring and dead-letter queues.”
Senior Data Engineer specializing in cloud data platforms and big data pipelines
“Data engineer focused on building reliable, production-grade pipelines and external data collection systems on AWS (S3/Lambda/SQS/Glue/EMR) using PySpark/SQL, serving curated datasets to Snowflake/Redshift for finance and fraud teams. Has operated a large-scale crawler ingesting millions of records/day with anti-bot tactics, schema versioning/quarantine, and CloudWatch/Datadog monitoring, and shipped a versioned REST API with caching and query optimization.”
Mid-level Data Engineer specializing in cloud ETL/ELT and big data pipelines
“Data engineer focused on production-grade pipelines and data services: ingests millions of records/day into S3, performs SQL/Python quality validation and PySpark/SQL transformations, and serves curated datasets via Athena/Redshift. Has experience hardening external data collection with retries/rate-limit handling and shipping versioned internal data APIs with backward compatibility, monitoring, and CI/CD in early-stage environments.”
Mid-level ML Data Engineer specializing in MLOps and scalable healthcare data pipelines
“Data/ML platform engineer with healthcare (Cigna) experience owning an end-to-end pipeline spanning Airflow + Debezium CDC ingestion, PySpark/SQL transformations, rigorous data quality gates, and feature-store/API serving for ML training and inference. Worked at 10+ TB scale and cites a ~30% latency reduction plus stronger reliability via idempotent design, monitoring, and backfill-safe reprocessing; also built pragmatic early-stage data pipelines at Frankenbuild Ventures.”
Mid-level Data Engineer specializing in cloud lakehouse, streaming, and MLOps
“Data engineer at AT&T focused on large-scale telecom (5G/IoT) data platforms, owning end-to-end pipelines from Kafka/Azure ingestion through Databricks/Delta Lake transformations to serving analytics and ML. Has operated at very high volumes (~50+ TB/day) and delivered measurable performance gains (25–30% faster processing) plus improved reliability via Airflow monitoring, robust data quality checks, and resilient external data collection patterns (rate limiting, retries, dynamic schemas).”
Mid-level Data Engineer specializing in cloud data platforms and streaming pipelines
“Data engineer currently at American Airlines who built and owned end-to-end flight operations and booking data pipelines (batch + real-time) using Azure Data Factory, Kafka, Spark/Databricks, Synapse, and Snowflake, processing hundreds of GBs/day. Strong focus on reliability and data quality (idempotency, checkpointing, retries, validation/alerts) and delivered near-real-time analytics powering Power BI dashboards; previously helped stand up an early-stage data platform at Sysco on AWS (Glue/S3/Redshift) with Airflow and Jenkins CI/CD.”
Senior AI/ML & Data Engineer specializing in Generative AI and RAG systems
“GenAI/RAG engineer who has deployed a production policy/regulatory search assistant for a financial client using LangChain + Vertex AI, FastAPI, Docker/Kubernetes, and Airflow-orchestrated data pipelines. Demonstrated measurable impact with 50–60% latency reduction and 70% fewer pipeline failures, plus KPI-driven grounding evaluation (90%+ target) and strong cross-functional collaboration with compliance/business teams.”