Senior Data Engineer specializing in cloud lakehouse platforms and streaming analytics
“Data engineer focused on fraud and banking analytics who has owned end-to-end batch + streaming pipelines at very large scale (hundreds of millions of records/day). Built robust data quality/observability layers (schema validation, anomaly detection, alerting) and delivered low-latency serving via AWS Lambda/API Gateway with DynamoDB + Redis, plus external data ingestion/scraping pipelines orchestrated in Airflow with anti-bot protections.”
Junior Software Engineer specializing in data engineering and LLM applications
“Computer science graduate with a master’s degree who independently built a mechatronics-heavy capstone prototype: a smartphone concept for deafblind users using micro-actuator arrays for braille reading. Also has platform engineering experience at Quantiphi, deploying webhooks to Kubernetes and implementing GitOps-based CI/CD using AWS CodeCommit/CodeBuild and ECR.”
Mid-level Data Engineer specializing in Azure ETL/ELT and data warehousing
“Data engineer who has owned end-to-end production pipelines for customer transaction data (~2–5 GB/day) using Python/PySpark/SQL and Airflow, delivering major reliability and speed gains (70% faster reporting; 60–70% fewer data issues). Also built a daily external web-scraping system with anti-bot handling and safe, idempotent Airflow-driven backfills, plus a Python data API optimized with indexing/caching and tested for correctness.”
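The "safe, idempotent Airflow-driven backfills" mentioned above usually come down to a delete-then-write (partition-overwrite) pattern: each run fully replaces its own date partition, so re-running any day is harmless. A minimal sketch in plain Python, with a dict standing in for the warehouse table and a function argument standing in for Airflow's logical date (names are illustrative, not from the profile):

```python
# Idempotent partition-overwrite backfill: re-running any date is safe
# because each run replaces its own partition wholesale and touches
# nothing else. In Airflow this function would be a task keyed on the
# DAG run's logical date instead of a plain argument.

warehouse = {}  # partition_date -> list of rows; stands in for a real table

def backfill_partition(partition_date, rows):
    """Replace the partition for `partition_date` in full (never append)."""
    warehouse[partition_date] = list(rows)
    return len(warehouse[partition_date])

# Running the same day twice yields identical state -- no duplicates.
backfill_partition("2024-05-01", [{"id": 1}, {"id": 2}])
backfill_partition("2024-05-01", [{"id": 1}, {"id": 2}])
assert warehouse["2024-05-01"] == [{"id": 1}, {"id": 2}]
```

The key property is that the task's effect is a function of its input date alone, which is what makes bulk historical backfills safe to retry.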
Mid-level Data Engineer specializing in financial data pipelines and reliability
“Systems/robotics-oriented software engineer focused on real-time orchestration and reliability: built a central control layer coordinating multiple concurrent agents with safe state machines, failure isolation, and recovery. Has hands-on ROS/ROS 2 integration experience in simulation (DDS/QoS, lifecycle nodes in Python/C++) and emphasizes observability (structured JSON logs, correlation IDs) and low-latency control-loop performance under load.”
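The structured-logging approach this profile highlights can be sketched in a few lines: every log record is one JSON object carrying a correlation ID, so all records for a single request or command can be joined downstream. A minimal, stdlib-only illustration (field names are assumptions, not from the profile):

```python
import json
import uuid

def make_log_line(event, correlation_id=None, **fields):
    """Emit one structured JSON log line. A correlation ID ties together
    every record produced while handling a single request/command; if the
    caller doesn't supply one, a fresh UUID is minted at the entry point."""
    record = {
        "event": event,
        "correlation_id": correlation_id or str(uuid.uuid4()),
        **fields,
    }
    return json.dumps(record, sort_keys=True)

# Two log lines from different components share one correlation ID,
# so a log query on that ID reconstructs the whole command's timeline.
cid = "req-42"
line = make_log_line("motor.command", correlation_id=cid,
                     agent="arm_1", state="EXECUTING")
assert json.loads(line)["correlation_id"] == "req-42"
```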
Mid-level Data Engineer specializing in real-time analytics and regulated domains
“Data platform engineer focused on large-scale, real-time fraud systems, with hands-on ownership of streaming architectures using Kafka, Spark, Snowflake, and Databricks. Stands out for combining performance tuning and platform automation with LLM/RAG-based enrichment, delivering measurable improvements in latency, fraud-detection accuracy, false-positive rates, and analyst decision speed.”
Mid-level Data Engineer specializing in scalable pipelines, Spark, and cloud data warehousing
“Backend/data platform engineer who recently owned an end-to-end large-scale financial data platform delivering real-time decision support for finance and operations. Has hands-on experience modernizing legacy batch pipelines into AWS cloud-native ELT with parallel-run cutovers, strong data quality controls (dbt-style tests, reconciliation), and measurable improvements in runtime, cost, and SLA compliance. Also builds scalable, secure FastAPI microservices using Docker, ALB-based horizontal scaling, Redis caching, and managed auth with Cognito/Supabase plus Postgres RLS.”
“Built and productionized an LLM-powered PDF document Q&A system to eliminate manual searching through long documents, focusing on scalability and answer reliability. Implemented semantic chunking (using headings/paragraphs/tables), overlap, and preprocessing/quality checks to reduce hallucinations, and orchestrated the end-to-end pipeline with Airflow using retries, alerts, and parallel tasks.”
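The semantic chunking with overlap described above can be sketched as follows: split on paragraph boundaries, pack paragraphs into chunks under a size budget, and carry the tail of each chunk into the next so context survives the cut. This is a simplified, character-budget version (the actual system also keyed on headings and tables; `max_chars` and `overlap` are illustrative parameters):

```python
def chunk_document(text, max_chars=400, overlap=1):
    """Split `text` on paragraph boundaries, then pack paragraphs into
    chunks of at most `max_chars`, repeating the last `overlap`
    paragraph(s) at the start of the next chunk so that answers spanning
    a boundary remain retrievable."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and sum(len(p) for p in current) + len(para) > max_chars:
            chunks.append("\n\n".join(current))
            current = current[-overlap:]  # carry the tail forward as overlap
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on natural boundaries rather than fixed character offsets is what makes the chunks "semantic": a retrieved chunk is always a set of whole paragraphs, never a sentence cut mid-way.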
Mid-level Data Engineer specializing in real-time pipelines and cloud analytics
“Researcher from the University of South Dakota who built a production medical RAG system to help interpret model predictions by retrieving relevant clinical notes and medical literature, overcoming retrieval-accuracy and imaging-dataset challenges through semantic chunking and metadata-driven indexing. Also has hands-on orchestration experience with Airflow and Azure Data Factory, plus a pragmatic approach to LLM evaluation and stakeholder-driven iteration.”
Senior Data Engineer specializing in Azure Lakehouse, Databricks/Spark, and Snowflake
“Data engineer/platform builder with experience across PwC and Liberty Mutual delivering high-volume, production-grade pipelines and real-time data services. Has owned end-to-end streaming + batch architectures on AWS and Azure, including web scraping systems, with quantified reliability gains (99.9% availability, 90%+ error reduction, 30% latency reduction) and strong observability/CI-CD practices.”
Mid-level Generative AI Engineer specializing in decision intelligence and RAG for regulated enterprises
“Healthcare GenAI engineer who built a HIPAA-compliant, auditable RAG-based claims decision support system at Molina Healthcare, processing 3M claims and delivering major impact (48% faster manual reviews, 43% higher decision accuracy). Deep hands-on experience with LangChain orchestration, vector search (ChromaDB/FAISS), embedding fine-tuning, and safety controls (confidence scoring, rule validation, human-in-the-loop escalation) for clinical workflows.”
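The safety controls this profile names (confidence scoring, rule validation, human-in-the-loop escalation) typically compose into a simple routing gate: auto-decide only when the model is confident and deterministic rule checks pass, otherwise escalate. A hedged sketch of that pattern; the function name, fields, and threshold are illustrative, not details from the Molina system:

```python
def route_claim_decision(claim_id, model_answer, confidence,
                         threshold=0.85, rules_pass=True):
    """Gate an LLM-assisted claims decision: auto-route only when the
    model's confidence clears `threshold` AND deterministic rule
    validation passes; anything else goes to a human reviewer.
    (Threshold value is illustrative.)"""
    if rules_pass and confidence >= threshold:
        return {"claim_id": claim_id, "decision": model_answer,
                "route": "auto"}
    return {"claim_id": claim_id, "decision": None,
            "route": "human_review"}

# Low confidence or a failed rule check both force human review.
assert route_claim_decision("C1", "approve", 0.92)["route"] == "auto"
assert route_claim_decision("C2", "approve", 0.60)["route"] == "human_review"
assert route_claim_decision("C3", "deny", 0.95,
                            rules_pass=False)["route"] == "human_review"
```

Making the escalation path the default (the `else` branch) is the safety-critical design choice: any unanticipated state fails toward a human, not toward an automated decision.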
Mid-level Data Engineer specializing in cloud data pipelines and real-time streaming
“Data engineer with PNC Bank experience owning high-volume financial transaction pipelines end-to-end (Kafka/REST ingestion through Spark/Glue transformations to Redshift serving) for risk and fraud analytics. Built strong reliability and data quality practices (Great Expectations, reconciliation, Airflow alerting, idempotent retries, incremental/windowed processing), reporting 40% ingestion efficiency gains and ~99.9% data accuracy.”
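The reconciliation practice mentioned above usually means comparing source and target after each load on row counts, control totals, and key membership. A minimal sketch of such a post-load gate, assuming transaction rows keyed by `txn_id` with an `amount` column (both names illustrative):

```python
def reconcile(source_rows, target_rows,
              key="txn_id", amount="amount", tolerance=0.0):
    """Compare a load's source and target on row count, total amount,
    and key membership; `ok` is True only when everything matches
    within `tolerance`. Field names are illustrative."""
    src_keys = {r[key] for r in source_rows}
    tgt_keys = {r[key] for r in target_rows}
    report = {
        "count_match": len(source_rows) == len(target_rows),
        "amount_diff": round(sum(r[amount] for r in source_rows)
                             - sum(r[amount] for r in target_rows), 2),
        "missing_in_target": sorted(src_keys - tgt_keys),
        "unexpected_in_target": sorted(tgt_keys - src_keys),
    }
    report["ok"] = (report["count_match"]
                    and abs(report["amount_diff"]) <= tolerance
                    and not report["missing_in_target"]
                    and not report["unexpected_in_target"])
    return report
```

Wired into an orchestrator, a failed `ok` would block downstream publishing and trigger the alerting the profile describes, rather than letting incomplete data reach risk and fraud consumers.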
Mid-level Software Engineer specializing in AI/ML and data platforms
“AI/ML engineer who built a production agentic system to automate computational research experiments (simulation execution, parameter exploration, and numerical analysis) and mitigated context-window failures using constrained tool-calling/prompt-chaining patterns in LangChain with OpenAI tool-enabled models. Also has adtech/big-data pipeline experience at InMobi, orchestrating Spark jobs in Airflow to filter bot-like user IDs and publish clean IDs to an online NoSQL store for live serving, plus Apache open-source collaboration experience.”
Mid-level Data Engineer specializing in cloud data platforms and AI/ML pipelines
“Data-engineering-oriented candidate with hands-on experience building an agentic AI product and operational automation workflows. Automated inventory-to-ERP discrepancy reconciliation with anomaly detection and daily reporting, and has practical scraping/automation experience handling Cloudflare-protected sites using Selenium and Puppeteer.”
Junior AI/ML Software Engineer specializing in LLMs and data-intensive systems
“AI/backend engineer who has owned production applied-ML systems end to end, including a Jitsi meeting intelligence platform with custom RoBERTa boundary detection, LLM summarization, and automated retraining from user feedback. Also has healthcare AI experience building a diabetes medication titration system with strict validation, drift monitoring, and safety guardrails—showing both product speed and high-stakes engineering rigor.”
Senior Data Engineer specializing in cloud data platforms and regulated analytics
“Data engineer at Capital One building AWS-based real-time and batch pipelines and backend data services for financial/fraud use cases. Has owned end-to-end pipelines processing millions of records/day, implemented dbt/Great Expectations quality gates, and tuned Redshift/Snowflake workloads (cutting query latency ~22–25% and reducing pipeline failures ~30–40%) while supporting 15+ downstream consumers.”
Mid-level Data Engineer specializing in cloud data platforms and big data pipelines
“Healthcare data engineer with hands-on ownership of claims/member data pipelines on a cloud analytics platform, spanning batch and streaming ingestion (Airflow/Kafka/Spark/Databricks) through serving for reporting. Emphasizes reliability and data quality via embedded validation, schema-drift detection, deduplication, and operational monitoring/incident response, plus pragmatic CI/CD and observability setup in early-stage/ambiguous projects.”
Mid-level Data Engineer specializing in cloud ETL pipelines (Azure, AWS, GCP)
“Data engineer/backend developer who owned end-to-end pipelines and external data collection systems, including API ingestion and large-scale web scraping. Worked at ~50M records/month scale, improving processing speed by 20% and reducing reporting errors by 15%, and shipped a Rust-based internal data API with versioning, caching, and strong validation/observability practices.”
Mid-level Data Engineer specializing in lakehouse ETL and analytics engineering
“Data engineer with strong end-to-end ownership of production lakehouse pipelines (Snowflake + Databricks + Airflow + dbt + Great Expectations), handling 8M+ records/month and 500K+ daily CDC updates. Delivered measurable reliability and efficiency gains (41% cost reduction, freshness improved from 4h to 30m, 35% fewer downstream incidents) and has experience building a lakehouse platform from scratch across 12 source systems.”
Junior Data Engineer specializing in cloud ETL and big data platforms
“Data engineer focused on transit/transportation datasets, building Spark-based pipelines that ingest from Oracle/APIs, apply PySpark data-quality fixes, and publish star-schema fact tables to Azure Data Lake. Experienced troubleshooting complex Spark failures (using checkpointing to manage long lineage) and operating Airflow-driven backfills and GitLab CI deployments for production DAGs.”
Mid-level AI/ML Engineer specializing in fraud detection and Generative AI
Mid-level Data Engineer specializing in scalable ETL pipelines and data quality automation
Mid-level GenAI Engineer specializing in LLM, RAG, and ML for finance and healthcare
Senior GenAI/ML Engineer specializing in cloud-native multi-agent RAG and MLOps