Vetted Data Engineers in the NYC Metro

Pre-screened and vetted in the NYC Metro.

Rom Manzano

Screened

Executive technical founder and full-stack engineer specializing in AI, SaaS, and FinTech

New York, NY · 15y exp
1848V · UC Berkeley

Engineer coming out of a venture studio as it winds down, now seeking another zero-to-one environment with strong studio support and go-to-market playbooks. They show a thoughtful founder mindset centered on rapid shipping, design-partner validation, lean execution, and testing whether users will actually pay for a workflow-specific solution.


Thuc Duong

Screened

Senior Data Engineer specializing in AI-driven GTM analytics and LLM evaluation

Long Island City, NY · 5y exp
Meta · Temple University

Data/analytics engineer who stood up foundational pipelines and services at Meta for the Ray-Ban Meta launch—building a retailer sales ingestion system (S3/Hive) with rigorous DQ checks, 1-day SLAs, and dimensional rollups used by GTM to track sales trends. Also built a modular multi-retailer web-scraping system for out-of-stock alerts and shipped internal GraphQL APIs and an n8n-like workflow builder using serverless (AWS Lambda) with strong testing and observability practices.
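The DQ-check-plus-dimensional-rollup pattern described above can be sketched in plain Python. This is an illustrative sketch only; the field names, checks, and retailers are assumptions, not details of the candidate's actual Meta system:

```python
from collections import defaultdict
from datetime import date

def dq_check(row):
    """Basic data-quality gate: reject rows that would corrupt rollups."""
    return (
        row.get("retailer") is not None
        and row.get("sale_date") is not None
        and isinstance(row.get("units"), int)
        and row["units"] >= 0
    )

def daily_rollup(rows):
    """Aggregate valid rows into (retailer, sale_date) -> total units,
    the kind of dimensional summary a GTM team would track."""
    valid = [r for r in rows if dq_check(r)]
    rejected = [r for r in rows if not dq_check(r)]
    totals = defaultdict(int)
    for r in valid:
        totals[(r["retailer"], r["sale_date"])] += r["units"]
    return dict(totals), rejected

rows = [
    {"retailer": "BestBuy", "sale_date": date(2024, 5, 1), "units": 10},
    {"retailer": "BestBuy", "sale_date": date(2024, 5, 1), "units": 5},
    {"retailer": None, "sale_date": date(2024, 5, 1), "units": 3},  # fails DQ
]
totals, rejected = daily_rollup(rows)
# totals[("BestBuy", date(2024, 5, 1))] == 15, with one rejected row
```

Running the DQ gate before aggregation means a bad upstream row surfaces as a rejection count rather than a silently wrong sales trend.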

TZ

Mid-level Data Engineer specializing in big data platforms and analytics infrastructure

New York, NY · 7y exp
Meta · University of Illinois Chicago
JV

Staff-level Software Engineer specializing in AI, data platforms, and cloud infrastructure

New York, NY · 8y exp
GrowthLoop · Carnegie Mellon University

Sanketh Reddy

Screened

Senior Data Engineer specializing in cloud data platforms and large-scale ETL

Jersey City, NJ · 6y exp
JPMorgan Chase · University of Texas at Dallas

Data engineer focused on large-scale ETL/ELT pipelines across cloud stacks (GCP and AWS), including Spark-based transformations and orchestration with Airflow. Has experience loading up to ~2TB per BigQuery target table and designing atomic loads to multiple downstream systems (Elasticsearch + Kafka), with Kubernetes deployment and Jenkins CI/CD.
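The "atomic loads to multiple downstream systems" pattern mentioned above can be sketched with in-memory stand-ins. The `Sink` class is a hypothetical placeholder for a real client (e.g. Elasticsearch or a Kafka producer), and real deployments would need transactional writes or index aliases rather than this simplified stage-then-publish:

```python
class Sink:
    """In-memory stand-in for a downstream system (search index, topic, etc.)."""
    def __init__(self, name, fail=False):
        self.name, self.fail = name, fail
        self.staged, self.published = None, []

    def stage(self, batch):
        if self.fail:
            raise IOError(f"staging to {self.name} failed")
        self.staged = batch

    def publish(self):
        self.published.extend(self.staged)
        self.staged = None

def atomic_load(batch, sinks):
    """Stage the batch to every sink first; publish only if all stages succeed.
    If any stage fails, nothing publishes, so downstream systems stay in sync."""
    try:
        for s in sinks:
            s.stage(batch)
    except IOError:
        for s in sinks:
            s.staged = None  # roll back anything already staged
        return False
    for s in sinks:
        s.publish()
    return True
```

With one failing sink, no sink publishes; with all healthy sinks, every sink sees the same batch.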

SM

Mid-level AI/ML Engineer specializing in Generative AI and enterprise machine learning

New York, NY · 4y exp
Broadcom · University of Central Missouri

Rahul Reddy

Screened

Senior Data Engineer specializing in cloud data platforms and big data pipelines

New York, NY · 6y exp
CVS Health · Southern Arkansas University

Data engineer with healthcare (CVS Health) experience who migrated production PySpark workloads to native BigQuery SQL and built a Great Expectations-based validation microservice on GKE (Flask + REST) integrated into Cloud Composer. Has operated high-volume pipelines (~300–400GB/day) and designed external vendor ingestion on AWS (Lambda/Step Functions/Glue) with schema-drift detection, alerting, and backfill-safe controls to protect downstream Snowflake/BigQuery tables.
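The schema-drift detection mentioned above can be sketched as a comparison of incoming records against an expected contract, quarantining drifted records instead of letting them reach the warehouse. The field names and types here are illustrative assumptions, not the candidate's actual vendor schema:

```python
EXPECTED_SCHEMA = {"member_id": str, "claim_amount": float, "service_date": str}

def detect_drift(record):
    """Return a list of drift issues: missing fields, type mismatches,
    or unexpected new fields relative to the expected contract."""
    issues = []
    for field, ftype in EXPECTED_SCHEMA.items():
        if field not in record:
            issues.append(f"missing:{field}")
        elif not isinstance(record[field], ftype):
            issues.append(f"type:{field}")
    for field in record:
        if field not in EXPECTED_SCHEMA:
            issues.append(f"unexpected:{field}")
    return issues

def split_batch(batch):
    """Route clean records onward; quarantine drifted ones for alerting."""
    clean, quarantined = [], []
    for rec in batch:
        (quarantined if detect_drift(rec) else clean).append(rec)
    return clean, quarantined
```

Quarantining rather than dropping keeps the pipeline backfill-safe: once the contract is updated, the quarantined batch can be replayed.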

RH

Senior Data & AI/ML Engineer specializing in LLM/NLP platforms and cloud data engineering

Bronx, NY · 11y exp
CBRE · NYU
RP

Mid-level Data Engineer specializing in LLM agents, RAG pipelines, and LLMOps

New York, US · 6y exp
mcSquared AI · University at Buffalo
PP

Senior Data Engineer specializing in Cloud Data Platforms and Generative AI

Brooklyn, NY · 11y exp
JPMorgan Chase · Osmania University
SP

Junior AI/ML Software Engineer specializing in LLMs and data-intensive systems

New York, NY · 3y exp
NYU Langone Health · NYU
KP

Mid-level Data Engineer specializing in GCP, Spark, and healthcare analytics

New York, NY · 3y exp
CVS Health · Columbia University
SG

Mid-level Data Engineer specializing in streaming and cloud data platforms for financial services

Edison, NJ · 3y exp
Morgan Stanley · Pace University

Data engineering-focused candidate (internship/project experience) who built end-to-end pipelines processing a few million transactional records/day for fraud detection and reporting, using Airflow, Python/SQL, and PySpark with strong emphasis on data quality gates, idempotency, and monitoring. Also implemented an external web/API data collection system with anti-bot tactics and schema-change quarantine, and shipped a versioned Flask API to serve curated warehouse data.
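The idempotency-plus-quality-gate emphasis above can be sketched as a loader keyed by batch id, so a retried Airflow run never double-counts. The warehouse is simulated with a dict, and the checks are illustrative assumptions rather than the candidate's actual rules:

```python
def quality_gate(records, min_rows=1):
    """Fail the batch before load if it is empty or has null transaction ids."""
    if len(records) < min_rows:
        raise ValueError("batch too small")
    if any(r.get("txn_id") is None for r in records):
        raise ValueError("null txn_id in batch")

def idempotent_load(warehouse, batch_id, records):
    """Overwrite-by-batch_id semantics: rerunning the same batch (an Airflow
    retry or backfill) replaces its rows instead of duplicating them."""
    quality_gate(records)
    warehouse[batch_id] = list(records)  # overwrite, never append
    return sum(len(v) for v in warehouse.values())

warehouse = {}
idempotent_load(warehouse, "2024-05-01", [{"txn_id": 1}, {"txn_id": 2}])
idempotent_load(warehouse, "2024-05-01", [{"txn_id": 1}, {"txn_id": 2}])  # retry
# total row count stays 2, not 4
```

In a real warehouse the same effect usually comes from delete-then-insert on the partition key or `MERGE` statements.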


Zhiwen Zhao

Screened

Junior Data Engineer specializing in cloud ETL and big data platforms

New York, NY · 3y exp
Bank of China · NYU

Data engineer focused on transit/transportation datasets, building Spark-based pipelines that ingest from Oracle/APIs, apply PySpark data-quality fixes, and publish star-schema fact tables to Azure Data Lake. Experienced troubleshooting complex Spark failures (using checkpointing to manage long lineage) and operating Airflow-driven backfills and GitLab CI deployments for production DAGs.

SK

Mid-level Data Engineer specializing in cloud ETL and analytics

New York, NY · 7y exp
Verizon · New England College
NN

Senior Data Engineer specializing in cloud ELT/ETL and data warehousing

New York, NY · 4y exp
American Express · Weber State University
TS

Mid-level Data Engineer specializing in lakehouse and cloud data platforms

New York, NY · 3y exp
Invisible Technologies · Rutgers University–New Brunswick
RR

Junior Data Scientist specializing in analytics automation and BI dashboards

Newark, NJ · 2y exp
Public Service Enterprise Group · Boston University
SR

Mid-level Data Engineer specializing in Azure data platforms and near real-time pipelines

New York, USA · 4y exp
ServiceNow · University of Missouri-Kansas City
CK

Mid-level Data Engineer specializing in financial data engineering and scalable pipelines

Jersey City, NJ · 4y exp
JPMorgan Chase
PV

Mid-level Machine Learning Engineer specializing in LLM agents, RAG, and MLOps

New York City, NY · 6y exp
Avanade · University of North Texas

Built a production AI-driven contract/document extraction system combining OCR, normalization, and LLM schema-guided extraction, orchestrated with PySpark and Azure Data Factory and loaded into PostgreSQL for analytics. Emphasizes reliability at scale—using strict JSON schemas, confidence scoring, targeted retries, and multi-layer validation to control hallucinations while processing thousands of PDFs per hour—and partners closely with non-technical business teams to refine fields and deliver usable dashboards.
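The validate-and-retry approach described above can be sketched as follows. `call_llm` is a hypothetical stand-in for the model call, and the schema fields and confidence threshold are illustrative assumptions, not the system's real contract:

```python
import json

SCHEMA_FIELDS = {"party": str, "effective_date": str, "amount": float}

def validate(payload):
    """Reject anything that is not valid JSON matching the strict schema."""
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return None
    if set(data) != set(SCHEMA_FIELDS) | {"confidence"}:
        return None
    if not all(isinstance(data[f], t) for f, t in SCHEMA_FIELDS.items()):
        return None
    return data

def extract(document, call_llm, min_confidence=0.8, max_retries=2):
    """Call the (stand-in) LLM, validate against the schema, and retry a
    bounded number of times; low-confidence results are flagged for review."""
    for _ in range(max_retries + 1):
        data = validate(call_llm(document))
        if data is not None:
            data["needs_review"] = data["confidence"] < min_confidence
            return data
    return {"needs_review": True, "error": "extraction failed"}
```

Rejecting malformed output and bounding retries is what keeps hallucinated fields out of the PostgreSQL tables: anything that never validates lands in a review queue instead of the analytics layer.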

