Vetted Apache Hive Professionals

Pre-screened and vetted.

Vasavi Mittapalli

Screened

Senior Data Scientist specializing in GenAI, LLMs and RAG

Dallas, TX | 5y exp

Texas Instruments | Trine University

“Built and deployed a production LLM-powered RAG assistant for semiconductor manufacturing failure analysis, reducing engineer triage effort by grounding outputs in retrieved evidence and gating responses with SPC + ML signals (LSTM anomaly scores, XGBoost probabilities). Uses LangChain/LangGraph to ship reliable, observable multi-step agents with branching/fallback logic, and evaluates impact using both technical metrics and business KPIs like mean time to triage and downtime reduction.”

A/B Testing, Agile, Amazon DynamoDB, Amazon EC2, Amazon EMR, Amazon Kinesis (+195 more)

Jaswanth Vakkala

Screened

Mid-level Generative AI Engineer specializing in enterprise RAG and multimodal NLP

Iselin, NJ | 5y exp

Wells Fargo | St. Francis College

“Built and deployed a production LLM/RAG chatbot at Wells Fargo for securely querying regulated financial and compliance documents, emphasizing low hallucination rates, explainability, and strict governance. Experienced with LangChain multi-agent orchestration plus Airflow/Prefect pipelines for ingestion, embeddings, evaluation, and retraining, and partnered closely with compliance/operations to drive adoption through demos and feedback-driven retrieval rules.”

A/B Testing, Anomaly Detection, Apache Hadoop, Apache Hive, Apache Spark, AWS (+224 more)

Harshitha Kotari

Screened

Mid-level Data/ML Engineer specializing in NLP, GenAI, and scalable data pipelines

5y exp

Abbott | Clarkson University

“AI/ML engineer with production experience building LLM-powered document intelligence and customer support systems in healthcare/insurance, emphasizing high-accuracy RAG, long-document processing, and robust monitoring/fallback mechanisms. Also automates and scales ML lifecycle workflows using Apache Airflow and Kubeflow, and partners closely with non-technical operations stakeholders to drive adoption.”

Python, R, SQL, Java, MATLAB, HTML (+148 more)

Bernard Griffin

Screened

Senior Data Scientist / ML Engineer specializing in cloud ML pipelines and GenAI

Baltimore, MD | 17y exp

Intel | Illinois Institute of Technology

“ML/NLP practitioner with experience building a transformer-failure prediction system that combines sensor signals with unstructured maintenance comments using LLM-based extraction and similarity validation. Strong emphasis on production readiness—data leakage controls, SQL-driven data quality tiers, and rigorous bias/fairness validation (including contract/spec evaluation across diverse company profiles).”

A/B Testing, Amazon Athena, Amazon Bedrock, Amazon EC2, Amazon EMR, Amazon Kinesis (+130 more)

Dhanalakshmi Jammisetti

Screened

Mid-level Full-Stack Developer specializing in cloud microservices and internal tooling

4y exp

The Home Depot | University of Central Missouri

“LLM/RAG engineer who has shipped production systems in high-stakes domains (fraud analytics at Mastercard and security compliance as a CI/CD gate). Strong focus on reliability: hybrid retrieval for latency, citation-backed outputs for trust, and code-driven eval/regression pipelines using golden datasets. Also built scalable OCR-based ingestion for messy classroom artifacts (handwriting, PDFs, whiteboard photos) using Go/Python and cloud services.”

.NET, Accessibility, Agile, Angular, API Development, API Gateway (+246 more)

Yu Liu

Screened

Senior Big Data Engineer specializing in AML/KYC compliance and cloud data platforms

New York, NY | 17y exp

Citigroup | University of Missouri

“Data engineer with experience delivering an end-to-end pipeline handling ~3.5TB in a star-schema setup (fact + dimensions) and producing business-facing tables in Hive/Spark. Identified and resolved UAT-reported duplicate issues caused by joins through root-cause analysis, and also built automation to run Spark SQL metrics on weekly/monthly/quarterly cadences and distribute results to users.”

Python, JavaScript, Shell Scripting, SQL, MySQL, PostgreSQL (+110 more)

Bhavya Sree Ganja

Screened

Senior Data Engineer specializing in cloud lakehouse platforms and streaming analytics

Pittsburgh, PA | 8y exp

First National Bank | Texas A&M University-Corpus Christi

“Data engineer focused on fraud and banking analytics who has owned end-to-end batch + streaming pipelines at very large scale (hundreds of millions of records/day). Built robust data quality/observability layers (schema validation, anomaly detection, alerting) and delivered low-latency serving via AWS Lambda/API Gateway with DynamoDB + Redis, plus external data ingestion/scraping pipelines orchestrated in Airflow with anti-bot protections.”

Agile, Amazon API Gateway, Amazon Athena, Amazon CloudWatch, Amazon DynamoDB, Amazon EC2 (+210 more)

Aditya Jhaveri

Screened

Mid-level Software Engineer specializing in AI, big data, and distributed systems

Jersey City, NJ | 3y exp

New York University | NYU

“Software Developer at NYU (GEMSS) focused on scaling and optimizing a data-heavy asset management web app, including migrating/optimizing data access via Google Sheets API and Firestore. Previously an SDE at Sainapse working on Spring Boot microservices POCs (Kafka, Hadoop at 2B+ record scale). Built an end-to-end Apple Wallet coupon generation/redemption system using PassKit + Google Apps Script with measurable ops impact (40% efficiency gain).”

Agile, Algorithms, Anomaly Detection, Apache Hadoop, Apache Hive, Apache Kafka (+124 more)

Deepthi Mundarinti

Screened

Mid-level Data Engineer specializing in real-time analytics and regulated domains

NC, USA | 5y exp

JPMorgan Chase | Saint Louis University

“Data platform engineer focused on large-scale, real-time fraud systems, with hands-on ownership of streaming architectures using Kafka, Spark, Snowflake, and Databricks. Stands out for combining performance tuning and platform automation with LLM/RAG-based enrichment, delivering measurable gains in latency, fraud accuracy, false positives, and analyst decision speed.”

Python, NumPy, Pandas, PySpark, Scikit-learn, TensorFlow (+120 more)

Sathyavarthan Balachandar

Screened

Mid-level Data Engineer specializing in scalable pipelines, Spark, and cloud data warehousing

Boston, USA | 3y exp

Fidelity Investments | Northeastern University

“Backend/data platform engineer who recently owned an end-to-end large-scale financial data platform delivering real-time decision support for finance and operations. Has hands-on experience modernizing legacy batch pipelines into AWS cloud-native ELT with parallel-run cutovers, strong data quality controls (dbt-style tests, reconciliation), and measurable improvements in runtime, cost, and SLA compliance. Also builds scalable, secure FastAPI microservices using Docker, ALB-based horizontal scaling, Redis caching, and managed auth with Cognito/Supabase plus Postgres RLS.”

Python, SQL, Go, Apache Spark, PySpark, Databricks (+125 more)

Bhuvan Chandi

Screened

Mid-level Data Engineer specializing in AI/ML data platforms

NY, NY | 6y exp

BlackRock | Webster University

“Built and productionized an LLM-powered PDF document Q&A system to eliminate manual searching through long documents, focusing on scalability and answer reliability. Implemented semantic chunking (using headings/paragraphs/tables), overlap, and preprocessing/quality checks to reduce hallucinations, and orchestrated the end-to-end pipeline with Airflow using retries, alerts, and parallel tasks.”

Python, SQL, Shell Scripting, Apache Spark, PySpark, Apache Hadoop (+103 more)

Sai Raja Ramya Bhavana Thota

Screened

Senior Data Scientist specializing in machine learning and customer analytics

Illinois, USA | 7y exp

Northern Trust | Bradley University

“Data/ML practitioner with experience applying NLP and classical ML to large-scale customer data (2B+ records) for segmentation, prediction, and survey-text classification, delivering measurable business impact (~18% engagement efficiency). Has hands-on entity resolution across multi-source datasets and has built embedding-based semantic search using SentenceBERT + a vector database with domain fine-tuning (~20% relevance improvement), plus production workflow experience with Spark/Airflow and cloud tooling (AWS/Azure).”

A/B Testing, Analytics, Azure Machine Learning, Bash, BigQuery, C (+195 more)

Devender Kunta

Screened

Senior Data Engineer specializing in Azure Lakehouse, Databricks/Spark, and Snowflake

Richardson, TX | 6y exp

PwC | University of Central Missouri

“Data engineer/platform builder with experience across PwC and Liberty Mutual delivering high-volume, production-grade pipelines and real-time data services. Has owned end-to-end streaming + batch architectures on AWS and Azure, including web scraping systems, with quantified reliability gains (99.9% availability, 90%+ error reduction, 30% latency reduction) and strong observability/CI-CD practices.”

AWS, Databricks, Apache Spark, PySpark, Scala, Python (+109 more)

Vaibhav Sharma

Screened

Mid-level Software Engineer specializing in AI/ML and data platforms

Remote, USA | 5y exp

Google | Indiana University Bloomington

“AI/ML engineer who built a production agentic system to automate computational research experiments (simulation execution, parameter exploration, and numerical analysis) and mitigated context-window failures using constrained tool-calling/prompt-chaining patterns in LangChain with OpenAI tool-enabled models. Also has adtech/big-data pipeline experience at InMobi, orchestrating Spark jobs in Airflow to filter bot-like user IDs and publish clean IDs to an online NoSQL store for live serving, plus Apache open-source collaboration experience.”

A/B Testing, Apache Airflow, Apache Hadoop, Apache Hive, Apache Kafka, Apache Spark (+100 more)

Harshavardhan Reddy

Screened

Mid-level AI/ML Data Scientist specializing in NLP, computer vision, and risk analytics

Albany, NY | 5y exp

Capital One | Pace University

“ML/AI engineer with Capital One experience building production-grade customer segmentation and fraud detection systems combining NLP (transformers) and anomaly detection. Strong MLOps and orchestration background (PySpark ETL, MLflow, Airflow, Docker/Kubernetes, Azure ML) with real-time monitoring/alerting and performance optimizations like quantization and caching, plus proven ability to deliver business-facing insights through Power BI/Tableau for marketing stakeholders.”

Python, R, SQL, PySpark, Scala, Java (+105 more)

Yinghai Yu

Screened

Mid-level Data Engineer specializing in cloud data platforms and AI/ML pipelines

San Mateo, CA | 6y exp

Bubbles and Books | Georgia Tech

“Data-engineering-oriented candidate with hands-on experience building an agentic AI product and operational automation workflows. They described automating inventory-to-ERP discrepancy reconciliation with anomaly detection and daily reporting, and also have practical scraping/automation experience dealing with Cloudflare-protected sites using Selenium and Puppeteer.”

Python, Pandas, NumPy, Scikit-learn, Scala, Java (+87 more)

Shruti Pangare

Screened

Junior AI/ML Software Engineer specializing in LLMs and data-intensive systems

New York, NY | 3y exp

NYU Langone Health | NYU

“AI/backend engineer who has owned production applied-ML systems end to end, including a Jitsi meeting intelligence platform with custom RoBERTa boundary detection, LLM summarization, and automated retraining from user feedback. Also has healthcare AI experience building a diabetes medication titration system with strict validation, drift monitoring, and safety guardrails—showing both product speed and high-stakes engineering rigor.”

Python, SQL, PL/SQL, R, Pandas, NumPy (+147 more)

Richard Wicaksono

Screened

Junior Data Engineer and Analyst specializing in ETL, analytics, and e-commerce data

Walnut, CA | 3y exp

Dreamstream, LLC | UC Irvine

“Data engineer with a Master's in Data Science who has owned 30+ customer-facing K-12 SIS migrations end-to-end, building ETL, validation, and SOP-driven deployment processes in a PII-sensitive environment. Also brings recent hands-on agentic AI experience from a biotech capstone, where they led a production-oriented NLP-to-SQL + RAG support system that handled about 30% of support queries in testing.”

Python, Pandas, SQL, R, Java, C++ (+66 more)

Anuj Bubna

Screened

Senior DevOps/SRE Engineer specializing in cloud automation, reliability, and data pipelines

10y exp

Intuit | University of Texas at Dallas

“Hands-on technical professional experienced in taking LLM/AI-adjacent integrations from prototype to production, using customer observation to refine UX and uncover edge cases. Diagnoses workflow issues in real time using logs and Sankey-style workflow analysis, and communicates fixes with clear short/long-term plans plus proactive alerting. Also partners cross-functionally to drive adoption and cost savings, including a POC around IBM Sterling Integrator that reduced licensing costs by $30K/year.”

DevOps, Automation, Retrieval-Augmented Generation (RAG), Data Engineering, ETL, AWS (+121 more)

Giri Nathan

Screened

Executive Technology Leader (CTO/CIO) specializing in cloud, AI/ML, and cybersecurity

38y exp

Production Resource Group | Charter Oak State College

“CTO who ties technology strategy directly to business outcomes, building multi-year roadmaps with measurable ROI. Led major modernization (cloud, data platform, unified API, microservices + CI/CD) delivering 5x faster releases/deployments, 99.8% uptime, and 40% user growth without headcount increases, while scaling engineering from 15 to 80+ in ~18 months.”

Leadership, Strategic planning, Mentoring, Coaching, Team management, Budgeting (+108 more)

Prem Kumar

Screened

Senior Data Engineer specializing in cloud data platforms and regulated analytics

McLean, VA | 6y exp

Capital One | Rowan University

“Data engineer at Capital One building AWS-based real-time and batch pipelines and backend data services for financial/fraud use cases. Has owned end-to-end pipelines processing millions of records/day, implemented dbt/Great Expectations quality gates, and tuned Redshift/Snowflake workloads (cutting query latency ~22–25% and reducing pipeline failures ~30–40%) while supporting 15+ downstream consumers.”

Python, SQL, PySpark, Scala, Java, Bash (+152 more)

Snehitha Borra

Screened

Mid-level Data Engineer specializing in cloud data platforms and big data pipelines

5y exp

Molina Healthcare | University of Michigan-Dearborn

“Healthcare data engineer with hands-on ownership of claims/member data pipelines on a cloud analytics platform, spanning batch and streaming ingestion (Airflow/Kafka/Spark/Databricks) through serving for reporting. Emphasizes reliability and data quality via embedded validation, schema-drift detection, deduplication, and operational monitoring/incident response, plus pragmatic CI/CD and observability setup in early-stage/ambiguous projects.”

Python, Scala, SQL, PySpark, Shell Scripting, PowerShell (+128 more)

Dinesh Kumar Patibandla

Screened

Mid-level Machine Learning Engineer specializing in LLMs and RAG for finance and healthcare

Texas, USA | 4y exp

Goldman Sachs | University of North Texas

“ML Engineer with recent Goldman Sachs experience building and deploying a production RAG/LLM assistant for summarization, drafting, and internal knowledge retrieval across financial, risk, and compliance documents. Designed for heavy regulatory constraints and scaled to 10,000+ concurrent users using Kubernetes-based orchestration, dynamic LLM routing, and rigorous testing (adversarial prompts, A/B tests, load simulations) with privacy controls like differential privacy.”

A/B Testing, Apache Hadoop, Apache Hive, Apache Spark, AWS, BERT (+118 more)

Pavan Kumar Malasani

Screened

Mid-level AI/ML Engineer specializing in financial risk, fraud detection, and GenAI

Remote, USA | 4y exp

Citigroup | University of Colorado Boulder

“GenAI/ML engineer in Citigroup’s finance environment who has deployed production RAG systems for investment banking under strict privacy and model-risk constraints. Built an internal-VPC Llama2 + Pinecone + LangChain solution with NER redaction and citation-based verification to prevent hallucinations, delivering major time savings, and also partnered with global finance executives to ship an AI early-warning indicator for treasury/liquidity risk.”

A/B Testing, Amazon CloudWatch, Apache Airflow, Apache Hive, Apache Kafka, Apache Spark (+137 more)
