Mid-level Data Scientist specializing in GenAI, RAG, and forecasting
New Jersey, USA | Research Assistant | 4 years experience | Mid-Level | Healthcare | Healthcare IT | Education
Screened | Identity Verified
About
ML/NLP engineer focused on large-scale data linking for e-commerce-style catalogs and customer records, combining transformer embeddings (BERT/Sentence-BERT), NER, and FAISS-based vector search. Has delivered measurable lifts (e.g., +30% matching accuracy, Precision@10 62%→84%) and built production-grade, scalable pipelines in Airflow/PySpark with strong data quality and schema-drift handling.
Experience
Research Assistant, University at Buffalo
Graduate Student Assistant, University at Buffalo
Data Analyst, Tata Consultancy Services Ltd.
Full-Stack Software Engineering Intern, QSpiders
Education
University at Buffalo, Master's, Data Science (2025)
Visvesvaraya Technological University, Bachelor's, Information Science (2020)
Key Strengths
Built NLP pipeline to unify multi-vendor product catalogs using BERT/DistilBERT + NER + FAISS
Improved product matching accuracy by ~30%
Designed scalable entity resolution with hybrid blocking + BERT similarity; scaled to tens of millions of records
Improved entity match accuracy by ~25% vs prior system
Improved semantic search relevance via Sentence-BERT fine-tuning; Precision@10 from 62% to 84% and ~35% relevance lift
Production-grade data workflow engineering (Airflow, PySpark, Docker, CI/CD, monitoring, data quality checks)
Handled vendor schema drift with automated schema validation and dynamic mapping layer
Built and deployed a RAG-based compliance/legal document review system
Designed for high-precision retrieval to reduce compliance risk
Optimized retrieval and model performance for large-scale data (latency/compute constraints)
Hands-on Airflow orchestration for ETL + ML pipelines at scale (5M+ daily records)
End-to-end pipeline design from S3/Glue/PySpark to Redshift with DAG-based scheduling
Structured agent/workflow testing approach (unit + integration) with measurable metrics
Production monitoring with real-time dashboards and stakeholder feedback loops
Effective collaboration with non-technical stakeholders (marketing/compliance) translating requirements into usable outputs