Pre-screened and vetted.
Junior Data Scientist/Data Analyst specializing in machine learning and business intelligence
Junior Machine Learning Engineer specializing in NLP and LLM-based clinical AI
“Built a production automated resume matching system using Python, FAISS vector search, and Selenium-based job scraping, including mitigation for IP blocking and heterogeneous site structures. Also develops LLM/RAG applications with LangChain, using Pydantic-guardrailed structured outputs and LLM-as-a-judge evaluation (including a project focused on tone/semantics for a 3D avatar’s emotional responses).”
Junior Full-Stack Data Engineer specializing in data pipelines and analytics
“Built and shipped a production-grade RAG-powered news summarization and Q&A product, tackling real-world issues like retrieval drift, hallucinations, latency, and autoscaling deployment (Docker + FastAPI + Streamlit Cloud). Experienced in end-to-end ML/LLM workflow automation using Airflow, Kubeflow Pipelines, and MLflow, with demonstrated business impact (40% inference precision improvement) through close collaboration with non-technical stakeholders at Evoastra Ventures.”
“Built and deployed an LLM-powered financial document processing and summarization platform at Morgan Stanley using a production RAG pipeline (PDF ingestion, embedding-based retrieval, schema-constrained JSON outputs) delivered via FastAPI microservices on Kubernetes. Drove measurable impact (40% reduction in manual review time) and improved factual accuracy for numeric fields by 30% through metadata-aware retrieval, strict schemas, and post-generation validation, with a human feedback loop from financial analysts.”
“Built an automated ML/NLP document classification system for unstructured legal documents, combining classical models (TF-IDF + logistic regression/random forest) with entity resolution via fuzzy matching validated by precision/recall. Also implemented semantic similarity search using sentence embeddings stored in FAISS and improved matching by fine-tuning a transformer on domain-specific data and tuning similarity thresholds for fewer false positives.”