Hi, I'm Sanjana. I turn data into insight and create impact.
Data Scientist | AI Researcher

About Me
With two degrees and nearly four years of hands-on experience working with data, I thrive at the intersection of data analysis and engineering. Whether it's building scalable data pipelines, automating workflows with Python, uncovering insights with SQL, or deploying solutions using AWS and Spark, I enjoy making data systems both powerful and elegant.
If you're working on something impactful (or just want to nerd out about LLMs, visualizations, or what makes a clean data pipeline), let's connect.
Storyteller
Coffee Lover
Writer
Visual Learner
Featured Projects
Panel Detection in Double Feature Comics
Computer Vision
Image Processing
Python
Algorithm Design
Work Experience
Associate Researcher and Instructor
Indiana University Luddy School of Informatics, Computing and Engineering
Aug 2024 - Present
- Designed an AI-driven knowledge navigation tool using graph databases such as Neo4j and retrieval-augmented generation (RAG) with LLMs, improving information retrieval efficiency.
- Optimized model scoring techniques, prompt engineering strategies, and agentic LLM implementations, improving AI-driven decision-making processes.
- Enhanced chunking and document segmentation methodologies, increasing information retrieval accuracy for the CEDS Planning Process and improving structured document understanding.
Associate Data Scientist
MathCo
Jun 2021 - Jul 2022
- Built ETL pipelines with Apache Airflow and PySpark, integrated with AWS, GCP, and Azure, reducing manual effort by 40%.
- Developed supply chain forecasting models that reduced stockouts by 15% and saved $2M annually.
- Created custom BI dashboards for logistics and operations teams, increasing visibility into performance metrics.
- Led data cataloging initiatives to improve asset discoverability and standardized metadata practices.
- Collaborated on implementation of role-based access controls and governance across cloud data environments.
Data Analyst
Pratham Books
Aug 2020 - Jun 2021
- Improved educational content discoverability by 130% using Bayesian modeling and semantic search optimization.
- Designed executive dashboards to monitor national UNICEF campaigns reaching 300K+ schools.
- Consolidated real-time engagement data across MongoDB and Cassandra sources for unified analytics.
- Conducted A/B testing to optimize outreach strategies, resulting in a 25% increase in user interaction.
Software Development Engineer
Adobe
Jun 2018 - Jun 2019
- Built a marketing mix model using Python, SQL, and time-series forecasting (ARIMA, Prophet), leading to $5M in revenue growth.
- Designed and maintained scalable big data pipelines (Hadoop, Hive, Spark), improving delivery speeds by 30%.
- Reduced data validation errors by 15% through automated quality checks and pipeline alerts.
- Produced interactive analytics dashboards for global marketing teams to track campaign ROI.
Skills & Tools
I blend structured logic with creative problem solving.
Languages
Python
Java
Scala
C++
SQL
T-SQL
PSQL
Cloud & Big Data
AWS (S3, Redshift)
GCP (BigQuery, Vertex AI)
Azure (Databricks, Synapse)
Hadoop
Spark
Databases
Snowflake
Cassandra
HBase
MongoDB
PostgreSQL
MySQL
Data Engineering
Airflow
PySpark
ETL automation
CI/CD for data pipelines
Visualization & BI
Tableau
Power BI
Matplotlib
Seaborn
Plotly
Data Modeling
Dimensional modeling
Data marts
Semantic layers
Graph modeling (Neo4j)
Governance & Security
Data cataloging
Access control policies
Hybrid-cloud architecture