Hi, I'm Sanjana —
I turn data into insight and create impact.

Data Scientist | AI Researcher

Sanjana Jairam

About Me

With two degrees and nearly four years of hands-on experience working with data, I thrive at the intersection of data analysis and engineering. Whether it's building scalable data pipelines, automating workflows with Python, uncovering insights with SQL, or deploying solutions using AWS and Spark, I enjoy making data systems both powerful and elegant.

If you're working on something impactful (or just want to nerd out about LLMs, visualizations, or what makes a clean data pipeline), let's connect.

Storyteller
Coffee Lover
Writer
Visual Learner

Featured Projects

Rideshare Crashes in NYC

Data Analysis
Visualization
Transportation
Python

Visualizing the Eras Tour

Streamlit
Data Visualization
Web App
Python

Emi Bot

Flask
Python
Web App
Human-Robot Interaction

Panel Detection in Double Feature Comics

Computer Vision
Image Processing
Python
Algorithm Design

Abusive Language Detection

NLP
Machine Learning
XGBoost
GloVe

Tennis Video Classification

Video Processing
Machine Learning
Classification
Python

Work Experience

Associate Researcher and Instructor

Indiana University Luddy School of Informatics, Computing and Engineering

Aug 2024 - Present

  • Designed an AI-driven knowledge navigation tool using graph databases such as Neo4j and retrieval-augmented generation (RAG) with LLMs, enhancing information retrieval efficiency.
  • Optimized model scoring techniques, prompt engineering strategies, and agentic LLM implementation, improving AI-driven decision-making processes.
  • Enhanced chunking and document segmentation methodologies, increasing information retrieval accuracy for the CEDS Planning Process and improving structured document understanding.

Associate Data Scientist

MathCo

Jun 2021 - Jul 2022

  • Built ETL pipelines with Apache Airflow and PySpark, integrated with AWS, GCP, and Azure, reducing manual effort by 40%.
  • Developed supply chain forecasting models that reduced stockouts by 15% and saved $2M annually.
  • Created custom BI dashboards for logistics and operations teams, increasing visibility into performance metrics.
  • Led data cataloging initiatives to improve asset discoverability and standardized metadata practices.
  • Collaborated on implementation of role-based access controls and governance across cloud data environments.

Data Analyst

Pratham Books

Aug 2020 - Jun 2021

  • Improved educational content discoverability by 130% using Bayesian modeling and semantic search optimization.
  • Designed executive dashboards to monitor national UNICEF campaigns reaching 300K+ schools.
  • Consolidated real-time engagement data across MongoDB and Cassandra sources for unified analytics.
  • Conducted A/B testing to optimize outreach strategies, resulting in a 25% increase in user interaction.

Software Development Engineer

Adobe

Jun 2018 - Jun 2019

  • Built a marketing mix model using Python, SQL, and time-series forecasting (ARIMA, Prophet), leading to $5M in revenue growth.
  • Designed and maintained scalable big data pipelines (Hadoop, Hive, Spark), improving delivery speeds by 30%.
  • Reduced data validation errors by 15% through automated quality checks and pipeline alerts.
  • Produced interactive analytics dashboards for global marketing teams to track campaign ROI.

Skills & Tools

I blend structured logic with creative problem solving.

Languages
Python
Java
Scala
C++
SQL
T-SQL
PSQL

Cloud & Big Data
AWS (S3, Redshift)
GCP (BigQuery, Vertex AI)
Azure (Databricks, Synapse)
Hadoop
Spark

Databases
Snowflake
Cassandra
HBase
MongoDB
PostgreSQL
MySQL

Data Engineering
Airflow
PySpark
ETL automation
CI/CD for data pipelines

Visualization & BI
Tableau
Power BI
Matplotlib
Seaborn
Plotly

Data Modeling
Dimensional modeling
Data marts
Semantic layers
Graph modeling (Neo4j)

Governance & Security
Data cataloging
Access control policies
Hybrid-cloud architecture