Sandeep Bhupatiraju

About

I build AI systems for governance, legal, and institutional applications, working with large collections of court judgments, policy documents, and administrative records.

I have a Ph.D. in Mathematics (Probability Theory) from Indiana University Bloomington. That background shapes how I think about text at scale: finding structure in noisy documents and quantifying things that seem hard to measure.

I've built large-scale datasets and used machine learning to study bias in judicial systems, analyze policy documents, and extract structured information from bureaucratic text. I'm interested in how these tools can surface patterns that would otherwise be invisible.

I currently work with organizations in international development, public finance, and legal tech.

Experience

AI Architect (Consultant)

Paradigm Case Management, Denver · 2024–Present

Developed the AI architecture and engineering for a legal workflow automation platform serving prosecutors and case management professionals. Designed and deployed production agentic systems using LangGraph and Neo4j knowledge graphs. Built multi-agent structured information extraction systems and event-driven workflows using AWS Lambda and Bedrock.

Consultant (AI & Data Analytics)

World Bank Group, Washington D.C. · 2018–Present

Lead AI solutions development and large-scale data engineering for governance analytics across multi-country projects. Developed production RAG systems for knowledge search using Azure AI Search, expertise discovery and retrieval platforms, and data pipelines for processing and automating BOOST expenditure and budget data from various countries. Built large-scale legal datasets covering India (83M+ district court cases), Kenya, and Indonesia (12M records).

Consultant (AI & NLP)

Global Environment Facility (GEF), Washington D.C. · 2024–2025

Developed structured information extraction to track technology use across the GEF project corpus, including classification and taxonomy development. Analyzed gaps between advisory recommendations and implementation. Built policy coherence identification tools and automatic feature extraction that replicates expert analysis patterns.

Team Leadership

AI Architecture & Engineering

Paradigm Case Management · 2024–Present

Lead technical architecture and cross-functional collaboration with the backend engineering team. Drive architecture decisions for AI features, coordinate development priorities, and ensure alignment between AI capabilities and product requirements.

Team Lead, DeJure Projects

World Bank Group · 2019–2022

Led a team of 10 across multiple research projects, making technical decisions and managing deliverables. Coordinated data collection and analytics workflows, organized task distribution, and provided technical mentorship on data processing methodologies. Established success metrics and maintained alignment with project objectives.

World Bank, ITS

World Bank Group · 2025–Present

Coordinate with the development team to architect and deliver knowledge search and expertise discovery systems. Oversee technical requirements, feature prioritization, and deliverable timelines while ensuring alignment between technical implementation and organizational needs.

Publications

Peer-Reviewed Journal Articles

Measuring Gender and Religious Bias in the Indian Judiciary

(with E. Ash, S. Asher, A. Bhowmick, D. L. Chen, T. Devi, C. Goessmann, P. Novosad, and B. Siddiqi)

Review of Economics and Statistics

Best Paper, ACM SIGCAS COMPASS '21
VoxDev | Ideas for India | हिंदी | Hindustan Times | Telegraph India | Indian Express | EPW | Bar & Bench

Mapping the Geometry of Law Using Natural Language Processing

(with D. L. Chen and K. Venkataramanan)

European Journal of Empirical Legal Studies

Caste Aside? Names, Networks and Justice in the Courts of Bihar, India

(with D. L. Chen, S. Joshi, and P. Neis)

European Journal of Empirical Legal Studies

The Promise of Machine Learning for the Courts of India

(with D. L. Chen and S. Joshi)

National Law School of India Review

Inequalities for Critical Exponents in d-Dimensional Sandpiles

(with J. Hanson and A. A. Járai)

Electronic Journal of Probability

Book Chapters

The Promise of AI for the Courts in India

(with D. L. Chen and S. Joshi)

The Cambridge Handbook of AI and Technologies in Courts

Government Analytics Using Machine Learning

(with D. L. Chen, S. Jankin, G. Kim, M. Kupi, and M. R. Maqueda)

The Government Analytics Handbook

Conference Proceedings

Islamophobia in the Justice System and Judicial Mitigation in Bihar, India

(with P. Neis, D. L. Chen, and S. Joshi)

18th Annual Conference on Empirical Legal Studies (CELS)

Working Papers

Litigation as Scrutiny: A Four Decade Analysis of Environmental Justice, Firms, and Pollution in India

(with D. L. Chen, S. Joshi, P. Neis, and S. Singh)

Working Paper

Impact of Free Legal Search on Rule of Law: Evidence from Indian Kanoon

(with D. L. Chen, R. Das, S. Joshi, and P. Neis)

Working Paper

Who Is in Justice? Caste, Religion and Gender in the Courts of Bihar over a Decade

(with D. L. Chen, S. Joshi, and P. Neis)

World Bank Policy Research Working Paper
World Bank Blog

Reports

A Decade of POCSO: Developments, Challenges, and Insights from Judicial Data

(with Apoorva, A. Ranjan, S. Joshi, and D. L. Chen)

Vidhi Centre for Legal Policy
Op-Ed (Times of India)

Dissertation & Thesis

Critical Exponents for Models in Statistical Mechanics and Self-Organized Criticality

Advisor: Russ Lyons

Ph.D. Dissertation, Indiana University, Bloomington, USA (2019)

Some Aspects Of The First Passage Time Problem In Neuroscience

Advisor: Govindan Rangarajan

Master's Thesis, Indian Institute of Science, Bangalore, India (2010)

Teaching & Talks

Extracting Structured Insights from Public Finance Documents with LLMs

Computational Impact Meetup (AI/ML Series), World Bank · June 2024

Online Training Session for Judicial Officers

JALDI, Telangana State Judicial Academy, India · July 2023

Presented findings from the co-authored POCSO report to over 300 judges and judicial officers of the district judiciary, as part of a training session organized by Vidhi Centre for Legal Policy.

Natural Language Processing for Economic Research

Computational Impact Meetup, World Bank · May 2023

Natural Language Processing

KREA University, Sri City, India · March 2022 Trimester

Taught an undergraduate course covering fundamental NLP techniques for studying document corpora. Students selected document collections, formulated research questions, and applied NLP methods to analyze their chosen texts.

Introduction to Data Pipelines

DEC Python Course: Advanced Topics in Data Science with Python, World Bank

Covered building data pipelines in Databricks, including medallion architecture, ETL and ELT pipeline patterns, and best practices for scalable data processing.

Teaching Assistant

Department of Mathematics, Indiana University, Bloomington, USA

Served as TA for undergraduate and graduate courses including Calculus I and II, Linear Algebra, Mathematical Analysis, Introduction to Mathematical Statistics, Probability Theory, and Stochastic Processes.

Selected Projects

BOOST & PFM Analytics

World Bank Group · 2024–Present

Built Delta Live Tables pipelines in Databricks to automate BOOST data harmonization workflows across countries.

Developed a bottleneck identification tool that extracts textual evidence from Public Expenditure Reviews and PFM reports across revenue management, budget planning, expenditure control, and institutional capacity.

Technologies/Tools: Delta Live Tables, Databricks, Instructor, OpenAI, Python, SQL

eCourts Dataset & Knowledge Search

World Bank Group · 2018–Present

Built one of the largest judicial datasets available, covering 83M+ district court cases, 8M high court cases, and 150K appellate cases from India. Developed web scraping infrastructure handling CAPTCHAs and dynamic content. Processed and structured data at scale for research and policy applications.

Designed a retrieval-augmented generation system using Azure AI Search for multi-document question answering across development policy documents. Implemented query decomposition, cross-encoder reranking, and synthesis optimization.

Technologies/Tools: Scrapy, Selenium, MongoDB, Azure AI Search, LangGraph, LangChain, Redis

Entity Resolution and Knowledge Graph System

Paradigm Case Management & World Bank Group · 2024–Present

Built a multi-agent system for identifying the same person across heterogeneous documents (court filings, police reports, case records). Implemented a reflection pattern for iterative accuracy improvement and consensus-based extraction using a judge-advocate-skeptic pattern. Developed a Graph RAG system over Neo4j that automatically identifies missing evidence and documentation gaps by cross-referencing mentions across case files, police reports, and court filings. The system alerts prosecutors to incomplete document chains before trial preparation, supporting case integrity and discovery compliance.

Designed an expertise quantification system aggregating signals across heterogeneous data sources for subject matter expert matching. Developed retrieval algorithms with multi-dimensional metrics balancing recency, depth, and breadth of expertise.
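
As an illustration only, a score that balances recency, depth, and breadth could be sketched as below. This is not the production system: the `Contribution` schema, the `expertise_score` function, the two-year half-life, the saturation constant, and the weights are all hypothetical choices made for this sketch.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Contribution:
    """One piece of evidence that a person worked on a topic (hypothetical schema)."""
    topic: str
    when: date


def expertise_score(contribs, query_topic, today,
                    half_life_days=730.0, weights=(0.5, 0.3, 0.2)):
    """Blend recency, depth, and breadth into a single score in [0, 1].

    All constants here are illustrative assumptions, not tuned values.
    """
    on_topic = [c for c in contribs if c.topic == query_topic]
    if not on_topic:
        return 0.0

    # Recency: exponential decay based on the newest on-topic contribution.
    newest_age_days = min((today - c.when).days for c in on_topic)
    recency = 0.5 ** (max(newest_age_days, 0) / half_life_days)

    # Depth: saturating function of how many on-topic contributions exist.
    depth = 1.0 - math.exp(-len(on_topic) / 5.0)

    # Breadth: saturating function of distinct topics across all contributions.
    breadth = 1.0 - math.exp(-len({c.topic for c in contribs}) / 5.0)

    w_recency, w_depth, w_breadth = weights
    return w_recency * recency + w_depth * depth + w_breadth * breadth
```

Because the weights sum to one and each component lies in [0, 1], scores are comparable across candidates; shifting the weights changes whether the ranking favors recent activity or a long track record.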

Technologies/Tools: Neo4j, LangGraph, CrewAI, Graph Transformers, Cypher, AWS Lambda, Pydantic, PostgreSQL, Azure AI Search, FAISS

Witchcraft Court Case Pattern Analysis

Personal Research · 2023–2024

Built a multi-agent system for analyzing narrative structures in witchcraft court cases. Used graph transformers to extract knowledge graphs identifying relationships between accusers, defendants, and alleged acts. Agents collaboratively extract accusation patterns, classify crime typologies, and identify recurring themes across cases. An interactive t-SNE visualization enables exploration of how different types of accusations and legal narratives cluster and evolve.

Technologies/Tools: Atomic Agents, Graph Transformers, OpenAI, Fly.io

Contact

sandeepbhupatiraju [at] gmail [dot] com