About
I build AI systems for governance, legal, and institutional applications, working with large collections of court judgments, policy documents, and administrative records.
I have a Ph.D. in Mathematics (Probability Theory) from Indiana University Bloomington. That background shapes how I think about text at scale: finding structure in noisy documents and quantifying things that seem hard to measure.
I've built large-scale datasets and used machine learning to study bias in judicial systems, analyze policy documents, and extract structured information from bureaucratic text. I'm interested in how these tools can surface patterns that would otherwise be invisible.
I currently work with organizations in international development, public finance, and legal tech.
Experience
AI Architect (Consultant)
Developed the AI architecture and engineering for a legal workflow automation platform serving prosecutors and case management professionals. Designed and deployed production agentic systems using LangGraph and Neo4j knowledge graphs. Built multi-agent structured information extraction systems and event-driven workflows using AWS Lambda and Bedrock.
Consultant (AI & Data Analytics)
Lead AI solutions development and large-scale data engineering for governance analytics across multi-country projects. Developed production RAG systems for knowledge search using Azure AI Search, expertise discovery and retrieval platforms, and data pipelines that automate the processing of BOOST expenditure and budget data from multiple countries. Built large-scale legal datasets covering India (83M+ district court cases), Kenya, and Indonesia (12M records).
Consultant (AI & NLP)
Developed structured information extraction to track technology use across the GEF project corpus, including classification and taxonomy development. Analyzed gaps between advisory recommendations and implementation. Built policy coherence identification tools and automatic feature extraction that replicates expert analysis patterns.
Team Leadership
AI Architecture & Engineering
Lead technical architecture and cross-functional collaboration with the backend engineering team. Drive architecture decisions for AI features, coordinate development priorities, and ensure alignment between AI capabilities and product requirements.
Team Lead, DeJure Projects
Led a team of 10 across multiple research projects, making technical decisions and managing deliverables. Coordinated data collection and analytics workflows, organized task distribution, and provided technical mentorship on data processing methodologies. Established success metrics and maintained alignment with project objectives.
World Bank, ITS
Coordinate with the development team to architect and deliver knowledge search and expertise discovery systems. Oversee technical requirements, feature prioritization, and deliverable timelines while ensuring alignment between technical implementation and organizational needs.
Publications
Peer-Reviewed Journal Articles
- Measuring Gender and Religious Bias in the Indian Judiciary. Best Paper, ACM SIGCAS COMPASS '21. Coverage: VoxDev | Ideas for India | हिंदी | Hindustan Times | Telegraph India | Indian Express | EPW | Bar & Bench
- Mapping the Geometry of Law Using Natural Language Processing
- Caste Aside? Names, Networks and Justice in the Courts of Bihar, India
- The Promise of Machine Learning for the Courts of India
- Inequalities for Critical Exponents in d-Dimensional Sandpiles
Book Chapters
- The Promise of AI for the Courts in India. In: The Cambridge Handbook of AI and Technologies in Courts (Forthcoming)
- Government Analytics Using Machine Learning
Conference Proceedings
- Islamophobia in the Justice System and Judicial Mitigation in Bihar, India
Working Papers
- Litigation as Scrutiny: A Four Decade Analysis of Environmental Justice, Firms, and Pollution in India
- Impact of Free Legal Search on Rule of Law: Evidence from Indian Kanoon
- Who Is in Justice? Caste, Religion and Gender in the Courts of Bihar over a Decade. World Bank Blog
Reports
- A Decade of POCSO: Developments, Challenges, and Insights from Judicial Data. Op-Ed (Times of India)
Dissertation & Thesis
- Critical Exponents for Models in Statistical Mechanics and Self-Organized Criticality
- Some Aspects Of The First Passage Time Problem In Neuroscience
Teaching & Talks
Extracting Structured Insights from Public Finance Documents with LLMs
Online Training Session for Judicial Officers
Presented findings from the co-authored POCSO report to over 300 judges and judicial officers of the district judiciary, as part of a training session organized by Vidhi Centre for Legal Policy.
Natural Language Processing for Economic Research
Natural Language Processing
Taught undergraduate course covering fundamental NLP techniques for studying document corpora. Students selected document collections, formulated research questions, and applied NLP methods to analyze their chosen texts.
Introduction to Data Pipelines
Covered building data pipelines in Databricks, including medallion architecture, ETL and ELT pipeline patterns, and best practices for scalable data processing.
Teaching Assistant
Served as TA for undergraduate and graduate courses including Calculus I and II, Linear Algebra, Mathematical Analysis, Introduction to Mathematical Statistics, Probability Theory, and Stochastic Processes.
Selected Projects
BOOST & PFM Analytics
Built Delta Live Tables pipelines in Databricks to automate BOOST data harmonization workflows across countries.
Developed a bottleneck identification tool that extracts textual evidence from Public Expenditure Reviews and PFM reports across revenue management, budget planning, expenditure control, and institutional capacity.
Technologies/Tools: Delta Live Tables, Databricks, Instructor, OpenAI, Python, SQL
eCourts Dataset & Knowledge Search
Built one of the largest judicial datasets available, covering 83M+ district court cases, 8M high court cases, and 150K appellate cases from India. Developed web scraping infrastructure handling CAPTCHAs and dynamic content. Processed and structured data at scale for research and policy applications.
Designed a retrieval-augmented generation system using Azure AI Search for multi-document question answering across development policy documents. Implemented query decomposition, cross-encoder reranking, and synthesis optimization.
Technologies/Tools: Scrapy, Selenium, MongoDB, Azure AI Search, LangGraph, LangChain, Redis
Entity Resolution and Knowledge Graph System
Multi-agent system for identifying the same person across heterogeneous documents (court filings, police reports, case records). Implemented a reflection pattern for iterative accuracy improvement and consensus-based extraction using a judge-advocate-skeptic pattern. Developed a Graph RAG system over Neo4j that automatically identifies missing evidence and documentation gaps by cross-referencing mentions across case files, police reports, and court filings. The system alerts prosecutors to incomplete document chains before trial preparation, supporting case integrity and discovery compliance.
Designed an expertise quantification system that aggregates signals across heterogeneous data sources for subject matter expert matching. Developed retrieval algorithms with multi-dimensional metrics balancing recency, depth, and breadth of expertise.
Technologies/Tools: Neo4j, LangGraph, CrewAI, Graph Transformers, Cypher, AWS Lambda, Pydantic, PostgreSQL, Azure AI Search, FAISS
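To make the idea of balancing recency, depth, and breadth concrete, here is a minimal sketch of one way such a score could be combined. It is illustrative only and not the production system: the `Contribution` record, the `expertise_score` and `rank_experts` functions, the half-life, the caps, and the weights are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Contribution:
    """One signal of expertise (hypothetical schema for this sketch)."""
    topic: str
    when: date
    weight: float = 1.0  # assumed signal strength, e.g. author vs. reviewer


def expertise_score(contribs, topic, today, half_life_days=730.0,
                    depth_cap=5.0, breadth_cap=4, weights=(0.4, 0.4, 0.2)):
    """Combine recency, depth, and breadth into a score in [0, 1].

    All parameter defaults are illustrative assumptions.
    """
    on_topic = [c for c in contribs if c.topic == topic]
    if not on_topic:
        return 0.0

    # Recency: exponential decay based on the latest on-topic contribution.
    age_days = max(0, (today - max(c.when for c in on_topic)).days)
    recency = 0.5 ** (age_days / half_life_days)

    # Depth: total on-topic signal weight, capped at 1.
    depth = min(sum(c.weight for c in on_topic) / depth_cap, 1.0)

    # Breadth: number of distinct topics covered overall, capped at 1.
    breadth = min(len({c.topic for c in contribs}) / breadth_cap, 1.0)

    w_recency, w_depth, w_breadth = weights
    return w_recency * recency + w_depth * depth + w_breadth * breadth


def rank_experts(experts, topic, today):
    """Return (name, score) pairs sorted from best to worst match."""
    scored = [(name, expertise_score(c, topic, today))
              for name, c in experts.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Usage with made-up data.
experts = {
    "analyst_a": [
        Contribution("tax", date(2023, 12, 1)),
        Contribution("tax", date(2023, 6, 1)),
        Contribution("budget", date(2022, 1, 1)),
    ],
    "analyst_b": [Contribution("tax", date(2015, 1, 1))],
}
ranking = rank_experts(experts, "tax", date(2024, 1, 1))
```

Capping depth and breadth keeps any single dimension from dominating, and the weights make the trade-off between the three dimensions explicit and tunable.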
Witchcraft Court Case Pattern Analysis
Multi-agent system for analyzing narrative structures in witchcraft court cases. Used graph transformers to extract knowledge graphs identifying relationships between accusers, defendants, and alleged acts. Agents collaboratively extract accusation patterns, classify crime typologies, and identify recurring themes across cases. An interactive t-SNE visualization enables exploration of how different types of accusations and legal narratives cluster and evolve.
Technologies/Tools: Atomic Agents, Graph Transformers, OpenAI, Fly.io
Contact
sandeepbhupatiraju [at] gmail [dot] com