Building the world's first queryable knowledge graph of 30 years of UN diplomatic history

Generic LLMs can't:

  • Calculate voting correlation across 178 UN resolutions
  • Track how 184 countries voted on specific issues over 30 years
  • Find speeches that reference UNSC Resolution 1325 with sub-second latency and full context
  • Analyze temporal evolution of diplomatic positions using graph relationships

These questions require reasoning over interconnected data: voting records, speech transcripts, temporal patterns, and geopolitical relationships. Standard RAG approaches fall short because they retrieve a handful of similar text chunks per query; they can't traverse relationships or compute aggregates across thousands of documents.
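To make the first bullet concrete, here is a minimal sketch of one way to measure how closely two countries vote together. It uses a simple agreement rate (the share of jointly voted resolutions with identical votes) as a stand-in for "voting correlation"; the article does not specify which metric the system uses. The records, country names, and function names below are invented for illustration.

```python
from itertools import combinations

# Hypothetical toy records: (country, resolution_id, vote).
# Real rows would come from the knowledge graph; these are made up.
VOTES = [
    ("A", "R1", "yes"), ("B", "R1", "yes"), ("C", "R1", "no"),
    ("A", "R2", "no"), ("B", "R2", "no"), ("C", "R2", "yes"),
    ("A", "R3", "yes"), ("B", "R3", "abstain"), ("C", "R3", "yes"),
]


def votes_by_country(records):
    """Group flat records into {country: {resolution_id: vote}}."""
    table = {}
    for country, resolution, vote in records:
        table.setdefault(country, {})[resolution] = vote
    return table


def agreement(table, a, b):
    """Share of resolutions both countries voted on where the votes match.

    Returns None when the two countries share no resolutions.
    """
    shared = table[a].keys() & table[b].keys()
    if not shared:
        return None
    same = sum(1 for r in shared if table[a][r] == table[b][r])
    return same / len(shared)


def pairwise_agreement(records):
    """Agreement rate for every unordered pair of countries."""
    table = votes_by_country(records)
    return {
        (a, b): agreement(table, a, b)
        for a, b in combinations(sorted(table), 2)
    }
```

On the toy data, `pairwise_agreement(VOTES)` gives A and B an agreement rate of 2/3, A and C 1/3, and B and C 0. The point is that the answer depends on every vote record for both countries, which is exactly the kind of aggregate a chunk-retrieval pipeline never sees in full.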

📊 The Dataset: 30 Years of UN Diplomatic History

I assembled and structured what I believe is the most comprehensive queryable UN knowledge graph to date:

  • 61,505 Security Council speeches (1995-2024)
  • 12,481 UN resolutions with full voting records
  • 1.68 million individual country votes across 184 nations
  • 2.8 million relationships connecting nations, speeches, resolutions, and diplomatic positions

To my knowledge, this aggregate dataset doesn't exist anywhere else on the internet, let alone in a queryable form.

Check it out at https://un.corticality.com/

🔧 Technical Architecture

The system combines three technologies that each solve a different problem:

Neo4j graph database stores the relationships between entities—who voted on what, which speeches referenced which resolutions, how nations interact over time. Vector-optimized HNSW indexes enable sub-second semantic search across millions of nodes.

OpenAI GPT-5 (OpenAI's latest reasoning model) orchestrates 8 specialized tools through the Vercel AI SDK, including vector search on speeches and resolutions, voting pattern analysis with temporal filters, resolution impact tracking, hybrid search combining semantic similarity with graph traversal, and entity normalization to handle variations like "USA" vs. "United States".
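Entity normalization is the easiest of these tools to illustrate. The sketch below is one possible approach, assuming a hand-maintained alias table; the article does not describe how the actual tool works, and `ALIASES`, `_key`, and `normalize_country` are hypothetical names.

```python
import unicodedata

# Hypothetical alias table mapping cleaned-up spellings to one canonical name.
ALIASES = {
    "usa": "United States",
    "us": "United States",
    "united states": "United States",
    "united states of america": "United States",
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
}


def _key(name: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_accents if c.isalnum() or c.isspace())
    return " ".join(kept.lower().split())


def normalize_country(name: str) -> str:
    """Return the canonical name, or the trimmed input if no alias matches."""
    return ALIASES.get(_key(name), name.strip())
```

With this table, `normalize_country("USA")`, `normalize_country("U.S.A.")`, and `normalize_country("  united states ")` all return `"United States"`, while an unlisted name such as `"Brazil"` passes through unchanged. Normalizing before the graph lookup means a question phrased with any variant resolves to the same country node.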

Vercel Next.js 16 delivers the experience through streaming server components, progressively rendering both the AI analysis and graph visualizations as queries execute.

The key innovation is a two-stage hybrid approach: first, embed the user's question and perform vector similarity search to find relevant speeches and resolutions. Then, use graph traversal to satisfy structural constraints like time periods, voting patterns, and diplomatic groups. This is 10-100x faster than pure graph traversal while maintaining semantic relevance.
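The two stages can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not the production code: stage 1 uses brute-force cosine similarity where the real system would query an HNSW index, and stage 2 uses a dictionary lookup where the real system would traverse the graph. The `Speech` type, the toy data, and all function names are invented.

```python
import math
from dataclasses import dataclass


@dataclass
class Speech:
    id: str
    country: str
    year: int
    embedding: list[float]


# Invented toy data: 2-D embeddings and a country -> "voted yes" lookup.
SPEECHES = [
    Speech("s1", "A", 2001, [1.0, 0.0]),
    Speech("s2", "B", 2005, [0.9, 0.1]),
    Speech("s3", "C", 2010, [0.0, 1.0]),
    Speech("s4", "A", 1996, [0.8, 0.2]),
]
VOTED_YES = {"A": {"R1"}, "B": {"R1"}, "C": set()}


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_stage(query_vec, speeches, k):
    """Stage 1: top-k speeches by similarity (brute force in this sketch)."""
    ranked = sorted(
        speeches, key=lambda s: cosine(query_vec, s.embedding), reverse=True
    )
    return ranked[:k]


def graph_stage(candidates, voted_yes, resolution, years):
    """Stage 2: keep candidates that satisfy the structural constraints."""
    first, last = years
    return [
        s for s in candidates
        if first <= s.year <= last
        and resolution in voted_yes.get(s.country, set())
    ]


def hybrid_search(query_vec, speeches, voted_yes, resolution, years, k=3):
    """Narrow by meaning first, then filter by relationships and time."""
    candidates = vector_stage(query_vec, speeches, k)
    return graph_stage(candidates, voted_yes, resolution, years)
```

Calling `hybrid_search([1.0, 0.0], SPEECHES, VOTED_YES, "R1", (2000, 2010))` returns speeches `s1` and `s2`: the vector stage shortlists `s1`, `s2`, and `s4`, and the structural stage drops `s4` because it falls outside the year range. The structural check only ever runs on the k shortlisted candidates rather than the whole graph, which is where the speedup over pure traversal would come from.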

🌍 Real-World Impact

Policy researchers can now analyze voting blocs and diplomatic shifts that were previously hidden in unstructured UN documents. Journalists can fact-check claims about voting records instantly. Academics can study how international norms evolve over decades. Diplomats can understand historical positions on complex issues before negotiations.

The implications extend far beyond UN data. The same architectural pattern applies to any domain with interconnected entities: legal precedent networks, scientific literature citation graphs, corporate knowledge bases, financial transaction networks, healthcare records. Whenever you need to reason over relationships, temporal patterns, and structured data at scale, this hybrid vector + graph approach outperforms pure vector search or LLM-only solutions.

🔗 Open to discuss hybrid vector + graph architectures, RAG pipeline patterns for specialized domains, and building domain-specific AI systems that outperform general-purpose LLMs.

#AI #MachineLearning #KnowledgeGraphs #Neo4j #OpenAI #GPT5 #RAG #NextJS #SocialImpact #DataScience #NLP #SemanticSearch