Building a Scientific Reasoning Engine with Graph-Powered Reinforcement Learning
Most language models learn reasoning implicitly from internet text. They absorb patterns of what sounds convincing without explicitly modeling the structure of scientific arguments. We took a different approach: build a knowledge graph that explicitly represents how scientific reasoning works, then use it to fine-tune an open-source model.
The result: a 32-billion parameter model that follows logical chains, grounds claims in evidence, and synthesizes contradictory findings—trained on 100+ landmark research papers across physics, biology, neuroscience, and beyond.
At the heart of our approach is a knowledge graph with 28 node types and 34 relationship types designed specifically to model scientific reasoning (graph schema excluded for brevity).
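The full schema is out of scope here, but to make the idea concrete, a hypothetical fragment of how such node and relationship types might be declared is sketched below. The names are illustrative only and are not the actual 28/34-type schema.

```python
from enum import Enum

class NodeType(Enum):
    # Illustrative subset only; the real schema defines 28 node types.
    CLAIM = "claim"
    PREMISE = "premise"
    EVIDENCE = "evidence"
    WARRANT = "warrant"
    HYPOTHESIS = "hypothesis"

class RelationType(Enum):
    # Illustrative subset only; the real schema defines 34 relationship types.
    SUPPORTS = "supports"        # evidence -> claim
    JUSTIFIES = "justifies"      # warrant -> claim
    CONTRADICTS = "contradicts"  # claim <-> claim
    INFERS = "infers"            # premise -> conclusion
```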
The Data: 100+ Landmark Papers Across Fields
The knowledge graph is populated from a curated collection of landmark scientific papers spanning multiple disciplines:
Physics & Cosmology
- Sir Isaac Newton's Principia Mathematica
- Hubble's observations of galactic redshift
- Einstein's special and general relativity papers
- Quantum mechanics foundational work (Schrödinger, Heisenberg, Bohr)
- Particle physics breakthroughs
Biology & Neuroscience
- Watson & Crick's DNA structure
- Hodgkin-Huxley neural signaling model
- Hubel & Wiesel's visual cortex mapping
- Evolutionary biology and genetics
Psychology & Cognitive Science
- Kahneman & Tversky's cognitive biases
- Memory and learning research
- Decision-making under uncertainty
Methodology & Philosophy of Science
- Popper on falsification
- Kuhn on paradigm shifts
- Statistical inference frameworks
From Knowledge Graph to Training Data
The graph structure enables us to generate training examples that teach specific reasoning skills, for example:
Task 1: Argument Evaluation
Input: A set of premises from the graph
Target: The logical conclusion that follows
Premises from graph:
- Gravitational attraction diminishes with inverse square of distance
- This relationship holds universally across mass scales
Model learns to produce: "Orbital mechanics can be precisely calculated using Newtonian principles for any two-body gravitational system."
Task 2: Claim-Evidence Grounding
Input: A claim and its linked evidence nodes
Target: An assessment of evidential support
Claim: "Cortical receptive fields form two distinct groups"
Evidence: "233 of 303 cells showed simple fields, 70 showed complex"
Model learns to produce: "The quantitative data strongly supports the dichotomy, with 77% showing simple fields. The clear numerical separation suggests distinct processing mechanisms, though boundary cases may exist."
Task 3: Warrant Assessment
Input: A claim and its justifying warrants
Target: Evaluation of justification strength
The graph tells us exactly which warrants justify which claims, enabling targeted training on justification reasoning.
Task 4: Contradiction Resolution
Input: Two contradicting claims from the graph
Target: A synthesis that addresses both positions
Contradiction: Theory A claims X, Theory B claims not-X
Model learns to produce: "The apparent contradiction resolves when distinguishing between [context where A applies] and [context where B applies]..."
Task 5: Inference Chains
Input: A multi-step reasoning chain from the graph
Target: The final conclusion with confidence tracking
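Confidence tracking through a chain can be sketched as a simple propagation rule; the independence assumption and product rule below are illustrative, not the project's actual scheme.

```python
from math import prod

def chain_confidence(step_confidences: list[float]) -> float:
    """Confidence of a multi-step inference chain, assuming independent steps:
    the product of the per-step confidences."""
    return prod(step_confidences)

print(chain_confidence([0.95, 0.90, 0.80]))  # 0.684 for a three-step chain
```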
Task 6: Hypothesis Completion
Input: Partial hypothesis with evidence context
Target: Completed testable hypothesis
Training Pipeline
Phase 1: Supervised Fine-Tuning
We extracted 1,529 examples across the 6 tasks and fine-tuned Qwen3-32B with LoRA adaptation.
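A minimal sketch of what the LoRA setup might look like with a standard Hugging Face stack; the rank, alpha, and target modules below are illustrative assumptions rather than the project's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen3-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=16,                     # low-rank adapter dimension (assumed)
    lora_alpha=32,            # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```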
After 3 epochs, the loss hierarchy mirrored the cognitive complexity of each task: straightforward deduction is easier than multi-step synthesis.
Phase 2: Reinforcement Learning
RL then optimizes directly for reasoning quality using task-specific reward functions.
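The actual reward functions aren't reproduced here, but a toy sketch of one (for the claim-evidence grounding task) might reward citing the evidence, quantifying it, and hedging appropriately. The checks and weights below are assumptions.

```python
def grounding_reward(response: str, evidence_terms: list[str]) -> float:
    """Toy reward for claim-evidence grounding: score in [0, 1]."""
    text = response.lower()
    cited = sum(term.lower() in text for term in evidence_terms)
    citation_score = cited / max(len(evidence_terms), 1)       # references the evidence
    quantified = any(ch.isdigit() for ch in response)          # uses the numbers
    hedged = any(w in text for w in ("suggests", "may", "though"))  # notes uncertainty
    return 0.6 * citation_score + 0.2 * float(quantified) + 0.2 * float(hedged)
```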
Results after 200 batches:
- Mean reward: 0.45 → 0.55 (+23.8%)
- Argument evaluation: 0.90 (near-ceiling)
- KL divergence: stable at 0.0009
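The small, flat KL term is what you would expect from a standard KL-regularized objective (assuming a PPO/GRPO-style setup, which isn't specified above): maximize task reward while staying close to the SFT reference policy.

$$
\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_{\theta}(\cdot\mid x)}\big[\, r(x, y) \,\big] \;-\; \beta\,\mathrm{KL}\big(\pi_{\theta}(\cdot\mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)
$$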
What the Model Learned
Before Training
When given a scientific reasoning task, the base model produces generic, ungrounded responses:
Input: Evaluate this argument about gravitational effects...
Base model: "Gravity is an important force in physics that affects many phenomena. Scientists have studied gravity extensively..."
Fluent but disconnected from the specific argument structure.
After Training
Model learns to produce: "Given that gravitational attraction between two particles diminishes according to the inverse square of their separation distance, and that this relationship holds universally across all mass scales, it follows that orbital mechanics can be precisely calculated using Newtonian principles, with deviations only appearing at relativistic velocities or extreme mass concentrations."
The trained model references specific premises, follows logical structure, notes boundary conditions, and stays grounded in the argument.
Task Performance Hierarchy
After training, clear performance patterns emerged across the six tasks. The hierarchy is informative: it tells us which reasoning skills transfer easily from knowledge graph structure and which need more training signal.
Limitations and Next Steps
Current Limitations
- Dataset scale — 1,529 examples is small; 10-50x more data would help
- Domain coverage — Not all fields equally represented
- Evaluation — Need formal benchmarks beyond reward curves
Future Directions
- Expand the knowledge graph — More papers, more domains, more complete coverage
- Retrieval-augmented reasoning — Let the model query the graph at inference time
- Multi-turn reasoning — Extend beyond single-shot to iterative refinement
- Formal evaluation — Benchmark against scientific reasoning datasets
The Bigger Picture
We set out to answer: can we improve AI reasoning by making reasoning structure explicit?
The results suggest yes. A knowledge graph with 28 node types and 34 relationship types—encoding how scientific reasoning actually works—provides training signal that pure text learning misses.
The model learned to:
- Follow premise→conclusion chains
- Ground claims in evidence
- Synthesize contradictory positions
- Track confidence through inference steps
It's only a specialized tool for structured scientific reasoning, but it demonstrates that explicit knowledge representation and neural learning can complement each other.
The knowledge graph provides structure. The language model provides fluency. Together, they produce reasoning that is both logically grounded and naturally expressed.
Interested in knowledge graphs, scientific reasoning, or combining symbolic and neural approaches? I'd love to hear about similar projects.