Building a Scientific Reasoning Engine with Graph-Powered Reinforcement Learning
Most language models learn reasoning implicitly from internet text. They absorb patterns of what sounds convincing without explicitly modeling the structure of scientific arguments. We took a different approach: build a knowledge graph that explicitly represents how scientific reasoning works, then use it to fine-tune an open-source model.
The result: a 32-billion parameter model that follows logical chains, grounds claims in evidence, and synthesizes contradictory findings—trained on 100+ landmark research papers across physics, biology, neuroscience, and beyond.
At the heart of our approach is a knowledge graph with 28 node types and 34 relationship types designed specifically to model scientific reasoning (graph schema excluded for brevity).
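The full schema is out of scope here, but to make the idea concrete, a hypothetical fragment of how such node and relationship types might be declared is sketched below. The names are illustrative only and are not the actual 28/34-type schema.

```python
from enum import Enum

class NodeType(Enum):
    # Illustrative subset only; the real schema defines 28 node types.
    CLAIM = "claim"
    PREMISE = "premise"
    EVIDENCE = "evidence"
    WARRANT = "warrant"
    HYPOTHESIS = "hypothesis"

class RelationType(Enum):
    # Illustrative subset only; the real schema defines 34 relationship types.
    SUPPORTS = "supports"        # evidence -> claim
    JUSTIFIES = "justifies"      # warrant -> claim
    CONTRADICTS = "contradicts"  # claim <-> claim
    INFERS = "infers"            # premise -> conclusion
```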
The Data: 100+ Landmark Papers Across Fields
The knowledge graph is populated from a curated collection of landmark scientific papers spanning multiple disciplines:
Physics & Cosmology
- Sir Isaac Newton's Principia Mathematica
- Hubble's observations of galactic redshift
- Einstein's special and general relativity papers
- Quantum mechanics foundational work (Schrödinger, Heisenberg, Bohr)
- Particle physics breakthroughs
Biology & Neuroscience
- Watson & Crick's DNA structure
- Hodgkin-Huxley neural signaling model
- Hubel & Wiesel's visual cortex mapping
- Evolutionary biology and genetics
Psychology & Cognitive Science
- Kahneman & Tversky's cognitive biases
- Memory and learning research
- Decision-making under uncertainty
Methodology & Philosophy of Science
- Popper on falsification
- Kuhn on paradigm shifts
- Statistical inference frameworks
From Knowledge Graph to Training Data
The graph structure enables us to generate training examples that teach specific reasoning skills, for example:
Task 1: Argument Evaluation
Input: A set of premises from the graph
Target: The logical conclusion that follows
Premises from graph:
- Gravitational attraction diminishes with inverse square of distance
- This relationship holds universally across mass scales
Model learns to produce: "Orbital mechanics can be precisely calculated using Newtonian principles for any two-body gravitational system."
Task 2: Claim-Evidence Grounding
Input: A claim and its linked evidence nodes
Target: An assessment of evidential support
Claim: "Cortical receptive fields form two distinct groups"
Evidence: "233 of 303 cells showed simple fields, 70 showed complex"
Model learns to produce: "The quantitative data strongly supports the dichotomy, with 77% showing simple fields. The clear numerical separation suggests distinct processing mechanisms, though boundary cases may exist."
Task 3: Warrant Assessment
Input: A claim and its justifying warrants
Target: Evaluation of justification strength
The graph tells us exactly which warrants justify which claims, enabling targeted training on justification reasoning.
Task 4: Contradiction Resolution
Input: Two contradicting claims from the graph
Target: A synthesis that addresses both positions
Contradiction: Theory A claims X, Theory B claims not-X
Model learns to produce: "The apparent contradiction resolves when distinguishing between [context where A applies] and [context where B applies]..."
Task 5: Inference Chains
Input: A multi-step reasoning chain from the graph
Target: The final conclusion with confidence tracking
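Confidence tracking through a chain can be sketched as a simple propagation rule; the independence assumption and product rule below are illustrative, not the project's actual scheme.

```python
from math import prod

def chain_confidence(step_confidences: list[float]) -> float:
    """Confidence of a multi-step inference chain, assuming independent steps:
    the product of the per-step confidences."""
    return prod(step_confidences)

print(chain_confidence([0.95, 0.90, 0.80]))  # 0.684 for a three-step chain
```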
Task 6: Hypothesis Completion
Input: Partial hypothesis with evidence context
Target: Completed testable hypothesis
Training Pipeline
Phase 1: Supervised Fine-Tuning
We extracted 1,529 examples across the 6 tasks and fine-tuned Qwen3-32B with LoRA adaptation.
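A minimal sketch of what the LoRA setup might look like with a standard Hugging Face stack; the rank, alpha, and target modules below are illustrative assumptions rather than the project's exact configuration.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen3-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
base_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

lora_config = LoraConfig(
    r=16,                     # low-rank adapter dimension (assumed)
    lora_alpha=32,            # scaling factor (assumed)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```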
After 3 epochs, the loss hierarchy mirrored the cognitive complexity of each task: straightforward deduction is easier than multi-step synthesis.
Phase 2: Reinforcement Learning
RL then optimizes directly for reasoning quality using task-specific reward functions.
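The actual reward functions aren't reproduced here, but a toy sketch of one (for the claim-evidence grounding task) might reward citing the evidence, quantifying it, and hedging appropriately. The checks and weights below are assumptions.

```python
def grounding_reward(response: str, evidence_terms: list[str]) -> float:
    """Toy reward for claim-evidence grounding: score in [0, 1]."""
    text = response.lower()
    cited = sum(term.lower() in text for term in evidence_terms)
    citation_score = cited / max(len(evidence_terms), 1)       # references the evidence
    quantified = any(ch.isdigit() for ch in response)          # uses the numbers
    hedged = any(w in text for w in ("suggests", "may", "though"))  # notes uncertainty
    return 0.6 * citation_score + 0.2 * float(quantified) + 0.2 * float(hedged)
```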
Results after 200 batches:
- Mean reward: 0.45 → 0.55 (+23.8%)
- Argument evaluation: 0.90 (near-ceiling)
- KL divergence: stable at 0.0009
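The small, flat KL term is what you would expect from a standard KL-regularized objective (assuming a PPO/GRPO-style setup, which isn't specified above): maximize task reward while staying close to the SFT reference policy.

$$
\max_{\theta}\; \mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_{\theta}(\cdot\mid x)}\big[\, r(x, y) \,\big] \;-\; \beta\,\mathrm{KL}\big(\pi_{\theta}(\cdot\mid x)\,\big\|\,\pi_{\mathrm{ref}}(\cdot\mid x)\big)
$$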
What the Model Learned
Before Training
When given a scientific reasoning task, the base model produces generic, ungrounded responses:
Input: Evaluate this argument about gravitational effects...
Base model: "Gravity is an important force in physics that affects many phenomena. Scientists have studied gravity extensively..."
Fluent but disconnected from the specific argument structure.
After Training
Model learns to produce: "Given that gravitational attraction between two particles diminishes according to the inverse square of their separation distance, and that this relationship holds universally across all mass scales, it follows that orbital mechanics can be precisely calculated using Newtonian principles, with deviations only appearing at relativistic velocities or extreme mass concentrations."
The trained model references specific premises, follows logical structure, notes boundary conditions, and stays grounded in the argument.
Task Performance Hierarchy
After training, clear performance patterns emerged across the six tasks. The hierarchy is informative: it tells us which reasoning skills transfer easily from knowledge graph structure and which need more training signal.
Limitations and Next Steps
Current Limitations
- Dataset scale — 1,529 examples is small; 10-50x more data would help
- Domain coverage — Not all fields equally represented
- Evaluation — Need formal benchmarks beyond reward curves
Future Directions
- Expand the knowledge graph — More papers, more domains, more complete coverage
- Retrieval-augmented reasoning — Let the model query the graph at inference time
- Multi-turn reasoning — Extend beyond single-shot to iterative refinement
- Formal evaluation — Benchmark against scientific reasoning datasets
The Bigger Picture
We set out to answer: can we improve AI reasoning by making reasoning structure explicit?
The results suggest yes. A knowledge graph with 28 node types and 34 relationship types—encoding how scientific reasoning actually works—provides training signal that pure text learning misses.
The model learned to:
- Follow premise→conclusion chains
- Ground claims in evidence
- Synthesize contradictory positions
- Track confidence through inference steps
It's only a specialized tool for structured scientific reasoning, but it demonstrates that explicit knowledge representation and neural learning can complement each other.
The knowledge graph provides structure. The language model provides fluency. Together, they produce reasoning that is both logically grounded and naturally expressed.
Interested in knowledge graphs, scientific reasoning, or combining symbolic and neural approaches? I'd love to hear about similar projects.