Substrate vs Cosmos — Storage Deep Dive
Both are leading frameworks for building Layer-1 and multi-chain networks, yet they made radically different choices about how state is stored and accessed. This guide walks through the main differences, from tree structure to encoding format, with approximate performance figures drawn from published benchmarks.
🌐 1. Big Picture Overview
Substrate
- • State stored in Patricia Merkle Trie
- • Keys structured by pallet namespace
- • SCALE encoding (compact, not self-describing: the type must be known to decode)
- • No built-in versioning (opt-in history)
- • Blake2b hashing for nodes
- • Overlay cache buffers writes in RAM
- • Database: RocksDB or ParityDB
Cosmos SDK
- • State stored in IAVL (Immutable AVL) trees
- • One IAVL store per module, namespaced by the module's store key
- • Protobuf encoding (typed, schema-required)
- • Built-in versioning (every block = snapshot)
- • SHA256 hashing for nodes
- • Writes are buffered in cache stores, then applied to the in-memory IAVL tree at commit
- • Database: GoLevelDB or RocksDB
🌳 2. Tree Structures Compared
Substrate: Patricia Merkle Trie
```
Patricia Merkle Trie (Substrate)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Property          │ Value
──────────────────┼──────────────────────────────────
Type              │ Radix trie (compressed paths, base-16)
Branching factor  │ Up to 16 children per branch node (one per nibble)
Node types        │ Leaf and branch nodes, both carrying a partial key
                  │ (no separate extension nodes)
Path length       │ Varies by key similarity
Ordering          │ Lexicographic on the raw key
Self-balancing    │ NO: structure follows key distribution
Node hash         │ Blake2b-256 by default (hasher is configurable, e.g. Keccak-256)
Empty slots       │ Not stored (bitmap marks used children)
Merkle root       │ Hash of root node → 32 bytes
```
Cosmos: IAVL Tree
```
IAVL Tree (Cosmos SDK)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Property          │ Value
──────────────────┼──────────────────────────────────
Type              │ Immutable AVL (Adelson-Velsky-Landis) tree
Branching factor  │ 2 (binary tree)
Node types        │ 2: inner nodes + leaf nodes (values live in leaves)
Path length       │ O(log₂ n), guaranteed balanced
Ordering          │ Sorted by key (BST property)
Self-balancing    │ YES: rotations whenever a write unbalances a subtree
Node hash         │ SHA256
Empty slots       │ N/A (every inner node has exactly left + right)
Merkle root       │ Hash of root node → 32 bytes
Versioning        │ Each saved version writes new nodes; old nodes kept
```
What self-balancing means — and why it matters
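Substrate (Patricia trie)
- • Not self-balancing: shape follows the key distribution
- • Hashed map keys spread entries evenly below the shared pallet/storage prefix, so depth stays close to log₁₆ n
- • No rotation cost on writes
- • Path length depends on how many keys share a prefix
- ✅ Simpler write path: no rebalancing
- ❌ Worst case: deep paths if many keys share long prefixes (e.g. unhashed, attacker-chosen keys)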
Cosmos SDK (IAVL)
- • Self-balancing AVL: always O(log₂ n)
- • Every write potentially triggers 1-2 rotations
- • Each rotation = new nodes written (immutable)
- • Path length is strictly bounded and predictable
- ✅ Guaranteed O(log n) reads always
- ❌ Rotations add CPU + write amplification
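To make the depth comparison concrete, here is a small Python sketch (not Substrate or IAVL code). It hashes n keys with Blake2b, measures how many hex nibbles each key needs before it diverges from every other key (roughly the number of branch levels a path-compressed base-16 trie needs for uniformly hashed keys), and compares that with ⌈log₂ n⌉, the depth of a perfectly balanced binary tree. The `account-i` key format and the sample sizes are assumptions for illustration.

```python
import hashlib
import math


def distinguishing_nibbles(keys: list[bytes]) -> list[int]:
    """For each hashed key, return 1 + the longest common hex-nibble prefix it
    shares with any other key. In a path-compressed base-16 trie this
    approximates the number of branch levels on that key's path."""
    hexes = sorted(hashlib.blake2b(k, digest_size=32).hexdigest() for k in keys)

    def lcp(a: str, b: str) -> int:
        n = 0
        for x, y in zip(a, b):
            if x != y:
                break
            n += 1
        return n

    depths = []
    for i, h in enumerate(hexes):
        # In sorted order, the longest shared prefix is always with a neighbour.
        left = lcp(h, hexes[i - 1]) if i > 0 else 0
        right = lcp(h, hexes[i + 1]) if i + 1 < len(hexes) else 0
        depths.append(1 + max(left, right))
    return depths


def compare(n: int) -> tuple[float, int]:
    # Hypothetical keys: b"account-0", b"account-1", ...
    keys = [f"account-{i}".encode() for i in range(n)]
    depths = distinguishing_nibbles(keys)
    avg_trie = sum(depths) / len(depths)
    balanced_binary = math.ceil(math.log2(n))
    return avg_trie, balanced_binary


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        trie, binary = compare(n)
        print(f"n={n:>7}: avg trie levels ≈ {trie:.2f}, balanced binary depth = {binary}")
```

The trie figure grows roughly like log₁₆ n while the binary depth grows like log₂ n, which is why a binary tree has more levels for the same data; in exchange, each hexary level is a wider node.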
🔑 3. Key Formation
Substrate — Namespaced Structured Keys
```
// Every FRAME storage-map key is built from 3 parts
// (a plain StorageValue uses only the first two):

final_key =
    Twox128("PalletName")        // 16 bytes: non-crypto hash of pallet name
 ++ Twox128("StorageItemName")   // 16 bytes: non-crypto hash of storage item
 ++ KeyHasher(actual_key)        // length depends on the hasher, e.g.
                                 // Blake2_128Concat = 16-byte hash ++ raw key

// Example: Alice's account entry. FRAME runtimes usually keep balances in
// System.Account (pallet_balances configured with AccountStore = System):
final_key =
    Twox128("System")              = 0x26aa394eea5630e07c48ae0c9558cef7
 ++ Twox128("Account")             = 0xb99d880ec681799c0cf30e8886371da9
 ++ Blake2_128Concat(Alice_pubkey) = 0xde1e86a9a8c739864cf3cc5ec2bea59f...

// KEY PROPERTIES:
// ✅ All entries of one storage map share the same 32-byte prefix
// ✅ Iterating a whole map = prefix scan over one subtree (very fast)
// ✅ Cache-friendly: related data lives near each other in RocksDB
// ❌ Key is opaque if you don't know the pallet/storage structure
```
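A minimal Python sketch of this key layout. Python's standard library has no xxHash, so the two 16-byte prefixes are hard-coded: they are the well-known values of Twox128("System") and Twox128("Account"), i.e. the System.Account map. Blake2_128Concat is computed with `hashlib.blake2b(digest_size=16)` followed by the raw key. The helper names are hypothetical, not part of any Substrate library.

```python
import hashlib

# Hard-coded because xxHash (Twox) is not in the Python standard library.
TWOX128_SYSTEM = bytes.fromhex("26aa394eea5630e07c48ae0c9558cef7")
TWOX128_ACCOUNT = bytes.fromhex("b99d880ec681799c0cf30e8886371da9")


def blake2_128_concat(key: bytes) -> bytes:
    """16-byte Blake2b hash of the key, followed by the key itself."""
    return hashlib.blake2b(key, digest_size=16).digest() + key


def storage_map_key(pallet_prefix: bytes, item_prefix: bytes, map_key: bytes) -> bytes:
    """Twox128(pallet) ++ Twox128(item) ++ Blake2_128Concat(map_key)."""
    return pallet_prefix + item_prefix + blake2_128_concat(map_key)


# Alice's sr25519 development public key (32 bytes).
ALICE = bytes.fromhex("d43593c715fdd31c61141abd04a99fd6822c8558854ccde39a5684e7a56da27d")

key = storage_map_key(TWOX128_SYSTEM, TWOX128_ACCOUNT, ALICE)
prefix = TWOX128_SYSTEM + TWOX128_ACCOUNT

print("0x" + key.hex())
# Every System.Account entry starts with the same 32-byte prefix, and because
# the hasher is a *Concat variant, the raw account id is the last 32 bytes.
assert key.startswith(prefix) and key[-32:] == ALICE
```

The "Concat" suffix is the design choice that matters here: the hash spreads keys evenly through the trie, while appending the raw key keeps map iteration able to recover the original account ids.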
Cosmos SDK — Module-Prefixed Simple Keys
```
// Cosmos SDK uses a KV store abstraction per module.
// Each module gets its own IAVL tree, mounted in the root multistore
// under a store key (e.g. "bank").

// Store key for the Cosmos bank module:
store = "bank"                    // selects the module's IAVL tree

// Inside the bank store, a balance key is built from a 1-byte prefix,
// the length-prefixed raw address bytes, and the denom:
key = 0x02                        // BalancesPrefix
   ++ len(addr) ++ addr_bytes     // raw address bytes, not the bech32 string
   ++ "uatom"                     // denom

// The IAVL tree sorts keys by raw bytes (BST property),
// so all balances of one address are adjacent in the tree.

// KEY PROPERTIES:
// ✅ Keys are unhashed, so they can be decoded from the known prefix layout
// ✅ Module isolation: each module's keys live in their own tree
// ✅ Natural range scans (sorted BST)
// ✅ IBC proofs use the full key path for cross-chain verification
// ❌ Variable-length keys complicate some optimizations
// ❌ No hashing on keys: prefixes must be designed to avoid collisions
//    (hence the length prefix on addresses)
```
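In the SDK's `x/bank` module, a balance key is the prefix byte `0x02`, a one-byte address length, the raw address bytes, then the denom. Here is a small Python sketch of that layout. The addresses, denoms and balances are made-up values, and filtering over sorted keys stands in for the IAVL tree's byte-ordered range scan; this is not SDK code.

```python
BALANCES_PREFIX = b"\x02"


def balance_key(addr: bytes, denom: str) -> bytes:
    """0x02 ++ len(addr) ++ addr ++ denom (the length prefix avoids collisions)."""
    if len(addr) > 255:
        raise ValueError("address too long for a 1-byte length prefix")
    return BALANCES_PREFIX + bytes([len(addr)]) + addr + denom.encode()


def balances_of(store: dict[bytes, int], addr: bytes) -> dict[str, int]:
    """Every key for one address shares the same prefix, so the entries are
    adjacent once keys are sorted by raw bytes (a range scan in IAVL)."""
    prefix = BALANCES_PREFIX + bytes([len(addr)]) + addr
    return {
        k[len(prefix):].decode(): v
        for k, v in sorted(store.items())
        if k.startswith(prefix)
    }


# Hypothetical 20-byte addresses and balances.
alice = bytes(range(1, 21))
bob = bytes(range(101, 121))
store = {
    balance_key(alice, "uatom"): 100,
    balance_key(alice, "uosmo"): 7,
    balance_key(bob, "uatom"): 50,
}
print(balances_of(store, alice))  # {'uatom': 100, 'uosmo': 7}
```

Without the length byte, an address that happened to be a byte-prefix of a longer address could make one account's scan pick up another account's keys; the length prefix rules that out without hashing.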
Substrate
- • Fixed-length 32-byte prefix per pallet + storage item
- • User-facing keys hashed (e.g. Blake2_128Concat) → uniform distribution
- • Excellent RocksDB prefix scan support
- • Keys are opaque (can't decode without knowing the storage layout)
- • Cryptographic key hashers stop users from crafting keys that deepen the trie
Cosmos SDK
- • Variable-length, unhashed keys
- • Module store key (e.g. "bank") namespaces each module's tree
- • IAVL BST ordering gives free range queries
- • Keys are decodable without reversing a hash (IBC paths are plain strings)
- • Key ordering is semantically meaningful
📚 4. Versioning — Cosmos's Biggest Advantage
How Cosmos IAVL Versioning Works
```
// IAVL is "immutable": nodes are never modified in place.
// Every saved version writes new nodes for changed paths; old nodes remain in the DB.

// Block 100: Alice has 100 ATOM
root_100 = NodeHash {
    left:  NodeHash(Alice → 100),
    right: NodeHash(Bob → 50)
}

// Block 101: Alice sends 10 ATOM to Bob
// New nodes are created for the changed paths; old nodes stay intact
root_101 = NodeHash {
    left:  NodeHash(Alice → 90),   // NEW node
    right: NodeHash(Bob → 60)      // NEW node
}

// root_100 still exists in the DB (until pruned), so you can query:
tree.GetVersioned([]byte("balances/alice"), 100)   // returns 100 ATOM
tree.GetVersioned([]byte("balances/alice"), 101)   // returns 90 ATOM

// This is what lets Cosmos nodes serve light-client proofs:
// a node can prove state at any height it has not pruned, without extra tooling.
// This is CRITICAL for IBC (cross-chain message verification).
```
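The versioning idea can be sketched in Python with an immutable binary search tree and path copying: each `set` copies only the nodes on the path to the changed key, and every saved version keeps its own root. This is a simplified stand-in, assuming no AVL balancing and no hashing; `VersionedTree`, `set`, `save_version` and `get_versioned` are hypothetical names, not the IAVL Go API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    key: str
    value: int
    left: Optional[Node] = None
    right: Optional[Node] = None


def insert(node: Optional[Node], key: str, value: int, created: list[Node]) -> Node:
    """Return a new root; only nodes on the search path are copied."""
    if node is None:
        new = Node(key, value)
    elif key < node.key:
        new = Node(node.key, node.value, insert(node.left, key, value, created), node.right)
    elif key > node.key:
        new = Node(node.key, node.value, node.left, insert(node.right, key, value, created))
    else:
        new = Node(key, value, node.left, node.right)
    created.append(new)
    return new


def lookup(node: Optional[Node], key: str) -> Optional[int]:
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None


class VersionedTree:
    def __init__(self) -> None:
        self.working: Optional[Node] = None
        self.roots: dict[int, Optional[Node]] = {}
        self.nodes_written = 0  # new nodes created by path copying

    def set(self, key: str, value: int) -> None:
        created: list[Node] = []
        self.working = insert(self.working, key, value, created)
        self.nodes_written += len(created)

    def save_version(self, version: int) -> None:
        self.roots[version] = self.working  # older roots stay reachable

    def get_versioned(self, key: str, version: int) -> Optional[int]:
        return lookup(self.roots[version], key)


tree = VersionedTree()
tree.set("balances/alice", 100)
tree.set("balances/bob", 50)
tree.save_version(100)
tree.set("balances/alice", 90)
tree.set("balances/bob", 60)
tree.save_version(101)
print(tree.get_versioned("balances/alice", 100))  # 100
print(tree.get_versioned("balances/alice", 101))  # 90
```

Untouched subtrees are shared between versions, so each update costs about one new node per level of the changed path; that per-path copying is the write amplification discussed in the write-performance sections.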
Cosmos Versioning — The Cost
How Substrate Handles History
```
// Substrate does NOT keep old trie versions by default.
// Each block writes new trie nodes; nodes no longer referenced by any of
// the last N block roots (default N = 256) are pruned.

// To get older historical state, you need an ARCHIVE NODE:
// archive nodes (--state-pruning archive) never prune old trie nodes.
// They're much larger and slower to sync.

// Substrate's approach:
// - Full node: keeps only recent state (prunable)
// - Archive node: keeps all state (what an unpruned Cosmos node keeps)

// Substrate's TrieBackend (simplified):
let backend = TrieBackend::new(storage, root);
// "root" is the state root of a specific block.
// You can query historical state IF its nodes have not been pruned
// (always the case on an archive node).

// The advantage:
// Full nodes are much smaller (reported: ~50-200GB for a Substrate node vs
// ~800GB+ for some Cosmos nodes; both vary widely with pruning settings)
// The disadvantage:
// Historical queries beyond the pruning window require archive nodes or external indexers
```
Substrate
- • No built-in versioning in the trie
- • Full nodes are compact (only recent state)
- • Historical queries need an archive node or indexer
- • State pruning is simple: delete trie nodes no longer referenced by recent block roots
- ✅ Much smaller storage footprint for full nodes
- ❌ Can't prove state older than the pruning window without an archive node
Cosmos SDK
- • Built-in versioning: every block is a snapshot
- • Nodes can prove state at any height they have not pruned
- • Critical for IBC cross-chain light clients
- • State grows with every block (needs pruning)
- ✅ Historical proofs without archive infrastructure (within the retained versions)
- ❌ Much larger storage requirement (800GB+ reported for some chains)
📖 5. Read Performance
Substrate Read Path
```
// Reading Balances.Account[Alice]:
1. Build key: Twox128("Balances") ++ Twox128("Account") ++ Blake2_128Concat(alice)
   → fast (Twox is non-crypto, ~5ns per call)

2. Look up in OverlayedChanges (in-memory writes of the current block) first
   → if hit: return immediately (sub-microsecond)

3. If miss: check the state caches, then traverse the Patricia trie
   from the current state root
   → follow the partial_key path through nodes
   → each uncached node: RocksDB read + SCALE decode

4. Return the SCALE-encoded value; the runtime decodes it into the typed struct

// Performance characteristics:
// - Structured keys → high cache hit rate for related keys
// - SCALE decode is fast (no field tags or schema lookup)
// - Path length: bounded by key diversity
// - Many reads in a block are served from RAM (overlay or caches); the share is workload-dependent
```
Cosmos IAVL Read Path
```
// Reading bank.balances[alice]:
1. Look up in the block's cache store, then the IAVL fast-node index
   (a flat key → value index for the latest version in recent IAVL releases)
   → if hit: no tree traversal

2. If not found there: traverse the IAVL tree from the root
   → binary BST traversal: left or right at each inner node
   → each uncached node: RocksDB read + decode
   → O(log₂ n) steps: for 1M entries ≈ 20 steps

3. Return the value (Protobuf-encoded), decode with generated code

// Performance characteristics:
// - O(log₂ n) guaranteed (balanced AVL)
// - In-memory caches for the current version = very fast for hot data
// - SHA256 is paid when new nodes are hashed at commit, not on plain reads
// - Binary tree = more levels than a hexary Patricia trie (for the same data)
// - Historical reads: native via GetVersioned(), but they traverse the tree
//   (the fast-node index covers only the latest version)
```
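The ~20-step figure for 1M entries is the perfectly balanced case. An AVL tree is only approximately balanced, so as a worked check, Knuth's classical bound on AVL height h (in levels) for n nodes gives the full range for n = 10⁶:

```latex
\lceil \log_2(n+1) \rceil \;\le\; h \;<\; 1.4405\,\log_2(n+2) - 0.3277

n = 10^6:\qquad \lceil 19.93 \rceil = 20 \;\le\; h \;<\; 1.4405 \times 19.93 - 0.3277 \approx 28.4
```

So a lookup in a 1M-entry AVL tree visits between 20 and 28 nodes, compared with roughly log₁₆ 10⁶ ≈ 5 branch levels in an evenly filled hexary trie.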
[Chart: relative read-performance comparison (higher = better for each metric), based on published benchmarks from Parity, the Cosmos engineering blog, and community reports.]
✍️ 6. Write Performance
Substrate — Overlay Cache Strategy
```
// Substrate write flow during block execution:
//
// 1. All pallet storage writes go to OverlayedChanges (RAM)
//    → zero disk writes during block execution
//
// 2. At end of block: compute the new state root
//    → apply the changed keys to the trie
//    → Blake2b-hash each changed node
//    → propagate hashes up to the root
//
// 3. Commit to RocksDB in one batch write
//    → RocksDB batching is extremely fast
//    → one fsync call instead of one per write

// Example: 5000 state changes in one block
// Substrate: 0 DB writes during execution (reads may still hit the DB on cache misses)
//            1 batch DB write at block end (~10-50ms, hardware-dependent)
//
// Overhead per write: Blake2b hashes + SCALE re-encoding of the nodes on its path
// Blake2b: a few CPU cycles per byte → very fast
```
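A minimal Python sketch of the overlay strategy, assuming a plain dict as the backing database: reads check the in-memory overlay first and fall through to the backend on a miss, writes only touch the overlay, and `commit` applies everything as one batch. `Overlay` and `CountingDB` are hypothetical names; Substrate's real type is `OverlayedChanges`, which additionally tracks nested storage transactions and child tries.

```python
from __future__ import annotations


class CountingDB:
    """Stand-in for RocksDB that counts round trips."""

    def __init__(self) -> None:
        self.data: dict[bytes, bytes] = {}
        self.reads = 0
        self.batch_writes = 0

    def get(self, key: bytes) -> bytes | None:
        self.reads += 1
        return self.data.get(key)

    def write_batch(self, changes: dict[bytes, bytes | None]) -> None:
        self.batch_writes += 1
        for key, value in changes.items():
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value


class Overlay:
    """Buffers a block's writes in RAM; None marks a deletion."""

    def __init__(self, db: CountingDB) -> None:
        self.db = db
        self.changes: dict[bytes, bytes | None] = {}

    def get(self, key: bytes) -> bytes | None:
        if key in self.changes:      # overlay hit: no DB access
            return self.changes[key]
        return self.db.get(key)      # miss: fall through to the DB

    def set(self, key: bytes, value: bytes) -> None:
        self.changes[key] = value

    def delete(self, key: bytes) -> None:
        self.changes[key] = None

    def commit(self) -> None:
        # The state-root computation would happen here; omitted in this sketch.
        self.db.write_batch(self.changes)
        self.changes = {}


db = CountingDB()
overlay = Overlay(db)
for i in range(5000):
    overlay.set(f"key-{i}".encode(), i.to_bytes(4, "little"))
assert overlay.get(b"key-42") == (42).to_bytes(4, "little")
overlay.commit()
print(db.reads, db.batch_writes, len(db.data))  # 0 1 5000
```

The 5000 writes cost one batch at the end; only reads of keys the block has not written yet would reach the database.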
Cosmos — Direct IAVL Write Strategy
```
// Cosmos write flow during block execution:
//
// 1. Writes are buffered in cache stores (cachekv) during execution,
//    then flushed into the IAVL working tree (in memory) at commit
//    → no disk I/O yet
//    → AVL rebalancing happens in memory as the keys are applied
//
// 2. At end of block: commit the IAVL
//    → SaveVersion() is called on each module's IAVL
//    → new nodes written to the DB (old nodes kept!)
//    → every changed path = new nodes from leaf to root
//
// 3. SHA256 is computed for each new node up the tree

// Example: 5000 state changes in one block
// Cosmos: 0 DB writes during execution (reads may still hit the DB on cache misses)
//         ~5000-15000 new DB entries (write amplification from
//         immutable nodes + path re-creation)
//         SHA256 at each modified node = extra CPU work
//
// Additionally: IAVL rebalancing rotations on insert/delete
// (see next section for full rebalancing cost analysis)
```
⚖️ 7. IAVL Rebalancing — The Hidden Cost
What is an AVL Rotation?
```
// AVL rotation example (conceptual):
//
// BEFORE inserting key "C":
//   A
//    \
//     B
//
// After inserting C (tree becomes right-heavy):
//   A        ← left height 0, right height 2: diff = 2, UNBALANCED
//    \
//     B
//      \
//       C
//
// LEFT ROTATION at A:
//     B      ← B promoted to root
//    / \
//   A   C
//
// In IAVL (immutable): A and B are written as NEW nodes, plus the new node C.
// The old A and B remain (for the previous version) until pruned.
// Result: 3 new node writes, plus bookkeeping for the orphaned copies,
// for what is logically 1 insert

// In a Substrate Patricia trie: NO rotation ever
// Insert C: add a leaf (splitting a branch node if needed), then re-hash
// and rewrite each node on its path to the root
// 1 logical insert = one rewritten node per level of that path, no rebalancing
```
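To see when rotations actually fire, here is a compact mutable AVL insert in Python that counts single rotations (a double rotation counts as two). It is a textbook AVL tree, not IAVL: IAVL keeps values only in leaves and copies nodes instead of mutating them. Inserting A, B, C in order triggers one left rotation at A, while inserting B, A, C, D triggers none, because B(A, C(−, D)) is still within the AVL height limit.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: str
    left: Optional[Node] = None
    right: Optional[Node] = None
    height: int = 1


def h(n: Optional[Node]) -> int:
    return n.height if n else 0


def update(n: Node) -> None:
    n.height = 1 + max(h(n.left), h(n.right))


class AVL:
    def __init__(self) -> None:
        self.root: Optional[Node] = None
        self.rotations = 0

    def _rotate_left(self, x: Node) -> Node:
        y = x.right
        x.right, y.left = y.left, x      # y's left subtree moves under x
        update(x)
        update(y)
        self.rotations += 1
        return y

    def _rotate_right(self, x: Node) -> Node:
        y = x.left
        x.left, y.right = y.right, x     # y's right subtree moves under x
        update(x)
        update(y)
        self.rotations += 1
        return y

    def _insert(self, n: Optional[Node], key: str) -> Node:
        if n is None:
            return Node(key)
        if key < n.key:
            n.left = self._insert(n.left, key)
        elif key > n.key:
            n.right = self._insert(n.right, key)
        else:
            return n                     # duplicate key: nothing to do
        update(n)
        balance = h(n.left) - h(n.right)
        if balance > 1:                  # left-heavy
            if key > n.left.key:         # left-right case
                n.left = self._rotate_left(n.left)
            return self._rotate_right(n)
        if balance < -1:                 # right-heavy
            if key < n.right.key:        # right-left case
                n.right = self._rotate_right(n.right)
            return self._rotate_left(n)
        return n

    def insert(self, key: str) -> None:
        self.root = self._insert(self.root, key)


t1 = AVL()
for k in "ABC":
    t1.insert(k)
print(t1.root.key, t1.rotations)   # B 1

t2 = AVL()
for k in "BACD":
    t2.insert(k)
print(t2.root.key, t2.rotations)   # B 0
```

Sorted (sequential) keys are the classic rotation-heavy pattern: inserting A through G one by one performs four rotations to keep the tree at height 3.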
Real Impact of IAVL Rebalancing
IAVL vs Patricia Write Cost at Scale
```
Scenario: 1,000 state updates in one block

SUBSTRATE PATRICIA TRIE:
  Logical writes:      1,000
  Actual node writes:  ~2,000–4,000   (2-4× amplification)
  Hash operations:     ~2,000–4,000   (Blake2b)
  Rotation events:     0
  Estimated time:      ~10-30ms

COSMOS IAVL (v1):
  Logical writes:      1,000
  Actual node writes:  ~5,000–12,000  (5-12× amplification from
                                       immutability + rotations)
  Hash operations:     ~5,000–12,000  (SHA256)
  Rotation events:     ~200–400
  Estimated time:      ~50-150ms

COSMOS IAVL (v2, 2024):
  Actual node writes:  ~3,000–7,000   (improved caching)
  Estimated time:      ~25-80ms       (2-3× faster than v1)

Note: Times are approximate and vary by hardware, state size,
and specific workload. Source: Cosmos engineering blog,
informal.systems benchmarks, Osmosis state analysis.
```
🔢 8. Encoding: SCALE vs Protobuf
```
// SCALE encoding: very compact, not self-describing (the decoder must know the type)

// Encoding a struct AccountInfo { nonce: u32, balance: u128 }:
//   nonce   = 5    → [05 00 00 00]                                      (4 bytes, little-endian)
//   balance = 1000 → [e8 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00]  (16 bytes)
//   Total: 20 bytes, raw values concatenated

// No field names, no type tags, no length prefixes for fixed-size types
// Variable-length types use compact encoding for their length

// Advantages:
// ✅ Smallest possible output for fixed-size types (no per-field overhead)
// ✅ Zero-copy decoding possible (for fixed types)
// ✅ Very fast encode/decode (just memcpy for fixed types)
// ✅ No schema file needed at runtime

// Disadvantages:
// ❌ Not self-describing: you must know the type
// ❌ Schema evolution is manual (breaking changes = migration)
// ❌ Narrower cross-language ecosystem than Protobuf (Rust-centric)
```
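A minimal Python sketch of SCALE's fixed-width and compact integer encodings, reproducing the 20-byte `AccountInfo` example. It follows the SCALE rules for little-endian fixed-width integers and the four compact modes selected by the low two bits of the first byte; the function names are hypothetical and this is not the `parity-scale-codec` crate.

```python
def encode_fixed(value: int, size: int) -> bytes:
    """Fixed-width unsigned integer, little-endian (u32 = 4 bytes, u128 = 16)."""
    return value.to_bytes(size, "little")


def encode_compact(value: int) -> bytes:
    """SCALE compact encoding: the low 2 bits of the first byte select the mode."""
    if value < 0:
        raise ValueError("compact encoding is for unsigned integers")
    if value < 1 << 6:                       # single-byte mode (0b00)
        return bytes([value << 2])
    if value < 1 << 14:                      # two-byte mode (0b01)
        return ((value << 2) | 0b01).to_bytes(2, "little")
    if value < 1 << 30:                      # four-byte mode (0b10)
        return ((value << 2) | 0b10).to_bytes(4, "little")
    # Big-integer mode (0b11): upper 6 bits hold (byte_length - 4).
    raw = value.to_bytes((value.bit_length() + 7) // 8, "little")
    if len(raw) > 67:
        raise ValueError("value too large for SCALE compact encoding")
    return bytes([((len(raw) - 4) << 2) | 0b11]) + raw


def encode_account_info(nonce: int, balance: int) -> bytes:
    """AccountInfo { nonce: u32, balance: u128 }: fields concatenated, no tags."""
    return encode_fixed(nonce, 4) + encode_fixed(balance, 16)


encoded = encode_account_info(5, 1000)
print(len(encoded), encoded.hex())
print(encode_compact(1).hex(), encode_compact(64).hex(), encode_compact(1 << 30).hex())
```

Fixed-width fields are the zero-overhead case; compact encoding is what SCALE uses for lengths and for integers whose typical values are small.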
Size Comparison — Real Data
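For a rough size comparison (a constructed illustration, not chain data), here is a Python sketch that hand-encodes the same two values in Protobuf wire format. The message shape is an assumption: `uint32 nonce = 1` and, following the way the Cosmos SDK `Coin` message stores amounts, `string amount = 2` holding the decimal string. Varints make small numbers cheap, so Protobuf can come out smaller than SCALE's fixed-width fields for small values, while SCALE wins when values fill their width and there are no tags to pay for.

```python
def varint(value: int) -> bytes:
    """Protobuf base-128 varint: 7 bits per byte, high bit = 'more bytes follow'."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def field_key(field_number: int, wire_type: int) -> bytes:
    return varint((field_number << 3) | wire_type)


def encode_account(nonce: int, amount: int) -> bytes:
    """Hypothetical message { uint32 nonce = 1; string amount = 2; }.
    (proto3 would omit zero-valued fields; this sketch always emits both.)"""
    amount_str = str(amount).encode()
    return (
        field_key(1, 0) + varint(nonce)                            # wire type 0: varint
        + field_key(2, 2) + varint(len(amount_str)) + amount_str   # wire type 2: length-delimited
    )


small = encode_account(5, 1000)
large = encode_account(4_000_000_000, 10**30)
print(len(small), small.hex())   # 8 0805120431303030
print(len(large))                # 39
```

The same two values take 20 bytes in SCALE in both cases: Protobuf is smaller for `(5, 1000)` and larger for the big values.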
🔐 9. Proof Generation
Proof Types
Substrate
- • Patricia trie proof: set of nodes on the path
- • Compact node encoding (SCALE)
- • Typical size: 2-8KB
- • Used in the Polkadot light client protocol
- • Proof verification: O(depth × hash_cost)
- • Each branch node in the proof can carry up to 16 child hashes, so size depends heavily on trie depth
Cosmos SDK
- • IAVL existence/non-existence proof
- • Binary tree path + sibling hashes
- • Typical size: 3-8KB
- • Used extensively in IBC message verification (ICS-23 proof format)
- • Proof verification: O(log n × SHA256)
- ✅ Historical proofs for any retained version, without archive nodes
IBC — Why Cosmos Proofs Matter More
```
// IBC message proof flow (simplified):

// Chain A sends a packet to Chain B:
// 1. Chain A writes the packet commitment into its "ibc" IAVL store
// 2. Relayer queries Chain A: "give me proof that packet P exists at height H"
// 3. Chain A returns an ICS-23 proof for the key
//    "commitments/ports/transfer/channels/channel-0/sequences/1" at height H:
//    → IAVL existence proof inside the "ibc" store
//    → plus a proof of the "ibc" store root inside the multistore (app hash)
// 4. Relayer submits the proof to Chain B
// 5. Chain B's light client verifies:
//    → uses Chain A's trusted header carrying the app hash for height H
//      (in CometBFT, the app hash of the state after block H is in header H+1)
//    → reconstructs the root from the proof → matches the app hash → VALID

// Without versioned state (like Substrate without archive):
// step 3 fails if height H has been pruned from the node
// (a Substrate full node keeps only the last 256 states by default)

// This is why Cosmos chose immutable IAVL despite the write cost:
// versioning lets every node serve proofs at the exact heights that
// counterparty light clients have verified
```
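The verification in step 5 boils down to recomputing a root from a leaf and its sibling hashes. Below is a simplified binary Merkle proof in Python using SHA-256. Real IAVL/ICS-23 proofs also hash each node's height, size and version, and add a second proof from the module store root to the app hash, so treat `leaf_hash`, `prove` and `verify` as hypothetical helpers rather than the IBC API.

```python
import hashlib


def leaf_hash(key: bytes, value: bytes) -> bytes:
    # 0x00 domain-separates leaves from inner nodes; the length prefix keeps
    # (key, value) splits unambiguous.
    return hashlib.sha256(b"\x00" + len(key).to_bytes(4, "big") + key + value).digest()


def inner_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """All tree levels, bottom-up; assumes len(leaves) is a power of two."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([inner_hash(prev[i], prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def prove(levels: list[list[bytes]], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool says 'sibling is on the left'."""
    path = []
    for level in levels[:-1]:
        sibling = index ^ 1
        path.append((level[sibling], sibling < index))
        index //= 2
    return path


def verify(root: bytes, leaf: bytes, path: list[tuple[bytes, bool]]) -> bool:
    acc = leaf
    for sibling, sibling_is_left in path:
        acc = inner_hash(sibling, acc) if sibling_is_left else inner_hash(acc, sibling)
    return acc == root


# Hypothetical packet commitments.
entries = [(f"commitments/seq/{i}".encode(), f"packet-{i}".encode()) for i in range(4)]
leaves = [leaf_hash(k, v) for k, v in entries]
levels = build_levels(leaves)
root = levels[-1][0]   # plays the role of the trusted root from the header

proof = prove(levels, 2)
print(verify(root, leaves[2], proof))                                    # True
print(verify(root, leaf_hash(b"commitments/seq/2", b"forged"), proof))  # False
```

The verifier needs only the trusted root and log₂ n sibling hashes, never the rest of the state, which is what makes light-client verification cheap.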
🗑️ 10. Pruning Strategies
Substrate
- • State trie nodes not referenced by any recent block root are deleted
- • Configurable: keep last N block states (default: 256)
- • Dead nodes are identified and removed incrementally
- • No version tracking overhead — nodes exist or they don't
- ✅ Very efficient — minimal overhead
- ✅ Full node stays small (50-200GB typically, varies by chain)
Cosmos SDK
- • Old versions of IAVL nodes must be explicitly deleted
- • "Snapshot" versions (every Nth block) kept longer
- • Pruning is slow: must identify which nodes belong to which version
- • Pruning runs can cause latency spikes ("pruning pauses")
- ❌ Complex — known source of bugs historically
- ❌ Nodes grow quickly without aggressive pruning
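Window pruning in the Substrate style can be sketched in Python with reference counts: each retained block pins the nodes reachable from its root, and when a block falls out of the last-N window its references are released and unreferenced nodes are deleted. This is a simplification under stated assumptions: Substrate's state database journals per-block node insertions and deletions rather than keeping plain reference counts, and the node names are hypothetical.

```python
from __future__ import annotations

from collections import Counter, deque


class WindowPrunedStore:
    """Keeps trie nodes reachable from the last `window` block roots."""

    def __init__(self, window: int = 256) -> None:
        self.window = window
        self.refs: Counter[str] = Counter()   # node id -> retained blocks using it
        self.blocks: deque[set[str]] = deque()

    def import_block(self, nodes: set[str]) -> None:
        """`nodes` = every node reachable from the new block's state root."""
        self.blocks.append(nodes)
        self.refs.update(nodes)
        while len(self.blocks) > self.window:
            for node in self.blocks.popleft():
                self.refs[node] -= 1
                if self.refs[node] == 0:
                    del self.refs[node]       # no retained root uses it: prune

    def has(self, node: str) -> bool:
        return node in self.refs


store = WindowPrunedStore(window=2)
store.import_block({"root1", "branchA", "leaf-alice-v1"})
store.import_block({"root2", "branchA", "leaf-alice-v2"})
store.import_block({"root3", "branchA", "leaf-alice-v3"})
print(store.has("leaf-alice-v1"), store.has("branchA"))  # False True
```

Nodes shared by recent blocks (here `branchA`) survive, while nodes used only by blocks outside the window disappear; no per-version bookkeeping is needed beyond the window itself.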
🌍 11. Real-World Impact
🎯 12. Final Verdict
There is no universal winner — the right choice depends on what you're building. Here's the decision framework:
Choose Substrate if:
- ✅ Maximum write throughput is critical
- ✅ You want small node storage footprint
- ✅ You're building in the Polkadot ecosystem
- ✅ Historical queries can use external indexers
- ✅ You want lean, fast state transitions
- ✅ Custom runtime logic (pallets) is a priority
Choose Cosmos SDK if:
- ✅ IBC cross-chain connectivity is a requirement
- ✅ Historical state queries must be native
- ✅ Light client security is paramount
- ✅ You want unhashed, sorted state keys (and readable IBC paths)
- ✅ You want Protobuf / gRPC API out of box
- ✅ Strong tooling for DeFi / exchange apps
One-Line Summary
Substrate: a lean, high-throughput state machine. Structured keys and overlay caching give it a write-speed advantage, while compact SCALE encoding keeps node sizes small. Trade-off: no built-in history means archive nodes are needed for historical proofs beyond the pruning window.
Cosmos SDK: a versioned, IBC-native state store. The immutable IAVL tree makes cross-chain light client proofs straightforward and historical queries native, at the cost of write amplification, storage bloat, and complex pruning. The right choice when trustless cross-chain connectivity is the primary goal.