A Complete History of Solana Outages: Causes, Fixes, and Lessons Learnt

The sudden, insistent beeping jolts Steven awake. In the dim glow of his phone, alerts flood in—one after another. His node is down. Then the realization hits: the entire Solana cluster is offline. Across time zones, hundreds of validator operators are experiencing the same moment of dread. It’s happened again—an outage.

Solana, like all high-performance distributed systems, isn’t immune to failure. Despite rigorous testing and robust architecture, real-world conditions expose edge cases no simulation can fully predict. Over the past five years, the network has weathered multiple disruptions—each offering hard-earned lessons in resilience, security, and decentralization.

This article explores Solana’s outage history in detail, from root causes and fixes to the broader implications for blockchain stability. We’ll examine how the network prioritizes safety over liveness, how restarts are coordinated, and how critical bugs are responsibly disclosed. You’ll walk away with a deeper understanding of what makes Solana tick—and what happens when it doesn’t.

👉 Discover how leading blockchain platforms maintain network integrity during high-stress events.

Understanding Liveness and Safety in Distributed Systems

At the heart of every blockchain lies a fundamental trade-off defined by the CAP theorem: Consistency, Availability, and Partition Tolerance. Blockchains must tolerate network partitions, so the real choice is between availability (keeping the chain moving) and consistency (ensuring data accuracy).

Solana chooses consistency over availability, making it a CP system. When under extreme stress or facing consensus failure, Solana halts rather than risk state corruption. This means users’ funds remain secure—even if transactions temporarily stop confirming.

Liveness Failure: The network stalls. No new blocks are finalized. Transactions queue up but don’t process.
Safety Failure: The ledger’s history is altered—double-spends occur or forks persist incorrectly.

Solana’s design philosophy prioritizes safety. An outage is inconvenient; a corrupted ledger is catastrophic. This deliberate trade-off explains why the network may freeze during critical bugs or spam attacks—but also why user assets are never at risk.

How Solana Restarts After an Outage

When consensus breaks down, restarting Solana requires careful coordination. Validators don’t rely on on-chain signals to resume operations. Instead, they use off-chain communication—primarily in the #mb-validators channel on Solana Tech Discord—to agree on a safe rollback point.

Once consensus is reached on the last optimistically confirmed block slot:

Validators generate a new local snapshot using the ledger tool.
They reboot their nodes from this trusted state.
The network waits until at least 80% of total stake is back online before resuming block production.

This threshold ensures sufficient redundancy to prevent immediate re-forking. Automated alert systems help validators respond within minutes, minimizing downtime.

👉 Learn how modern blockchain infrastructures handle real-time network recovery.

Bug Bounties and Responsible Disclosure

Preventing outages starts long before they happen. Solana’s proactive security strategy includes a robust bug bounty program managed through official channels like the Agave GitHub repository.

Researchers who discover vulnerabilities in the core client can earn rewards based on severity:

Loss of Funds: Up to 25,000 SOL
Consensus or Safety Violations: Up to 12,500 SOL
Liveness or Availability Loss: Up to 5,000 SOL

Additionally, the upcoming FireDancer client runs a separate bounty via Immunefi, offering up to $500,000 USDC for critical findings.

These programs incentivize ethical hackers to report flaws before malicious actors exploit them—turning potential disasters into preventable fixes.

Chronological Breakdown of Solana Outages

Since Mainnet Beta launched in March 2020, Solana has faced seven major incidents. Below is a detailed analysis of each.

December 2020: Turbine Block Propagation Bug

Downtime: ~6 hours
Cause: Validator sent conflicting blocks for the same slot
Fix: Track blocks by hash instead of slot number

The issue stemmed from how nodes identified blocks—using only slot numbers led to misidentification during splits. After the fix, blocks are tracked by hash, enabling proper fork resolution.

September 2021: Grape Protocol IDO Spam Attack

Downtime: 17 hours
Cause: Bot-driven DDoS during an NFT launch
Fixes: Vote prioritization, RPC retry tuning, write-lock ignore

At peak load, some validators processed over 300,000 TPS. Bots locked key accounts, forcing sequential processing. The solution included prioritizing vote transactions and rate-limiting forwarded messages.

January 2022: High Congestion from Duplicate Transactions

Downtime: None (degraded performance)
Cause: Spam via repeated transactions
Fix: SigVerify deduplication improvements

Transaction success rates dropped by 70%. Upgrades in v1.8.12 and v1.8.14 improved cache handling and signature verification efficiency.

April–May 2022: Candy Machine NFT Mint Spam

Downtime: 8 hours
Cause: Bot attacks on Metaplex Candy Machine
Fixes: Bot tax (0.01 SOL), memory optimizations

Traffic spiked to 6 million requests per second per node. A hard-coded bot tax drained snipers’ funds—over 426 SOL lost in days—effectively ending spam.

Long-term solutions introduced:

QUIC protocol for flow-controlled transaction ingestion
Stake-Weighted Quality of Service (SWQoS) to limit Sybil attacks
Priority fees allowing users to pay for faster execution

June 2022: Durable Nonce Double Execution Bug

Downtime: 4.5 hours
Cause: Nonce transactions processed twice
Fix: Domain separation in v1.10.23

A runtime flaw allowed duplicate processing if recent blockhashes were misused as nonces. The fix enforced type safety and separated nonce/blockhash domains.

September 2022: Duplicate Block Fork Choice Bug

Downtime: 8.5 hours
Cause: Validator ran primary and spare nodes simultaneously
Fix: Patched fork selection logic

Validators got stuck because they couldn’t revert to the heaviest fork when slots matched their last vote. The patch corrected this edge case in fork resolution.

February 2023: Oversized Block Flooded Turbine

Downtime: ~19 hours
Cause: Malfunctioning shred-forwarder sent 150,000+ shreds
Fix: Enhanced deduplication filters in v1.13.7/v1.14.17

The massive block overwhelmed deduplication logic, forcing reliance on slower repair protocols. Updates improved filtering and added safeguards against oversized blocks.

February 2024: Infinite JIT Recompile Loop

Downtime: ~5 hours
Cause: Legacy loader bug in Agave v1.17
Fix: Disabled legacy loader post-restart

A JIT caching flaw caused validators to endlessly recompile certain programs. Over 95% of stake was affected. A coordinated patch disabled the vulnerable loader.

August 2024: Coordinated Vulnerability Patch (No Downtime)

Downtime: None
Cause: ELF .text section alignment vulnerability
Fix: Silent patch deployment

An external researcher found a flaw allowing attackers to crash validators via malformed programs. To avoid exploitation, a stealth patch was distributed privately to validators before public disclosure.

Despite concerns about centralization, the operation succeeded—a supermajority upgraded within hours, neutralizing the threat without disruption.

Frequently Asked Questions (FAQ)

Q: Why does Solana prioritize safety over liveness?
A: To prevent ledger corruption or double-spends. Halting the network is safer than allowing inconsistent state changes.

Q: How are Solana outages resolved so quickly?
A: Through real-time coordination among validators in public Discord channels and automated alert systems that trigger immediate response.

Q: What role do priority fees play in preventing spam?
A: They let users bid for transaction inclusion, creating a market-based mechanism that discourages low-value spam.

Q: Was the August 2024 patch really “centralized”?
A: While coordination involved private messaging, participation was voluntary. Over 1,500 independent node operators chose to upgrade—demonstrating decentralized cooperation.

Q: Can future outages be prevented entirely?
A: No system is perfect. But with client diversity (e.g., FireDancer), better congestion controls, and proactive security practices, outages are expected to become rarer and shorter.

Q: Are user funds ever at risk during an outage?
A: No. Even during prolonged halts, account balances remain secure due to Solana’s commitment to safety-first consensus.

👉 See how next-generation blockchain clients are redefining network reliability.

Conclusion

Solana has come a long way since its early instability. In over a year since the last major outage, it has demonstrated growing maturity and resilience. Improvements like QUIC, SWQoS, priority fees, and silent patch coordination reflect a network learning from its past.

While some predict outages will continue—as Mert Mumtaz of Helius has noted—the trend is clear: failures are becoming less frequent and less severe. With continued innovation and community vigilance, Solana is moving closer to its goal of robust, scalable decentralization.

The journey isn’t over—but the progress speaks for itself.

Core Keywords:

Solana outages, blockchain liveness and safety, distributed system failures, Solana network restart, priority fees Solana, bug bounty programs blockchain, QUIC Solana, stake-weighted quality of service