Scaling Laws, Explained Simply
Scaling laws are one of the most important discoveries in AI, with direct implications for technology businesses. They capture a mathematical relationship showing how AI performance improves as we increase compute, parameters (the numerical weights inside a neural network that constitute its 'knowledge'), and data (measured in tokens, the basic text units that language models process, roughly 1.3 per English word).
The remarkable finding is that this relationship follows a power law: performance scales approximately as compute^0.3. Double your compute and you get roughly a 23% performance gain. Multiply it by 10 and you get about 2x the performance. Multiply it by 100, about 4x.
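To make the arithmetic concrete, here is a minimal sketch. The 0.3 exponent is the approximate figure used above; real exponents vary by task and by how you measure performance.

```python
# The arithmetic behind the power law: if performance scales roughly as
# compute**0.3, each compute multiplier maps to a performance multiplier.
EXPONENT = 0.3

for multiplier in (2, 10, 100, 1000):
    gain = multiplier ** EXPONENT
    print(f"{multiplier:>5}x compute -> ~{gain:.2f}x performance")
# 2x -> ~1.23x, 10x -> ~2.00x, 100x -> ~3.98x, 1000x -> ~7.94x
```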
This relationship holds with extraordinary consistency across model architectures, domains, and tasks. It sounds academic but isn't. We can actually predict when AI will reach specific capability thresholds.
Unlike most technological progress, which is unpredictable, scaling laws transform AI from a research gamble into a straightforward question of resources.
The most strategic insight here is understanding how these continuous improvements create discontinuous business value. Performance improvements of 5% might seem incremental, but they can suddenly cross thresholds where new applications become viable. Going from 80% to 85% accuracy has minimal business impact. From 85% to 90%, new use cases emerge. From 90% to 95%, you get competitive advantage. From 95% to 99%, entire industries shift.
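The reason the later steps matter so much more is easiest to see as error rates rather than accuracy. A minimal illustration:

```python
# Read each accuracy gain as a reduction in the error rate (1 - accuracy).
steps = [(0.80, 0.85), (0.85, 0.90), (0.90, 0.95), (0.95, 0.99)]
for before, after in steps:
    reduction = (1 - before) / (1 - after)
    print(f"{before:.0%} -> {after:.0%}: errors fall {reduction:.1f}x")

# 80% -> 85% removes only a quarter of the errors; 95% -> 99% removes
# four out of five, which is often the difference between "needs human
# review" and "can run unattended".
```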
Tech companies that understand this track these thresholds obsessively. They know which business functions will become automatable, and roughly when, over the next 3 to 5 years.
The original scaling laws suggested increasing model size more aggressively than training data. DeepMind's Chinchilla research then revealed a critical refinement: for optimal performance, model size and training data should be scaled in equal proportion. Think of it as a revised scaling law, version 1.1.
The Chinchilla findings showed that many models were significantly undertrained for their size. For every doubling of parameters, you should also double your training tokens. A 70 billion parameter model trained on roughly 4x more data outperformed a 280 billion parameter model built with the same compute budget.
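Here is a minimal sketch of what "scale parameters and tokens in equal proportion" implies when planning a training run. It leans on two common approximations that are not spelled out above: training compute is roughly 6 x parameters x tokens (in FLOPs), and the compute-optimal ratio is often summarized as about 20 training tokens per parameter.

```python
import math

def chinchilla_optimal(compute_flops: float, tokens_per_param: float = 20.0):
    """Split a training compute budget between parameters and tokens.

    Assumes C ~= 6 * N * D (FLOPs) and the equal-proportion rule,
    summarized here as ~20 tokens per parameter. Both constants are
    rough rules of thumb, not exact values."""
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Example: the approximate compute of a 70B-parameter, 1.4T-token run.
budget = 6 * 70e9 * 1.4e12   # ~5.9e23 FLOPs
params, tokens = chinchilla_optimal(budget)
print(f"~{params / 1e9:.0f}B parameters, ~{tokens / 1e12:.1f}T tokens")

# Doubling the budget scales parameters and tokens by sqrt(2) each,
# which is what "equal proportion" means in practice.
```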
This revision has major implications for the critical choices companies building AI systems must make: it makes powerful models more accessible and shifts the focus toward data quality and volume rather than just raw parameter count.
In 2021, code assistants were barely useful. They could suggest function names and complete simple syntax. Most CTOs viewed them as fancy autocomplete: nice to have, but not transformative. A handful of CTOs who understood scaling laws saw something different. They plotted model performance against compute and calculated that with just 4x more compute, these models would cross a critical threshold: the ability to write complete functions with proper error handling and edge cases.

The threshold wasn't arbitrary. It came from applying the scaling law to specific metrics: correct function completion rate above 90%, error handling coverage for edge cases, and proper implementation of security best practices.

While most companies waited to see evidence, these CTOs started preparing. They built integration pipelines, trained engineers on prompt design, and restructured workflows, all before the capability existed. When models crossed this threshold, they deployed immediately while competitors were still running evaluations.
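The calculation those CTOs ran can be sketched in a few lines. The benchmark numbers below are invented purely for illustration; the point is the method: fit a power law to the scores you have, then solve for the compute multiple at which the fit crosses your target.

```python
import numpy as np

# Invented (relative compute, benchmark pass rate) pairs -- not real data.
compute = np.array([1.0, 2.0, 4.0, 8.0])
pass_rate = np.array([0.62, 0.71, 0.80, 0.87])

# Fit pass_rate ~= a * compute**b in log-log space.
b, log_a = np.polyfit(np.log(compute), np.log(pass_rate), 1)
a = np.exp(log_a)

# Solve for the compute multiple where the fit crosses a 90% target.
target = 0.90
needed = (target / a) ** (1.0 / b)
print(f"fit: pass_rate ~= {a:.2f} * compute^{b:.2f}")
print(f"compute multiple to reach {target:.0%}: ~{needed:.1f}x")

# Caveat: a raw power law ignores the ceiling at 100%, so treat this as
# a local extrapolation, not a forecast far beyond the fitted range.
```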
Consider fraud detection. In 2022, AI systems could identify obvious patterns but generated too many false positives. Most fintech executives saw limited improvement on the horizon. Those who understood scaling laws calculated that with approximately 8x more compute, these models would cross a precise threshold: false positive rates below 2% while maintaining 98%+ fraud capture. This threshold matters because it is where the economics of fraud detection fundamentally change; the system becomes reliable enough to automate decisions rather than requiring human review.

These companies prepared by building data infrastructure for the specific inputs these models would need: transaction velocity patterns across merchant categories, device fingerprinting signals with temporal analysis, and network relationship mapping between accounts. When models crossed this capability threshold, they deployed immediately. Their fraud losses dropped by millions while competitors were still debating whether larger models would help their risk operations.
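A back-of-the-envelope model shows why that 2% false positive line is an economic threshold rather than an arbitrary one. All the numbers below (transaction volume, fraud rate, review cost) are invented for illustration:

```python
def monthly_review_cost(transactions: int, fraud_rate: float,
                        false_positive_rate: float,
                        cost_per_review: float = 3.0) -> float:
    """Cost of manually reviewing everything the model flags,
    assuming essentially all fraud gets flagged."""
    legit = transactions * (1 - fraud_rate)
    fraud = transactions * fraud_rate
    flagged = legit * false_positive_rate + fraud
    return flagged * cost_per_review

TXNS = 10_000_000       # monthly transactions (assumed)
FRAUD_RATE = 0.001      # 0.1% of transactions are fraudulent (assumed)

for fpr in (0.10, 0.05, 0.02, 0.01):
    cost = monthly_review_cost(TXNS, FRAUD_RATE, fpr)
    print(f"false positive rate {fpr:.0%}: ~${cost:,.0f}/month in review")

# Below ~2%, the review queue is small enough that decisions can be
# automated and only exceptions escalated -- the economic shift the
# paragraph above describes.
```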
Tracking these thresholds shifts the job from technology speculation to planning. When you can predict roughly when certain capabilities will emerge, you can make unusually precise decisions: when to build in-house versus use APIs, when to start integration planning for capabilities that don't exist yet, and how to time product launches to coincide with capability emergence.
Many get this wrong. They evaluate AI based on current performance rather than trajectory. The companies gaining advantage correctly identify when a capability will cross the "good enough" threshold for their specific use case and build infrastructure before competitors.
What's strange about this is that unlike most technologies, which get cheaper as they improve, frontier AI capabilities are getting more expensive as they improve. This fundamentally changes how we should think about competitive advantage and resource allocation.
The gap between "research prototype" and "business necessity" is also collapsing from years to months. By the time a capability is widely discussed, companies that understood these scaling laws have already built and deployed solutions.
While tech giants pour billions into compute, startups can use scaling laws as a strategic advantage. They can identify specific capability thresholds that unlock business value in their domain, then build infrastructure to deploy immediately when models cross these thresholds. They can focus on being first to build applications on frontier APIs when they cross domain-specific thresholds. They can apply smaller fine-tuned models to narrow domains where they can cross capability thresholds with significantly less compute. They can build proprietary datasets in underserved domains. They can create systems to continuously benchmark open-source models against capability thresholds to identify deployment opportunities before competitors.
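That last idea, continuously benchmarking open models against your own thresholds, can be as simple as a small harness like the sketch below. The metric names and bars are placeholders, and run_domain_eval stands in for whatever evaluation suite you already run; none of this is prescribed by the scaling laws themselves.

```python
from typing import Callable, Dict, List, Tuple

# Placeholder thresholds for a hypothetical domain -- replace with the
# metrics that actually gate deployment for your use case.
THRESHOLDS = {
    "contract_clause_extraction_f1": 0.92,
    "support_ticket_triage_accuracy": 0.95,
}

def crosses_thresholds(scores: Dict[str, float]) -> bool:
    return all(scores.get(metric, 0.0) >= bar
               for metric, bar in THRESHOLDS.items())

def watch(models: List[str],
          run_domain_eval: Callable[[str], Dict[str, float]]
          ) -> List[Tuple[str, Dict[str, float]]]:
    """Return the candidate models that clear every deployment bar."""
    ready = []
    for name in models:
        scores = run_domain_eval(name)   # your eval harness goes here
        if crosses_thresholds(scores):
            ready.append((name, scores))
    return ready
```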
Startups win not by outspending incumbents on compute, but by more precisely identifying which capability thresholds matter for specific use cases and moving faster when those thresholds are crossed.
Scaling laws do have limitations worth understanding. Emergent capabilities sometimes appear suddenly and don't follow the smooth curve. The exact scaling exponents vary across tasks, making predictions less reliable for novel applications. Some capabilities may hit fundamental limits that additional scale cannot overcome. Novel architectures occasionally reset scaling trajectories entirely.
Thoughtful organizations track both the scaling law trajectory and potential discontinuities that might accelerate or decelerate progress.
What makes scaling laws powerful is understanding exactly what threshold matters for your specific application, then calculating precisely when models will cross that threshold.
For lending algorithms, the critical threshold might be the ability to evaluate creditworthiness for thin file applicants with 85%+ accuracy, which scaling laws might tell you requires 5x more compute than current models.
For code generation, it might be the ability to implement complex algorithms with 90%+ correctness, which might require 3x more compute than current models.
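If you also assume a growth rate for frontier training compute, those multiples turn into rough dates. The sketch below uses a six-month doubling time purely as an illustrative assumption; the scaling laws tell you the compute multiple, not how quickly anyone will spend it.

```python
import math

def months_until(compute_multiple: float,
                 doubling_time_months: float = 6.0) -> float:
    """Months until available training compute grows by `compute_multiple`,
    assuming it doubles every `doubling_time_months` months (an assumption
    you supply, not a consequence of the scaling laws)."""
    return doubling_time_months * math.log2(compute_multiple)

for use_case, multiple in [("thin-file credit scoring", 5.0),
                           ("complex algorithm implementation", 3.0)]:
    print(f"{use_case}: ~{months_until(multiple):.0f} months "
          f"at a 6-month doubling time")
```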
The best companies don't just wait to see what happens; they calculate when capabilities will emerge and prepare accordingly. They build integration capacity before capabilities exist, allowing immediate deployment when models cross these mathematically predictable thresholds.
A capabilities timeline that maps performance thresholds to business applications drives everything from hiring to product roadmaps.
There's something elegant about these scaling laws. In a domain filled with hype and uncertainty, they provide clarity that feels almost unfair. The executives who understand them aren't just making better technology decisions—they're operating with a map of the future that most of their competitors don't even know exists.