AI Insights
Trends, patterns, and analysis derived from 207 models, 46 papers, and 24 hardware milestones.
The Cambrian Explosion
With 207 models now tracked, the AI landscape went from a handful of research projects to an industry producing 79 new models per year. 2023 was the inflection point — the year ChatGPT's success triggered an industry-wide arms race. Every major tech company, from Apple to Amazon, scrambled to release their own models. 63 organizations across 10+ countries are now competing.
The Open Source Revolution
In 2021, nearly every frontier model was closed and proprietary. By 2024, open-weight models (weights publicly released, though training data and code may not be) and open-source models made up the majority of releases. Meta's LLaMA leak in 2023 was the spark: once researchers could study and fine-tune frontier-class models, the community produced an explosion of derivatives. This democratization may be the most consequential trend in AI history.
The Deepest Family Trees
OpenAI's GPT family is the deepest evolutionary tree in AI, with 14 generations from GPT-1 to GPT-5.5. This isn't just version numbering — each generation represents genuine architectural or training breakthroughs. Anthropic's Claude lineage, while shorter at 10 generations, shows the fastest iteration pace.
The MoE Takeover
Mixture-of-Experts (MoE), an architecture in which only a fraction of the model's parameters are active for each input, reached large-scale language modeling in a 2017 paper but saw little mainstream adoption for years. Mixtral 8x7B's viral success in late 2023 showed that MoE could deliver GPT-3.5-class quality at a fraction of the inference cost. Within 12 months, MoE became a common choice for models over 100B parameters, adopted by DeepSeek V2/V3, Grok, DBRX, Arctic, and Qwen.
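To make "only a fraction of parameters active per input" concrete, here is a minimal sketch of top-k expert routing. It assumes 8 toy experts with 2 selected per input and a softmax taken over the selected gate scores; the experts are stand-in scalar functions and the gate scores are made-up numbers, not values from any real model.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, gate_logits, k=2):
    """Run only the k selected experts and mix their outputs by gate weight."""
    routed = top_k_route(gate_logits, k)
    output = sum(w * experts[i](x) for i, w in routed)
    return output, [i for i, _ in routed]

# Hypothetical toy experts: expert i simply scales its input by (i + 1).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
# Hypothetical gate scores that a learned router would produce for one input.
gate_logits = [0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.5, 0.3]

output, active = moe_forward(1.0, experts, gate_logits, k=2)
active_fraction = len(active) / len(experts)
print(active, round(output, 4), active_fraction)
```

In this toy setup only 2 of 8 experts run for the input, so a quarter of the expert parameters do any work, which is where the inference savings come from.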
The Context Window Explosion
In 2018, GPT-1 had a 512-token context window (the maximum number of tokens a model can process in a single input). By 2025, context windows had reached 10 million tokens: Google reported testing Gemini 1.5 Pro at that length in research, and Meta advertised it for Llama 4 Scout. That is enough to process entire codebases or dozens of novels at once. This wasn't just incremental improvement; it required fundamental innovations like Flash Attention (an IO-aware exact attention algorithm that minimizes GPU memory reads and writes) and rotary position embeddings. The practical impact: AI went from answering single questions to analyzing entire projects.
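A rough calculation shows why longer windows needed new attention algorithms rather than just more hardware. The sketch below assumes a naive implementation that stores the full n × n score matrix for a single attention head in 2-byte (fp16) entries; both assumptions are illustrative, and real systems differ.

```python
def attention_score_entries(context_tokens: int) -> int:
    """Entries in the n x n score matrix that naive attention builds for one head."""
    return context_tokens * context_tokens

def naive_score_bytes(context_tokens: int, bytes_per_entry: int = 2) -> int:
    """Memory needed to hold that matrix, assuming fp16 (2 bytes per entry)."""
    return attention_score_entries(context_tokens) * bytes_per_entry

short_ctx_bytes = naive_score_bytes(512)          # GPT-1-sized window
long_ctx_bytes = naive_score_bytes(10_000_000)    # 10-million-token window

print(f"512 tokens: {short_ctx_bytes / 1024:.0f} KiB per head")
print(f"10M tokens: {long_ctx_bytes / 1e12:.0f} TB per head")
```

Under these assumptions the matrix grows from about half a megabyte to roughly 200 TB per head, which is why exact-attention methods that work in tiles and never store the full matrix matter at this scale.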
63 Companies, One Race
What started as a two-horse race between OpenAI and Google has become a global competition spanning 63 organizations. 15 models come from Chinese labs: DeepSeek alone has 7 entries, while Alibaba's Qwen and Zhipu AI's GLM families are rapidly expanding. Community contributors and startups (Nous Research, Eric Hartford) punch far above their weight through fine-tuning and abliteration (removing a model's safety guardrails through targeted fine-tuning or weight manipulation).
Innovation Pipeline: Paper to Product
The pipeline from research paper to production model has dramatically accelerated. Early innovations like RLHF (Reinforcement Learning from Human Feedback) took 5+ years to go mainstream. Now, architectures like Mamba go from paper to production in months. This compression is both exciting (faster progress) and concerning (less time for safety evaluation).
The Modality Matrix
Early AI was text-only. Now, over half of new models handle multiple modalities — images, audio, video, and code. The trend is unmistakable: the future of AI is models that can see, hear, speak, code, and reason simultaneously. The arrival of models like GPT-4o (text+image+audio) and Gemini 2.0 (text+image+video) marks the beginning of truly general-purpose AI.
The Reasoning Revolution
Reasoning has exploded from a niche capability (chain-of-thought prompting in 2022, where the model works through a problem step by step before answering) to the most sought-after feature in AI. OpenAI's o1 showed that 'thinking longer' (test-time compute) could dramatically improve performance on hard problems. Now every major lab, including Anthropic, Google, and DeepSeek, is racing to build models that don't just pattern-match but actually reason step-by-step.
The China Factor
China has emerged as the world's second AI superpower. Companies like DeepSeek showed that innovative architecture, such as multi-head latent attention (MLA), which compresses key-value caches into a low-rank latent space, can compete with brute-force scaling. Moonshot AI's Kimi K2 (1 trillion parameters, open-weight) and MiniMax-01 (4 million token context) show that Chinese labs are no longer following; they're leading on specific frontiers. The US-China AI race is now the defining dynamic of the industry, with implications for regulation, export controls, and the future of open research.
The Efficiency Revolution
The AI industry hit a wall: training ever-larger models became prohibitively expensive. The response was an efficiency revolution. DistilBERT showed you could compress BERT to 60% of its size while keeping 97% of its capability. ALBERT showed that parameter sharing could cut parameter count by 18×. Mixture-of-Experts architectures (Mixtral, DeepSeek-V2) activate only a fraction of parameters per query. LoRA (Low-Rank Adaptation), which adds small trainable matrices to frozen model weights, made fine-tuning accessible on consumer GPUs. The Chinchilla paper showed that most large models of the time were undertrained relative to their size. The new mantra: smaller, smarter, cheaper.
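Two back-of-envelope calculations illustrate the claims about LoRA and Chinchilla. This is a sketch under stated assumptions: the 4096 × 4096 layer and rank 8 are hypothetical example values, and the figure of roughly 20 training tokens per parameter is the commonly quoted rule of thumb drawn from the Chinchilla results, not an exact law.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """LoRA trains B (d_out x rank) and A (rank x d_in) beside a frozen W (d_out x d_in)."""
    return rank * (d_out + d_in)

def chinchilla_tokens(params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal training tokens (rule-of-thumb ratio, an assumption)."""
    return params * tokens_per_param

# Hypothetical single weight matrix of size 4096 x 4096, adapted at rank 8.
frozen = 4096 * 4096
trainable = lora_trainable_params(4096, 4096, rank=8)
fraction = trainable / frozen

print(f"LoRA trains {trainable:,} of {frozen:,} weights ({fraction:.2%})")
print(f"A 70B-parameter model wants about {chinchilla_tokens(70e9):.1e} tokens")
```

For this example layer, LoRA updates well under 1% of the weights, which is what brings fine-tuning within reach of a single consumer GPU.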
The Safety Imperative
Safety went from an academic afterthought to an industry imperative. The RLHF paper (2017) took 5 years to become standard practice. Constitutional AI gave Anthropic a principled framework in which a model critiques and revises its own outputs against a written set of principles. But the real shift came when Meta released Llama Guard, a dedicated safety classifier that any developer could use. Google followed with ShieldGemma. Meanwhile, the open-source community pushed back with 'abliteration' techniques that strip a model's guardrails, raising fundamental questions: who decides what's safe, and should guardrails be removable?
From Text to Everything
AI has fragmented from one thing (text prediction) into a dozen specialized disciplines. Embedding models (text-embedding-3, BGE) power many search engines and most RAG (Retrieval-Augmented Generation) pipelines, which pair a language model with a retrieval system to ground responses in external knowledge. Safety models (Llama Guard, ShieldGemma) act as AI immune systems. Robotics models (RT-2, PaLM-E) bridge language and physical action. Music generators (Suno, Udio), speech synthesizers (VALL-E, ElevenLabs), and coding agents (Devin, Cursor, SWE-Agent) each represent billion-dollar verticals. The 'foundation model' era is giving way to an era of specialized, deeply integrated AI products.
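The retrieval step that embedding models enable can be sketched in a few lines. The three-dimensional vectors below are hand-written placeholders standing in for real embeddings (which have hundreds or thousands of dimensions), and the document names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, doc_vecs, k=1):
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(doc_vecs, key=lambda name: cosine(query_vec, doc_vecs[name]), reverse=True)
    return ranked[:k]

# Placeholder embeddings; a real pipeline would get these from an embedding model.
doc_vecs = {
    "moe_overview": [0.9, 0.1, 0.0],
    "lora_guide": [0.1, 0.9, 0.1],
    "safety_notes": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

top_docs = retrieve(query_vec, doc_vecs, k=2)
print(top_docs)
```

In a RAG pipeline, the text of the retrieved documents would then be placed in the language model's prompt so the answer is grounded in them.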
What's Next: Emerging Patterns
Based on the trajectories we've tracked, here are the patterns most likely to define AI's next chapter.
26 models already have agentic capabilities (autonomously planning, executing multi-step tasks, using tools, and self-correcting); expect this to become the default interaction model.
Parameter growth is plateauing; efficiency (MoE, Mamba, MLA) is the new frontier. Smaller, smarter models are winning.
Open-source share peaked at 62% but dropped to 35% in 2026 as frontier labs restrict access to their most powerful models.
The 24 hardware milestones show compute doubling every ~18 months, but model demands are growing faster.
Coding tools, search engines, music generators — AI is fragmenting into specialized, deeply integrated products that do one thing exceptionally well.
The US-China AI race will intensify: Chinese open-weight models (DeepSeek, Kimi K2) are already matching Western closed models on key benchmarks.
Every industry vertical will have its own foundation model — legal AI, medical AI, financial AI — each trained on domain-specific data at scale.
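The hardware pattern in the list above (compute doubling every ~18 months while model demands grow faster) can be put in numbers. The 18-month figure comes from the text; the 6-month doubling time for model demand is purely a hypothetical value chosen to show how quickly a gap opens.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Total growth after `years` if a quantity doubles every `doubling_months`."""
    return 2 ** (years * 12 / doubling_months)

years = 6
compute_growth = growth_factor(years, doubling_months=18)  # from the text
demand_growth = growth_factor(years, doubling_months=6)    # hypothetical assumption
gap = demand_growth / compute_growth

print(f"Over {years} years: compute x{compute_growth:.0f}, demand x{demand_growth:.0f}, gap x{gap:.0f}")
```

Even modest differences in doubling time compound into a large shortfall, which is the pressure behind the efficiency techniques listed above.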