

Fine-Tuning and the New AI Middle Class

Fine-tuning lowers competitive distance, but increases adaptation pressure. The winners are teams that iterate faster on domain fit, not model size.

VZ editorial frame

Read this piece through one operating lens: AI does not automate first; it amplifies first. If the underlying decision architecture is clear, AI scales clarity. If it is noisy, AI scales noise and cost.

VZ Lens

Through a VZ lens, this is not content for trend consumption; it is a decision signal. The real leverage appears when the insight is translated into explicit operating choices.

TL;DR

The democratization of fine-tuning technology has created an AI middle class: developers who fine-tune open foundation models (e.g., Llama, Mistral) on their own data to build domain-specific capabilities without having to develop their own foundation models. This layer is wedged between frontier labs and API consumers, and its speed of adaptation (days vs. months) creates a competitive advantage, as exemplified by domain specialists such as Harvey AI (legal) and BloombergGPT (finance).


One of the most useful metaphors for the structure of the AI market is the classic economic model: production power is determined by the concentration of capital. The wealthiest get the best tools.

In 2021–2022, the AI market looked like this: OpenAI, Google, Meta, and Anthropic train the foundation models, and everyone else calls them via API.

Today, this structure is changing. With the fine-tuning revolution, a new layer has emerged—the AI middle class.

This is the layer that doesn’t develop foundation models—but doesn’t simply consume APIs either. It fine-tunes open foundation models on its own data, optimizes performance for its own tasks, and thereby builds domain-specific AI capabilities that become a competitive advantage.


What is the AI middle class?

The model of layers

In the AI market, it is worth distinguishing three layers of players:

The frontier layer (foundation model developers): OpenAI, Anthropic, Google DeepMind, Meta AI, Mistral. With billions of dollars in investment, massive compute capacity, and dedicated ML research teams, they train the foundation models. The barrier to entry is extremely high.

The consumer layer (API users): individual developers, small teams, API-based applications. They use general-purpose frontier models with minor customization. Low barrier to entry, but low differentiation power.

The middle class (fine-tuning-based developers): organizations that build on open foundation models (Llama, Mistral, Qwen) but fine-tune them using their own data and for their own use cases. The barrier to entry is moderate, and the differentiation power is high.

The democratization of fine-tuning—made possible by LoRA and QLoRA—created this middle layer.

Why has the middle class become viable?

The LoRA revolution (which we analyzed in detail in a previous article) radically reduced the compute requirements for fine-tuning. What previously required large-scale lab infrastructure can now be done on a single GPU, in a matter of days, with a dataset of thousands to tens of thousands of examples.
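The arithmetic behind that reduction can be illustrated with a small sketch of the LoRA update (pure NumPy; the dimensions are illustrative and this is not a training loop):

```python
import numpy as np

# Illustrative dimensions for one weight matrix of a transformer layer.
d, r = 1024, 8          # hidden size d, LoRA rank r (r << d)

W = np.zeros((d, d))    # frozen base weight (stands in for a pretrained matrix)
A = np.random.randn(r, d) * 0.01   # trainable low-rank factor A (random init)
B = np.zeros((d, r))               # trainable factor B (zero init, per LoRA)

alpha = 16                                 # LoRA scaling hyperparameter
W_effective = W + (alpha / r) * (B @ A)    # adapted weight used at inference

full_params = W.size             # parameters touched by full fine-tuning
lora_params = A.size + B.size    # parameters touched by LoRA

print(f"full fine-tuning: {full_params:,} params")
print(f"LoRA (r={r}):     {lora_params:,} params")
print(f"ratio: {lora_params / full_params:.4%}")
```

With these toy dimensions LoRA trains roughly 1.6% of the parameters of full fine-tuning; because B starts at zero, the adapted model is initially identical to the base model.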

But LoRA only lowered the infrastructure barrier. To fully democratize the fine-tuning ecosystem, a whole chain of tools was needed:

Fine-tuning platforms. Together AI, Replicate, Hugging Face AutoTrain — these offer no-code/low-code interfaces for fine-tuning. No ML engineer required; you upload data, set parameters, and the platform runs it.

Fine-tuning frameworks. Axolotl, Unsloth, LLaMA-Factory — open-source fine-tuning libraries that simplify the implementation of LoRA and QLoRA and can even accelerate training by 2–5×.

Inference platforms. vLLM, Ollama, llama.cpp — which simplify the integration of fine-tuned models into production.

Evaluation frameworks. HELM, LangSmith, Braintrust, Weights & Biases — which structure the measurement of fine-tuned model performance.
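The kind of structured evaluation these frameworks provide can be reduced to a minimal sketch (pure Python; the model call is a stub standing in for a real fine-tuned endpoint, and the golden-set items are invented):

```python
# Minimal golden-set evaluation sketch (framework-agnostic, pure Python).

golden_set = [
    {"prompt": "Term for ending a contract early?", "expected": "termination"},
    {"prompt": "Party that receives a license?",    "expected": "licensee"},
]

def model_answer(prompt: str) -> str:
    """Stub for the fine-tuned model; replace with a real inference call."""
    canned = {
        "Term for ending a contract early?": "termination",
        "Party that receives a license?": "licensor",   # deliberate miss
    }
    return canned[prompt]

def evaluate(dataset):
    """Exact-match accuracy plus per-item results for regression tracking."""
    results = []
    for item in dataset:
        got = model_answer(item["prompt"]).strip().lower()
        results.append({"prompt": item["prompt"],
                        "pass": got == item["expected"]})
    accuracy = sum(r["pass"] for r in results) / len(results)
    return accuracy, results

accuracy, results = evaluate(golden_set)
print(f"golden-set accuracy: {accuracy:.0%}")
```

Keeping the per-item results (not just the aggregate score) is what makes later regression monitoring possible.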

Together, these tools enable a small team with the right domain knowledge and a disciplined process to build a competitive domain-specific AI system—without having to develop a foundation model.


Why is this important now?

The Emergence of the Domain-Specific AI Market

In parallel with the rise of the AI middle class, a new market has taken shape: domain-specific AI.

Harvey AI (legal AI): a legal assistant built on OpenAI's foundation models and adapted for legal work with domain data. Launched in 2023, it has been adopted by major law firms because it outperformed general-purpose setups on domain-specific legal document processing.

BloombergGPT: Bloomberg's LLM, trained on a mix of its proprietary financial data and general corpora, specifically for financial analysis. At 50 billion parameters it is smaller than GPT-3, yet it performs better on financial text processing.

Med-PaLM / Med-PaLM 2 (Google): a model tuned for medical question answering. On USMLE-style questions (MedQA), Med-PaLM 2 reached expert-physician-level scores.

StarCoder and DeepSeek-Coder: models specialized in code generation that outperform comparably sized general models on programming tasks.

These all represent the logic of the AI middle class: an open base model (or a carefully curated closed base) + domain-specific data + fine-tuning = domain-specific capability that outperforms the general frontier model in a given dimension.

Adaptation speed as a competitive advantage

The key competitive advantage of the AI middle class is not raw performance—frontier models are stronger on general tasks. The competitive advantage is adaptation speed.

A small team, with the right fine-tuning pipeline, can respond within days to a new dataset, a new task type, or a change in business rules.

Frontier labs spend months developing a single model version. Fine-tuning-based middle-class teams spend days.

This speed advantage is cumulative: each iteration brings the system closer to the domain-specific optimum. After a few months of daily iterations, the domain-specific capabilities of a middle-class organization can far surpass what a general-purpose frontier model offers in that niche.

The FTaaS (Fine-Tuning as a Service) Market

The emergence of the AI middle class has also given rise to a distinct business segment: Fine-Tuning as a Service.

These platforms—Together AI Fine-tuning, Replicate, Hugging Face Inference Endpoints, Anyscale, OctoAI—offer not just fine-tuning, but the entire pipeline: data loading, training, evaluation, deployment, and monitoring.

Competition in the FTaaS market centers on infrastructure costs, the breadth of model families used, and automated evaluation. By 2024–2025, this segment had become a standalone industry category.


Where did public discourse go wrong?

“RAG Instead of Fine-Tuning”

One of the most widespread misconceptions is that fine-tuning and RAG (Retrieval-Augmented Generation) are alternatives, and that RAG is generally the better solution—because it can handle fresher data and is cheaper than fine-tuning.

This binary choice is false. Fine-tuning and RAG are complementary techniques:

RAG is suitable for: incorporating up-to-date, document-level context; when data changes dynamically; and when the corpus is too large to fold into training data.

Fine-tuning is suitable for: teaching behavioral patterns; internalizing the model’s tone, style, and domain-specific vocabulary; and improving reliability on specific task types.

The best results often come from the combination of RAG and fine-tuning: the fine-tuned model knows how to process the documents retrieved by RAG—and the domain-specific processing style does not need to be reproduced sentence by sentence.
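That division of labor can be sketched end to end. In this sketch, retrieval is naive keyword overlap standing in for embedding search, and the documents are invented examples; the point is only where each technique does its work:

```python
# RAG supplies fresh documents; the fine-tuned model supplies the
# domain-specific processing style. Retrieval here is deliberately naive.

documents = [
    "Clause 7.2: either party may terminate with 30 days written notice.",
    "Clause 9.1: the licensee shall not sublicense without consent.",
    "Appendix A lists the fee schedule effective January 2025.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query (RAG's job: freshness)."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the prompt the fine-tuned model sees; because the model's
    tuning already encodes the domain style, the prompt stays short."""
    ctx = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{ctx}\n\nQuestion: {query}\nAnswer:"

hits = retrieve("terminate the agreement party", documents)
prompt = build_prompt("How can a party terminate the agreement?", hits)
print(prompt)
```

Note what the fine-tuned model no longer needs in the prompt: long style instructions and vocabulary glossaries, because those were internalized during tuning.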

“Fine-tuning destroys general capabilities”

Another common concern: fine-tuning builds a domain-specific capability but degrades general capabilities through catastrophic forgetting.

This was indeed a problem with earlier fine-tuning methods. LoRA-based fine-tuning minimizes this because it does not modify the base model’s weights—it only tunes the small adaptation matrices. General capabilities are preserved; the LoRA adapter is just the domain-specific layer.

If we fine-tune the entire model, the risk of catastrophic forgetting does indeed exist; in such cases, EWC (Elastic Weight Consolidation) or rehearsal (replay) techniques can help. With LoRA-based fine-tuning, however, the problem is largely mitigated.
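For the full-model case, the EWC penalty works by anchoring weights that mattered for the original task. A toy sketch (NumPy; all values are invented, not from a real model):

```python
import numpy as np

# EWC in one line: penalize moving each weight in proportion to how
# important it was (Fisher information) for the previously learned task.

theta_star = np.array([1.0, -0.5, 2.0])   # weights after the original task
fisher     = np.array([5.0,  0.1, 3.0])   # per-weight importance estimates
lam = 0.4                                  # penalty strength (hyperparameter)

def ewc_penalty(theta: np.ndarray) -> float:
    """(lambda/2) * sum_i F_i * (theta_i - theta*_i)^2"""
    return 0.5 * lam * float(np.sum(fisher * (theta - theta_star) ** 2))

# Moving an important weight (index 0) costs far more than moving an
# unimportant one (index 1), even though both moved by the same amount.
drift_important   = ewc_penalty(np.array([2.0, -0.5, 2.0]))
drift_unimportant = ewc_penalty(np.array([1.0,  0.5, 2.0]))
print(drift_important, drift_unimportant)
```

During full fine-tuning this penalty is added to the task loss; LoRA avoids the need for it entirely by never updating the base weights.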


What deeper pattern is emerging?

The Logic of Competition Democratization

The emergence of the AI middle class is an example of a broader logic of technological democratization. It is a recurring pattern in computing: every technological layer that was once dominant gradually becomes a platform—and a new layer competes on top of that platform.

This is how the internet became a platform for web applications. This is how cloud IaaS became a platform for SaaS. Foundation models are now becoming a platform for fine-tuning-based applications.

The AI middle class is the layer above the platform: those who use open models as a platform and differentiate themselves through fine-tuning.

Learning and Adaptation as Competencies

In the AI middle class, the source of sustainable competitive advantage is not a single fine-tuned model. Rather, it is organizational competence in rapid, disciplined iteration.

This includes:

  • Data curation competence (what data, in what format, with what quality filtering)
  • Fine-tuning infrastructure (pipeline, compute, deployment)
  • Evaluation infrastructure (golden sets, metrics, regression monitoring)
  • Iteration discipline (what was the hypothesis, what did we measure, what did we learn)

This combination of competencies is what is difficult to replicate—because it involves organizational knowledge and processes, not just technology.
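The iteration-discipline point above can be made concrete with a structured run record. This is a sketch; the field names are illustrative, not a standard schema:

```python
from dataclasses import dataclass, field
from datetime import date

# Every fine-tuning run gets a record answering the discipline's three
# questions: what was the hypothesis, what did we measure, what did we learn.

@dataclass
class IterationRecord:
    run_id: str
    hypothesis: str            # what we expected to improve, and why
    dataset_version: str       # which curated data snapshot was used
    metric: str                # the golden-set metric being tracked
    baseline: float            # metric before this run
    result: float              # metric after this run
    learning: str = ""         # the conclusion, kept even for failed runs
    day: date = field(default_factory=date.today)

    @property
    def improved(self) -> bool:
        return self.result > self.baseline

log = [
    IterationRecord("run-014",
                    "Adding negation examples fixes clause misreads",
                    "v0.3.1", "golden_accuracy",
                    baseline=0.81, result=0.86,
                    learning="Negation coverage was the gap; keep curating."),
]
print(log[0].improved)
```

The log itself is the hard-to-replicate asset: it encodes which hypotheses the organization has already tested.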

The adaptation race and the slowness of big labs

A paradox: frontier labs are simultaneously the strongest and the slowest.

They develop the best base models—that is indisputable. But when it comes to the speed of relevant, domain-specific adaptation, small fine-tuning teams beat the big labs.

This speed paradox is a structural feature of the AI race: platform developers (frontier labs) and application developers (fine-tuning teams) compete on different dimensions. Platform competition is about capacity—application competition is about the speed of adaptation.


What are the strategic implications of this?

Conditions for AI’s entry into the mainstream

1. Open access to base models. The Llama, Mistral, Qwen, and Phi series are all openly available under licenses that permit commercial use (subject to each model's specific license terms).

2. Domain data assets. The value of fine-tuning depends on data quality. What internal data assets are available? CRM data, internal documentation, process logs?

3. Fine-tuning pipeline. The structure of the training pipeline: data loading, LoRA configuration, training, evaluation. Axolotl, Unsloth, LLaMA-Factory — these are sufficient to get started.

4. Evaluation infrastructure. Essential: golden sets, automated metrics, regression monitoring. The value of fine-tuning is demonstrated by evaluation.

5. Iteration capacity. Fine-tuning is not a one-time investment but continuous iteration; capacity should be planned accordingly.
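The regression-monitoring idea from point 4 can be sketched as a simple release gate (pure Python; the metric names and threshold are illustrative):

```python
# A release gate for fine-tuned models: a new candidate ships only if no
# tracked metric drops more than a tolerated margin versus the baseline.

baseline  = {"golden_accuracy": 0.86, "citation_validity": 0.92}
candidate = {"golden_accuracy": 0.88, "citation_validity": 0.90}

def regression_gate(baseline: dict, candidate: dict,
                    max_drop: float = 0.01) -> tuple[bool, list[str]]:
    """Pass only if every baseline metric stays within max_drop."""
    failures = [m for m, v in baseline.items()
                if candidate.get(m, 0.0) < v - max_drop]
    return (not failures), failures

ok, failures = regression_gate(baseline, candidate)
print("ship" if ok else f"block: {failures}")
```

Here the candidate improves accuracy but loses two points of citation validity, so the gate blocks it; a gain on the headline metric must not hide a regression elsewhere.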

Where and how does the middle class fill the gap?

Fine-tuning’s middle class performs best where:

  • The task is well-defined and repetitive
  • Domain-specific vocabulary, style, and rules are characteristic
  • The internal data asset is large and high-quality
  • Precision and reliability are more important than creativity
  • The inference volume is large, so inference cost is a determining factor

Typical sectors: law, healthcare, finance, e-commerce, manufacturing, internal corporate processes.


What should we be watching now?

Continual fine-tuning as a paradigm

The next evolutionary step for the fine-tuning middle class: from static fine-tuning to continual fine-tuning. We don't tune just once; we continuously update the model using production data.

This is made possible by the low cost of LoRA and a data-flywheel logic: production errors → training data → rapid LoRA fine-tuning → deployment → back to the beginning. This is self-learning domain AI.
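One turn of that flywheel can be written as a control loop. Every function body below is a placeholder standing in for a real pipeline stage, not an actual implementation:

```python
# The continual fine-tuning flywheel, stubbed stage by stage:
# errors -> training data -> LoRA fine-tune -> deploy -> repeat.

def collect_production_errors() -> list[dict]:
    """Stub: pull recent corrected failures from monitoring."""
    return [{"prompt": "Summarize clause 4",
             "bad_output": "wrong summary",
             "corrected": "30-day notice applies"}]

def build_training_data(errors: list[dict]) -> list[dict]:
    """Stub: turn corrected failures into supervised examples."""
    return [{"input": e["prompt"], "target": e["corrected"]} for e in errors]

def finetune_lora(examples: list[dict]) -> str:
    """Stub: kick off a LoRA run; returns a new adapter version tag."""
    return f"adapter-v{len(examples)}"

def deploy(adapter: str) -> None:
    """Stub: swap the adapter into the serving stack."""
    print(f"deployed {adapter}")

errors = collect_production_errors()
adapter = finetune_lora(build_training_data(errors))
deploy(adapter)
```

The loop only closes safely if the evaluation gate from earlier sits between the fine-tune and deploy stages; otherwise the flywheel can amplify bad corrections as readily as good ones.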

Consolidation of the FTaaS market

The FTaaS market is currently fragmented: many small players, different APIs, different pricing models. Consolidation is expected in the next 12–24 months—a few platforms will rise to a dominant position. The winners will likely be those who most effectively integrate the evaluation and monitoring layers.


Conclusion

The AI middle class is not a competitor to frontier labs in the development of foundation models.

But in the realm of business-relevant, domain-specific AI capabilities, fine-tuning middle-class organizations are entering serious competition—in their own markets, with their own data, and at their own pace of adaptation.

You don’t need to own a foundation model. It’s enough to learn quickly and systematically from your own data.

This is the most enduring chapter in the democratization of AI—not that everyone can use the frontier model, but that everyone can develop their own domain-specific AI.


Key Takeaways

  • Fine-tuning has created a new, middle tier of AI — This “middle class” does not develop base models, nor does it merely consume APIs; it builds specialized models fine-tuned with domain-specific data that outperform general-purpose frontier models on specific tasks.
  • LoRA and QLoRA technologies have democratized fine-tuning — These methods have radically reduced computational requirements, making it possible to fine-tune on a single GPU in a matter of days with datasets of only thousands to tens of thousands of examples.
  • The middle class drives the domain-specific AI market — The middle-class formula (open base model, or a carefully adapted closed base, plus domain-specific data plus fine-tuning) has produced specialists such as Harvey AI in law and BloombergGPT in finance, which outperform general-purpose models within their respective domains.
  • The main competitive advantage is speed of adaptation, not raw performance — While frontier labs lose months developing new model versions, the fine-tuning-based middle class responds to new data or business rules within days, building cumulative domain knowledge.
  • Fine-tuning and RAG are not alternatives, but complement each other — RAG provides dynamic, document-level context, while fine-tuning internalizes the model’s behavioral patterns, style, and domain vocabulary; their combination yields the best results.
  • Catastrophic forgetting is no longer a critical problem — Modern, LoRA-based fine-tuning methods minimize the loss of the base model’s general capabilities, as they do not overwrite the original weights but only add small adapter weights.

Strategic Synthesis

  • Translate the core idea of “Fine-Tuning and the New AI Middle Class” into one concrete operating decision for the next 30 days.
  • Define the trust and quality signals you will monitor weekly to validate progress.
  • Run a short feedback loop: measure, refine, and re-prioritize based on real outcomes.
