AI & MACHINE LEARNING
One in three euros invested in Europe now goes to AI. The gap between real AI and dressed-up software has never been harder to judge from the outside.
73% of AI and big data systems fail basic build quality benchmarks. Founders have learned to pitch AI fluently. The difference between genuine proprietary capability and a well-wrapped API call requires operator-level technical assessment — not a generic code review.
Why it’s different
AI investing has its own category of risk. Most DD frameworks weren't built for it.
European AI companies raised €23.5 billion in 2025 — more than 30% of all European VC. The pace of investment has run well ahead of the pace of DD sophistication. Traditional software due diligence is necessary but not sufficient. The questions that matter most in an AI investment are different: Is the model actually proprietary? Are the infrastructure costs sustainable? Is the training data legally clean? Is the performance claim reproducible?
01
AI means something different in every pitch — and the difference is the investment thesis
An AI company might have a ten-year proprietary dataset with fine-tuned domain models and a clear inference cost advantage. Or it might have a GPT wrapper with a well-designed UI and a clever prompt. Both get called AI. One is a durable moat; the other is a distribution bet. The distinction requires someone who can open the technical architecture and read it — not someone who can read the pitch deck fluently. Standard software DD frameworks don't reach far enough into the model layer to make this call.
02
Infrastructure costs are a margin risk that most financial models don't capture accurately
GPU compute costs are a primary cost driver for AI companies — startups in production regularly spend $10,000–$50,000 per month on inference infrastructure, and the cost scales non-linearly with usage growth. Token-based LLM pricing can escalate sharply as enterprise customers increase usage, and companies without active cost optimisation strategies find that gross margin deteriorates as revenue grows. We assess the unit economics of the AI system — cost per inference, cost per output, cost per customer — not just the headline revenue.
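The unit-economics question can be made concrete with a small sketch. The figures below — token prices, request volumes, and a flat $99 subscription — are hypothetical assumptions chosen for illustration, not benchmarks; the point is how the same flat price produces very different margins for light and heavy users.

```python
# Illustrative unit-economics sketch for an LLM-backed product.
# All prices and volumes are hypothetical assumptions.

def cost_per_request(input_tokens, output_tokens,
                     price_in_per_1k=0.003, price_out_per_1k=0.015):
    """Token-based inference cost for a single request (USD)."""
    return (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k

def gross_margin(price_per_customer, requests_per_month,
                 input_tokens, output_tokens):
    """Gross margin per customer after inference COGS."""
    cogs = requests_per_month * cost_per_request(input_tokens, output_tokens)
    return (price_per_customer - cogs) / price_per_customer

# Same $99/month flat price; a light user vs. a heavy enterprise user:
light = gross_margin(99, 2_000, 1_500, 500)    # ~2k requests/month
heavy = gross_margin(99, 40_000, 1_500, 500)   # ~40k requests/month
print(f"light user margin: {light:.0%}, heavy user margin: {heavy:.0%}")
```

With these assumed numbers the light user carries a healthy margin while the heavy user is served at a steep loss — the pattern behind margin deterioration as enterprise usage grows.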
03
Training data is becoming a legal liability, not just an asset
AI models trained on improperly sourced data carry litigation risk that is increasingly material. Copyright exposure in training data, unclear licensing chains, and scraping of proprietary databases are real and growing legal risks that affect both model ownership and company valuation. We review training data provenance as a standard part of AI DD.
Assessment Areas
Where we focus in AI & Machine Learning engagements.
AI in AI & Machine Learning
The sector is investing in itself. The risks compound.
AI companies face every risk that any software company faces — plus the specific risks of the AI layer itself. European AI investment reached €23.5 billion in 2025. Below are the patterns we see most often in European AI DD engagements: the opportunities that justify the premium, and the risks that the premium often doesn't price.
Opportunities we verify
Proprietary data that cannot be replicated from public sources. The most durable AI moats in 2026 are not in model architecture — foundation models are increasingly commoditised — but in proprietary data. Companies that have accumulated years of domain-specific, structured, high-quality data have something that cannot be bought or scraped. We assess whether the data is genuinely proprietary and legally clean, and whether the data strategy is central to the company's defensibility plan.
Domain-specific fine-tuning that outperforms general models. In narrow, high-stakes domains — legal document analysis, medical coding, industrial fault detection, financial modelling — fine-tuned domain models consistently outperform general-purpose LLMs. We assess whether fine-tuning is genuinely occurring, what it is trained on, and whether the performance differential is measurable and sustainable.
Inference efficiency as a competitive advantage. As LLM costs decline, the companies that win are those that deliver equivalent or better outputs at lower cost per output. Inference optimisation — distillation, quantisation, retrieval-augmented generation, smart caching — is real engineering work that creates durable cost advantages.
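The cost effect of these techniques can be sketched with a simple expected-cost model. The base cost, cache hit rate, routing fraction, and distilled-model cost ratio below are all hypothetical assumptions used to show the mechanism, not measured figures.

```python
# Sketch: expected cost per output under caching + model routing.
# All parameters are hypothetical assumptions.

def avg_cost_per_output(base_cost, cache_hit_rate=0.0,
                        distilled_fraction=0.0, distilled_cost_ratio=0.2):
    """Expected inference cost per output (USD).

    cache_hit_rate:     share of requests served from cache (treated as free)
    distilled_fraction: share of cache misses routed to a cheaper distilled
                        model costing distilled_cost_ratio * base_cost
    """
    miss = 1 - cache_hit_rate
    per_miss = ((1 - distilled_fraction) * base_cost
                + distilled_fraction * base_cost * distilled_cost_ratio)
    return miss * per_miss

baseline = avg_cost_per_output(0.012)                        # no optimisation
optimised = avg_cost_per_output(0.012, cache_hit_rate=0.3,
                                distilled_fraction=0.5)
print(f"baseline ${baseline:.4f} vs optimised ${optimised:.4f} per output")
```

Under these assumptions the optimised pipeline serves the same outputs at well under half the baseline cost — the kind of durable cost advantage the paragraph above describes.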
Risks we surface
The AI-native label applied to a workflow tool with one API call. We regularly encounter companies described as AI-native whose AI capability is a single call to a public foundation model API — no fine-tuning, no proprietary data, no model evaluation infrastructure. The product may be genuinely useful. The AI moat is not real.
Compute costs that don't scale with the growth model. Many AI investment models show revenue growing at 3x while COGS holds flat. The actual cost structure of GPU compute scales with inference volume. A company processing 10x the current usage will pay 10x the current compute bill. Margin surprises at Series B are often visible in the architecture at Series A.
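The gap between the pitch-deck model and the actual cost structure is easy to show numerically. The revenue and COGS starting figures below are hypothetical assumptions; the comparison contrasts a model that holds COGS flat with compute that scales linearly with usage.

```python
# Sketch: modelled vs. actual gross margin as usage grows.
# Year-0 figures are hypothetical assumptions (USD).

revenue0, cogs0 = 1_000_000, 300_000   # starting revenue and compute COGS

for growth in (1, 3, 10):              # revenue/usage multiple
    revenue = revenue0 * growth
    modelled = (revenue - cogs0) / revenue           # deck: COGS held flat
    actual = (revenue - cogs0 * growth) / revenue    # compute scales with usage
    print(f"{growth:>2}x: modelled margin {modelled:.0%}, actual {actual:.0%}")
```

With flat COGS the modelled margin appears to expand toward ~97% at 10x; with compute scaling linearly, the actual margin stays pinned at 70% — scale never improves it, which is exactly the surprise that surfaces at Series B.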
EU AI Act high-risk classification creating a compliance backlog. AI systems used in consequential decisions — credit, hiring, healthcare — are high-risk under the EU AI Act, with full enforcement from August 2026. Many AI founders in these verticals have not conducted a formal classification assessment. The compliance work is an engineering project, not just a legal one.
Know what you’re backing before you commit.
X-Ray delivers a full product and tech verdict on any AI or machine learning target in one business day — going into the model layer, training data, inference costs, and benchmark reproducibility.
250+ European engagements · 100% partner repeat rate