Amazon Web Services' annual flagship conference came with a clear message this year: the cloud infrastructure race is now, first and foremost, an AI infrastructure race. From custom silicon to serverless inference, re:Invent 2025 signalled how deeply AWS is embedding artificial intelligence into every layer of its platform — and what that means for organisations making cloud investment decisions in 2026.
Amazon Bedrock Gets Its Largest Upgrade Yet
Amazon Bedrock, AWS's fully managed generative AI service, received its most substantial feature update to date. The additions that matter most for enterprise teams:
- Bedrock Agents with computer use: Agents can now interact directly with web browsers, desktop applications, and data visualisation tools. This dramatically expands the scope of automation possible without custom integration work.
- Knowledge Bases V2: Significantly improved RAG pipelines featuring hybrid search (semantic plus keyword), metadata filtering, and automatic chunking optimisation. Real-world retrieval accuracy improvements are material for document-heavy use cases.
- Model evaluation dashboard: A native benchmarking interface that lets teams compare model outputs across accuracy, latency, and cost — simplifying the "which model for which task" decision that consumes enormous engineering time today.
- Expanded model catalogue: AWS added models from Mistral, Meta's Llama 3.3, and several domain-specific providers to the Bedrock marketplace, giving teams genuine flexibility without managing their own model infrastructure.
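The hybrid search described above maps onto the existing Bedrock Knowledge Bases retrieve API, which already supports a HYBRID search-type override. A minimal sketch using boto3 (the knowledge base ID and query below are placeholders):

```python
# Sketch of querying a Bedrock knowledge base with hybrid search enabled.
# The knowledge base ID and query text are placeholders; the retrieve API
# and HYBRID search type belong to the bedrock-agent-runtime client.

def build_hybrid_retrieval_config(num_results: int = 5) -> dict:
    """Retrieval configuration requesting hybrid (semantic + keyword) search."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": num_results,
            "overrideSearchType": "HYBRID",  # vs. the default semantic-only search
        }
    }

def retrieve_chunks(client, kb_id: str, query: str, num_results: int = 5) -> list:
    """Call the Bedrock retrieve API and return the matched document chunks."""
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": query},
        retrievalConfiguration=build_hybrid_retrieval_config(num_results),
    )
    return response["retrievalResults"]

# Usage (requires AWS credentials and a provisioned knowledge base):
# import boto3
# client = boto3.client("bedrock-agent-runtime")
# chunks = retrieve_chunks(client, "KB1234567890", "Q3 refund policy")
```

Metadata filtering slots into the same configuration dictionary, so teams can combine it with hybrid search without changing the calling code.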
New Silicon: Trainium3 and Inferentia3
Amazon's custom AI chip programme continued its rapid advancement with two new generations announced at the conference.
Trainium3, the third generation of AWS's training-optimised chip, delivers a claimed threefold improvement in training throughput per dollar compared to Trainium2. For organisations fine-tuning large models or running regular retraining cycles, this is a meaningful cost reduction — particularly relevant as fine-tuning becomes the standard approach for domain-specific applications.
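To make the throughput-per-dollar claim concrete, here is a back-of-envelope calculation. The baseline tokens-per-dollar figure and run size are illustrative assumptions, not AWS pricing:

```python
# Back-of-envelope effect of a claimed 3x training throughput per dollar.
# The baseline figures are illustrative assumptions, not published AWS rates.

def training_cost(tokens: float, tokens_per_dollar: float) -> float:
    """Dollars to push a given number of tokens through training."""
    return tokens / tokens_per_dollar

baseline_tpd = 2.0e6              # assumed Trainium2 tokens per dollar (illustrative)
improved_tpd = baseline_tpd * 3   # the claimed Trainium3 improvement

run_tokens = 5.0e10               # one fine-tuning run (illustrative)

old_cost = training_cost(run_tokens, baseline_tpd)   # $25,000 per run
new_cost = training_cost(run_tokens, improved_tpd)   # roughly $8,333 per run
```

For a team running monthly retraining cycles, that gap compounds quickly, which is why the per-dollar framing matters more than raw throughput.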
Inferentia3 focuses on inference workloads, with lower latency and reduced cost-per-token compared to its predecessor. AWS also unveiled an inference-on-Spot capability that allows teams to run large model inference on Spot Instances with automatic fault tolerance — a significant development for high-volume, cost-sensitive applications that can tolerate occasional interruptions.
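The economics of inference-on-Spot come down to a trade-off between the Spot discount and the cost of redoing interrupted work. A rough model, with all rates as illustrative assumptions rather than published AWS prices:

```python
# Rough model of Spot-based inference economics: a Spot discount applied to
# the on-demand cost per token, with a retry overhead for interrupted work.
# All rates below are illustrative assumptions, not published AWS prices.

def effective_cost_per_mtok(on_demand_per_mtok: float,
                            spot_discount: float,
                            interruption_rate: float) -> float:
    """Expected cost per million tokens when interrupted requests are redone once."""
    spot_price = on_demand_per_mtok * (1 - spot_discount)
    # Interrupted requests are retried, so expected work scales up proportionally.
    return spot_price * (1 + interruption_rate)

on_demand = 4.00   # assumed $/million tokens on dedicated capacity (illustrative)
cost = effective_cost_per_mtok(on_demand, spot_discount=0.65, interruption_rate=0.05)
# 4.00 * 0.35 * 1.05 = $1.47 per million tokens
```

Even with a 5% interruption rate fully absorbed as retries, the effective cost lands well under half the on-demand figure — which is why the capability targets high-volume workloads that can tolerate occasional interruptions.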
"Custom silicon is where the real cost advantage lives. By 2026, running AI workloads on AWS custom chips rather than third-party GPUs will become standard practice for cost-conscious enterprise teams." — AWS re:Invent 2025 keynote
SageMaker Becomes Amazon SageMaker AI
In a notable rebranding and feature expansion, Amazon SageMaker was relaunched as Amazon SageMaker AI, with a new visual workflow builder, tighter Bedrock integration, and expanded MLOps capabilities. The rebrand acknowledges an important reality: the line between traditional machine learning and generative AI is dissolving, and platforms need to serve both without forcing teams to context-switch.
The most practically useful new features include:
- Unified experiment tracking: Runs from traditional ML experiments and LLM prompt evaluations now appear in a single interface, reducing the number of tools teams need to manage.
- Generative AI model monitoring: Automated drift detection for LLM outputs, including hallucination rate tracking over time — addressing one of the most pressing production concerns for enterprise teams.
- Pipeline nodes with agentic steps: Workflows can now include agentic task execution, useful for automated data labelling, quality assurance, and multi-model evaluation pipelines.
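The drift detection described above boils down to comparing a recent window's flagged-output rate against a baseline. The sketch below illustrates that computation in plain Python; it is not the SageMaker AI API, and the sample windows are invented:

```python
# Illustrative sketch of what a hallucination-rate drift check computes:
# compare a recent window's flagged-output rate against a baseline window
# and alert when it drifts past a threshold. Not the SageMaker AI API.

def hallucination_rate(flags: list[bool]) -> float:
    """Fraction of sampled outputs flagged as hallucinations."""
    return sum(flags) / len(flags) if flags else 0.0

def drift_alert(baseline: list[bool], recent: list[bool],
                max_increase: float = 0.05) -> bool:
    """True when the recent rate exceeds the baseline rate by more than max_increase."""
    return hallucination_rate(recent) - hallucination_rate(baseline) > max_increase

baseline_window = [False] * 95 + [True] * 5   # 5% baseline rate (illustrative)
recent_window = [False] * 88 + [True] * 12    # 12% recent rate (illustrative)
alert = drift_alert(baseline_window, recent_window)  # 7-point rise trips the alert
```

In production the flags would come from an automated evaluator rather than hand-labelled data, but the alerting logic is the same.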
AWS re:Invent 2025 confirmed Amazon's long-term strategy: make AI infrastructure so tightly integrated into the core cloud platform that migrating away becomes increasingly costly. For existing AWS customers, the tooling is genuinely excellent — the challenge is keeping up with the pace of change.
What This Means for Your Cloud Strategy
For organisations already running workloads on AWS, re:Invent 2025 reinforced the value of staying within the ecosystem. The platform is more cohesive than it was 12 months ago — moving from Bedrock to SageMaker AI to custom silicon is now a smoother path with fewer seams.
For organisations evaluating multi-cloud strategies, the gap in AI-specific tooling between AWS and its competitors has narrowed rather than widened. Google Cloud and Microsoft Azure both made substantial announcements in December 2025, and the choice between platforms increasingly comes down to existing integrations, team expertise, and specific service requirements rather than fundamental capability gaps.
The most practically important consideration for 2026 planning: if your AI workloads are cost-sensitive and primarily inference rather than training, the Inferentia3 roadmap and Spot inference pricing deserve serious evaluation. The economics are shifting faster than most budgets have accounted for.