CONFERENCE DAY ONE

Early Morning Sessions

Day 1 Sessions

8:00 am - 8:45 am Registration & Coffee

8:45 am - 9:00 am Opening Remarks

This exclusive panel brings together senior technology leaders to discuss how observability fits into modern engineering strategy. Moving beyond tools and dashboards, the conversation focuses on executive-level decision making: ownership, investment priorities, and how observability supports reliability, platform strategy, and business risk management at scale.
• How C-suite leaders define and own observability strategy, including governance, accountability, and success metrics
• How to set investment direction at scale, navigating tool consolidation, platform alignment, and cost
• How observability supports executive priorities such as incident risk and resilience

Jayashankar Rudrachar

Head of Observability Engineering
TIAA

Ruben Vilhena

Head of Software Development
Smartest Energy

Marta Lima

Engineering Lead for Observability
Wise

9:30 am - 10:00 am Mission 40: Reducing Operational Toil and Rebuilding Reliability at Pfizer
Fuat Müminoğlu - Hosting SRE Operations Director, Pfizer

Pfizer's SRE teams faced rising complexity, constant interruptions, and high reliability expectations across research, manufacturing, and enterprise systems. Mission 40 tackled this head-on, aiming to cut operational workload by 40% across multiple teams while also migrating away from a costly vendor low-code platform to in-house automation. With 18 years of experience, Fuat Müminoğlu takes you inside the strategy, mindset, and execution that made this possible in just one year. Learn how careful prioritisation, multi-team collaboration, and targeted automation turned a massive challenge into clear, measurable impact, including the creation of a Continuous Improvement engine that scaled Mission 40's practices across the organisation.

As a next step after Mission 40, AI is now being applied to further boost opportunity identification and prioritisation, accelerating improvements on top of the foundations laid during the original project.
• Moving critical automations in-house while maintaining operational stability.
• Driving cultural and process changes that outlast a single project, not just implementing tools or automation.
• Applying AI as a next-level enhancement to continuously identify and prioritise improvement opportunities.


Fuat Müminoğlu

Hosting SRE Operations Director
Pfizer

10:00 am - 10:30 am Presentation: SRE as an Operating Model: Embedding Reliability Across the Business with AI
Jayashankar Rudrachar - Head of Observability Engineering, TIAA

SRE is often introduced as a standalone team, yet reliability gains stall when it isn't embedded into everyday operations. In this session, Jay Rudrachar explores how organisations can build SRE into the fabric of their business. Drawing on his work in global operations and AI-driven observability at TIAA, Jay shows how the right operating model, shared ownership, and AI-assisted tooling can shift teams from reactive incident response to prediction and prevention.

• Exploring why SRE works best as an operating model, embedded across teams rather than owned by a single function
• Enabling AI and SRE agents from strong observability foundations, improving incident triage, ownership, and prevention
• Reducing operational burden using AI, moving from faster root cause analysis to automated remediation and governance-aware decision making

Jayashankar Rudrachar

Head of Observability Engineering
TIAA

9:30 am - 10:00 am Roundtable Discussion: Moving Faster Without Breaking Things — Risk in AI Accelerated Engineering

AI tools like GitHub Copilot and Claude are accelerating delivery speed, but many teams are seeing reliability incidents rise alongside velocity. This roundtable examines why AI-assisted development exposes new failure modes and how platform teams can introduce safeguards that preserve reliability without slowing progress. Drawing on examples from regulated FinTech environments, the session focuses on observability gaps, risk signals and quality-first practices that help teams recognise when speed becomes unsafe.
• Observability gaps introduced by AI-generated and AI-assisted code
• Signals that indicate when velocity has crossed into unsustainable risk
• Quality-first patterns that protect reliability without creating bottlenecks

10:30 am - 11:00 am Morning Coffee Break

Late Morning Sessions

11:00 am - 11:30 am Building ITV's AI Agent Hub: Audit-Ready Assistants, Safer Usage, Better Self-Serve
Tom Haynes - Lead Platform Engineer, ITV

As AI tools spread across the enterprise, ITV's platform engineering team faced a familiar challenge: high potential value, but real risk around visibility, governance, and liability. Tom has been leading the build of ITV's new AI Agent Hub - an MVP now in controlled testing - designed to give teams a single, ITV-tailored place to access approved assistants, accelerate day-to-day work, and reduce "dark AI" usage. Powered by the platform foundations ITV has laid across Kubernetes, shared templates, and OpenTelemetry, the hub aims to make AI adoption traceable by default, so the organisation can confidently scale usage while protecting content, users, and the business.

• Centralising approved agents to reduce shadow AI and standardise safe usage patterns
• Capturing prompts, outputs, and artefacts via OpenTelemetry for traceability and liability defence
• Using RAG to keep documentation searchable, current, and useful for busy teams


Tom Haynes

Lead Platform Engineer
ITV

11:30 am - 12:00 pm Presentation: Cloud Intelligence at Scale: How AI Agents Are Redefining SRE in Modern Platforms
Marius Zaharia - Cloud Tech Lead, Societe Generale

Cloud environments now generate volumes of telemetry far beyond what traditional SRE practices were built to handle, pushing teams to rethink how reliability is maintained at scale. Drawing on hands-on experience with cloud-native operations, Marius Zaharia explores how AI-powered assistants and agents are being embedded directly into cloud platforms to interpret signals, accelerate diagnosis and automate safe responses. Using concrete examples from Azure-based SRE tooling, this session shows how intelligent agents help teams reduce operational noise, respond faster to incidents and evolve toward human-in-the-loop automation without losing control.
• Embedding AI-driven agents into cloud monitoring and incident workflows to speed up detection and diagnosis
• Reducing alert noise by correlating logs, metrics and events across complex cloud services
• Advancing cloud reliability practices through safe, governed automation that keeps humans in control

Marius Zaharia

Cloud Tech Lead
Societe Generale

12:00 pm - 12:30 pm Panel Discussion - What Does SRE Mean Now? The Enterprise Interpretation
Dan Herd - Principal Site Reliability Engineer, Dunelm
Marius Zaharia - Cloud Tech Lead, Societe Generale

Most organisations adopt SRE principles without adopting the "classic" model - and that's often the right call. The challenge is avoiding confusion, bottlenecks, and "SRE as the dumping ground" by agreeing what SRE is there to enable.

• Agreeing SRE's purpose: enablement, shared ownership, or embedded support
• Using SLOs to prioritise work without turning reliability into politics
• Designing on-call boundaries so accountability is clear and scalable


Dan Herd

Principal Site Reliability Engineer
Dunelm

Marius Zaharia

Cloud Tech Lead
Societe Generale

11:00 am - 12:00 pm Roundtable Discussion: User Centric SLOs and Resilience Testing: Measuring What Really Matters

Grounded in a decade of applying DevOps and SRE principles in real engineering environments, this roundtable brings together engineering leaders to discuss how to define SLOs that truly reflect user impact. The session focuses on designing SLOs for specific user journeys, selecting the right indicators, and testing assumptions through active resilience (chaos) testing, helping teams better understand, monitor, and own the reliability of their applications.
• Designing user-journey-driven SLOs that capture real customer impact when things go wrong
• Selecting meaningful indicators and signals that help teams understand and own their application behaviour
• Using active resilience (chaos) testing to validate SLOs, strengthen monitoring, and expose blind spots before incidents occur

Lunch

12:30 pm - 1:30 pm Lunch

Early Afternoon Sessions


Panos Tsilopoulos

Director - Observability Platform Engineering
Nike

2:00 pm - 2:30 pm Panel Discussion – The Great Shift from Monitoring to Observability: What Actually Changes in Teams, Workflows & Culture?

Andy Slater - Lead on Automation & Observability, Specsavers
Rajasree Ramakrishnan - Product Lead - Agentic Observability, Lloyds Bank
Vinaya Bharathi - Senior Observability Specialist, Pandora

As systems become more distributed and change happens faster, "more dashboards" doesn't equal better reliability. The real shift is organisational: who owns signals, how teams respond, and how observability drives prioritisation rather than noise.

• Defining ownership models that turn signals into action, and supporting the culture shift that comes with them
• Killing noise by redesigning alerting around journeys, not infrastructure
• Proving value through faster recovery, safer change, and clearer priorities


Andy Slater

Lead on Automation & Observability
Specsavers

Rajasree Ramakrishnan

Product Lead - Agentic Observability
Lloyds Bank

Vinaya Bharathi

Senior Observability Specialist
Pandora

2:30 pm - 3:00 pm Afternoon Coffee Break

Late Afternoon Sessions

4:00 pm - 4:30 pm Panel Discussion: AI in Engineering Work: Productivity Boost or Reliability Risk?

Andy McMahon - Director - Principal AI & MLOps Engineer, Barclays
Fuat Müminoğlu - Hosting SRE Operations Director, Pfizer

AI is already changing how teams write code, investigate incidents, and ship changes - but adoption stalls when trust is low and governance is unclear. Leaders need clarity on where AI helps today, where it introduces new risk, and how to roll it out responsibly.
• Identifying safe, high-value AI use cases across engineering workflows
• Defining guardrails: access, approvals, auditability, and blast-radius limits
• Building trust through evaluation: quality, MTTR impact, and false positives

Andy McMahon

Director - Principal AI & MLOps Engineer
Barclays

Fuat Müminoğlu

Hosting SRE Operations Director
Pfizer

3:00 pm - 3:30 pm Greening SRE: Turning Observability, Efficiency and AI into Sustainable Engineering

Ruben Vilhena - Head of Software Development, Smartest Energy

Engineering teams are racing to cut carbon and build smarter, greener systems, and a new space is opening up where SRE can lead the way. With hands-on experience rolling out observability, improving pipelines and moving from FinOps to GreenOps, Ruben Vilhena delves into how reliability and sustainability can work together. With AI magnifying both strong and weak engineering habits, creating cost-effective, energy-aware systems has become one of the most exciting challenges in modern tech.
• Leveraging observability, metrics and deep-dive insights to identify carbon hotspots and operational inefficiencies across systems.
• Applying practical GreenOps principles to pipelines, architectures and testing workflows to reduce energy usage and costs while improving performance.
• Evaluating how AI can augment sustainable engineering practices to amplify strong processes and expose weaknesses.

Ruben Vilhena

Head of Software Development
Smartest Energy

3:30 pm - 4:00 pm From Black Box to Glass Box: The Observability Blueprint Behind Barclays' AI Agents

Andy McMahon - Director - Principal AI & MLOps Engineer, Barclays

As Barclays expands the use of AI agents across its internal platforms, observability is becoming the foundation that turns experimentation into safe, scalable capability. By embedding observability into the architecture from day one and working in lockstep with security and engineering governance, Andy's team can monitor how agents behave, what data they touch, and where usability gaps emerge in real time. This deeper visibility is enabling the bank to boost productivity and respond to issues faster in a rapidly evolving AI landscape, all while maintaining compliance.

• Monitoring AI agents with integrated observability that reveals behaviour, data usage patterns and real-world performance
• Collaborating across security, architecture and platform teams to ensure AI adoption aligns with governance, compliance and organisational standards
• Improving productivity and user experience by rapidly identifying usability gaps and driving targeted interventions

Andy McMahon

Director - Principal AI & MLOps Engineer
Barclays

4:30 pm - 6:30 pm Drinks Reception