On Monday, 30.06.2025, at approximately 16:19h (CEST), a critical error occurred during the execution of PRISMA's within-day capacity auctions: within-day auctions were not created and therefore did not start.
The issue was caused by a database running out of storage. Because no storage was available, an (intentional) lock on the database was not released, which caused the process responsible for creating within-day auctions to fail.
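To illustrate the failure mode: a job that waits on a lock with no timeout will hang for as long as the lock is held. The sketch below is hypothetical and assumes a PostgreSQL database, the psycopg2 driver, and an advisory lock guarding auction creation; none of these details are confirmed by this report. It shows how a lock timeout lets the auction-creation job fail fast and surface the problem instead of blocking silently.

```python
# Hypothetical sketch: acquire the auction-creation lock with a timeout so the
# job fails loudly instead of blocking indefinitely when the lock is never
# released (e.g. because the database has run out of storage).
# Assumes PostgreSQL and psycopg2; the lock key and function are illustrative.
import psycopg2

AUCTION_CREATION_LOCK_ID = 42  # hypothetical advisory-lock key

def create_within_day_auctions(dsn: str) -> None:
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor() as cur:
            # Fail after 30 seconds instead of waiting forever on a stuck lock.
            cur.execute("SET lock_timeout = '30s'")
            cur.execute("SELECT pg_advisory_lock(%s)", (AUCTION_CREATION_LOCK_ID,))
            # ... create the within-day auctions for the upcoming hour here ...
            cur.execute("SELECT pg_advisory_unlock(%s)", (AUCTION_CREATION_LOCK_ID,))
        conn.commit()
    except psycopg2.errors.LockNotAvailable:
        # Surface the stuck lock to monitoring instead of hanging silently.
        raise RuntimeError("auction-creation lock not released; escalate to on-call")
    finally:
        conn.close()
```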
The expected number of within-day auctions for a given hour is approximately 215 (the exact number fluctuates with the availability of capacity on the TSO side). During the incident only 112 auctions were running, meaning that approximately 103 auctions, i.e. 48% of the expected number, were impacted.
The total duration of the incident, from detection on 30.06.2025 at 16:19h (CEST) to full restoration of the auction functionality on the same day at 18:00h (CEST), was 1 hour and 41 minutes.
According to PRISMA’s business continuity measures, the UMM resolving the issue is published once all auctions for the impacted transportation period have been conducted. The respective UMM was published on 01.07.2025 at 07:45h (CEST), 13 hours and 45 minutes after full restoration.
| Date | Time (CEST) | Responsible | Description |
|------|-------------|-------------|-------------|
| 30 Jun 2025 | 16:19 | PRISMA internal | PRISMA internal resources identify a low number of within-day auctions, triggered by internal auction monitoring. |
| 30 Jun 2025 | 17:23 | PRISMA Emergency Guard | As part of PRISMA’s business continuity measures, the Emergency Guard posted a UMM to inform the market about issues with the auction start. |
| 30 Jun 2025 | 18:00 | PRISMA internal | After identifying the processes that caused the database performance issues and actively ending them, the database recovered and all remaining sessions were unblocked. |
| 30 Jun 2025 | 18:00-20:00 | PRISMA internal | Close monitoring of the state of within-day auctions and database performance by PRISMA engineers, to ensure that the fix persisted. |
| 1 Jul 2025 | 07:43 | PRISMA Emergency Guard | As part of PRISMA’s business continuity measures, the Emergency Guard informed the TSO emergency contacts via email that the incident was resolved. |
| 1 Jul 2025 | 07:45 | PRISMA Emergency Guard | As part of PRISMA’s business continuity measures, the Emergency Guard updated the UMM with the information that the incident was resolved. |
| 2 Jul 2025 | 10:00 | PRISMA internal | The Emergency Guard, Customer Success and the involved engineers conducted a post-mortem. |
Assessment: The incident was caused by expensive queries issued via the Shipper API. The existing fail-safes of query limiting and rate limiting were not sufficient to prevent the incident.
Detection: The problem was identified by PRISMA’s internal monitoring and alerting.
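For illustration, the kind of check that can flag such a drop compares the number of running within-day auctions to the expected count of roughly 215. The sketch below is an assumption, not PRISMA’s actual monitoring configuration; the threshold and function names are invented for the example.

```python
# Hypothetical monitoring check: alert when the number of running within-day
# auctions falls well below the expected count. All values are illustrative.
EXPECTED_AUCTIONS = 215   # approximate; fluctuates with capacity on the TSO side
ALERT_THRESHOLD = 0.8     # alert if fewer than 80% of expected auctions run

def check_within_day_auctions(running_count: int) -> None:
    ratio = running_count / EXPECTED_AUCTIONS
    if ratio < ALERT_THRESHOLD:
        # During the incident only 112 of ~215 auctions ran (ratio ~0.52),
        # a drop that a threshold like this would catch.
        alert(f"Only {running_count}/{EXPECTED_AUCTIONS} within-day auctions "
              f"running ({ratio:.0%}); expected at least {ALERT_THRESHOLD:.0%}.")

def alert(message: str) -> None:
    # Placeholder: in practice this would page on-call or notify a channel.
    print("ALERT:", message)
```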
Intermediate resolution:
In the course of the incident, the expensive processes and queries were identified and manually terminated.
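A minimal sketch of this kind of mitigation follows, again assuming a PostgreSQL database and psycopg2 (not confirmed by this report). It lists active backends whose current query has exceeded a runtime threshold and terminates them, which also releases the locks those sessions hold. During the incident this was done by hand; the automation below is purely illustrative.

```python
# Hypothetical sketch: find and terminate long-running queries. Terminating a
# backend ends its session and releases the locks it holds. The runtime
# threshold is illustrative; in the incident this step was performed manually.
import psycopg2

def terminate_long_running_queries(dsn: str, max_runtime: str = "5 minutes") -> None:
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # Active backends whose current query has run longer than max_runtime.
        cur.execute(
            """
            SELECT pid, now() - query_start AS runtime, query
            FROM pg_stat_activity
            WHERE state = 'active'
              AND now() - query_start > %s::interval
              AND pid <> pg_backend_pid()
            """,
            (max_runtime,),
        )
        for pid, runtime, query in cur.fetchall():
            print(f"terminating pid={pid} runtime={runtime} query={query[:80]!r}")
            cur.execute("SELECT pg_terminate_backend(%s)", (pid,))
```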
Long-term resolution:
Introduction of improved query handling (e.g. queueing) and improved rate limiting for this specific endpoint. In addition, query execution for the Shipper API can be moved to a database replica.
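A minimal sketch of the queueing and limiting idea, under the assumption of a Python service (all names and limits below are invented for illustration): a bounded semaphore caps how many expensive Shipper API queries run concurrently, and requests that cannot obtain a slot quickly are rejected instead of piling up against the database without bound.

```python
# Hypothetical sketch: cap concurrent expensive Shipper API queries with a
# bounded semaphore and reject excess requests early. Limits are illustrative.
import threading

MAX_CONCURRENT_QUERIES = 5   # hard cap on expensive queries in flight
QUEUE_TIMEOUT_SECONDS = 2.0  # how long a request may wait for a free slot

_query_slots = threading.BoundedSemaphore(MAX_CONCURRENT_QUERIES)

class TooManyRequests(Exception):
    """Raised when no slot frees up in time; maps to HTTP 429 at the API layer."""

def run_shipper_query(execute_query):
    # Wait briefly for a free slot; reject instead of queueing unboundedly.
    if not _query_slots.acquire(timeout=QUEUE_TIMEOUT_SECONDS):
        raise TooManyRequests("Shipper API query limit reached, retry later")
    try:
        # In the long-term design this call could also be routed to a database
        # replica so expensive reads cannot affect the primary.
        return execute_query()
    finally:
        _query_slots.release()
```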
Restoration of service:
Full restoration of the auction functionality was achieved once the processes that caused the issues had been manually terminated.
Preventive Actions: