On Monday 07.07.2025 at approximately 12:34h (CEST), two critical errors occurred during the execution of PRISMA's yearly capacity auctions, leading to the following issues:
| Auction Processing Issues | Bidding Issues |
|---|---|
| The issue occurred because two application servers processed the same task in parallel. The first server was delayed in execution, leading the platform to assume it had failed and to reassign the task to a second server. Reassigning a task to a second server in case of failure is desired behaviour as part of platform redundancy; in this case, however, the platform did not correctly detect that the first server was merely delayed, not failed. | Due to legacy code, shippers received an error message indicating they were not entitled to edit an existing bid or place a new bid. The legacy code in question incorrectly handled bid ownership checks when multiple bids of the same person, or of multiple persons in the same organisation, were involved. |
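The delay-versus-failure confusion described above is a classic pitfall of lease-based task reassignment. The sketch below (hypothetical, not PRISMA's actual code) shows the mechanism: a worker claims a task under a time-limited lease, a slow worker's lease expires and the task is legitimately reassigned, and a fencing token ensures the delayed worker's late result is rejected instead of being processed twice.

```python
# Minimal sketch of lease-based task reassignment with fencing tokens.
# All names are illustrative assumptions, not the platform's real API.

class TaskLease:
    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.holder = None
        self.expires_at = 0.0
        self.token = 0  # incremented on every successful claim

    def claim(self, worker, now):
        # Reassignment is only allowed once the current lease has expired,
        # which is how a long delay gets mistaken for a failure.
        if self.holder is None or now >= self.expires_at:
            self.holder = worker
            self.expires_at = now + self.lease_seconds
            self.token += 1
            return self.token
        return None

    def commit(self, token):
        # A commit carrying a stale token is rejected, so a delayed worker
        # that "wakes up" after reassignment cannot produce duplicate results.
        return token == self.token

lease = TaskLease(lease_seconds=5)
t1 = lease.claim("server-1", now=0)   # server-1 starts processing
t2 = lease.claim("server-2", now=6)   # lease expired: server-2 takes over
assert lease.commit(t1) is False      # delayed server-1 is fenced off
assert lease.commit(t2) is True       # only server-2's result is accepted
```

Without the fencing step, both servers' commits would be accepted, which matches the duplicate-results behaviour the incident describes.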
| Auction Processing Issues | Bidding Issues |
|---|---|
| The majority of TSOs and shippers participating in the yearly capacity auctions were impacted by the double processing of auction results. | 21 auctions out of a total of 1,574 published yearly auctions were impacted by the bidding issues in the second round of the auctions. |
The total duration of the incident, from detection on 07.07.2025 at 12:34h (CEST) to deployment of the fixes on PROD on 08.07.2025 at 13:07h and 16:35h (CEST), was 1 day, 4 hours and 1 minute.
Full resolution, including all activities for re-running the auctions and cleaning up data on the platform and TSO side, took an additional 2 days, 18 hours and 5 minutes, ending on 11.07.2025 at 10:40h (CEST).
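The stated durations can be verified with a short calculation (all times CEST):

```python
from datetime import datetime

detected = datetime(2025, 7, 7, 12, 34)   # incident detected
fixed    = datetime(2025, 7, 8, 16, 35)   # second fix deployed to PROD
resolved = datetime(2025, 7, 11, 10, 40)  # data clean-up completed

incident = fixed - detected
cleanup  = resolved - fixed

print(incident)  # 1 day, 4:01:00
print(cleanup)   # 2 days, 18:05:00
```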
| Date | Time (CEST) | Responsible | Action Type | Description |
|---|---|---|---|---|
| Jul 7, 2025 | 12:34 | PRISMA internal | Send MS Teams message | Customer Success was informed about a possible incident: shippers reported seeing their booking results twice. A first analysis showed no double bookings in the internal support tool, leading to the assumption that only cosmetic clean-up would be necessary after the auctions. |
| Jul 7, 2025 | 13:05 | PRISMA internal | Create ISR | Customer Success created an ISR (Incident & Service Request), formally starting the internal incident process for the auction processing issues. |
| Jul 7, 2025 | 13:13 | PRISMA Emergency Guard | Publish UMM | As part of PRISMA's business continuity measures, the Emergency Guard posted a UMM to inform the market about issues with the auction processing. |
| Jul 7, 2025 | 13:19 | PRISMA internal | Send MS Teams message | Customer Success was informed that shippers were unable to place bids in round 2 of the remaining 21 auctions. A first analysis showed this was not an isolated issue but affected all 21 auctions; as a result, it was also treated as a formal incident. |
| Jul 7, 2025 | 14:13 | PRISMA Emergency Guard | Publish UMM | As part of PRISMA's business continuity measures, the Emergency Guard posted a UMM to inform the market about issues with bidding in round 2. |
| Jul 7, 2025 | 15:25 | PRISMA Emergency Guard | Align internally | The Emergency Guard aligned internally with PRISMA's management about the next steps regarding cancellation or continuation of the auctions. Decision: to prevent market distortion, PRISMA recommends cancellation of the remaining 21 auctions. |
| Jul 7, 2025 | 16:19 | PRISMA Emergency Guard | Send Email | The Emergency Guard sent an email to the TSO emergency contacts summarising the call and the decision taken. |
| Jul 7, 2025 | 16:28 | PRISMA Emergency Guard | Publish UMM | As part of PRISMA's business continuity measures, the Emergency Guard posted a UMM to inform the market about the cancellation of the auctions. |
| Jul 8, 2025 | 10:07 | Customer Success | Dismiss UMM | Customer Success (in alignment with the Emergency Guard) dismissed the UMM regarding the bidding issues, since the auctions had been cancelled. |
| Jul 8, 2025 | 13:07 | PRISMA internal | Deploy fix | After review and testing, the fix for the bidding issues was deployed to the production system. |
| Jul 8, 2025 | 13:46 | PRISMA Emergency Guard | Update UMM | The Emergency Guard updated the existing UMM about the auction processing issues to reflect the new marketing time-frame for the re-run of the auctions. |
| Jul 8, 2025 | 14:04 | PRISMA Emergency Guard | Update UMM | The Emergency Guard updated the existing UMM about the auction processing issues to also include the cases of missing booking confirmations. |
| Jul 8, 2025 | 16:28 | Customer Success | Update UMM | Customer Success updated the existing UMM about the auction processing issues to reflect the new auction publishing time for the auction re-run. |
| Jul 8, 2025 | 16:35 | PRISMA internal | Deploy fix | After review and testing, the fix for the auction processing was deployed to the production system. The fix had been available since 13:49h, but to avoid interference with the day-ahead auctions, deployment was scheduled to take place afterwards. |
| Jul 11, 2025 | 10:40 | PRISMA | Execute steps for data clean-up | The necessary steps for data clean-up were executed successfully. |
| Auction Processing Issues | Bidding Issues |
|---|---|
| Cause: The application server processing the auction evaluation lost its connection to the database. A second application server started processing the same auctions as part of the redundancy implementation in the platform infrastructure. Assessment: The incident was caused by unanticipated high load. Detection: The problem was identified through a user report; shippers contacted PRISMA's Customer Success after noticing duplicate results. | Cause: A piece of legacy code in the backend returned incorrect permissions to the frontend. Shippers with more than one bid per company were unable to edit bids in the second bidding round if the bids had been placed manually rather than via a bidding plan. Assessment: The incident was caused by an isolated bug. Detection: The problem was identified through a user report; shippers contacted PRISMA's Customer Success after experiencing issues with placing bids. |
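To illustrate the class of bug described for the bidding issues, the sketch below (purely hypothetical; the real legacy code differs) contrasts a permission check that implicitly assumes one bid per user with a corrected check based on organisation membership, which keeps working when several bids from the same organisation exist:

```python
# Illustrative data: two manually placed bids from the same organisation.
bids = [
    {"id": 1, "auction": "A1", "user": "alice", "org": "ShipperCo"},
    {"id": 2, "auction": "A1", "user": "bob",   "org": "ShipperCo"},
]

def may_edit_buggy(bid, user, org):
    # Buggy: ties edit rights to the individual user who placed the bid,
    # so colleagues in the same organisation are wrongly denied.
    return bid["user"] == user

def may_edit_fixed(bid, user, org):
    # Fixed: any user of the organisation that owns the bid may edit it,
    # regardless of how many bids the organisation has placed.
    return bid["org"] == org

assert may_edit_buggy(bids[1], "alice", "ShipperCo") is False  # wrongly denied
assert may_edit_fixed(bids[1], "alice", "ShipperCo") is True   # correctly allowed
```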
| Auction Processing Issues | Bidding Issues |
|---|---|
| Intermediate resolution: Enlargement of the database connection pool per application server, deployed as of 08.07.2025, 13:07h (CEST). During the run of the yearly interruptible auctions two weeks later, PRISMA closely monitored the platform systems for possible DoS attacks; no signs of attempted attacks were found. Long-term resolution: Segmentation of the application servers to improve task allocation and system stability. One set of servers is dedicated to scheduled activities, such as auction processing and report generation; a different set is dedicated to processing requests originating from (public) endpoints of the platform. This allows PRISMA to reduce the connection pool size per server back to the original number of 100. Restoration of service: Following the intermediate fix, the necessary actions for data clean-up (invalidation of double bookings) were aligned and executed with the TSOs. | Intermediate resolution: A targeted fix enhanced the query logic in the backend so that it correctly handles and accepts multiple bids, deployed as of 08.07.2025, 16:35h (CEST). Long-term resolution: Refactoring the legacy platform into a modern, independent service-based architecture will greatly reduce the likelihood of such issues. This transformation project is already in progress and will be driven forward with the highest priority. Restoration of service: Following the intermediate fix, all cancelled auctions were successfully republished using a custom calendar. On 09.07.2025 at 09:00h (CEST) the affected auctions were successfully re-run, and at 13:45h (CEST) the incident was declared closed. |
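The intermediate resolution for the processing issues, enlarging the connection pool, addresses exhaustion under load. The toy sketch below (stdlib only; real deployments would configure their database driver's pool instead) shows why an undersized pool fails under concurrent demand and how released connections become reusable:

```python
import queue

class ConnectionPool:
    """Toy bounded connection pool; connections are plain placeholder strings."""

    def __init__(self, size):
        self._pool = queue.Queue(maxsize=size)
        for i in range(size):
            self._pool.put(f"conn-{i}")

    def acquire(self, timeout=0.01):
        try:
            return self._pool.get(timeout=timeout)
        except queue.Empty:
            # This is the failure mode that unanticipated high load triggers.
            raise RuntimeError("pool exhausted")

    def release(self, conn):
        self._pool.put(conn)

small = ConnectionPool(size=2)
a, b = small.acquire(), small.acquire()
try:
    small.acquire()              # a third caller under load
except RuntimeError as e:
    print(e)                     # pool exhausted
small.release(a)
assert small.acquire() == a      # released connections are available again
```

Enlarging the pool raises the number of concurrent callers served before exhaustion, while the long-term server segmentation reduces how many callers compete for the same pool in the first place.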
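A complementary safeguard against the double bookings that had to be invalidated during clean-up is to make result processing idempotent, so a reassigned task that runs twice publishes only once. A minimal sketch, with all names (`process_auction`, the auction IDs) purely illustrative:

```python
# In production this guard would typically be a unique database constraint
# on the auction ID, not an in-memory set.
processed = set()

def process_auction(auction_id, published):
    if auction_id in processed:
        return "skipped"       # a second server picking up the task is a no-op
    processed.add(auction_id)
    published.append(auction_id)
    return "processed"

published = []
assert process_auction("Y-2025-001", published) == "processed"
assert process_auction("Y-2025-001", published) == "skipped"  # retry is safe
assert published == ["Y-2025-001"]
```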
Preventive Actions: