Resolved
We have mitigated the vendor outage on the platform.
Monitoring
Models are back and operational with limited capacity, and we are resolving request spikes as they appear. We are continuing to monitor the situation and are working with our vendor to get the clusters fully operational again.
Identified
We've migrated the majority of the workload to other providers and are seeing improvements. We are still working on a full resolution.
Identified
We are in contact with our vendor and are working on fixing the issue.
Investigating
One of our vendors is experiencing network issues. We are currently rebalancing the models.