Single-tenant Artifactory servers returning 500 errors
Incident Report for JFrog Cloud Platform
Postmortem

This issue was caused by timeouts between Artifactory and Access, brought on by intermittent load on the Access server. It was compounded by a separate issue in which the server did not recover from those timeouts as it should have.

During the incident, we made configuration changes that helped mitigate the impact of those problems. We also tuned our alerting so that our teams could respond immediately when early warning signs of this issue appeared.

As a long-term solution, we have created a patch that allows Artifactory to recover automatically from those timeouts with no impact to the service. In addition to the patch, we've added logging to assist in future debugging should similar errors arise.

Posted Nov 22, 2018 - 22:21 UTC

Resolved
This incident has been resolved. We have continued monitoring and have not seen the issue recur.
Posted Nov 21, 2018 - 00:55 UTC
Update
We are continuing to monitor for any further issues.
Posted Nov 08, 2018 - 16:19 UTC
Monitoring
A fix has been implemented and we are monitoring the results.
Posted Nov 08, 2018 - 13:43 UTC
Update
We are continuing to work on a fix for this issue.
Posted Nov 08, 2018 - 06:00 UTC
Identified
We have identified the issue.
Posted Nov 07, 2018 - 22:58 UTC
Update
We are still investigating this issue.
Posted Nov 07, 2018 - 22:18 UTC
Investigating
We have identified an issue with single-tenant Artifactory servers failing with 500 errors.
Posted Nov 07, 2018 - 20:52 UTC
This incident affected: AWS US East 1 (N. Virginia) and AWS Europe West (Ireland).