Single-tenant Artifactory servers returning 500 errors

Incident Report for JFrog Cloud

Postmortem

This issue was caused by timeouts between Artifactory and Access, triggered by intermittent load on the Access server. It was compounded by a separate issue in which the server did not recover from those timeouts as it should have.

During the incident, we made configuration changes that helped mitigate the impact of those problems. We also tuned our alerting so that our teams could respond immediately when early warning signs of this issue appeared.

As a long-term solution, we have created a patch that allows Artifactory to recover automatically from those timeouts with no impact to the service. Alongside the patch, we have added more logging to assist in debugging should similar errors arise.

Posted Nov 22, 2018 - 22:21 UTC

Resolved

This incident has been resolved. We have been monitoring and have not seen this issue recur.

Posted Nov 21, 2018 - 00:55 UTC

Update

We are continuing to monitor for any further issues.

Posted Nov 08, 2018 - 16:19 UTC

Monitoring

A fix has been implemented and we are monitoring the results.

Posted Nov 08, 2018 - 13:43 UTC

Update

We are continuing to work on a fix for this issue.

Posted Nov 08, 2018 - 06:00 UTC

Identified

We have identified the issue.

Posted Nov 07, 2018 - 22:58 UTC

Update

We are still investigating this issue.

Posted Nov 07, 2018 - 22:18 UTC

Investigating

We have identified an issue with single-tenant Artifactory servers failing with 500 errors.

Posted Nov 07, 2018 - 20:52 UTC

This incident affected: US - East1 (N. Virginia) - AWS (Artifactory) and Europe - West1 (Ireland) - AWS (Artifactory).