So Tuesday morning we got a report in IRC that a committer was trying to get a release out
and could not deploy. Shortly after a Nexus issue was reported in Jira INFRA-8321. A few
hours later another issue INFRA-8322 related to Nexus was opened. So far, nothing unusual
Yesterday, more issues reported on IRC/HipChat, and more issues opened.
(INFRA-8326,INFRA-8327,INFRA-8328, INFRA-8334). By then it was obvious this more than
a coincidence and it was already being looked into.
Twitter notifications and emails were sent out declaring the degraded performance an outage
and On Call was full time looking into the issue. Others joined the call to assist and eventually
the outage was determined to be a change to LDAP configuration made 2 days ago by Infra.
(See infra:r921805 for the revert of that.)
The LDAP change was made to improve response times as it was being reported as being slow
to return queries. Reverting the change cured the issues Nexus was having contacting the
groups that committers belonged to.
There will be another avenue looked into for improving LDAP query response times whilst not
affecting those services that connect via anon bind.
Infra thanks everyone for their patience whilst this was looked into and resolved.
Thanks go to those involved in working towards the solution:-
Gavin McDonald (gmcdonald)
Tony Stevenson (pctony)
Chris Lambertus (cml)
Daniel Gruno (humbedooh)
Brian Fox (brianf)