Junior Sysadmin’s First Lines Of Code Set Off Alarms. His Next Lot Crashed The Company

Who, Me? Welcome again to Who, Me? It’s the Monday morning column in which readers of The Register admit to making big mistakes and somehow swerving the consequences.

This week, meet a reader we’ll Regomize as “Logan” who told us he acquired a psychology degree that somehow led to a role as an IT generalist.

A few years into that gig, Logan wanted a change and scored a job as a junior sysadmin – a big change because it involved working on a structured team.


“We had four developers, a product owner, and one other sysadmin,” Logan told Who, Me?

That setup was a little confronting. “I’d never worked somewhere with a proper office, or indeed, proper unwritten rules,” Logan told us. He also learned that the second sysadmin wanted to get out of the job and become a developer.

As you do when starting a new job, Logan tried to impress.

So when he found an inelegant dashboard for the Nagios network monitoring tool, he volunteered to tidy it up.

“I grouped the boxes by function, reordered a few things, and wrote a user story,” Logan told Who, Me? He showed his work to the product owner and the rest of the team, all of whom agreed it was a fine upgrade.

“Ten minutes after pushing the change, a few boxes on the dashboard went orange and one of the devs looked like he might pass out,” Logan confessed.

The lead developer stared at the dashboard and struggled to decipher it.

Thankfully he soon realized the new design Logan created meant some familiar metrics appeared on different parts of the screen, and that the orange alerts reported a regular cron job that always stressed MySQL backups.

Logan reverted the change anyway.

“The Product Owner later told me he’d suspected it might cause issues – but had let me go ahead anyway so I could learn by doing,” Logan wrote. “I wasn’t sure whether to thank him or cry.”

The big one

A week later, the product owner went on holiday and left Logan under the supervision of the other sysadmin – the one who had mentally checked out of the job so he could become a developer.

Come Friday, Logan volunteered to take on one of the routine end-of-week chores: running the company’s version of /update-servers.sh, a script that SSHes into servers and updates packages.

The sysadmin gave his permission, Logan ran the script, and watched in terror as most of the Nagios dashboard turned bright red or angry orange.

“Phones started ringing. Orders stopped syncing. Sales couldn’t update products. The website was up, but everything behind it was well and truly dead,” Logan admitted to Who, Me?

The senior sysadmin asked Logan if he had run the script, then checked his work and quickly diagnosed the problem – Logan hadn’t commented out one line of code.

Nobody had told Logan that was necessary. And it wasn’t his fault that someone had committed the script with that line active.

Before the blame game could begin, there was work to do because the script initiated an upgrade of production MySQL servers from version 4 to 5.

“The upgrade had filled the disk partway through, deleting the old binaries and failing before migrating the data,” Logan told Who, Me? “Everything with a hardcoded MySQL 4 dependency – which was basically everything – had gone down.”

And then the CEO materialized to ask what was wrong, learned about the upgrade, and demanded development of a recovery plan within the hour.

Suffice to say Logan and his supervisor found a way to re-install MySQL 4 binaries, quickly restored service, learned that the impact had been tiny – 15 missed orders out of 25,000 – and by 6 pm the incident was effectively over.

Which was when the senior sysadmin handed Logan an envelope containing a formal disciplinary warning.

“I’d expected a bollocking, but that stung,” Logan told Who, Me?

Managerial matters

The following week, the product owner returned from his break. Logan came to the office early so he could give his side of the story.

He arrived to find the product owner already debriefing the CEO.

Logan joined their meeting and handed over a seven-page incident analysis he’d prepared, and the disciplinary letter.

“The CEO turned purple,” Logan wrote. “He read the letter, crossed out my name, replaced it with the sysadmin’s, and walked over to hand-deliver it.”

And then the CEO stepped up, big time.

“You were on probation. You were supposed to be supervised,” he told Logan. “And we don’t hand out disciplinaries for honest mistakes. We hand them out for shifting blame and shirking responsibility.”

“I never touched update-servers.sh again,” Logan told Who, Me? “But I never forgot what I learned about good leadership and bad handovers.”

How have you survived the blame game after an error? Has the boss bailed you out? Or bailed you up? Click here to send us an email with your story so we can share it in a future edition of Who, Me? ®

