Matt Simmons sent a tweet this morning regarding today’s xkcd comic. His observation was that it would show up on a large number of sysadmin blogs, and I’d have to agree. The rest of his commentary from his blog is also quite apt. I remember having this discussion at the LISA conference multiple times, specifically about the hero complex and how dangerous it is to stability and reliability for all the reasons listed.
In our profession (yes, it is one), invisibility is the name of the game. This unfortunately has the side effect of no one really understanding what you do or why they employ you. If you are good, there are no problems, so why do they need you? On the other hand, you get the person who isn’t as experienced yet, who runs around fixing the symptoms rather than the problem and, due to the visible results, gets praised. This feeds the hero complex, and it’s a difficult thing to turn around, as everyone likes to know they are doing a good job.
It is quite the set of standards we have:
- We only get a call when something is wrong
- Few people know what we do
- If we are doing a good job, we mustn’t be working
- If we are doing a bad job (not sufficiently experienced), we are praised for fixing the problems, because they have risen to a noticeable level
- The better we get, the less recognition we get
- As time goes on, we become better generalists, rather than specialists
The real problem is promoting the idea that if all is quiet, we are working effectively, and if all is chaos, we are not. From an external point of view, the apparent effort is the inverse of the actual situation, which is counterintuitive to most people. This problem is exacerbated by the common expectation that computers have problems, so outages are seen as the norm, not the exception. Realistically, we should be handling exceptions to the norm, rather than our visibility being the norm.
So far the only thing that comes to mind is a set of shameless self-promotion items which show that the work we do is affecting the bottom line by avoiding problems rather than trying to correct them as they happen. It’s kind of like the Y2K issue: the perception is that nothing happened, so we wasted all that effort on nothing. The real statement should be that we had minimal issues because we fixed the problems before they hit. I know I spent a lot of time in advance patching machines and installing new versions of software in a controlled manner in the months leading up to Y2K. Personally, I’d prefer to have fixed it all in advance rather than be scrambling after the fact. I guess I’m just lazy.
Comments are welcome.