Archive for category App Support thoughts

Customer care – the Dilbert way !

Today’s Dilbert comic strip is really awesome, and it is so apt for people who work in support & maintenance projects to try in their own work !!

Lol !!

 

Dilbert.com


Adopting technology to automate business-as-usual work in support .. still reluctant?

So frustrating … ? Isn’t it ?

I do not know why in the world I see people doing the same jobs over and over, such as system health checks, scheduled maintenance and housekeeping jobs, without thinking of automation. I find that kind of work really boring, frustrating and irritating if someone asks me to do it even twice!

I have seen a lot of people who never say a word and keep doing the same job again and again without ever complaining about it. They do not even question the value of the work and effort they are putting in. Especially in support and maintenance projects, I have seen plenty of tasks that are done month after month without anyone reviewing them to see how they could be automated. Typical tasks such as health check monitoring, scheduled maintenance & housekeeping jobs, and proactive monitoring of processes & instances are done every day or week, with hours spent on each. Stunning fact, isn’t it ?

When I think deeply about why people do not like to think laterally and review what they are doing and what value they are adding to the project, I find a few common patterns, which could very well be managed and nurtured to change for the good of the resources and eventually the project.

A few of them are as follows:

  1. Resources do not know what they are doing – Most of the time I have observed that the routine work is delegated to a new trainee, who does not really make the effort to understand WHAT they are doing and WHY they are doing it. They only know HOW things should be done.
  2. Self motivation – While the routine tasks are mostly carried out by a newcomer or trainee in the project, I have also observed that these people are not given enough knowledge sharing to get them started in the project. They are just given tasks and asked to learn from them. Being told only the HOW of a job does not give the resources sufficient information, and hence they lack the self motivation to question themselves on the job.
  3. Hierarchy in the organization – You can not really question your senior manager if you are given a routine piece of work, can you? Especially when you are a member of a support team. In my opinion, one should be brave enough to ask questions, and especially to ask for justification of the value of a person putting hours of effort into silly work that could very well be automated. But the simple fear of asking your senior sometimes kills that motive.
  4. Traditional tasks .. this is how it was always done – This is one of the most common reasons I hear when I ask people why they can not improve on the current situation and look for automation in their area. The knowledge that is passed from a team leader to a team member is often limited to HOW a job is done rather than WHAT it is and WHY you are doing it. That further means that they are just meant to do what they are told and not deviate from it! Thus, when I asked a member of a vendor support team to provide some information on an incident investigation, the answer I got back was very typical .. “I do not know much because I did what was always done and what I was told! It’s the traditional way of doing it.”

People in many offshore support teams simply turn up for the job and do it, without getting into the soul of the job and using common sense to automate and thereby introduce efficiencies in the project.

Sometimes it’s so frustrating … ? Isn’t it ?


Practical problem & incident management

 

I found this a few days ago while surfing the net and really liked it. In all honesty, it’s nothing but the truth in today’s scenario of problem management and incident management.

Sometimes this is exactly what happens !

Unsure who the author is, but cool work here !

[Image: problem-solving flow chart]


Managing self-inflicted incidents – why it’s important and how to avoid them

I wrote quite a few articles recently on Incident Management and you can find all of them here.  In this article I am trying to put down my thoughts on the incidents that are categorized as “self-inflicted” or “invited” incidents (henceforth SIIs) and how to prevent them from occurring.

Self-inflicted incidents that result in a service outage or disruption are normally followed by remedies against the vendor providing the application support services. Customers nowadays are careful to put remedy clauses in their contracts, and thus it is all the more important to keep incidents, let alone self-inflicted ones, away by putting additional monitoring & proactive measures in place.

I wrote earlier about the DDR framework for managing incidents and how important it is for major incidents to be detected earlier, diagnosed quicker and resolved sooner. It is worth reading the article if you have not done so yet.

It is important to understand whether the incident can be classified as self-inflicted or not while you are doing the incident management.  The sooner you detect the type of incident (self-inflicted or situational), the better your chances of “managing” the incident appropriately and avoiding heavy fines / remedies against your organization.

The most frequent cause of SIIs is manual oversight or carelessness while making a change to the production system. Any change made to the production system without understanding its implications could be really harmful and could come back and bite you hard. Hence it’s really important for application support teams to understand each change going on the platform and around the platform, and then line up the implementation steps and pre & post implementation checks accordingly to safeguard against potential SIIs.

Once you detect an incident as an SII, it’s very important to “manage” it properly. Two key lessons to keep in mind while managing an SII are:

  • Never hide any SII from the customer
  • Never lie about the facts around the SII

Most of the time, I think support team management takes a political route in handling the aftermath of SIIs to save themselves from potential remedies & fines. While in some cases it makes sense to do so, more often than not, with a wiser and slightly smarter customer, it falls flat on its face. After all, your reputation is on the line !

If you have a very good working relationship with the customer, try to speak with them and explain the situation in full honesty. While you do that, it’s equally important to learn the lesson and take steps to ensure the incident is not repeated. There is no point in making fake promises to customers if your team can not keep them. If there are circumstances that forced your team into an SII, then explain them to the customer and see how they could be overcome. In most cases where the customer is slightly sensible (rather than horrible :-) ), this trick will prevail.

Remember ! It is always important to keep the customer informed and not keep him in the dark over the investigation. After all, you are the service provider and he is paying you for your services.

Now, moving on to tricks for avoiding SIIs. Well, there is no defined process or guaranteed path that would ensure that there will never be an SII while you are providing application support, but surely there are enough tips and tricks that would help you reduce the probability.

First of all, find out the most common root causes of the incidents that happened in the past year. More often than not, if an incident happened in the past year and its root cause has been found and a fix applied, there is a lesson you can take from that experience.
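As a rough illustration of that tallying step, here is a minimal Python sketch. The incident records, their field names (`date`, `root_cause`) and the sample values are all made up for the example; in practice they would come from whatever ticketing tool your team uses.

```python
from collections import Counter
from datetime import date, timedelta


def top_root_causes(incidents, today, top_n=3, window_days=365):
    """Return the most frequent root causes seen in the look-back window."""
    cutoff = today - timedelta(days=window_days)
    recent = [i["root_cause"] for i in incidents if i["date"] >= cutoff]
    return Counter(recent).most_common(top_n)


# Hypothetical incident log; field names and values are assumptions.
SAMPLE_INCIDENTS = [
    {"date": date(2023, 11, 2), "root_cause": "manual change error"},
    {"date": date(2024, 1, 15), "root_cause": "disk full"},
    {"date": date(2024, 3, 9), "root_cause": "manual change error"},
    {"date": date(2021, 6, 1), "root_cause": "expired certificate"},  # outside the window
]

if __name__ == "__main__":
    for cause, count in top_root_causes(SAMPLE_INCIDENTS, today=date(2024, 6, 1)):
        print(f"{count} x {cause}")
```

The causes at the top of such a list are the ones most worth turning into checklist items or automated checks.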

Have a very good checklist for doing the health check of the system. Automate the monitoring of the components and potential failure points as much as you can so that, in case of an incident, the monitoring output is useful for gathering evidence.
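To give a feel for what automating a health check could look like, here is a small Python sketch. It is only one possible shape, not a recommendation of a particular tool: the check names, the 10% free disk threshold and the result fields are assumptions chosen for illustration. Each check is a plain function, and every result is timestamped so it can double as evidence later.

```python
import shutil
from datetime import datetime, timezone


def disk_has_free_space(path=".", min_free_ratio=0.10):
    """Example check: at least 10% of the disk is free (threshold is an assumption)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total >= min_free_ratio


def run_health_checks(checks):
    """Run each named check; a check that raises is recorded as failed."""
    results = []
    for name, check in checks.items():
        try:
            ok = bool(check())
            detail = ""
        except Exception as exc:  # keep going so one broken check does not hide the rest
            ok = False
            detail = repr(exc)
        results.append({
            "check": name,
            "ok": ok,
            "detail": detail,
            "checked_at": datetime.now(timezone.utc).isoformat(),
        })
    return results


if __name__ == "__main__":
    # Hypothetical check list; real ones would cover your own failure points.
    for result in run_health_checks({"disk space": disk_has_free_space}):
        status = "OK" if result["ok"] else "FAILED"
        print(f'{result["checked_at"]} {result["check"]}: {status} {result["detail"]}')
```

Run from a scheduler, something like this replaces the daily manual walk through the same steps and leaves a record behind.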

Have a useful incident checklist handy. You can read about how to prepare an incident checklist in my previous post here. You need to take all your understanding of the platform, its connection points and its failure points into consideration when you create the incident checklist to detect and diagnose the incident.

Most importantly, for all scheduled / unscheduled changes on the platform, ensure that they are thoroughly checked, that the implications are understood and that the risk is flagged accordingly. There is no point in keeping quiet if you know that a network change might cause an outage to your portal when it is switched over. You might want to give the customer a heads up and seek approval prior to such a change, rather than having to explain later why you allowed it on production !  If the application support team is able to detect and predict which changes are potentially harmful to the system before they are approved for implementation, more than half of your job is done.

While implementing the change, obviously be very careful about what you are working on. Even if the change sounds simple and neither intrusive nor disruptive to the service, there is no point in being careless about it.  I once managed an incident where one of my colleagues (a few years ago) had deleted production database tables instead of the reference database tables, and the system went down for 3 full days !
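One simple safeguard against this kind of mix-up is a pre-implementation check that refuses to run a destructive step unless the session is connected to the database the change was written for. The Python sketch below is purely illustrative: the database names, the `protected_dbs` default and the helper names are my assumptions, and `drop_tables` only returns the statements it would run rather than touching a real database.

```python
class UnsafeChangeError(Exception):
    """Raised when a destructive change is about to hit the wrong target."""


def confirm_target(connected_db, intended_db, protected_dbs=("production",)):
    """Allow the change only on the intended, non-protected database."""
    if connected_db != intended_db:
        raise UnsafeChangeError(
            f"connected to {connected_db!r}, but the change is meant for {intended_db!r}"
        )
    if connected_db in protected_dbs:
        raise UnsafeChangeError(
            f"{connected_db!r} is protected; get explicit approval first"
        )
    return True


def drop_tables(connected_db, intended_db, tables):
    """Hypothetical destructive step, guarded by the target check."""
    confirm_target(connected_db, intended_db)
    # Placeholder for the real work: this sketch only reports what it would do.
    return [f"DROP TABLE {table}" for table in tables]


if __name__ == "__main__":
    print(drop_tables("reference", "reference", ["lookup_codes"]))
    try:
        drop_tables("production", "reference", ["lookup_codes"])
    except UnsafeChangeError as err:
        print(f"Change refused: {err}")
```

A guard like this does not replace care, but it turns "I thought I was on the reference database" into an error message instead of an outage.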

There are a lot of things you can do as an application support team to avoid potential SIIs and eventually ensure you maintain a stable system. I have noted a few of them above; let me know if there are more and share your knowledge with me too !

Cheers !


Incident checklists – why one should have them & how one should prepare them

I wrote a few weeks ago about how you can manage incidents effectively. The post is available here for your reading. I mentioned there the Detect Diagnose Resolve framework and how you can use it to manage incidents effectively.

It is very important to quickly detect the incident cause when you are in the incident management process. Unless you find out where the cause lies, it will take a long time to actually diagnose & resolve the issue.

An ‘Incident Checklist’ is the most important tool / process every application support analyst should have, so as to quickly start the incident analysis and rule out the obvious causes of the errors.

However, what is the ideal way to prepare and subsequently maintain incident checklists?

Why

I guess I do not need to convince the application support community about the need to have incident checklists prepared for their use. They are handy documents / tools that give essential information you can use during an incident, such as:

  • Important phone numbers & emails
  • Stakeholder lists
  • Technical task list for carrying out health check
  • Quick tips to help make decisions
  • Escalation paths
  • Other support group contacts

Once you have a good checklist containing the above details, you should review & update it as often as needed to ensure that it stays useful.

How

The most important aspects of the incident checklist are that it should:

  • Not be overly cluttered
  • Be simple and easy to understand
  • Have clear instructions on do’s & don’ts
  • Not contain any sensitive data i.e., passwords, user ids etc.
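If you keep the checklist in a structured form, some of these criteria can be checked automatically during each review. The Python sketch below is one possible approach under my own assumptions: the section names, the 15-entry "clutter" limit and the keyword pattern for sensitive data are illustrative choices, not a standard.

```python
import re
from dataclasses import dataclass, field

# Crude keyword match for entries that look like credentials (an assumption).
SENSITIVE = re.compile(r"password|passwd|user\s*id", re.IGNORECASE)
MAX_ENTRIES = 15  # arbitrary limit standing in for "not overly cluttered"


@dataclass
class IncidentChecklist:
    contacts: list = field(default_factory=list)
    health_check_steps: list = field(default_factory=list)
    escalation_path: list = field(default_factory=list)
    dos_and_donts: list = field(default_factory=list)

    def all_entries(self):
        return (self.contacts + self.health_check_steps
                + self.escalation_path + self.dos_and_donts)


def review(checklist):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    entries = checklist.all_entries()
    if len(entries) > MAX_ENTRIES:
        problems.append(f"too cluttered: {len(entries)} entries (limit {MAX_ENTRIES})")
    for entry in entries:
        if SENSITIVE.search(entry):
            problems.append(f"possible sensitive data: {entry!r}")
    if not checklist.escalation_path:
        problems.append("no escalation path listed")
    return problems


if __name__ == "__main__":
    draft = IncidentChecklist(
        contacts=["Service desk: see team wiki"],
        health_check_steps=["Check portal login page", "DB password is in step 3"],
    )
    for problem in review(draft):
        print(problem)
```

A keyword match will never catch everything, so it complements the regular human review rather than replacing it.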

You might be wondering what the best format is for preparing the incident checklist. Should it be an MS Word document, Excel, PowerPoint, an image, a PDF or an online tool?

In my opinion, “flight safety cards” are the best example of incident checklists. They give all the necessary information on how one should react in an emergency situation, such as the nearest exits, do’s & don’ts during the crisis and so on.

Support teams should actually take this example and prepare their incident checklists in a way that satisfies the criteria I mentioned earlier.

Cheers
