Critical environments can include operational buildings, or areas within buildings, such as data centres, hospitals, manufacturing plants, and trading floors (such as those of a stock exchange), among others. Critical facilities are the systems that support the effective and efficient operation of these environments. These systems include, but are not limited to: electrical infrastructure, building management systems (BMS) and monitoring systems, uninterruptible power supply (UPS) equipment, fire detection and suppression systems, access control, and mechanical cooling systems.
Designing, operating, and maintaining critical facilities is a vastly different endeavour to other fields of engineering. While personal physical risk is greatest in the mining and construction sectors, the financial and reputational losses that can be incurred due to downtime in critical environments can be truly staggering, even if the downtime lasts only a short period. Damages can run into tens of millions of dollars in the space of just a few minutes if the trading floor of a major financial institution loses power. Reputational loss, despite being difficult to quantify, can be even more detrimental to the businesses and individuals involved following a serious incident, and can take many years to recover from. Hospital operating theatres are another example, and their criticality is more obvious than that of almost any other environment.
The task of critical facilities engineering design teams is to ensure that critical systems have the inherent resilience and redundancy to cope with the failure of almost any single system component. Data centre electrical systems, for instance, would without fail have at least two alternate sources of supply, backed up by standby diesel generators in addition to UPS and/or flywheel systems. The loss of any one source of supply would therefore not result in a total power loss within the data centre. The same goes for mechanical cooling systems. Failure of one chiller and pump, or one condenser water system, or one supply air fan system, depending on the type of system installed, would automatically result in the starting of a redundant system to maintain effective cooling within the data centre without interruption.
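The automatic start of a redundant unit described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CoolingUnit class, the apply_failover function and the unit names are hypothetical, and the example assumes a simple arrangement of two duty chillers with one standby. In a real facility this logic lives in the plant controllers or the BMS, not in a script like this.

```python
from dataclasses import dataclass


@dataclass
class CoolingUnit:
    name: str              # hypothetical label, for illustration only
    healthy: bool = True   # False once the unit has failed
    running: bool = False  # True while the unit is carrying load


def apply_failover(units, required_running):
    """Stop failed units, then start healthy standby units until the
    required number is running. Returns True if the demand is met."""
    # A failed unit can no longer be counted as running.
    for unit in units:
        if not unit.healthy:
            unit.running = False

    running = sum(1 for unit in units if unit.running)

    # Start standby units, in list order, until the demand is covered.
    for unit in units:
        if running >= required_running:
            break
        if unit.healthy and not unit.running:
            unit.running = True
            running += 1

    return running >= required_running


# Assumed example: two duty chillers and one standby.
chillers = [
    CoolingUnit("chiller-1", running=True),
    CoolingUnit("chiller-2", running=True),
    CoolingUnit("chiller-3"),
]
chillers[0].healthy = False                # a single component fails
demand_met = apply_failover(chillers, 2)   # the standby unit takes over
```

With one failure the standby starts and the demand is still met. A second failure leaves only one healthy unit, and the function returns False, which is the point at which the facility manager's incident management begins.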
Critical facilities managers have the task of testing, maintaining and monitoring all these systems on a regular basis to comply with both the expectations and demands of the facility owners, and those of any relevant regulating authorities. It is generally not the role of critical facilities managers to take a hands-on part in this overarching task, though. In most instances, the task of maintaining specific items of critical equipment would fall to the original equipment manufacturer (OEM), or another vendor with intimate knowledge of the particular equipment. Vendor management, therefore, is a key role of the facility manager, and one that can consume much of his or her time. The facility manager is, for all intents and purposes, the orchestra conductor where critical facilities are concerned. Knowing the relevant vendors and their technicians, and developing sound relationships with them, should be a key priority for the facility manager. The ability to generate quick response times from vendors to any incident is vital, and while contractual arrangements definitely count, mutual faith between you and your vendors, along with cordial and respectful relationships, counts for just as much in my opinion.
Incident management is where good critical facility managers really prove their worth. Being able to keep a cool head and avoid getting “panicky” when critical systems fail comes with experience. Sometimes, the best thing you can do as a facility manager is to do nothing. I know this sounds counter-intuitive, but it is so easy to make a false diagnosis when things go awry that the best course is simply to take that extra minute or two to achieve a greater understanding of exactly what is happening. When alarms are sounding and the phones are ringing, you need to shut some of these things out for a brief period and just observe. All the mechanical and electrical systems within critical environments can be expected to fail at some point; they are just machines after all. No UPS manufacturer, for instance, will ever guarantee a 0% failure rate for their equipment, and warranties are nothing more than a marketing tool when all is said and done. When critical systems fail, the facility manager will quickly need to assume responsibility for multiple tasks, all of which need to be coordinated and managed to bring about a satisfactory outcome. A correct and timely general diagnosis of which system has failed needs to be made, so that the relevant vendor, or vendors, can be mobilised to effect repairs and bring systems back to normal operating condition. The same diagnosis also needs to be escalated to the facility owners and any relevant authorities. Continual communication with the facility owners, with regular updates on how the incident and its resolution are progressing, is a fundamental part of incident management, the importance of which simply can’t be overstated.
Producing detailed and accurate reporting following an incident, so as to identify the root cause of a component or system failure, will go some way towards preventing any repeat of the failure in the future, and is an essential part of the incident management process. This reporting is part and parcel of not just incident management, but also the above-mentioned vendor management role of the facility manager.
The facility manager should have a good “feel” for his or her facility. Monitoring via both human and electronic means is critical, but even with the best, most sophisticated electronic monitoring systems available, there is still a need for a facility manager who knows when something doesn’t smell, feel or sound quite right. That being said, electronic monitoring is one of the cornerstones of critical facilities management. Without reliable monitoring of critical facilities infrastructure, along with alarm notification and acknowledgement systems, facility managers would be essentially running blind and deaf. We should avoid any confusion here, though, between building management systems and environmental monitoring systems. While these two systems are more often than not linked, and may even form part of the one system, it is of the utmost importance, particularly in unmanned sites, that alarm notification and acknowledgement systems are 100% reliable. Being able to receive, decipher, and acknowledge alarms quickly and without confusion at any time of day or night is non-negotiable, and the ability to easily configure escalation order and timing is essential. It is simply not reasonable to assume on-call staff will be reachable by phone 24 hours a day without fail, so appropriate resourcing of technical staff for escalations is vital. It is hugely important that on-call staff have absolute faith in the ability of alarm notification and acknowledgement systems to work as planned every time; otherwise people will simply lose sleep when on-call, and that can lead to mistakes that the facility cannot afford.
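To make the point about configurable escalation order and timing more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the EscalationStep class, the contacts_notified function, the role names and the delay values are hypothetical and do not come from any particular alarm notification product. It shows only the basic idea that an unacknowledged alarm moves on to the next person after a set delay, and that escalation stops once someone acknowledges it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EscalationStep:
    contact: str           # hypothetical role name
    notify_after_min: int  # minutes after the alarm was raised


# Hypothetical escalation order and timing; a real site would set its own.
ESCALATION_CHAIN = (
    EscalationStep("on-call technician", 0),
    EscalationStep("backup technician", 10),
    EscalationStep("facility manager", 20),
)


def contacts_notified(elapsed_min: int,
                      acked_at_min: Optional[int] = None,
                      chain=ESCALATION_CHAIN):
    """Return the contacts notified so far for a single alarm.

    Escalation continues down the chain while the alarm is
    unacknowledged and stops at the moment of acknowledgement.
    """
    if acked_at_min is None:
        cutoff = elapsed_min
    else:
        cutoff = min(elapsed_min, acked_at_min)
    return [step.contact for step in chain
            if step.notify_after_min <= cutoff]
```

For example, contacts_notified(25) returns all three contacts because nobody has acknowledged the alarm, while contacts_notified(25, acked_at_min=4) returns only the on-call technician. Keeping the order and delays in one small table like this is what makes them easy to review and change.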
Planned preventative maintenance of all of the above-mentioned systems, and more, is another cornerstone of critical facilities management. All too often, though, this has become a “box ticking” exercise. Making sure that the maintenance vendors of your critical infrastructure know what your expectations are with regard to preparation of risk assessment and work method statements, access arrangements, building and facility inductions, and change control protocols is extremely important. Meeting these requirements is time-consuming for vendors, and the sooner your expectations are understood and reinforced, the sooner a streamlined maintenance program can be implemented. Detailed maintenance reports need to be a contractual requirement, and need to have their scope and content clearly defined for the benefit of all stakeholders. Being able to view a maintenance reporting history for every critical asset in your facility will go a long way towards minimising the risk of component failure, and therefore downtime. Maintenance vendors as well as facility managers will have peace of mind, knowing that all contractual obligations can be shown to have been successfully carried out. This, however, will not guarantee that systems will not fail; they will.
Critical facility managers, in short, must simply leave nothing to chance. They must also acknowledge, and be prepared for, the fact that despite their best efforts, things will at some stage go wrong. Systems and components will fail and alarms will sound. Their superiors will be hounding them for information, demanding to know the reasons for failure, and seeking a quick return to normal conditions. Always remember that when the pressure is really on, the truth is your friend. Always! If you’re unsure of exactly what is going on in the heat of an incident, don’t be afraid to admit it. Never give unrealistic resolution times in order to pacify your superiors, and never try to cover up a mistake; you’ll be found out almost every time, and your own reputation may suffer.
In summary, know your systems, know your vendors, know your limits and capabilities, and stay calm when things go wrong.