The Anatomy of Security Disasters


1. Article Source

1.1. Tenable Network Security Blog

1.2. Marcus Ranum, March 2009

1.3. HTML, PDF and PDF of PPT

1.3.1. 10 pages

2. Article Section Outline

2.1. Intro

2.1.1. "they aren’t taking security seriously"

2.1.2. Was it PCI that got security its current place at the table, or was it Heartland Data, ChoicePoint, TJX, and the Social Security Administration? (It does not seem to matter, since both are external.)

2.1.3. Our challenge, as security practitioners, has always been to balance risk – the tradeoff between the danger of doing something and the opportunity it presents. Since we’re not working in a field where the probabilities are simple, like they are on a roulette wheel, we’ve had to resort to making guesses, and trying to answer unanswerable questions. I don’t know a single senior security practitioner who has not, at some point or other, had to defend an estimated likelihood of a bad thing happening against an estimated business benefit.

2.1.4. In those cases, the result has less to do with security and more to do with whose meeting-organizational skills are superior, or who’s better at explaining their viewpoint.

2.1.5. I’ve seen major security-critical business decisions get made based on whose golf buddy runs what business unit – I’m very skeptical of the notion that "Risk Management" has any value beyond the butt-covering obviousness of having made an attempt.

2.2. Disconnect

2.2.1. inspiration for this paper came from a discussion I had with Alan Paller, the founder of SANS and the CIO Forum.

2.2.2. CIOs were regularly lied to regarding security by their technical staff. Technical staff had cheated by doing naughty things like leaving unauthorized connections between critical networks, leaving systems unpatched, and so forth.

2.2.3. Corporate executives felt that they had done their job when they told technical staff to "make it secure." NFR

2.2.4. Paller believes anything worth doing can be done safely or, at least, with controlled risk.

2.2.5. there is a huge disconnect between what management hears and what they are told – a disconnect so severe that senior management like Alan can deride security practitioners as "whiners" while still expecting them to enable business securely.

2.2.6. short vs. long term view

2.3. It Hasn't Happened Yet

2.3.1. Computer security hasn’t been tagged with an epic failure, yet. (Not sure this is true, or what he means by "epic"; perhaps something more than billions of dollars of losses. Worth wondering why we have not had an epic failure.)

2.3.2. In its simplest form, the problem with computer security is that (like most risky propositions) it's easy to simply not worry about it as long as "nothing has gone wrong, yet."

2.3.3. That is the anatomy of simple disasters. (Here is the title.)

2.3.4. In my opinion, this stage of the Internet security disaster was passed in the late 1990s

2.3.5. It seems as if computer security has had some massive failures (TJX, ChoicePoint, the US DOD NIPRnet) - and they are, indeed, significant, but we're now at the point where people are starting to realize critical infrastructure attacks are practical. (First CII ref.)

2.3.6. Quietly, while most security practitioners were worrying about how to disable ActiveX in their browsers, massive numbers of control and process systems were hooked up to networks that, simply put, they shouldn't be. (Here again, recall CII on WEF and that other challenge.)

2.3.7. I describe that as having happened in the past tense because it's important to emphasize that the computer security disaster has already happened - we simply have not yet reached the end of the sequence of events that started being put into motion in the mid 1990s. (What is the math of the risk here? Are we increasing the severity, the probability, or both over time? There is a difference between something being quite unlikely and its probability increasing over time.)

2.3.8. If (or when) something starts to go wrong in that area, we will be seeing the end result of a disaster that occurred in the mid 1990s. (Cyber terror on CII: impact is "vast".)

2.3.9. Remember: every fatal skydiving accident is that diver’s first fatal accident.

2.4. Time Line of Disaster

2.4.1. The time line of a typical disaster is straightforward.

2.4.2. At the beginning of the disaster, a bad idea is proposed. Often, someone immediately tries to shoot it down, or point out its flaws.

2.4.3. Anyone who has ever been involved in this kind of disaster, from the technical side, will doubtless recall the horrifying feeling you get when you realize that you're trapped trying to deal with a bad idea: the "bad idea zombie." No matter how many times you shoot it, or hit it with a shovel, it just keeps crawling forward: the longer it survives as a zombie, the higher the likelihood that it will be un-killable in the long-term. Its sponsors dig in their heels and get emotionally invested: after all, it may be a zombie, but it’s theirs.

2.4.4. Then comes the most crucial part of the disaster: the point at which management's expectations begin to form a reality gap.

2.4.5. Example: One security disaster I was involved with happened in exactly this manner: a senior executive hit upon a bad idea and asked the security team for their input. The security team explained why it was a bad idea; in fact they wrote a brilliantly clear, incisive report that definitively framed the problem. So the executive asked the web design team, who declared it a great idea and "highly do-able" and implemented a prototype. (It can be done, but not securely.) Months later, the "whiners" in the security team were presented with a fait accompli in the form of "we're ready to go live with this, would you like to review the security?" Once any significant effort has been expended on the zombie bad idea, the chance of it being killed drops to near zero.

2.5. The Post Disaster

2.5.1. Unfortunately, in most businesses, senior level managers are recruited for being "can do" types who get the job done, which means that you're particularly in danger of having to deal with a senior executive that is comfortably living with a serious reality gap.

2.5.2. What usually happens is that senior management simply claims they were not fully apprised of the decision and that they never would have approved it if they had been - in short "they were lied to" as Alan Paller would say.

2.5.3. What is fascinating and sad to me is that when the dust clears and the bodies are buried, very little has changed that would prevent another disaster from happening.

2.5.4. The reason for this is that all the attention is paid to the tail end of the disaster, when the real disaster happened at the moment when the bad idea was allowed to become a zombie. (Recall the comment from NTT that cleaning up a disaster creates heroes; preventing one does not.)

2.5.5. The only way to prevent security disasters is to have a security team that is fearless about feeding back information up to the top of the chain of command, and to have senior executives who make decisions based on reality rather than a projection of their fantasies. (Righteous, but ...)

2.6. Risk Management: Disaster Waiting to Happen

2.6.1. It used to be difficult for a security practitioner to argue against the idea of risk management.

2.6.2. But then the crash: Unfortunately for us all, the Wall St crash of Dec 2008 serves as a complete debunking of the value of risk management. All the big firms that lost billions or went out of business had risk management departments and practices and felt they were taking acceptable risks. Perhaps the risk management departments were wrong, or perhaps management was living with a reality gap.

2.6.3. "Risk management" is a fiction that plays into the disaster-cycle. The premise of risk management is that you will quantify the risk/reward of a decision, then assess the likely failure modes and attempt to reduce them appropriately in detail. Inherently, the risk management approach is too late in the cycle: we've already chosen to execute a bad idea, and now we're arguing about what we can do to reduce the impact when it goes wrong - not if.
(Reality gap negates RM.) I've actually seen this happen before, in an outsourcing project in which it became clear that the outsourcers were going to: a) say whatever it took to win the project and b) do whatever they were going to do, anyhow, after they did. The premise of risk management, that the risks of certain activities can be understood and managed, falls apart when you're dealing with a reality gap.
(Quotes Feynman.) "It appears that there are enormous differences of opinion as to the probability of a failure with loss of vehicle and of human life. The estimates range from roughly 1 in 100 to 1 in 100,000. The higher figures come from the working engineers, and the very low figures from management. What are the causes and consequences of this lack of agreement? Since 1 part in 100,000 would imply that one could put a Shuttle up each day for 300 years expecting to lose only one, we could properly ask 'What is the cause of management's fantastic faith in the machinery?'"
Ultimately, risk management is a numbers game; you multiply a wild-ass guess by a fudge factor. Worse, the potential cost of failure is estimated in as a factor, too. So you're trying to balance an unjustified estimate of cost of failure against a wild-ass guess multiplied by a fudge factor. Generally, what is really going on is that risk management is used as a sort of statistical shell-game to manipulate the perceived value of security when dealing with a clueless senior manager. Bluntly: it's lying with statistics. Those who engage in it do so because they think their managers are idiots. The fact that they are often right is sad, but should not surprise anyone.
Feynman's description of the foolishness of trying to estimate the "effective lifetime" of a space shuttle main engine should be required reading for anyone who claims to believe risk management is practical. (Does he say that in the report?) To summarize it: you can only play Las Vegas odds-maker when you're working on small numbers of variables and extremely well-understood conditions. (Refer back to the intro.)
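
As a rough check on the Feynman figures quoted above, the sketch below converts a per-flight loss probability into the expected time between losses. The function name `years_between_losses` and the one-flight-per-day schedule are illustrative assumptions; only the 1-in-100 and 1-in-100,000 estimates come from the text.

```python
DAYS_PER_YEAR = 365.25  # average calendar year, including leap days


def years_between_losses(p_loss_per_flight, flights_per_day=1.0):
    """Expected years of flying before one loss, at a constant per-flight risk."""
    expected_flights = 1.0 / p_loss_per_flight  # mean of a geometric distribution
    return expected_flights / flights_per_day / DAYS_PER_YEAR


# The two ends of the range quoted by Feynman.
engineers = years_between_losses(1 / 100)       # working engineers' estimate
management = years_between_losses(1 / 100_000)  # management's estimate

print(f"engineers:  one loss every {engineers * DAYS_PER_YEAR:.0f} days")
print(f"management: one loss every {management:.0f} years")
print(f"the two estimates differ by a factor of {management / engineers:.0f}")
```

At one flight per day, management's figure works out to roughly 274 years, in line with Feynman's "each day for 300 years"; the engineers' figure implies a loss about every 100 days. The two guesses differ by a factor of 1,000, which is the kind of spread that gets multiplied into any risk estimate built on them.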

2.7. Improving Communication and Education

2.7.1. "You need executive management that does not make bad decisions, takes security into account and listens."

2.7.2. In the organizations where I have seen effective communication about security it begins and ends with senior management asking direct questions about security considerations and not accepting hand-waving for an answer.

2.8. Legislating Security Failures

2.8.1. The problem with legislative approaches to encouraging security is that the legislation always happens too late.

2.8.2. (Think PII.) Of course, it is way too late for most businesses to even form a vague idea of who has access to what; consequently the press is filled with accounts that read: "laptop full of customer data left in airport departure lounge." The focus is on "what do we do about that data leak?" not "why on earth is customer data wandering around airports on laptops?" (The first Gordian Knot.)
In case I am not being sufficiently clear, I think the IT world crossed a Rubicon in the late 1980s, in which control over information was effectively abdicated. That has huge implications for the scope and lethality of security disasters because it generally means that a single penetration into an organization is effectively a complete penetration of all the organization's information assets. (Raised severity.) "The wrong decisions got made 15 years ago and now it's too late to go back and un-make them."

2.8.3. Legislation leaves us simply with formalizing damage control when the disasters occur, with a few good ideas for damage containment thrown in where possible. But the assumption (as with risk management) is that security disasters are going to be inevitable, huge, and frequent.

2.9. The Reality Gap

2.9.1. My suspicion is that the "reality gap" between management's expectations and what they actually have out there on their networks is larger than they realize. I think it is vastly larger. If there is a 'reality gap' between how secure our networks are expected to be, and how secure they actually are, how do we bring them back in line with expectations? At this point, I expect most of you to be scratching your heads, thinking, "that's impossible." Those of us who have spent our lives as security "whiners" and "nay-sayers" have cause to be concerned because most of us see that business enablement has always held the upper hand.

2.9.2. In fact, there is no way that it can get better. Because of the depth of the reality gap, "throw it out and start over" is not a justifiable option (after all, nothing terrible has happened yet!) and there is now a huge installed base that represents massive intellectual and financial inertia. (Web 2.0.) Perhaps a few of you have already had the experience of trying to encourage a client to think a little bit before embarking on a Web 2.0 rewrite of a crucial application. That particular train has already left the station and is coming toward us. Old-timer security practitioners know that the place to build in security is at design-time, but we are faced with a vast mass of moving code, all of which is past the critical point at which it could really be improved. (Think about cloud.)

2.9.3. the only option remaining to the industry is "disaster and patch"

2.10. Space Shuttles

2.10.1. It used to be that the most complicated thing ever built by humans was the space shuttle.

2.10.2. We could argue a lot about how to measure complexity, but nowadays the popular wisdom is that software has become vastly more complicated than the space shuttle ever was. (Really?)

2.10.3. If there were as many space shuttles being flown as commercial jets, they would be raining out of the sky every day.

2.10.4. When the Space Shuttle Challenger blew up on take-off, NASA went into disaster management mode.

2.10.5. A truly epic failure had occurred on prime time television, and everyone was asking "what went wrong?!" (Here is the definition of epic.)

2.10.6. The simple answer to give would have been "space travel is dangerous" but unfortunately NASA management had distanced itself from that reality; there was a gigantic reality gap between the actual safety of space flight and the expected safety of space flight.

2.10.7. NASA chartered a blue-ribbon panel of experts to stumble around and write a set of conclusions that would basically read "space travel is dangerous, but NASA's doing a great job."

2.10.8. Instead, something unique happened: Nobel Laureate Richard Feynman got invited to join the panel, and he wound up conducting his own investigation and produced his own report - a masterpiece that succinctly and brilliantly explained how an organization like NASA could establish a reality gap regarding safety explanations. Feynman describes an environment in which management expects a certain level of performance from a component, then accepts compromises that reduce the performance level - without adjusting their expectations. Furthermore, he goes on to describe how NASA rocket scientists attempted to wave away significant component failures as "acceptable flight risks" because they had not resulted in a flight failure yet. If the attitude of "this risk is acceptable because it has not resulted in a failure yet" sounds familiar to you, it should. That’s the history of the computer security disaster in a nutshell.

2.10.9. When the Space Shuttle Columbia broke up on re-entry, the failure analysis revealed that exactly the same type of expectation-to-reality gap had evolved regarding the shuttle's tendency to lose tiles as in the Challenger's solid rocket booster leaks.

2.10.10. Basically, NASA managed to learn nothing from the first failure, for exactly the same reasons that the computer security industry manages to learn nothing from its failures:

2.10.11. fixing the problem would entail re-visiting decisions that were made decades ago and, besides, it would be too expensive to re-visit them now.

2.10.12. Unlike virtually everyone else, the pilots of the space shuttles have a fairly realistic assessment of the situation: "space travel is dangerous."

2.10.13. Putting an organization or a country's information assets online is dangerous, too. Putting them on a network is even more dangerous, and exposing them to the Internet is most dangerous of all. There is simply no other conclusion that you can realistically reach. Consequently, I argue that there are very many places where it would make sense to retrench capabilities off the Internet entirely and to reduce the number of network-controllable SCADA systems.

2.11. Breaking the Cycle

2.11.1. The most important thing is to make sure you are direct and honest about expectations at all times.

2.11.2. Do not allow management or clients to believe that they can do dumb things in safety, and do not hide behind bogus probability guesses.

2.11.3. Pre-allocating blame is crucial to keeping the reality gap as small as possible. When management negotiates a control out of the loop, do not simply allow them to assume "it's OK" - go back and remind them that the parameters of the design have changed.

2.11.4. To help bridge the reality gap, you must keep your communications as clear and unambiguous as possible.

2.12. Hope

2.12.1. The failures I am describing are failures of hope - they are the consequence of human optimism.

2.12.2. Web 2.0: Speaking of "dangerous to the point of stupidity," the next disaster is already beginning, and it's in the form of Web 2.0. I don't think it will be possible for it to be as bad as Web 1.0 was - we are nowhere near done reaping the "benefits" of that one - but the Web 2.0 model encourages dis-integration of information assets at the data level. Nobody will even have an idea where their data is getting processed, because it's all being sent "out there" for mysterious things to be done to it. What does "trust model" even mean in that kind of environment?


2.13. Giving Up

2.13.1. When things get sufficiently bad, eventually you simply give up.

2.13.2. Think about that for a second: if it is prohibitively expensive to figure out what went wrong, then it's impossible to fix the problem. You're left with no alternative but to slap duct tape on it and keep going and hope that the duct tape landed on the right spot.

2.13.3. But, unless you understand the problem, you're left with the fact that you have a flawed design and your failure rate (all things being equal) will be pretty constant.

2.13.4. Put another way: we can state that space shuttles seem to have about a 1% failure rate (with catastrophic loss) per flight - which means that NASA will run out of shuttles in a few years. Fixing the basic design is not an option.
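
To make the 1% figure concrete, here is a minimal sketch assuming each flight is independent and carries the same loss probability. That is a simplification, and the function names are illustrative rather than taken from the article.

```python
import math


def survival_probability(p_loss, flights):
    """Chance of completing `flights` flights with no loss, assuming independence."""
    return (1.0 - p_loss) ** flights


def flights_until_loss_likely(p_loss, threshold=0.5):
    """Smallest number of flights at which P(at least one loss) reaches `threshold`."""
    return math.ceil(math.log(1.0 - threshold) / math.log(1.0 - p_loss))


P_LOSS = 0.01  # the roughly 1% per-flight rate stated in the text

print(f"P(no loss in 100 flights) = {survival_probability(P_LOSS, 100):.3f}")
print(f"flights until a loss is more likely than not: {flights_until_loss_likely(P_LOSS)}")
```

Under these assumptions the chance of getting through 100 flights without a loss is about 37%, and a loss becomes more likely than not by flight 69. With a fixed design and a small fleet, the arithmetic runs out of vehicles quickly.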

2.13.5. If you look at complex Internet systems or, say, a government's IT infrastructure, if the failure rate remains constant but we depend on them more and more, the cost of IT security failures will inexorably go off the chart.
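
The point above can be sketched as simple arithmetic: hold the failure rate fixed, let the value that depends on the system compound, and the expected loss compounds with it. Every number below (the 1% rate, the starting exposure, the 25% annual growth) is a made-up placeholder, and `expected_cost_by_year` is a hypothetical helper, not anything from the article.

```python
def expected_cost_by_year(failure_rate, initial_exposure, growth, years):
    """Expected loss per year when exposure compounds but the failure rate stays flat."""
    return [
        failure_rate * initial_exposure * (1.0 + growth) ** year
        for year in range(years + 1)
    ]


# Placeholder inputs, chosen only to show the shape of the curve.
costs = expected_cost_by_year(
    failure_rate=0.01,           # constant chance of a failure per year
    initial_exposure=1_000_000,  # value at risk in year 0, arbitrary units
    growth=0.25,                 # yearly growth in dependence on the system
    years=10,
)

for year, cost in enumerate(costs):
    print(f"year {year:2d}: expected loss {cost:,.0f}")
```

Nothing about the system gets less reliable in this sketch; the expected loss grows only because more is riding on it each year.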

2.13.6. We will reap what we are sowing today, and it will be a horrible, stinking crop of failure.

2.13.7. Obviously, from the content and the tone of this presentation, I think it is already too late.

2.13.8. There is too much momentum to an inherently dangerous process, and it will go forward until there are severe-enough disasters that something has to change.

2.13.9. But consider: when an organization like NASA can lose not one, but two, multibillion-dollar space shuttles and their pilots to the same kind of reality gap, it will take something extremely severe to wake up a national-level response.

2.13.10. What might that be? We already have US Pentagon spokespeople alleging that "Chinese hackers" have stolen "10+ terabytes" of information from the DoD's unclassified networks - such an information leak could result in a superpower transitioning into a third-rate power, but the failure would be too complex for anyone to figure out.