Two recent stories from the world of ‘big’ engineering got me thinking: the massive delays in the Crossrail Project and the fatal errors in the Boeing 737 Max, both of which seem to have been blighted by issues related to software.
Crossrail, prior to the announcement of delays and overspend, was being lauded as an exemplar of an on-time, on-budget complex project; a real feather in the cap for British engineering. There were documentaries celebrating the amazing care with which the tunnelling was done to avoid damage at the surface, using precise monitoring and accurately positioned webs of hydraulic grouting to stabilise the ground beneath buildings. Even big data was used to help interpret signals received from a 3D array of monitoring stations, helping to actively manage operations during tunnelling and construction. A truly awesome example of advanced engineering, on an epic scale.
The post-mortem on why the delays came so suddenly upon the project has not yet been done, although the finger is being pointed not at the physical construction, but at the digital one. Running the rail service requires advanced control systems, and before these can be trusted to operate safely, a huge number of tests must first be carried out ‘virtually’.
Software is something that the senior management of traditional engineering companies are uncomfortable with; in the old days you could hit a machine with a hammer, but you cannot do that to a virtual machine. They know intuitively when someone is telling them nonsense within their own engineering discipline; for example, if a junior engineer planned to pour 1000 cubic metres of concrete into a hole and expected it to have set by the morning. But if told that testing a software sub-system will take 15 days, they would have no idea whether this was realistic; they might even ask “can we push to get this done in 10 days?”.
In the world of software, when budgets and timelines press, the most dangerous word used in projects is ‘hope’. “We hope to be finished by the end of the month”; “we hope to have that bug fixed soon”; and so on. Testing is often the first victim of pressurised plans. Junior staff say “we hope to finish”, but by the time the message rises up through the management hierarchy to Board level, a confident “we will be finished” has been inserted into the PowerPoint. Anyone asking tough questions might be seen as slowing the project down when progress needs to be demonstrated.
You can blame the poor (software) engineer, but the real fault lies with incurious senior managers who ask for the answer they want to hear, rather than trying to understand the reality on the ground.
The investigations into the Boeing 737 Max tragedy are also unresolved, but, of course, everyone is focusing on the narrow technical question of a design issue in a critical new feature. There is a much bigger issue at work here.
Arguably, Airbus pursued the ‘fly-by-wire’ approach much earlier than Boeing, whose culture has tended to resist over-automating the piloting. Active controls to overcome adverse events have now become part of the design of many modern aircraft, but the issue with the Boeing 737 Max seems to have been that this came along without much in the way of training, and the interaction between the automated controls and the human controls is at the heart of the problem. Was there also a lack of realistic, human-centric testing to assess the safety of the combined automated/human control systems? We will no doubt learn this in due course.
Electronics is, of course, not new to the aerospace industry, but programmable software has grown in importance, and it increasingly seems that growing complexity, and the consequent growth in testing complexity, may have overtaken the abilities of traditional engineering management systems. This is extending to almost every product or project, small and large, as the internet of everything emerges.
This brings me to a scribbled diagram I found in an old notebook, made on a train to London back in 2014 while I debated the issue of product complexity with a project director for a major engineering project. I have turned this into the Figure below.
There are two aspects of complexity identified for products:
- Firstly, the ‘design complexity’, which can be thought of as the number of components making up the product, but also the configurability and connectivity of those components. Imagine printing on paper a description of every component, with its configuration and connections: how high would the pile be? This applies to the physical aspects but also to the software, and to all the implied test cases. There is a rapid escalation in complexity as we move from car to airliner to military platform.
- Secondly, the ‘production automation complexity’, which represents the level of automation involved in delivering the required products. Cars, as they have evolved, are seen as having the highest level of production automation complexity.
You can order a specific build of car, with your desired ‘extras’ and colour, and later see it travelling down the assembly line with over 50% of the tasks completely automated; the resulting product potentially carries a near-unique combination of the options you chose. It is at the pinnacle of production automation complexity, but it also has a significant level of design complexity, albeit well short of the others shown in the figure.
An aircraft carrier, by contrast, will differ significantly from every other one in existence (even when originally conceived as a copy of an existing model), with changes being made even during its construction, so it does not score so highly on ‘production automation complexity’. But in terms of ‘design complexity’ it is extremely high (there are only about 20 aircraft carriers in operation globally, and half of these are in the US Navy, which perhaps underlines this point).
As we add more software and greater automation, the complexity grows, and arguably, the physical frame of the product is the least complex part of the design or production process.
I wonder whether there is a gap between the actual complexity of the final products and an engineering culture that is still heavily weighted towards the physical elements (the bonnet of a car, the hull of a ship, the turbine of a jet engine), and whether this gap is widening as the software elements grow in scope and ambition.
Government Ministers, like senior managers, are happy to be photographed next to the wing of a new model of airliner, talking earnestly about workers riveting steel, but what may be more pivotal to success is some software sub-system buried deep in millions of lines of ‘code’; no photo opportunities here.
As we move from traditional, linear, ‘deterministic’ programming to non-deterministic algorithms, other questions arise about the increasing role of software.
Given incomplete, ambiguous or contradictory inputs, the software must decide how to act in real time. It may have to take a virtual vote between independently written algorithms. It cannot necessarily rely on supplementary data from external sources (“no, you are definitely nose diving, not stalling!”), for reasons of system security, if not of external data bandwidth.
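The ‘virtual vote’ mentioned above is often described as N-version programming: several independently written versions compute the same decision, and a voter accepts only an answer that a majority agrees on. The following is a minimal Python sketch of that idea, not a description of any real flight-control system; the three versions, the sensor names (`aoa_deg`, `airspeed_kt`, `pitch_deg`) and all thresholds are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from typing import Callable, Dict, Optional, Sequence

# Hypothetical sensor snapshot, e.g. {"aoa_deg": ..., "airspeed_kt": ..., "pitch_deg": ...}
Readings = Dict[str, float]

# Three hypothetical, independently written "versions" that each judge whether the
# aircraft is stalling, deliberately relying on different sensors.
# Thresholds are illustrative only, NOT real flight-control values.
def version_a(r: Readings) -> str:
    return "stall" if r["aoa_deg"] > 15 else "normal"

def version_b(r: Readings) -> str:
    return "stall" if r["airspeed_kt"] < 110 else "normal"

def version_c(r: Readings) -> str:
    return "stall" if r["airspeed_kt"] < 120 and r["pitch_deg"] > 10 else "normal"

def majority_vote(versions: Sequence[Callable[[Readings], str]],
                  readings: Readings) -> Optional[str]:
    """Run every version on the same readings and return the answer a strict
    majority agrees on, or None when there is no majority (the system would
    then need some other fallback behaviour)."""
    answers = [version(readings) for version in versions]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > len(answers) // 2 else None

if __name__ == "__main__":
    # A faulty angle-of-attack sensor reports 25 degrees while the aircraft is cruising.
    faulty = {"aoa_deg": 25.0, "airspeed_kt": 250.0, "pitch_deg": 2.0}
    print(majority_vote([version_a, version_b, version_c], faulty))  # normal
    print(majority_vote([version_a, version_b], faulty))             # None
```

The vote masks the faulty sensor only because each version looks at different inputs; versions sharing the same faulty input would simply agree on the wrong answer, which is one reason why testing the combined system, and not just each part, matters so much.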
And so we continue to place further responsibility onto the shoulders of the non-physical elements of the system.
Are Crossrail and the 737 Max representative of a widening gap, reflected in an inability of existing management structures to manage the complexity and associated risks of the software embedded in complex engineering products and projects?
© Richard W. Erskine, 2019