Managing the Pitfalls of Persistent Testing Environments
Despite living over two hundred miles from my immediate family, I have always been called upon as dedicated tech support when gadgets go wrong. I can regale you with many stories of fixing PCs, printers and mobiles on family visits or over phone calls. The age-old strategy of switching it off and on again, or some variant of it, has proven successful most of the time.
Similar tactics are employed to keep some of our persistent testing environments alive. I dream of a day when we no longer rely on permanent testing environments. The ultimate fantasy would be for all services, queues and any other infrastructure to be spun up and torn down at the click of a button.
While we have made significant strides in some of our newer microservice components, our legacy applications fall far short of this goal. To this day they are reliant on physical infrastructure that cannot yet be created on demand. Here I reflect on past and present challenges of maintaining our legacy testing environments, and the effects on team productivity.
Constructing the Connection
Accommodating connected systems has been the greatest complication of late. Working in a large multinational corporation introduces many challenges to agility. In the midst of our current transformation, adoption of Agile techniques is taking time to permeate through the organisation. While the message echoes through the grapevine, we need to engage with traditional waterfall teams and Agile evangelists alike.
Managing upstream applications with less frequent releases also means managing their expectations of our testing environment availability. If their instance is up and running for three months to support a long, laboured testing cycle, their default expectation is that ours must also be continually available for the same period. This manifests itself in urgent, short-notice requests that are expected to be fulfilled, fostering frustration among our developers.
Communication channels need to be well established on both sides for this relationship to succeed. Striking a balance between facilitating their testing and our own is critical. Agreements on notice and availability of our testing environments have only just been established, so the jury is still out on the effectiveness of these SLAs. Only time will tell if our ongoing strategic transformation will better support all applications.
Inside Out
Outside perceptions are important. Looking at the inward effect on team productivity has also provided some fascinating observations. Despite support rotations to ease the burden, fixing fractured environments often falls on the teams testing features. Obviously production takes precedence. Regardless, even a short stint of fixing testing environments that break every few days is far from satisfying.
Automated deployments only get us so far. Without collective ownership and automated verification techniques, shared components can be left in a broken state. Nothing breeds animosity more than feeling you are continually firefighting issues introduced by another squad.
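To make the idea of automated verification a little more concrete, here is a minimal sketch of a post-deployment check that probes each shared component and reports which ones are broken. It is an illustration under assumptions rather than our actual tooling: the component names and the probe functions (check_order_queue, check_reference_db) are hypothetical stand-ins for whatever real connectivity checks an environment would need.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    component: str
    healthy: bool
    detail: str


def verify_environment(probes: Dict[str, Callable[[], None]]) -> List[CheckResult]:
    """Run every probe and record the outcome, rather than stopping at the first failure."""
    results = []
    for component, probe in probes.items():
        try:
            probe()  # a probe signals failure by raising an exception
            results.append(CheckResult(component, True, "ok"))
        except Exception as exc:
            results.append(CheckResult(component, False, str(exc)))
    return results


def broken_components(results: List[CheckResult]) -> List[str]:
    """Names of the components whose probe failed."""
    return [r.component for r in results if not r.healthy]


# Hypothetical probes: real ones would connect to the queue or database.
def check_order_queue() -> None:
    pass  # pretend the queue responded


def check_reference_db() -> None:
    raise ConnectionError("reference database refused connection")


if __name__ == "__main__":
    results = verify_environment({
        "order-queue": check_order_queue,
        "reference-db": check_reference_db,
    })
    for result in results:
        status = "OK" if result.healthy else "BROKEN"
        print(f"{result.component}: {status} ({result.detail})")
```

Running a check like this after every deployment, whichever squad triggered it, would flag a broken shared component at the point it was introduced rather than when the next team stumbles across it.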
Introducing the reset button process to refresh the environment has addressed many of these issues. An unintended side effect has been that engineers are less likely to investigate and diagnose issues before applying the fix. Mentoring by more experienced engineers addresses knowledge gaps. Nevertheless, instilling in junior programmers the same level of ownership shown by senior developers takes far more time.
Stop Breaking Down
Continuous improvement is essential for addressing the trials of our testing environments. The aforementioned reset button and communication protocols with other teams are great strides forward by the team. These small increments should be nurtured by managers, and balanced with the delivery of client features.
These changes can only go so far. A legacy system will remain legacy without significant intervention to reduce the behemoth of technical debt. Microservices and container frameworks can be used to scale solutions better. Nevertheless, this approach should be used with caution to avoid creating a distributed monolithic monster.
Infrastructure such as queues and databases is the greater challenge to address. Large organisations need to invest in technologies for generating full application environments, including communication and persistence layers. Replacing the reset button with start and stop will put an end to the ongoing productivity killer that is our testing environments.
Thanks for reading!