Start Me Up

Managing The Pitfalls of Persistent Testing Environments

Despite living over two hundred miles from my immediate family, I have always been called upon as dedicated tech support when gadgets go wrong. I could regale you with many stories of fixing PCs, printers and mobiles on family visits or over phone calls. The age-old strategy of switching it off and on again, or some variant thereof, has proven successful most of the time.

Similar tactics are employed to keep some of our persistent testing environments alive. I dream of a day when we no longer rely on permanent testing environments. The ultimate fantasy would be for all services, queues and any other infrastructure to be spun up and down at the click of a button.

Managing our legacy testing environments is like trekking through an overgrown and unfamiliar jungle

While we have made significant strides in some of our newer microservice components, our legacy applications fall far short of this goal. To this day they are reliant on physical infrastructure that cannot yet be created on demand. Here I reflect on past and present challenges of maintaining our legacy testing environments, and the effects on team productivity.

Constructing the Connection

Accommodating connected systems has been the greatest complication of late. Working in a large multinational corporation introduces many challenges to agility. In the midst of our current transformation, adoption of agile techniques is taking time to permeate through the organisation. While the message echoes through the grapevine, we need to engage with traditional waterfall teams and Agile evangelists alike.

In large Agile transformations, opinions are changed via grapevine whispers as the message ripples through the organisation

Managing upstream applications with less frequent releases also means managing their expectations of our testing environment availability. If their instance is up and running for three months to support a long-laboured testing cycle, their default expectation is that ours must be continually available for the same period. This manifests itself in urgent, short-notice requests that are expected to be fulfilled, fostering frustration among our developers.

Communication channels need to be well established on both sides for this relationship to succeed. Striking a balance between facilitating their testing as well as our own is critical. Agreements on notice and availability of our testing environments have only just been established, and the jury is still out on the effectiveness of these SLAs. Only time will tell if our ongoing strategic transformation will better support all applications.

Inside Out

Outside perceptions are important, but looking at the inward effect on team productivity has provided some fascinating observations. Despite having support rotations to ease the burden, fixing fractured environments often falls on teams testing features. Obviously production takes precedence. Regardless, even a short stint of fixing testing environments that break every few days is far from satisfying.

Automated deployments only get us so far. Without collective ownership and automated verification techniques, shared components can be left in a broken state. Nothing breeds animosity more than feeling you are continually firefighting issues introduced by another squad.
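To make that concrete, here is a minimal sketch of the kind of automated verification I have in mind: a post-deployment smoke check that pings each shared component and fails loudly if any of them has been left broken. The service names and health endpoints are hypothetical placeholders, and it assumes each component exposes a simple HTTP health check.

```python
"""Minimal post-deployment smoke check, a sketch only.

Assumes each shared component exposes an HTTP health endpoint; the
service names and URLs below are hypothetical placeholders.
"""
import sys
import urllib.error
import urllib.request

# Hypothetical shared components in the testing environment.
HEALTH_CHECKS = {
    "orders-service": "http://test-env.internal/orders/health",
    "billing-service": "http://test-env.internal/billing/health",
    "legacy-gateway": "http://test-env.internal/gateway/health",
}


def is_healthy(name: str, url: str, timeout: float = 5.0) -> bool:
    """Return True if the endpoint answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            healthy = response.status == 200
    except (urllib.error.URLError, OSError):
        healthy = False
    print(f"{name}: {'OK' if healthy else 'BROKEN'}")
    return healthy


if __name__ == "__main__":
    results = [is_healthy(name, url) for name, url in HEALTH_CHECKS.items()]
    # Fail the deployment pipeline if any shared component is left broken.
    sys.exit(0 if all(results) else 1)
```

Run after every automated deployment, a check like this means the squad that broke a shared component hears about it first, rather than the squad that stumbles over it days later.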

Hitting the reset button restores order for a little while, at the expense of building engineer expertise

Introduction of the reset button process to refresh the environment has addressed many of these issues. An unintended side-effect has been that engineers are less likely to investigate and diagnose issues before applying the fix. Mentoring by more experienced engineers addresses knowledge gaps. Nevertheless, the level of ownership shown by senior developers takes far more time to instil in more junior programmers.

Stop Breaking Down

Continuous improvement is an essential technique for addressing the trials of our testing environments. The aforementioned reset button and communication protocols with other teams are great strides forward by the team. These small increments should be nurtured by managers, and balanced with the delivery of client features.

These changes can only go so far. A legacy system will remain legacy without significant intervention to reduce the behemoth of technical debt. Microservices and container frameworks can be used to better scale solutions. Nevertheless, this approach should be applied with caution to avoid creating a distributed monolithic monster.

The legacy application monster will live on until we commit to the significant intervention required to eradicate technical debt

Infrastructure such as queues and databases is the greater challenge to address. Large organisations need to invest in technologies for generating full application environments, including communication and persistence layers. Replacing the reset button with start and stop will put an end to the ongoing productivity killer that is our testing environments.
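As a rough illustration of what replacing reset with start and stop could look like once the communication and persistence layers are containerised, the sketch below simply wraps the Docker Compose CLI so the whole environment, application, queue and database alike, can be created and destroyed on demand. The compose file name is a hypothetical placeholder, and this is not a finished tool, just the shape of the workflow.

```python
"""Illustrative start/stop wrapper for an on-demand testing environment.

A sketch only: assumes the environment (application, queue, database) is
described in a hypothetical docker-compose.test.yml and that the Docker
Compose CLI is available on the machine running this script.
"""
import subprocess
import sys

COMPOSE_FILE = "docker-compose.test.yml"  # hypothetical environment definition


def start() -> None:
    """Bring the entire environment up in the background."""
    subprocess.run(
        ["docker", "compose", "-f", COMPOSE_FILE, "up", "-d"],
        check=True,
    )


def stop() -> None:
    """Tear the environment down and discard its data volumes."""
    subprocess.run(
        ["docker", "compose", "-f", COMPOSE_FILE, "down", "-v"],
        check=True,
    )


if __name__ == "__main__":
    action = sys.argv[1] if len(sys.argv) > 1 else "start"
    if action == "stop":
        stop()
    else:
        start()
```

Because the environment is thrown away and recreated rather than patched in place, there is nothing left to reset and far less to keep alive between testing cycles.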

Thanks for reading!
