Software Archaeology – Software Architectural Recovery for Legacy Code

It is always hard to work on code that you did not originally build. It is even harder when the original developers who worked on the code are all gone, there is no documentation, and no one is exactly sure how it is implemented. Unfortunately, this is an all-too-common problem for businesses with legacy software applications. These legacy applications typically have been a driver of revenue and profits for the business for a long time, so they are vital. This means that as the business changes, these legacy applications need to be updated as well. But because of architectural drift and erosion, updating and fixing problems is hard and becoming harder all the time.

This is where software archaeology and software architectural recovery become necessary. Software archaeology is the study of poorly documented legacy software. Software architectural recovery is a set of methods for extracting architectural information from an existing implementation. Updating and maintaining a legacy software application requires that you start with software archaeology to understand the software and how it is implemented and used. Then you can move on to software architectural recovery, where you resurrect the original architecture or at least create a more modular and maintainable one.

Software archaeology and software architectural recovery are tedious and time-consuming when done by hand. Luckily, tools like Lattix and Parasoft can reduce the time it takes to gather the data you need before you start updating and maintaining a legacy software application. Parasoft helps with static analysis and testing, while Lattix helps you understand the architecture and the software at a high level.

How to do Software Archaeology and Software Architectural Recovery on Legacy Applications

When you are first given a legacy software application, it is a good idea to do both software archaeology and software architectural recovery. These two processes will help you evaluate the system and estimate the impact of change, which in turn determines how much it will cost to maintain and update the system. Like archaeology, software archaeology is the study of the past. The goal is to get as good an understanding of the software and its history as you can. Once you have that understanding, you can begin the software architectural recovery process.

Here are the steps to follow for proper software archaeology:

  • Understand how it is used: This is critical to understanding the history of the legacy application and how it got to its current state. Many special use cases unique to the business have probably been added over the years, and they are reflected in the way the software was designed and implemented. The usage of the software also influences the performance requirements: responding to interactive users is very different from responding to high-frequency programmatic access.
  • Understand how it is deployed: This is something that is usually missed in software archaeology, as many legacy applications were monoliths and there was not much to understand. Now, applications can be distributed in multiple containers and can be broken up into multiple libraries. All of this has an impact on how you maintain and update the software.
  • Understand how it is built: It is necessary to understand how each artifact is built. This is especially true with C/C++, where there are many different compile options and targets for embedded systems. This can mean that there are many different run-time variants from the same source code. If you do not understand these options, you will not be able to analyze the code.
  • Understand how it is structured: When you are maintaining and updating software, it is helpful to know its software architecture. This will allow you to understand the impact of changes you make to the system. Unfortunately, this can be the most complex part of the software archaeology process, and it is very hard to do by hand. This is where a tool like Lattix can be invaluable. It can help you:
    • Examine the existing artifacts like the folder structure and files at a high level so you can get an overall picture of the structure.
    • Apply partitioning and clustering algorithms. The Lattix Dependency Structure Matrix (DSM) technology allows you to reorder the software with its algorithms to discover layers and independent components.
    • Experiment with what-if architectures. You can create different logical modules and examine dependencies, cycles, and areas of complexity. This knowledge will improve your understanding of the overall system.
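
The build step in the list above can be explored with a small script. The sketch below assumes the project can emit a Clang-style JSON compilation database (compile_commands.json, which CMake can produce with CMAKE_EXPORT_COMPILE_COMMANDS, for example); that is an assumption, and not every legacy build supports it. It groups the -D macro definitions used for each source file so that files compiled into more than one run-time variant stand out. The sample entries, file names, and function names are hypothetical.

```python
import json
import shlex
from collections import defaultdict

# Hypothetical compilation database: the same file is built for two targets.
SAMPLE_DB = json.dumps([
    {"directory": "/build/host", "file": "src/comm.c",
     "command": "gcc -DTARGET_HOST -O2 -c src/comm.c"},
    {"directory": "/build/arm", "file": "src/comm.c",
     "arguments": ["arm-none-eabi-gcc", "-D", "TARGET_ARM", "-DUSE_DMA=1",
                   "-c", "src/comm.c"]},
    {"directory": "/build/host", "file": "src/util.c",
     "command": "gcc -O2 -c src/util.c"},
])


def macro_variants(compile_db_text):
    """Map each source file to the distinct sets of -D macros it is built with."""
    variants = defaultdict(set)
    for entry in json.loads(compile_db_text):
        # An entry carries either an "arguments" list or one "command" string.
        args = entry.get("arguments") or shlex.split(entry.get("command", ""))
        defines = []
        i = 0
        while i < len(args):
            arg = args[i]
            if arg == "-D" and i + 1 < len(args):
                defines.append(args[i + 1])  # form: -D NAME
                i += 2
                continue
            if arg.startswith("-D") and len(arg) > 2:
                defines.append(arg[2:])      # form: -DNAME or -DNAME=VALUE
            i += 1
        variants[entry["file"]].add(frozenset(defines))
    return variants


def multi_variant_files(variants):
    """Return the files that are compiled with more than one macro set."""
    return sorted(f for f, macro_sets in variants.items() if len(macro_sets) > 1)


if __name__ == "__main__":
    for path in multi_variant_files(macro_variants(SAMPLE_DB)):
        print("built in several variants:", path)
```

This only looks at -D flags. A real build may also vary through -U, include paths, or generated headers, so treat the result as a starting point for questions rather than a complete inventory.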

[Figure: example Dependency Structure Matrix (DSM)]
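
To make the idea of reordering a dependency structure matrix concrete, here is a minimal sketch of one common approach: group modules that depend on each other cyclically (strongly connected components, found with Tarjan's algorithm), then assign each group to a layer above everything it depends on. This is not the Lattix partitioning algorithm, whose details are not described here. The module names, the dependency graph, and the matrix convention (a mark in a cell means the column's module depends on the row's module) are all assumptions made for illustration.

```python
# Hypothetical module graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "ui": ["orders", "util"],
    "orders": ["billing", "db"],
    "billing": ["orders", "db"],
    "db": ["util"],
    "util": [],
}


def strongly_connected_components(graph):
    """Tarjan's algorithm. Components come out with dependencies first."""
    index, low, stack, on_stack, components = {}, {}, [], set(), []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for dep in graph.get(node, ()):
            if dep not in index:
                visit(dep)
                low[node] = min(low[node], low[dep])
            elif dep in on_stack:
                low[node] = min(low[node], index[dep])
        if low[node] == index[node]:  # node is the root of a component
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in sorted(graph):
        if node not in index:
            visit(node)
    return components


def layers(graph):
    """Group components into layers; layer 0 depends on nothing else."""
    comps = strongly_connected_components(graph)
    comp_of = {m: i for i, comp in enumerate(comps) for m in comp}
    level = {}
    for i, comp in enumerate(comps):
        below = [level[comp_of[d]] for m in comp
                 for d in graph.get(m, ()) if comp_of[d] != i]
        level[i] = 1 + max(below, default=-1)
    grouped = {}
    for i, comp in enumerate(comps):
        grouped.setdefault(level[i], []).append(comp)
    return [grouped[k] for k in sorted(grouped)]


def dsm(graph, order):
    """cell[r][c] == 1 means order[c] depends on order[r] (assumed convention)."""
    return [[1 if row in graph.get(col, ()) else 0 for col in order]
            for row in order]


if __name__ == "__main__":
    # List the highest layer first, so well-layered dependencies fall below
    # the diagonal and cyclic ones show up above it.
    order = [m for layer in reversed(layers(DEPENDS_ON))
             for comp in layer for m in comp]
    for name, row in zip(order, dsm(DEPENDS_ON, order)):
        print(f"{name:8}", " ".join(str(cell) for cell in row))
```

In this sample graph, billing and orders depend on each other, so they end up in the same component and share a layer. The recursive traversal is fine for a sketch, but a large code base would need an iterative version to avoid hitting Python's recursion limit.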

Once you understand the history of the software and its current implementation, you can start the software architectural recovery process. The goal of software architectural recovery is to arrive at an agreement about the organization of a software application and how to update and maintain it going forward. The parts of software architectural recovery are:

  • Visualization: This is one of the results of the software archaeology. With a visual representation of the current code and an explanation of why it is structured the way it is, you can reverse engineer the original architecture or at least create an architecture that is more understandable and maintainable.
  • Health of code: The health of the code is typically captured in metrics. Key architectural metrics that will give you the overall health of your software are Stability, Cyclicality, and Coupling. When you track these metrics over time, you can understand whether the changes you are making improve the health (maintainability) of the code.
  • Code violations (static analysis): Finding bugs in your software with static analysis can help improve the software's performance and user satisfaction. This step can only be done with a static analysis tool, like Parasoft. Usually, areas with a high concentration of bugs are overly complex and are candidates for refactoring.
  • Testing: Once you understand the code but before you start making changes, you need to make sure you have good test coverage. Again, this is something that is hard to do by hand, but easy with a tool like Parasoft.
  • Documentation: The final part of software architectural recovery is to not repeat the same mistakes as your predecessors, and to document everything that you have learned. This includes software architecture diagrams, the metrics that you are tracking for code health, and tests. This will allow future developers to become productive more quickly.
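
The code health metrics mentioned in the list above can be illustrated with a small calculation over a module dependency graph. This is a sketch under stated assumptions: it uses Robert C. Martin's instability measure, Ce / (Ca + Ce), as a stand-in for stability, treats cyclicality as the fraction of modules that take part in a dependency cycle, and reports coupling as the average number of dependencies per module. The tools named in this article may define Stability, Cyclicality, and Coupling differently. The graph and function names are hypothetical.

```python
# Hypothetical module graph: each module maps to the modules it depends on.
DEPENDS_ON = {
    "ui": ["orders", "util"],
    "orders": ["billing", "db"],
    "billing": ["orders", "db"],
    "db": ["util"],
    "util": [],
}


def reachable(graph, start):
    """Return every module reachable from start by following dependencies."""
    seen = set()
    todo = list(graph.get(start, ()))
    while todo:
        node = todo.pop()
        if node not in seen:
            seen.add(node)
            todo.extend(graph.get(node, ()))
    return seen


def health_metrics(graph):
    """Compute simple, assumed definitions of three architectural metrics."""
    modules = sorted(set(graph) | {d for deps in graph.values() for d in deps})
    count = len(modules) or 1  # avoid dividing by zero on an empty graph

    # Afferent coupling (Ca): how many modules depend on this one.
    afferent = {m: 0 for m in modules}
    for deps in graph.values():
        for dep in set(deps):
            afferent[dep] += 1

    # Instability: Ce / (Ca + Ce), where Ce is the outgoing dependency count.
    instability = {}
    for m in modules:
        efferent = len(set(graph.get(m, ())))
        total = afferent[m] + efferent
        instability[m] = efferent / total if total else 0.0

    # A module is in a cycle if it can reach itself.
    in_cycle = [m for m in modules if m in reachable(graph, m)]
    edges = sum(len(set(deps)) for deps in graph.values())

    return {
        "instability": instability,
        "cyclicality": len(in_cycle) / count,
        "avg_dependencies": edges / count,
    }


if __name__ == "__main__":
    metrics = health_metrics(DEPENDS_ON)
    for module, value in metrics["instability"].items():
        print(f"{module:8} instability {value:.2f}")
    print(f"cyclicality {metrics['cyclicality']:.2f}")
    print(f"average dependencies per module {metrics['avg_dependencies']:.2f}")
```

Recording these numbers for every build, rather than computing them once, is what makes the trend visible: a refactoring that breaks the billing and orders cycle in this sample would lower the cyclicality value.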

Evolution: monitor and control the source code and architecture

Once the code is understandable and maintainable, you will need a way to prevent further erosion of the architecture. With Lattix, you can create design rules to prevent changes from causing architectural drift and erosion. You create these rules from the architectural map of the elements and their dependencies, together with an understanding of the intended architecture. Design rules define the permitted dependencies, so that the code continues to follow the intended architecture as you change it. By building the source code in a CI/DevOps pipeline with Lattix and Parasoft, you can continuously monitor source code changes for static analysis violations, failed tests, and architectural violations. You can also monitor key metrics to make sure that the overall health of the system is maintained.
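
The idea of design rules as permitted dependencies can be sketched independently of any tool. The example below is not the Lattix rule format or API; it is a minimal illustration in which each module lists the modules it may depend on, and any observed dependency outside that list is reported. The rule table, the observed dependencies, and the function names are hypothetical.

```python
# Hypothetical design rules: each module maps to the modules it may depend on.
ALLOWED = {
    "ui": {"orders", "billing"},
    "orders": {"db", "util"},
    "billing": {"db", "util"},
    "db": {"util"},
    "util": set(),
}

# Hypothetical dependencies extracted from the current source code.
OBSERVED = {
    "ui": ["orders", "util"],
    "orders": ["billing", "db"],
    "billing": ["orders", "db"],
    "db": ["util"],
    "util": [],
}


def find_violations(dependencies, allowed):
    """Return (source, target) pairs that no design rule permits."""
    violations = []
    for source, targets in sorted(dependencies.items()):
        # A module without a rule entry is treated as allowed nothing.
        permitted = allowed.get(source, set())
        for target in sorted(set(targets)):
            if target not in permitted:
                violations.append((source, target))
    return violations


def exit_code(violations):
    """Non-zero when there are violations, so a pipeline step can fail."""
    return 1 if violations else 0


if __name__ == "__main__":
    found = find_violations(OBSERVED, ALLOWED)
    for source, target in found:
        print(f"architecture violation: {source} -> {target} is not permitted")
    print("exit code for the pipeline step:", exit_code(found))
```

In a pipeline, a step like this would pass the result of exit_code to sys.exit so that a change introducing a forbidden dependency fails the build. Treating modules without a rule as allowed nothing is a deliberate, strict default: new modules have to be placed in the architecture before they can depend on anything.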

Conclusion

Software archaeology and software architectural recovery are very valuable when you are given a legacy application to maintain. These processes will give you a head start in tackling the complex problem of updating and fixing a legacy application. While it is possible to do all of this by hand, it can be very time-consuming, especially for large legacy applications. Tools like Lattix and Parasoft can help.