Object-oriented Programming (OOP)

Refactoring

Motivation

There are two extreme attitudes towards changing the design of software:

  • Software engineer that never wants to refactor (If it ain't broken don't fix it)
  • Software engineer that always wants to improve the design of the software

When software is designed, there are three "levers" that can be adjusted during a project, but each only at the expense of another dimension / attribute. For example, insisting on both high internal and high external quality usually prolongs the deadline. Usually, the deadline cannot be moved, so one of the other dimensions has to be sacrificed. The external quality often cannot be sacrificed either. Sometimes the scope can be adjusted (delivering less functionality), but in most cases it is the internal quality that is sacrificed.

  • Deadline
  • External Quality: Functionality of software as presented to the end user
  • Internal Quality: Design, factorization, naming, etc.

Delivery pressure (too many requirements, too little time) is not the only factor that affects the internal quality of software. Exploratory programming usually leads to makeshift designs (e.g. when there is no clear vision of what the product is supposed to do). Structural changes are also frequently postponed when there is pressure on the external quality / deadline dimension. The result is technical debt that accumulates.

Programmer as Reader

When software is developed, far more code is read than written (roughly 90% reading, 10% writing). In addition, the code tends to be the only up-to-date "documentation" in most cases. Creating and keeping all documentation up-to-date is expensive if the documentation is not close to the code (e.g., in a wiki). The code is also the document with the most impact on customers (the customer does not care about documentation). As a consequence, code might be regarded as an "executable specification". However, the code is not always able to convey the original intention of a system. Code might do things correctly, but not do the correct things (e.g., the customer wanted something different).

"We read to learn to write, we read to find information, and we read in order to rewrite."
A. Goldberg in "Programmer as Reader"

Two things are required to read code: tools with reading support (e.g. IDE) and code that is actually readable.

Principles

What is Refactoring?

In a nutshell, refactoring means improving the internal structure of software without changing its external behavior. Software tends to deteriorate over time, so timely countermeasures are required. Adding new functionality usually increases the "entropy" (e.g. duplications) that we need to fight. To increase the longevity of software, continuous improvement of the design is necessary even after the software has been designed and implemented.

The goal of improvements (i.e. the refactoring) is to increase maintainability and comprehensibility. The improvements are carried out in small steps (incremental) so that the system is consistent after each refactoring. Usually, even intermediate steps of refactoring guarantee this consistency. The test cases act as a safeguard and verify the correctness of the transformations.

Why Refactoring?

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
Martin Fowler

The primary goal of refactoring is improving the design. Adding functionality usually leads to the deterioration of the design (e.g. duplicated code). This is because local changes are applied without knowledge of the larger context. Each inferior change deteriorates the design.

The second goal is to create comprehensible software. Readability is a key prerequisite for successfully introducing changes into a software system, and in most cases the reader of the code will not be the person who wrote it. There is nothing worse than changing a software system one merely believes to comprehend. If a system is not comprehensible, reading and refactoring at the same time can improve the comprehensibility (e.g. changing names, moving methods, etc.).

Refactoring also has the goal of increasing productivity. With a bad design, adding extensions takes more time, so it is sometimes better to refactor the software before the extensions are added. When the design is not cared about, development is usually faster in the beginning. However, adding new functionality will require more and more time, up to the point where it is almost impossible (Design Stamina Hypothesis by Martin Fowler).

Refactoring is the key tactic to enable long-term evolution of software systems. Successful evolution requires substantial changes to architecture and design over time. If this evolution is anticipated, over-engineering of the first release (Minimum Viable Product) can be avoided, and the architecture and the design may grow with the requirements.

Refactoring also ensures / improves object-orientation and thus avoids procedural tendencies, improves factorization and introduces design patterns (refactoring to patterns). It can also help to find defects, since it might reveal errors or help to verify assumptions.
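As an illustration of moving from a procedural tendency towards object-orientation, the following minimal sketch replaces a switch on a type code with polymorphism. All names (`ProceduralArea`, `Shape`, `Rectangle`, `Triangle`) are hypothetical and chosen only for this example; they do not come from the original material.

```java
// Procedural style: one function switches on a type code.
// Every new kind of shape forces a change to this method.
class ProceduralArea {
    static double area(String kind, double a, double b) {
        switch (kind) {
            case "rectangle":
                return a * b;
            case "triangle":
                return 0.5 * a * b;
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }
}

// Object-oriented style: each class knows how to compute its own area.
interface Shape {
    double area();
}

final class Rectangle implements Shape {
    private final double width;
    private final double height;

    Rectangle(double width, double height) {
        this.width = width;
        this.height = height;
    }

    @Override
    public double area() {
        return width * height;
    }
}

final class Triangle implements Shape {
    private final double base;
    private final double height;

    Triangle(double base, double height) {
        this.base = base;
        this.height = height;
    }

    @Override
    public double area() {
        return 0.5 * base * height;
    }
}

class PolymorphismDemo {
    public static void main(String[] args) {
        Shape[] shapes = { new Rectangle(2.0, 3.0), new Triangle(2.0, 3.0) };
        for (Shape shape : shapes) {
            System.out.println(shape.area());
        }
    }
}
```

In the second variant, adding a new shape means adding a class rather than editing an existing conditional, which is one way such a refactoring can improve factorization.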

When Refactoring?

"Once is happenstance, twice is coincidence.
The third time it's enemy action."
Auric Goldfinger in Ian Fleming's Goldfinger

Primarily, refactoring happens when adding functionality. Either the extension is added first, leading to an inferior design that then needs to be refactored, or the design is improved first, which makes the extension easy to add afterwards. Refactoring should not be done too aggressively, though. The first time a design flaw is observed, the extension can just be added. The second time you should complain (e.g. talk to another developer). The third time the refactoring should be done.

The second opportunity to perform refactoring is when fixing errors. If the error was not obvious this can hint at readability / comprehensibility problems that can be solved by a refactoring.

The third opportunity is during code review.

Bad smells in Code

It is hard to decide when a design is bad. Code metrics are too vague:

  • What is the maximum inheritance depth permitted?
  • How many instance variables are allowed?
  • What is the acceptable maximum length of methods? (usually more than 50 lines is too much, but this depends heavily on context)

As a consequence, code "goodness" is a large grey area and proper assessment requires a lot of experience.

Code reviews may reveal the necessity of design improvements. They can be included explicitly as part of the development process (e.g., supported by an automated tool), or they can happen implicitly while fixing errors, adding functionality, etc.

Duplicated Code

  • Same expression in two methods: extract expression into distinct method
  • Same expression in two related subclasses: extract expression into base class, pull up field
  • Similar expression in two related subclasses: create template method (factorizes similarities), pull up instance fields
  • Same expression in unrelated classes: Extract class
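The third case above (similar expressions in two related subclasses) can be sketched as follows. This is a minimal illustration under assumed names: `Report`, `PlainReport` and `HtmlReport` are hypothetical classes that both used to assemble "header + body + footer" themselves. The shared sequence is moved into a template method in the base class, and the `title` field is pulled up.

```java
// Base class: holds the pulled-up field and the common structure.
abstract class Report {
    private final String title; // field pulled up from the subclasses

    Report(String title) {
        this.title = title;
    }

    // Template method: the sequence exists exactly once.
    final String render(String body) {
        return header(title) + body + footer();
    }

    // Hooks: only the parts that differ remain in the subclasses.
    protected abstract String header(String title);

    protected abstract String footer();
}

class PlainReport extends Report {
    PlainReport(String title) {
        super(title);
    }

    @Override
    protected String header(String title) {
        return "== " + title + " ==\n";
    }

    @Override
    protected String footer() {
        return "\n-- end --";
    }
}

class HtmlReport extends Report {
    HtmlReport(String title) {
        super(title);
    }

    @Override
    protected String header(String title) {
        return "<h1>" + title + "</h1><p>";
    }

    @Override
    protected String footer() {
        return "</p>";
    }
}

class TemplateMethodDemo {
    public static void main(String[] args) {
        System.out.println(new PlainReport("Sales").render("42 items"));
        System.out.println(new HtmlReport("Sales").render("42 items"));
    }
}
```

Declaring `render` as `final` is a design choice in this sketch: subclasses can vary the hooks but cannot reintroduce a diverging copy of the sequence.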

Long Method

Short methods are easier to use and to override. Additionally, they are easier to test and to reuse. The naming of these short methods is essential. The name should convey the intention and not its operation (Convey what it does, not how it does it). In the best case, a good method name saves you from looking at the body.

  • Standard procedure with long methods: Extract method
  • Problem: often many parameters and local variables:
      • Replace temp with query
      • Introduce Parameter Object
  • Complex logical expressions:
      • Decompose Conditional

public void prepareShipment(List<Item> items) {
  double weight = items.stream().mapToDouble(Item::getWeight).sum();
  int itemCount = items.size();
  double volume = items.stream().mapToDouble(Item::getVolume).sum();

  // calculate shipping cost <-- typical code smell
  double shippingCost;
  if (receiver.isPremium()) {
    shippingCost = itemCount * 0.1 + volume * 0.3;
  } else {
    shippingCost = weight * 10.0 + itemCount * 0.2 + volume * 0.5;
  }
}

// better design: extract into own method
public BigDecimal calculateShippingCost(double weight, int itemCount, double volume) { ... }

// better design: replace temp variable with query
public double getWeight(List<Item> items) {
  return items.stream().mapToDouble(Item::getWeight).sum();
}

// better design: introduce parameter object
public void prepareShipment(List<Item> items) {
  double weight = items.stream().mapToDouble(Item::getWeight).sum();
  int itemCount = items.size();
  double volume = items.stream().mapToDouble(Item::getVolume).sum();
  ShippingParameters shippingParameters =
      new ShippingParameters(weight, itemCount, volume);
  // parameter object is used
  BigDecimal shippingCost = calculateShippingCost(shippingParameters);
}
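Decompose Conditional is listed above but not shown in the example. A minimal sketch, reusing the formulas from the shipping example, could look like this. The classes `Receiver` and `ShippingCalculator` and the helper names are assumptions made for illustration, and `double` is used instead of `BigDecimal` to keep the sketch short.

```java
// Hypothetical stand-in for the receiver used in the shipping example.
class Receiver {
    private final boolean premium;

    Receiver(boolean premium) {
        this.premium = premium;
    }

    boolean isPremium() {
        return premium;
    }
}

class ShippingCalculator {
    // The conditional now reads like a sentence; details live in named helpers.
    double calculateShippingCost(Receiver receiver, double weight, int itemCount, double volume) {
        if (qualifiesForPremiumRate(receiver)) {
            return premiumCost(itemCount, volume);
        }
        return standardCost(weight, itemCount, volume);
    }

    // Extracted condition: trivial here, but it gives a growing rule a single home.
    private boolean qualifiesForPremiumRate(Receiver receiver) {
        return receiver.isPremium();
    }

    // Extracted "then" branch.
    private double premiumCost(int itemCount, double volume) {
        return itemCount * 0.1 + volume * 0.3;
    }

    // Extracted "else" branch.
    private double standardCost(double weight, int itemCount, double volume) {
        return weight * 10.0 + itemCount * 0.2 + volume * 0.5;
    }
}

class DecomposeConditionalDemo {
    public static void main(String[] args) {
        ShippingCalculator calculator = new ShippingCalculator();
        System.out.println(calculator.calculateShippingCost(new Receiver(true), 1.0, 10, 2.0));
        System.out.println(calculator.calculateShippingCost(new Receiver(false), 1.0, 10, 2.0));
    }
}
```

With the branches named, the explanatory comment "calculate shipping cost" from the original method is no longer needed.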


Comments

Comments can be useful. However, they can also act as "deodorant" and hint at a "bad smell": the engineer knows about the bad design and tries to hide it with comments (e.g. a method is too long, so it is split into multiple blocks that are commented).

Useful comments describe the rationale for the design / implementation. The rationale behind a design decision is usually not visible when the code is read. Useful comments can also point out where the developer was unsure and where another developer might find a better approach.

Bad comments should be substituted with extracted methods, introduced assertions or renamed methods.
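The following sketch shows one way such a substitution might look. The `Account` class and its methods are hypothetical. The first comment is turned into an assertion and the second into an extracted, well-named method. Note that Java's `assert` statement is only checked when the JVM runs with assertions enabled (`-ea`).

```java
class Account {
    private long balanceInCents;

    Account(long balanceInCents) {
        this.balanceInCents = balanceInCents;
    }

    long getBalanceInCents() {
        return balanceInCents;
    }

    // Before: comments state what the code does not say by itself.
    void withdrawCommented(long amountInCents) {
        // amount is expected to be positive here
        // check whether enough money is left on the account
        if (balanceInCents - amountInCents >= 0) {
            balanceInCents -= amountInCents;
        }
    }

    // After: the first comment became an assertion,
    // the second an extracted method whose name carries the intent.
    void withdraw(long amountInCents) {
        assert amountInCents > 0 : "amount must be positive";
        if (hasSufficientFunds(amountInCents)) {
            balanceInCents -= amountInCents;
        }
    }

    private boolean hasSufficientFunds(long amountInCents) {
        return balanceInCents - amountInCents >= 0;
    }
}

class CommentSubstitutionDemo {
    public static void main(String[] args) {
        Account account = new Account(10_000);
        account.withdraw(3_000);
        System.out.println(account.getBalanceInCents());
    }
}
```

For valid inputs both variants behave identically, so the change is a refactoring in the sense used here: the external behavior stays the same while the intent becomes visible in the code.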

Testing

Without automated testing, refactoring is hardly feasible. The tests assert that the refactoring did not change the external behavior / quality of the system. If there are no test cases, the system should first be documented and secured with test cases. Not all tests are equally affected by refactoring:

  • Unit tests: are white-box tests -> likely to be affected
  • Integration tests: are black-box tests -> less likely to be affected
  • Functional tests: should not be affected at all

Most IDEs support automated refactoring today. However, test cases are still needed. In general, statically typed languages are better suited for automated refactoring (the IDE knows more about the code).
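Securing untested code before a refactoring can be sketched as a simple comparison between the old and the new implementation over a range of inputs. The discount rule and all names below are hypothetical, and plain Java is used instead of a test framework to keep the example self-contained.

```java
class Discounts {
    // Existing, untested code whose behavior is pinned down first.
    static int legacyDiscountPercent(int quantity) {
        int discount = 0;
        if (quantity >= 10) {
            discount = 5;
        }
        if (quantity >= 100) {
            discount = 15;
        }
        return discount;
    }

    // Refactored version: same external behavior, clearer structure.
    static int discountPercent(int quantity) {
        if (quantity >= 100) {
            return 15;
        }
        if (quantity >= 10) {
            return 5;
        }
        return 0;
    }
}

class CharacterizationCheck {
    // Returns true if both versions agree for every quantity in [from, to].
    static boolean behaviorPreserved(int from, int to) {
        for (int quantity = from; quantity <= to; quantity++) {
            if (Discounts.legacyDiscountPercent(quantity) != Discounts.discountPercent(quantity)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("behavior preserved: " + behaviorPreserved(-5, 200));
    }
}
```

The check only looks at inputs and outputs, so it stays valid no matter how the internals are restructured. In a real project the same idea would typically live in the existing test suite.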

Summary

Refactoring improves the design, so the design does not have to be perfect from the beginning: the software can still be shaped after it has been implemented. In most cases, starting from scratch (i.e. reimplementation) cannot be afforded.

Refactoring avoids the accumulation of technical debt and makes adding functionality less risky.

The following things should be kept in mind when refactoring:

  • Factorization and encapsulation are paramount
  • Pick a goal and do one step at a time
  • Stop when unsure (consider throwing away changes)
  • In case of trouble, backtrack (rather than debugging)
  • Duet: refactoring in pairs is more effective
  • Write test cases
  • Don't refactor ahead! (Do it, when it is necessary.)