Finding duplicated code with Maven and PMD

One of the core principles of clean code is DRY: Don’t Repeat Yourself. It says that developers should not repeat the same code in many places. That’s the theory; in practice, developers copy and paste pieces of code from time to time, especially in bigger projects. Ignoring the DRY rule can cause a lot of problems: the codebase grows, maintenance gets harder, and bugs creep in when someone forgets to update one of the copied blocks.

In this post I will show how to find duplicated code using Maven with the Programming Mistake Detector (PMD) plugin and its Copy/Paste Detector (CPD). I will also describe how to view the report and improve it by adding links to the source code.

Running PMD for first time in big project

Example project

Our project structure looks like this:

> tree
.
├── pom.xml (Maven configuration)
├── readme.md
└── src
    └── main
        └── java
            └── codes
                └── hubertwo
                    └── maven
                        └── pmd
                            ├── ByeService.java (Sample code with duplicated code)
                            └── HelloService.java (Sample code with duplicated code)

To keep things readable, the project is pretty simple: a pom.xml with the Maven configuration and two Java classes with duplicated code that we want to see in the CPD report. What does the duplicated code look like? Both HelloService.java and ByeService.java contain the following method:

private String buildMessage(String name, String message) {
    return String.format("%s %s", message, name);
}

Now that we know the structure of the project, let’s configure Maven to scan the code and show us the duplicates.

Maven PMD plugin

First things first, we need to add the plugin to the pom.xml file. For now we won’t add any configuration parameters, to keep it as simple as possible.

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-pmd-plugin</artifactId>
            <version>3.16.0</version>
            <configuration/>
        </plugin>
    </plugins>
</build>

Running the detector and generating the report

Now that everything is in place, we can run the plugin and troubleshoot possible issues.

Generating report

To run CPD and generate the report, we will use the Maven goal pmd:aggregate-cpd.

> mvn pmd:aggregate-cpd 
[INFO] Scanning for projects...
[INFO] 
[INFO] -----------------< codes.hubertwo.maven.pmd:maven-pmd >-----------------
[INFO] Building maven-pmd 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] --- maven-pmd-plugin:3.16.0:aggregate-cpd (default-cli) @ maven-pmd ---
[WARNING] Unable to locate Source XRef to link to - DISABLED
[INFO] PMD version: 6.42.0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  1.233 s
[INFO] Finished at: 2022-03-26T18:50:45+01:00

As we can see, CPD was executed. Let’s find and open the report.

> tree target 
target
├── cpd.xml
└── site
    └── cpd.html (Report in HTML format)

After opening the cpd.html report we can see that the detector did not find anything. That is a small surprise, since we know from the previous steps that the buildMessage method is identical in HelloService and ByeService. Let’s move on to the next step to understand why this happened and how we can fix it.

We have duplicated code, but PMD report is empty

CPD report does not show duplicated code

As described in the previous step, we generated the report, but no duplicates were found. That’s because CPD only reports duplicates of at least a minimum size, and the duplicated method in our small project is shorter than the default threshold.
To fix this, we need to change the PMD plugin configuration and lower the minimum number of tokens.

What does minimum tokens stand for? It’s the minimum number of tokens a duplicated fragment must contain to be reported. Let’s add the configuration below to our plugin configuration in pom.xml.


<configuration>
    <minimumTokens>10</minimumTokens> <!-- default value is 100, which might be too high for small projects -->
</configuration>
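To get a feel for why the default threshold misses our method, the sketch below counts the tokens in buildMessage with a deliberately crude, regex-based tokenizer. This is only an approximation and an assumption on my side: CPD uses a real Java lexer, and the RoughTokenCounter class is a hypothetical helper that is not part of the example project.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a crude approximation of token counting, NOT CPD's actual lexer.
final class RoughTokenCounter {

    // A token is a string literal, an identifier/keyword, a number,
    // or any other single non-whitespace character.
    private static final Pattern TOKEN = Pattern.compile(
            "\"(?:\\\\.|[^\"\\\\])*\"|[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    // The duplicated method from HelloService and ByeService.
    static final String BUILD_MESSAGE =
            "private String buildMessage(String name, String message) {\n"
            + "    return String.format(\"%s %s\", message, name);\n"
            + "}";

    // Counts how many tokens the pattern finds in the given source text.
    static int countTokens(String source) {
        Matcher matcher = TOKEN.matcher(source);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        int tokens = countTokens(BUILD_MESSAGE);
        System.out.println("Approximate tokens: " + tokens);
        System.out.println("Long enough for minimumTokens=100? " + (tokens >= 100));
        System.out.println("Long enough for minimumTokens=10?  " + (tokens >= 10));
    }
}
```

By this rough count the method has about two dozen tokens, well below the default of 100 but above the 10 we just configured.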

After changing the minimum tokens value, let’s run CPD again and open the report as described in the previous steps.

CPD report with code duplicates

Changing the minimum number of tokens worked: our report now contains a list of duplicated code.
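Once the report points at a duplicate, a typical fix is to move the repeated method into one shared place. The sketch below shows one possible way to do it for our buildMessage method. The MessageFormatter class and the sayHello and sayBye methods are hypothetical names introduced for illustration; they are not taken from the example repository, and package declarations are omitted for brevity.

```java
// Hypothetical shared helper; not part of the example repository.
final class MessageFormatter {

    private MessageFormatter() {
        // Utility class, no instances needed.
    }

    // The single remaining copy of the formerly duplicated method.
    static String buildMessage(String name, String message) {
        return String.format("%s %s", message, name);
    }
}

// The sayHello and sayBye methods are assumed for illustration only.
class HelloService {
    String sayHello(String name) {
        return MessageFormatter.buildMessage(name, "Hello");
    }
}

class ByeService {
    String sayBye(String name) {
        return MessageFormatter.buildMessage(name, "Bye");
    }
}
```

Both services now delegate to the same method, so a change to the message format has to be made in only one place.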

By default, the PMD plugin tries to add links to the source code while generating the report. To do this it needs the JXR plugin, which is not included in the Maven PMD plugin. That may lead to warnings like this one during the goal execution:

[WARNING] Unable to locate Source XRef to link to - DISABLED

If you don’t want the links, or you just want to get rid of the warning above, you can disable link generation by adding one more parameter to the plugin configuration:

<linkXRef>false</linkXRef>

Tweaking the report is out of scope for this post, but I will cover it here since it’s super simple to configure and adds a lot of value to the report. Let’s open pom.xml one more time and add the JXR plugin.

<reporting>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jxr-plugin</artifactId>
            <version>3.2.0</version>
        </plugin>
    </plugins>
</reporting>

Now we can generate the report one more time and test our last changes. Note that a new goal, jxr:jxr, was added to the command.

> mvn jxr:jxr pmd:aggregate-cpd
[INFO] Scanning for projects...
[INFO] -----------------< codes.hubertwo.maven.pmd:maven-pmd >-----------------
[INFO] Building maven-pmd 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] 
[INFO] >>> maven-jxr-plugin:3.2.0:jxr (default-cli) > generate-sources @ maven-pmd >>>
[INFO] <<< maven-jxr-plugin:3.2.0:jxr (default-cli) < generate-sources @ maven-pmd <<<
...
[INFO] -----------------< codes.hubertwo.maven.pmd:maven-pmd >-----------------
[INFO] Building maven-pmd 1.0-SNAPSHOT
[INFO] --------------------------------[ jar ]---------------------------------
[INFO] --- maven-pmd-plugin:3.16.0:aggregate-cpd (default-cli) @ maven-pmd ---
[INFO] PMD version: 6.42.0
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS

Perfect, the warning is gone. From now on our report should also contain links to the source code.

CPD report with links to source code

TL;DR - Maven PMD and JXR configuration

You can find a link to the repository with the example code at the bottom of this page. However, if you are just looking for a ready-to-use solution, the full configuration is below.

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-pmd-plugin</artifactId>
            <version>3.16.0</version>
            <configuration>
                <targetJdk>${maven.compiler.source}</targetJdk>
                <minimumTokens>10</minimumTokens>
            </configuration>
        </plugin>
    </plugins>
</build>
<reporting>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jxr-plugin</artifactId>
            <version>3.2.0</version>
        </plugin>
    </plugins>
</reporting>

Summary

In a few steps we configured Maven to run the Programming Mistake Detector (PMD). Using the plugin, we ran the Copy/Paste Detector (CPD) to check the project for duplicated code and generated a report to analyze the findings. On top of that, we made the report easier to read by adding the JXR plugin to the Maven configuration, which links the findings to the source code.

Source code

All code samples and the Maven project described in this post are available on GitHub: Finding code duplicates with Maven and PMD.
If you found this post useful do not forget to leave a ⭐️ on GitHub :) Thanks!

Read more

  1. Maven PMD plugin documentation
  2. Maven JXR plugin documentation
  3. Don’t repeat yourself rule
  4. PMD documentation