Announcing Duplicate Code Cross-Check
Shipping a new GitHub Action: Duplicate Code Cross-Check — also available on the GitHub Marketplace.
Why
Agentic programming makes duplicate code a much bigger problem. LLM-based agents tend to copy patterns they’ve already seen in the codebase rather than refactor to reuse them. Without a hard gate, duplication quietly piles up.
I wanted a CI check that:
- Runs on every PR automatically.
- Fails the build if a PR increases duplication — not just if it exceeds some fixed ceiling.
- Posts a clear comparison comment so the author can see exactly what changed.
- Points at specific lines of new duplication in the PR diff.
No existing action on the GitHub Marketplace did all four. So I built one.
How it works
Two complementary engines run on every PR:
- PMD CPD: deep language awareness for Java, Kotlin, Python, Go, Ruby, Scala, JavaScript, C++, and more. Produces fewer false positives from imports, annotations, and literals.
- jscpd: language-agnostic token matching. Catches duplication PMD CPD may miss, and works on languages PMD CPD doesn’t support.
Running both gives cross-validation. If both flag the same block, it’s almost certainly real. If only one flags it, the signal is weaker but worth a look.
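To make the cross-validation idea concrete, here is a minimal sketch of how findings from two engines could be compared. It is not the action's actual implementation: the `Block` shape, the `cross_check` function, and the overlap rule (same file, at least one shared line) are all assumptions for illustration.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """A duplicated region reported by one engine (hypothetical shape)."""
    path: str
    start: int  # first line, inclusive
    end: int    # last line, inclusive


def overlaps(a: Block, b: Block) -> bool:
    """True when both blocks are in the same file and share at least one line."""
    return a.path == b.path and a.start <= b.end and b.start <= a.end


def cross_check(cpd: list[Block], jscpd: list[Block]) -> dict[str, list[Block]]:
    """Split findings into those both engines agree on and those only one reports."""
    confirmed = [a for a in cpd if any(overlaps(a, b) for b in jscpd)]
    cpd_only = [a for a in cpd if a not in confirmed]
    jscpd_only = [b for b in jscpd if not any(overlaps(a, b) for a in cpd)]
    return {"confirmed": confirmed, "cpd_only": cpd_only, "jscpd_only": jscpd_only}


# Made-up findings, purely for illustration.
cpd_findings = [Block("src/a.py", 10, 30), Block("src/b.py", 5, 12)]
jscpd_findings = [Block("src/a.py", 12, 28), Block("src/c.py", 40, 55)]
result = cross_check(cpd_findings, jscpd_findings)
```

In this sketch the block in `src/a.py` lands in `confirmed` because both engines report overlapping line ranges, while the other two land in the weaker single-engine buckets.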
The action does a base-vs-PR comparison, so it catches new duplication introduced by the PR rather than just reporting the total count. You get:
- One unified PR comment with a section per engine, deltas, pass/fail status
- Inline review comments on the diff pointing at new duplicate blocks
- CI failure if duplication exceeds a ceiling or increases vs base
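The pass/fail rule above can be sketched in a few lines. This is a hedged illustration only: the `gate` function, its parameter names, and the use of a duplication percentage as the metric are assumptions, not the action's real code or inputs.

```python
def gate(base_pct: float, pr_pct: float, ceiling_pct: float) -> tuple[bool, str]:
    """Return (passed, message) for one engine's base-vs-PR comparison.

    Assumes duplication is measured as a percentage of duplicated lines;
    the real action may use a different metric.
    """
    delta = pr_pct - base_pct
    # Rule 1: an absolute ceiling, regardless of what the base branch had.
    if pr_pct > ceiling_pct:
        return False, f"duplication {pr_pct:.2f}% exceeds ceiling {ceiling_pct:.2f}%"
    # Rule 2: the PR must not make things worse than the base branch.
    if delta > 0:
        return False, f"duplication rose by {delta:.2f} points vs base"
    return True, f"duplication {pr_pct:.2f}% (delta {delta:+.2f})"


# A PR that stays under the ceiling but adds duplication still fails.
passed, message = gate(base_pct=3.0, pr_pct=3.5, ceiling_pct=10.0)
```

The second rule is the one that matters for the delta comparison: a fixed ceiling alone would let a PR add duplication as long as the total stayed under the limit.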
How it’s different
The most prominent existing tool is platisd/duplicate-code-detection-tool (~205 stars). It uses gensim TF-IDF cosine similarity to score whole-file similarity. That’s great for catching files that should be merged at an architectural level.
This action takes the opposite approach: it finds specific copy-pasted blocks and reports their line numbers. The two tools are complementary, not competing. I recommend running both.
A few other actions wrap jscpd or PMD CPD, but none of them compare against the base branch. You can’t tell if a PR introduced duplication or just inherited it. That delta comparison is the whole point.
Try it
- uses: astubbs/duplicate-code-cross-check@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    directories: src
That’s the minimal setup. See the README for full config options, or grab it from the GitHub Marketplace.
Feedback and PRs welcome.