
Announcing Duplicate Code Cross-Check


Shipping a new GitHub Action: Duplicate Code Cross-Check — also available on the GitHub Marketplace.

Why

Agentic programming makes duplicate code a much bigger problem. LLM-based agents tend to copy patterns they’ve already seen in the codebase rather than refactor to reuse them. Without a hard gate, duplication quietly piles up.

I wanted a CI check that:

  1. Runs on every PR automatically.
  2. Fails the build if a PR increases duplication — not just if it exceeds some fixed ceiling.
  3. Posts a clear comparison comment so the author can see exactly what changed.
  4. Points at specific lines of new duplication in the PR diff.

No existing action on the GitHub Marketplace did all four. So I built one.

How it works

Two complementary engines, PMD CPD and jscpd, run on every PR.

Running both gives cross-validation. If both flag the same block, it’s almost certainly real. If only one flags it, the signal is weaker but worth a look.

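The action does a base-vs-PR comparison, so it catches new duplication introduced by the PR rather than just reporting the total count. You get a failed build when the PR increases duplication, a comparison comment on the PR, and pointers to the specific duplicated lines in the diff.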

How it’s different

The most prominent existing tool is platisd/duplicate-code-detection-tool (~205 stars). It uses gensim TF-IDF cosine similarity to score whole-file similarity. That’s great for catching files that should be merged at an architectural level.

This action is the inverse: it finds specific copy-pasted blocks with line numbers. The two tools are complementary, not competing. I recommend running both.

A few other actions wrap jscpd or PMD CPD, but none of them compare against the base branch, so you can’t tell whether a PR introduced duplication or just inherited it. That delta comparison is the whole point.

Try it

- uses: astubbs/duplicate-code-cross-check@v1
  with:
    github-token: ${{ secrets.GITHUB_TOKEN }}
    directories: src

That’s the minimal setup. See the README for full config options, or grab it from the GitHub Marketplace.
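
For context, here is roughly how that step might sit inside a complete workflow file. This is a sketch, not something taken from the README: the file name, workflow and job names, the pull_request trigger, the permissions block, and the full-history checkout are all assumptions about what a base-vs-PR check that posts comments would plausibly need. Only the two inputs (github-token and directories) come from the minimal setup above.

```yaml
# .github/workflows/duplicate-code.yml (hypothetical file name)
name: Duplicate code check

on:
  pull_request:

# Assumption: the action posts a PR comment, so the token needs write access
# to pull requests. Check the README for the permissions it actually requires.
permissions:
  contents: read
  pull-requests: write

jobs:
  duplicate-code:
    runs-on: ubuntu-latest
    steps:
      # Assumption: fetch full history so the base branch is available
      # for the base-vs-PR comparison.
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      # Inputs copied from the minimal setup above.
      - uses: astubbs/duplicate-code-cross-check@v1
        with:
          github-token: ${{ secrets.GITHUB_TOKEN }}
          directories: src
```

If the action handles its own checkout of the base branch, the fetch-depth setting is unnecessary; the README is the authority on that.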

Feedback and PRs welcome.

The tweet (draft)
Shipped a new GitHub Action: Duplicate Code Cross-Check. Agentic coding makes duplicate code a bigger problem. LLMs copy patterns instead of refactoring, and it accumulates fast. Two engines (PMD CPD + jscpd), base-vs-PR comparison, fails the build if a PR adds duplication. https://github.com/marketplace/actions/duplicate-code-cross-check