Contribution guide
claude.md file
This file provides guidance to Claude Code when working with code in this repository.
Project overview
Mail Deduplicate (mdedup) is a CLI that finds and removes duplicate mails across mail boxes. It reads maildir, mbox, MH, Babyl and MMDF boxes, groups mails into duplicate sets by a hash of selected headers, applies a user-chosen strategy to pick which copies to keep, then copies, moves or deletes the rest.
Upstream conventions
This repository uses the reusable workflows and pyproject.toml configuration from kdeldycke/repomatic and follows the conventions established there. For code style, typing, documentation, testing and design principles, the upstream claude.md is the canonical reference. This file records only what is specific to mail-deduplicate.
Contributing upstream: if you spot a gap or improvement in the reusable workflows or shared conventions, propose it at kdeldycke/repomatic.
Source of truth hierarchy
This file defines the rules; the codebase and CI are what those rules are measured against. When they disagree, fix the code to match. If a rule itself is wrong, fix this file.
Keeping this file lean
Record only conventions, rationale and non-obvious rules that cannot be discovered by reading the code. Do not paste the module tree, source snippets or general Python knowledge here: reference the source instead.
Commands
# Run the test suite (test dependencies live in the `test` group).
$ uv run --group test -- pytest
# Run a single test.
$ uv run --group test -- pytest tests/test_strategy.py::test_name
# Type-check with the CI-pinned mypy and minimum Python version.
$ uvx repomatic run mypy -- mail_deduplicate tests
# Build the documentation into the gitignored output directory.
$ uv run --group docs -- sphinx-build -b html ./docs ./docs/_build/html
# Run the CLI from the working tree.
$ uv run -- mdedup --help
Architecture
mdedup runs a four-step pipeline, orchestrated by the Deduplicate class in deduplicate.py. Each step is documented in depth in docs/design.md:
1. Load the source boxes and read their mails (mail_box.py, mail.py).
2. Hash mails into DuplicateSets keyed by a hash of selected headers (deduplicate.py).
3. Select which mails to keep within each set, via a selection strategy (strategy.py).
4. Act on the selected or discarded mails: copy, move or delete (action.py).
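The four steps above can be illustrated with a minimal, standard-library-only sketch. It is not the project's implementation: the header list, the SHA-256 digest, the "keep the first mail seen" strategy and every function name below are assumptions chosen for illustration. See docs/design.md and the source for the real behavior.

```python
import hashlib
from collections import defaultdict
from email.message import Message

# Assumption: an illustrative header selection, not the project's actual list.
HASH_HEADERS = ("Date", "From", "To", "Subject", "Message-ID")


def make_mail(headers: dict) -> Message:
    """Step 1 stand-in: build an in-memory mail instead of reading a box."""
    mail = Message()
    for name, value in headers.items():
        mail[name] = value
    return mail


def hash_key(mail: Message) -> str:
    """Step 2: derive a grouping key from the selected headers only."""
    canonical = "\n".join(f"{name}: {mail.get(name, '')}" for name in HASH_HEADERS)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def group_duplicates(mails: list) -> list:
    """Step 2 (continued): keep only groups holding more than one mail."""
    groups = defaultdict(list)
    for mail in mails:
        groups[hash_key(mail)].append(mail)
    return [group for group in groups.values() if len(group) > 1]


def select_discarded(group: list) -> list:
    """Step 3: hypothetical strategy that keeps the first mail of a set."""
    return group[1:]


def run_pipeline(mails: list) -> list:
    """Step 4 stand-in: return the mails an action would copy, move or delete."""
    discarded = []
    for group in group_duplicates(mails):
        discarded.extend(select_discarded(group))
    return discarded


base = {"From": "a@example.com", "To": "b@example.com", "Subject": "Hi",
        "Date": "Mon, 1 Jan 2024 00:00:00 +0000", "Message-ID": "<1@example.com>"}
mails = [make_mail(base), make_mail(base), make_mail({**base, "Subject": "Other"})]
discarded = run_pipeline(mails)
```

Here the first two mails share every selected header, so they fall into one duplicate set and the second copy is the only one handed to the action step. The third mail differs in its subject and is left alone.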
| Module | Responsibility |
|---|---|
| … | The … |
| mail_box.py | Box formats (…). |
| strategy.py | Selection strategies (oldest/newest, size, content, matching-path, …). |
| action.py | Actions applied to the selected or discarded mails. |
Non-obvious rules
- DedupMailMixin is mixed into each mailbox.Message subclass at runtime by make_dedup_mail(), so every box format shares one dedup implementation. Keep format-specific code in mail_box.py; keep format-agnostic dedup logic on the mixin.
- Three opt-in safeguards make destructive runs safer: a minimal-headers floor, a size threshold and a content threshold. They are described in docs/design.md; run mdedup --help for the exact option names. When changing selection or hashing, re-read those safeguards: they exist to avoid deleting mails that only look like duplicates.
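The runtime mixin technique described in the first rule can be sketched with the standard library alone. This shows the general pattern only: `SubjectKeyMixin`, `mix_into()` and `subject_key()` are hypothetical names, and the signature and behavior of the project's own make_dedup_mail() are not reproduced here.

```python
import mailbox


class SubjectKeyMixin:
    """Hypothetical stand-in for format-agnostic logic shared by all boxes."""

    def subject_key(self) -> str:
        # Relies only on the generic email API, so it works for any format.
        return (self.get("Subject") or "").strip().lower()


def mix_into(base: type) -> type:
    """Build, at runtime, a subclass of `base` that also carries the mixin."""
    return type(f"Dedup{base.__name__}", (SubjectKeyMixin, base), {})


# One shared implementation, two format-specific message classes.
DedupMaildirMessage = mix_into(mailbox.MaildirMessage)
DedupMboxMessage = mix_into(mailbox.mboxMessage)

mail = DedupMaildirMessage()
mail["Subject"] = "  Hello  "
key = mail.subject_key()
```

Because the mixin comes first in the bases, its methods take precedence, while format-specific behavior such as `MaildirMessage.get_subdir()` stays available on the resulting class.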