CLI parameters

Help screen

Options

mdedup

Deduplicate mails from multiple sources.

Process:
- Step #1: load mails from their sources.
- Step #2: compute the canonical hash of each mail based on its headers (and optionally its body), and group mails sharing the same hash.
- Step #3: apply a selection strategy on each subset of duplicate mails.
- Step #4: perform an action on all selected mails.
- Step #5: report statistics.

mdedup [OPTIONS] MAIL_SOURCE_1 MAIL_SOURCE_2 ...
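The five steps above can be summarized in a short Python sketch. It is not mdedup's implementation: mails are simplified to dictionaries of headers, SHA-256 is an arbitrary choice, and `canonical_hash`, `deduplicate`, `select` and `act` are hypothetical stand-ins for the real hashing, strategy and action machinery.

```python
from collections import defaultdict
from hashlib import sha256


def canonical_hash(headers):
    # Hypothetical step #2: lowercase header names, sort them, then hash the result.
    canonical = "\n".join(
        f"{name}: {value}"
        for name, value in sorted((k.lower(), v) for k, v in headers.items())
    )
    return sha256(canonical.encode("utf-8")).hexdigest()


def deduplicate(sources, select, act):
    # Step #1: load mails from every source (sources are plain lists here).
    mails = [mail for source in sources for mail in source]
    # Step #2: group mails sharing the same canonical hash.
    groups = defaultdict(list)
    for mail in mails:
        groups[canonical_hash(mail)].append(mail)
    # Step #3: apply the selection strategy to each group.
    selected = [mail for group in groups.values() for mail in select(group)]
    # Step #4: perform the action on every selected mail.
    for mail in selected:
        act(mail)
    # Step #5: report statistics.
    return {
        "mails": len(mails),
        "duplicate_sets": sum(1 for group in groups.values() if len(group) > 1),
        "selected": len(selected),
    }
```

Two sources holding the same mail with differently cased header names end up in a single duplicate set, because the sketch lowercases names before hashing.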

Options

-i, --input-format <input_format>

Force all provided mail sources to be parsed in the specified format. If not set, the format of each source is auto-detected independently. Auto-detection only supports the maildir and mbox formats: use this option to open other box formats, or to bypass unreliable detection.

Options:

maildir | mbox | mh | babyl | mmdf

-u, --force-unlock

When opening a mail source, remove its lock if one is found.

-h, --hash-header <Header-ID>

Header to use to compute each mail’s hash. Repeat this option to set an ordered list of headers. Header IDs are case-insensitive. Repeated entries are ignored.
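As an illustration, repeated -h values could be folded into an ordered, case-insensitive list as below. This is a sketch that assumes the first occurrence of a header wins and that names are compared lowercased; `normalize_hash_headers` is a hypothetical helper, not part of mdedup.

```python
def normalize_hash_headers(header_ids):
    """Build the ordered header list from repeated -h values (hypothetical helper)."""
    seen = set()
    ordered = []
    for header_id in header_ids:
        key = header_id.lower()  # Header IDs are case-insensitive.
        if key not in seen:  # Repeated entries are ignored; first occurrence keeps its rank.
            seen.add(key)
            ordered.append(key)
    return ordered
```

With `-h Message-ID -h Date -h message-id -h From`, this sketch yields `["message-id", "date", "from"]`.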

-m, --minimal-headers <INTEGER>

Minimum number of headers required in a mail to compute its hash. Below this value, the mail is considered to have too few headers for a reliable hash. Increase this value to be stricter and avoid hashing mails with too few headers (e.g., corrupted mails).
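A minimal sketch of how a minimum header count could guard the hash, assuming the count is taken over the headers selected with -h that are actually present in the mail. `header_hash` and `TooFewHeaders` are hypothetical names, and SHA-256 is used purely for illustration.

```python
from hashlib import sha256


class TooFewHeaders(Exception):
    """Raised when a mail lacks enough headers for a reliable hash (hypothetical)."""


def header_hash(headers, hash_headers, minimal_headers):
    # Look headers up case-insensitively and keep them in the configured order.
    lowered = {name.lower(): value for name, value in headers.items()}
    present = [(name, lowered[name]) for name in hash_headers if name in lowered]
    if len(present) < minimal_headers:
        raise TooFewHeaders(f"{len(present)} headers found, {minimal_headers} required")
    canonical = "\n".join(f"{name}: {value.strip()}" for name, value in present)
    return sha256(canonical.encode("utf-8")).hexdigest()
```

Raising the threshold makes the sketch refuse to hash a mail that only carries, say, a Message-ID, instead of grouping it with unrelated mails on that single header.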

-b, --hash-body <hash_body>

Method used to hash the body of mails. Defaults to skip, which doesn’t hash the body at all: it is the fastest method, and a header-based hash should be sufficient to determine duplicate sets. raw uses the body as-is (slow). normalized pre-processes the body before hashing by removing all line breaks and spaces (slowest).

Options:

skip | raw | normalized
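The three body-hashing methods can be sketched as follows. `body_hash` is hypothetical, SHA-256 is an arbitrary choice, and only CR, LF and space characters are stripped, which is a literal reading of "line breaks and spaces" rather than mdedup's exact normalization.

```python
from hashlib import sha256


def body_hash(body, method):
    if method == "skip":
        return None  # The body does not contribute to the hash at all.
    if method == "raw":
        data = body  # Body used as-is.
    elif method == "normalized":
        # Drop all line breaks and spaces before hashing.
        data = body.replace("\r", "").replace("\n", "").replace(" ", "")
    else:
        raise ValueError(f"unknown --hash-body method: {method}")
    return sha256(data.encode("utf-8")).hexdigest()
```

Under normalized, two copies of a mail whose bodies only differ in line wrapping produce the same hash, whereas raw tells them apart.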

-H, --hash-only

Compute and display the internal hashes used to identify duplicates. Does not perform any selection or action.

-s, --strategy <strategy>

Selection strategy to apply within a subset of duplicates. If not set, duplicates are grouped and counted but all skipped: the selection stays empty and no action is performed. A description of each strategy is available further down the help screen.

Options:

select-older | select-oldest | select-newer | select-newest | discard-newest | discard-newer | discard-oldest | discard-older | select-smaller | select-smallest | select-bigger | select-biggest | discard-biggest | discard-bigger | discard-smallest | discard-smaller | select-matching-path | select-non-matching-path | discard-non-matching-path | discard-matching-path | select-one | select-all-but-one | discard-all-but-one | discard-one
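The strategy descriptions are not part of this section, so the following sketch of the time-based strategies rests on assumptions drawn from their names: select-oldest keeps every mail sharing the smallest timestamp, select-older keeps every mail strictly older than the newest one, and each discard-X keeps the complement of select-X within the duplicate set. `Mail`, `select_oldest`, `select_newest`, `select_older` and `discard` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mail:
    path: str
    timestamp: float


def select_oldest(duplicates):
    # Keep every mail sharing the smallest timestamp.
    oldest = min(m.timestamp for m in duplicates)
    return [m for m in duplicates if m.timestamp == oldest]


def select_newest(duplicates):
    # Keep every mail sharing the largest timestamp.
    newest = max(m.timestamp for m in duplicates)
    return [m for m in duplicates if m.timestamp == newest]


def select_older(duplicates):
    # Keep every mail strictly older than the newest one(s).
    newest = max(m.timestamp for m in duplicates)
    return [m for m in duplicates if m.timestamp < newest]


def discard(select_strategy, duplicates):
    # Assumption: discard-X keeps what select-X would have left out.
    kept = select_strategy(duplicates)
    return [m for m in duplicates if m not in kept]
```

Under these assumptions, discard-newest and select-older yield the same selection, which is why the list above contains pairs of strategies that look redundant.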

-t, --time-source <time_source>

Source of a mail’s time reference used in time-sensitive strategies.

Options:

date-header | ctime
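The two time sources can be sketched with the standard library: the Date header is parsed with `email.utils.parsedate_to_datetime`, and the ctime comes from `os.stat`. `mail_timestamp` is a hypothetical helper, and raising an error on a missing Date header is an assumed behavior.

```python
import os
from email import message_from_string
from email.message import Message
from email.utils import parsedate_to_datetime


def mail_timestamp(message: Message, path: str, time_source: str) -> float:
    if time_source == "date-header":
        date = message["Date"]
        if date is None:
            raise ValueError("mail has no Date header")
        return parsedate_to_datetime(date).timestamp()
    if time_source == "ctime":
        # ctime of the file holding the mail, as reported by os.stat().
        return os.stat(path).st_ctime
    raise ValueError(f"unknown --time-source: {time_source}")


msg = message_from_string("Date: Thu, 01 Jan 1970 00:00:10 +0000\n\nHello\n")
print(mail_timestamp(msg, "", "date-header"))  # 10.0
```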

-r, --regexp <REGEXP>

Regular expression on a mail’s file path. For folder-based boxes (maildir, mh), it applies to each mail’s individual location. For file-based boxes (mbox, babyl, mmdf), it applies to the whole box’s path instead, since all mails are packed into a single file. Required by the discard-matching-path, discard-non-matching-path, select-matching-path and select-non-matching-path strategies.
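A sketch of which path the regular expression is tested against, following the folder-based versus file-based rule above. It assumes the pattern may match anywhere in the path (`re.search`); `MailLocation`, `matched_path`, `select_matching_path` and `select_non_matching_path` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

FOLDER_BASED = {"maildir", "mh"}
FILE_BASED = {"mbox", "babyl", "mmdf"}


class MailLocation(NamedTuple):
    box_format: str
    box_path: str
    mail_path: Optional[str] = None  # Only meaningful for folder-based boxes.


def matched_path(location):
    if location.box_format in FOLDER_BASED:
        return location.mail_path  # Each mail has its own file.
    if location.box_format in FILE_BASED:
        return location.box_path  # All mails share the box's single file.
    raise ValueError(f"unknown box format: {location.box_format}")


def select_matching_path(mails, regexp):
    pattern = re.compile(regexp)
    return [m for m in mails if pattern.search(matched_path(m))]


def select_non_matching_path(mails, regexp):
    pattern = re.compile(regexp)
    return [m for m in mails if not pattern.search(matched_path(m))]
```

One consequence: with an mbox source, every mail of the box either matches or does not, so path-based strategies cannot tell apart duplicates stored in the same file.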

-S, --size-threshold <BYTES>

Maximum difference allowed in size between mails sharing the same hash. The whole subset of duplicates is skipped if at least one pair of mails exceeds the threshold. Set to 0 to enforce strictness and apply the selection strategy on the subset only if all mails have exactly the same size. Set to -1 to allow any difference and apply the strategy regardless of the differences.
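The pairwise rule can be written literally, as in the sketch below, with sizes given as byte counts. `within_size_threshold` is a hypothetical helper. Note that "no pair exceeds the threshold" is equivalent to "the largest size minus the smallest size stays within the threshold", so the quadratic pairwise check could be replaced by a single `max - min` comparison.

```python
from itertools import combinations


def within_size_threshold(sizes, threshold):
    if threshold == -1:
        return True  # Any difference is allowed.
    # Literal reading of the option: every pair must stay within the threshold.
    return all(abs(a - b) <= threshold for a, b in combinations(sizes, 2))
```

For sizes of 100, 150 and 120 bytes, a threshold of 50 keeps the subset while 49 skips it.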

-C, --content-threshold <BYTES>

Maximum difference allowed in content between mails sharing the same hash. The whole subset of duplicates is skipped if at least one pair of mails exceeds the threshold. Set to 0 to enforce strictness and apply the selection strategy on the subset only if all mails are exactly the same. Set to -1 to allow any difference and apply the strategy regardless of the differences.

-d, --show-diff

Show the unified diff of duplicates not within thresholds.
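A sketch of the content threshold, assuming the difference between two mails is measured as the size in bytes of their unified diff, which is also the kind of text -d would display. `content_difference` and `within_content_threshold` are hypothetical helpers built on `difflib.unified_diff`.

```python
from difflib import unified_diff
from itertools import combinations


def content_difference(a, b):
    # Size in bytes of the unified diff between two mail contents (0 if identical).
    diff = "".join(unified_diff(a.splitlines(keepends=True), b.splitlines(keepends=True)))
    return len(diff.encode("utf-8"))


def within_content_threshold(contents, threshold):
    if threshold == -1:
        return True  # Any difference is allowed.
    # Unlike sizes, diffs do not reduce to max - min, so every pair is compared.
    return all(content_difference(a, b) <= threshold for a, b in combinations(contents, 2))
```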

-a, --action <action>

Action performed on the selected mails. Defaults to copy-selected as it is the safest: it only reads the mail sources and creates a brand-new mail box with the selection results.

Options:

copy-selected | copy-discarded | move-selected | move-discarded | delete-selected | delete-discarded
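Each action name combines an operation (copy, move, delete) with the set it applies to (selected or discarded mails). The sketch below decodes that naming scheme; `parse_action`, `mails_to_process` and `modifies_sources` are hypothetical helpers, and treating move as modifying the sources is an assumption based on the word "move".

```python
OPERATIONS = {"copy", "move", "delete"}
TARGETS = {"selected", "discarded"}


def parse_action(action):
    operation, _, target = action.partition("-")
    if operation not in OPERATIONS or target not in TARGETS:
        raise ValueError(f"unknown action: {action}")
    return operation, target


def mails_to_process(action, selected, discarded):
    # The second half of the action name picks which set is acted upon.
    _, target = parse_action(action)
    return selected if target == "selected" else discarded


def modifies_sources(action):
    # copy only reads the sources; move and delete remove mails from them.
    operation, _ = parse_action(action)
    return operation != "copy"
```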

-E, --export <MAIL_BOX_PATH>

Location of the destination mail box to which deduplicated mails are copied or moved. Required by the copy-selected, copy-discarded, move-selected and move-discarded actions.

-e, --export-format <export_format>

Format of the mail box to which deduplicated mails are exported. Only affects the copy-selected, copy-discarded, move-selected and move-discarded actions.

Options:

maildir | mbox | mh | babyl | mmdf

--export-append

If the destination mail box already exists, add mails to it instead of interrupting (interrupting is the default behavior). Affects the copy-selected, copy-discarded, move-selected and move-discarded actions.

-n, --dry-run

Simulate the actions without performing them, and report which actions would have been performed.
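The export-related rules (--export required for copy and move, interruption on an existing destination unless --export-append is set) and the dry-run behavior can be sketched together. `check_export` and `run_action` are hypothetical helpers; the `perform` callback stands in for the real copy, move or delete operation.

```python
import os


def check_export(action, export_path, export_append):
    operation = action.partition("-")[0]
    if operation not in {"copy", "move"}:
        return  # delete-* actions do not export anything.
    if export_path is None:
        raise ValueError(f"--export is required by {action}")
    if os.path.exists(export_path) and not export_append:
        raise FileExistsError(f"{export_path} already exists; use --export-append to add to it")


def run_action(action, mails, perform, dry_run=False):
    operation = action.partition("-")[0]
    report = []
    for mail in mails:
        if dry_run:
            report.append(f"would {operation} {mail}")  # Nothing is touched.
        else:
            perform(mail)
            report.append(f"{operation} {mail}")
    return report
```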

--time, --no-time

Measure and print elapsed execution time.

--config <CONFIG_PATH>

Location of the configuration file. Supports a local path with glob patterns or a remote URL.

--no-config

Ignore all configuration files and only use command line parameters and environment variables.

--validate-config <validate_config>

Validate the configuration file and exit.

--accessible

Accessibility mode: disable colors and render tables in a plain, screen-reader-friendly format.

--color <color>

Colorize the output. A bare --color is the same as --color=always.

Options:

auto | always | never

--no-color

Disable colorization (alias of --color=never).

--progress, --no-progress

Show progress indicators during long operations. Disabled for non-interactive output (pipes, dumb terminals, CI) and by --accessible.
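The color and progress rules above can be condensed into two predicates. This is a sketch under assumptions: "auto" is taken to mean "colorize only when writing to a terminal", --no-color and --accessible are assumed to win over --color=always, and the dumb-terminal and CI checks are reduced to plain arguments. `use_color` and `show_progress` are hypothetical names.

```python
def use_color(color, no_color=False, accessible=False, is_tty=True):
    # A bare --color is assumed to reach this point as "always".
    if no_color or accessible:
        return False  # --no-color is an alias of --color=never; --accessible disables colors.
    if color == "always":
        return True
    if color == "never":
        return False
    return is_tty  # "auto": colorize interactive output only (assumption).


def show_progress(progress, accessible=False, is_tty=True, term="xterm", in_ci=False):
    if not progress or accessible:
        return False
    # Disabled for non-interactive output: pipes, dumb terminals, CI.
    return is_tty and term != "dumb" and not in_ci
```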

--theme <theme>

Color theme used for help screens.

--show-params

Show all CLI parameters, with their provenance, defaults and values, then exit.

--table-format <table_format>

Rendering style of tables.

Options:

aligned | asciidoc | colon-grid | csv | csv-excel | csv-excel-tab | csv-unix | double-grid | double-outline | fancy-grid | fancy-outline | github | grid | heavy-grid | heavy-outline | hjson | html | jira | json | json5 | jsonc | latex | latex-booktabs | latex-longtable | latex-raw | mediawiki | mixed-grid | mixed-outline | moinmoin | orgtbl | outline | pipe | plain | presto | pretty | psql | rounded-grid | rounded-outline | rst | simple | simple-grid | simple-outline | textile | toml | tsv | unsafehtml | vertical | xml | yaml | youtrack

--verbosity <LEVEL>

Verbosity level: one of CRITICAL, ERROR, WARNING, INFO or DEBUG.

Options:

CRITICAL | ERROR | WARNING | INFO | DEBUG

-v, --verbose

-q, --quiet

--man

Show the command’s man page (roff) and exit.

--version

Show the version and exit.

--jobs <jobs>

Number of parallel jobs used to hash mails (step #2). Accepts an integer, ‘auto’ (one fewer than the host’s logical CPUs) or ‘max’. Defaults to 1 (sequential); higher values speed up --hash-body raw and normalized on large boxes.

Default:

1
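A sketch of how the --jobs value could be resolved and used for parallel hashing. Assumptions: ‘max’ means all logical CPUs, ‘auto’ never goes below one job (on a single-CPU host, "one fewer" would otherwise give zero), and threads are used for simplicity even though mdedup may parallelize differently. `resolve_jobs` and `hash_all` are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from hashlib import sha256


def resolve_jobs(value):
    cpus = os.cpu_count() or 1  # os.cpu_count() may return None.
    if value == "auto":
        return max(1, cpus - 1)  # One fewer than the logical CPUs, but at least one.
    if value == "max":
        return cpus
    jobs = int(value)
    if jobs < 1:
        raise ValueError("--jobs must be at least 1")
    return jobs


def hash_all(bodies, jobs):
    # Hash every body with up to `jobs` worker threads; results keep input order.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda body: sha256(body).hexdigest(), bodies))
```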

Arguments

MAIL_SOURCE_1 MAIL_SOURCE_2 ...

Optional argument(s)

Mail sources to deduplicate. Can be a single mail box or a list of mails.

Environment variables

MDEDUP_INPUT_FORMAT

Provide a default for -i

MDEDUP_FORCE_UNLOCK

Provide a default for -u

MDEDUP_HASH_HEADER

Provide a default for -h

MDEDUP_MINIMAL_HEADERS

Provide a default for -m

MDEDUP_HASH_BODY

Provide a default for -b

MDEDUP_HASH_ONLY

Provide a default for -H

MDEDUP_STRATEGY

Provide a default for -s

MDEDUP_TIME_SOURCE

Provide a default for -t

MDEDUP_REGEXP

Provide a default for -r

MDEDUP_SIZE_THRESHOLD

Provide a default for -S

MDEDUP_CONTENT_THRESHOLD

Provide a default for -C

MDEDUP_SHOW_DIFF

Provide a default for -d

MDEDUP_ACTION

Provide a default for -a

MDEDUP_EXPORT

Provide a default for -E

MDEDUP_EXPORT_FORMAT

Provide a default for -e

MDEDUP_EXPORT_APPEND

Provide a default for --export-append

MDEDUP_DRY_RUN

Provide a default for -n

MDEDUP_TIME

Provide a default for --time

MDEDUP_CONFIG

Provide a default for --config

MDEDUP_CONFIG

Provide a default for --no-config

MDEDUP_VALIDATE_CONFIG

Provide a default for --validate-config

MDEDUP_ACCESSIBLE

Provide a default for --accessible

MDEDUP_COLOR

Provide a default for --color

MDEDUP_NO_COLOR

Provide a default for --no-color

MDEDUP_PROGRESS

Provide a default for --progress

MDEDUP_THEME

Provide a default for --theme

MDEDUP_SHOW_PARAMS

Provide a default for --show-params

MDEDUP_TABLE_FORMAT

Provide a default for --table-format

MDEDUP_VERBOSITY

Provide a default for --verbosity

MDEDUP_VERBOSE

Provide a default for --verbose

MDEDUP_QUIET

Provide a default for --quiet

MDEDUP_MAN

Provide a default for --man

MDEDUP_VERSION

Provide a default for --version

MDEDUP_JOBS

Provide a default for --jobs
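The variable names above follow a visible pattern: MDEDUP_ followed by the option's long name, uppercased, with dashes turned into underscores (so -i maps through --input-format to MDEDUP_INPUT_FORMAT). The sketch below encodes that pattern; --no-config is an exception, since it reads MDEDUP_CONFIG. `env_var_name` and `default_from_env` are hypothetical helpers, and they return raw strings because how mdedup parses booleans or integers from the environment is not described here.

```python
import os


def env_var_name(long_option, prefix="MDEDUP"):
    # "--export-append" -> "MDEDUP_EXPORT_APPEND" (does not cover --no-config).
    return f"{prefix}_{long_option.lstrip('-').replace('-', '_').upper()}"


def default_from_env(long_option, fallback=None, environ=os.environ):
    # Return the raw string from the environment, or the fallback if unset.
    return environ.get(env_var_name(long_option), fallback)
```

For example, running with `MDEDUP_JOBS=auto` set in the environment would make this sketch return "auto" as the default for --jobs.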