mail_deduplicate package

Expose package-wide elements.

Submodules

mail_deduplicate.action module

mail_deduplicate.action.copy_mails(dedup, mails)

Copy provided mails to a brand new box or an existing one.

Return type:

None

mail_deduplicate.action.move_mails(dedup, mails)

Move provided mails to a brand new box or an existing one.

Return type:

None

mail_deduplicate.action.delete_mails(dedup, mails)

Remove provided mails in-place from their original boxes.

Return type:

None

mail_deduplicate.action.copy_selected(dedup)

Copy all selected mails to a brand new box.

Return type:

None

mail_deduplicate.action.copy_discarded(dedup)

Copy all discarded mails to a brand new box.

Return type:

None

mail_deduplicate.action.move_selected(dedup)

Move all selected mails to a brand new box.

Return type:

None

mail_deduplicate.action.move_discarded(dedup)

Move all discarded mails to a brand new box.

Return type:

None

mail_deduplicate.action.delete_selected(dedup)

Remove all selected mails in-place from their original boxes.

Return type:

None

mail_deduplicate.action.delete_discarded(dedup)

Remove all discarded mails in-place from their original boxes.

Return type:

None

class mail_deduplicate.action.Action(*values)

Bases: Enum

Define all available action IDs.

COPY_SELECTED = 'copy-selected'
COPY_DISCARDED = 'copy-discarded'
MOVE_SELECTED = 'move-selected'
MOVE_DISCARDED = 'move-discarded'
DELETE_SELECTED = 'delete-selected'
DELETE_DISCARDED = 'delete-discarded'
property action_function: Callable

Return the action function associated with this action.

perform_action(dedup)

Performs the action on selected mail candidates.

Return type:

None
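The action_function property ties each Action member to one of the module-level functions above. Below is a minimal sketch of that dispatch. It assumes the function name is derived from the member value by swapping dashes for underscores, which this page does not confirm. The two stub functions and the performed list are hypothetical.

```python
from enum import Enum
from typing import Callable

performed: list[str] = []  # Hypothetical log recording which stub ran.


def copy_selected(dedup) -> None:
    """Stub standing in for mail_deduplicate.action.copy_selected."""
    performed.append("copy_selected")


def delete_discarded(dedup) -> None:
    """Stub standing in for mail_deduplicate.action.delete_discarded."""
    performed.append("delete_discarded")


class Action(Enum):
    COPY_SELECTED = "copy-selected"
    DELETE_DISCARDED = "delete-discarded"

    @property
    def action_function(self) -> Callable:
        # Assumed naming rule: "copy-selected" -> copy_selected().
        return globals()[self.value.replace("-", "_")]

    def perform_action(self, dedup) -> None:
        self.action_function(dedup)


# Look up the member from its CLI-style value, then run it.
Action("delete-discarded").perform_action(dedup=None)
```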

mail_deduplicate.cli module

mail_deduplicate.cli.DEFAULT_HASH_HEADERS: tuple[str, ...] = ('Date', 'From', 'To', 'Subject', 'MIME-Version', 'Content-Type', 'Content-Disposition', 'User-Agent', 'X-Priority', 'Message-ID')

Default ordered list of headers to use to compute the unique hash of a mail.

By default we choose to exclude:

CC

Mailman apparently sometimes trims list members from the CC header to avoid sending duplicates. This means that copies of a mail reflected back from the list server will have a different CC from the copy saved by the MUA at send-time.

BCC

Copies of the mail saved by the MUA at send-time will have a BCC header, but copies reflected back from the list server won’t.

Reply-To

A mail could be CC’d to two lists with different Reply-To munging options set.
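The sketch below shows why leaving CC out of the default list matters. It hashes only the listed headers of two copies of the same mail, one of which has a trimmed CC. The header_hash() helper and its use of SHA-256 over "name: value" lines are assumptions for illustration. This page does not describe the package's own canonicalization or hash algorithm.

```python
import hashlib
from email import message_from_string

DEFAULT_HASH_HEADERS = (
    "Date", "From", "To", "Subject", "MIME-Version", "Content-Type",
    "Content-Disposition", "User-Agent", "X-Priority", "Message-ID",
)


def header_hash(raw_mail: str, header_ids=DEFAULT_HASH_HEADERS) -> str:
    """Hypothetical helper: hash only the listed headers, in the listed order."""
    msg = message_from_string(raw_mail)
    parts = []
    for header_id in header_ids:
        # get_all() returns every occurrence; missing headers contribute nothing.
        for value in msg.get_all(header_id, []):
            parts.append(f"{header_id.lower()}: {value.strip()}")
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()


sent_copy = (
    "From: alice@example.com\n"
    "To: list@example.com\n"
    "Cc: bob@example.com, carol@example.com\n"
    "Subject: Meeting\n"
    "Message-ID: <1@example.com>\n"
    "\n"
    "Hi all\n"
)
# The list server trimmed Bob from the CC of the copy it reflected back.
list_copy = sent_copy.replace("bob@example.com, ", "")
```

With the default headers, both copies hash identically. Adding "Cc" to the list makes them diverge.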

class mail_deduplicate.cli.Config

Bases: TypedDict

Holds global configuration.

input_format: BoxFormat | None
force_unlock: bool
hash_headers: tuple[str, ...]
hash_body: BodyHasher
hash_only: bool
size_threshold: int
content_threshold: int
show_diff: bool
strategy: Strategy
time_source: TimeSource
regexp: re.Pattern | None
action: Action
export: Path | None
export_format: BoxFormat
export_append: bool
dry_run: bool

mail_deduplicate.cli.normalize_headers(ctx, param, value)

Validate headers provided as parameters to the CLI.

Headers are case-insensitive in Python’s implementation, so we normalize them to lower-case.

We then deduplicate them, while preserving order.

Mail headers are expected to be composed of ASCII characters between 33 and 126 (both inclusive) according to RFC-5322.

Return type:

tuple[str, ...]
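The three validation steps above can be sketched as a plain function. This is a hypothetical stand-in. The real normalize_headers() is a Click callback taking (ctx, param, value), and this page does not document how it reports errors, so the sketch simply raises ValueError.

```python
def normalize_header_ids(values):
    """Hypothetical re-implementation of the three steps described above."""
    lowered = [value.lower() for value in values]
    for header_id in lowered:
        # Every character must be printable ASCII, codes 33 to 126 inclusive.
        if not header_id or not all(33 <= ord(char) <= 126 for char in header_id):
            raise ValueError(f"Invalid header name: {header_id!r}")
    # dict.fromkeys() drops repeats while keeping first-seen order.
    return tuple(dict.fromkeys(lowered))
```

For example, ["From", "to", "FROM", "Subject"] becomes ("from", "to", "subject"). A name containing a space (code 32) is rejected.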

mail_deduplicate.cli.compile_regexp(ctx, param, value)

Validate and compile the regular expression provided as a parameter to the CLI.

Return type:

Pattern[str] | None

class mail_deduplicate.cli.MdedupCommand(*args, version=None, extra_option_at_end=True, populate_auto_envvars=True, **kwargs)

Bases: ExtraCommand

List of extra parameters:

Parameters:
  • version (str | None) – allows a version string to be set directly on the command. Will be passed to the first instance of ExtraVersionOption parameter attached to the command.

  • extra_option_at_end (bool) – reorders all parameters attached to the command, by moving all instances of ExtraOption at the end of the parameter list. The original order of the options is preserved among themselves.

  • populate_auto_envvars (bool) – forces all parameters to have their auto-generated environment variables registered. This addresses a shortcoming of Click, which only evaluates them dynamically. By forcing their registration, the auto-generated environment variables get displayed in the help screen, fixing the click#2483 issue. On Windows, environment variable names are case-insensitive, so we normalize them to uppercase.

By default, these Click context settings are applied:

Additionally, these Cloup context settings are set:

Click Extra also adds its own context_settings:

  • show_choices = None (Click Extra feature)

    If set to True or False, forces that value on all options, so we can globally show or hide choices when prompting a user for input. Only makes sense for options whose prompt property is set.

    Defaults to None, which leaves all options untouched and lets them decide their own show_choices setting.

  • show_envvar = None (Click Extra feature)

    If set to True or False, will force that value on all options, so we can globally enable or disable the display of environment variables in help screen.

    Defaults to None, which leaves all options untouched and lets them decide their own show_envvar setting. The rationale is that environment variables are already discoverable through the --show-params option, which is active by default on extra commands, so there is no need to clutter the help screen.

    This addresses the click#2313 issue.

To override these defaults, you can pass your own settings with the context_settings parameter:

@command(
    context_settings={
        "show_default": False,
        ...
    }
)
format_help(ctx, formatter)

Extend the help screen with the description of all available strategies.

Return type:

None

mail_deduplicate.deduplicate module

mail_deduplicate.deduplicate.STATS_DEF = {'mail_copied': 'Number of mails copied from their original mailbox to another.', 'mail_deleted': 'Number of mails deleted from their mailbox in-place.', 'mail_discarded': 'Number of mails discarded from the final selection.', 'mail_duplicates': 'Number of duplicate mails (sum of mails in all duplicate sets with at least 2 mails).', 'mail_found': 'Total number of mails encountered from all mail sources.', 'mail_hashes': 'Number of unique hashes.', 'mail_moved': 'Number of mails moved from their original mailbox to another.', 'mail_rejected': 'Number of mails rejected individually because they were unparsable or did not have enough metadata to compute hashes.', 'mail_retained': 'Number of valid mails parsed and retained for deduplication.', 'mail_selected': 'Number of mails kept in the final selection on which the action will be performed.', 'mail_skipped': 'Number of mails ignored in the selection step because the whole set they belong to was skipped.', 'mail_unique': 'Number of unique mails (which where automatically added to selection).', 'set_deduplicated': 'Number of valid sets on which the selection strategy was successfully applied.', 'set_single': 'Total number of sets containing only a single mail with no applicable strategy. They were automatically kept in the final selection.', 'set_skipped_content': 'Number of sets skipped from the selection process because they were too dissimilar in content.', 'set_skipped_encoding': 'Number of sets skipped from the selection process because they had encoding issues.', 'set_skipped_size': 'Number of sets skipped from the selection process because they were too dissimilar in size.', 'set_skipped_strategy': 'Number of sets skipped from the selection process because the strategy could not be applied.', 'set_total': 'Total number of duplicate sets.'}

All tracked statistics and their definitions.

exception mail_deduplicate.deduplicate.SizeDiffAboveThreshold

Bases: Exception

Difference in mail size is greater than the threshold.

exception mail_deduplicate.deduplicate.ContentDiffAboveThreshold

Bases: Exception

Difference in mail content is greater than the threshold.

class mail_deduplicate.deduplicate.BodyHasher(*values)

Bases: Enum

Enumeration of available body hashing methods.

SKIP = 'skip'
RAW = 'raw'
NORMALIZED = 'normalized'
hash_function()

Returns the hashing function corresponding to the body hasher.

class mail_deduplicate.deduplicate.DuplicateSet(hash_key, mail_set, conf)

Bases: object

A set of mails sharing the same hash.

Implements all the safety checks required before we can apply any selection strategy.

Load up the duplicate set of mails and freeze the pool.

Once loaded, the pool of parsed mails is considered frozen for the rest of the duplicate set’s life. This allows aggressive caching of lazy instance attributes that depend on the pool content.

selection: set[Message]

Mails selected after application of selection strategy.

discard: set[Message]

Mails discarded after application of selection strategy.

conf

Configuration shared from the main deduplication process.

pool: frozenset[DedupMailMixin]

Pool referencing all duplicated mails and their attributes.

stats: Counter

Set metrics.

property size: int

Returns the number of mails in the duplicate set.

property newest_timestamp

Returns the newest timestamp among all mails in the set.

property oldest_timestamp

Returns the oldest timestamp among all mails in the set.

property biggest_size

Returns the biggest size among all mails in the set.

property smallest_size

Returns the smallest size among all mails in the set.

check_differences()

Ensures all mails differ within the limits imposed by the size and content thresholds.

Compare all mails of the duplicate set with each other, both in size and content. Raise an error if we’re not within the limits imposed by the threshold settings.
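Below is a minimal sketch of the pairwise size check. It assumes size_threshold is expressed in bytes and that only a difference strictly above it counts as an error. The function name is hypothetical. SizeDiffAboveThreshold is redefined locally so the block runs on its own.

```python
from itertools import combinations


class SizeDiffAboveThreshold(Exception):
    """Local stand-in for the exception documented above."""


def check_size_differences(sizes, size_threshold):
    """Raise if any two mails differ in size by more than size_threshold bytes."""
    for size_a, size_b in combinations(sizes, 2):
        if abs(size_a - size_b) > size_threshold:
            raise SizeDiffAboveThreshold(
                f"{size_a} and {size_b} bytes differ by more than {size_threshold}"
            )
```

For sizes alone, the largest pairwise gap is simply biggest_size minus smallest_size, so a single subtraction would give the same answer. The pairwise loop is the shape that carries over to content differences, where no such shortcut exists.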

diff(mail_a, mail_b)

Return difference in bytes between two mails’ normalized body.

Todo

Rewrite the diff algorithm to not rely on naive unified diff result parsing.
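The Todo above refers to parsing a unified diff. Here is a sketch of that naive approach using the standard library's difflib, counting the bytes of added and removed lines. The name naive_diff_bytes() is hypothetical, and the byte-counting rule is an assumption about what "difference in bytes" means.

```python
import difflib


def naive_diff_bytes(lines_a, lines_b):
    """Count bytes on added or removed lines of a unified diff (hypothetical)."""
    diff = list(difflib.unified_diff(lines_a, lines_b, lineterm=""))
    changed = 0
    # Drop the "---" and "+++" file header lines by position; "@@" hunk
    # markers and " " context lines are skipped by the prefix test below.
    for line in diff[2:]:
        if line[:1] in ("+", "-"):
            changed += len(line[1:].encode("utf-8"))
    return changed
```

Skipping the two file header lines by position, rather than by their "---"/"+++" prefix, prevents a removed "-- " signature separator from being mistaken for a header. This is exactly the kind of pitfall that makes parsing diff output fragile.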

pretty_diff(mail_a, mail_b)

Returns a verbose unified diff between two mails’ normalized body.

categorize_candidates()

Process the list of duplicates for action.

Run preliminary checks, then apply the strategy to the pool of mails.

The process results in two subsets of mails: the selected and the discarded.

class mail_deduplicate.deduplicate.Deduplicate(conf)

Bases: object

Load-up messages, search for duplicates, apply selection strategy and perform the action.

Similar messages sharing the same hash are grouped together in a DuplicateSet.

sources: dict[str, Mailbox]

Index of mail sources by their full, normalized path, so we can refer to them in Mail instances. This also has the nice side effect of naturally deduplicating the sources themselves.

mails: dict[str, set[Message]]

All mails grouped by hashes.

selection: set[Message]

Mails selected after application of selection strategy.

discard: set[Message]

Mails discarded after application of selection strategy.

conf

Configuration shared across the deduplication process.

stats: Counter

Deduplication statistics.

add_source(source_path)

Registers a source of mails, then validates and opens it.

Duplicate sources of mails are not allowed: when we perform the action, we use the path as a unique key to tie a mail back to its source.

Return type:

None

hash_all()

Browse all mails from all registered sources, compute hashes and group mails by hash.

Displays a progress bar as the operation might be slow.

build_sets()

Build the selected and discarded sets from each duplicate set.

We apply the selection strategy one duplicate set at a time to keep memory footprint low and make the log easier to read.

close_all()

Close all open boxes.

report()

Returns a text report of user-friendly statistics and metrics.

assert_stats(first, operator, second)

Render failed stats assertions in plain English.

Hint

If inconsistent metrics are detected, the CLI exits with code 115.

This code was arbitrarily chosen in PR #842 to make it unlikely to conflict with other exit codes. Users can rely on 115 meaning that the statistics checks failed.

check_stats()

Perform some high-level consistency checks on metrics.

Helps users report tricky edge cases.

mail_deduplicate.mail module

exception mail_deduplicate.mail.TooFewHeaders

Bases: Exception

Not enough headers were found to produce a solid hash.

class mail_deduplicate.mail.TimeSource(*values)

Bases: Enum

Enumeration of all supported mail timestamp sources.

DATE_HEADER = 'date-header'

Timestamp sourced from the message’s Date header.

CTIME = 'ctime'

Timestamp sourced from the ctime of the email’s file on the filesystem.

Attention

Only available for maildir sources.

mail_deduplicate.mail.ADDRESS_HEADERS = frozenset({'bcc', 'cc', 'delivered-to', 'disposition-notification-to', 'envelope-to', 'from', 'original-recipient', 'reply-to', 'resent-bcc', 'resent-cc', 'resent-from', 'resent-reply-to', 'resent-sender', 'resent-to', 'return-path', 'sender', 'to', 'x-envelope-from', 'x-envelope-to', 'x-original-to'})

Headers that contain email addresses.

Hint

Headers from which quotes should be discarded.

E.g.:

"Bob" <bob@example.com>

should hash to the same thing as:

Bob <bob@example.com>

Attention

These IDs should be kept lower-case, because they are compared to those provided to the -h/--hash-header option, which are carried by the hash_headers property of the configuration.
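Here is a sketch of the quote-stripping rule. It assumes quotes are simply removed from address headers and whitespace is then collapsed. The name normalize_value() is hypothetical, and ADDRESS_HEADERS is cut down to a few entries for brevity. Following the note above, the header ID is lower-cased before the lookup.

```python
ADDRESS_HEADERS = frozenset({"from", "to", "cc", "reply-to", "sender"})  # Subset.


def normalize_value(header_id: str, value: str) -> str:
    """Hypothetical normalization: drop quotes in address headers, collapse spaces."""
    if header_id.lower() in ADDRESS_HEADERS:
        value = value.replace('"', "")
    return " ".join(value.split())
```

With this rule, '"Bob" <bob@example.com>' and 'Bob <bob@example.com>' both normalize to 'Bob <bob@example.com>'. Quotes in a Subject header are left alone.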

mail_deduplicate.mail.MINIMAL_HEADERS_COUNT = 4

Below this value, we consider not having enough headers to compute a solid hash.

class mail_deduplicate.mail.DedupMailMixin(message=None)

Bases: Message

Message with deduplication-specific properties and utilities.

Extends standard library’s mailbox.Message, and shouldn’t be used directly, but composed with mailbox.Message sub-classes.

Initialize a Message instance.

source_path: str | None

Normalized path to the mailbox this message originates from.

mail_id: str | None

Mail ID used to uniquely refer to it in the context of its source.

path: str

Real filesystem location of the mail.

Returns the individual mail’s file for folder-based box types (maildir & co.), but returns the whole box path for file-based boxes (mbox & co.). Only used by regexp-based selection strategies.

conf: Config

Global configuration.

add_box_metadata(box, mail_id)

Post-instantiation utility to attach to the mail some metadata derived from its parent box.

Called right after the __init__() constructor.

This allows the mail to carry its own information on its origin box and index.

Return type:

None

property uid: tuple[str | None, str | None]

Unique ID of the mail.

property parsed_date: float | None

Parse the mail’s Date header into a float timestamp.

Returns None if the mail has no valid date header.
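Here is one way to turn a Date header into a float timestamp with the standard library, returning None on missing or unparsable values. The name parse_date_header() is hypothetical. This page does not state whether parsed_date itself relies on email.utils.parsedate_to_datetime().

```python
from email.utils import parsedate_to_datetime


def parse_date_header(value):
    """Return the Date header as a float POSIX timestamp, or None if unusable."""
    if not value:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Pythons raise TypeError on garbage input, newer ones ValueError.
        return None
    # A "-0000" zone yields a naive datetime, which timestamp() reads as local time.
    return parsed.timestamp()
```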

property timestamp: float | None

Compute the normalized canonical timestamp of the mail.

Sourced from the message’s Date header by default. For maildir sources, it can instead be sourced from the ctime of the email’s file on the filesystem.

Warning

ctime does not refer to creation time on POSIX systems, but rather the last time the inode data changed.

Todo

Investigate what mailbox.MaildirMessage.get_date() does and if we can use it.

property size: int

Returns canonical mail size.

Size is computed as the length of the message body, i.e. the payload of the mail stripped of all its headers, not from the mail file persisting on the filesystem.

Todo

Allow customization of the way the size is computed, for example by getting the file size instead: size = os.path.getsize(mail_file)

property body_lines: list[str]

Return a normalized list of lines from the message’s body.

property subject: str

Normalized subject.

Only used for debugging and human-friendly logging.

hash_key()

Returns the canonical hash of a mail.

Caution

This method hasn’t been made explicitly into a cached property in order to reduce the overall memory footprint.

Return type:

str

property hash_raw_body: str

Returns the canonical body hash of a mail.

property hash_normalized_body: str

Returns the normalized body hash of a mail.

property canonical_headers: tuple[tuple[str, str], ...]

Returns the full list of all canonical header names and values in preparation for hashing.

pretty_canonical_headers()

Renders a table of header names and values used to produce the mail’s hash.

Caution

This method hasn’t been made explicitly into a cached property in order to reduce the overall memory footprint.

Returns a string ready to be printed.

Return type:

str

serialized_headers()

Serialize the canonical headers into a single string ready to be hashed.

At this point we should have at least the absolute minimum number of headers.

Caution

This method hasn’t been made explicitly into a cached property in order to reduce the overall memory footprint.

Return type:

bytes
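Here is a sketch of serialization guarded by MINIMAL_HEADERS_COUNT. The layout ("name: value" lines joined by newlines, encoded as UTF-8) is an assumption. The name serialize_headers() is hypothetical, and TooFewHeaders is redefined locally so the block runs on its own.

```python
MINIMAL_HEADERS_COUNT = 4


class TooFewHeaders(Exception):
    """Local stand-in for the exception documented above."""


def serialize_headers(canonical_headers):
    """Join (name, value) pairs into bytes for hashing (hypothetical layout)."""
    if len(canonical_headers) < MINIMAL_HEADERS_COUNT:
        raise TooFewHeaders(f"only {len(canonical_headers)} headers found")
    lines = (f"{name}: {value}" for name, value in canonical_headers)
    return "\n".join(lines).encode("utf-8")
```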

normalized_header_values(header_id)

Returns all normalized values of a header.

Values are cleaned-up into their canonical form.

Return type:

Iterator[str]

mail_deduplicate.mail_box module

Utilities to read and write mail boxes in various formats.

Based on Python’s standard library mailbox module.

class mail_deduplicate.mail_box.BoxStructure(*values)

Bases: Enum

Box structures can be file-based or folder-based.

FOLDER = 1
FILE = 2

class mail_deduplicate.mail_box.MaildirDedupMail(message=None)

Bases: DedupMailMixin, MaildirMessage

Extend the default message factory for Maildir with deduplication capabilities.

Initialize a MaildirMessage instance.

class mail_deduplicate.mail_box.mboxDedupMail(message=None)

Bases: DedupMailMixin, mboxMessage

Extend the default message factory for mbox with deduplication capabilities.

Initialize an mboxMMDFMessage instance.

class mail_deduplicate.mail_box.MHDedupMail(message=None)

Bases: DedupMailMixin, MHMessage

Extend the default message factory for MH with deduplication capabilities.

Initialize an MHMessage instance.

class mail_deduplicate.mail_box.BabylDedupMail(message=None)

Bases: DedupMailMixin, BabylMessage

Extend the default message factory for Babyl with deduplication capabilities.

Initialize a BabylMessage instance.

class mail_deduplicate.mail_box.MMDFDedupMail(message=None)

Bases: DedupMailMixin, MMDFMessage

Extend the default message factory for MMDF with deduplication capabilities.

Initialize an mboxMMDFMessage instance.

class mail_deduplicate.mail_box.BoxFormat(base_class, structure, message_class)

Bases: Enum

IDs of all the supported box formats and their metadata.

Each entry is associated with:

  • its original base class,

  • the structure it implements (file-based or folder-based),

  • the custom message factory class to use.

From these, we can derive the proper constructor with our own custom DedupMail factory.

Hint

This could be extended in the future to add support for other mailbox formats and sources, like Gmail accounts, IMAP servers, etc.

MAILDIR = (<class 'mailbox.Maildir'>, BoxStructure.FOLDER, <class 'mail_deduplicate.mail_box.MaildirDedupMail'>)
MBOX = (<class 'mailbox.mbox'>, BoxStructure.FILE, <class 'mail_deduplicate.mail_box.mboxDedupMail'>)
MH = (<class 'mailbox.MH'>, BoxStructure.FOLDER, <class 'mail_deduplicate.mail_box.MHDedupMail'>)
BABYL = (<class 'mailbox.Babyl'>, BoxStructure.FILE, <class 'mail_deduplicate.mail_box.BabylDedupMail'>)
MMDF = (<class 'mailbox.MMDF'>, BoxStructure.FILE, <class 'mail_deduplicate.mail_box.MMDFDedupMail'>)
property constructor

Return a constructor for this box format with our custom message factory.
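The constructor can be pictured as the base class with its factory argument pre-bound. This sketch assumes functools.partial() does the binding, which the actual property may not. A hypothetical TaggedMboxMessage class stands in for mboxDedupMail. mailbox.mbox accepts a factory callable, and iterating over the box builds every message through it.

```python
import functools
import mailbox
import os
import tempfile


class TaggedMboxMessage(mailbox.mboxMessage):
    """Hypothetical stand-in for a DedupMail factory class such as mboxDedupMail."""


def make_constructor(base_class, message_class):
    # Bind the message factory once, so every box opened through the returned
    # callable builds message_class instances when mails are read back.
    return functools.partial(base_class, factory=message_class)


with tempfile.TemporaryDirectory() as tmp:
    open_mbox = make_constructor(mailbox.mbox, TaggedMboxMessage)
    box = open_mbox(os.path.join(tmp, "demo.mbox"), create=True)
    box.add("From: alice@example.com\nSubject: hi\n\nHello\n")
    box.flush()
    messages = list(box)  # Iterating goes through the factory.
    box.close()
```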

mail_deduplicate.mail_box.FOLDER_FORMATS = (BoxFormat.MAILDIR, BoxFormat.MH)

Box formats implementing a folder-based structure.

Defined as a tuple to preserve the natural order of BoxFormat.

mail_deduplicate.mail_box.FILE_FORMATS = (BoxFormat.MBOX, BoxFormat.BABYL, BoxFormat.MMDF)

Box formats implementing a file-based structure.

Defined as a tuple to preserve the natural order of BoxFormat.

mail_deduplicate.mail_box.MAILDIR_SUBDIRS = frozenset({'cur', 'new', 'tmp'})

Set of required sub-folders defining a properly structured maildir.

mail_deduplicate.mail_box.autodetect_box_type(path)

Auto-detect the format of the mailbox located at the provided path.

Returns a box format, as a member of the BoxFormat enumeration.

If the path is a file, it is considered an mbox. Otherwise, if the provided path is a folder containing the expected sub-directories, it is parsed as a maildir.

Todo

Future finer autodetection heuristics should be implemented here. Some ideas:

  • single mail from a maildir

  • plain text mail content

  • other mailbox formats supported in Python’s standard library:

    • MH

    • Babyl

    • MMDF

Return type:

BoxFormat
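Here is a sketch of the two detection rules described above. It returns plain strings instead of BoxFormat members. Raising ValueError for unrecognized paths is an assumption, since this page does not say what the real function does in that case.

```python
import os
import tempfile

MAILDIR_SUBDIRS = frozenset({"cur", "new", "tmp"})


def autodetect_box_type(path: str) -> str:
    """Return "mbox" or "maildir"; plain strings stand in for BoxFormat members."""
    if os.path.isfile(path):
        return "mbox"
    if os.path.isdir(path) and MAILDIR_SUBDIRS.issubset(os.listdir(path)):
        return "maildir"
    # Assumption: unrecognized paths are reported as an error.
    raise ValueError(f"Unrecognized mailbox format: {path}")


# Demo on throwaway paths.
with tempfile.TemporaryDirectory() as tmp:
    mbox_path = os.path.join(tmp, "box.mbox")
    open(mbox_path, "w").close()  # An empty file is enough for this check.
    maildir_path = os.path.join(tmp, "Maildir")
    for sub in sorted(MAILDIR_SUBDIRS):
        os.makedirs(os.path.join(maildir_path, sub))
    detected = (autodetect_box_type(mbox_path), autodetect_box_type(maildir_path))
```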

mail_deduplicate.mail_box.open_box(path, box_format=None, force_unlock=False)

Open a mail box.

Returns a list of boxes, one per sub-folder. All are locked, ready for operations.

If box_format is provided, forces the opening of the box in the specified format. Else, defaults to autodetection.

Return type:

list[Mailbox]

mail_deduplicate.mail_box.lock_box(box, force_unlock)

Lock an opened box, allowing for forced unlocking.

Returns the locked box.

Return type:

Mailbox

mail_deduplicate.mail_box.open_subfolders(box, force_unlock)

Recursively browse the subfolder tree of a box.

Returns a list of opened and locked boxes, each for one subfolder.

Skips box types not supporting subfolders.

Return type:

list[Mailbox]

mail_deduplicate.mail_box.create_box(path, box_format, export_append=False)

Creates a brand new box from scratch.

Return type:

Mailbox

mail_deduplicate.strategy module

Strategy definitions.

mail_deduplicate.strategy.select_older(duplicates)

Select all older duplicates.

Discards the newest, i.e. the subset sharing the most recent timestamp.

Return type:

set[DedupMailMixin]

mail_deduplicate.strategy.select_oldest(duplicates)

Select all the oldest duplicates.

Discards the newer ones, i.e. all mails of the duplicate set except those sharing the oldest timestamp.

Return type:

set[DedupMailMixin]

mail_deduplicate.strategy.select_newer(duplicates)

Select all newer duplicates.

Discards the oldest, i.e. the subset sharing the oldest timestamp.

Return type:

set[DedupMailMixin]

mail_deduplicate.strategy.select_newest(duplicates)

Select all the newest duplicates.

Discards the older ones, i.e. all mails of the duplicate set except those sharing the newest timestamp.

Return type:

set[DedupMailMixin]
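The four time-based strategies can be sketched with exact timestamp comparisons. Mail is a hypothetical stand-in for DedupMailMixin. The real functions receive a duplicate set, which already exposes newest_timestamp and oldest_timestamp, so they presumably do not recompute the extremes as this sketch does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mail:
    """Hypothetical stand-in for DedupMailMixin, keeping only what is compared."""
    mail_id: str
    timestamp: float


def select_older(duplicates):
    newest = max(mail.timestamp for mail in duplicates)
    return {mail for mail in duplicates if mail.timestamp < newest}


def select_oldest(duplicates):
    oldest = min(mail.timestamp for mail in duplicates)
    return {mail for mail in duplicates if mail.timestamp == oldest}


def select_newer(duplicates):
    oldest = min(mail.timestamp for mail in duplicates)
    return {mail for mail in duplicates if mail.timestamp > oldest}


def select_newest(duplicates):
    newest = max(mail.timestamp for mail in duplicates)
    return {mail for mail in duplicates if mail.timestamp == newest}


dupes = {Mail("a", 1.0), Mail("b", 2.0), Mail("c", 3.0), Mail("d", 3.0)}


def ids(mails):
    return sorted(mail.mail_id for mail in mails)
```

With only two distinct timestamps, select_older() and select_oldest() keep the same mails. They diverge once there are three or more distinct timestamps, as mail b shows. The size-based strategies below follow the same pattern, with size in place of timestamp.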

mail_deduplicate.strategy.select_smaller(duplicates)

Select all smaller duplicates.

Discards the biggest, i.e. the subset sharing the biggest size.

Return type:

set[DedupMailMixin]

mail_deduplicate.strategy.select_smallest(duplicates)

Select all the smallest duplicates.

Discards the bigger ones, i.e. all mails of the duplicate set except those sharing the smallest size.

Return type:

set[DedupMailMixin]

mail_deduplicate.strategy.select_bigger(duplicates)

Select all bigger duplicates.

Discards the smallest, i.e. the subset sharing the smallest size.

Return type:

set[DedupMailMixin]

mail_deduplicate.strategy.select_biggest(duplicates)

Select all the biggest duplicates.

Discards the smaller ones, i.e. all mails of the duplicate set except those sharing the biggest size.

Return type:

set[DedupMailMixin]

mail_deduplicate.strategy.select_matching_path(duplicates)

Select all duplicates whose file path matches the regular expression provided via the --regexp parameter.

Return type:

set[DedupMailMixin]

mail_deduplicate.strategy.select_non_matching_path(duplicates)

Select all duplicates whose file path doesn’t match the regular expression provided via the --regexp parameter.

Return type:

set[DedupMailMixin]
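Here is a sketch of both path-based strategies. The real functions read the compiled pattern from the configuration (the --regexp option). In this sketch it is passed explicitly. The use of re.search() semantics, where the pattern may match anywhere in the path, is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Mail:
    """Hypothetical stand-in for DedupMailMixin with only the path attribute."""
    path: str


def select_matching_path(duplicates, regexp):
    # Assumption: re.search() semantics, i.e. the pattern may match anywhere.
    return {mail for mail in duplicates if regexp.search(mail.path)}


def select_non_matching_path(duplicates, regexp):
    return {mail for mail in duplicates if not regexp.search(mail.path)}


archived = Mail("/home/user/Mail/Archive/cur/1700000000.M1.host")
inbox = Mail("/home/user/Mail/INBOX/cur/1700000001.M2.host")
dupes = {archived, inbox}
pattern = re.compile(r"/Archive/")
```

The two selections always partition the set, which is why each one doubles as the other's DISCARD_ alias. Since path is the whole box path for file-based boxes, all mails from the same mbox file land on the same side.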

mail_deduplicate.strategy.select_one(duplicates)

Randomly select one duplicate, and discard all others.

Return type:

set[DedupMailMixin]

mail_deduplicate.strategy.select_all_but_one(duplicates)

Randomly discard one duplicate, and select all others.

Return type:

set[DedupMailMixin]
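Both random strategies fit in a few lines using the random module. The rng parameter is a hypothetical addition to make testing easier. The documented functions take only the duplicates.

```python
import random


def select_one(duplicates, rng=random):
    # random.choice() needs a sequence, so materialize the set first.
    return {rng.choice(list(duplicates))}


def select_all_but_one(duplicates, rng=random):
    # Complement of select_one(): discard one random mail, keep the rest.
    return set(duplicates) - select_one(duplicates, rng)


dupes = {"mail-1", "mail-2", "mail-3"}
kept_one = select_one(dupes, random.Random(42))
kept_rest = select_all_but_one(dupes, random.Random(42))
```

Seeding the generator alone does not make the pick reproducible here, because the iteration order of a set of strings changes between interpreter runs. Sorting the candidates before choosing would fix that, when they are orderable.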

class mail_deduplicate.strategy.Strategy(*values)

Bases: Enum

Selection strategies to apply on a set of duplicate mails.

Each strategy in the Enum points to the function implementing the selection logic, by way of the strategy_function property.

Strategies whose member value is a string are simply aliases to other strategies, pointing to the name of the function implementing the logic. The other members have integer values, to indicate that their function ID is to be derived from the member name. This arrangement allows each member to exist on its own without being hidden by the aliasing mechanism of Enum.

Aliases are great usability features for representing inverse operations. They help users reason about the selection operators in terms that match their mental models.

SELECT_OLDER = 1
SELECT_OLDEST = 2
SELECT_NEWER = 3
SELECT_NEWEST = 4
DISCARD_NEWEST = 'select_older'
DISCARD_NEWER = 'select_oldest'
DISCARD_OLDEST = 'select_newer'
DISCARD_OLDER = 'select_newest'
SELECT_SMALLER = 5
SELECT_SMALLEST = 6
SELECT_BIGGER = 7
SELECT_BIGGEST = 8
DISCARD_BIGGEST = 'select_smaller'
DISCARD_BIGGER = 'select_smallest'
DISCARD_SMALLEST = 'select_bigger'
DISCARD_SMALLER = 'select_biggest'
SELECT_MATCHING_PATH = 9
SELECT_NON_MATCHING_PATH = 10
DISCARD_NON_MATCHING_PATH = 'select_matching_path'
DISCARD_MATCHING_PATH = 'select_non_matching_path'
SELECT_ONE = 11
SELECT_ALL_BUT_ONE = 12
DISCARD_ALL_BUT_ONE = 'select_one'
DISCARD_ONE = 'select_all_but_one'
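property strategy_function: Callable

Return the function implementing this strategy. For aliases, the function’s ID is the value of the Enum member; for the other members, it is derived from the member name.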

apply_strategy(duplicates)

Perform the selection strategy on the provided duplicate set.

Returns a set of selected mail objects.

Return type:

set[DedupMailMixin]
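The value-based dispatch described for Strategy can be sketched with two regular members and two aliases. The stub strategy functions and the globals() lookup are assumptions for illustration, not the module's actual code.

```python
from enum import Enum
from typing import Callable


def select_older(duplicates):
    return {"older"}  # Stub: the real function filters by timestamp.


def select_oldest(duplicates):
    return {"oldest"}  # Stub.


class Strategy(Enum):
    SELECT_OLDER = 1
    SELECT_OLDEST = 2
    DISCARD_NEWEST = "select_older"
    DISCARD_NEWER = "select_oldest"

    @property
    def strategy_function(self) -> Callable:
        # Integer value: the function ID is derived from the member name.
        # String value: the member is an alias naming the function directly.
        func_id = self.name.lower() if isinstance(self.value, int) else self.value
        return globals()[func_id]

    def apply_strategy(self, duplicates):
        return self.strategy_function(duplicates)
```

Suppose DISCARD_NEWEST had been declared with the value 1. Enum would then have made it an alias of SELECT_OLDER, and it would no longer show up when iterating over Strategy, for instance when listing CLI choices. Distinct string values avoid that, so all four members remain visible.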