What does it do?

This module lets you handle duplicate records within a dataset.
It gives you a high degree of control over how duplicates are detected and what happens to them once they are found.

Settings – Dedupe Columns

Setting | Description | Notes
Column | Defines which column to base the deduplication on | You can select more than one column. Two records are considered duplicates if all selected columns match in each record.
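
The matching rule for multiple dedupe columns can be illustrated with a short Python sketch. This is not the module's own implementation: the function name find_duplicates and the sample records are hypothetical, and the sketch assumes that a record counts as a duplicate when an earlier record has the same values in every selected column.

```python
from typing import Iterable


def find_duplicates(records: list[dict], dedupe_columns: Iterable[str]) -> list[bool]:
    """Return one flag per record: True if an earlier record has the same
    values in every dedupe column (hypothetical helper, for illustration)."""
    columns = list(dedupe_columns)
    seen: set[tuple] = set()
    flags = []
    for record in records:
        # The combined values of all selected columns form the matching key.
        key = tuple(record[col] for col in columns)
        flags.append(key in seen)
        seen.add(key)
    return flags


records = [
    {"name": "Ann", "city": "Leeds", "age": 31},
    {"name": "Ann", "city": "York", "age": 31},
    {"name": "Ann", "city": "Leeds", "age": 45},
]

# With two columns selected, only the third record matches an earlier one.
flags_name_city = find_duplicates(records, ["name", "city"])

# With one column selected, the second and third records both match the first.
flags_name = find_duplicates(records, ["name"])
```

Selecting more columns makes the match stricter, so fewer records are treated as duplicates.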

Settings – Options

Setting | Description | Notes
Dedup Behaviour | Defines what to do if a duplicate record is found | Determines which records are excluded in the case of a duplicate (e.g. exclude first, exclude randomly)
Output Type | Defines the method for outputting duplicates | You can send duplicate records to a separate output or mark them with a flag
Missing Column Behaviour | Defines the behaviour if a dedupe column is missing |
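
As a rough illustration of how these three options interact, here is a minimal Python sketch. Every name in it is an assumption made for the example and is not taken from the module: the function dedupe, the parameters keep, output and on_missing, their values, and the is_duplicate flag field. Only deterministic behaviours are shown; a random choice is left out so the result is reproducible.

```python
def dedupe(records, columns, keep="first", output="separate", on_missing="error"):
    """Hypothetical sketch of the three options; not the module's real API."""
    # Missing column behaviour: fail, or fall back to the columns that exist.
    present = set().union(*(r.keys() for r in records)) if records else set()
    missing = [c for c in columns if c not in present]
    if missing:
        if on_missing == "error":
            raise KeyError(f"missing dedupe columns: {missing}")
        columns = [c for c in columns if c in present]  # on_missing == "ignore"

    # Dedupe behaviour: pick which record survives in each group of matches.
    survivor = {}
    for i, r in enumerate(records):
        # With no usable columns left, every record gets its own key.
        key = tuple(r.get(c) for c in columns) if columns else (i,)
        if keep == "first":
            survivor.setdefault(key, i)
        else:  # keep == "last"
            survivor[key] = i
    kept = set(survivor.values())

    # Output type: mark duplicates with a flag, or split into two outputs.
    if output == "flag":
        return [{**r, "is_duplicate": i not in kept} for i, r in enumerate(records)]
    unique = [r for i, r in enumerate(records) if i in kept]
    duplicates = [r for i, r in enumerate(records) if i not in kept]
    return unique, duplicates


rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": "b@x.com"},
    {"id": 3, "email": "a@x.com"},
]

unique, duplicates = dedupe(rows, ["email"])              # keeps ids 1 and 2
unique_last, dup_last = dedupe(rows, ["email"], keep="last")  # keeps ids 2 and 3
flagged = dedupe(rows, ["email"], output="flag")          # all rows, with a flag
```

The flag output keeps every record in a single stream, which suits cases where duplicates need review rather than removal; the separate output suits cases where downstream steps should only see unique records.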



Tips & Tricks