Deduplicate

What does it do?

The Deduplicate module lets you handle duplicate records within a dataset.
It gives you a high degree of control over how duplicates are identified and what happens to them.

Settings – Dedupe Columns

Setting	Description	Notes
Column	Defines which column the deduplication is based on	You can select more than one column. Two records are considered duplicates if their values match in every selected column.
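
The matching rule described above can be illustrated with a short Python sketch. It is not the module's implementation: the record layout and the `find_duplicate_groups` helper are hypothetical and only show how selecting more columns narrows what counts as a duplicate.

```python
from collections import defaultdict


def find_duplicate_groups(records, dedupe_columns):
    """Group record indexes by their values in the selected columns.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        # The key is built from every selected column, so two records
        # only share a key when all of those columns match.
        key = tuple(record[column] for column in dedupe_columns)
        groups[key].append(index)
    # Keep only the keys shared by more than one record.
    return {key: idxs for key, idxs in groups.items() if len(idxs) > 1}


records = [
    {"name": "Ann", "email": "ann@example.com", "city": "Leeds"},
    {"name": "Ann", "email": "ann@example.com", "city": "York"},
    {"name": "Ann", "email": "a.smith@example.com", "city": "Leeds"},
]

# Selecting two columns: only the first two records match on both.
by_name_and_email = find_duplicate_groups(records, ["name", "email"])

# Selecting one column: all three records match.
by_name = find_duplicate_groups(records, ["name"])
```

With both `name` and `email` selected, only the first two records form a duplicate group; with `name` alone, all three do.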

Settings – Options

Setting	Description	Notes
Dedup Behaviour	Defines what to do if a duplicate record is found	Determines which records are excluded when duplicates are found (e.g. exclude first, exclude randomly)
Output Type	Defines the method for outputting duplicates	You can send duplicate records to a separate output or mark them with a flag
Missing Column Behaviour	Defines the behaviour in the event of a missing dedupe column
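
To make the interaction between these three options more concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `deduplicate` function, its `keep`, `output_type` and `on_missing` parameters, their option names, and the `is_duplicate` flag column are hypothetical and do not reflect the module's actual option names or internals.

```python
import random


def deduplicate(records, dedupe_columns, keep="first",
                output_type="flag", on_missing="error", rng=None):
    """Illustrative deduplication; all option names are hypothetical."""
    rng = rng or random.Random()

    # Group record indexes by their values in the dedupe columns.
    groups = {}
    for index, record in enumerate(records):
        missing = [c for c in dedupe_columns if c not in record]
        if missing and on_missing == "error":
            raise KeyError(f"missing dedupe column(s): {missing}")
        # With any other on_missing value, a missing column counts as None.
        key = tuple(record.get(c) for c in dedupe_columns)
        groups.setdefault(key, []).append(index)

    # Dedup behaviour: choose which record of each group is kept;
    # the rest of the group is treated as duplicates.
    kept = set()
    for indexes in groups.values():
        if keep == "first":
            kept.add(indexes[0])
        elif keep == "last":
            kept.add(indexes[-1])
        elif keep == "random":
            kept.add(rng.choice(indexes))
        else:
            raise ValueError(f"unknown keep option: {keep!r}")

    # Output type: flag duplicates in place, or split into two outputs.
    if output_type == "flag":
        return [dict(r, is_duplicate=(i not in kept))
                for i, r in enumerate(records)]
    if output_type == "separate":
        unique = [r for i, r in enumerate(records) if i in kept]
        duplicates = [r for i, r in enumerate(records) if i not in kept]
        return unique, duplicates
    raise ValueError(f"unknown output type: {output_type!r}")


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "a@example.com"},
    {"id": 3, "email": "b@example.com"},
]

# Mark duplicates with a flag, keeping the first record of each group.
flagged = deduplicate(rows, ["email"])

# Send duplicates to a separate output, keeping the last record instead.
unique, duplicates = deduplicate(rows, ["email"], keep="last",
                                 output_type="separate")
```

In this sketch the flag output preserves every input record and adds one column, while the separate output splits the input into two lists, which mirrors the two output styles described in the table.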

Examples

None

Tips & Tricks

None