What does it do?
Allows you to handle duplicate records within a dataset.
The module gives you a high degree of control over the deduplication method and what to do with duplicate records.
Settings – Dedupe Columns
Setting | Description | Notes |
---|---|---|
Column | Defines which column or columns the deduplication is based on | You can select more than one column. Two records are considered duplicates only if their values match in every selected column |
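To illustrate the matching rule, here is a minimal sketch in plain Python that groups records by a composite key built from the selected columns. It is not the module's implementation. The function names `dedupe_key` and `find_duplicates` and the sample records are hypothetical, and records are assumed to be dictionaries keyed by column name.

```python
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Record = Dict[str, Any]
Key = Tuple[Hashable, ...]

def dedupe_key(record: Record, columns: Sequence[str]) -> Key:
    """Hypothetical helper: build a composite key from the selected dedupe columns."""
    return tuple(record[col] for col in columns)

def find_duplicates(records: List[Record], columns: Sequence[str]) -> Dict[Key, List[int]]:
    """Group record positions by key and return only groups with more than one record."""
    groups: Dict[Key, List[int]] = {}
    for i, record in enumerate(records):
        groups.setdefault(dedupe_key(record, columns), []).append(i)
    return {key: idxs for key, idxs in groups.items() if len(idxs) > 1}

# Hypothetical sample data
records = [
    {"id": 1, "email": "a@x.com", "country": "UK"},
    {"id": 2, "email": "a@x.com", "country": "US"},
    {"id": 3, "email": "a@x.com", "country": "UK"},
]

by_email = find_duplicates(records, ["email"])                        # all three records match
by_email_country = find_duplicates(records, ["email", "country"])    # only records 1 and 3 match
print(by_email)
print(by_email_country)
```

Adding a second column narrows the match. With `email` alone all three records are duplicates, but with `email` and `country` only the two UK records are.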
Settings – Options
Setting | Description | Notes |
---|---|---|
Dedup Behaviour | Defines what to do when a duplicate record is found | Determines which of the duplicate records are excluded (e.g. exclude first, exclude randomly) |
Output Type | Defines how duplicate records are output | You can route duplicates to a separate output or keep them in the main output and mark them with a flag |
Missing Column Behaviour | Defines the behaviour when a selected dedupe column is missing from the input | |
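The sketch below shows how a dedup behaviour and an output type could combine. It is an illustration under stated assumptions, not the module's actual logic. The option values `keep_first`, `keep_last`, `keep_random`, `separate` and `flag`, the flag column name `is_duplicate`, and the `dedupe` function itself are all hypothetical. It assumes exactly one record per duplicate group is retained and the rest are treated as duplicates. Missing Column Behaviour is not modelled, so a missing column simply raises `KeyError`.

```python
import random
from typing import Any, Dict, List, Optional, Sequence, Set, Tuple

Record = Dict[str, Any]

def dedupe(
    records: List[Record],
    columns: Sequence[str],
    behaviour: str = "keep_first",      # hypothetical: "keep_first" | "keep_last" | "keep_random"
    output: str = "separate",           # hypothetical: "separate" | "flag"
    flag_column: str = "is_duplicate",  # hypothetical name for the flag column
    rng: Optional[random.Random] = None,
):
    rng = rng or random.Random()

    # Group record positions by the composite key of the selected columns.
    # A missing column raises KeyError; this sketch does not model Missing Column Behaviour.
    groups: Dict[Tuple[Any, ...], List[int]] = {}
    for i, record in enumerate(records):
        key = tuple(record[col] for col in columns)
        groups.setdefault(key, []).append(i)

    # Retain one record per group; every other record in the group counts as a duplicate.
    keep: Set[int] = set()
    for idxs in groups.values():
        if behaviour == "keep_first":
            keep.add(idxs[0])
        elif behaviour == "keep_last":
            keep.add(idxs[-1])
        elif behaviour == "keep_random":
            keep.add(rng.choice(idxs))
        else:
            raise ValueError(f"unknown behaviour: {behaviour!r}")

    if output == "separate":
        # Two outputs, both preserving the original input order.
        unique = [r for i, r in enumerate(records) if i in keep]
        duplicates = [r for i, r in enumerate(records) if i not in keep]
        return unique, duplicates
    if output == "flag":
        # One output: every record, with a boolean flag marking duplicates.
        return [{**r, flag_column: i not in keep} for i, r in enumerate(records)]
    raise ValueError(f"unknown output type: {output!r}")

# Hypothetical sample data
records = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": "b@x.com"},
    {"id": 3, "email": "a@x.com"},
]

unique, dupes = dedupe(records, ["email"], behaviour="keep_last")
flagged = dedupe(records, ["email"], output="flag")
print([r["id"] for r in unique], [r["id"] for r in dupes])
print([r["is_duplicate"] for r in flagged])
```

With `keep_last`, record 3 is retained for `a@x.com` and record 1 goes to the duplicates output. With the default `keep_first` and `flag` output, all three records stay in one output and only record 3 is flagged. Passing a seeded `random.Random` makes `keep_random` reproducible between runs.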
Examples
None
Tips & Tricks
None