What does it do?
Identifies duplicate records within a dataset and lets you decide how to handle them.
The module gives you a high degree of control over the deduplication method and what to do with duplicate records.
Settings – Dedupe Columns
| Setting | Description | Notes |
|---|---|---|
| Column | Defines which column to base the deduplication on | You can select more than one column. Two records are considered duplicates only if their values match in every selected column |
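
The multi-column matching rule can be illustrated with a minimal Python sketch. This is not the module's implementation; the function name `group_duplicates` and the sample records are hypothetical, and the sketch assumes each record is a dictionary containing every selected column.

```python
from collections import defaultdict


def group_duplicates(records, columns):
    """Return lists of row indices whose values match in every selected column.

    Hypothetical helper for illustration only; not part of the module.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        # The dedupe key is the tuple of values from all selected columns.
        key = tuple(record[column] for column in columns)
        groups[key].append(index)
    # Only keys shared by more than one record form a duplicate group.
    return [indices for indices in groups.values() if len(indices) > 1]


records = [
    {"name": "Ann", "city": "Leeds"},
    {"name": "Ann", "city": "York"},
    {"name": "Ann", "city": "Leeds"},
]

print(group_duplicates(records, ["name"]))          # [[0, 1, 2]]
print(group_duplicates(records, ["name", "city"]))  # [[0, 2]]
```

Selecting only `name` treats all three records as duplicates, while adding `city` narrows the match to the first and third records.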
Settings – Options
| Setting | Description | Notes |
|---|---|---|
| Dedup Behaviour | Defines what to do if a duplicate record is found | Determines which records are excluded when duplicates are found (e.g. exclude first, exclude randomly) |
| Output Type | Defines the method for outputting duplicates | You can output records into a separate output or mark duplicates with a flag |
| Missing Column Behaviour | Defines the behaviour in the event of a missing dedupe column | |
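
How the three options might interact can be sketched in Python as follows. Everything here is an assumption made for illustration: the function `dedupe`, the option values (`keep_first`, `keep_last`, `keep_random`, `flag`, `separate`, `error`, `ignore`) and the `is_duplicate` flag name are hypothetical and may not correspond to the choices the module actually offers.

```python
import random


def dedupe(records, columns, behaviour="keep_first", output_type="flag",
           on_missing="error", seed=None):
    """Illustrative sketch only; option names are hypothetical."""
    rng = random.Random(seed)
    groups = {}
    for index, record in enumerate(records):
        missing = [c for c in columns if c not in record]
        if missing and on_missing == "error":
            raise KeyError(f"missing dedupe column(s): {missing}")
        # With on_missing="ignore", a missing column compares as None.
        key = tuple(record.get(c) for c in columns)
        groups.setdefault(key, []).append(index)

    # Pick the one record to keep from each group of matching records.
    kept = set()
    for indices in groups.values():
        if behaviour == "keep_first":
            kept.add(indices[0])
        elif behaviour == "keep_last":
            kept.add(indices[-1])
        elif behaviour == "keep_random":
            kept.add(rng.choice(indices))
        else:
            raise ValueError(f"unknown behaviour: {behaviour}")

    if output_type == "flag":
        # Single output: every record is returned with a duplicate flag.
        return [dict(r, is_duplicate=i not in kept)
                for i, r in enumerate(records)]
    if output_type == "separate":
        # Two outputs: retained records and excluded duplicates.
        unique = [r for i, r in enumerate(records) if i in kept]
        duplicates = [r for i, r in enumerate(records) if i not in kept]
        return unique, duplicates
    raise ValueError(f"unknown output type: {output_type}")


rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": "b@x.com"},
    {"id": 3, "email": "a@x.com"},
]

unique, duplicates = dedupe(rows, ["email"], output_type="separate")
print([r["id"] for r in unique])      # [1, 2]
print([r["id"] for r in duplicates])  # [3]
```

In this sketch the flag output keeps every record in a single stream, which suits cases where duplicates are reviewed downstream, whereas the separate output splits retained and excluded records into two collections.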
Examples
None
Tips & Tricks
None
