Introduction:
            Fuzzy grouping transformation will add few additional
columns to the “transformation” output along with source columns. In those,
with the help of “_Key_in and _Key_out” columns we can find the
duplicate records.
Example:
- Find the below table with source data and corresponding “Key_in and Key_out” values.
 
- _Key_in returns the unique number for each record.
 
- _Key_out returns unique number for each group (duplicate record set), and same number for a group of records.
 
- Key_out value for a group of duplicate records is the Minimum “_Key_in” value of the same group.
 
- If “_Key_in is equal to _Key_out” then that record is valid record otherwise duplicate record.
 
Sample package creation:
Requirement:-
            I have a source table that contains duplicate
records. I want load the source data into target table, while the time of
loading skip the duplicate records.
Package creation:-
- From control flow items, drag and drop “Data flow task” into designer surface.
 - Right click on “Data Flow Task” and select “Edit” option then it will go to “Data flow”
 - From data flow items, drag and drop “OLEDB source” into designer surface. Create the source table mappings.
 
- After source mappings, drag and drop the “Fuzzy Grouping” transformation into designer surface. Connect the source green arrow to the transformation shown below screen shot.
 
- Edit the “Fuzzy grouping” transformation, then “Fuzzy grouping transformation editor” window will open.
 
- Specify the connection manager. The OLEDB connection is required to create temp table for performing “Fuzzy grouping” operation.
 - Click on “Columns” tab.
 
- Select the required columns to find the duplicate records.
 
Four
your information: ID column data type is “INT” so match type is “Exact”. “Name”
column data type is “varchar” so matched type is “Fuzzy”. 
- Click on “Advanced” tab.
 
- Name column is having string value so we need to find the fuzzy match.
 - If we give “Similarity threshold:” as “0.80” the string column values should match more than 80% then only it will treat as duplicate record other wise it will treat as a valid record.
 - Click “OK” button.
 - Drag and drop the “Conditional split” into designer surface. Connect the “Fuzzy grouping” transformation green arrow to the “Conditional split”. Then write the required condition to filter duplicate records.
 - Condition: [_key_in]== [_key_out]
 
- Click “OK”.
 - Drag and drop the “OLEDB destination” into designer surface, then map the target table. After that execute the package.
 
Target table output: