7 tips for finding duplicates in your database
Ok, you’ve somehow landed yourself the job of being the data monkey. Someone has given you a list of clients, donors, customers or three-headed Atlantic goats, and you’ve been asked to do something with it. It could be something really simple, like: how many clients do we have?
Hmm… let me rewind. That might be simple if your non-profit organisation has invested in a good database. Or, going back another step, if it even has just ONE database. Chances are, if you are a small not-for-profit organisation, you don’t have a dedicated data monkey, and your [insert one type of clinician here] have one lot of data while your [insert second type here] have their own little list. (Counsellors are the usual culprits for harbouring separate lists.)
For the sake of simplicity, I’m going to presume that you’ve managed to round up all the lists and throw them together into one spreadsheet.
How hard can it be to find duplicates? Just look for people with the same name and address, right? (I’m presuming that, given your organisation doesn’t have a data monkey, it’s unlikely to have de-duping software either.)
Well, it is just about looking for doubles, but here are a few tips and tricks.
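As a baseline, here is a minimal Python sketch of that naive “same name and address” check. It assumes your combined spreadsheet has been exported as a list of records with hypothetical `id`, `first_name`, `last_name` and `address` fields. It only catches exact matches after trimming spaces and lower-casing, so an initial versus a full first name slips straight past it.

```python
from collections import defaultdict

# Hypothetical record layout: each client is a dict with id, first_name,
# last_name and address keys (adjust to your own spreadsheet's columns).

def exact_duplicates(records):
    """Group record IDs whose name and address match exactly after trimming and lower-casing."""
    groups = defaultdict(list)
    for rec in records:
        key = (
            rec["first_name"].strip().lower(),
            rec["last_name"].strip().lower(),
            rec["address"].strip().lower(),
        )
        groups[key].append(rec["id"])
    # Only keys shared by more than one record are potential duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]

clients = [
    {"id": 1, "first_name": "Robert", "last_name": "Goulburn", "address": "12 High St"},
    {"id": 2, "first_name": "robert ", "last_name": "Goulburn", "address": "12 High St"},
    {"id": 3, "first_name": "R", "last_name": "Goulburn", "address": "12 High St"},
]
print(exact_duplicates(clients))  # [[1, 2]]  (record 3 is missed)
```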
- Know who you are dealing with: if you’re working with a list containing children, you’ll probably have siblings. What’s more, twins are more often born at a low birth weight, and low birth weight is associated with some disabilities, so you’re likely to have a few more twins than a typical database (assuming you’re a disability org). With this in mind, be careful about automatically ‘deleting’ duplicates based on surname, address and date of birth alone.
- Know how the data will be used: why are you de-duping this list? Is it to count the number of clients? To mail donors for a fundraising campaign? To phone customers? Mail clients about a new service? What you’re going to do with it will help you decide how aggressive to be with your de-duping.
- Be greedy: sometimes it’s best to ask for more data than you think you need. Commonly, if you need a mailing list, you’ll pull out Title, Name, Address and Postcode. I recommend that you also pull out phone numbers and email addresses, even if you don’t plan on calling anyone. These fields can be handy for detecting duplicates.
- Change it up: it can be tempting to use the same criteria each time you de-dupe a list (gasp, yes, you’re likely to have to do this more than once!). The trick of matching on just the first letter of the person’s first name, so that R Goulburn and Robert Goulburn both show up in your output, is a good one. But every now and then, ditch the initial. I recommend this all the more if your clientele are older, because names where the nickname and full name start with different letters seem to be more prevalent in older generations (I’m sure some social scientist could tell me whether this is just my perception or not). Common pairings to be on the lookout for: Edward and Ted, Alfred and Fred, Robert and Bob, William and Bill, Margaret and Peg, Elizabeth and Betty… and so on.
- Swap it around: another ‘every now and then’ trick is to search for duplicates where the first and last name fields have been entered backwards. This can happen when a person’s last name can also be a first name, or when, for cultural reasons, the family name comes first and has been shuffled in your database. This is when your phone numbers come in handy. Exclude the name fields from your match criteria altogether and combine phone, email or DOB with the address. You may be surprised what you find.
- Know when to quit: if you are doing a ‘manual’ style de-dupe (i.e. without specialised software), it can be time-consuming. If my search combinations are turning up only a handful of duplicates, I know it’s time to call it a day. If I’m still getting a fat list of dupes every time I tweak the criteria, I’ll persist a little longer.
- Address duplicates where they breed: I know, I know, it seems really obvious, doesn’t it? However, I’m sure many people have been through the process, saved what they thought they needed to update the database, and then later found they needed something else. I’ve often been sent emails alerting me to a pair of duplicates; many people kindly provide the reference numbers, but I still need to email back and ask: which one is the primary record? Which one has the correctly spelt first name? So start with the end in mind and you’ll save yourself some grief.
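For anyone comfortable with a little scripting, the match keys from the tips above (first initial, nicknames, swapped names, phone plus address) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not specialised de-duping software: the field names `id`, `first_name`, `last_name`, `address` and `phone`, the helper functions and the short `NICKNAMES` table are all hypothetical. Every group it returns is a candidate for manual review, not something to delete automatically (remember the twins).

```python
from collections import defaultdict

# Hypothetical field names: id, first_name, last_name, address, phone.
# Hypothetical, deliberately short nickname table; extend it for your clientele.
NICKNAMES = {
    "ted": "edward", "fred": "alfred", "bob": "robert",
    "bill": "william", "peg": "margaret", "betty": "elizabeth",
}

def norm(value):
    """Lower-case and collapse whitespace so '12  High St' matches '12 high st'."""
    return " ".join(str(value or "").lower().split())

def digits(value):
    """Keep only digits so '(03) 9555 1234' matches '03 9555 1234'."""
    return "".join(ch for ch in str(value or "") if ch.isdigit())

def key_initial(rec):
    # First initial + surname + address: R Goulburn pairs with Robert Goulburn.
    return (norm(rec["first_name"])[:1], norm(rec["last_name"]), norm(rec["address"]))

def key_nickname(rec):
    # Map known nicknames to a full first name: Bob pairs with Robert.
    first = norm(rec["first_name"])
    return (NICKNAMES.get(first, first), norm(rec["last_name"]), norm(rec["address"]))

def key_swapped(rec):
    # Sort the two name parts so 'Goulburn Robert' pairs with 'Robert Goulburn'.
    a, b = sorted([norm(rec["first_name"]), norm(rec["last_name"])])
    return (a, b, norm(rec["address"]))

def key_phone(rec):
    # Ignore names entirely; records without a phone number are skipped.
    phone = digits(rec.get("phone"))
    return (phone, norm(rec["address"])) if phone else None

def candidate_pairs(records, key_func):
    """Return sorted groups of IDs that share a key. Candidates only: review by hand."""
    groups = defaultdict(list)
    for rec in records:
        key = key_func(rec)
        if key is not None:
            groups[key].append(rec["id"])
    return sorted(ids for ids in groups.values() if len(ids) > 1)

clients = [
    {"id": 1, "first_name": "Robert", "last_name": "Goulburn", "address": "12 High St", "phone": "(03) 9555 1234"},
    {"id": 2, "first_name": "R", "last_name": "Goulburn", "address": "12 High St", "phone": ""},
    {"id": 3, "first_name": "Bob", "last_name": "Goulburn", "address": "12 high st", "phone": "03 9555 1234"},
    {"id": 4, "first_name": "Goulburn", "last_name": "Robert", "address": "12 High St", "phone": ""},
]
print(candidate_pairs(clients, key_initial))   # [[1, 2]]
print(candidate_pairs(clients, key_nickname))  # [[1, 3]]
print(candidate_pairs(clients, key_swapped))   # [[1, 4]]
print(candidate_pairs(clients, key_phone))     # [[1, 3]]
```

Running each key separately and counting the groups it returns is one way to apply “know when to quit”: once every key turns up only a handful of candidates, it is probably time to stop.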
Posted on June 6, 2011, in Data, Direct Mail, Fundraising, Not for Profit and tagged data, database issues, dedpuing data, Direct Mail, duplicates, finding duplicates.