We are a group of users of Blackbaud products and are not affiliated with Blackbaud. We'd love to have you join our community to help and be helped in getting the most from your Blackbaud software.
Register now to join us to get independant advice on your system, connect with 3rd party consultants to help you maximize your database and have a real alternative to the official Blackbaud website.
I am currently in the process of running a duplicate report on RE but wanted to find out what criteria everyone uses to ensure you are picking up every duplicate on the database.
If you have the time please can you tell me the criteria you use for both Individual and Organisational searches.
At the moment I am searching on:
Ind - Surname=10 Org - Org Name= 40
First Name= 3 Post Code= 8
Post Code= 8
It really depends on how many duplicates you have and how much staff time you have to devote to cleaning it up.
We had three separate databases before we converted to RE. We meticulously went through the two secondary databases and coded each of the records with the ID for the main database. Our conversion team callously disregarded all of that and created separate records anyhow. Needless to say, we had a massive duplicate problem as a direct result of this; everyone was busy trying to get up and running on the new system; and we had very few resources to devote to cleaning up duplicates.
We used very strict duplicate criteria, because we would have been overwhelmed otherwise.
Drew
__________________ J. Drew Allen
Children's Hospital of Philadelphia
Crystal Reports and SQL Server Consultant
It is better to live your destiny imperfectly than to live an imitation of somebody else's life with perfection.
We meticulously went through the two secondary databases and coded each of the records with the ID for the main database. Our conversion team callously disregarded all of that and created separate records anyhow.
__________________ Allison Landry
Donations Coordinator
Austin Humane Society
I have been going through a MAJOR house cleaning this summer - starting with a few weeks of de-duping that went way beyond running the de-dupe report. I printed out Melissa's fixit suggestions and started those too. Allison's icons don't even begin to describe this process.
I just have to keep telling myself how lucky I am to have a job....
__________________ Andrea Shlasko
Database Manager
Big Brothers Big Sisters of the Sun Coast Inc.
"Just another day in Paradise, where every meal's a banquet and every day's a holiday"
Ryan says:
"At the moment I am searching on:
Ind - Surname=10 Org - Org Name= 40
First Name= 3 Post Code= 8
Post Code= 8"
---> Please pardon my ignorance, but I don't understand this. I'm somewhat of a geek, but at databases I'm pretty much a newbie, as they say. Nonetheless, I'd like to know more about de-duping our RE database - and while it seems there are clues in this thread, it's unfortunately as good as Greek to me.
Also, from my perspective, I don't see a real answer to the question Ryan put. Of course it's all relative - you do as much as you can find time or personnel for, but - some examples would still be helpful?
Is there someone who doesn't mind translating the above for me into something for more of a beginner to understand?
Ryan is referring to the criteria used in Business Rules for running a duplicates report. You can change the criteria to check on however much or little information you deem necessary. So for example 'Ind - Surname=10 First Name= 3 Post Code= 8' means that RE will go through all your constituents and anyone who has the same 10 letters in their surname AND the same 3 letters in their first name AND the same 10 digits in their postcode/zipcode will all come out in a report for you to then go through and determine which (if any) you want to merge and delete.
You have to experiment a little with the criteria, but it all depends on how well maintained your database is. For example including postcode in your criteria won't match people up if one of them has no postcode, and J Smith won't get matched up with John Smith if you have 3 digits in the first name search criteria...etc
Ah, thank you, yes, very much! Occasionally I feel very lost in jargon in these forums - I know there are a lot of really knowledgeable folks here to learn from, but I can't track the conversations sometimes, and that's a novel situation for me since I'm not exactly a dummy myself.
I would leave your surname, address, postcode criteria as they are. I would change the first name or initials field to 1, as there is too high a likelihood of having the same person listed as eg "J Smith", "J H Smith" and "John Smith" and if you have more than 1 character in your first name field, the report will not show these 3 records as dupes.
Basically my method of running dupe reports is to start with a "perfect" duping, which is which you're currently running. Then when you've got rid of the majority of them, start to knock your numbers down so the report starts to pick up on less than perfect dupes, as with Mr Smith above.
So today you would run:
Surname 10, First name 1, Address 40, Org Name 40
then when you've cleared that list, or got it down to 100 or so, run:
Surname 5, First name 0, address 10, Org name 10
so you're changing the criteria to first pick up the obvious dupes, get all those merged, and then running it to pick up the ones which may or may not be dupes and will need individually checking before merging.
You have to experiment a little with the criteria, but it all depends on how well maintained your database is. For example including postcode in your criteria won't match people up if one of them has no postcode, and J Smith won't get matched up with John Smith if you have 3 digits in the first name search criteria...etc
Actually, neither of these statements is correct. We had several times where a constituent with a Zip Code matched a constituent without a Zip Code. We use five characters of the Zip Code.
We also had one case where "Hart" matched "Hartwell" even though we were using ten characters for the last name.
This occurs, because the matches are programmed incorrectly. It should be taking the first n characters of the field on each record and seeing if they are equal. Instead, it is seeing if the field on one record is LIKE the first n characters of the field on the other record. Note that this is not commutative!
For example, consider taking the first five characters.
'John' LIKE 'J%' evaluates to true. '%' is the SQL wildcard for any number of characters.
'J' LIKE 'John%' evaluates to false.
In our case, we were getting a match, because
'Hartwell' LIKE 'Hart%' evaluates to true.
Drew
__________________ J. Drew Allen
Children's Hospital of Philadelphia
Crystal Reports and SQL Server Consultant
It is better to live your destiny imperfectly than to live an imitation of somebody else's life with perfection.
In that case I stand corrected. Thanks Drew.
Won't the duplicate query always look at 'J' LIKE 'John' rather than 'John' LIKE 'J' though, or does it not run alphabetically?
Won't the duplicate query always look at 'J' LIKE 'John' rather than 'John' LIKE 'J' though, or does it not run alphabetically?
I haven't investigated in that much detail. If the initial is entered with a period '.' then the two strings won't match for lengths greater than one. I suspect that in many cases this is why they don't match despite the flaws in the programming.
Drew
__________________ J. Drew Allen
Children's Hospital of Philadelphia
Crystal Reports and SQL Server Consultant
It is better to live your destiny imperfectly than to live an imitation of somebody else's life with perfection.
This occurs, because the matches are programmed incorrectly. It should be taking the first n characters of the field on each record and seeing if they are equal. Instead, it is seeing if the field on one record is LIKE the first n characters of the field on the other record. Note that this is not commutative!
----8<----
Drew
Ouch. [sigh]
A "sounds-like" search option (using generally accepted homonym logic) might also be useful, given errors I've seen created by telephone services who don't double-check spellings ...
__________________ Simon Stacey Database Officer Centrepoint
I'd just like to note for posterity here that some of us LOVE doing this kind of clean-up, and we're out there, in your city, waiting to do a freelance cleanup of the database that has you hitting your head against a wall.
I'd just like to note for posterity here that some of us LOVE doing this kind of clean-up
You're not the only one - although I'd really like to have enough time to do that, in between the regular donation processing and appeal package segmentation that takes up most of my time! I'm making a little list of cleanups that need to be done at some point ...
__________________ Simon Stacey Database Officer Centrepoint
Oh, for a volunteer like that in the Medford, Oregon area... like Simon, I enjoy that type of work myself although I'm relatively new at it. Also like Simon, I've been forced to spend my time on other things, and to relegate this kind of cleanup to lists of things I'd like to see done on a regular basis.
I've also toyed with the idea of trying to get an intern in to help with things like this, among other tasks.
forced to ... relegate this kind of cleanup to lists of things I'd like to see done on a regular basis.
I think that everyone who works in the charity/nfp data sector has a wishlist like that - "All the things we'd do if we had time or resources to do this properly." Rather than the more common "Aaaaaaaaaaaaaaaaagh! Got to get that done by lunchtime!" which we end up having to do.
__________________ Simon Stacey Database Officer Centrepoint