[Tech] Database Merge Duplicates Program nearly done

Jamie O'Keefe jokeefe at jamesokeefe.org
Sun Oct 21 03:40:09 EDT 2007


I have a working program to merge duplicates from the effort to
combine all supporter records into one database.  Dan will be happy to
know that each new record notes its voter id and contributor db id.
The whole process took about a minute to run.
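
For anyone curious about the general shape of such a merge, here is a
minimal sketch in Python.  It is not the actual program: the field names
(name, address, voter_id, contributor_id) and the matching key (normalized
name plus address) are assumptions made purely for illustration.

```python
from collections import OrderedDict

def normalize(text):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(text.lower().split())

def merge_duplicates(records):
    """Merge records that share a normalized (name, address) key.

    The merged record keeps the first non-empty value seen for each field,
    so a voter id and a contributor db id from different sources both survive.
    The key and the field names are hypothetical, not taken from the real program.
    """
    merged = OrderedDict()
    for rec in records:
        key = (normalize(rec.get("name", "")), normalize(rec.get("address", "")))
        if key not in merged:
            merged[key] = dict(rec)  # copy so the input record is not modified
            continue
        target = merged[key]
        for field, value in rec.items():
            # Fill only the fields that are still empty in the merged record.
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())

# Made-up sample data: the first two rows describe the same person.
records = [
    {"name": "Pat Doe", "address": "1 Main St", "voter_id": "V100", "contributor_id": ""},
    {"name": "pat  doe", "address": "1 main st", "voter_id": "", "contributor_id": "C7"},
    {"name": "Sam Roe", "address": "2 Elm St", "voter_id": "V200", "contributor_id": ""},
]
result = merge_duplicates(records)
```

Keying on a dictionary means each record is looked at once, which is
consistent with a run time of about a minute for tens of thousands of
records, though the real program may work quite differently.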

I need to correct the phone number matching and we need to finish
reviewing the names for errors, but I am hopeful that it will be
finished by tomorrow night.
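
The post does not say what is wrong with the phone number matching.  One
common cause is formatting differences, so here is a hedged sketch of
normalizing numbers before comparing them.  The function name and the
assumption of 10-digit US numbers are illustrative only.

```python
import re

def normalize_phone(raw):
    """Return a 10-digit string for a US number, or None if it cannot be parsed.

    Assumes US numbers; a leading country code of 1 is dropped.
    """
    digits = re.sub(r"\D", "", raw or "")  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None

def same_phone(a, b):
    """True only when both numbers parse and their digits agree."""
    na, nb = normalize_phone(a), normalize_phone(b)
    return na is not None and na == nb
```

Returning None for anything that does not parse keeps two unusable numbers
from being treated as a match.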

I started with 61856 records; after duplicates were merged, 36182
records were left.  We haven't corrected all of the names yet, so there
are more duplicates to be found.  Here are some stats from this
uncorrected data:

Address info

Bad Address	408
Updated Address	5967
Other Address	27000+

Party breakdown

F	192
G	1267
J	8580
D	2959
R	112
U	2468

Note that this only has the latest F/G/Js.  We have not combed
through the voter database to correct the record of anyone who might
have moved out of state or changed party.

Email info

email	9222
blank	26960
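
As a quick sanity check on the figures above, the arithmetic can be
reproduced in a few lines of Python.  The numbers are copied from this
message; the variable names are just for illustration.

```python
# Figures copied from the post.
records_before = 61856
records_after = 36182
party_counts = {"F": 192, "G": 1267, "J": 8580, "D": 2959, "R": 112, "U": 2468}
email_counts = {"email": 9222, "blank": 26960}

removed = records_before - records_after   # records merged away
removed_share = removed / records_before   # fraction of the original set
with_party = sum(party_counts.values())    # records counted in the party breakdown
email_total = sum(email_counts.values())   # compare against records_after

print(removed, round(removed_share * 100, 1), with_party, email_total)
```

The email and blank counts add up to 36182, the post-merge total.  The
party breakdown adds up to 15578, so it covers only part of the merged
records; presumably the rest have no party recorded yet.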

Anyway, this is great progress that I hope the campaigns will be able
to use soon.  Once we have merged the duplicate records, I will load
them into our web db and then give out logins to the campaigns.

peace,

Jamie

