[Tech] Database clean up status
Jamie O'Keefe
jokeefe at jamesokeefe.org
Fri Jun 22 03:33:38 EDT 2007
Jeez, where do I start? I must have put in 15-20 hours cleaning up
this database. I haven't been removing dups, since there are just
plain too many; I'll let the mailing list cleanup service do that.
However, I have been making sure that every record has a city, state
and zip, by carefully sorting by one of the three fields and then
filling in the missing data. I've also been cleaning up some of the
more questionable addresses by finding the correct ones in MapQuest,
and I was able to sort by area code and exchange and fill in some
records where we had the phone number but not the city/state/zip. I
have also expanded city names such as W Newburyport to West
Newburyport and corrected typos. And I found the right states for
maybe 100 out-of-state addresses that had MA entered as their state.
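For anyone who would rather script the fill-in steps above than do them by hand in a sorted list, here is a rough Python sketch under stated assumptions; it is not a record of how the cleanup was actually done. It assumes each record is a dict with hypothetical keys city, state, zip and phone, fills a missing city/state from other complete records sharing the same zip, falls back to area code + exchange only when city, state and zip are all empty, and expands a leading directional abbreviation such as "W". The abbreviation table and the sample records are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical record layout: each record is a dict with string values for
# "city", "state", "zip" and "phone" (empty string when unknown).
FIELDS = ("city", "state", "zip")

# Hypothetical abbreviation table for leading directionals like "W Newburyport".
DIRECTIONS = {"N": "North", "S": "South", "E": "East", "W": "West"}


def expand_city(city):
    """Expand a leading directional abbreviation ("W" or "W.") in a city name."""
    parts = city.split(" ", 1)
    if len(parts) == 2 and parts[0].rstrip(".") in DIRECTIONS:
        return DIRECTIONS[parts[0].rstrip(".")] + " " + parts[1]
    return city


def phone_prefix(record):
    """Return area code + exchange (first six digits), or None if unusable."""
    digits = "".join(ch for ch in record.get("phone", "") if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading US country code
    return digits[:6] if len(digits) == 10 else None


def build_lookup(records, key_fn):
    """Map each key to the most common complete (city, state, zip) seen with it."""
    counts = defaultdict(Counter)
    for r in records:
        key = key_fn(r)
        if key and all(r.get(f) for f in FIELDS):
            counts[key][tuple(r[f] for f in FIELDS)] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}


def fill_missing(records):
    """Normalize city names, then fill gaps from zip or phone-prefix matches."""
    for r in records:
        if r.get("city"):
            r["city"] = expand_city(r["city"])
    by_zip = build_lookup(records, lambda r: r.get("zip"))
    by_phone = build_lookup(records, phone_prefix)
    for r in records:
        match = by_zip.get(r["zip"]) if r.get("zip") else None
        # Only fall back to the phone exchange when city, state and zip are
        # all missing; an exchange can cover more than one town.
        if match is None and not any(r.get(f) for f in FIELDS):
            prefix = phone_prefix(r)
            match = by_phone.get(prefix) if prefix else None
        if match:
            for field, value in zip(FIELDS, match):
                if not r.get(field):
                    r[field] = value
    return records


if __name__ == "__main__":
    # Made-up sample records for illustration only.
    sample = [
        {"city": "W Newburyport", "state": "MA", "zip": "01234", "phone": "978-555-0100"},
        {"city": "", "state": "", "zip": "01234", "phone": ""},
        {"city": "", "state": "", "zip": "", "phone": "(978) 555-0199"},
    ]
    for rec in fill_missing(sample):
        print(rec)
```

Taking the most common complete match per zip or exchange is just one simple choice; a zip or exchange that covers two towns would still need a human to look at it.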
I will do a last pass to look for more irregularities vis-à-vis city
and zip code, then sort by city and see if I can get a few more zip
codes. Then it is off to the mailing list cleanup service for address
lookup, followed by phone number lookup and verification.
Dan, can I get the latest dump of GRP/RC/GPUSA voters and a dump of
the contributor db (with the contributor people record ids, so we
don't lose that data)? Since the list cleanup service will flag all
duplicates, that would let us sync the two databases better.
We currently have over 10,000 phone numbers (though I don't know how
many are still good) and a smaller, but still significant, number of
email addresses.
One last thing. I set up a GRP group on Facebook, and Rich Zitola has
one on Orkut. Neither is listed on the site yet, but it looks pretty
easy to do, and I have some draft HTML up on the site that is not yet
exposed. I believe Facebook has a PHP/Python/etc. API for getting the
contact info of a group's members. Orkut does not appear to be as
helpful.
See you tech folks on Saturday.
Jamie