So I’m building a BuddyPress site for a client. The task at hand is data migration – I have to distribute their users and companies data, which now exists as two tables, into the various WordPress tables that support the group and user constructs.

The problem

All in all this means inserting about 20,000 records. Our server was not happy when I tried to do 15,000 of them at once, while loading the data from an XML file. My solution was to punish it further by turning off the execution time limit with

set_time_limit(0);

I got the data in, but shortly after the machine basically melted down and we had to reboot.

Sometimes I think my ability to learn and understand code is way ahead of my common sense as a person who works with computers . . .

The solution

Having spoken with a DB admin guy who knows his stuff, I’ve set about doing this task in a smarter way. We talked about the fact that XML is a slow and “verbose” data format. So we modularized the process. My original method inserted data and metadata in a single script; I did it this way because

$wpdb->insert_id

is a nice way to access the last inserted record, which property I could pass into the $id field of update metadata functions.

But this semantic convenience cost too much in performance. Now, each step of the migration is independent of all others – each written as a WordPress plugin that triggers a query upon activation. The first step was dumping the contents of the XML file into two tables that mirror it exactly. (I should have just gotten an SQL dump as a source in the first place.)

Insert id’s for metadata are gathered from a SELECT query on either the groups table or wp_users, then update metadata functions are executed.

That’s as far as I’ve gotten right now.

Leave a Reply