Idea drop: db::multi_query

We usually do things sequentially in PHP. Any point where we can get two things done at the same time is an opportunity to reduce the total execution time. It is generally safe to execute processes in parallel when no process has a side effect that may impact any other process. Most MySQL SELECT statements fall into this class. Under certain conditions it might help to run these queries in parallel.

So I got to thinking: does PHP provide a function to issue a query and return immediately, instead of waiting while the database churns? Such a function would let me spread a batch of queries across several servers and then collect the results. It should not involve forking or making OS calls. Maybe mysql_unbuffered_query is that function.

Assuming it is, here is the basis for a parallel query system in PHP. It is obviously incomplete. I may complete it and try it out when I need to query a dataset partitioned across separate MySQL instances. The function connect_reserved($query) returns a locked MySQL link identifier, opening new connections as needed. The function release($query) removes the lock, returning the link to the pool of available database connections.

function multi_query($queries);
    foreach ( $queries as $i => $query ) {
        $link = connect_reserved($query);
        $res[$i] = mysql_unbuffered_query($query, $link);
        $ret[$i] = array();
    }

    do {
        foreach ( $res as $i => $r ) {
            if ( $row = mysql_fetch_row($r) ) {
                $ret[$i][] = $row;
            } else {
                release($queries[$i]);
                unset($res[$i]);
            }
        }
    } while ( count($res) );

    return $ret;
}

I have not used mysql_unbuffered_query. For best results, it should return as soon as the server determines that the query is valid. I have assumed that it can return while the database is still looking for records. (If it can not, this whole idea should be forgotten.) This oversimplified diagram helps illustrate how the queries run in parallel. The green line shows the beneficial overlap of query processing time.

quasi-parallel queries in PHP

Published by

Andy Skelton

Code Wrangler @ Automattic youtube.com/AndySkelton

8 thoughts on “Idea drop: db::multi_query”

  1. Don’t think that’ll work.

    “Note: The benefits of mysql_unbuffered_query() come at a cost: You cannot use mysql_num_rows() and mysql_data_seek() on a result set returned from mysql_unbuffered_query(). You also have to fetch all result rows from an unbuffered SQL query, before you can send a new SQL query to MySQL.”

    So you can’t send a new query until you’ve fetched all the previous rows.

  2. Additional: Whoops, I didn’t see that you were using multiple DB connections. In that case, yes, this’ll work, but you’ll have to have one DB connection per parallel query.

    Also, when you do an unbuffered query, then it does wait for the database search to actually occur, and it even retrieves the first row of the result set into php for you. It just doesn’t retrieve the remaining rows from mySQL until you actually ask for them with a fetch_row.

  3. What about this:

    On the other hand, you can start working on the result set immediately after the first row has been retrieved: you don’t have to wait until the complete SQL query has been performed.

  4. you might be able to emulate something like this with pcntl_fork()….

    i dont know how well this will paste… but… this illustrates the idea…

    $query ) {
    $sockets[$idx] = fork_query($query, $idx);
    $results[$idx][‘query’] = $query;
    $results[$idx][‘rows’] = array();
    }
    while ( true ) {
    $poll = array_values($sockets);
    if ( false === socket_select($poll, $w=null, $e=null, 0) )
    break;
    foreach ( $poll as $idx => $sock ) {
    $res = unserialize(base64_decode(socket_read($sock, 80960, PHP_NORMAL_READ)));
    if ( $res === false ) {
    socket_close($sock);
    break;
    }
    $id = $res[0];
    $results[$id][‘rows’][] = $res[1];
    }
    foreach( $sockets as $idx => $val ) {
    if ( ‘Socket’ != get_resource_type($val) )
    unset($sockets[$idx]);
    }
    if ( !count($sockets) )
    break;
    }
    return $results;
    }

    print_r(multi_query(array(“blah1”, “blah2”, “blah3”)));

    ?>

Comments are closed.