Atomically update serialized PHP arrays in MySQL

Okay, okay, it’s hard to find a use case for this when it’s so obvious that the correct way to handle one-to-many is with JOIN. But if you’re already committed to your schema and you decide you need to append serialized PHP data to a row atomically, you can cons serialized values with this query:

INSERT INTO tbl
  …
  serialized = "i:1;"
  ON DUPLICATE KEY UPDATE
    serialized = CONCAT(
      'a:3:{i:0;s:4:"cons";i:1;',
      VALUES(serialized),
      'i:2;',
      serialized,
      '}'
    )

After you have performed this three times with the serialized values 1, 2, and 3, the row contains this:

'a:3:{i:0;s:4:"cons";i:1;i:3;i:2;a:3:{i:0;s:4:"cons";i:1;i:2;i:2;a:2:{i:0;s:4:"cons";i:1;i:1;}}}'

After unserializing, deconstruct it with this function:

function decons($list) {
    $res = array();
    while ( $list != array() ) {
        if ( $list[0] === 'cons' ) {
            array_unshift( $res, $list[1] );
            $list = $list[2];
        } else {
            array_unshift( $res, $list );
            break;
        }
    }
    return $res;
}

The result:

array(1, 2, 3)

I haven’t actually used it (probably never will) but you are welcome to try this at home!

Proving that this is stupid is left as an exercise for the reader.

WordPress code surprise: wp_sprintf

_DSC3433

by Ulf Wendel, on Flickr

Everybody loves PHP’s sprintf(). We use it everywhere. There are just some things it doesn’t do, like format lists in sentences. Three years ago, while working on the Media Library, I needed a way to list categories or tags in a sentence. And, by golly, it would need Oxford commas and localization to be worthy of WordPress.

 

Problem solved

I wrote wp_sprintf() as a wrapper for sprintf(). It uses WordPress filters to customize the formatting directives. WordPress only uses this in one place, the function where I needed to list categories and tags: get_the_taxonomies(). This is used in standard template tags so I estimate wp_sprintf() has run at least a trillion times without fanfare. The time has come to give wp_sprintf() the attention it deserves.

wp_sprintf

The calling interface is identical to sprintf():

string wp_sprintf ( string $format [, mixed $args [, mixed $... ]] )

Internally, wp_sprintf() splits the format string into fragments that begin with a single ‘%’. It sends each fragment through the 'wp_sprintf' filter with the appropriate $args, respective of numbered directives (‘%1$s’). If the filter did not modify the fragment, it is passed through sprintf(). The processed fragments are concatenated and then returned.

Functionally, wp_sprintf() should be identical to sprintf() until a filter is added which implements a new formatting directive, or supersedes any of the standard ones. It is certainly less optimized so it should only be used when a customized directive is needed.

wp_sprintf_l

The only 'wp_sprintf' filter now in core is wp_sprintf_l(), which adds a new directive to format arrays into lists, %l:

wp_sprintf('%s: %l.', 'Tags', array('Cats', 'Dogs', 'Birds'));
=> 'Tags: Cats, Dogs, and Birds.'

Our filter, wp_sprintf_l(), receives a format fragment and the $args that corresponds with its position or number. It returns the first fragment, ‘%s: ‘, unchanged. When it sees ‘%l.’ it replaces the %l directive with a formatted list.

WordPress always tries to make text beautiful. To that end, wp_sprintf_l formats lists with Oxford commas. Two items are “cats and dogs”. Three or more items are “cats, dogs, and birds”. However, if you don’t like Oxford commas you can remove them:

function remove_oxford_commas( $separators ) {
    $separators[ 'between_last_two' ] = ' and ';
    return $separators;
}
add_filter( 'wp_sprintf_l', 'remove_oxford_commas' );

Our list formatter also respects language differences. WordPress translations include wp_sprintf_l()‘s separators so that, for example, the Spanish translation always separates the last two items with ‘ y ‘ with no comma.

You can use <code>wp_sprintf</code> with the ‘%l’ directive anywhere in WordPress since 2.5.0. It’s a lot easier than writing a <code>foreach</code> loop every time.

Something new

Recently a prominent bug crept into some code when the integer argument for %d was wrapped in a call to number_format(). Everything after the first comma was lost. I wished there were a formatting directive to mimic number_format.

While I’m thinking about it, here’s a possibility: ‘%n’. It should automatically localize the thousands separator and decimal point; this alone makes it a compelling upgrade. It should also accept at least one specifier for precision. The other specifiers (sign, padding, alignment, width) would be nice but not necessary.

Lisp alien

Image via Wikipedia

p.s. Lisp rocks!

Yesterday I finished reading Conrad Barski’s entertaining Lisp primer, Land of Lisp. He gives only a whiff of Lisp’s format function but I was blown away. It provides tabulation, justification, iteration, recursion, conversion, conditions, and never mind making a cup of coffee, it could run a chain to compete with Starbucks. Here’s a taste:

(format nil
        "~{~a~#[~;, and ~:;, ~]~}"
        (list "Cats" "Dogs" "Birds"))
"Cats, Dogs, and Birds"

Idea drop: db::multi_query

We usually do things sequentially in PHP. Any point where we can get two things done at the same time is an opportunity to reduce the total execution time. It is generally safe to execute processes in parallel when no process has a side effect that may impact any other process. Most MySQL SELECT statements fall into this class. Under certain conditions it might help to run these queries in parallel.

So I got to thinking: does PHP provide a function to issue a query and return immediately, instead of waiting while the database churns? Such a function would let me spread a batch of queries across several servers and then collect the results. It should not involve forking or making OS calls. Maybe mysql_unbuffered_query is that function.

Assuming it is, here is the basis for a parallel query system in PHP. It is obviously incomplete. I may complete it and try it out when I need to query a dataset partitioned across separate MySQL instances. The function connect_reserved($query) returns a locked MySQL link identifier, opening new connections as needed. The function release($query) removes the lock, returning the link to the pool of available database connections.

function multi_query($queries);
    foreach ( $queries as $i => $query ) {
        $link = connect_reserved($query);
        $res[$i] = mysql_unbuffered_query($query, $link);
        $ret[$i] = array();
    }

    do {
        foreach ( $res as $i => $r ) {
            if ( $row = mysql_fetch_row($r) ) {
                $ret[$i][] = $row;
            } else {
                release($queries[$i]);
                unset($res[$i]);
            }
        }
    } while ( count($res) );

    return $ret;
}

I have not used mysql_unbuffered_query. For best results, it should return as soon as the server determines that the query is valid. I have assumed that it can return while the database is still looking for records. (If it can not, this whole idea should be forgotten.) This oversimplified diagram helps illustrate how the queries run in parallel. The green line shows the beneficial overlap of query processing time.

quasi-parallel queries in PHP

Persistent PHP processes in Erlang/OTP

Running PHP code from within Erlang is easy: os:cmd("php -r 'echo \"Hello, World!\";'"). This is fine when you need to run simple commands. When you demand more from PHP, this approach becomes awkward, wasteful, and eventually unusable. If your Erlang-to-PHP calls require large PHP applications, open connections to databases, or somehow incur significant initialization overhead, you should maintain pool of reusable PHP processes.

My first complete application for Erlang/OTP is php_app. It manages a pool of persistent PHP processes and provides a simple API to evaluate PHP code. I designed php_app to be robust and easy to use. It’s so easy, in fact, that I now use it to debug WordPress functions from within Erlang. Here is a sample session using start/0 and eval/1:

$ erl
Eshell V5.6.4  (abort with ^G)
1> php:start().
ok
2> php:eval("echo 'Hello, World!';
2>           trigger_error('Uh-oh!');
2>           return array(true, true);").
{ok,<<"Hello, World!">>,
    [{0,true},{1,true}],
    <<"Uh-oh!">>,continue}
3> 

In the resulting tuple we have the output, the return value, and the last error. The atom continue indicates that the PHP process is eligible for reuse, determined by its size in memory after evaluating my code. In the next example I’ll reserve a PHP process to demonstrate persistence and what happens when we hit the memory limit.

3> Ref = php:reserve().
#Ref<0.0.0.52>
4> php:eval("$a = array_fill(0, 200000, rand());
4>           return count($a);", Ref).
{ok,<<>>,200000,<<>>,continue}
5> php:eval("$a = array_merge($a, array_fill(0, 200000, rand()));
5>           return count($a);", Ref).
{ok,<<>>,400000,<<>>,break}
6> php:eval("return count($a);", Ref).
{ok,<<>>,0,<<"Undefined variable:  a">>,continue}
7> php:release(Ref).
ok
8>

The function reserve/0 removes a PHP process from the pool and returns a key that is used in eval/2. Without a key, we can’t be sure that the same PHP process will evaluate our next string of code. Notice the correct return value of 400000 and the atom break which indicates that the PHP process has been restarted because it exceeded the memory usage limit. Our Ref now points to a fresh PHP process. The reservation remains valid.

There are a few other return tuples: one for timeouts, one for parse errors, and one for exits. That last one includes fatal errors. They can’t be trapped. You’ll just have to refer to your error logs. (You do write code with a terminal tailing all your error logs, don’t you?) Here are some more bullet points to keep in mind:

  • Never define a PHP function without first testing function_exists because you will get a fatal error every time. This is by design.
  • Be mindful of escaping quotes and control characters. User input is the enemy. Test.
  • This app was written by an Erlang novice. Do not underestimate its potential destructive power.
  • Even so, it’s in use on a production system that makes lots of PHP calls.
  • The php module has EDoc for all of the API functions.  The HTML version is included for completeness.
  • If you modify the PHPLOOP, you should restart the PHP processes. Try php:restart_all().

The code is here. All you have to do is compile it. I like to make:all(). The configuration is in php.app.

If you would like to contribute changes to the code or documentation, I will be happy to hear from you. My OTP stuff could benefit from a more experienced set of hands.

GOTO in PHP 5.3

A while back I was dreaming of GOTO in PHP. Now it’s in PHP 5.3, which is alpha 1 as of last week. GOTO is documented as undocumented.

It might be three years before I can use it but I feel good knowing it’ll be there.

Follow

Get every new post delivered to your Inbox.

Join 1,667 other followers