Updates from February, 2012

  • Andy Skelton 3:45 pm on February 14, 2012 Permalink  

    Atomically update serialized PHP arrays in MySQL 

    Okay, okay, it’s hard to find a use case for this when it’s so obvious that the correct way to handle one-to-many is with JOIN. But if you’re already committed to your schema and you decide you need to append serialized PHP data to a row atomically, you can cons serialized values with this query:

    INSERT INTO tbl
      …
      serialized = "i:1;"
      ON DUPLICATE KEY UPDATE
        serialized = CONCAT(
          'a:3:{i:0;s:4:"cons";i:1;',
          VALUES(serialized),
          'i:2;',
          serialized,
          '}'
        )

    After you have performed this three times with the serialized values 1, 2, and 3, the row contains this:

    'a:3:{i:0;s:4:"cons";i:1;i:3;i:2;a:3:{i:0;s:4:"cons";i:1;i:2;i:2;i:1;}}'

    After unserializing, deconstruct it with this function:

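    function decons($list) {
        $res = array();
        while ( $list != array() ) {
            // A cons cell is array( 'cons', newest value, older list ).
            if ( is_array( $list ) && isset( $list[0] ) && $list[0] === 'cons' ) {
                array_unshift( $res, $list[1] );
                $list = $list[2];
            } else {
                // Anything else is the first value ever stored.
                array_unshift( $res, $list );
                break;
            }
        }
        return $res;
    }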

    The result:

    array(1, 2, 3)
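
    To check the round trip without a database, here is a minimal sketch that rebuilds the string in PHP. The helper names cons_in_php() and decons_checked() are hypothetical; the first only imitates what the CONCAT() does to the column, and the second is the same walk as decons() with a guard for the innermost scalar.

```php
<?php
// Hypothetical stand-in for the query: returns what the `serialized` column
// would hold after storing $new when the row currently holds $existing.
function cons_in_php(string $new, ?string $existing): string {
    if ($existing === null) {
        return $new; // no duplicate key yet, so this is a plain INSERT
    }
    // Same pieces, in the same order, as the CONCAT() call.
    return 'a:3:{i:0;s:4:"cons";i:1;' . $new . 'i:2;' . $existing . '}';
}

// Walks the nested cons cells from newest to oldest.
function decons_checked($list): array {
    $res = array();
    while (is_array($list) && isset($list[0]) && $list[0] === 'cons') {
        array_unshift($res, $list[1]);
        $list = $list[2];
    }
    if ($list !== array()) {
        array_unshift($res, $list); // the first value ever stored
    }
    return $res;
}

$row = null;
foreach (array(1, 2, 3) as $value) {
    $row = cons_in_php(serialize($value), $row);
}
$values = decons_checked(unserialize($row));
var_export($values);
```

    Each update wraps the previous contents, so the newest value is always outermost and the oldest is the bare scalar at the center.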

    I haven’t actually used it (probably never will) but you are welcome to try this at home!

    Proving that this is stupid is left as an exercise for the reader.

     
  • Andy Skelton 3:56 pm on January 5, 2011 Permalink  

    WordPress code surprise: wp_sprintf 

    [Photo: _DSC3433, by Ulf Wendel, on Flickr]

    Everybody loves PHP’s sprintf(). We use it everywhere. There are just some things it doesn’t do, like format lists in sentences. Three years ago, while working on the Media Library, I needed a way to list categories or tags in a sentence. And, by golly, it would need Oxford commas and localization to be worthy of WordPress.

     

    Problem solved

    I wrote wp_sprintf() as a wrapper for sprintf(). It uses WordPress filters to customize the formatting directives. WordPress only uses this in one place, the function where I needed to list categories and tags: get_the_taxonomies(). This is used in standard template tags so I estimate wp_sprintf() has run at least a trillion times without fanfare. The time has come to give wp_sprintf() the attention it deserves.

    wp_sprintf

    The calling interface is identical to sprintf():

    string wp_sprintf ( string $format [, mixed $args [, mixed $... ]] )

    Internally, wp_sprintf() splits the format string into fragments that each begin with a single ‘%’. It sends each fragment through the 'wp_sprintf' filter along with the matching argument, honoring numbered directives (‘%1$s’). If the filter does not modify the fragment, the fragment is passed through sprintf(). The processed fragments are concatenated and returned.

    Functionally, wp_sprintf() should be identical to sprintf() until a filter is added which implements a new formatting directive, or supersedes any of the standard ones. It is certainly less optimized so it should only be used when a customized directive is needed.
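
    The fragment-and-filter flow can be sketched roughly as follows. This is not the WordPress source: sketch_sprintf() and its $filter callback are hypothetical names, and for brevity the sketch hands the filter only the directive rather than the whole fragment. It also understands only bare directives such as %s and %2$s, with no width or padding specifiers.

```php
<?php
// Simplified illustration of "filter first, fall back to sprintf()".
function sketch_sprintf(string $format, callable $filter, ...$args): string {
    // Odd-numbered parts are directives ("%s", "%2$s", "%%"); even ones are literal text.
    $parts = preg_split('/(%(?:\d+\$)?[a-zA-Z%])/', $format, -1, PREG_SPLIT_DELIM_CAPTURE);
    $next = 0; // index of the next positional argument
    $out = '';
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            $out .= $part;
            continue;
        }
        if ($part === '%%') {
            $out .= '%';
            continue;
        }
        if (preg_match('/^%(\d+)\$(.)$/', $part, $m)) {
            // Numbered directive: pick the argument by its 1-based number.
            $arg = $args[(int) $m[1] - 1] ?? '';
            $directive = '%' . $m[2];
        } else {
            $arg = $args[$next++] ?? '';
            $directive = $part;
        }
        $filtered = $filter($directive, $arg);
        // An unchanged directive means no filter claimed it.
        $out .= ($filtered !== $directive) ? $filtered : sprintf($directive, $arg);
    }
    return $out;
}

// Hypothetical custom directive %U: upper-case the argument.
$upper = function ($directive, $arg) {
    return $directive === '%U' ? strtoupper($arg) : $directive;
};
$line = sketch_sprintf('%s: %U (%2$s) 100%%', $upper, 'Tags', 'cats');
```

    The extra splitting and callback per directive is the overhead the paragraph above warns about.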

    wp_sprintf_l

    The only 'wp_sprintf' filter now in core is wp_sprintf_l(), which adds a new directive to format arrays into lists, %l:

    wp_sprintf('%s: %l.', 'Tags', array('Cats', 'Dogs', 'Birds'));
    => 'Tags: Cats, Dogs, and Birds.'

    Our filter, wp_sprintf_l(), receives a format fragment and the argument that corresponds to its position or number. It returns the first fragment, ‘%s: ‘, unchanged. When it sees ‘%l.’ it replaces the %l directive with a formatted list.

    WordPress always tries to make text beautiful. To that end, wp_sprintf_l formats lists with Oxford commas. Two items are “cats and dogs”. Three or more items are “cats, dogs, and birds”. However, if you don’t like Oxford commas you can remove them:

    function remove_oxford_commas( $separators ) {
        $separators[ 'between_last_two' ] = ' and ';
        return $separators;
    }
    add_filter( 'wp_sprintf_l', 'remove_oxford_commas' );

    Our list formatter also respects language differences. WordPress translations include wp_sprintf_l()‘s separators so that, for example, the Spanish translation always separates the last two items with ‘ y ‘ with no comma.
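
    Outside of WordPress, the list rules can be sketched like this. sketch_format_list() is a hypothetical name; the separator keys mirror the ones I believe core uses, though only 'between_last_two' appears in the filter example above, so treat the other two as assumptions.

```php
<?php
// Joins items into a sentence-style list using three separators.
function sketch_format_list(array $items, array $separators = array()): string {
    // Caller-supplied separators win; the defaults give Oxford commas.
    $sep = $separators + array(
        'between'          => ', ',
        'between_last_two' => ', and ',
        'between_only_two' => ' and ',
    );
    $items = array_values($items);
    $count = count($items);
    if ($count === 0) {
        return '';
    }
    if ($count === 1) {
        return (string) $items[0];
    }
    if ($count === 2) {
        return $items[0] . $sep['between_only_two'] . $items[1];
    }
    $last = array_pop($items);
    return implode($sep['between'], $items) . $sep['between_last_two'] . $last;
}

$two     = sketch_format_list(array('Cats', 'Dogs'));
$three   = sketch_format_list(array('Cats', 'Dogs', 'Birds'));
$plain   = sketch_format_list(array('Cats', 'Dogs', 'Birds'), array('between_last_two' => ' and '));
$spanish = sketch_format_list(
    array('Gatos', 'Perros', 'Aves'),
    array('between_last_two' => ' y ', 'between_only_two' => ' y ')
);
```

    Keeping the separators in one array is what lets a filter or a translation swap them without touching the joining logic.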

    You can use wp_sprintf() with the ‘%l’ directive anywhere in WordPress since 2.5.0. It’s a lot easier than writing a foreach loop every time.

    Something new

    Recently a prominent bug crept into some code when the integer argument for %d was wrapped in a call to number_format(). Everything after the first comma was lost. I wished there were a formatting directive to mimic number_format.
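
    A minimal reproduction, assuming the bug looked roughly like this (the variable names and the " items" suffix are made up):

```php
<?php
$count = 1234567;
$formatted = number_format($count);         // a string with thousands separators

// %d casts the string to an integer, and the cast stops at the first comma.
$broken = sprintf('%d items', $formatted);

// %s leaves the already formatted string alone.
$fixed = sprintf('%s items', $formatted);

echo $broken, "\n", $fixed, "\n";
```

    The cast is silent, which is presumably how the bug slipped through.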

    While I’m thinking about it, here’s a possibility: ‘%n’. It should automatically localize the thousands separator and decimal point; this alone makes it a compelling upgrade. It should also accept at least one specifier for precision. The other specifiers (sign, padding, alignment, width) would be nice but not necessary.

    [Image: Lisp alien, via Wikipedia]

    p.s. Lisp rocks!

    Yesterday I finished reading Conrad Barski’s entertaining Lisp primer, Land of Lisp. He gives only a whiff of Lisp’s format function but I was blown away. It provides tabulation, justification, iteration, recursion, conversion, conditions, and never mind making a cup of coffee, it could run a chain to compete with Starbucks. Here’s a taste:

    (format nil
            "~{~a~#[~;, and ~:;, ~]~}"
            (list "Cats" "Dogs" "Birds"))
    "Cats, Dogs, and Birds"
     
    • Andrew Nacin 10:14 pm on January 19, 2011 Permalink | Reply

      I ran into this function a few months ago, and found it to be quite cryptic. After experimenting with it, I realized what its purpose was and how awesome it was. Thanks for the writeup explaining its history.

      Oxford comma FTW.

  • Andy Skelton 5:46 pm on February 11, 2009 Permalink  

    Idea drop: db::multi_query 

    We usually do things sequentially in PHP. Any point where we can get two things done at the same time is an opportunity to reduce the total execution time. It is generally safe to execute processes in parallel when no process has a side effect that may impact any other process. Most MySQL SELECT statements fall into this class. Under certain conditions it might help to run these queries in parallel.

    So I got to thinking: does PHP provide a function to issue a query and return immediately, instead of waiting while the database churns? Such a function would let me spread a batch of queries across several servers and then collect the results. It should not involve forking or making OS calls. Maybe mysql_unbuffered_query is that function.

    Assuming it is, here is the basis for a parallel query system in PHP. It is obviously incomplete. I may complete it and try it out when I need to query a dataset partitioned across separate MySQL instances. The function connect_reserved($query) returns a locked MySQL link identifier, opening new connections as needed. The function release($query) removes the lock, returning the link to the pool of available database connections.

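    function multi_query($queries) {
        $res = array();
        $ret = array();

        // Send every query without waiting to collect its rows.
        foreach ( $queries as $i => $query ) {
            $link = connect_reserved($query);
            $res[$i] = mysql_unbuffered_query($query, $link);
            $ret[$i] = array();
        }
    
        // Take one row from each open result per pass until all are drained.
        do {
            foreach ( $res as $i => $r ) {
                // A failed query yields false, so release its link right away.
                if ( $r && ( $row = mysql_fetch_row($r) ) ) {
                    $ret[$i][] = $row;
                } else {
                    release($queries[$i]);
                    unset($res[$i]);
                }
            }
        } while ( count($res) );
    
        return $ret;
    }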

    I have not used mysql_unbuffered_query. For best results, it should return as soon as the server determines that the query is valid. I have assumed that it can return while the database is still looking for records. (If it cannot, this whole idea should be forgotten.) This oversimplified diagram helps illustrate how the queries run in parallel. The green line shows the beneficial overlap of query processing time.

    [Diagram: quasi-parallel queries in PHP]
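
    One candidate for the return-immediately behavior is the asynchronous mode of the mysqli extension, which needs mysqli built on mysqlnd. The following is an untested sketch under that assumption: multi_query_async() is a hypothetical name, and $links is assumed to be an array of already connected mysqli objects, one per query, standing in for connect_reserved().

```php
<?php
// Sketch only. $links holds one connected mysqli object per query,
// keyed the same way as $queries.
function multi_query_async(array $links, array $queries): array {
    $ret = array();
    $pending = array();
    foreach ($queries as $i => $query) {
        // MYSQLI_ASYNC makes query() return before the result is ready.
        $links[$i]->query($query, MYSQLI_ASYNC);
        $pending[$i] = $links[$i];
        $ret[$i] = array();
    }

    while ($pending) {
        $read = $error = $reject = array_values($pending);
        // Block for up to one second until some connection has a result.
        $ready = mysqli_poll($read, $error, $reject, 1);
        if ($ready === false) {
            break; // polling itself failed; give up on the remaining queries
        }
        foreach ($read as $link) {
            $i = array_search($link, $pending, true);
            $result = $link->reap_async_query();
            if ($result instanceof mysqli_result) {
                $ret[$i] = $result->fetch_all(MYSQLI_NUM);
                $result->free();
            }
            unset($pending[$i]);
        }
        // Connections that errored or were rejected will never become readable.
        foreach (array_merge($error, $reject) as $link) {
            $i = array_search($link, $pending, true);
            if ($i !== false) {
                unset($pending[$i]);
            }
        }
    }
    return $ret;
}
```

    Like the version above, this needs one connection per query in flight. Depending on the mysqli error mode, a failed query may surface as an exception from reap_async_query() rather than a false return.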

     
    • Otto 6:28 pm on February 11, 2009 Permalink

      Don’t think that’ll work.

      “Note: The benefits of mysql_unbuffered_query() come at a cost: You cannot use mysql_num_rows() and mysql_data_seek() on a result set returned from mysql_unbuffered_query(). You also have to fetch all result rows from an unbuffered SQL query, before you can send a new SQL query to MySQL.”

      So you can’t send a new query until you’ve fetched all the previous rows.

    • Otto 6:33 pm on February 11, 2009 Permalink

      Additional: Whoops, I didn’t see that you were using multiple DB connections. In that case, yes, this’ll work, but you’ll have to have one DB connection per parallel query.

      Also, when you do an unbuffered query, then it does wait for the database search to actually occur, and it even retrieves the first row of the result set into php for you. It just doesn’t retrieve the remaining rows from mySQL until you actually ask for them with a fetch_row.

    • Andy 6:59 pm on February 11, 2009 Permalink

      What about this:

      On the other hand, you can start working on the result set immediately after the first row has been retrieved: you don’t have to wait until the complete SQL query has been performed.

    • apokalyptik 1:30 am on February 12, 2009 Permalink

      its times like these when i really wish php had some form of threading…

    • apokalyptik 2:28 am on February 12, 2009 Permalink

      you might be able to emulate something like this with pcntl_fork()….

      i dont know how well this will paste… but… this illustrates the idea…

      $query ) {
              $sockets[$idx] = fork_query($query, $idx);
              $results[$idx]['query'] = $query;
              $results[$idx]['rows'] = array();
          }
          while ( true ) {
              $poll = array_values($sockets);
              if ( false === socket_select($poll, $w=null, $e=null, 0) )
                  break;
              foreach ( $poll as $idx => $sock ) {
                  $res = unserialize(base64_decode(socket_read($sock, 80960, PHP_NORMAL_READ)));
                  if ( $res === false ) {
                      socket_close($sock);
                      break;
                  }
                  $id = $res[0];
                  $results[$id]['rows'][] = $res[1];
              }
              foreach( $sockets as $idx => $val ) {
                  if ( 'Socket' != get_resource_type($val) )
                      unset($sockets[$idx]);
              }
              if ( !count($sockets) )
                  break;
          }
          return $results;
      }

      print_r(multi_query(array("blah1", "blah2", "blah3")));

      ?>

    • apokalyptik 2:29 am on February 12, 2009 Permalink

      uhh… how about this time with less suckage… http://pastebin.com/f7c242f82

    • Felix Geisendörfer 6:09 am on February 12, 2009 Permalink

      apokalyptik: pcntl_fork is not perfect but you can do some threading-type-of-stuff with it.

    • Joseph Scott 10:16 pm on March 23, 2009 Permalink

      The MySQLi Poll function (when built with mysqlnd) might be good for this:

      http://us.php.net/manual/en/mysqli.poll.php

  • Andy Skelton 6:30 am on February 6, 2009 Permalink  

    Persistent PHP processes in Erlang/OTP 

    Running PHP code from within Erlang is easy: os:cmd("php -r 'echo \"Hello, World!\";'"). This is fine when you need to run simple commands. When you demand more from PHP, this approach becomes awkward, wasteful, and eventually unusable. If your Erlang-to-PHP calls require large PHP applications, open connections to databases, or otherwise incur significant initialization overhead, you should maintain a pool of reusable PHP processes.

    My first complete application for Erlang/OTP is php_app. It manages a pool of persistent PHP processes and provides a simple API to evaluate PHP code. I designed php_app to be robust and easy to use. It’s so easy, in fact, that I now use it to debug WordPress functions from within Erlang. Here is a sample session using start/0 and eval/1:

    $ erl
    Eshell V5.6.4  (abort with ^G)
    1> php:start().
    ok
    2> php:eval("echo 'Hello, World!';
    2>           trigger_error('Uh-oh!');
    2>           return array(true, true);").
    {ok,<<"Hello, World!">>,
        [{0,true},{1,true}],
        <<"Uh-oh!">>,continue}
    3> 

    In the resulting tuple we have the output, the return value, and the last error. The atom continue indicates that the PHP process is eligible for reuse, determined by its size in memory after evaluating my code. In the next example I’ll reserve a PHP process to demonstrate persistence and what happens when we hit the memory limit.

    3> Ref = php:reserve().
    #Ref<0.0.0.52>
    4> php:eval("$a = array_fill(0, 200000, rand());
    4>           return count($a);", Ref).
    {ok,<<>>,200000,<<>>,continue}
    5> php:eval("$a = array_merge($a, array_fill(0, 200000, rand()));
    5>           return count($a);", Ref).
    {ok,<<>>,400000,<<>>,break}
    6> php:eval("return count($a);", Ref).
    {ok,<<>>,0,<<"Undefined variable:  a">>,continue}
    7> php:release(Ref).
    ok
    8>
    

    The function reserve/0 removes a PHP process from the pool and returns a key that is used in eval/2. Without a key, we can’t be sure that the same PHP process will evaluate our next string of code. Notice the correct return value of 400000 and the atom break which indicates that the PHP process has been restarted because it exceeded the memory usage limit. Our Ref now points to a fresh PHP process. The reservation remains valid.

    There are a few other return tuples: one for timeouts, one for parse errors, and one for exits. That last one includes fatal errors. They can’t be trapped. You’ll just have to refer to your error logs. (You do write code with a terminal tailing all your error logs, don’t you?) Here are some more bullet points to keep in mind:

    • Never define a PHP function without first testing function_exists because you will get a fatal error every time. This is by design.
    • Be mindful of escaping quotes and control characters. User input is the enemy. Test.
    • This app was written by an Erlang novice. Do not underestimate its potential destructive power.
    • Even so, it’s in use on a production system that makes lots of PHP calls.
    • The php module has EDoc for all of the API functions.  The HTML version is included for completeness.
    • If you modify the PHPLOOP, you should restart the PHP processes. Try php:restart_all().
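
    The first bullet is easy to see from plain PHP. This sketch assumes the persistent process evaluates each string with something like PHP's eval(); the function name greet() is made up.

```php
<?php
// Code as it might be sent to a persistent PHP process more than once.
$code = <<<'PHP'
if ( ! function_exists('greet') ) {
    function greet($name) {
        return "Hello, $name!";
    }
}
return greet('World');
PHP;

// The first evaluation defines greet(); the second finds it already defined
// and skips the definition instead of dying with "Cannot redeclare".
$first  = eval($code);
$second = eval($code);
```

    Without the function_exists() guard, the second eval() would hit a fatal error, because the function survives in the process between calls.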

    The code is here. All you have to do is compile it. I like to make:all(). The configuration is in php.app.

    If you would like to contribute changes to the code or documentation, I will be happy to hear from you. My OTP stuff could benefit from a more experienced set of hands.

     
    • Felix Geisendörfer 3:45 pm on February 6, 2009 Permalink | Reply

      Hey Andy, do you work on any project using Erlang and PHP or is this just for the fun of it? I actually just thought about how Erlang might be able to help me with certain aspects of the application I am working on today and then a friend passes me this link ; ).

    • Andy 4:35 pm on February 6, 2009 Permalink | Reply

      Felix: Yes, WordPress.com’s new firehose is available as an XMPP PubSub node implemented with ejabberd.

    • kidsenergyburner 11:36 am on February 7, 2009 Permalink | Reply

      They say that the moment you start to look for something Universe will provide it :)

      Our team gravitates to start new development based on Erlang. Amazing technology! Now even WordPress is talking about it. Do you use RabbitMQ, CouchDB ? It would be interesting to get an opinion of High volume traffic site like WordPress about these technologies. So far information is scarce.

    • Russ Garrett 6:50 am on February 8, 2009 Permalink | Reply

      I was working on something almost identical to this, but I hadn’t finished it – now you’ve saved me the trouble :). Cheers!

    • Hover 9:23 am on February 9, 2009 Permalink | Reply

      We are probably the first company to use Erlang for our in-text content platform development using Erlang / Mnesia / Yaws… We could not have made a better choice…

    • Jimmy 9:10 pm on April 6, 2009 Permalink | Reply

      I found a new PHP extension for PHP/Erlang integration.

      http://code.google.com/p/mypeb/

    • Angel Alvarez 12:49 pm on April 9, 2009 Permalink | Reply

      Hi Andy

      Very cute concept. You should take a look at FastCGI. There is a PHP FastCGI implementation which takes care of memory and worker processes. You connect to a FastCGI PHP server with a socket and don’t need to take care of housekeeping for the pool.

      I’m learning how you did the eval loop on every port process and I saw the similarities with FastCGI.

      Anyway, great job. I’ll stay tuned for more Erlang pearls!!

    • Pedram 5:59 pm on July 21, 2009 Permalink | Reply

      FastCGI though maybe more efficient, it is not as grid based and server transparent

    • niahoo 4:35 pm on January 30, 2011 Permalink | Reply

      Hello,

      are you still maintaining php_app, or have you got a finished version ? i’m really interested.

      Thanks

    • laurei 5:08 pm on August 22, 2013 Permalink | Reply

      I’d like to get this running after cowboy requests. I’m interested in how it handles $_SERVER[] vars, and how to inject them.

  • Andy Skelton 9:42 pm on August 7, 2008 Permalink
    Tags: goto, PHP 5.3   

    GOTO in PHP 5.3 

    A while back I was dreaming of GOTO in PHP. Now it’s in PHP 5.3, which is alpha 1 as of last week. GOTO is documented as undocumented.

    It might be three years before I can use it but I feel good knowing it’ll be there.

     
    • _ck_ 4:09 pm on September 10, 2008 Permalink | Reply

      Ha! Slashdotters and such will have a field day mocking PHP when they find out about that one.

      But it does make me reminiscent for my good old BASIC days :-)

      I guess by “three years” you mean before 5.3 is mainstream. Because it will certainly go gold by the end of this year unless something goes horribly wrong. Really looking forward to its speed boost to finally escape PHP4…

    • Jacob Santos 1:44 pm on October 1, 2008 Permalink | Reply

      Well, not really, you can only GOTO a label. If they are mocking PHP, then they really don’t understand why GOTO was horrible practice.

    • Ian Lewis 4:05 am on July 2, 2009 Permalink | Reply

      Goto in Object oriented PHP5. Insanity.

      I guess the slashdotters will be mocking it because it’s a bad idea that encourages sloppy design.

    • Joey 5:06 pm on August 24, 2009 Permalink | Reply

      There’s enough shocking PHP code out there as it is.
      Half the “PHP is revolutionary” camp is all about OO. Then they go and do this.

    • eggins10 5:11 am on November 9, 2009 Permalink | Reply

      Apparently when I confronted the dev team years back about making class methods use late binding I was told it was “incorrect” and “bad” OOP …

      Oh but using GOTO is “good” OOP ;-)
