Mats Lindh is a PHP, Python and Java developer located in Hvaler, Norway. His main interests include digital mapping, search and scalability. Mats is a DZone MVB (not a DZone employee) and has posted 28 posts at DZone. You can read more at his website.

Retrieving URLs in Parallel With CURL and PHP

11.15.2011

As we’ve recently added support for querying Solr servers in parallel, one of the things we needed was a simple class that lets us query several servers at the same time. The cURL library (which has a PHP extension) even provides an abstraction layer, curl_multi, that does the nitty-gritty work for you, as long as you keep track of the resources. The code below is based on examples in the documentation, with a few tweaks of my own.

The code below is licensed under an MIT license. You can also download the file (gzipped).



    class Footo_Content_Retrieve_HTTP_CURLParallel
    {
        /**
         * Fetch a collection of URLs in parallel using cURL. The results are
         * returned as an associative array, with the URLs as the key and the
         * content of the URLs as the value.
         *
         * @param array<string> $addresses An array of URLs to fetch.
         * @return array<string> The content of each URL that we've been asked to fetch.
         **/
        public function retrieve($addresses)
        {
            $multiHandle = curl_multi_init();
            $handles = array();
            $results = array();
     
            foreach($addresses as $url)
            {
                $handle = curl_init($url);
                $handles[$url] = $handle;
     
                curl_setopt_array($handle, array(
                    CURLOPT_HEADER => false,
                    CURLOPT_RETURNTRANSFER => true,
                ));
     
                curl_multi_add_handle($multiHandle, $handle);
            }
     
            // execute the handles
            $result = CURLM_CALL_MULTI_PERFORM;
            $running = false; // set by reference by curl_multi_exec()
     
            // set up and make any requests..
            while ($result == CURLM_CALL_MULTI_PERFORM)
            {
                $result = curl_multi_exec($multiHandle, $running);
            }
     
            // wait until data arrives on all sockets
            while($running && ($result == CURLM_OK))
            {
                if (curl_multi_select($multiHandle) > -1)
                {
                    $result = CURLM_CALL_MULTI_PERFORM;
     
                    // while we need to process sockets
                    while ($result == CURLM_CALL_MULTI_PERFORM)
                    {
                        $result = curl_multi_exec($multiHandle, $running);
                    }
                }
            }
     
            // clean up
            foreach($handles as $url => $handle)
            {
                $results[$url] = curl_multi_getcontent($handle);
     
                curl_multi_remove_handle($multiHandle, $handle);
                curl_close($handle);
            }
     
            curl_multi_close($multiHandle);
     
            return $results;
        }
    }
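
Calling the class is straightforward: pass an array of URLs and read the results back keyed by URL. A minimal usage sketch follows — the require path and the URLs are placeholders of my own, so adjust them to your file layout:

```php
<?php
// Hypothetical path; use your own include path or autoloader.
require_once 'Footo/Content/Retrieve/HTTP/CURLParallel.php';

$fetcher = new Footo_Content_Retrieve_HTTP_CURLParallel();

// Placeholder URLs; all requests are performed in parallel.
$contents = $fetcher->retrieve(array(
    'http://example.com/',
    'http://example.org/',
));

// $contents is keyed by URL, with the response body as the value.
foreach ($contents as $url => $body)
{
    echo $url . ': ' . strlen($body) . " bytes\n";
}
```

Since the keys of the returned array are the URLs themselves, you can match each response back to its request without tracking the order in which the transfers completed.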


Source: http://e-mats.org/2010/01/retrieving-urls-in-parallel-with-curl-and-php/
Published at DZone with permission of Mats Lindh, author and DZone MVB.

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)
