Google Changing Adwords URL Structure

 


Well, yesterday was a very fun day. It appears that Google has changed the URL structure of AdWords links. If you collect AdWords data, I'm sure you were as thrilled as I was to see your script racing through 600 keywords per minute, which is what happens when it stops finding any ads to parse.

I was making changes to my server when it happened, so I didn't immediately catch on, but after two cron runs found no ads I decided to investigate.

It appears that Google has decided to become a little more compliant and has added quotes around class declarations. They also changed the URL parameter from &adurl to the easily recognizable &q.
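If you want a scraper that survives this kind of switch, one option is to accept both parameter names and both quoted and unquoted attributes. The following is only a rough sketch: extractAdDestination() is a hypothetical helper, and the sample markup is made up for illustration rather than copied from a real results page.

```php
<?php
// Hypothetical helper: pulls the destination URL out of one ad anchor,
// accepting either the old &adurl= or the new &q= parameter.
function extractAdDestination($anchorHtml)
{
    // Treat "&amp;" as "&" so both encodings of the separator match.
    $anchorHtml = str_replace("&amp;", "&", $anchorHtml);

    // The quote after href= is optional, so unquoted attributes match too.
    if (preg_match('/href=["\']?[^"\'>]*?&(?:adurl|q)=([^"\'>\s&]+)/i', $anchorHtml, $m)) {
        return urldecode($m[1]);
    }
    return null;
}

// Made-up sample anchors in the old and new styles (assumed shapes).
$old = '<a id=pa1 href=/url?sa=l&amp;adurl=http://example.com/>Example</a>';
$new = '<a id="pa1" href="/url?sa=l&amp;q=http://example.com/&amp;usg=abc">Example</a>';

$oldDest = extractAdDestination($old);
$newDest = extractAdDestination($new);
?>
```

Note that this sketch stops the captured URL at the next "&", so a destination containing an unencoded ampersand would be cut short. That is a simplification chosen for the example.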

I've decided to release my rewritten function, which correctly matches ads on the new URL structure. Please note that you'll have to add a way to remove slashes if you're going to insert the results into a database (I've been fiddling around with that, but it doesn't seem to be working). I should mention that this function is designed for http://google.com/sponsoredlinks?q=keyword; I haven't gotten around to fixing the natural search result functions yet. Enjoy.
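On the database side, one way to sidestep manual slash handling altogether is a prepared statement, which sends the values separately from the SQL. This is a different technique from adding or stripping slashes, and the sketch below rests on assumptions: the PDO SQLite driver is available, the table name and columns are invented, and the sample ad is hypothetical data shaped like one entry of the function's output.

```php
<?php
// Hypothetical ad, shaped like one entry returned by getSponsoredAds().
$ad = array(
    "desturl" => "http://example.com/?a=1",
    "subject" => "Bob's \"Best\" Widgets",
    "body"    => "It's a back\\slash test",
    "dispurl" => "example.com",
);

// In-memory SQLite keeps the sketch self-contained; swap the DSN for your own database.
$db = new PDO("sqlite::memory:");
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE ads (desturl TEXT, subject TEXT, body TEXT, dispurl TEXT)");

// Placeholders carry the values, so quotes and backslashes need no escaping.
$stmt = $db->prepare("INSERT INTO ads (desturl, subject, body, dispurl) VALUES (?, ?, ?, ?)");
$stmt->execute(array($ad["desturl"], $ad["subject"], $ad["body"], $ad["dispurl"]));

// Read the subject back to confirm it was stored unchanged.
$stored = $db->query("SELECT subject FROM ads")->fetchColumn();
?>
```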

Here is the working code:

<?php

// Extracts the sponsored ads from the HTML of a
// http://google.com/sponsoredlinks?q=keyword results page.
// Returns an array of ads with the keys desturl, subject, body and dispurl.
function getSponsoredAds($str)
{
    // Markers that delimit the parts of an ad in the new markup.
    $spartstart = '<div id="tpa';
    $spartend = '</div>';
    $slinkstart = '<a id="pa';
    $slinkend = '</a>';
    $sdestlink = "&q=";
    $scontentstart = '<font size="-1">';
    $scontentend = '</font>';
    $stxtcontentstart = '</span>';
    $stxtcontentend = '</font>';
    $sspanstart = '<span class="a">';
    $sspanend = '</span>';

    $gad = array();

    // Strip line breaks and tabs, and turn "&amp;" into "&".
    $str = str_replace(array("\n", "\r", "\t", "amp;"), "", $str);

    // Each match in $out[1] is one ad block.
    preg_match_all("|(" . $spartstart . "(.*)" . $spartend . ")|U", $str, $out);
    for ($x = 0; $x < count($out[1]); $x++)
    {
        $ad = array();

        // The ad's link: the destination URL follows the &q= parameter.
        preg_match_all("|(" . $slinkstart . "(.*)" . $slinkend . ")|U", $out[1][$x], $out_1);
        $anchor = isset($out_1[1][0]) ? $out_1[1][0] : "";
        preg_match_all("|<[aA].+[hH][rR][eE][fF]=.+" . preg_quote($sdestlink, "|") . "([^[>\s'\"]+)[\'\" >]|U", $anchor, $link, PREG_PATTERN_ORDER);
        preg_match_all("|<[aA].+>(.+)</[aA]>|U", $anchor, $linktext, PREG_PATTERN_ORDER);
        if (isset($link[1][0]) && $link[1][0] != "")
        {
            $ad["desturl"] = urldecode($link[1][0]);
        }
        else
        {
            $ad["desturl"] = "No URL";
        }
        $ad["subject"] = isset($linktext[1][0]) ? $linktext[1][0] : "";

        // The body text runs from a closing </span> to the closing </font>.
        preg_match_all("|(" . $scontentstart . "(.*)" . $scontentend . ")|U", $out[1][$x], $out_1);
        $content = isset($out_1[1][0]) ? $out_1[1][0] : "";
        preg_match_all("|(" . $stxtcontentstart . "(.*)" . $stxtcontentend . ")|U", $content, $out_1);
        $ad["body"] = isset($out_1[1][0]) ? $out_1[1][0] : "";

        // The display URL is the text of the <span class="a"> element.
        preg_match_all("|(" . $sspanstart . "(.*)" . $sspanend . ")|U", $out[1][$x], $out_1);
        $ad["dispurl"] = strip_tags(html_entity_decode(isset($out_1[2][0]) ? $out_1[2][0] : ""));

        $ad["subject"] = strip_tags(html_entity_decode($ad["subject"]));
        // Turn <br> tags into spaces so the lines of the body don't run together.
        $ad["body"] = preg_replace("(<[bB][rR]([ ]+)?(/)?>)", " ", $ad["body"]);
        $ad["body"] = strip_tags(html_entity_decode($ad["body"]));

        // Cut off the &usg= parameter and anything after it.
        $usgcode = strpos($ad["desturl"], "&usg=");
        if ($usgcode !== false)
        {
            $ad["desturl"] = substr($ad["desturl"], 0, $usgcode);
        }

        $gad[] = $ad;
    }
    return $gad;
}
?>
