Google

Wait a minute... those aren't my stats!

Using Google Analytics Include Filters to Remove the Other Guy's Data

Traffic here at Nerdliness is generally pretty level.  Sure, there's the occasional spike on days we post new content, and the overall trend is in the upward direction, but viewed over a monthly timeline the graphs are roughly flat.

A couple of weeks ago, I was doing my daily OCD-ish perusal of the Nerdliness.com Google Analytics reports and started seeing something odd:  a substantial bump in visitors.  My first thought was, of course, that Someone Important had recently discovered our sheer awesomeness and was preaching our gospel, but reality soon set in.

After investigating a little further, I noticed that the Content reports showed that all this traffic was directed at a page that doesn't exist.  I looked through our Apache access and error logs, trying to find any references to those URLs and came up with nothing.  

On the off chance that I might have actually made some sort of mistake, I also compared the Google Analytics report to our Google AdSense data.  Our 404 page has a couple of AdSense blocks on it, so I figured that, if it was the result of a bad link and visitors getting a 404 error, we'd see a proportional increase in AdSense impressions.

Nothing.

Looked around at every report, log, and portent I could think of and found nothing.  Double-checked our AdSense code snippet, looked good.  At that point, I was reasonably sure that it wasn't on our end and started to suspect that, perhaps, someone else was using our GA code.

So I emailed Google Analytics support. 

(Quick side note:  GA support contact link was a pain in the pooper to find.  If you ever need to contact them yourself, you first have to go through their Google Analytics Troubleshooter and just through a few hoops.  Assuming you don't find your answer, you should get a contact form at the end. 

Good news is that they were very quick to respond, replying to my ticket well within the promised 24 hours.  Good on you, Google.)

GA support didn't 100% confirm that someone was using our code, but their canned message did suggest that was the most likely cause:

Finding information about traffic to domains that are not yours in your reports is possibly the result of someone accidentally entering the wrong code on their own site, or borrowing/displaying some of your website's code for their website.

Even better, they included a possible fix:

If you're concerned about this data corrupting your own reports, Analytics can easily filter on a specific domain so you can avoid this problem. We recommend creating an 'Include' filter on your own domain:

Filter Type: Custom filter > Include
Filter Field: Hostname
Filter Pattern: your-domain-name.com
Case Sensitive: No

Sweet.

In case you don't know, you get to the Filter setup by:

  1. Log in to Google Analytics (duh).
  2. Click the Edit link for the domain you want to Filter.  It's in the far right column, under the Actions heading.
  3. Click the "+Add Filter" link.  It's in the third section down with the heading "Filters Applied to Profile."  The link is on the far right, on the same grey background as the heading title.

Once there, just follow the Google Analytics Support instructions above.

Now, it doesn't look like the filter will do anything to historical data, just info that comes in after the filter is in place.  That said, I think that our new Google Analytics setup will include setting up this input filters as soon as we set up a new domain, just in case.

Miscellaneous:

Google Maps/Amazon/Banned Books Mashup - Part 2

Geocoding with Google Maps API

When we last heard from Our Hero, we’d begun the joyous process of creating a simple Google Maps/Amazon mashup.  We took a peek at what the finished product should look like, we obtained our Google Maps API key, and we created our database tables and populated them with some data.  All good things.

We also talked about geocoding, that process of converting a street address into the coordinates (longitude and latitude) that Google Maps digs.  Our LOCATIONS table has some street addresses and empty columns just hankerin’ for some longitude and latitude info.  Let’s do us some geocodin’.
This zip file contains the source for all the codin' we're doing today:

http://www.nerdliness.com/files/updatelnglat.zip

You'll need those files to follow along at home.  While I'll break 'em down pretty much completely in the paragraphs to follow, I won't be reposting them in their entirety.  If you try to just copy-and-paste from the text below, you'll be sadly disappointed when your efforts blow up because of a missing variable declaration, etc.

 

config.php


Ok, then.  First thing's first.  Before we get into the good stuff, we’re going to create a config file that will hold our database connection info (username, password, etc).  Call it config.php and put it in a scripts directory within the directory you normally put your HTML files.  In other words, if you put your goods in public_html/ normally, this config.php will live in public_html/scripts/config.php. Of course, you can do whatever you want, but if you end up just using my code you’ll need to change the various references accordingly.

Your config.php file should look something like this:

<?php

$dbhost= '[your_db_host_name]';
$dbname = '[your_db_name]';
$dbuser = '[your_db_user_name]';
$dbpasswd = '[your_db_password]';


$link = mysql_connect($dbhost, $dbuser, $dbpasswd)
    or die('Could not connect: ' . mysql_error());

mysql_select_db($dbname) or die('Yo, where da database?');

?>

While this config.php file isn't completely necessary, it'll save you some grief later.  Sure, you could add this info into every file that connects to your database if you like, but we lazy nerds prefer to just throw it all into one file and reference it when needed.

Don’t forget to replace everything in the brackets with the appropriate info for your setup.  Probably ‘localhost’ for the ‘[your_db_host_name]’ part, etc.  If you aren’t certain, you’ll need to contact your ISP or sys admin for the correct info.  And, again, I’m using MySQL, so you might need to modify this a bit if you use PostgreSQL or some other DBMS.

locationmassinsert.html


Now, in the root of the website, I have a file called locationmassinsert.html that's just a simple plain ol' HTML file:

<html>
<head>

<script src=
"http://maps.google.com/maps?file=api&amp;v=2&amp;key=[your_google_maps_api_key]"
type="text/javascript">
</script>

<script type="text/javascript" src="scripts/getcoord.js"></script>
</head>
<body onLoad="getLocFromXML()">

</body>


</html>

Not much to it, is there?  It’s essentially just a wrapper that will call the functions we use to actually pull the data from our table, submit a query to Google to geocode an address, and update our original table with the returned info. 

You could do this a few different ways…  Personally, I like just using a simple HTML page like this in case I want to call it remotely in the future.   I just fire it up in my Firefox and watch it do the rest.  In real life, I’d probably password protect the page to keep people from launching a DDOS attack (Google limits the number of geocoding requests you can make in a day), but since I’m only using it to initially populate my table, I’m not going to worry about that.  Instead, I’ll just make it unreadable when I’m done or maybe pull off the server completely.

Ok, so what are we doing here?   Let's take a closer look at some of the lines in that page.

<script src=
"http://maps.google.com/maps?file=api&amp;v=2&amp;key=[your_google_maps_api_key]"
type="text/javascript">

That’s the big one.  It’s what gives us access to the various Google Maps functions Google lets us use to create these maps.  Make sure you replace the “[your_google_maps_api_key]” part with the key you obtained early.  No need to surround it with quotation marks, just slap that long, ugly key in right after the equals sign.

The second important part is this line:

<script type="text/javascript" src="scripts/getcoord.js"></script>

One of the next steps will be to create that getcoord.js file, but for now just include the reference.  The getcoord.js file will hold all the JavaScript functions we’ll use to get our AJAX on and submit some requests to Google.  Without that reference, this last important line will error out:

<body onLoad="getLocFromXML()">

All we’re saying is that, when this page loads in the browser, it should call the function named “getLocFromXML()” we’ll define in the getcoord.js file.  Simple so far, huh?

 

getcoord.js


If you’re new to JavaScript, this might look pretty much like gibberish at first.  You’ll get the hang of it, though, and pretty quickly.

Our getcoord.js file contains three different functions:  getLocFromXML(), codeAddress(), and insertCoord().  Remember from a couple paragraphs ago that our locationmassinsert.html file does little more than call the getLocFromXML() function we define in this getcoord.js file…?   Here’s what that function looks like:

function getLocFromXML() {
        var ajaxRequest;

        try{
                ajaxRequest = new XMLHttpRequest();
        } catch (e){
                try{
                        ajaxRequest = new ActiveXObject("Msxml2.XMLHTTP");
                } catch (e) {
                        try{
                                ajaxRequest =
        new ActiveXObject("Microsoft.XMLHTTP");
                        } catch (e){
                                alert("Bad juju!");
                                return false;
                        }
                }
        }
        
ajaxRequest.onreadystatechange = function(){
if(ajaxRequest.readyState == 4){
        var xmldoc = ajaxRequest.responseXML.documentElement;
        var locations = xmldoc.getElementsByTagName('location');
       
        for (var i = 0; i < locations.length; i++) {
            var locationid =
        locations[i].getElementsByTagName('locationid')[0].firstChild.nodeValue;
            var street1 =
        locations[i].getElementsByTagName('street1')[0].firstChild.nodeValue;
            var city =
        locations[i].getElementsByTagName('city')[0].firstChild.nodeValue;
            var state =
        locations[i].getElementsByTagName('state')[0].firstChild.nodeValue;
            var zip =
        locations[i].getElementsByTagName('zip')[0].firstChild.nodeValue;
            address = street1 + ' ' + city + ' ' + state + ' ' + zip;
            codeAddress(address, locationid);           
        }
                }
        }
       
        ajaxRequest.open("GET", "scripts/getxmlinfo.php", true);
        ajaxRequest.send(null);
}

Looks a lot more complicated than it really is.  Helluva lot easier to understand this if we break it down. 

The first couple of lines are pretty straightforward.  First, we say that we’re creating a new function called “getLocFromXML” with the syntax “function getLocFromXML()”.  Wow.  Hard.

The next line is just about as complex.  “var ajaxRequest;” just means that we’re declaring a variable called “ajaxRequest”.  It’s nothing but a placeholder at this point, but it’ll mean the world to our app before all’s said and done.

Now here’s where the magic happens.  The first part of our function is what the kids call a “Try/Catch” block.  Like the name implies, we’re telling the system to try a piece of code first.  If it works, great.  If not, catch the error and attempt the second piece of code instead.  If that doesn’t work, try another piece, catch its error, etc.  If they all fail, send an alert that just says “Bad juju!”

All we’re really trying to do with that try/catch block is create an XMLHttpRequest object, the thing we use to send requests to a web server (and receive a response) without having to reload the whole page.  This, folks, is the magic behind AJAX.  

The rest of that try/catch garbage is only in place to provide support for different browsers and their inability to follow standards…  We say “Hey, browser!  Fetch me a XMLHttpRequest object and call it ‘ajaxRequest!’”  If it can do it, great.  We ignore the rest of that section.   But what if it doesn’t know what an XMLHttpRequest object is?  Well, then we tell it we want an ActiveXObject("Msxml2.XMLHTTP") instead.  Still not working?  How ‘bout an ActiveXObject("Microsoft.XMLHTTP")?  And if that doesn’t work, we give up on your sorry browser.

Let’s move on to the next part, the section that begins with “ajaxRequest.onreadystatechange = function()”.  Ok, so the XMLHttpRequest object we called ajaxRequest has a property called a “readyState” that can be anything from 0-4, depending on what its current state is:

0 = uninitialized
1 = loading
2 = loaded
3 = interactive
4 = complete


When the readyState changes over the course of the object’s lifetime, we can tell our code to do what we want it to do by using that nifty onreadystatechange event handler.  Whenever ajaxRequest’s state changes, the function we’re creating will be called.  The next dozen or so lines define exactly what the function is.

if(ajaxRequest.readyState == 4){

Pretty self-explanatory.  We only want our code to do its thang if the XMLHttpRequest has completed, so we check that readyState right out of the gate.  If it equals 4, then carry on, my wayward son.  Otherwise, fuggetaboutit.

var xmldoc = ajaxRequest.responseXML.documentElement;
var locations = xmldoc.getElementsByTagName('location');

Just declaring a couple of variables and assigning them values based on the properties of our ajaxRequest XMLHttpRequest object.  xmldoc will be an element of the XML our request receives later on, while locations contain the elements within xmldoc named “location” (this’ll make a little more sense later).

Next, we get a little FOR loop:

for (var i = 0; i < locations.length; i++) {
    var locationid =
        locations[i].getElementsByTagName('locationid')[0].firstChild.nodeValue;
    var street1 =
        locations[i].getElementsByTagName('street1')[0].firstChild.nodeValue;
    var city =
        locations[i].getElementsByTagName('city')[0].firstChild.nodeValue;
    var state =
        locations[i].getElementsByTagName('state')[0].firstChild.nodeValue;
    var zip =
        locations[i].getElementsByTagName('zip')[0].firstChild.nodeValue;
    address = street1 + ' ' + city + ' ' + state + ' ' + zip;
    codeAddress(address, locationid);           
}

While the variable “i” is between 0 and the length of our locations variable (the number of different location elements), we want to execute the code within.  In this case, we’re creating various variables we’ll need for our geocoding process (locationid, street1, city, etc.) and assigning them the values from our XMLHttpRequest’s response.  Again, this’ll probably make more sense when you see what the response looks like later in the lesson.

At the end of the loop, we concatenate a few of the variables together to create an “address.”  We then call a function called codeAddress(), giving it two pieces of information (the locationid and our concatenated address) for each of the location elements.  Makes sense, right?  We need to geocode every address, so we call a function to do just that once for every physical address we care about.  We’ll look more closely at the codeAddress() function itself shortly.

The last couple of lines tell ajaxRequest to actually open its connection and to where it should send its request:

ajaxRequest.open("GET", "scripts/getxmlinfo.php", true);
ajaxRequest.send(null);

Yeah, you guessed it.  We’re going to create a file called getxmlinfo.php in the near future that’s actually going to handle this request.  It then return back to our function the XML containing the location elements we just talked about.  Since we’re using the HTTP GET method here, we don’t need to include any info with that .send(null) line.  If we were using POST, well, that’d be another story (how’s that for foreshadowing?)…

On to the next function in our getcoord.js file, codeAddress():

function codeAddress(address, locationid) {
        geocoder = new GClientGeocoder();
        geocoder.getLocations(address, function(response) {
                if (!response || response.Status.code != 200) {
                        alert("Sorry, we were unable to geocode that address");
                 } else {
                        place = response.Placemark[0];
            var lng = place.Point.coordinates[0];
            var lat = place.Point.coordinates[1];
            insertCoord(locationid, lng, lat);           
                }
        });
}

Clearly, this one’s a bit simpler than our getLocFromXML() function.  Here we get our first taste of the Google Maps API, specifically the GclientGeocoder (http://code.google.com/apis/maps/documentation/reference.html#GClientGeocoder ) class.

The first couple of lines are similar to our last function. 

function codeAddress(address, locationid) {
        geocoder = new GClientGeocoder();

We’re creating a function called codeAddress that requires two pieces of input to get its job done:  an address and a locationid.  I’m sure you remember that we sent those to this function when we called it above.  You’re smart like that.

Next, we create a new object called geocoder from the aforementioned GClientGeocoder() class.   (Note:  I declared geocoder, along with a couple other variables, before all my functions.  You wouldn’t have seen that unless you already downloaded and perused the full getcoord.js file in the ZIP file I linked to earlier.)

After that, we call use the GClientGeocoder’s getLocations method to actually geocode the address in question.  In Google’s words:

Sends a request to Google servers to geocode the specified address. A reply that contains status code, and if successful, one or more Placemark objects, is passed to the user-specified callback function.

Here’s how we use that:

geocoder.getLocations(address, function(response) {
The “address” part of “geocoder.getLocations(address, function(response)” is the address we passed to our function originally (the concatenated one from the getLocFromXML function).  The “response” in “function(response)” is the data Google sends in response to our geocoding request.

First thing we do with their response is check its status. 

if (!response || response.Status.code != 200) {
That status can be any number of values Google’s created to tell you what happened when it tried to geocode your address (see http://code.google.com/apis/maps/documentation/reference.html#GGeoStatusCode for the list of values).  In our case, we only really care that we receive a successful response.  If it failed, we don’t care about the how and why for this example, but if you’re troubleshooting that information could be invaluable.

Quick sidebar for the new-to-JavaScript among us.  The ! operator generally means “not” or “the opposite,” and the || comparison operator means “or.”  In English, we’re saying “if we don’t get a response, or if we get a response that doesn’t equal 200, then do something…”  In this case, if either happens, we pop up an alert that simply says “Sorry, we were unable to geocode that address.”

If we get a good response back, though, we want to snag the longitude and latitude info from that response and insert that into our database:

} else {
    place = response.Placemark[0];
    var lng = place.Point.coordinates[0];
    var lat = place.Point.coordinates[1];
    insertCoord(locationid, lng, lat);           
}

Remember that Google quote from above that talked about how a successful geocoding request will return one or more Placemark objects?  That’s what we’re looking for.  First, we assign our place variable the value of response.Placemark[0].  (That “[0]” means the first of the “one or more Placemark objects.”  There’s a chance there could be multiple Placemarks, but we’re assuming that the first one is the one we want.  Also, keep in mind that the numbers start with zero, not one.  That can be confusing for some new programmers, but it’s pretty common.)

Next, we pull the Point.coordinates[0] property from our place object and assign it to our new lng variable.  Likewise with Point.coordinates[1] and our lat variable.  How did we know which one was the longitude and which the latitude?  Simple:  we Googled Google and found an example of the Placemark object’s structure.  If you browse to http://code.google.com/apis/maps/documentation/services.html#Geocoding_Structured you’ll see what we’re talking about.  The longitude is simply listed before the latitude in the Point.coordinates section.

Finally, now that we have our actual coordinates, we can insert them back into our original LOCATIONS table.  We do that by sending them to another function we’ve created called insertCoord():

function insertCoord(locationid, loclng, loclat) {
        var ajaxRequest; 

        try{
                ajaxRequest = new XMLHttpRequest();
        } catch (e){
                try{
                        ajaxRequest = new ActiveXObject("Msxml2.XMLHTTP");
                } catch (e) {
                        try{
                                ajaxRequest = new
                                  ActiveXObject("Microsoft.XMLHTTP");
                        } catch (e){
                                alert("Your browser broke!");
                                return false;
                        }
                }
        }
       
ajaxRequest.open("POST", "scripts/insertcoord.php", true);
ajaxRequest.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded;
   charset=UTF-8');
ajaxRequest.send('locationid=' + locationid + '&lng=' +
   loclng + '&lat=' + loclat);
}

Looks pretty familiar, eh?  Again, we’re using some AJAX to send our data between pages.  Why?  We save some resources by sending smaller bits of data to our server.   Save some bandwidth, thing runs faster.  Plus, there’s the added bonus of getting some more AJAX practice.  Win, win.

You all know what the first part of this function does by now (creates a function with our desired name and tells what arguments it requires).  Also, you probably remember the whole schpiel about the try/catch block from our getLocFromXML() discussion.  It’s exactly the same here.

So what’s different?  Only where we’re sending this request and the method we’re using (POST instead of GET):

ajaxRequest.open("POST", "scripts/insertcoord.php", true);
ajaxRequest.setRequestHeader('Content-Type', 'application/x-www-form-urlencoded;
   charset=UTF-8');
ajaxRequest.send('locationid=' + locationid + '&lng=' +
   loclng + '&lat=' + loclat);

Pretty boring, but a couple things to note.  First, we’re using an HTTP POST this time instead of a GET, and we’re POSTing to a file called insertcoord.php that we haven’t created yet.  We’ll do that soon.

Second, notice that we have to create specific headers when we’re using a POST instead of GET.  Don’t worry too much about that ajaxRequest.setRequestHeader line now, just know that you need it when you POST.

Finally, see how we aren’t sending ‘null’ like we did with the GET method?  Here, we’re actually sending the parameters we’ll be using in that insertcoord.php file when we update the database.  Without that info, the insertcoord.php file wouldn't know which record to update or what to update it with.

Now you might be asking yourself why that insertcoord.php file is even necessary, why we don’t just insert into MySQL directly from this JavaScript file.  The answer is that that there simply isn’t an easy way to do that with JavaScript.  We have to use a server-side language for that part.

And that’s it for our getcoord.js file.

Next, we need to create the other two files we referenced:  getxmlinfo.php and insertcoord.php.

getxmlinfo.php


This one's pretty simple:

<?php
include_once('config.php');
header('Content-Type: text/xml');

$output = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n";
$output.="<locations>\n";

$query = "SELECT locationid, name, street1, street2, city, state, zip
    FROM locations WHERE street1 != ''
    AND (lat IS NULL or lng IS NULL) ORDER BY locationid";
$result = mysql_query($query) or die('Query failed: ' . mysql_error());

while ($myline = mysql_fetch_assoc($result)) {
    $output.= "<location>\n";
    $output.= "\t<locationid>" . $myline['locationid'] . "</locationid>\n";
    $output.= "\t<street1>" . $myline['street1'] . "</street1>\n";
    $output.= "\t<street2>" . $myline['street2'] . "</street2>\n";
    $output.= "\t<city>" . $myline['city'] . "</city>\n";
    $output.= "\t<state>" . $myline['state'] . "</state>\n";
    $output.= "\t<zip>" . $myline['zip'] . "</zip>\n";
    $output.= "</location>\n";
}
$output.="</locations>\n";

echo $output;
?>

All this file does is query our database for all the locations that don’t yet have a longitude or latitude, and where the street1 column isn’t empty.  We format the output to look like XML and return it back to the getLocFromXML() function in our getcoord.js file.  You can actually browse directly to this file and see for yourself what the output looks like.  Mine’s at http://www.nerdliness.com/ajaxdemo/scripts/getxmlinfo.php.

Just about everything here is stuff we’ve seen before.  The lines are:

include_once('config.php');
Remember that config.php file we created before?  Well, here’s where it comes into play.  Without that file and this reference to it, we’d have to include all that connection info in every file that needs to talk to the database.  Just remember to include the proper path to your config.php file, relative to the file calling it.  In my case, since both files are in the scripts/ directory, I’m good like this.

header('Content-Type: text/xml');
Without that line, your output won’t be formatted and rendered properly.  Comment it out and your browser will treat all your element names like other HTML tags, and your getLocFromXML() function won’t recognize it as valid XML.  If it can’t do that, it can’t parse it properly.  Don’t forget it.

$query = "SELECT locationid, name, street1, street2, city, state, zip
    FROM locations WHERE street1 != ''
    AND (lat IS NULL or lng IS NULL) ORDER BY locationid";

This defines the query we’re using to pull the appropriate locations from our database.  We know that any addresses missing the street portion will fail, so we filter them out at this point.  We also don’t want to waste time geocoding addresses for which we already have long/lat info, so we filter those out with our WHERE clause, too.

$result = mysql_query($query) or die('Query failed: ' . mysql_error());
This actually submits our database query and assigns the results to $results.

while ($myline = mysql_fetch_assoc($result)) {
    $output.= "<location>\n";
    $output.= "\t<locationid>" . $myline['locationid'] . "</locationid>\n";
    $output.= "\t<street1>" . $myline['street1'] . "</street1>\n";
    $output.= "\t<street2>" . $myline['street2'] . "</street2>\n";
    $output.= "\t<city>" . $myline['city'] . "</city>\n";
    $output.= "\t<state>" . $myline['state'] . "</state>\n";
    $output.= "\t<zip>" . $myline['zip'] . "</zip>\n";
    $output.= "</location>\n";
}

For every row returned by our query and living in our $result variable, append our $output string with the given data.  $myline is the current row, and $myline[columname] is the info in the named column.  A \n means a linebreak, while the \t is a tab.  That’s all just to make it pretty.

echo $output;
Output the contents of $output.  Whatever is echo'd will be returned to the requestor (the getLocFromXML() function, or your browser if you're viewing it yourself).

insertcoord.php


Finally, we use insertcoord.php to update the LOCATIONS table with the newly obtained coordinates:

<?php

include_once('config.php');

$insertquery = "UPDATE locations SET ";

if (isset($_POST['lat']) && isset($_POST['lng'])) {

$loclat = $_POST['lat'];
$loclng = $_POST['lng'];
$locationid = $_POST['locationid'];

$insertquery .= "lat = " . mysql_real_escape_string($loclat) . ", lng = " .
   mysql_real_escape_string($loclng) . " WHERE locationid = " .
   mysql_real_escape_string($locationid);

$result = mysql_query($insertquery)
   or die ("Query failed, yo. " . mysql_error());
echo $result;
}

?>

Real similar to our getxmlinfo.php file, huh?  Again, we include the config.php file, again we’re submitting a query against our MySQL database.  This time, we’re using an UPDATE query instead of a SELECT query.

if (isset($_POST['lat']) && isset($_POST['lng'])) {
Remember how we used the HTTP POST method to talk to this script?  Well, first thing we’re doing is checking to see if whatever POSTed to this script remembered to include the longitude and latitude info we’re going to need.  Without it, there’s no need to bother with the rest of the script.  We use the PHP “isset()” function to check for those variables.  The && is the “and” operator, so both need to be included.

$loclat = $_POST['lat'];
$loclng = $_POST['lng'];
$locationid = $_POST['locationid'];

We’re pulling the values for the local $loclat, $loclng, and $locationid variables from the info POSTed to this script.  You could, if you were so inclined, just keep referring to them as “$_POST[‘lat’]” etc. throughout the course of your script, but I think this looks better.  Entering all that punctuation is a pain in the ass, and if you end up getting carpal tunnel your Guitar Hero skillz will go to hell in a handbasket. 

$insertquery .= "lat = " . mysql_real_escape_string($loclat) . ", lng = " .
   mysql_real_escape_string($loclng) . " WHERE locationid = " .
   mysql_real_escape_string($locationid);

$result = mysql_query($insertquery)
   or die ("Query failed, yo. " . mysql_error());
echo $result;

We just finished building our query and submitting it to our database.  Only thing interesting we’re doing here is using the mysql_real_escape_string() function to try to prevent a SQL injection attack against our database.  Without that, it’s all too easy for someone who knows to POST to it some naughty SQL that could potentially bring down our precious database.  Let's make it at least a little difficult for them, please.

Ok, so now that we have all the pieces in place, all we have to do is browse to our locationmassinsert.html file in our favorite browser.  With this design, we won’t get any feedback, but after it runs we should be able to view the underlying table and see that most of the records now have their lng and lat data.

One thing I’ve noticed with this script, though, is that it sometimes needs to be refreshed a couple of times before all the records are updated.  I’m guessing it has something to do with Google either limiting the number of geocoding requests you can send at one time, or an inherent limitation with the whole asynchronous nature of the requests we’re sending.  Not a big deal in this case as all we have to do is hit the refresh button in our browser to give it another go.

Next installment, we’ll start working on the whole Map part of the Google Maps mashup equation.  Fire away with any questions.

SQL:

Web 2.0:

Miscellaneous:

Coding:

Creating a Google Maps/Amazon.com/Banned Books Mashup

Part One of God Only Knows How Many Steps...

And now for something completely different…

Sorry to disappoint all the little nerdlings who come around here only for the Drupal goods, but today’s post won’t be at all related to our favorite CMS.  We’ve talked about different Drupaly topics for weeks, now it’s time to shake things up a little.  Today, kids, we’re going to make a basic Google Maps mashup.

Yeah, I know.  The world needs another Google Maps mashup like I need a hole in my head, right?  Who cares.  They’re fun, and we’ll try to make our sample a little more interesting by throwing in a little Amazon.com action and some AJAX-y goodness.

So here’s what we’re shooting for:

http://www.nerdliness.com/ajaxdemo/

First of all, don’t laugh.  Jeremy’s the form guy, I only do function.  Sure, it ain’t much to look at, but it works and it’s kinda cool.  And if we wanted it to look nice, we’d pay Jeremy his surprisingly reasonable freelance rate to pretty it up for us.

Ok, so what’s the big deal?  What are we doing on that page?   In a nutshell, we’re plotting, on the fly, the locations of every public school in Texas that, according to the ACLU of Texas, banned a book during 2006.  I took the data from a PDF they release annually (http://www.aclutx.org/projects/bannedbookspg.php?pid=45) and used it to populate a MySQL table with the various data you see by clicking on the assorted links. 

The data is sent over to the fine people at Google to plot the little book icons on the map when you click on various links.  When you click on the icon on the map itself, we hit the database again to display the location’s info (address, etc).  At the same time, we send a request to Amazon.com to get the info we need to dynamically create links to their product info pages for the various book titles. 

Finally, we’re using some AJAX to narrow down your search results as you type.  For instance, if you click Book Info and start typing the word “The,” you’ll see that the list is updated with every press of the key to show just the titles that contain what you’ve typed so far.  You can then click on the title of the book in question to plot its location on the map.

Now, a couple of disclaimers.  First, this sample is far from perfect.  Not only is it about as pretty as the Sam, the three-time World’s Ugliest Dog champ (http://www.snopes.com/photos/animals/uglydog.asp), but it doesn’t function quite the way you’d expect.  Information you might think would appear in one area doesn’t show up unless you click on a book icon, for instance.  Not very intuitive and certainly not a finished product.  Still, though, it’s good enough to give you an idea of what you can do and how you might do it.

Second, as with everything we do here, the sample code should be accepted “as-is,” with no guarantee implied.  I’ll provide it all at the end, but use it at your own risk.  I don’t think anything will blow up if you use it, but consider yourself warned.

Finally, while this little tutorial is intended for the beginner, there are a few prerequisites if you want to play the home version:

  1. You need a Google Maps API key.  They’re free and easily obtained, but they’re also uniquely assigned based on your domain name.  While you might see the Nerdliness.com one in my samples, it simply won’t work for you.  Get your own here:  http://code.google.com/apis/maps/signup.html
  2. If you’re interested in the Amazon.com part, you’ll need to sign up with them, too.  It’s also free and you can earn a commission on any sales you generate.  Sign up for that here:  https://affiliate-program.amazon.com/
  3. Some sort of database.  I’m using MySQL for the backend database.  You can use whatever database you like, but you might need to adjust the various queries accordingly.
  4. PHP. 

Other than that, you just need your favorite text editor and the ability to follow directions.  A little HTML/SQL/PHP/JavaScript knowledge wouldn’t kill you, but it’s totally not necessary.  I'm assuming you don't know much.

One last thing before we get started…  this will be a multipart post, but I have no idea exactly how many parts we’re looking at.  Sorta depends on how things look as we get going.  Subsequent parts should be posted within a couple of days, a week tops, though they might be broken up by additional Drupalesque posts from Jeremy and/or Fredric.

Ok, so in today’s post, we’re just going to set things up.  If you haven’t obtained that Google Maps API key I mentioned above, do that now.  Go ahead.  I’ll wait…  If you want to play the Amazon game, then snag that, too.

Done?  Great.  Now, let’s get you some data.  Like I mentioned before, our example is plotting some banned book information strip out of a PDF posted on the ACLU of Texas website (http://www.aclutx.org).  Believe me when I say that massaging that data was a long, tedious process, one that I don’t care to repeat any time soon and that I wouldn’t wish on you, gentle reader.  Instead, I’ll give you a shortcut. 

Download this zip file:  http://www.nerdliness.com/files/sqlsetup.zip

Inside, you’ll find several SQL scripts that will create the tables we need for this experiment and fill them with the data stripped from the ACLU PDF.  Because of the relationships between the tables, you’ll need to make sure to run the scripts in the following order:

 

 

  1. createmysqltables.sql
  2. authors.sql
  3. books.sql
  4. locations.sql
  5. reasons.sql
  6. bookauthors.sql
  7. challenges.sql
  8. challengereasons.sql

Of course, being the astute and savvy Internet user you are, you’ll be looking at these scripts before you actual execute them, right?  I certainly wouldn’t trust some faceless yahoo…

And having looked at the scripts, you’ll notice a couple of things.  First, we’re creating a fairly normalized database consisting of seven tables.  The relationships look something like this:

 

Nerdliness.com Entity-Relationship Diagram for Banned Book Database

For those of you who aren’t database people, that there’s what ya call an Entity-Relationship Diagram.  As the name implies, it provides a visual representation of the relationships between various entities.  In this case, each entity is a table in our database. 

Each square is a table in our database, and the information listed in the square tells you a little something about that particular table.  For example, the CHALLENGES table is represented by a block with the word “challenges” at the top.  Simple, huh?  Within that block, you can see a list of the various columns within that table (challengeid, locationid, bookid, actiontaken, notes, challengeyear).  Any columns that are part of a Primary Key are listed at the top, underlined, and have the letters PK to the left.  Columns in a Foreign Key relationship (a value listed in this table that’s a Primary Key in another table) are labeled FK.  Finally, required columns (those created using the NOT NULL SQL clause) are in bold.

The lines between the different entities tell you about the relationship between them.  For instance, look at the line connecting the LOCATIONS table with the CHALLENGES table.  That particular relationship is called a one-to-many relationship, meaning that each CHALLENGE we log in the database is mapped to exactly one specific LOCATION.  However, each LOCATION could potentially issue one or more challenges.  But I digress…  if you want more on E-R Diagrams, talk to Google.

Second thing you might notice is that we’re not filling in all the data in the LOCATIONS table.  Why?  Well, it has to do with geocoding.

See, Google Maps can’t just plot any old location just by the address.  Instead, it plots them out according to the longitude and latitude of the place in question.  That process of translating a street address into its longitudinal and latitudinal coordinates is called geocoding.
 
The Google Maps API will allow you to geocode addresses on the fly.  In fact, that’s how I originally set up this example…  every time I clicked on a location, I’d send a request to Google to geocode that address.  When that one came back, I’d send another request to actually plot the point.

That method lends itself to a couple of problems.  First, it greatly increases the amount of time it takes to plot out points.  Geocoding is an expensive process (resource-wise, not necessarily monetarily).  When you geocode every point every time, loading the page can take a while.  In some situations, you might not have a choice.  Want to map a moving target?  You’ll have to geocode on the fly.  In our case, though, them schools ain’t going any where, so we’re pretty safe to geocode once and store the data in our table for future use. 

The second problem is that the code needs to be a little more robust if you’re geocoding on the fly.  What happens if you have a malformed address in the mix and your code fails trying to geocode it?  If you don’t take that into account, your code might choke every time you try to plot that point.  I’m lazy.  I like the idea of running the script one time, monitoring for failures, and dealing with them by simply ignoring any entries in the database missing the longitude and latitude info.

And this, my friends, is where we're going to leave off for today.  Tune in next time as we create our first little bit o' AJAX to geocode all our addresses and fill in the rest of the blanks in our table. 

SQL:

Web 2.0:

Miscellaneous:

Subscribe to RSS - Google