Tail Wagging Googlebot

July 17, 2009, 10:46 pm

I discovered an interesting and somewhat elusive bug in my user validation stuff last night. With a little more sleep and some forethought I might have arrived at the solution a little earlier than I did, but I got there eventually.

I noticed that, of the handful of users who have signed up for my Timeline Project, not all had validated their accounts. I wondered why and decided to do a bit of testing. I follow a pretty standard model for account validation:

  • User enters their new details.
  • I add the details to the database and generate a random key.
  • I send an email to the specified address containing a URL back to the site, with the key passed as a parameter.
  • User clicks on the URL in the email and I match it to the database key I stored earlier. Then I erase the key in the user's database entry. The match only needs to happen once, and I reuse the key field for other stuff (password resets).
  • If everything matches up, I set a valid flag on the user account.
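The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the site's actual code: the in-memory `users` dict stands in for the database, and the `register` and `validate` names are hypothetical.

```python
import secrets

# Hypothetical in-memory stand-in for the users table.
users = {}

def register(user_id, email):
    """Store the new user's details with a random one-time key."""
    key = secrets.token_urlsafe(16)
    users[user_id] = {"email": email, "key": key, "valid": False}
    return key  # would be embedded in the URL sent by email

def validate(user_id, key):
    """Match the key from the URL, erase it, and set the valid flag."""
    user = users.get(user_id)
    if user is None or user["key"] is None or user["key"] != key:
        return False  # the "key didn't match" case
    user["key"] = None   # erase the key: it is only needed once
    user["valid"] = True
    return True
```

Because the key is erased on the first successful match, a second request carrying the same key returns False.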

So pretty straightforward? When I originally wrote the code, I tested with a couple of accounts, assumed everything was fine and moved on. I assumed wrong.

Re-testing this stuff, I created a whole pile of accounts and checked that they could be successfully validated. To my chagrin, I found that when I clicked on the validation URL, a handful of them generated an error message saying that the key didn't match. This was something of a gumption trap: a random, intermittent failure in a system that should be trivial.

I fell back on that mainstay of programming, debug writes. I already have a decent logging system, so I added a few more log messages when the validation URL gets fired. The results I found were puzzling to say the least:

For this log example, the validation succeeded (validate_pass), but the message immediately after it was extremely suspicious: an attempt to validate that user id from some obscure IP address that I hadn't seen in my logs before. My "it is always your fault" programmer instincts failed me and I jumped from one irrational and outlandish conclusion to the next. Eventually I came to my senses and decided to do a traceroute on the IP address:

When my activation email arrived in my Gmail inbox, that enthusiastic, tongue lolling, tail wagging Labrador of a Googlebot immediately jumped on my validation URL and bounded off to get it. So if the Googlebot hit the URL first, then the key would get reset and the subsequent real user request would fail. It is pretty satisfying when a classic intermittent and quite random race condition has a very simple solution. "It is ALWAYS your fault". I gotta remember that one.

Adding a few lines to my robots.txt file, which up until that point had been empty, ensured that the Googlebot would refrain from hitting those fragile URLs. Bug fixed.
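For illustration, rules along these lines would keep well-behaved crawlers away from a validation URL. The /validate path and the example.com host are assumptions, since the real URL and the exact lines added are not shown here. Python's standard urllib.robotparser is used only to check how such rules would be interpreted.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "/validate" is an assumed path, not the real one.
ROBOTS_TXT = """\
User-agent: *
Disallow: /validate
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(url, agent="Googlebot"):
    """Return True if the rules above let the given crawler fetch the URL."""
    return parser.can_fetch(agent, url)
```

With these rules, `allowed("http://example.com/validate?id=1&key=abc")` is False while other pages on the site stay crawlable.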

Permalink - Comments - Tags: Development,Google

Juno, Sitemaps, Allison Janney, Crumpet Toast

July 12, 2009, 1:31 am

Finally got around to watching Juno tonight. I loved the trailer when it was on TV and really wanted to see it at the time (I would never barf in your urn). My life tends to lag by about 18 months or so when it comes to stuff like that.

Jeff Atwood on the StackOverflow podcast made me realize I need to generate a sitemap XML based on my content, rather than knocking something together from the entry pages and hoping that the Google crawler will find its way. I am pretty happy with the results: my PHP produced 1,249 URLs ready for Google indexing. Who knew I had so much content?
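A sitemap is just a list of url entries inside a urlset element. The sketch below shows the idea in Python rather than the PHP used on the site; the `build_sitemap` name and the example URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(entries):
    """Build sitemap XML from (url, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # special characters are escaped on output
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

# Hypothetical content; a real site would pull these from its database.
sitemap_xml = build_sitemap([
    ("http://example.com/entry/1", "2009-07-12"),
    ("http://example.com/entry?id=2&view=full", None),
])
```

Building the tree with a library rather than concatenating strings means characters such as & in a URL are escaped automatically.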

Mmm, eating crumpet toast with honey. Ok, this is just a gratuitous, inane, food reference.

So I sat in bed writing some pretty simple PHP, while watching Juno. I liked it, but I get the feeling that I am so changed by my West Wing obsession that I would probably like anything with Allison Janney in it:

Permalink - Comments - Tags: Development

August 1940

July 4, 2009, 11:17 pm

I have alternately made progress and had setbacks in the last few days. First the progress:

I am finding the new interface really streamlined for creating content and have finished entering data points up to August 1940 in Europe. The battle for France has ended, the Italians have entered the war and the Battle of Britain has just begun. It is a fascinating period to be reading about. I would love to get hold of a book on the Battle of Britain to improve the geographical data I have entered for the various phases of that conflict.

The setbacks:

First I discovered that the datapoint creation form was badly broken in Internet Explorer 7. Fixing this will involve a painstaking process of elimination to find the CSS at the root of the problem. I then discovered that the main Timemap interface was broken on all versions of IE, which is much more serious. I haven't had a chance to investigate yet, but I am confident that this problem will not be too complicated to resolve.

Update: The second problem with the main interface is fixed now. Turns out I was generating some invalid JSON for polyline type markers on the map. Now I just have to fix up the entry creation interface for IE7.
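The exact fault in the JSON is not recorded here, so the following is only a general safeguard, sketched in Python with hypothetical field names: build the marker as a plain data structure and let a JSON encoder serialize it, instead of assembling the string by hand. Hand-built output is where slips like unescaped quotes or trailing commas tend to creep in, and stricter parsers reject them.

```python
import json

def polyline_marker(title, points):
    """Build a polyline marker as plain data; the field names are hypothetical."""
    return {
        "type": "polyline",
        "title": title,
        "points": [{"lat": lat, "lon": lon} for lat, lon in points],
    }

# The encoder takes care of quoting, escaping and commas.
payload = json.dumps([
    polyline_marker('The "Phoney War" ends', [(50.85, 4.35), (48.86, 2.35)]),
])
```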

Permalink - Comments - Tags: Development,World War Two

Timeline Project Developer Diary

July 2, 2009, 12:27 am

Now that the content creation interface is getting more mature, I have decided to go back to entering data for the site. There are a couple of good reasons to do this:

  • Populating a quality data set is the fundamental reason that I started the project.
  • Eating my own dog food is a great way to find problems with the UI.

Unsurprisingly, apart from entering some more data for the Battle of France, I found a few problems with the UI that I have tried to address:

GeoCoding AJAX in the UI

Often (but obviously not always) new datapoints have a pretty easy to find location: London for De Gaulle's first radio address after the fall of France, or Paris for the German occupation of that city in June 1940. I found myself switching to Wikipedia to look up those locations, finding the coordinates and then copy/pasting them into the text boxes in the UI. This was pretty clumsy and I realized I could just do the lookup myself using the Google GClientGeocoder. I have added a search box so that a user can type in a name, do a search and update the latitude and longitude accordingly. I love how this turned out and it really streamlines the datapoint creation process:

Check for duplicates

I have a text file of data points that I had created early on in the project (and was previously importing into the database with a Python script). I have been slowly going through that file, using the UI to enter the data. I haven't been too careful about keeping track of where I was up to and have consequently entered a couple of data points twice. This was pretty annoying and I realized a simple solution to the problem would be an AJAX based check for duplicates as soon as you enter a start and end date. This was inspired mostly by what Stack Overflow does when you create a new question:

Implementing this feature involved a bit of a learning curve to understand how the MooTools Request object worked, but in the end it was pretty straightforward.
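The server side of a check like this can be very small. The sketch below is in Python with made-up sample data and hypothetical names; it only illustrates matching on start and end date, not how the real lookup is written.

```python
from datetime import date

# Hypothetical sample of already-entered data points.
existing_points = [
    {"id": 1, "title": "Germans occupy Paris",
     "start": date(1940, 6, 14), "end": date(1940, 6, 14)},
    {"id": 2, "title": "De Gaulle's first radio address",
     "start": date(1940, 6, 18), "end": date(1940, 6, 18)},
]

def find_duplicates(points, start, end):
    """Return existing data points whose start and end dates both match."""
    return [p for p in points if p["start"] == start and p["end"] == end]
```

The form would send the two dates as soon as they are filled in and list any matches, so the user can spot an entry that already exists before saving.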

Permalink - Comments - Tags: Development,World War Two,Napoleonic Wars

Timeline Project Comments

June 27, 2009, 2:40 am

I have implemented a comment stream for each data point in the system. There is still a long list of todos and probably a longer list of bugs, but I feel like I am making some progress:

  • I need to do a better job of filtering datapoints in the page, rather than doing a reload from the right hand tag menu.
  • I need to scroll the timeline to a data point based on the id.
  • I would like the ability to flag data points as offensive or inaccurate. I would like the entry page for each data point to reflect the number of flags.
  • I would like to create user badges/achievements based on what the user does in the system. This is inspired by the Xbox achievements system via StackOverflow.
  • I realised today that it would be handy to do a GeoCoding request based on a city name typed into the edit/create entry forms. I often found myself doing a search on Wikipedia, going to the geocoding page and then copy/pasting the latitude and longitude. This could mostly be automated in the form.
  • I have clarified the tags for each data point, but they are still not perfect. I need to remove the theater option for The Napoleonic Wars. I should also get rid of the War; it is not really useful.
  • I need to clean up the info window dialogs on the map. They are ugly at the moment and they need links to the edit page and discuss page.
  • Check out the details for licensing the data as CC WIKI.

Permalink - Comments - Tags: Development,World War Two,Napoleonic Wars