Linux
Apache
MySQL
PHP

CSS
XHTML1.1
XML/RSS

Creative Commons

2010-06-24 20:09:18

Track Podcast Downloads Using PHP And Mod_Rewrite

I recently ran into a problem that I suspect most new podcasters run into. You publish your own podcast but soon find out that you have no way to track your downloads. Your first instinct would normally be the right one, just grep Apache's access log for the name of the mp3 file and count how many times it appears. Simple! But unless you own the server that your website is hosted on, you probably don't have control over those logs and they probably get rotated too often to get an accurate count. My host rotates the logs every 24 hours, and by rotate I mean delete the old one and make a new, blank file. So unless I check the log each and every day, I'm not getting my download count. But it's ok, let's make a way that is better.

So what we want to have is a table in our database to track the number of downloads. Every time someone downloads our podcast, we want the entry in the database to be updated. To do this, we'll make a PHP page that does three things:
  1. Accepts an argument for the filename
  2. Updates the database to add 1 to the download count
  3. Sends the mp3 to the client for download
So here is the PHP for the downloader PHP page (explanation to come):

<?php include ("database.php"); function clean($s) { return preg_replace("/[^A-Za-z0-9\-]/", "", $s); } $file = clean($_GET['f']); if (file_exists("files/" . $file . ".mp3")) { $SQL = "select id,downloads from podcasts where name='" . $file . "' limit 1"; $RESULT = mysql_query($SQL); if (mysql_num_rows($RESULT) > 0) { while($ROW = mysql_fetch_array($RESULT)) { $id = intval($ROW['id']); $downloads = intval($ROW['downloads']); $SQL = "update podcasts set downloads=" . ($downloads+1) . " where id=" . $id; $UPDATE = mysql_query($SQL); } } else { $SQL = "insert into podcasts (name,downloads) values ('" . $file . "',1)"; $RESULT = mysql_query($SQL); } mysql_close($CONN); header("Cache-Control: no-cache, must-revalidate"); // HTTP/1.1 header("Expires: Sat, 26 Jul 1997 05:00:00 GMT"); // Date in the past header('Content-type: audio/mpeg'); header('Content-Disposition: attachment; filename="' . $file . '.mp3"'); readfile("files/" . $file . ".mp3"); } else { mysql_close($CONN); header("Location: index.php"); } ?>

Now for the quick explanation of what's going on. First, you connect to your database however you do that. Second is a custom function to sanitize the user input. Since we are letting the user (not really, but it can happen) supply a filename, we want to make sure that they don't try to send any malicious code. This regular expression will find anything that is not a letter, number, or hyphen and remove it. I am assuming podcast filenames such as episode-001, so those are the only characters I need.

Next we get the user input and run it through the custom function. Next is an IF statement. If the filename supplied by the user actually exists on the server we will enter the IF block, which will either create a new entry if this is the first time the podcast has been downloaded or update the existing entry if one is there. If the file supplied by the user doesn't exist we will assume they are being malicious, close the database connection, and redirect to the front page.

Also inside of the aforementioned IF block is the download section. We set some header values to make sure that proxies or content engines don't cache the file and requests a new download each time (Cache-Control and Expires), set some parameters of the file since we will just be delivering a byte stream and not an actual "file" (Content-type, and Content-Disposition), and then deliver the byte stream (readfile).

So that's all there is to it. We now have a PHP file that can be used to track and deliver the download of our podcast. But we still have a slight problem. When we put this into our RSS feed the link will look something like:

http://www.domain.com/podcast.php?f=episode-01

Even though this will deliver an mp3 to the client (because we told it to), podcatchers (such as iTunes) will assume that the file is a PHP file instead of an audio file because the filename is podcast.php. We need to change the extension of our podcast downloader while still preserving our web server's ability to parse it as a PHP file. If we were to just rename podcast.php to podcast.mp3 our web server would not execute our code, it would just prompt the user to download podcast.mp3 or perhaps even try to play it in a music player. Obviously not what we want to do.

Instead, we will leave the file named podcast.php and use the power of Apache's mod_rewrite to fake the name as podcast.mp3. This is super simple, and just means two lines in your .htaccess file:

RewriteEngine on RewriteRule ^podcast\.mp3$ podcast.php

In plain English, this rule will search the address bar for any instance of podcast.mp3 and pass it locally as podcast.php. Now we can have this link in our RSS feed:

http://www.domain.com/podcast.mp3?f=episode-01

and all podcatchers will be tricked into thinking that it is, in fact, an audio file and you will have succeeded in your mission.

Back

3 comments


2010-06-25 22:47:32


Jonathan Angliss says...
An excellent, easy, relatively quick way of handling tracking. I have a few recommendations with the script, and a recommendation on the statistics.

Your script has little error handling in it. If your SQL db is unreachable, you'll start barfing errors, unless you have them disabled, which will just result in bad/empty data being sent to the client. I'd hide the errors (unless some debug flag is passed in), and show a 500 error of some sorts so feed programs can handle appropriately.

Move all your DB code inside the if (file_exists()) code. No need to open a DB connection to just close it if the file doesn't exist. Save your DB some work :)

The select, update, insert section of the SQL can be replaced with a single command, see the INSERT.. ON DUPLICATE KEY syntax ( http://dev.mysql.com/doc/refman/5.1/en/insert-on-duplicate.html ). This should save your script some leg work, and let SQL handle what it's good at. If you didn't want to do that, I'd skip using the variable $downloads for the update value, and do this instead:
update podcasts set downloads = downloads + 1 where id = "$id"
This lets MySQL handle the update value, saving you from the possibility of two executions overlapping each other.

Otherwise, a pretty simple script, and a nice way to handle quick tracking of downloads.

As for the basic tracking idea, if you're just after a "how many downloaded", this works great. If you want more, you might want to checkout FeedBurner. They offer a whole lot of tracking data, and even have stuff specifically for PodCasters. A good way to feed your ego, or squish it as I find when I look at my site stats sometimes :)

Have a great weekend.
Jon

2010-06-26 11:05:53


slonkak says...
Thanks, Jon! I'm relatively new to how RSS works, especially with podcatchers. I will definitely be implementing some of your ideas, especially some of the stupid things I overlooked such as where I opened the db *facepalm*

2014-06-22 18:14:33


Charles says...
Kevin, are you still tracking subscriptions with this method? I'm attempting to track users in my logs, but I tend to see multiple hits from the same IP with differing file sizes transferred. I think this happens when people skip forward.

I know the Blubrry service offers a method to track unique subscriptions, but I'm interested in using my own server.

www.wilsonwonderings.ga is our blog and podcast. my email is the word contact and then our domain name.

Post a comment!

Name:
Comment: