I thought it would be fun to track the sales rank over a period of time, then display a graph of the sales rank over time on a Web page.
You can gather data from Amazon in a number of ways. wget and grep could probably get the job done, but it is not elegant, nor is it encouraged by Amazon. The best way to get information is to use Amazon's application programming interface (API).
Amazon's Web Services (AWS) API offers a way to connect to the Amazon data warehouse and retrieve data about an Amazon item. To use the AWS API, you need to register with Amazon at the Amazon Web Services page. After registering, and accepting an end user license agreement (EULA), you will be given two keys: one for general access and requests, and one for verification and signing of requests. The general access key allows you to connect to the Amazon Web Services databases. The APIs are well-documented on the Amazon Web Services site.
I used the Amazon E-Commerce Service for my project to track sales rank on book titles over time. This service provides the ability to query an item via its Amazon Standard Identification Number (ASIN), International Standard Book Number (ISBN), author, artists, product name, publisher, or title, and retrieve virtually all information shown on the Amazon Web page about that item.
I started with an extremely simple PHP5 script that created the URL string you need to use with the Amazon Web service using Representational State Transfer (REST).
The PHP script is designed to run from the command line and POSTS a URL string. While you're debugging the script, you can copy the URL string and paste it into a Web browser to verify that the Web services interaction is working appropriately. The following script shows the basic setup of the PHP script to query AWS.
# Build URL that will query AWS
$ACCESS_KEY = 'Access_Key';
$asin = '1590594762';
$url = 'http://webservices.amazon.com/onca/xml?Service=AWSECommerceService';
$url.= "&AWSAccessKeyId=$ACCESS_KEY";
$url.= "&Operation=ItemLookup&IdType=ASIN&ItemId=$asin";
$url.= '&ResponseGroup=Medium,OfferFull';
print "
" . $url . "
";
?>
The output from this script is a URL you can enter in your browser. The browser will return some text formatted via XML. The XML schema for this text isn't too complicated, and if you wanted to use an XSLT stylesheet, you could format the XML into HTML and have your presentation layer completed. However, my goal was not just to get information stored in Amazon's database, but to store it myself so I can track the data over time.
For this I needed a data container. I had a few options for a data container in which to store statistics from the Web service queries. A relational database made the most sense, and PHP supports several. SQLite, introduced in PHP5, seemed like a nice choice, because SQLite is simple to administer and use.
Before you begin using SQLite, take a look at your PHP information and ensure that SQLite is supported by your configuration. If it is not, you can either compile the support into PHP or download an applicable package to add support for the database. Alternatively, you could use MySQL, PostgreSQL, Oracle, or another database.
Database setup
The database schema for this Web application involves two tables: one to track the unique Amazon Item Numbers (ASIN) and the initial date they were added into the tracking system, and the other to hold the ASIN, Sales Rank from Amazon, and datestamp for when the Sales Rank was updated. The small size of the database is a design feature.
The display page that shows the graphs, sales rank, and pricing information from Amazon will be updated upon display. That means we can pull the data, such as cover images, list price, description, title, and everything else dynamically. Amazon stores that information, so we don't need to. Additionally, if the data changes, such as when price changes during a sale, the display page will have the updated information.
This is the basic schema for my SQLite database:
create table aws (
asin varchar(30),
sales_rank bigint,
active_date date);
create table item (
asin varchar(30) primary key);
Next, I added the ASIN into the item table manually via SQL. Obviously, you could write a PHP page to administer this portion of database interaction as well. After the initial script is modified to parse the XML and store the data into the database, you could set up the PHP script to run as a cron job. The script to fill the database will query the database to see what Amazon item numbers the script should be gathering statistics for. This allows for tracking of multiple items without any code changes, and thus does not lock the script into any hard-coded ASIN, as I used in the initial URL-building PHP code.
#!/usr/bin/php -q
$DB="/var/www/db/aws.db";
# Ensure database exists
if (! file_exists($DB))
{
echo "The database file $DB not found.\n";
exit(05);
}
# Ensure database file is writable.
elseif(is_writable($DB) != 1)
{
echo "Processing cannot continue, the database $DB cannot be written to.\n";
exit(06);
}
# Assumes basic Schema setup for $DB
/* To get an appropriate database setup, simply run the following two lines
of code inside an SQLITE prompt.
create table item ( asin varchar(30) primary key);
create table track ( asin varchar(30), sales_rank big_int, query_date date);
To have an initial setup, an ASIN number must be entered into the 'item' table.
In this case I have chosen my ISBN number for my book, Pro OpenSSH.
Don't forget that rowid is kept internally in sqlite.
*/
# Amazon Web Services access Key
/* Get Amazon Web Services access Key (free) from http://aws.amazon.com
The following key is not a working key, but used as an example.
*/
$ACCESS_KEY='Access_Key';
# Connect to database
$dblink = sqlite_open($DB) or die ("Couldn't connect to $DB");
# Query database to find which ASINs to search on
$sql = "SELECT asin FROM item ORDER BY asin";
$resource_set = sqlite_query($dblink, $sql);
$dt = date('Y-n-d H:i');
while ($row = sqlite_fetch_array($resource_set, SQLITE_ASSOC))
{
# Value for ASIN
$asin=$row['asin'];
# Build URL to query based on ASIN and ACCESS_KEY
$url='http://webservices.amazon.com/onca/xml?Service=AWSECommerceService';
$url.="&AWSAccessKeyId=$ACCESS_KEY";
$url.="&Operation=ItemLookup&IdType=ASIN&ItemId=$asin";
$url.='&ResponseGroup=Medium,OfferFull';
# Place the results into an XML string
$xml= file_get_contents($url);
# Use Simple XML to put results into Simple XML object (requires PHP5)
$simple_xml=simplexml_load_string($xml);
# Retrieve Sales Rank
$sales_rank=$simple_xml->Items->Item->SalesRank;
# Place Sales rank in Database
# Build SQL statement to insert values into database
$sql2 = "INSERT INTO aws (sales_rank,active_date,asin) VALUES ('$sales_rank', '$dt', '$asin')";
# Ensure Results are received
$insert_results = sqlite_query($dblink, $sql2);
{
# Check results
if($insert_results)
echo "Database $DB updated.\n";
}
}
else
{
echo "Database $DB update failed.\n";
exit(07);
}
?>
After retrieving results from AWS, the script inserts the sales rank parameter along with a date and which ASIN the information correlates to into the aws table. This table will provide the data points for displaying graphs and other presentation material about an Amazon item.
The script parses the XML shown after using a URL similar to the one seen in the first PHP listing. The XML is then loaded into a string using PHP's file_get_contents function. From there the XML is loaded into a SimpleXML data structure that is a very thorough set of associative arrays that can reference any value contained inside XML tags. To see the whole listing you can use the var_dump or print_r functionality of PHP.
After finding the pertinent information to store -- Sales Rank in this case -- we use an insert statement to create a record inside the local database. If we get an error in almost any stage of execution, we exit and return a non-zero error code.
The final step is in presentation. As stated earlier, using XSLT to parse the XML is certainly an option, but for this exercise, I will just use native PHP functionality in conjunction with SimpleXML.
I wanted to graph the sales rank over time to show the status of my book sales. To do this, I used the Image::Graph PHP Extension and Application Repository (PEAR) module.
To install Image::Graph, follow normal PEAR installation procedures. The installation was fairly easy on Fedora and Ubuntu Linux systems. The next script is the display.php page, which accesses the database and displays the sales rank in graph form. The system could be modified easily to track price or albums from your favorite artist, or other items.
#$Id$
include 'Image/Graph.php';
$DB="/var/www/db/aws.db";
# Ensure database exists
if (! file_exists($DB))
{
echo "The database file $DB not found.\n";
exit(05);
}
# Ensure database file is writable.
elseif(is_writable($DB) != 1)
{
echo "Processing cannot continue, the database $DB cannot be written to.\n";
exit(06);
}
$ACCESS_KEY='Access_Key';
# Connect to database
$dblink = sqlite_open($DB) or die ("Could connect to $DB");
# Query database to find which ASINs to search on
$sql = "SELECT asin FROM item ORDER BY asin";
$resource_set = sqlite_query($dblink, $sql);
print "\n";
while ($row = sqlite_fetch_array($resource_set, SQLITE_ASSOC))
{
# Value for ASIN
$asin=$row['asin'];
# Build URL to query based on ASIN and ACCESS_KEY
$url='http://webservices.amazon.com/onca/xml?Service=AWSECommerceService';
$url.="&AWSAccessKeyId=$ACCESS_KEY";
$url.="&Operation=ItemLookup&IdType=ASIN&ItemId=$asin";
$url.='&ResponseGroup=Medium,OfferFull';
# Place the results into an XML string
$xml= file_get_contents($url);
# Use Simple XML to put results into Simple XML object
$simple_xml=simplexml_load_string($xml);
$author=$simple_xml->Items->Item->ItemAttributes->Author;
$ISBN=$simple_xml->Items->Item->ItemAttributes->ISBN;
$publisher=$simple_xml->Items->Item->ItemAttributes->Publisher;
$publication_date=$simple_xml->Items->Item->ItemAttributes->PublicationDate;
$title=$simple_xml->Items->Item->ItemAttributes->Title;
$num_pages=$simple_xml->Items->Item->ItemAttributes->NumberOfPages;
$list_price=$simple_xml->Items->Item->ItemAttributes->ListPrice->FormattedPrice;
$image=$simple_xml->Items->Item->MediumImage->URL;
$sale_price=$simple_xml->Items->Item->OfferSummary->LowestNewPrice->FormattedPrice;
$min_rank=get_rank($asin,'min');
$max_rank=get_rank($asin,'max');
# Format the output, you'd probably want a CSS sheet of some sort
print "Author: $author \n
Title: $title \n
Publisher: $publisher \n
ISBN: $ISBN \n
List Price: $list_price \n
Sale Price: $sale_price \n
Page Count: $num_pages \n
Best Rank: $min_rank \n
Worst Rank: $max_rank \n
Publication Date: $publication_date \n";
# Database chart points
$Graph =& Image_Graph::factory('graph', array(600, 400));
$Font =& $Graph->addNew('ttf_font', 'Verdana');
$Font->setSize(10);
$Graph->setFont($Font);
$Plotarea =& $Graph->addNew('plotarea');
$Dataset =& Image_Graph::factory('dataset');
# SQL to get data points
$sql="select active_date, sales_rank from aws where asin='$asin' order by active_date";
$resource_set = sqlite_query($dblink, $sql);
$i=0;
while ($row = sqlite_fetch_array($resource_set, SQLITE_ASSOC))
{
$Dataset->addPoint($i, $row['sales_rank']);
$i++;
}
$AxisX =& $Plotarea->getAxis(IMAGE_GRAPH_AXIS_X);
$AxisX->setTitle('Time');
$AxisY =& $Plotarea->getAxis(IMAGE_GRAPH_AXIS_Y);
$AxisY->setTitle('Sales Rank', 'vertical');
$Plot =& $Plotarea->addNew('smooth_line', &$Dataset);
$Graph->done(array('filename' => './output.png'));
print "\n";
}
print "
\n";
function get_rank($asin, $type)
{
global $dblink;
$sql = "select $type(sales_rank) as rank from aws where asin='$asin'";
$resource_set = sqlite_query($dblink, $sql);
while ($row = sqlite_fetch_array($resource_set, SQLITE_ASSOC))
{
return $row['rank'] ;
}
}
?>
This PHP script retrieves information from the database and builds a graph based on the data collected. Here is a screenshot of the page in action.
This bit of code should look similar to the first listing, in that it makes database calls to the SQLite database and interacts with Amazon via AWS. After getting the previous rankings out of the database, and displaying the information gathered via AWS, which is stored in a SimpleXML object, the script makes a call to Image::Graph, which uses the data points retrieved from the database and makes a line graph with the rank as the Y-axis and date/time as the X-axis. The script outputs the graph in .png format and displays it via HTML.
The get_rank function returns the highest or lowest rank the item has had since the database has been active. The rank is displayed when the price, picture, author, and other information is displayed.
All this work still leaves much to do to create a fully usable application, but this is a good start. Remember that by using the AWS API you can get information about other types of products from Amazon, including information from Wish Lists, Wedding Registries, and ListMania data.