Quickly Build an Amazon ASIN Grabber with Simple HTML DOM

My last article was Easy Build Amazon ASIN Grabber with PHP and Curl. Now I will share how to use Simple HTML DOM to grab ASINs from the Amazon site. Basically, the concept is the same: grab HTML elements. But with the Simple HTML DOM functions, the script becomes much simpler. Visit PHP Simple HTML DOM if you want to learn more about Simple HTML DOM.

What is PHP Simple HTML DOM Parser? According to its site, the description, requirements, and features are:

  • An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way!
  • Requires PHP 5+.
  • Supports invalid HTML.
  • Find tags on an HTML page with selectors just like jQuery.
  • Extract contents from HTML in a single line.

Okay, let's start the experiment.

  • Download the Simple HTML DOM library here
  • Create a PHP file named asin_dom.php

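<?php
// Load the Simple HTML DOM library.
include "simple_html_dom.php";

// Fetch and parse the Amazon search results page for "iphone".
$html = file_get_html('http://www.amazon.com/s/ref=nb_sb_noss_2/176-0876229-3718769?url=search-alias%3Daps&field-keywords=iphone');
if (!$html)
{
    die('Could not load the search results page.');
}

$no = 1;
echo '<table border="1" style="border-collapse:collapse;border-spacing:0;border-color:#aabcfe;"><tr><td align="center">NO</td><td align="center" class="tg-g91i">ASIN</td><td align="center" class="tg-g91i">TITLE</td></tr>';

// Walk every link and keep those that wrap <span class="lrg bold"> (the product title).
foreach ($html->find('a') as $element)
{
    $asin = '';
    $title = '';
    foreach ($element->find('span') as $node1)
    {
        if ($node1->class == 'lrg bold')
        {
            foreach ($node1->find('text') as $node)
            {
                // Only use non-empty text that sits directly inside the title span.
                if ($node->parent() === $node1 && strlen($t = trim($node->plaintext)))
                {
                    // Product URLs look like http://www.amazon.com/<name>/dp/<ASIN>/...,
                    // so after splitting on "/" index 4 is "dp" and index 5 is the ASIN.
                    $url = $element->href;
                    $hasil = explode("/", $url);
                    // Index 5 is read below, so at least 6 parts are needed.
                    if (count($hasil) >= 6 && $hasil[4] == 'dp')
                    {
                        $asin = $hasil[5];
                    }
                    $title = $t;
                    echo '<tr><td align="center">'.$no.'</td><td>'.$asin.'</td><td>'.$title.'</td></tr>';
                    $no++;
                }
            }
        }
    }
}
echo '</table>';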
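
The search URL in the script above is hard-coded for the keyword iphone. As a minimal sketch, the same URL could be rebuilt for another keyword with PHP's http_build_query(). The helper name buildSearchUrl is hypothetical and not part of the original script, and the path portion is simply copied from the URL above. Whether Amazon still accepts this URL layout is not something this sketch checks.

```php
<?php
// Hypothetical helper (not part of the original script): rebuilds the
// search URL used above for an arbitrary keyword.
function buildSearchUrl(string $keyword): string
{
    // The path is copied from the article's URL; only the query string is rebuilt.
    $base = 'http://www.amazon.com/s/ref=nb_sb_noss_2/176-0876229-3718769';

    // http_build_query() URL-encodes keys and values, so "=" becomes "%3D"
    // and a space becomes "+". The separator is passed explicitly so the
    // result does not depend on the arg_separator.output ini setting.
    $query = http_build_query(array(
        'url'            => 'search-alias=aps',
        'field-keywords' => $keyword,
    ), '', '&');

    return $base . '?' . $query;
}

// Prints the same URL that is hard-coded in the script above.
echo buildSearchUrl('iphone'), "\n";
```

The returned string can be passed to file_get_html() in place of the hard-coded URL.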

How does the Amazon ASIN Grabber with Simple HTML DOM work?

  • First, you must include simple_html_dom.php.
include "simple_html_dom.php";
  • Scrape the HTML elements with
$html = file_get_html('http://www.amazon.com/s/ref=nb_sb_noss_2/176-0876229-3718769?url=search-alias%3Daps&field-keywords=iphone');
  • Next, you must examine the pattern of the ASIN code layout in Amazon's HTML. For example:

We will grab iPhone products on Amazon. Open the URL http://www.amazon.com/s/ref=nb_sb_noss_2/176-0876229-3718769?url=search-alias%3Daps&field-keywords=iphone in your browser (in this case I use Chrome). Then right-click and choose View page source. Examine the code until you find the same pattern and sequence as follows:

<h3 class="newaps">
<a href="http://www.amazon.com/Apple-iPhone-16GB-Black-Verizon/dp/B004ZLV5UE/ref=sr_1_1?ie=UTF8&amp;qid=1406514795&amp;sr=8-1&amp;keywords=iphone">
<span class="lrg bold">Apple iPhone 4 16GB (Black) - CDMA Verizon</span></a>
<span class="med reg">by Apple (Sep 3, 2011)</span>
</h3><ul class="rsltL">
...........
<a href="http://www.amazon.com/Apple-iPhone-8GB-White-Verizon/dp/B0074R1IP8/ref=sr_1_3?ie=UTF8&amp;qid=1406514795&amp;sr=8-3&amp;keywords=iphone">
<span class="lrg bold">Apple iPhone 4 8GB (White) - Verizon</span></a>
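
To see the pattern in isolation, here is a small sketch that runs against a trimmed copy of the markup above instead of a live page. It uses PHP's built-in DOMDocument and DOMXPath rather than Simple HTML DOM, so it runs without downloading anything. It assumes the DOM extension is enabled, and the query strings are dropped from the sample URLs for brevity.

```php
<?php
// Trimmed copy of the two result entries shown above
// (query strings removed from the URLs for brevity).
$sample = '<h3 class="newaps">'
    . '<a href="http://www.amazon.com/Apple-iPhone-16GB-Black-Verizon/dp/B004ZLV5UE/ref=sr_1_1">'
    . '<span class="lrg bold">Apple iPhone 4 16GB (Black) - CDMA Verizon</span></a>'
    . '<span class="med reg">by Apple (Sep 3, 2011)</span></h3>'
    . '<a href="http://www.amazon.com/Apple-iPhone-8GB-White-Verizon/dp/B0074R1IP8/ref=sr_1_3">'
    . '<span class="lrg bold">Apple iPhone 4 8GB (White) - Verizon</span></a>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();

// Select every <a> element that has a child <span class="lrg bold">.
$xpath = new DOMXPath($doc);
$links = $xpath->query('//a[span[@class="lrg bold"]]');

$found = array();
foreach ($links as $a) {
    $found[] = array(
        'href'  => $a->getAttribute('href'),   // product URL, contains the ASIN
        'title' => trim($a->textContent),      // text of the title span
    );
}

print_r($found);
```

For this sample, $found holds two entries, one per product link; the "by Apple" span is skipped because it is not inside a link.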

Find each a element whose content is <span class="lrg bold">. From that element we get the URL (its href) and the title. Use this script to grab that pattern.

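foreach ($html->find('a') as $element)
{
    $asin = '';
    $title = '';
    foreach ($element->find('span') as $node1)
    {
        // The product title lives in <span class="lrg bold">.
        if ($node1->class == 'lrg bold')
        {
            foreach ($node1->find('text') as $node)
            {
                // Only use non-empty text that sits directly inside the title span.
                if ($node->parent() === $node1 && strlen($t = trim($node->plaintext)))
                {
                    // Split the link on "/": index 4 should be "dp" and index 5 the ASIN.
                    $url = $element->href;
                    $hasil = explode("/", $url);
                    // Index 5 is read below, so at least 6 parts are needed.
                    if (count($hasil) >= 6 && $hasil[4] == 'dp')
                    {
                        $asin = $hasil[5];
                    }
                    $title = $t;
                    echo '<tr><td align="center">'.$no.'</td><td>'.$asin.'</td><td>'.$title.'</td></tr>';
                    $no++;
                }
            }
        }
    }
}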
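
The loop above reads the ASIN from a fixed position in the URL (the segment right after dp at index 4). A slightly more tolerant sketch searches the URL path for /dp/ wherever it appears. The function name extractAsin is hypothetical, and the 10-character pattern is an assumption based on the sample ASINs in this article.

```php
<?php
// Hypothetical helper: pulls the ASIN out of a product URL by looking for
// the "/dp/" path segment instead of relying on a fixed position.
function extractAsin(string $url): ?string
{
    // Work on the path only, so the query string cannot cause false matches.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;
    }

    // Assumption: an ASIN is 10 uppercase letters or digits, followed by
    // a slash or the end of the path.
    if (preg_match('#/dp/([A-Z0-9]{10})(?:/|$)#', $path, $m)) {
        return $m[1];
    }

    return null;
}

// Prints B004ZLV5UE
echo extractAsin('http://www.amazon.com/Apple-iPhone-16GB-Black-Verizon/dp/B004ZLV5UE/ref=sr_1_1?ie=UTF8&qid=1406514795&sr=8-1&keywords=iphone'), "\n";
```

Inside the loop, $asin = extractAsin($element->href) could stand in for the explode() block; it returns null when the link is not a product URL.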

Congratulations, you have created your Amazon ASIN grabber with Simple HTML DOM.

You can extend the script to scrape the Amazon product description, price, etc. Contact us if you need to learn more.

This script no longer works. For the new release of this script, please visit http://seegatesite.com/easy-get-asin-with-my-amazon-asin-grabber-class/

Click Here to try Amazon ASIN Grabber with Simple Html DOM.

This site is a personal Blog of Sigit Prasetya Nugroho, a Desktop developer and freelance web developer working in PHP, MySQL, WordPress.

7 Comments

  1. Dear Admin,
    In the source code from the URL hxxp://www.amazon.com/s/ref=nb_sb_noss_2/176-0876229-3718769?url=search-alias%3Daps&field-keywords=iphone

    I can't find an a href element that contains <span class="lrg bold">. Please help me.

    Thank you
    ismanuddin

  2. Can We Get The Variation Of A Parent ASIN Using PHP.??
