My last article was Easy Build Amazon ASIN Grabber with PHP and Curl. This time I will show how to use Simple HTML DOM to grab ASINs from the Amazon site. The concept is the same (grab HTML elements), but with the Simple HTML DOM functions the script becomes simpler. Visit PHP Simple HTML DOM if you want to learn more about the library.
What is PHP Simple HTML DOM Parser? According to its site, the description, requirements, and features are:
- An HTML DOM parser written in PHP 5+ that lets you manipulate HTML in a very easy way.
- Requires PHP 5+.
- Supports invalid HTML.
- Find tags on an HTML page with selectors just like jQuery.
- Extract contents from HTML in a single line.
Okay, let's start the experiment.
- Download the Simple HTML DOM library here
- Create a PHP file named asin_dom.php
    <?php
    include "simple_html_dom.php";

    // Fetch and parse the Amazon search result page
    $html = file_get_html('http://www.amazon.com/s/ref=nb_sb_noss_2/176-0876229-3718769?url=search-alias%3Daps&field-keywords=iphone');
    $no = 1;
    echo '<table border="1" style="border-collapse:collapse;border-spacing:0;border-color:#aabcfe;"><tr><td align="center">NO</td><td align="center" class="tg-g91i">ASIN</td><td align="center" class="tg-g91i">TITLE</td></tr>';

    foreach ($html->find('a') as $element) {
        $asin = '';
        $title = '';
        foreach ($element->find('span') as $node1) {
            if ($node1->class == 'lrg bold') {
                foreach ($node1->find('text') as $node) {
                    // Only non-empty text that sits directly inside the span
                    if ($node->parent() === $node1 && strlen($t = trim($node->plaintext))) {
                        $url = $element->href;
                        $hasil = explode("/", $url);
                        // Index 5 must exist, so require at least 6 parts
                        if (count($hasil) >= 6) {
                            if ($hasil[4] == 'dp') {
                                $asin = $hasil[5];
                            }
                        }
                        $title = $t;
                        echo '<tr><td align="center">'.$no.'</td><td>'.$asin.'</td><td>'.$title.'</td></tr>';
                        $no++;
                    }
                }
            }
        }
    }
    echo '</table>';
How does the Amazon ASIN Grabber with Simple HTML DOM work?
- First, you must include simple_html_dom.php.

    include "simple_html_dom.php";

- Scrape the HTML of the search page with

    $html = file_get_html('http://www.amazon.com/s/ref=nb_sb_noss_2/176-0876229-3718769?url=search-alias%3Daps&field-keywords=iphone');

- Next, you must examine how the ASIN code is laid out in the Amazon HTML. For example, we will grab iPhone products from Amazon.
Open the URL http://www.amazon.com/s/ref=nb_sb_noss_2/176-0876229-3718769?url=search-alias%3Daps&field-keywords=iphone in your browser (in this case I use Chrome), then right-click and choose View page source. Read through the source until you find the same pattern and sequence as follows:
    <h3 class="newaps">
        <a href="http://www.amazon.com/Apple-iPhone-16GB-Black-Verizon/dp/B004ZLV5UE/ref=sr_1_1?ie=UTF8&amp;qid=1406514795&amp;sr=8-1&amp;keywords=iphone">
        <span class="lrg bold">Apple iPhone 4 16GB (Black) - CDMA Verizon</span></a>
        <span class="med reg">by Apple (Sep 3, 2011)</span>
    </h3><ul class="rsltL">
    ...........
    <a href="http://www.amazon.com/Apple-iPhone-8GB-White-Verizon/dp/B0074R1IP8/ref=sr_1_3?ie=UTF8&amp;qid=1406514795&amp;sr=8-3&amp;keywords=iphone">
    <span class="lrg bold">Apple iPhone 4 8GB (White) - Verizon</span></a>
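To see where the ASIN sits in these links, here is a small sketch that splits one of the hrefs from the listing above on "/" and prints each piece. The URL is copied from the sample with its query string cut off for brevity. Nothing here touches the network or needs Simple HTML DOM.

```php
<?php
// One product link from the sample listing (query string removed for brevity).
$url = 'http://www.amazon.com/Apple-iPhone-16GB-Black-Verizon/dp/B004ZLV5UE/ref=sr_1_1';

// Splitting on "/" gives: "http:", an empty string (between the two slashes),
// the host, the product slug, the literal "dp", the ASIN, and the "ref=..." tail.
$parts = explode('/', $url);

foreach ($parts as $i => $part) {
    echo $i . ' => ' . $part . PHP_EOL;
}

// The ASIN is taken from index 5 only when index 4 is the "dp" marker.
$asin = (isset($parts[5]) && $parts[4] === 'dp') ? $parts[5] : '';
echo 'ASIN: ' . $asin . PHP_EOL;
```

Note that this only holds for absolute URLs of this exact shape. A relative link such as /dp/... would put "dp" at a different index.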
Find every a element that contains <span class="lrg bold">. From its href we get the URL, and from the span we get the title. Use this script to grab that pattern:
    foreach ($html->find('a') as $element) {
        $asin = '';
        $title = '';
        foreach ($element->find('span') as $node1) {
            if ($node1->class == 'lrg bold') {
                foreach ($node1->find('text') as $node) {
                    // Only non-empty text that sits directly inside the span
                    if ($node->parent() === $node1 && strlen($t = trim($node->plaintext))) {
                        $url = $element->href;
                        $hasil = explode("/", $url);
                        // Index 5 must exist, so require at least 6 parts
                        if (count($hasil) >= 6) {
                            if ($hasil[4] == 'dp') {
                                $asin = $hasil[5];
                            }
                        }
                        $title = $t;
                        echo '<tr><td align="center">'.$no.'</td><td>'.$asin.'</td><td>'.$title.'</td></tr>';
                        $no++;
                    }
                }
            }
        }
    }
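If you want to try the same matching logic offline, the sketch below does it with PHP's bundled DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM, so it is a swapped-in technique and not the script above. The markup is a trimmed copy of the sample listing (the second link is wrapped in its own h3 here as an assumption), the helper names extractAsin() and findProducts() are hypothetical, and the regular expression assumes an ASIN is 10 uppercase letters or digits following /dp/.

```php
<?php
// Hypothetical helper: pull the ASIN out of a product URL.
// Assumes the ASIN is the 10-character A-Z/0-9 segment right after "/dp/".
function extractAsin(string $url): string
{
    return preg_match('~/dp/([A-Z0-9]{10})(?:[/?]|$)~', $url, $m) === 1 ? $m[1] : '';
}

// Hypothetical helper: list ASIN and title for every <a> that
// contains a <span class="lrg bold">, mirroring the nested loops above.
function findProducts(string $html): array
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//a[.//span[@class="lrg bold"]]') as $a) {
        $span = $xpath->query('.//span[@class="lrg bold"]', $a)->item(0);
        $rows[] = [
            'asin'  => extractAsin($a->getAttribute('href')),
            'title' => trim($span->textContent),
        ];
    }
    return $rows;
}

// Trimmed copy of the sample markup, used in place of a live request.
$sample = <<<'HTML'
<h3 class="newaps">
<a href="http://www.amazon.com/Apple-iPhone-16GB-Black-Verizon/dp/B004ZLV5UE/ref=sr_1_1?ie=UTF8&amp;qid=1406514795&amp;sr=8-1&amp;keywords=iphone">
<span class="lrg bold">Apple iPhone 4 16GB (Black) - CDMA Verizon</span></a>
<span class="med reg">by Apple (Sep 3, 2011)</span>
</h3>
<h3 class="newaps">
<a href="http://www.amazon.com/Apple-iPhone-8GB-White-Verizon/dp/B0074R1IP8/ref=sr_1_3?ie=UTF8&amp;qid=1406514795&amp;sr=8-3&amp;keywords=iphone">
<span class="lrg bold">Apple iPhone 4 8GB (White) - Verizon</span></a>
</h3>
HTML;

$products = findProducts($sample);
foreach ($products as $i => $p) {
    echo ($i + 1) . ' | ' . $p['asin'] . ' | ' . $p['title'] . PHP_EOL;
}
```

Matching on /dp/ with a pattern, instead of on a fixed position after explode(), also copes with relative links, at the cost of relying on the assumed ASIN format.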
Congratulations, you have created your own Amazon ASIN grabber with Simple HTML DOM.
You can extend the script to scrape the Amazon product description, price, etc. Contact us if you need to learn more.
This script no longer works. For the new release of this script, please visit https://seegatesite.com/easy-get-asin-with-my-amazon-asin-grabber-class/
Click Here to try Amazon ASIN Grabber with Simple Html DOM.
Dear Admin,
In the page source of the URL hxxp://www.amazon.com/s/ref=nb_sb_noss_2/176-0876229-3718769?url=search-alias%3Daps&field-keywords=iphone
I cannot find an a href element that contains <span class="lrg bold">. Please help me.
Thank you
ismanuddin
I think Amazon changed the CSS. Try this code:

    include "simple_html_dom.php";
    $html = file_get_html('http://www.amazon.com/s/ref=nb_sb_noss_2/176-0876229-3718769?url=search-alias%3Daps&field-keywords=iphone');
    foreach ($html->find('li[class=s-result-item]') as $node1)
    {
        foreach ($node1->find('a') as $linku)
        {
            $url = $linku->href;
            $hasil = explode("/", $url);
            // Index 5 must exist, so require at least 6 parts
            if (count($hasil) >= 6)
            {
                if ($hasil[4] == 'dp')
                {
                    // Print one ASIN per line
                    echo $hasil[5]."\n";
                }
            }
        }
    }

🙂
Does the code above still use span, or has that changed so it is no longer used?
It is no longer used. The basic rule when using simple_html_dom is to look carefully at the HTML attributes (right-click and view the page source).
Replace the code with the new example code in the comment.
If you have difficulty using simple_html_dom, please try my tutorial on grabbing ASINs using cURL. It is still working.
Thank you
Can we get the variations of a parent ASIN using PHP?
Hello Usama Jafri,
Sorry, I don't know exactly what you need, but you can try my other ASIN grabber class script here: https://seegatesite.com/easy-get-asin-with-my-amazon-asin-grabber-class/ 🙂