How to scrape specific data from a webpage with the Simple HTML DOM parser

Tuesday, May 15, 2012

I am trying to scrape data from a webpage, and I need to get all the data in this link.




include 'simple_html_dom.php';

$html1 = file_get_html('http://www.aktive-buergerschaft.de/buergerstiftungen/unsere_leistungen/buergerstiftungsfinder');



$info1 = $html1->find('b[class=???]', 0); // what should I enter here?

I need to get all the data out of this site, for example:




Bürgerstiftung Lebensraum Aachen

    rechtsfähige Stiftung des bürgerlichen Rechts

    Ansprechpartner: Hubert Schramm

    Alexanderstr. 69/ 71

    52062 Aachen

    Telefon: 0241 - 4500130

    Telefax: 0241 - 4500131

    Email: info@buergerstiftung-aachen.de

    www.buergerstiftung-aachen.de

    >> Weitere Details zu dieser Stiftung



Bürgerstiftung Achim

    rechtsfähige Stiftung des bürgerlichen Rechts

    Ansprechpartner: Helga Kühn

    Rotkehlchenstr. 72

    28832 Achim

    Telefon: 04202-84981

    Telefax: 04202-955210

    Email: info@buergerstiftung-achim.de

    www.buergerstiftung-achim.de

    >> Weitere Details zu dieser Stiftung

I need the data that is "behind" each link. Is there any way to do this with an easy, understandable parser, one that a newbie can understand and use?

Source: Tips4all

4 comments:

User, May 15, 2012 at 2:26 AM
Seems to be written in the documentation:

$html1->find('b[class=info]',0)->innertext;
User, May 15, 2012 at 2:26 AM
The links you provided are down.
I suggest using the native PHP "DOM" extension instead of Simple HTML DOM; it will be much faster and easier ;)
I had a look at the page using the Google cache. You can use something like:

$doc = new DOMDocument;
@$doc->loadHTMLFile('...URL....'); // Using the @ operator to hide parse errors
$contents = $doc->getElementById('content')->nodeValue; // Text contents of #content
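As a rough sketch of the same idea, the snippet below parses an inline sample instead of the live page and collects parse warnings with libxml_use_internal_errors() rather than silencing them with @. The sample markup (a div with id "content" wrapping dl, dt and dd tags) is only an assumption about how the real page is structured, and the variable names are made up for illustration.

```php
<?php
// Inline sample standing in for the real page.
// ASSUMPTION: the structure below is a guess, not the live markup.
$sample = <<<'HTML'
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body>
<div id="content">
  <dl>
    <dt>Bürgerstiftung Achim</dt>
    <dd>Ansprechpartner: Helga Kühn</dd>
    <dd>28832 Achim</dd>
  </dl>
</div>
</body></html>
HTML;

// Collect libxml parse warnings internally instead of hiding them with @.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();

// getElementById() returns null when no element has that id, so check first.
$content = $doc->getElementById('content');
$text = ($content !== null) ? trim($content->textContent) : '';

echo $text, PHP_EOL;
```

For the real page, loadHTML($sample) would be swapped for loadHTMLFile() with the page URL. The null check matters because calling ->nodeValue on a missing element raises an error.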
User, May 15, 2012 at 2:26 AM
From a quick glance, you need to loop through the <dl> tags in #content, then through the dt and dd tags inside each one.

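// $html is the object returned by file_get_html()
foreach ($html->find('#content dl') as $item) {
    // Each <dl> is one entry; its <dd> children hold the detail lines
    foreach ($item->find('dd') as $info_item) {
        echo trim($info_item->plaintext), "\n";
    }
}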

This uses the simple_html_dom library.
User, May 15, 2012 at 2:26 AM
XPath makes scraping ridiculously easy, and it keeps your scraper working through some changes to the HTML document. For example, to pull out the names, you'd use a query that looks like:

//div[@id='content']/dl/dt

A simple Google search will give you plenty of tutorials.
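Here is a minimal sketch of how such a query might be run with PHP's built-in DOMXPath, again on an inline sample rather than the live page. The markup and the /details/... hrefs are hypothetical placeholders. Only the names, postal lines and link text come from the question.

```php
<?php
// ASSUMPTION: sample markup and hrefs are hypothetical stand-ins for the real page.
$sample = <<<'HTML'
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head><body>
<div id="content">
  <dl>
    <dt>Bürgerstiftung Lebensraum Aachen</dt>
    <dd>52062 Aachen</dd>
    <dd><a href="/details/aachen">Weitere Details zu dieser Stiftung</a></dd>
  </dl>
  <dl>
    <dt>Bürgerstiftung Achim</dt>
    <dd>28832 Achim</dd>
    <dd><a href="/details/achim">Weitere Details zu dieser Stiftung</a></dd>
  </dl>
</div>
</body></html>
HTML;

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

$foundations = [];
foreach ($xpath->query("//div[@id='content']/dl") as $dl) {
    // Passing $dl as the context node makes these expressions relative to that <dl>.
    $name = trim($xpath->evaluate('string(dt)', $dl));
    $href = $xpath->evaluate('string(.//a/@href)', $dl);
    $foundations[] = ['name' => $name, 'details' => $href];
}

foreach ($foundations as $f) {
    echo $f['name'], ' -> ', $f['details'], PHP_EOL;
}
```

To reach the data "behind" each link, each collected href could then be resolved against the site's base URL and loaded with a second DOMDocument in the same way.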