Optimizing a crawler (faster!?)
Hi everyone,
does anyone have an idea how to optimize this crawler?
PHP code:
function get_data($url)
{
    /* Fetch a URL with cURL and return the response body (false on failure) */
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
function get_match($regex, $content, $pos = 1)
{
    /* Return capture group $pos of the first match, or null if there is none */
    if (preg_match($regex, $content, $matches) && isset($matches[(int) $pos])) {
        return $matches[(int) $pos];
    }
    return null;
}
echo "Naruto-Tube Crawler Test<br />";
function pageSearch()
{
    $stream_file = ''; // collected links; initialized so that .= never hits an undefined variable
    $html = file_get_contents('http://naruto-tube.org/boruto-episoden-streams'); // get the HTML returned from this URL
    $pokemon_doc = new DOMDocument();
    libxml_use_internal_errors(true); // disable libxml errors
    if (!empty($html)) {
        // if any HTML is actually returned
        $pokemon_doc->loadHTML($html);
        libxml_clear_errors(); // discard errors caused by malformed HTML
        $pokemon_xpath = new DOMXPath($pokemon_doc);
        // get the onclick attribute of every table row
        $pre_link = $pokemon_xpath->query('//tr/@onclick');
        if ($pre_link->length > 0) {
            foreach ($pre_link as $row) {
                $search = array("window.location.href = '", "'");
                $replace = array("", "");
                $clean_link = str_replace($search, $replace, $row->nodeValue);
                // Grab name
                $name = get_data("http://naruto-tube.org" . $clean_link);
                $grabed_name = @get_match('/<td class="contentheading" width="100%">([^\"]*)<\/td>/', $name);
                // Search video URL
                $search_vid_url = get_data("http://naruto-tube.org" . $clean_link);
                $search_vid_url_found = @get_match('!var S1 = \'<iframe width="735" height="414" src="(.+)" frameborder="0" scrolling="no" allowfullscreen></iframe>\'!iUm', $search_vid_url);
                // Grab video file
                $founded_vid_url = get_data($search_vid_url_found);
                $grab_video = @get_match('!file: "(.+)",!iUm', $founded_vid_url);
                $stream_file .= '<a href="' . $grab_video . '">' . $grabed_name . '</a><br />';
            }
        }
    }
    $array = array(
        $stream_file,
    );
    return ($array);
}
$boruto = pageSearch();
echo $boruto[0];
I think it takes (too) long to run!?
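One thing that probably contributes: inside the loop, get_data() is called twice with the same URL, once for the name and once for the video URL, so every episode page is downloaded twice. The simplest remedy would be to run both get_match() calls on the string that is already in $name. A more general variant is sketched below. It assumes a hypothetical helper, make_cached_fetcher(), which is not part of the original code, and it uses a stand-in fetch function so that it runs without network access.

```php
<?php
/**
 * Wrap any fetch function so that each URL is downloaded at most once.
 * make_cached_fetcher() is a hypothetical helper, not part of the original crawler.
 */
function make_cached_fetcher(callable $fetch): callable
{
    $cache = array();
    return function ($url) use ($fetch, &$cache) {
        // Only call the real fetcher the first time a URL is requested.
        if (!array_key_exists($url, $cache)) {
            $cache[$url] = $fetch($url);
        }
        return $cache[$url];
    };
}

// Stand-in fetcher that counts its calls; in the crawler this would be
// make_cached_fetcher('get_data') instead.
$calls = 0;
$fetch = make_cached_fetcher(function ($url) use (&$calls) {
    $calls++;
    return 'body of ' . $url;
});

$page_for_name  = $fetch('http://naruto-tube.org/example'); // hypothetical URL, triggers the fetch
$page_for_video = $fetch('http://naruto-tube.org/example'); // answered from the cache
```

With this wrapper the two lookups per episode cost one request instead of two. The cache lives only for the duration of the script run.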
It may also be down to the site itself, but if anyone has optimization suggestions, I would appreciate the help.
Best regards
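As for optimization suggestions: the crawler issues its HTTP requests strictly one after another, so the total run time is roughly the sum of all response times. This has not been measured here, but it is likely the main cost. A possible approach is to fetch several pages concurrently with PHP's curl_multi functions. The following is a minimal sketch under that assumption. fetch_all() is a hypothetical helper, and the 10-second total timeout is an assumed value, not something taken from the original code.

```php
<?php
/**
 * Fetch several URLs concurrently with curl_multi.
 * fetch_all() is a hypothetical helper. It returns an array of url => body;
 * the body may be empty or null if a transfer failed.
 */
function fetch_all(array $urls, $timeout = 10)
{
    $mh = curl_multi_init();
    $handles = array();
    foreach (array_unique($urls) as $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 3);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout); // cap on the whole transfer (assumed value)
        curl_multi_add_handle($mh, $ch);
        $handles[$url] = $ch;
    }

    // Drive all transfers until none is still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running && curl_multi_select($mh, 1.0) === -1) {
            usleep(1000); // select could not wait; avoid a tight busy loop
        }
    } while ($running && $status === CURLM_OK);

    // Collect the bodies and release the handles.
    $results = array();
    foreach ($handles as $url => $ch) {
        $results[$url] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

In pageSearch() this could be used in two rounds: first collect every $clean_link and pass the full episode URLs to fetch_all(), then extract the iframe URLs from those bodies and pass them to fetch_all() a second time to get the video files.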