How to use PHP to copy files from a remote directory to your server

21 Nov 2009
Posted by Eddie

One of the sites I'm working on needs a photo gallery of about 300 images. However, the images are stored on the client's old server, which we don't readily have access to. Rather than copying each photo by hand, we can use PHP to copy them directly to our server. Here is a script I wrote yesterday that saved me a ton of time.

Once again, I opted to use the simple_html_dom parser, which uses selectors like jQuery to match elements on a page and allows one to do fun stuff to manipulate the data.

On the site I need to take the images from, they are listed like this:

<a href='full/photo055.jpg' rel='lightbox' style='color:#ffffff;'><img src='thumbs/photo055.jpg' border='0' width='100' height='100'>	055	</a>
<a href='full/photo294.jpg' rel='lightbox' style='color:#ffffff;'><img src='thumbs/photo294.jpg' border='0' width='100' height='100'>	294	</a>
...

And so on, about 300 more times. Each entry is a thumbnail image linked to a larger image that appears in a lightbox when the user clicks. All we're really interested in is grabbing the large image and copying it to our server.

So here's a script that parses this page and matches each <a> tag. It examines each match again, this time keeping only the matches that contain 'jpg'. Then each of those matches is stripped of extraneous code, starting with <a href='full/photo055.jpg' rel='lightbox' style='color:#ffffff;'><img src='thumbs/photo055.jpg' border='0' width='100' height='100'> 055 </a> and whittling down to just 'full/photo055.jpg'. Once that's done, we can prepend the path where we want the copied photo to live on our server.

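<?php
// simple_html_dom supplies file_get_html() and the jQuery-style find()
include('includes/simple_html_dom.php');
$html = file_get_html('http://www.somedomain.com/photos/index.html');
$frontremove = "<a href='";
$backremove = "' rel='lightbox' style='color:#ffffff;'>";
$image = array();
$x = 0;
foreach ($html->find('a') as $element) {
	// only keep links whose markup mentions a jpg
	if (preg_match('/jpg/', $element)) {
		$image[$x] = str_replace($frontremove, "", $element);
		$image[$x] = str_replace($backremove, "", $image[$x]);
		// 'full/photo055.jpg' is 17 characters; this drops the <img> tag and everything after it
		$image[$x] = substr($image[$x], 0, 17);
		$x++;
	}
}
// $x is the number of images found, so the last valid index is $x - 1
for ($n = 0; $n < $x; $n++) {
	$file = 'http://www.somedomain.com/photos/' . $image[$n];
	$newfile = $_SERVER['DOCUMENT_ROOT'] . '/photos/' . $image[$n];
	echo $newfile . "\n";
	// copy() creates the destination file itself, so no fopen()/fclose() is needed.
	// Reading from a URL requires allow_url_fopen, and the target directory must already exist.
	if (!copy($file, $newfile)) {
		echo "failed to copy $file...\n";
	}
}
?>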
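
The whittling step is the fragile part of the script above, since it depends on the exact attribute order and on every path being 17 characters long. As a rough sketch of an alternative, and not what I actually ran, the href attribute can be read directly. This version swaps simple_html_dom for PHP's built-in DOMDocument so that it runs on its own. The function name extract_jpg_links() is made up for illustration, and the third link in the sample markup is an extra I added to show the filter at work.

```php
<?php
// Sample markup taken from the listing above. The 'index.html' link is an
// added assumption, included only to show that non-jpg links are skipped.
$markup = <<<'HTML'
<a href='full/photo055.jpg' rel='lightbox' style='color:#ffffff;'><img src='thumbs/photo055.jpg' border='0' width='100' height='100'>	055	</a>
<a href='full/photo294.jpg' rel='lightbox' style='color:#ffffff;'><img src='thumbs/photo294.jpg' border='0' width='100' height='100'>	294	</a>
<a href='index.html'>Home</a>
HTML;

// extract_jpg_links() is a hypothetical helper, not part of any library.
function extract_jpg_links($markup) {
	$doc = new DOMDocument();
	// @ suppresses libxml warnings about imperfect markup
	@$doc->loadHTML($markup);
	$links = array();
	foreach ($doc->getElementsByTagName('a') as $a) {
		$href = $a->getAttribute('href');
		// keep only hrefs that end in .jpg, whatever their length
		if (preg_match('/\.jpg$/i', $href)) {
			$links[] = $href;
		}
	}
	return $links;
}

$links = extract_jpg_links($markup);
print_r($links);
```

Each entry in $links could then be fed to the same copy() loop as before. The gain is that nothing depends on the surrounding attributes or on a fixed string length.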

Now that I'm looking at this script again with a well-rested mind, I think I could simplify it and avoid DOM parsing altogether, since we know the remote directory and the file names follow a pattern. This simpler method would skip loading and parsing the page, so it would save memory and probably run more efficiently. I probably overcomplicated this script.
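
Here is a minimal sketch of what that simpler version might look like. It rests on a big assumption: that the photos are named photo001.jpg through photo300.jpg with no gaps. The page only shows photo055.jpg and photo294.jpg, so that would need checking first. The helper names build_copy_list() and copy_all() and the local target path are made up for illustration.

```php
<?php
// ASSUMPTION: files are named photo001.jpg ... photo300.jpg with no gaps.
// Only photo055.jpg and photo294.jpg are confirmed by the gallery page.

// build_copy_list() is a hypothetical helper that maps each remote URL
// to the local path it should be copied to.
function build_copy_list($remoteBase, $localBase, $first, $last) {
	$pairs = array();
	for ($n = $first; $n <= $last; $n++) {
		// %03d zero-pads the number to three digits, so 55 becomes "055"
		$name = sprintf('photo%03d.jpg', $n);
		$pairs[$remoteBase . $name] = $localBase . $name;
	}
	return $pairs;
}

// copy_all() is also hypothetical; it returns the URLs that failed to copy.
function copy_all(array $pairs) {
	$failed = array();
	foreach ($pairs as $from => $to) {
		// @ hides the warning for a missing file; the return value still reports it
		if (!@copy($from, $to)) {
			$failed[] = $from;
		}
	}
	return $failed;
}

// The local path is a placeholder; point it at wherever the gallery should live.
$pairs = build_copy_list('http://www.somedomain.com/photos/full/', __DIR__ . '/photos/full/', 1, 300);
echo count($pairs) . " files queued\n";

// Uncomment to run the transfer (needs allow_url_fopen and an existing target directory):
// $failed = copy_all($pairs);
```

Any number in the range that doesn't exist on the old server would simply show up in the list returned by copy_all().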

However, an advantage of DOM parsing is that you copy only the files the page actually links to, rather than running the risk of copying other files in that directory that you don't want.

Oh well, it's all a learning experience. I'll play with a simplified version later.
