Archiving a Joomla website with wget

by Edward_178118   Last Updated January 13, 2018 03:10 AM

I've been asked to archive a Joomla website using wget so that the end result is a directory of HTML and images. That way, the website's images and content can still be viewed without the MySQL database, the Joomla software, or its extensions. To be clear, this is only for archiving; I'm not trying to produce a production site this way.

I've tried using the following to create the archive:

wget --recursive --no-clobber --page-requisites --html-extension --convert-links --restrict-file-names=windows --random-wait --domains example.com --no-parent www.example.com

This produces a directory called www.example.com, and it does contain the images, HTML versions of the Joomla pages, and even the CSS.

However, some links in the HTML are still hard-coded absolute links to example.com. The link to the CSS, for example, is exactly the same as it was when the site was live as a Joomla website. So although wget appears to have retrieved the entire website for archiving, some of the archived pages still link to the production site.
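To see how widespread the problem is, a quick check can list every archived file that still contains an absolute link back to the live site. This is a minimal sketch, assuming a POSIX shell and GNU or BSD grep; the function name find_live_links is hypothetical, and it matches both http:// and https:// links to the bare host and to its www. form.

```shell
# find_live_links DIR HOST  (hypothetical helper)
# Print each file under DIR that still contains an absolute http(s) link
# to HOST or www.HOST.
find_live_links() {
    dir=$1
    # Escape the dots in the host name so the regex matches them literally.
    host_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    # -r: recurse into DIR, -l: print matching file names only, -E: extended regex
    grep -rlE "https?://(www\.)?${host_re}" "$dir"
}
```

For the archive in the question this would be run as `find_live_links www.example.com example.com`; grep exits with status 1 if no file matches.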

I'm not sure how to solve this. Is there a way to make wget rewrite links like http://example.com/template/mytemplate/css so that they point to the template directory at the root of the archive (e.g. ../template)?
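One wget-side option that may be worth trying, assuming the stray links use the bare host example.com while the crawl started at www.example.com: wget treats those as two different hosts and does not follow links to another host unless --span-hosts (-H) is given. --convert-links only turns a link into a relative one when wget actually downloaded the target; links to files it did not download are left as absolute URLs. The sketch below is the original command plus --span-hosts, with --domains listing both host names explicitly; mirror_site is a hypothetical wrapper name. If this works for your site, files from the bare host should land in a separate example.com directory next to www.example.com, so the converted links would look like ../example.com/template/... rather than ../template.

```shell
# mirror_site  (hypothetical wrapper around the command from the question)
# Same options as the original command, plus --span-hosts so that links to
# the bare example.com host are followed; --domains keeps the crawl limited
# to the two listed host names.
mirror_site() {
    wget --recursive --no-clobber --page-requisites --html-extension \
         --convert-links --restrict-file-names=windows --random-wait \
         --span-hosts --domains=example.com,www.example.com \
         --no-parent www.example.com
}
```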

Or do I need to write a shell script that runs sed -i to change all the links globally, so that the archive references itself? In the case of the CSS, I have checked that it was downloaded correctly; the pages just don't point to the local copy, they point to the live production site.
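The sed -i approach could look roughly like the following. It is a sketch, assuming GNU sed (BSD/macOS sed needs -i '' instead of -i) and an archive rooted at the directory wget created; rewrite_live_links is a hypothetical name. It computes a "../" prefix for each file from its depth below the archive root, so a link such as http://example.com/template/mytemplate/css in a page two directories deep becomes ../../template/mytemplate/css. It only touches .html files and only URLs followed by a path (http://example.com/...), so CSS files with url(...) references and bare links to the home page would need extra handling, and links to pages that were never downloaded simply become broken local links. Running it on a copy of the archive first is safer.

```shell
# rewrite_live_links DIR HOST  (hypothetical helper)
# For every .html file under DIR, replace http(s)://HOST/ and
# http(s)://www.HOST/ with a relative prefix ("", "../", "../../", ...)
# that points from that file back to DIR.
rewrite_live_links() {
    dir=${1%/}                      # drop one trailing slash, if any
    # Escape dots so the host name is matched literally in the regex.
    host_re=$(printf '%s' "$2" | sed 's/\./\\./g')
    find "$dir" -type f -name '*.html' | while IFS= read -r file; do
        # Path relative to DIR, e.g. "a/b/page.html".
        rel=${file#"$dir"/}
        # Drop the file name, then turn each "dir/" component into "../".
        prefix=$(printf '%s' "$rel" | sed 's|[^/]*$||; s|[^/]*/|../|g')
        # '#' is the s/// delimiter so the slashes in the prefix need no escaping.
        sed -i -E "s#https?://(www\.)?${host_re}/#${prefix}#g" "$file"
    done
}
```

For the archive in the question this would be run as `rewrite_live_links www.example.com example.com`.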

Tags : joomla-2.5 wget

