Topic: Web site file checking tool
TicMan
Posts: 4734
Location: Melbourne, Victoria
So the subject isn't very descriptive, but I'm looking for a tool that can go through the code on a site and spit out which files the site actually uses. I'm moving all our sites into SVN, the amount of old s*** code in one of them is beyond belief, and I'd like to tidy it up before doing the initial import.

Anyone heard of anything that can do this?

Edit: I should add that I've got the raw code at my end; I'm not just someone browsing to the site.
scuzzy
Posts: 13459
Location: Brisbane, Queensland
Well, you could always do a wget mirror and see what it s***s out.
TicMan
Posts: 4735
Location: Melbourne, Victoria
wget won't tell me which include files are being called, or even about Web.config, and all of them are needed!
Clubby
Posts: 113
Location: Brisbane, Queensland
Yeah, grab a tool that downloads whole sites, like SurfOffline or something (obviously you set things like staying within the domain, how many levels of links to follow, which file extensions to download, etc.). With appropriate permissions it should grab the included libs etc.

I haven't tried any of the free ones, as the only time I've done something like this I had access to a piece of commercial software.

last edited by Clubby at 14:54:12 16/Jun/09
thermite
Posts: 1767
Location: Brisbane, Queensland
You can find out which files the browser loads, but that won't be good enough unless you also click every single possible thing. And if there is any scripting/programming going on in some of these files, there could easily be other dependencies you won't find without careful investigation.
pARODY
Posts: 340
Location: Brisbane, Queensland

I use the following site to pull down malicious code from websites and review it safely. It can read the HTML/PHP/JavaScript and pull down each file it references in turn. They released a basic version of it for people to use; maybe you could modify it to pull down just the files from the site you want.


http://jsunpack.jeek.org/dec/go

But it won't find any server-side included files that aren't referenced within the code, so it won't grab .htaccess or similar files that are commonly present but never linked to.
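A minimal sketch of covering that gap, assuming you have the site's files on disk: the hypothetical helper list_unlinked_essentials lists files a crawler would never request because nothing links to them, yet the server still reads them. The name patterns (.htaccess, Web.config, Global.asa/Global.asax, *.inc) are assumptions based on common Apache, classic ASP and ASP.NET setups; extend them for whatever the site actually runs.

```shell
# Hypothetical helper: list files that a crawler never requests because
# nothing links to them, but that the server still reads. The name
# patterns are assumptions for Apache, classic ASP and ASP.NET sites.
list_unlinked_essentials() {
    local site_root=$1
    find "$site_root" -type f \( \
        -name '.htaccess' -o \
        -iname 'web.config' -o \
        -iname 'global.asa' -o \
        -iname 'global.asax' -o \
        -iname '*.inc' \
    \) | sort
}
```

For example, list_unlinked_essentials /path/to/site prints a list of files to keep regardless of what a crawler reports.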
Pinky
Posts: 1714
Location: Melbourne, Victoria

If the site has been live for a long time, why not run Webalizer over the HTTP server log files? I'd have thought that would give a pretty accurate picture of what's being used, though not a 100% perfect one.
scuzzy
Posts: 13460
Location: Brisbane, Queensland
I can't think of anything local that can bridge the gap between the files referenced in the HTML output and any server-side includes you might have.
TicMan
Posts: 4736
Location: Melbourne, Victoria
I've gone through a few of these ideas, but the biggest problem is the server-side includes. I might look at writing some almighty Perl to go through each file and, whenever it finds a reference to an include, img src, etc., record it so I know what to keep (or vice versa).

I was hoping someone else had already done it so I could copy their work, pretend to spend a week doing it (while maintaining my active QGL attendance) and become a hero at work.
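A rough sketch of that idea, written in shell rather than Perl to match the grep one-liners in this thread, and assuming the raw code is checked out locally. The function names referenced_names and unreferenced_files are hypothetical. The patterns only cover src=, href=, SSI file=/virtual= and PHP-style include/require with a quoted path, and files are compared by bare, lower-cased name, so two files with the same name in different directories count as one. Paths built at runtime from variables will be missed, so treat the output as a review list, not a delete list.

```shell
# Hypothetical sketch, not a finished tool: approximate "which files does
# the code refer to?" by pulling quoted paths out of src=, href=, SSI
# file=/virtual= and include/require(...) references, then comparing bare
# file names (lower-cased) against the files on disk.

# Print every referenced bare file name under the site root, one per line.
# grep: -r recurse, -h no file names, -o only the match, -i any case,
# -I skip binary files (images etc.), -E extended regex.
# sed: keep what follows the opening quote, drop any ?query or #fragment,
# drop directories, and delete lines left empty (e.g. from href="#top").
referenced_names() {
    local site_root=$1
    grep -rhoiIE \
        "(src|href|file|virtual)[[:space:]]*=[[:space:]]*[\"'][^\"']+|(include|require)(_once)?[[:space:](]*[\"'][^\"']+" \
        "$site_root" |
        sed -e "s/.*[\"']//" -e 's/[?#].*//' -e 's|.*/||' -e '/^$/d' |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}

# Print files under the site root whose bare name is never referenced.
# Candidates for review only: default documents, config files and paths
# built at runtime will be listed too.
unreferenced_files() {
    local site_root=$1 refs path name
    refs=$(mktemp)
    referenced_names "$site_root" > "$refs"
    find "$site_root" -type f | while IFS= read -r path; do
        name=$(basename "$path" | tr '[:upper:]' '[:lower:]')
        # -x whole line, -F fixed string, -- in case a name starts with "-".
        grep -qxF -- "$name" "$refs" || printf '%s\n' "$path"
    done | sort
    rm -f "$refs"
}
```

For example, unreferenced_files /path/to/site > review.txt. Expect index.html and Web.config-style files in that list, since nothing names them explicitly.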
Pinky
Posts: 1715
Location: Melbourne, Victoria

Couldn't you just "grep -i include" to do that?
thermite
Posts: 1768
Location: Brisbane, Queensland
What would that achieve?
TicMan
Posts: 4737
Location: Melbourne, Victoria
F*** Pinky, you just don't get it. We need to make very simple things (grep -ir include .) sound overly complex so we get more time to do them, which we then spend playing games or browsing the web.

Don't you know how IT works?!?
thermite
Posts: 1769
Location: Brisbane, Queensland
There are many ways to load dependencies other than typing the word include.
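To illustrate, here is a hypothetical widened version of the grep suggested above, named dependency_lines for this sketch. The extra patterns (ASP.NET <%@ ... %> directives such as Page, Control and Register, Server.Execute/Server.Transfer, CSS @import and url(), and script/img/link/iframe tags) are assumptions about what a mixed ASP/PHP site might use. They are not exhaustive, and paths assembled at runtime still won't show up.

```shell
# Hypothetical sketch: print file:line:text for lines that load another
# file, covering some mechanisms besides the literal word "include".
# The pattern list is illustrative, not exhaustive.
# grep: -r recurse, -n line numbers, -I skip binaries, -i any case,
# -E extended regex; each -e adds one alternative pattern.
dependency_lines() {
    grep -rnIiE \
        -e '(include|require)(_once)?[[:space:](]' \
        -e '<%@' \
        -e 'Server\.(Execute|Transfer)' \
        -e '@import|url\(' \
        -e '<(script|img|link|iframe)[^>]*(src|href)[[:space:]]*=' \
        "$1"
}
```

For example, dependency_lines /path/to/site | less gives a list of lines that still need reading by hand.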
Clubby
Posts: 114
Location: Brisbane, Queensland
Usually that's what the approval process caters for Ticman :) ...

* Peer review what you are going to do
* Document all the risks to the business
* Change control person approves
* Business unit that owns the system approves it
* Manager etc ...
tequila
Posts: 2458
Location: Brisbane, Queensland
brett@probation:/unix/logs$ awk '{print $7}' unix.org.au-access_log | sort | uniq

(for Apache logs)
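Building on that one-liner, a sketch that compares the requested paths against the files on disk and lists files nobody requested while the log was being written. It assumes the Apache common/combined log format (request path in field 7, as in the command above), that URL paths map directly onto the site's directory with no rewrites, aliases or virtual directories, and that paths aren't URL-encoded; never_requested is a hypothetical name. Includes are read by the server rather than requested by browsers, so they will be listed even though they are in use.

```shell
# Hypothetical sketch: list files under site_root that never appear as a
# request path in an Apache common/combined-format access log.
# Assumes URL paths map one-to-one onto site_root (no rewrites, aliases or
# virtual directories) and that request paths are not URL-encoded.
never_requested() {
    local log=$1 site_root=$2 requested
    requested=$(mktemp)
    # Field 7 is the request path; strip any ?query string before comparing.
    awk '{ sub(/[?].*/, "", $7); print $7 }' "$log" | sort -u > "$requested"
    # Turn ./dir/file into /dir/file so it lines up with the URL paths,
    # then keep only the lines that appear on disk but not in the log.
    ( cd "$site_root" && find . -type f | sed 's|^\.||' ) |
        sort |
        comm -23 - "$requested"
    rm -f "$requested"
}
```

For example, never_requested unix.org.au-access_log /var/www/site. Default documents reached only as / will also show up, so this narrows the search rather than answering it.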
TicMan
Posts: 4738
Location: Melbourne, Victoria
Quoting Clubby: "Usually that's what the approval process caters for Ticman :) ..."


Approval processes? It takes too long to get one of those through. Just do it on the fly, living by the seat of my pants!
mooby
Posts: 4882
Location: Brisbane, Queensland
What language? Redgate have a lot of .NET tools that analyze code and find redundant code. Might be a good start?
© Copyright 2001-2026 AusGamers Pty Ltd. ACN 093 772 242.