TicMan
Posts: 4734
Location: Melbourne, Victoria
|
So the subject isn't descriptive, but I'm looking for a tool that can go through the code on a site and spit out which files the site actually uses. I'm moving all our sites into SVN, and the amount of old s*** code in one of them is beyond belief; I'd like to tidy it up before doing the initial import.
Anyone heard of anything that can do this? Edit: Adding that I've got the raw code at my end; I'm not just someone browsing to a site. |
|||||||
| #0 02:58pm 16/06/09 |
|
|||||||
|
scuzzy
Posts: 13459
Location: Brisbane, Queensland
|
Well, you could always do a wget mirror and see what it s***s out.
|
|||||||
| #1 02:49pm 16/06/09 |
|
|||||||
|
TicMan
Posts: 4735
Location: Melbourne, Victoria
|
wget won't tell me what include files are being called or even Web.config - and all of them are needed!
|
|||||||
| #2 02:51pm 16/06/09 |
|
|||||||
|
Clubby
Posts: 113
Location: Brisbane, Queensland
|
Yeah, grab a tool that downloads whole sites, like SurfOffline or something (obviously you set things like staying within the domain, the number of levels to go down following links, the types of file extensions to download, etc.). With appropriate permissions it should grab the included libs etc.
Haven't tried any of the free stuff, as the only time I've done something like this I had access to a bit of software that had been bought.
last edited by Clubby at 14:54:12 16/Jun/09 |
|||||||
| #3 02:54pm 16/06/09 |
|
|||||||
|
thermite
Posts: 1767
Location: Brisbane, Queensland
|
You can find out which files are loaded by the browser. But this won't be good enough unless you also click every single possible thing, and if there is any scripting/programming going on in some of these files, there could easily be other dependencies you won't find unless you carefully investigate.
|
|||||||
| #4 02:54pm 16/06/09 |
|
|||||||
|
pARODY
Posts: 340
Location: Brisbane, Queensland
|
I use the following site to pull down malicious code from websites and review it safely. It has the ability to read the html/php/javascript and pull down the next referenced file. They released a basic version of it for people to use, so maybe you can modify it to just pull down the files from the site you want.
http://jsunpack.jeek.org/dec/go
But it won't find any server-side included files that are not referenced within the code, so it won't grab .htaccess or similar files that are commonly present but not linked to. |
|||||||
| #5 02:56pm 16/06/09 |
|
|||||||
|
Pinky
Posts: 1714
Location: Melbourne, Victoria
|
If the site has been active for a long time, use Webalizer on the HTTP server log files? Should be a pretty accurate representation of what's being used, I would have thought. Not 100% perfect though. |
|||||||
| #6 02:57pm 16/06/09 |
|
|||||||
|
scuzzy
Posts: 13460
Location: Brisbane, Queensland
|
I can't think of anything locally that can bridge the gap between the files referenced in the HTML output and any server-side includes you might have.
|
|||||||
| #7 02:58pm 16/06/09 |
|
|||||||
|
TicMan
Posts: 4736
Location: Melbourne, Victoria
|
I've gone through a few of the ideas but the biggest problem is the server-side includes. I might look at writing some almighty Perl to go through each file and, if it finds a reference to an include, img src, etc., record it so I know what to keep (or vice versa).
Was hoping someone else had done it so I can copy their work, pretend to spend a week doing it (while maintaining my active QGL attendance) and become a hero at work. |
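
A rough sketch of that "go through each file and record the references" idea, written in Python rather than Perl. Everything here is an assumption for illustration: the list of file extensions, the two regular expressions, and the function names are hypothetical, and matching is done on file name only, which errs on the side of keeping files.

```python
import re
from pathlib import Path, PurePosixPath

# Assumption: which extensions hold text worth scanning. Adjust for the real site.
TEXT_SUFFIXES = {".asp", ".aspx", ".ascx", ".master", ".htm", ".html",
                 ".css", ".js", ".inc", ".config"}

# Assumption: two common reference styles. Real code will need more patterns.
REFERENCE_PATTERNS = [
    # Server-side include directives: <!--#include file="x"--> or virtual="x"
    re.compile(r'<!--\s*#include\s+(?:file|virtual)\s*=\s*"([^"]+)"', re.IGNORECASE),
    # src= and href= attributes (img, script, link, a, ...), minus query/fragment
    re.compile(r'\b(?:src|href)\s*=\s*["\']([^"\'#?]+)', re.IGNORECASE),
]


def find_references(text):
    """Return the set of raw paths referenced in one file's text."""
    found = set()
    for pattern in REFERENCE_PATTERNS:
        found.update(match.group(1) for match in pattern.finditer(text))
    return found


def list_files(root):
    """List every file under root as a path relative to root, with / separators."""
    root = Path(root)
    return [p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()]


def unreferenced_candidates(root):
    """Return files whose name is not mentioned by any scanned file."""
    root = Path(root)
    files = list_files(root)
    referenced_names = set()
    for rel in files:
        if PurePosixPath(rel).suffix.lower() in TEXT_SUFFIXES:
            text = (root / rel).read_text(encoding="utf-8", errors="ignore")
            for ref in find_references(text):
                name = PurePosixPath(ref.replace("\\", "/")).name.lower()
                referenced_names.add(name)
    return sorted(f for f in files
                  if PurePosixPath(f).name.lower() not in referenced_names)
```

The result is only a list of candidates to review by hand. Entry-point pages and files like Web.config are never referenced by other files, so they will show up in it even though they are needed, and paths built at runtime will be missed entirely.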
|||||||
| #8 03:00pm 16/06/09 |
|
|||||||
|
Pinky
Posts: 1715
Location: Melbourne, Victoria
|
Couldn't you just "grep -i include" to do that? |
|||||||
| #9 03:03pm 16/06/09 |
|
|||||||
|
thermite
Posts: 1768
Location: Brisbane, Queensland
|
what would that achieve
|
|||||||
| #10 03:06pm 16/06/09 |
|
|||||||
|
TicMan
Posts: 4737
Location: Melbourne, Victoria
|
f*** pinky you just don't get it.. we need to make things that are very simple (grep -ir *include*) sound overly complex so we get more time to do them, which we then spend playing games or browsing the web.
Don't you know how IT works?!? |
|||||||
| #11 03:07pm 16/06/09 |
|
|||||||
|
thermite
Posts: 1769
Location: Brisbane, Queensland
|
There are many ways to load dependencies other than typing the word include
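
To illustrate the point, here is a small sketch comparing a plain "does the line contain include" check with a couple of other loading mechanisms. It assumes the site is classic ASP or ASP.NET (Web.config was mentioned earlier in the thread); the pattern and function names are hypothetical and the list is nowhere near exhaustive.

```python
import re


def naive_match(line):
    """Roughly what `grep -i include` reports for a single line."""
    return "include" in line.lower()


# Assumption: two other ways ASP / ASP.NET code can pull in another file.
OTHER_LOADERS = re.compile(
    r'Server\.(?:Execute|Transfer)\s*\(\s*"([^"]+)"'   # run or hand off to another page
    r'|<%@\s*Register\b[^>]*\bSrc\s*=\s*"([^"]+)"',    # user control registration
    re.IGNORECASE,
)


def loaded_file(line):
    """Return the file pulled in by one of the OTHER_LOADERS patterns, or None."""
    match = OTHER_LOADERS.search(line)
    if match is None:
        return None
    return match.group(1) or match.group(2)


samples = [
    '<!--#include file="header.asp"-->',
    'Server.Execute("footer.asp")',
    '<%@ Register TagPrefix="uc" TagName="Menu" Src="~/controls/Menu.ascx" %>',
]
for line in samples:
    print(naive_match(line), loaded_file(line), line)
```

Only the first sample contains the word include, yet all three lines depend on another file. File names assembled from variables at runtime would slip past both checks.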
|
|||||||
| #12 03:08pm 16/06/09 |
|
|||||||
|
Clubby
Posts: 114
Location: Brisbane, Queensland
|
Usually that's what the approval process caters for Ticman :) ...
* Peer review what you are going to do
* Document all the risks to the business
* Change control person approve
* Business unit that owns the system approve it
* Manager etc ... |
|||||||
| #13 03:18pm 16/06/09 |
|
|||||||
|
tequila
Posts: 2458
Location: Brisbane, Queensland
|
brett@probation:/unix/logs$ awk '{print $7}' unix.org.au-access_log | sort -u
(for Apache logs) |
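
The same log idea can be taken one step further by comparing the requested paths against the files on disk. This is a minimal sketch, assuming Apache's common or combined log format (request path in the seventh whitespace-separated field, as in the awk command) and assuming URL paths map straight onto files under the site root. The function names are hypothetical.

```python
from pathlib import Path
from urllib.parse import unquote


def requested_paths(log_lines):
    """Collect the distinct URL paths from access-log lines (field 7)."""
    paths = set()
    for line in log_lines:
        fields = line.split()
        if len(fields) >= 7:
            # Drop any query string and decode %xx escapes.
            paths.add(unquote(fields[6].split("?", 1)[0]))
    return paths


def never_requested(site_root, log_lines):
    """Return files under site_root that no logged request asked for directly."""
    root = Path(site_root)
    hit = {path.lstrip("/").lower() for path in requested_paths(log_lines)}
    on_disk = (p.relative_to(root).as_posix()
               for p in root.rglob("*") if p.is_file())
    return sorted(f for f in on_disk if f.lower() not in hit)
```

Usage would look something like `never_requested("/path/to/site", open("unix.org.au-access_log"))`, where the site path is a placeholder. Server-side includes and files like Web.config are never requested over HTTP, so they will appear in the output even though the site needs them; treat the list as a starting point for review, not a delete list.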
|||||||
| #14 03:27pm 16/06/09 |
|
|||||||
|
TicMan
Posts: 4738
Location: Melbourne, Victoria
|
"Usually that's what the approval process caters for Ticman :) ..."
Approval processes? Takes too long for one of them - just do it on the fly, living by the seat of my pants! |
|||||||
| #15 03:36pm 16/06/09 |
|
|||||||
|
mooby
Posts: 4882
Location: Brisbane, Queensland
|
What lang? Red Gate have a lot of .NET tools that will analyze code and find redundant code. Might be a good start?
|
|||||||
| #16 09:45pm 16/06/09 |
|