Log Parser

I was talking with a guy the other day and he said something along the lines of -- "Sometimes there are bad things that happen on the Internet."  I replied:  "There are bad things happening on the Internet ALL THE TIME."

Fast forward to today -- I'm working on a project where I need to parse the Apache access.log file, build a unique list of IP addresses, run nslookup on each one, ignore the addresses that do not resolve, and spit out the list of addresses that do.

Not that I'm shocked, but while writing this up and using cat to show the first part of the log for this screenshot, I see mostly malicious traffic hitting this server:




/manager/html -- that's Tomcat, not something on this server.
/xmlrpc.php & /blog/xmlrpc.php -- that's WordPress and this is NOT a WordPress site.
/wp-login.php -- also WordPress.

As I was saying previously... ALL THE TIME!
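
If you want a rough count of how often those probe paths show up, something like the following could work. It's a minimal sketch, not what I ran for this post: count_probes is a name I made up, and it assumes the log is in Apache's common/combined format, where the request path lands in the seventh whitespace-separated field. Malformed requests can shift the fields, so treat the numbers as approximate.

```shell
#!/bin/bash
# count_probes (hypothetical helper): tally requests for the probe paths
# mentioned above. Assumes Apache common/combined log format, where the
# request path is whitespace-separated field 7.
count_probes() {
    awk '$7 ~ /^\/(manager\/html|xmlrpc\.php|blog\/xmlrpc\.php|wp-login\.php)/ { hits[$7]++ }
         END { for (p in hits) print hits[p], p }' "$1" | sort -rn
}

# Usage (path taken from the script in this post):
#   count_probes /var/log/apache2/access.log
```

Each output line is a count followed by the path, busiest first.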

I've stated this a million times: don't judge me on my ability to write elegant code, because my code is functional, not elegant.


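#!/bin/bash
# Grab the first field (the client IP) from each log line and de-duplicate.
awk '{print $1}' /var/log/apache2/access.log | sort -u > ./visitors.ip
input="./visitors.ip"
# Start with an empty results file so repeated runs don't pile up old lookups.
: > ./resolved.txt
while IFS= read -r var
do
    echo "Resolving $var"
    nslookup "$var" >> ./resolved.txt
done < "$input"
# Drop failed lookups and nslookup's trailer line, squeeze blank lines, de-duplicate.
grep -v "server can't find" ./resolved.txt | tr -s '\n' '\n' | grep -v "Authoritative answers can be found from:" | sort -u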





Using awk, we grab the first field (the IP address) from each line of access.log and write the unique values to the file visitors.ip.
Then we read visitors.ip, perform a lookup on each IP using nslookup, and append that output to the file resolved.txt.
From there, we use grep to drop the lines that aren't pertinent to what we want, and tr to squeeze out the empty lines.
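
The same three steps can also be sketched without the intermediate files. This is a hedged variation, not the script above: unique_ips and resolvable_ips are names I made up, the RESOLVER variable is a hypothetical hook so the loop can be exercised without touching DNS, and the filter relies on nslookup exiting non-zero when a lookup fails instead of grepping its output.

```shell
#!/bin/bash
# unique_ips (hypothetical helper): print each unique client IP,
# i.e. the first field of every line in the given access log.
unique_ips() {
    awk '{ print $1 }' "$1" | sort -u
}

# resolvable_ips (hypothetical helper): read IPs on stdin and print only
# those for which the resolver command exits successfully.
# RESOLVER is an assumed override for testing; it defaults to nslookup.
resolvable_ips() {
    local resolver="${RESOLVER:-nslookup}"
    local ip
    while IFS= read -r ip; do
        # Redirect stdin so the resolver cannot swallow the rest of the list.
        if "$resolver" "$ip" < /dev/null > /dev/null 2>&1; then
            echo "$ip"
        fi
    done
}

# Usage:
#   unique_ips /var/log/apache2/access.log | resolvable_ips
```

The trade-off is that this prints the resolving IPs only, not the hostnames, so the grep approach is still the one to use if you want the names.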

When we run our script, the output shows progress on which IP it's resolving:




In the second half of the script, we see our grep output on the screen.  We could also redirect this to an output file to wrap it up in a nice little package, but I chose to output it to the screen because I'm going to massage this a bit more for my specific purpose.
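
As one example of that kind of massaging, here is a rough sketch that pulls just the hostnames out of the lookup results. extract_hostnames is a made-up name, and it assumes reverse lookups appear as lines containing "name = host.example.com." (the format I'd expect from BIND's nslookup; other versions may differ).

```shell
#!/bin/bash
# extract_hostnames (hypothetical helper): print the unique hostnames from
# nslookup reverse-lookup output. Assumes result lines look like
#   4.3.2.1.in-addr.arpa    name = host.example.com.
# Reads the named files, or stdin when no file is given.
extract_hostnames() {
    awk -F'name = ' '/name = / { sub(/\.$/, "", $2); print $2 }' "$@" | sort -u
}

# Usage, writing the result to a file instead of the screen
# (hostnames.txt is an arbitrary name):
#   extract_hostnames ./resolved.txt > ./hostnames.txt
```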





Not that a Web Application Firewall (WAF) is the be-all and end-all, but have I mentioned the importance of using a WAF???

I'm not a Cloudflare user, but they offer a free version for "individuals" and the paid version, their "Pro" account, is $20/month.  Factor in the cost of a paid SSL certificate and it's about half the annual cost of their Pro account -- they include the SSL certificate in their $20/month fee.  Seems like a no-brainer.  Of course, Let's Encrypt certs are free anyway, and now there are a lot of tangents in front of me, so I'm going to end on this -- use a WAF.