Cewl -- Building Wordlists

by Vince
in Blog

I have a wordlist I created from a collection of wordlists I've acquired.  It's not the end-all, be-all wordlist but it's a big one, and if you have a weak password, it's in this list.  In fact, if you have a decent password, it's in the list.

It's a good list for banging against passwords to see if they are reasonably secure.  When I attempt to crack a password, I go to the top 10 most used, the top 500 most used, and then 'the' list.  Beyond that, I'm probably going to stop unless I have a different motivation.

Recently, I attempted to crack a password, went to the big list but came up short.  Password cracking is an art, generating wordlists is an art, and if you compare it to the world of medicine, this is an area where I'm a general practitioner.  That said, I didn't want to give up without making a solid effort.

Rather than take a general approach, I got specific -- I used cewl.  To paraphrase -- Cewl crawls a site based on your parameters and compiles a wordlist.

Let's take this site for example:

cewl -w sevenlayers.txt -d 3 -m 6 https://www.sevenlayers.com

-w output file:  sevenlayers.txt
-d depth (how far you want to go into the site):  three links deep
-m minimum word length:  six characters
url
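
Cewl has a few other switches worth knowing.  In the versions I've used, -e scrapes email addresses while it crawls and --email_file writes them out separately -- treat the exact flag names as an assumption and confirm with cewl --help on your install:

cewl -e --email_file emails.txt -w sevenlayers.txt -d 3 -m 6 https://www.sevenlayers.com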

Cewl does its thing and spits out the list.  I run the following command to see how many words are in my wordlist.

wc -l sevenlayers.txt
1232 words

I've got bigger plans for this file so I'm going to manipulate it a bit more.  Let's put everything in lowercase:

tr A-Z a-z < sevenlayers.txt >sevenlayers1.txt
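
If the site has any non-ASCII characters, the character-class form of tr does the same job and respects the locale -- same output, just a different spelling of the ranges:

tr '[:upper:]' '[:lower:]' < sevenlayers.txt > sevenlayers1.txt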

Let's sort the entries and remove duplicates:

sort -u sevenlayers1.txt > sevenlayers2.txt

Another word count to see the difference:

wc -l sevenlayers2.txt
1107 words

I noticed some junk in the file, mostly run-together words and some other odd entries.  Most of the junk runs long, so I want to remove any entry longer than 12 characters:

sed '/^.\{12\}./d' sevenlayers2.txt > sevenlayers3.txt

wc -l sevenlayers3.txt
1021 words
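
If the sed escapes look cryptic, an awk one-liner makes the same cut -- keep anything 12 characters or fewer, drop the rest:

awk 'length <= 12' sevenlayers2.txt > sevenlayers3.txt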

As it stands right now, we have a list of words that can be found in the dictionary -- no password complexity whatsoever.  How about a little 133t speak?  Replace all of the a's with @'s:

sed -i~ 's/a/@/g' sevenlayers3.txt

( The ~ creates a backup file:  sevenlayers3.txt~ )
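
If a substitution goes sideways, rolling back is just a copy from that backup:

cp sevenlayers3.txt~ sevenlayers3.txt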

We can also convert all of the e's to 3's:

sed -i~ 's/e/3/g' sevenlayers3.txt
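
Side note:  both substitutions can be chained in a single pass with -e, which also keeps one clean backup of the original instead of overwriting it on the second run:

sed -i~ -e 's/a/@/g' -e 's/e/3/g' sevenlayers3.txt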

Taking a look at some of the words in our list:

midstr3@m
migr@t3d
migr@ting
migr@tions
minut3
minut3s
mism@tch
mism@tch3
mism@tch3d

Rethinking this a bit more, I should remove anything 8 characters or shorter:

sed -r '/^.{,8}$/d' sevenlayers3.txt > sevenlayers4.txt

Checking our word count:

wc -l sevenlayers4.txt
320 sevenlayers4.txt
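
The same cut can be made the other way around with grep, keeping entries of 9 or more characters instead of deleting the short ones:

grep -E '^.{9,}$' sevenlayers3.txt > sevenlayers4.txt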

Let's take a final look at some of the words in our list:

coll3ction
combin@tion
comp@ni3s
comp@tibl3
compl3t3d
complic@t3d
compromis3
compromis3d
compulsiv3

Not bad.  

You can take this a step further by morphing step by step:  create one list with just @'s for a's, then another with 3's for e's, and so on.  With the different lists, you can then build a bigger combined list.  The following will sort a pair of files, remove duplicates, and spit the output to a new file:

sort -u file1.txt file2.txt > file3.txt
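
To sketch that workflow end to end -- the filenames and the extra o-to-0 variant here are just for illustration:

# one substitution per list
sed 's/a/@/g' wordlist.txt > leet_a.txt
sed 's/e/3/g' wordlist.txt > leet_e.txt
sed 's/o/0/g' wordlist.txt > leet_o.txt
# merge the original and the variants, sort, and dedupe
sort -u wordlist.txt leet_a.txt leet_e.txt leet_o.txt > combined.txt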

The possibilities are endless, as are the ways of accomplishing this task.