Cewl -- Building Wordlists

    I have a wordlist I created from a collection of wordlists I've acquired.  It's not the end-all, be-all wordlist, but it's big, and if you have a weak password, it's in this list.  In fact, if you have a decent password, it's in the list.

    It's a good list for banging against passwords to see if they are reasonably secure.  When I attempt to crack a password, I go to the top 10 most used, the top 500 most used, and then 'the' list.  Beyond that, I'm probably going to stop unless I have a different motivation.
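    To put that in command terms, a first pass might look something like this -- just a sketch, assuming hashcat against a file of MD5 hashes, with the hash and wordlist filenames as placeholders:

    hashcat -m 0 -a 0 hashes.txt top10.txt
    hashcat -m 0 -a 0 hashes.txt top500.txt
    hashcat -m 0 -a 0 hashes.txt biglist.txt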

    Recently, I attempted to crack a password, went to the big list, but came up short.  Password cracking is an art, generating wordlists is an art, and if you compare it to the world of medicine, this is an area where I'm a general practitioner.  That said, I didn't want to give up without a solid effort made.

    Rather than take a general approach, I got specific -- I used cewl.  To paraphrase -- Cewl crawls a site based on given parameters and compiles a wordlist.

    Let's take this site for example:

    cewl -w sevenlayers.txt -d 3 -m 6 https://www.sevenlayers.com

    -w output file:  sevenlayers.txt
    -d depth (how far you want to go into the site):  three links deep
    -m minimum word length:  six characters
    url:  the site to crawl

    Cewl does its thing and spits out the list.  I run the following command to see how many words are in my wordlist.

    wc -l sevenlayers.txt
    1232 sevenlayers.txt

    I've got bigger plans for this file so I'm going to manipulate it a bit more.  Let's put everything in lowercase:

    tr A-Z a-z < sevenlayers.txt > sevenlayers1.txt

    Let's sort the entries and remove duplicates:

    sort -u sevenlayers1.txt > sevenlayers2.txt

    Another word count to see the difference:

    wc -l sevenlayers2.txt
    1107 sevenlayers2.txt

    I noticed some junk in the file, mostly run-together words and some other stray entries.  Most of the junk runs long, so I want to remove any entry longer than 12 characters:

    sed '/^.\{12\}./d' sevenlayers2.txt > sevenlayers3.txt

    wc -l sevenlayers3.txt
    1021 sevenlayers3.txt

    As it stands right now, we have a list of words that can be found in the dictionary -- no password complexity whatsoever.  How about a little l33t speak?  Replace all of the a's with @'s:

    sed -i~ 's/a/@/g' sevenlayers3.txt

    (The ~ creates a backup file:  sevenlayers3.txt~)

    We can also convert all of the e's to 3's:

    sed -i~ 's/e/3/g' sevenlayers3.txt
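    If you'd rather make both substitutions in a single pass, sed will take multiple expressions -- a minor variation on the two commands above, same file assumed:

    sed -i~ -e 's/a/@/g' -e 's/e/3/g' sevenlayers3.txt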

    Taking a look at some of the words in our list:

    midstr3@m
    migr@t3d

    migr@ting
    migr@tions
    minut3
    minut3s
    mism@tch
    mism@tch3
    mism@tch3d

    Rethinking this a bit more, I should remove anything 8 characters or shorter:

    sed -r '/^.{,8}$/d' sevenlayers3.txt > sevenlayers4.txt

    Checking our word count:

    wc -l sevenlayers4.txt
    320 sevenlayers4.txt

    Let's take a final look at some of the words in our list:

    coll3ction
    combin@tion
    comp@ni3s
    comp@tibl3
    compl3t3d
    complic@t3d
    compromis3
    compromis3d
    compulsiv3

    Not bad.  

    You can take this a step further by morphing one substitution at a time -- create one list with just @'s for a's, then create another with 3's for e's, and so on.  With the different lists, you can create a bigger list.  The following will sort a pair of files, create a unique list, and spit the output to a new file:

    sort -u file1.txt file2.txt > file3.txt
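    As a rough sketch of that idea, the loop below generates a handful of single-substitution variants from the cleaned list and folds them, along with the original, into one combined file (the extra substitutions and the combined.txt name are just examples):

    for sub in 's/a/@/g' 's/e/3/g' 's/o/0/g' 's/i/1/g'; do
        sed "$sub" sevenlayers2.txt
    done | sort -u sevenlayers2.txt - > combined.txt

    Each pass writes one substitution's worth of words to stdout, and sort -u merges them with the original list while dropping duplicates.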

    The possibilities are endless as are the ways of accomplishing this task.


    © 2020 sevenlayers.com