Python Script: PDF Extract

    While playing around with a couple of other scripts, I got the idea to add PDF data extraction.  Nothing fancy here: the script searches recursively for PDFs, extracts the text from the first page of each one, and appends it to a text file, output.txt:

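    #!/usr/bin/python3
    import glob
    import PyPDF2  # PyPDF2 3.x names; older releases used PdfFileReader/getPage/extractText

    folder_path = './'
    # Recursively find every PDF under folder_path.
    for filename in glob.iglob(folder_path + '**/*.pdf', recursive=True):
        with open(filename, 'rb') as file:
            pdfReader = PyPDF2.PdfReader(file, strict=False)
            # Only the first page of each PDF is extracted.
            pageObj = pdfReader.pages[0]
            # Append, so the text from every PDF accumulates in one file.
            with open('./output.txt', 'a+') as f1:
                f1.write(pageObj.extract_text())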
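
    The script above only reads page 0 of each PDF.  If you want every page, something like the following might do it.  Treat it as a rough sketch rather than a tested tool: the function names (find_pdfs, pdf_to_text, extract_all) are made up for illustration, and it assumes PyPDF2 2.x or later, where PdfReader, .pages, and extract_text() are the current names.

```python
#!/usr/bin/python3
import glob
import os


def find_pdfs(folder_path):
    """Return a sorted list of every .pdf path under folder_path (recursive)."""
    pattern = os.path.join(folder_path, '**', '*.pdf')
    return sorted(glob.glob(pattern, recursive=True))


def pdf_to_text(filename):
    """Return the text of all pages in one PDF, joined with newlines."""
    # Imported here so find_pdfs() still works without PyPDF2 installed.
    import PyPDF2  # assumption: PyPDF2 2.x or later

    with open(filename, 'rb') as file:
        reader = PyPDF2.PdfReader(file, strict=False)
        return '\n'.join(page.extract_text() for page in reader.pages)


def extract_all(folder_path='./', output_path='./output.txt'):
    """Append the text of every PDF under folder_path to output_path."""
    with open(output_path, 'a') as out:
        for filename in find_pdfs(folder_path):
            out.write(pdf_to_text(filename) + '\n')
```

    Calling extract_all() from the folder you want to scan should behave like the original script, except that each PDF contributes all of its pages.  Note that the '*.pdf' pattern is case-sensitive on Linux, so files ending in .PDF would be skipped.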


    © 2020 sevenlayers.com