Imagine that you have just downloaded lots of articles from your favorite data science blog as *.html files, and you want to combine them into a single PDF file.
Maybe you would try copying all the text, pasting it into Microsoft Word 2016 and saving it as a PDF.
That doesn’t sound like a bad idea, right?
It is far from a good one.
What if you have 100+ articles? You probably don’t want to spend that much time pasting text; it’s boring.
So can you shorten the time?
Yes, you can.
We use three modules, glob, os and pdfkit, to get the boring job done.
The logic of the program:
Step 1. Import the packages
Step 2. Set the path of wkhtmltopdf.exe so that Python can use it
Step 3. Create a new list
Step 4. Use a for loop to search for all the *.html files in the current directory with the "glob" and "os" modules
Step 5. Append the *.html file names to the list
Step 6. Output the PDF file from the list
Step 7. Start reading your favorite blog’s articles
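import glob
import os
import pdfkit  # pip install pdfkit

# Tell pdfkit where the wkhtmltopdf executable lives
path_wkhtmltopdf = r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"
config = pdfkit.configuration(wkhtmltopdf=path_wkhtmltopdf)

# Collect every *.html file in the current directory
newList = []
for filename in glob.iglob(os.path.join('*.html')):
    newList.append(filename)

# Write all the collected articles into a single PDF
pdfkit.from_file(newList, 'out.pdf', configuration=config)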
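If you want something a little more reusable, here is a minimal sketch of the same steps wrapped in functions. The helper names (collect_html_files, find_wkhtmltopdf, html_to_pdf) are made up for this example and are not part of pdfkit. It assumes you want the articles in alphabetical order, since glob does not promise any particular order, and it looks for wkhtmltopdf on your PATH before falling back to the Windows path used above.

```python
import glob
import os
import shutil


def collect_html_files(directory="."):
    """Return the *.html files in `directory`, sorted by name (hypothetical helper)."""
    pattern = os.path.join(directory, "*.html")
    # glob does not guarantee an order, so sort for a predictable PDF
    return sorted(glob.glob(pattern))


def find_wkhtmltopdf(fallback=r"C:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"):
    """Look for wkhtmltopdf on PATH first, then use the fallback path (hypothetical helper)."""
    return shutil.which("wkhtmltopdf") or fallback


def html_to_pdf(directory=".", output="out.pdf"):
    """Combine every *.html file in `directory` into one PDF (hypothetical helper)."""
    # Imported here so the two helpers above still work without pdfkit installed
    import pdfkit  # pip install pdfkit

    files = collect_html_files(directory)
    if not files:
        raise FileNotFoundError(f"no *.html files found in {directory!r}")
    config = pdfkit.configuration(wkhtmltopdf=find_wkhtmltopdf())
    pdfkit.from_file(files, output, configuration=config)
```

Calling html_to_pdf() from the folder that holds your articles should then write out.pdf next to them, as long as pdfkit and wkhtmltopdf are both installed.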