Converting directory of ipynb to pdfs
Sun 02 February 2020 by Dr. Dirk Colbry
I wanted to share the content for one of my courses with some other instructors, but it turns out not everyone uses Jupyter notebooks.
Option 1: I decided to convert all of the notebooks to PDFs using nbconvert. However, nbconvert isn't working on my MacBook. I think I have too many versions of LaTeX and Python installed, and I can't quickly find the right magic to get the PDF converter working.
Option 2: nbconvert works great if I convert to HTML, so my next thought was to convert to HTML first and then print to PDF. However, this option requires me to open each HTML file and save it as a PDF individually.
Option 3: I discovered there are a bunch of tools that convert HTML to PDF. Most of them require the same kind of installation I was having trouble with in Option 1. However, Chrome can run in "headless" mode, which may make it easy to automate.
My final solution has the following steps. Note that this only works on macOS, but conceptually it could be converted to work on any system:
- Use nbconvert to convert each ipynb in the current directory to HTML.
- Use Chrome headless mode to convert the HTML files to PDFs.
- Use the join command to combine all of the PDFs into one file.
Here is a bash script that seems to work:
#!/bin/bash
# Convert every notebook in the current directory to HTML.
mkdir -p HTML
jupyter nbconvert --to html --no-prompt --allow-errors --output-dir HTML *.ipynb

# Print each HTML file to a PDF of the same name with headless Chrome.
mkdir -p PDF
for file in ./HTML/*.html
do
    filename=$(basename -- "$file")
    filename="${filename%.*}"    # strip the .html extension
    /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless \
        --virtual-time-budget=10000 \
        --crash-dumps-dir=./HTML/ \
        --disable-gpu \
        --print-to-pdf="./PDF/${filename}.pdf" \
        --no-margins \
        "file://$PWD/${file#./}"
done

# Combine the PDFs into a single file and open it.
"/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py" -o CMSE314.pdf ./PDF/*.pdf
open CMSE314.pdf
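The script above hard-codes the macOS location of Chrome. As a rough sketch of how that one piece might be carried over to other systems, the function below looks for a Chrome-like binary. The `find_chrome` name, the `CHROME` override variable, and the Linux command names (`google-chrome`, `chromium`, `chromium-browser`) are assumptions and have not been tested here; the names in particular differ between distributions. The join step would need its own replacement too, since join.py is part of macOS.

```shell
#!/bin/bash
# Sketch: locate a Chrome-like binary instead of hard-coding the macOS path.
# find_chrome and the CHROME override are hypothetical names for this example.
find_chrome() {
    # An explicit override wins, e.g. CHROME=/path/to/chrome ./convert.sh
    if [ -n "${CHROME:-}" ]; then
        printf '%s\n' "$CHROME"
        return 0
    fi
    # The macOS application bundle used in the script above.
    local mac='/Applications/Google Chrome.app/Contents/MacOS/Google Chrome'
    if [ -x "$mac" ]; then
        printf '%s\n' "$mac"
        return 0
    fi
    # Common Linux command names (assumed; they differ between distributions).
    local name
    for name in google-chrome chromium chromium-browser; do
        if command -v "$name" >/dev/null 2>&1; then
            command -v "$name"
            return 0
        fi
    done
    echo "No Chrome binary found; set CHROME to its path." >&2
    return 1
}
```

Inside the loop, the hard-coded path could then be replaced by something like `chrome=$(find_chrome) || exit 1` followed by `"$chrome" --headless ...` with the same flags as before.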
Once I got the bash script working, I decided a Makefile would be better:
IPYNB_FILES = $(wildcard *.ipynb)
PDF_FILES = $(patsubst %.ipynb,PDF/%.pdf,$(IPYNB_FILES))

.PHONY: all clean

all: CMSE314.pdf
	open CMSE314.pdf

# Convert a notebook to HTML.
HTML/%.html: %.ipynb
	@mkdir -p "$(@D)"
	jupyter nbconvert --to html --no-prompt --allow-errors --output-dir HTML "$<"

# Print the HTML to PDF. Chrome writes output.pdf in the current directory,
# which is then moved into place.
PDF/%.pdf: HTML/%.html
	@mkdir -p "$(@D)"
	/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --headless --crash-dumps-dir=$(@D) --disable-gpu --print-to-pdf file://$(PWD)/$<
	mv output.pdf $@

# Join the per-notebook PDFs into one file.
CMSE314.pdf: $(PDF_FILES)
	"/System/Library/Automator/Combine PDF Pages.action/Contents/Resources/join.py" -o CMSE314.pdf $(PDF_FILES)

clean:
	rm -rf HTML
	rm -rf PDF
Both seem to work and get the job done. The output of the script is a little easier to read, but the Makefile is nice because an interrupted run can be restarted without redoing the conversions that already finished (although it can't be run with the -j parallel option because of the output.pdf temporary file).
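The output.pdf clash comes from calling --print-to-pdf without a path. The bash script above already passes a path with --print-to-pdf=..., so one possible way around the limitation, sketched here in bash rather than make, is to give every conversion its own output file and run several at once with xargs -P. The helper names `pdf_path_for` and `html_to_pdf`, the `CHROME` variable, and the choice of four parallel jobs are made up for illustration, and whether several headless Chrome instances behave well side by side is untested here.

```shell
#!/bin/bash
# Sketch: one output path per conversion, so runs cannot collide on output.pdf.
# pdf_path_for and html_to_pdf are hypothetical helper names.
CHROME="${CHROME:-/Applications/Google Chrome.app/Contents/MacOS/Google Chrome}"

# Map ./HTML/name.html to ./PDF/name.pdf.
pdf_path_for() {
    local base
    base=$(basename -- "$1")
    printf './PDF/%s.pdf\n' "${base%.*}"
}

# Print a single HTML file straight to its own PDF.
html_to_pdf() {
    local html=$1
    "$CHROME" --headless --disable-gpu \
        --print-to-pdf="$(pdf_path_for "$html")" \
        "file://$PWD/${html#./}"
}

# Example use (commented out so the definitions can be sourced safely):
# export CHROME; export -f pdf_path_for html_to_pdf
# mkdir -p PDF
# printf '%s\0' ./HTML/*.html | xargs -0 -n 1 -P 4 bash -c 'html_to_pdf "$1"' _
```

In the Makefile, the equivalent change would presumably be to pass --print-to-pdf=$@ in the PDF rule and drop the mv line, which should remove the shared temporary file that gets in the way of -j.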
Anyway, let me know if you find this helpful. I am sure there are better ways but I needed something quick.