Export HTML to PDF, how hard can it be?

Uncertainty: how do you export an interactive, responsive SPA designed for the web into a printed PDF document?

Challenges of the SPA (single-page application, or web app)

  • There are no page breaks on the web
  • The web app adapts to screen sizes on the fly
  • The web app has elements that are only shown in popups
  • Images from other domains rendered into <canvas> or <svg> throw cross-origin errors on export

Needs of a PDF/printed document

  • Clients have different page sizes in different countries
  • Document needs clean page breaks, no text cut in half
  • Document needs header and footer with ref. # on each page
  • Need to show images and text exactly as in web app
  • Text should be searchable, not rasterized
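
Several of these needs map onto print CSS, for renderers that honour it. The following is a minimal sketch, not a tested solution: `pageSizeFor`, `printCss`, the `.pdf-section` class, and the short list of Letter-size countries are all assumptions made for illustration. `@page`, `size`, and `break-inside` are standard CSS, but support varies between rendering engines.

```javascript
// Hypothetical helpers: pick a paper size per country and emit print CSS.
// The country list is illustrative only, not exhaustive.
const LETTER_COUNTRIES = new Set(['US', 'CA', 'MX']);

function pageSizeFor(countryCode) {
  return LETTER_COUNTRIES.has(String(countryCode).toUpperCase()) ? 'Letter' : 'A4';
}

function printCss(countryCode) {
  const size = pageSizeFor(countryCode);
  return [
    `@page { size: ${size}; margin: 2cm; }`,
    // Ask the renderer to keep paragraphs, figures and table rows unsplit.
    'p, figure, tr { break-inside: avoid; page-break-inside: avoid; }',
    // Start each top-level section on a new page (class name is an assumption).
    '.pdf-section { break-before: page; page-break-before: always; }',
  ].join('\n');
}

console.log(printCss('us'));
```

The legacy `page-break-*` properties are emitted alongside the newer `break-*` ones because older WebKit-based converters may only understand the former.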

Developer needs

  • Server-side solutions must support Node.js (no PHP etc.)
  • Ideally the PDF is generated server-side, without a browser
  • Ideally a clean file, in case it needs to be imported and parsed
  • Small file size for sending in email

Three types of modules/plugins

  1. Purely JavaScript (client-side generates PDF)
  2. Node.js (server generates PDF)
  3. Raster (turn HTML page into an image saved as PDF)

Client-side

jsPDF

Client-side JavaScript PDF generation for everyone

Conclusion: Exports raw HTML but currently without CSS support.

https://github.com/MrRio/jsPDF
https://parall.ax/products/jspdf


PDFKit

A JavaScript PDF generation library for Node and the browser

Conclusion: Impressive but uses its own scripting to generate content. Does not convert existing HTML into PDF.

http://pdfkit.org
https://github.com/devongovett/pdfkit/


pdfmake

Client/server-side PDF printing in pure JavaScript. Based on pdfkit; builds the PDF from a JSON format.

Conclusion: Extends pdfkit, uses its own scripting commands to generate PDF from data. Does not convert HTML into PDF.

https://github.com/bpampuch/pdfmake
http://pdfmake.org/playground.html
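
To make "generate PDF from data" concrete, here is a rough sketch of a pdfmake-style document definition built from plain data. The `report` object, its fields, and the `buildDocDefinition` helper are hypothetical. The block only builds the JSON structure and does not load pdfmake, so the property names should be checked against the pdfmake documentation before relying on them.

```javascript
// Sketch only: builds a pdfmake-style document definition from plain data.
// `report` and its fields are hypothetical; nothing here parses HTML.
function buildDocDefinition(report, pageSize) {
  return {
    pageSize: pageSize, // e.g. 'A4' or 'LETTER'
    pageMargins: [40, 60, 40, 60],
    header: { text: report.title, alignment: 'center', margin: [0, 20, 0, 0] },
    // Called once per page, which is how the ref. # ends up on every page.
    footer: function (currentPage, pageCount) {
      return {
        text: 'Ref. ' + report.ref + ' | Page ' + currentPage + ' of ' + pageCount,
        alignment: 'center',
        fontSize: 9,
      };
    },
    content: report.paragraphs.map(function (p) {
      return { text: p, margin: [0, 0, 0, 10] };
    }),
  };
}

const doc = buildDocDefinition(
  { title: 'Quarterly report', ref: 'XYZ-001', paragraphs: ['First paragraph.', 'Second paragraph.'] },
  'A4'
);
console.log(doc.footer(2, 5).text);
```

In the browser the resulting object would be passed to `pdfMake.createPdf(...)`. The point for this comparison is that every paragraph has to be supplied as data; the existing DOM and its CSS are not reused.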


DOM-to-image

Conclusion: Very close to satisfactory. It can save the page as an SVG using <foreignObject>, but IE does not support <foreignObject> and never will.

https://www.npmjs.com/package/dom-to-image


Server-side with Node.js

node-html-pdf

https://github.com/marcbachmann/node-html-pdf

Conclusion: Similar to the others, it uses JSON configuration to generate the PDF. You must write the inline CSS yourself if you want styling. Not as good as other solutions.
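
Because the styles have to travel with the markup, one possible approach is to embed the stylesheet text in the HTML string before handing it to the converter. This is a minimal sketch assuming a hypothetical `withEmbeddedCss` helper and a converter that honours `<style>` blocks. It only builds the string and does not call node-html-pdf.

```javascript
// Hypothetical helper: wraps a body fragment and a stylesheet string in one
// standalone HTML document so no external CSS file is needed at render time.
function withEmbeddedCss(bodyHtml, cssText) {
  return [
    '<!DOCTYPE html>',
    '<html>',
    '<head>',
    '<meta charset="utf-8">',
    '<style>' + cssText + '</style>',
    '</head>',
    '<body>' + bodyHtml + '</body>',
    '</html>',
  ].join('\n');
}

const html = withEmbeddedCss('<p class="note">Hello</p>', '.note { color: #00396f; }');
console.log(html);
```

In a Node process the stylesheet text would typically be read with `fs.readFileSync` before being passed in.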


// Header and footer options with inline styles
module.exports = {
    header: {
        height: '3cm',
        contents: function (page) {
            return '<header class="pdf-header" style=" overflow:hidden; font-size: 10px; padding: 10px; margin: 0 -15px; color: #fff; background: none repeat scroll 0 0 #00396f;"><img style="float: left;" alt="" src="../images/logo.jpg"><p> XYZ </p></header>'
        }
    },

    footer: {
        height: '3cm',
        contents: function (page) {
            // The style attribute needs its closing quote, or the markup breaks
            return '<footer class="pdf-footer" style="font-size: 10px; font-weight: bold; color: #000;"><p style="margin: 0">Powered by XYZ</p></footer>'
        }
    }
}


phantom-html2pdf

Node module to generate PDFs from HTML via PhantomJS

Looks promising, appears to support CSS. Need to test.

https://github.com/bauhausjs/phantom-html2pdf/blob/master/FAQ.md
https://github.com/bauhausjs/phantom-html2pdf/issues/24
https://medium.com/@stockholmux/besting-phantomjs-font-problems-ee22795f5c0b#.jk6ur5grq

Conclusion: Looks like a good solution. Node process converts HTML with styles into PDF, supports headers and footers.


wkhtmltopdf

Node wrapper of CLI WebKit HTML to PDF
http://wkhtmltopdf.org
https://github.com/devongovett/node-wkhtmltopdf

Conclusion: Node process converts an HTML file to PDF using WebKit, but what about page breaks, headers, and footers?
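
On the header and footer question: the wkhtmltopdf command line documents flags such as `--page-size`, `--header-html`, and `--footer-center`, with `[page]` and `[topage]` placeholders for page numbers. The sketch below only assembles such a command line and does not run it. The `wkhtmltopdfArgs` helper and the file names are hypothetical, and whether the output meets the needs listed earlier would still have to be tested.

```javascript
// Sketch: assembles a wkhtmltopdf argument list without running the tool.
// File names are placeholders; check flag behaviour against the installed
// wkhtmltopdf version.
function wkhtmltopdfArgs(opts) {
  return [
    '--page-size', opts.pageSize,      // e.g. 'A4' or 'Letter'
    '--header-html', opts.headerHtml,  // HTML file rendered at the top of each page
    // [page] and [topage] are placeholders substituted by wkhtmltopdf itself.
    '--footer-center', 'Ref. ' + opts.ref + ' - [page]/[topage]',
    opts.input,
    opts.output,
  ];
}

const args = wkhtmltopdfArgs({
  pageSize: 'A4',
  headerHtml: 'header.html',
  ref: 'XYZ-001',
  input: 'report.html',
  output: 'report.pdf',
});
console.log(['wkhtmltopdf'].concat(args).join(' '));
```

From Node the list could be handed to `child_process.execFile('wkhtmltopdf', args, callback)`. Page breaks would most likely be steered from the page's own CSS (for example `page-break-inside: avoid`) rather than from flags, which is an assumption to verify.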


Raster (convert to image)

html2canvas

This script allows you to take "screenshots" of webpages or parts of them, directly in the user's browser.

Conclusion: Converts HTML page exactly as you see it to a raster image, but the text is blurry and not selectable. Makes large files. Might be good for portions of an HTML file, like the images, but not good for text.

http://html2canvas.hertzen.com
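
The file-size concern can be estimated with simple arithmetic. The following sketch computes the pixel dimensions and uncompressed RGBA size of one rasterized page; the `rasterSize` helper and the choice of 300 DPI are assumptions for illustration, and real files are smaller once the image is compressed.

```javascript
const MM_PER_INCH = 25.4;

// Pixel dimensions and raw (uncompressed) byte count of a page rendered
// at the given resolution, assuming 4 bytes per RGBA pixel.
function rasterSize(widthMm, heightMm, dpi) {
  const widthPx = Math.round((widthMm / MM_PER_INCH) * dpi);
  const heightPx = Math.round((heightMm / MM_PER_INCH) * dpi);
  return { widthPx, heightPx, rawBytes: widthPx * heightPx * 4 };
}

const a4 = rasterSize(210, 297, 300); // A4 is 210 x 297 mm
console.log(a4); // { widthPx: 2480, heightPx: 3508, rawBytes: 34799360 }
```

That is roughly 35 MB of raw pixels per A4 page at print resolution. Lowering the DPI shrinks the file but makes the blurry text worse, which is the trade-off noted above.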


RasterizeHTML.js

Draw a page/an HTML string/a Document to the canvas.

https://cburgmer.github.io/rasterizeHTML.js/

Conclusion: Like html2canvas, turns HTML into an image. Text not selectable. IE up to version 11 does not honour <foreignObject> and is unsupported.