For the Chinese version of this post, see here.

Over the past few years, I have used several dedicated note-taking applications. All the tools I tried were either slow or did not work well when I wanted to search my notes. In the end, I decided to take my notes in Markdown and convert them to PDF for reading.

Taking your notes in Markdown format has several advantages:

  • You can edit the Markdown files with your favorite editor, for example, Sublime Text, which means more efficient editing and a more pleasant writing experience.

  • Since a Markdown file is a plain text file, you can search it using your favorite search tool such as grep or ripgrep.

  • You can convert the Markdown files to various formats such as PDF, HTML, epub, mobi etc., for a better reading experience, with the help of Pandoc.

  • Your notes are all text files and are small in size, which means easier and faster syncing or backup between your local machine and the cloud service you use.

In this post, I would like to share how to generate beautiful PDF files from Markdown and give solutions to some of the issues I have encountered in the process.

Prerequisites

You need to first make sure that you have installed the following tools in order to proceed:

  • Pandoc. After installing it, add the directory of the Pandoc executable to the system PATH variable.

  • A TeX distribution. Please make sure that TeX has been installed on your system. You can use TeX Live, MiKTeX, or MacTeX, depending on your platform. You may need to set up the PATH variable [1].

  • A powerful text editor. One of my favorites is Sublime Text. You can also choose to use VS Code or even Neovim.

Generating PDF from Markdown with Pandoc

Background

For those who are not familiar with it, Pandoc is a powerful tool that converts between many file formats. It is often called the Swiss Army knife of document conversion. Converting a Markdown file to PDF actually involves two steps:

  1. The Markdown file is converted to a LaTeX source file.

  2. Pandoc invokes pdflatex, xelatex, or another TeX engine to compile the .tex source file into the final PDF file.
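The two steps can also be run by hand, which may help when debugging a failed conversion because the intermediate .tex file can be inspected. The following is a minimal sketch, assuming a file named test.md in the current directory; the function name two_step_convert is made up for illustration.

```shell
# Run the two conversion steps separately for a file such as test.md.
two_step_convert() {
    base="${1%.md}"
    # Step 1: Markdown -> standalone LaTeX source (-s emits a full document with preamble).
    pandoc -s "$base.md" -o "$base.tex" || return 1
    # Step 2: compile the LaTeX source into a PDF.
    xelatex "$base.tex"
}

# Only run the steps when both tools are on PATH and the input file exists.
if command -v pandoc >/dev/null 2>&1 && command -v xelatex >/dev/null 2>&1 && [ -f test.md ]; then
    two_step_convert test.md
fi
```

When Pandoc produces the PDF directly, it performs the equivalent of both steps internally and cleans up the intermediate files.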

Because I often use non-ASCII characters in my files, and my Markdown files contain block quotes, tables, and other complex formatting, I ran into a few problems during the conversion process. In the following sections, I will explain how to solve these issues.

How to Handle Languages other than English

By default, Pandoc uses the pdflatex command to generate PDF files, which cannot handle Unicode characters well. You will encounter errors when you try to convert Markdown files containing Unicode characters to PDF files.

In order to handle Unicode characters, we need to use the xelatex command instead. For the CJK languages, you also need to set the CJKmainfont variable to a font that supports the language you are using [2]. In this post, I will use the Chinese language as an example.

On Windows systems, with Pandoc 2.0 or later, you can use the following command to generate the PDF file:

pandoc --pdf-engine=xelatex -V CJKmainfont="KaiTi" test.md -o test.pdf

In the above command, KaiTi is the name of a font that supports Chinese characters. How do we find a font that supports a particular language? First, you need to know the language code of the language you are using. For example, the language code for Chinese is zh. Then, use the fc-list command to look up the fonts that support this language [3]:

fc-list :lang=zh

Each line of the output starts with the location of a font file; the font name is the string after the font location. Since font names may contain spaces, you need to quote the font name when you want to use a particular font, e.g., -V CJKmainfont="Source Han Serif CN".
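To pull the font name out of a line of fc-list output, something like the following sketch may help. The sample line is only an illustration of the usual fc-list layout (file path, then family, then style), not output copied from a real system, and font_name is a hypothetical helper.

```shell
# Illustrative line in the default fc-list layout: "<file>: <family>:style=<style>"
sample='/usr/share/fonts/truetype/wqy/wqy-microhei.ttc: WenQuanYi Micro Hei:style=Regular'

# Print the family part of one fc-list line.
# Splitting on ": " (colon + space) keeps Windows paths such as C:/... intact.
font_name() {
    printf '%s\n' "$1" | awk -F': ' '{print $2}' | cut -d: -f1
}

font_name "$sample"

# fc-list can also print only the family element for each matching font.
if command -v fc-list >/dev/null 2>&1; then
    fc-list :lang=zh family | sort -u
fi
```

If a font has several names, the family part may be a comma-separated list; any one of the listed names should be usable.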

In Pandoc 2.0, the --pdf-engine option replaced the old --latex-engine option. On Linux systems where the installed Pandoc may be older than 2.0, the above command will not work. You need to use the following command instead [4]:

pandoc --latex-engine=xelatex -V mainfont='WenQuanYi Micro Hei' test.md -o test.pdf

On Linux systems, the way to find a font supporting your language is the same as on Windows.
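If the same script has to work with both old and new Pandoc versions, the option name can be chosen from the version number. This is a minimal sketch; engine_option is a hypothetical helper, and it assumes the first line of pandoc --version has the form "pandoc 2.19.2".

```shell
# Pick the engine option name from a Pandoc version string such as "2.19.2".
engine_option() {
    major="${1%%.*}"    # text before the first dot
    if [ "$major" -ge 2 ]; then
        echo "--pdf-engine"
    else
        echo "--latex-engine"
    fi
}

# Ask the installed pandoc for its version, if pandoc is available.
if command -v pandoc >/dev/null 2>&1; then
    version="$(pandoc --version | head -n 1 | awk '{print $2}')"
    echo "Use $(engine_option "$version")=xelatex"
fi
```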

Issues and techniques

Block quote, table and list are not correctly rendered

The reason is that Pandoc requires an empty line before a block quote, list, or table. If the lines inside a block quote are not broken correctly, i.e., all the lines are merged into one paragraph, you can add two spaces at the end of each line to force a line break.
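A missing blank line is easy to overlook in a long file. The sketch below is a rough heuristic, not a Markdown parser: it reports block quotes and list items that directly follow a paragraph line. The function name check_blank_lines and the demo file name are made up for illustration, and the check does not cover tables or skip code blocks.

```shell
# Rough check: report block quotes or list items that directly follow
# a paragraph line with no blank line in between.
check_blank_lines() {
    awk '
        /^[ \t]*$/ { prev = "blank"; next }
        /^[ \t]*(>|[-*+] |[0-9]+\. )/ {
            if (prev == "text") { print FILENAME ":" FNR ": add a blank line before this block" }
            prev = "block"
            next
        }
        { prev = "text" }
    ' "$1"
}

# Demo on a small sample file: line 2 starts a quote right after a paragraph.
printf 'Intro paragraph\n> quote without a blank line\n\n- item one\n- item two\n' > blank_line_demo.md
check_blank_lines blank_line_demo.md
# prints: blank_line_demo.md:2: add a blank line before this block
```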

Add syntax highlighting to code blocks

Pandoc supports syntax highlighting in code blocks for many languages and offers several highlight themes. To list the highlight themes that Pandoc provides, use the following command:

pandoc --list-highlight-styles

To list all the languages that Pandoc supports, use the following command:

pandoc --list-highlight-languages

To use syntax highlighting for a particular language, you need to specify the language on the code block and use the --highlight-style option, e.g.:

pandoc --pdf-engine=xelatex --highlight-style zenburn test.md -o test.pdf

In the above command, we use the zenburn theme. I also recommend the tango and breezedark themes.
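As a small end-to-end illustration, the sketch below writes a Markdown file whose code block is tagged as Python and then converts it. The file name highlight_demo.md is an arbitrary choice, and the conversion only runs when pandoc and xelatex are found on PATH.

```shell
# Write a small Markdown file whose code block is tagged as Python.
# Tilde fences (~~~) are used here; backtick fences work the same way.
cat > highlight_demo.md <<'EOF'
# Highlight demo

~~~python
def greet(name):
    return f"Hello, {name}"
~~~
EOF

# Convert it only if pandoc and xelatex are available on PATH.
if command -v pandoc >/dev/null 2>&1 && command -v xelatex >/dev/null 2>&1; then
    pandoc --pdf-engine=xelatex --highlight-style zenburn highlight_demo.md -o highlight_demo.pdf \
        || echo "conversion failed; check the pandoc and TeX installation"
fi
```

Without the language tag after the opening fence, the block is still typeset as code, but Pandoc has no language to highlight it with.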

Use numbered section and add TOC

By default, the generated PDF has no table of contents (TOC) and the section headings are not numbered [5]. To add a TOC, use the --toc option; to add section numbers, use the -N option. A complete example is as follows:

pandoc --pdf-engine=xelatex --toc -N -o test.pdf test.md

According to the Pandoc user guide, we can color links via the colorlinks variable to distinguish them from normal text:

colorlinks: add color to link text; automatically enabled if any of linkcolor, filecolor, citecolor, urlcolor, or toccolor are set

To customize the color of different types of links, Pandoc offers further variables:

linkcolor, filecolor, citecolor, urlcolor, toccolor: color for internal links, external links, citation links, linked URLs, and links in table of contents, respectively: uses options allowed by xcolor, including the dvipsnames, svgnames, and x11names lists

For example, to set the URL color to NavyBlue and set the TOC color to Red, we can use the following command:

pandoc --pdf-engine=xelatex -V colorlinks -V urlcolor=NavyBlue -V toccolor=Red test.md -o test.pdf

Note that the urlcolor option does not color bare URLs in the text, because Pandoc does not turn them into links by default. To make such a URL a colored link, enclose it in angle brackets, e.g., <https://www.google.com>.

Change the PDF margin

The default margin for the generated PDF is too large. According to the Pandoc FAQ, you can use the following option to change the margin:

-V geometry:"top=2cm, bottom=1.5cm, left=2cm, right=2cm"

The complete command is:

pandoc --pdf-engine=xelatex -V geometry:"top=2cm, bottom=1.5cm, left=2cm, right=2cm" -o test.pdf test.md

Error when using backslash inside Markdown

In ordinary Markdown, it is fine to use backslash characters inside a file. But Pandoc by default interprets a backslash and the string after it as a LaTeX command. As a result, you may encounter weird errors when compiling Markdown files that contain backslash characters. Based on discussions here and here, the solution is to make Pandoc treat the file as plain Markdown and not interpret raw LaTeX commands. You need to use the following flag:

pandoc -f markdown-raw_tex

Or you can use two backslashes to represent a literal backslash, e.g., \\sometxt. If you want to show a LaTeX command literally, enclose it in inline code, like this: `\texttt{}`.

Add background color to inline code

When translating Markdown source to TeX, Pandoc uses the \texttt command for inline code, so inline code has no background color in the generated PDF. To make inline code more readable, we can redefine the \texttt command to add a background color.

First, we need to create a file named head.tex and add the following settings to it:

% change background color for inline code in
% markdown files. The following code does not work well for
% long text as the text will exceed the page boundary
\definecolor{bgcolor}{HTML}{E0E0E0}
\let\oldtexttt\texttt

% the trailing % signs prevent stray spaces around the colored box
\renewcommand{\texttt}[1]{%
  \colorbox{bgcolor}{\oldtexttt{#1}}%
}

When converting Markdown files, use the -H option to include the head.tex file, e.g.:

pandoc --pdf-engine=xelatex -H head.tex test.md -o test.pdf

In the generated PDF, inline code will have a grey background. You can change the background color as you wish.

Put the settings in head.tex

Specifying many options on the command line quickly becomes clumsy: converting Markdown to PDF often needs several settings, and a long command is time-consuming and cumbersome to edit. A good way to ease the issue is to put some of the settings in the head.tex file and refer to this file during conversion.

For example, we can put the settings related to margins, inline code highlighting, and link colors in head.tex:

\usepackage{fancyvrb,newverbs}
\usepackage[top=2cm, bottom=1.5cm, left=2cm, right=2cm]{geometry}

% change background color for inline code in
% markdown files. The following code does not work well for
% long text as the text will exceed the page boundary
\definecolor{bgcolor}{HTML}{E0E0E0}
\let\oldtexttt\texttt

% the trailing % signs prevent stray spaces around the colored box
\renewcommand{\texttt}[1]{%
  \colorbox{bgcolor}{\oldtexttt{#1}}%
}

%% color and other settings for hyperref package
\hypersetup{
    bookmarksopen=true,
    linkcolor=blue,
    filecolor=magenta,
    urlcolor=RoyalBlue,
}
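With the settings collected in head.tex, the command line shrinks to a few options, and it can be wrapped in a small shell function. The sketch below assumes head.tex sits in the current directory; the name md2pdf and the DRY_RUN switch are my own illustrative choices, not part of Pandoc.

```shell
# Convert notes.md to notes.pdf using the settings in head.tex.
md2pdf() {
    input="$1"
    output="${input%.md}.pdf"
    # With DRY_RUN set, print the command instead of running it.
    ${DRY_RUN:+echo} pandoc --pdf-engine=xelatex -H head.tex "$input" -o "$output"
}

# Show the command that would run for notes.md.
DRY_RUN=1 md2pdf notes.md
# prints: pandoc --pdf-engine=xelatex -H head.tex notes.md -o notes.pdf
```

Calling md2pdf notes.md without DRY_RUN runs the conversion itself.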

Nested list level exceeds the limit

One reader, Karl Liu, mentioned that if the nested list level exceeds 6, you will encounter the following error when trying to generate a PDF file:

! LaTeX Error: Too deeply nested.

More detailed discussions can be found here. The solution proposed is to add the following settings in head.tex:

\usepackage{enumitem}
\setlistdepth{9}

\setlist[itemize,1]{label=$\bullet$}
\setlist[itemize,2]{label=$\bullet$}
\setlist[itemize,3]{label=$\bullet$}
\setlist[itemize,4]{label=$\bullet$}
\setlist[itemize,5]{label=$\bullet$}
\setlist[itemize,6]{label=$\bullet$}
\setlist[itemize,7]{label=$\bullet$}
\setlist[itemize,8]{label=$\bullet$}
\setlist[itemize,9]{label=$\bullet$}
\renewlist{itemize}{itemize}{9}

\setlist[enumerate,1]{label=$\arabic*.$}
\setlist[enumerate,2]{label=$\alph*.$}
\setlist[enumerate,3]{label=$\roman*.$}
\setlist[enumerate,4]{label=$\arabic*.$}
\setlist[enumerate,5]{label=$\alph*.$}
\setlist[enumerate,6]{label=$\roman*.$}
\setlist[enumerate,7]{label=$\arabic*.$}
\setlist[enumerate,8]{label=$\alph*.$}
\setlist[enumerate,9]{label=$\roman*.$}
\renewlist{enumerate}{enumerate}{9}

Add the -H head.tex option when compiling PDF files.

Add anchors in Markdown

I tried to use anchors in Markdown following the discussion here. Unfortunately, in the generated PDF, the anchor does not work: when I click the link text, there is no jump to the destination page.

Instead, use a header attribute to give an id to the location we want to jump to, and then refer to that id elsewhere. Here is an example:

## head 2 {#my_head2}

Please click [here](#my_head2) to go to head 2.

How to resize image

We can also resize images using attributes. You can specify the width or height in absolute pixel values or as a percentage of the page or column width. For example:

you can use absolute pixel values

![test image](test.jpg){width=128px}

or you can use relative value to the page or column width

![test image](test.jpg){width=90%}

How to start a new page for each section

By default, when you generate a PDF from Markdown files, a section started by a level 1 header does not start on a new page; it continues from where the previous section ends. If you want each section to start on a new page, you need to add the following settings to head.tex, according to this:

\usepackage{titlesec}
\newcommand{\sectionbreak}{\clearpage}

But when I tried to produce PDF with the updated head.tex files, I got an error:

! Argument of \paragraph has an extra }.
<inserted text>
                \par
l.1290 \ttl@extract\paragraph

pandoc: Error producing PDF

According to discussions here, this happens because Pandoc's default LaTeX template redefines the \paragraph command, and we have to disable this behaviour. We need to use -V subparagraph when invoking the pandoc command:

pandoc -V subparagraph -o file.pdf file.md

Start a new page only after TOC

What if we only want to add a new page after the table of contents page? An easy way is to hack the \tableofcontents command. Add the following command to head.tex to redefine \tableofcontents command:

\let\oldtoc\tableofcontents
\renewcommand{\tableofcontents}{\oldtoc\newpage}

In the above commands, we first save the original command under a new name and then redefine \tableofcontents in terms of the saved copy, which avoids a recursive definition.

Line breaks

In Markdown, you can create a hard linebreak by appending two spaces after a line:

hello<space><space>
world

Using spaces at the end of a line for formatting is annoying, since it causes trailing whitespace warnings. The space characters are also not visible.

Pandoc also provides an escaped_line_breaks extension. You can put a \ at the end of a line, followed by a newline character, to represent a hard line break:

hello\
world
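Existing notes that rely on trailing spaces can be converted to the backslash style with a short sed command. This is only a sketch: spaces_to_backslash is a hypothetical helper, and it rewrites every line that ends in two or more spaces, including lines inside code blocks, so review the result before overwriting a file.

```shell
# Replace two or more trailing spaces with a single backslash.
spaces_to_backslash() {
    sed 's/ \{2,\}$/\\/'
}

# Demo: the first line ends with two spaces.
printf 'hello  \nworld\n' | spaces_to_backslash
```

To convert a whole file, redirect it through the function, e.g., spaces_to_backslash < notes.md > notes_converted.md (both file names are placeholders).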

Image references

Pandoc supports LaTeX commands inside Markdown. To refer to an image, you can use the LaTeX syntax:

![my great image\label{fig-my-great-img}](image_great.jpg)

In Fig.\ref{fig-my-great-img}, I show a great image.

Generate PDF using Sublime Text build system

It is cumbersome to switch to the terminal, run Pandoc to generate the PDF file, and then open it for preview every time I finish editing a Markdown file. To simplify the process, I use the Sublime Text build system to build and preview the PDF file. For previewing, I use the lightweight Sumatra PDF reader.

An example build system is shown below:

{
    "shell_cmd": "pandoc --pdf-engine=xelatex --highlight-style=zenburn -V colorlinks -V CJKmainfont=KaiTi \"${file}\" -o \"${file_path}/${file_base_name}.pdf\" ",
    "file_regex": "^(..[^:]*):([0-9]+):?([0-9]+)?:? (.*)$",
    "working_dir": "${file_path}",
    "selector": "text.html.markdown",

    "variants":
    [
        {
            "name": "Convert to PDF and Preview",
            "shell_cmd": "pandoc --pdf-engine=xelatex --highlight-style=zenburn -V colorlinks  -V CJKmainfont=KaiTi \"${file}\" -o \"${file_path}/${file_base_name}.pdf\"  &&SumatraPDF \"${file_path}/${file_base_name}.pdf\" ",
            // "shell_cmd":   "start \"$file_base_name\" call $file_base_name"
        }
    ]
}

You can download the build system and head.tex file here.

Pandoc is not recognized on Windows systems

For reasons unknown to me, when using the above build system to compile Markdown files, I encountered the following error:

'pandoc' is not recognized as an internal or external command, operable program or batch file.

After looking up the Sublime Text documentation, I found that we can add a path in the build system. So I adjusted the above build system:

{
    "shell_cmd": "pandoc --pdf-engine=xelatex --highlight-style=zenburn -V colorlinks -V CJKmainfont=\"Source Han Serif SC\" \"${file}\" -o \"${file_path}/${file_base_name}.pdf\" ",
    "path": "C:/Users/east/AppData/Local/Pandoc/;%PATH%",
    "file_regex": "^(..[^:]*):([0-9]+):?([0-9]+)?:? (.*)$",
    "working_dir": "${file_path}",
    "selector": "text.html.markdown",

    "variants":
    [
        {
            "name": "Convert to PDF and Preview",
            "shell_cmd": "pandoc --pdf-engine=xelatex --highlight-style=zenburn -V colorlinks  -V CJKmainfont=\"Source Han Serif SC\" \"${file}\" -o \"${file_path}/${file_base_name}.pdf\"  &&SumatraPDF \"${file_path}/${file_base_name}.pdf\" ",
            "path": "C:/Users/east/AppData/Local/Pandoc/;%PATH%",
            // "shell_cmd":   "start \"$file_base_name\" call $file_base_name"
        }
    ]
}

After that, everything worked well.

Conclusion

In this post, I have summarized how to generate beautiful PDF files from Markdown and shared solutions to several issues I encountered along the way. I hope that you can now generate beautiful PDF files from your own Markdown files.

References


  1. Make sure that you can run the latex command on the command line.
  2. For other languages, you need to set the mainfont variable instead, e.g., -V mainfont="Font Name".
  3. On Windows, the fc-list command is available after installing the full edition of TeX Live. On Linux systems, this command is usually pre-installed.
  4. Tested on Pandoc version 1.12.3.1.
  5. Only the font size varies for different header levels.