---
language: en
highlight-style: native
---

# Formátítko 2.0
A Python program based on [pandoc](https://pandoc.org/) and its Python library
[panflute](http://scorreia.com/software/panflute) for converting markdown
to TeX and HTML, with added fancy features like image processing, Python-based
macros and much more.

## Requirements
This project requires `panflute 2.3.0`, which in turn requires `pandoc 3.0`. If the
version of `pandoc` doesn't match, very weird things can happen. ImageMagick and
Inkscape are used for image processing. Node.js is used for KaTeX.

## Usage
```
usage: formatitko.py [-h] [-l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]] [-p IMG_PUBLIC_DIR] [-i IMG_WEB_PATH] [-w OUTPUT_HTML] [-t OUTPUT_TEX] input_filename

positional arguments:
  input_filename        The markdown file to process.

options:
  -h, --help            show this help message and exit
  -l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...], --img-lookup-dirs IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]
                        Image lookup directories. When processing images, the program will try to find the image in them first. Always looks for images in the same folder as the markdown
                        file. (default: [])
  -p IMG_PUBLIC_DIR, --img-public-dir IMG_PUBLIC_DIR
                        Directory to put processed images into. The program will not overwrite existing images. (default: public)
  -i IMG_WEB_PATH, --img-web-path IMG_WEB_PATH
                        Path where the processed images are available on the website. (default: /)
  -w OUTPUT_HTML, --output-html OUTPUT_HTML
                        The HTML file (for Web) to write into. (default: output.html)
  -t OUTPUT_TEX, --output-tex OUTPUT_TEX
                        The TEX file to write into. (default: output.tex)
```
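
If you want to experiment with the same options from your own script, the help text above can be mirrored with `argparse`. This is only a sketch reconstructed from that help output: `build_parser` is a name made up for this example, and the real parser inside `formatitko.py` may be organised differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the help text above (not the real source)."""
    parser = argparse.ArgumentParser(
        prog="formatitko.py",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    parser.add_argument("-l", "--img-lookup-dirs", nargs="+", default=[],
                        help="Image lookup directories.")
    parser.add_argument("-p", "--img-public-dir", default="public",
                        help="Directory to put processed images into.")
    parser.add_argument("-i", "--img-web-path", default="/",
                        help="Path where the processed images are available on the website.")
    parser.add_argument("-w", "--output-html", default="output.html",
                        help="The HTML file (for Web) to write into.")
    parser.add_argument("-t", "--output-tex", default="output.tex",
                        help="The TEX file to write into.")
    parser.add_argument("input_filename", help="The markdown file to process.")
    return parser


# The input file comes first so that `-l` (which accepts several values)
# does not swallow it.
args = build_parser().parse_args(
    ["README.md", "-l", "img", "assets", "-w", "index.html"]
)
print(args.img_lookup_dirs, args.output_html, args.output_tex)
```

Options that are not given keep the defaults listed in the help text.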

## Format
Formátítko uses all the default pandoc markdown extensions except for
definition lists and citations. It also adds its own custom features.

## Features

### Hiding and showing elements based on flags

Flags can be set in the front matter or with Python code. Elements with
the `if` attribute are shown only if the flag is set to True, and elements
with the `ifn` attribute are shown only if the flag is not set to True.

**Example:**

```markdown {.group}
---
flags:
  foo: true
---
[This will be shown]{if=foo}

[This will not be shown]{if=bar}

[This will be shown]{ifn=bar}
```
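
The rule behind the example can be written down as a small predicate. This is a sketch of the behaviour described above, not Formátítko's implementation; `is_shown` and its arguments are hypothetical names, and an unset flag is assumed to behave like a flag that is not True.

```python
def is_shown(attributes: dict[str, str], flags: dict[str, bool]) -> bool:
    """Decide visibility from the `if` / `ifn` attributes (illustrative only)."""
    # `if`: hide the element unless the named flag is exactly True.
    if "if" in attributes and flags.get(attributes["if"]) is not True:
        return False
    # `ifn`: hide the element when the named flag is True.
    if "ifn" in attributes and flags.get(attributes["ifn"]) is True:
        return False
    return True


flags = {"foo": True}
print(is_shown({"if": "foo"}, flags))   # True
print(is_shown({"if": "bar"}, flags))   # False
print(is_shown({"ifn": "bar"}, flags))  # True
```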

### Including other files

There are two ways of including files.

#### Importing
The first is importing, which takes only the state (defined commands, metadata,
etc.) from the file and omits any content. This is useful for creating
libraries of commands. The syntax is as follows:

[#test/empty.md]{}

The curly braces are required for pandoc to parse the import properly and should
be left empty.

#### Partials
Partials are the very opposite of imports: they have their own context, which
inherits everything from the context they're included in, but is reset after
the file ends.

:::{partial=test/empty.md}
:::

If the `untrusted` attribute is set to True, the partial and all its children
will not be able to define commands or run inline blocks (but they will be able to
run commands defined in the parent). ^[Please don't rely on this for any security,
though: we're playing with *eval* fire, so this will never be secure.]

You can also include raw HTML and TeX if you set the `type` attribute of the
partial to `tex` or `html`.

### Groups

Groups are pieces of markdown with their own sandboxed context, in other words,
inline partials. They function exactly the same as partials; in particular, they
can have their own front matter.

```markdown {.group}
---
language: cs
---
OOOoo český mód
```

If you need to nest groups or have code blocks inside groups, you can increase
the number of backticks around the outer block:

````markdown {.group}
```go
fmt.Println("owo")
```
````

Groups and partials are also enclosed in `\begingroup` and `\endgroup` in the
output TeX.

### Raw HTML and TeX ^[This is a pandoc feature]
If raw HTML or TeX is included in the markdown file, it will automagically pop
out into the respective output file.

<em style="color: red">red text</em>

\vskip1em

This has the advantage and disadvantage of being very *"automagic"*, which means
that, for instance, markdown inside HTML will still get interpreted as markdown.
It is, however, very unreliable, so in most cases you should use explicit
raw blocks with the unnamed attribute set to either `html` or `tex`. ^[Still a
pandoc feature.]

``` {=html}
<span style="color: red">red text</span>
```

### Running Python code

Formátítko allows you to run Python code directly from your markdown file. Any
`python` code block with the class `run` will be executed.

#### Context

You can access the current context using the `ctx` variable. The context
provides read/write access to the front matter metadata. The context has the
following methods:

`ctx.get_metadata(key: str, simple: bool=True, immediate: bool=False)`

- `key`: The key of the metadatum you want to get. Separate child keys with
  dots: `ctx.get_metadata("flags.foo")`
- `simple`: Whether to use Python's simple builtin types or panflute's
  MetaValues. MetaValues can contain formatted text; simple values lose all
  formatting.
- `immediate`: Only get the metadatum from the current context, not from its
  parents.

`ctx.set_metadata(key: str, value)`

- `key`: The key of the metadatum you want to set. Separate child keys with
  dots: `ctx.set_metadata("flags.foo", True)`
- `value`: Any value you want to assign to the metadatum

`ctx.unset_metadata(key: str)`

Delete the metadatum in the current context and allow it to inherit the value
from the parent context.

- `key`: The key of the metadatum you want to unset. Separate child keys with
  dots: `ctx.unset_metadata("flags.foo")`

Helper functions for flags exist which work the same as for metadata:

`ctx.is_flag_set(flag: str) -> bool`

`ctx.set_flag(flag: str, val: bool)`

`ctx.unset_flag(flag: str)`
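
To make the inheritance rules more concrete, here is a toy model of a context with dotted keys and a parent fallback. It is an illustration only: `SketchContext` is a made-up class, it ignores the `simple` parameter, and returning `None` for a missing key is an assumption rather than documented behaviour.

```python
from __future__ import annotations

from typing import Any

_MISSING = object()


class SketchContext:
    """Toy stand-in for a context; NOT Formátítko's actual class."""

    def __init__(self, parent: SketchContext | None = None) -> None:
        self.parent = parent
        self.metadata: dict[str, Any] = {}

    def get_metadata(self, key: str, immediate: bool = False) -> Any:
        # Walk the nested dicts along the dotted key.
        value: Any = self.metadata
        for part in key.split("."):
            if not isinstance(value, dict) or part not in value:
                value = _MISSING
                break
            value = value[part]
        if value is not _MISSING:
            return value
        # Not defined here: ask the parent unless `immediate` forbids it.
        if self.parent is not None and not immediate:
            return self.parent.get_metadata(key)
        return None  # assumption: a missing key yields None

    def set_metadata(self, key: str, value: Any) -> None:
        *parents, last = key.split(".")
        node = self.metadata
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                child = node[part] = {}
            node = child
        node[last] = value

    def unset_metadata(self, key: str) -> None:
        *parents, last = key.split(".")
        node: Any = self.metadata
        for part in parents:
            node = node.get(part)
            if not isinstance(node, dict):
                return  # nothing to unset
        node.pop(last, None)


root = SketchContext()
root.set_metadata("flags.foo", True)
group = SketchContext(parent=root)
group.set_metadata("flags.foo", False)
overridden = group.get_metadata("flags.foo")  # the group's own value
group.unset_metadata("flags.foo")
inherited = group.get_metadata("flags.foo")   # falls back to the parent
local_only = group.get_metadata("flags.foo", immediate=True)
print(overridden, inherited, local_only)
```

After `unset_metadata`, the child context has no value of its own, so the lookup reaches the parent again unless `immediate` is set.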

#### Writing output

There are two modes of writing output, plaintext and element-based.

Plaintext mode uses the `print(text: str)` and `println(text: str)` functions,
which append text to a buffer that is then interpreted as markdown input.

Element-based mode uses the `appendChild(element: pf.Element)` and
`appendChildren(*elements: List[pf.Element])` functions, which allow you to
append `panflute` elements to a list that is then again interpreted as input.
The `panflute` library is available as `pf`.

When one of these functions is called, the mode is set and functions from the
other mode cannot be called within the same block of code.
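
One way to picture this "first call picks the mode" rule is a buffer that remembers which family of functions was used first. The sketch below is hypothetical: `OutputBuffer`, `append_child` and the error type are invented for illustration, and how Formátítko actually reports a mixed-mode call is not specified here.

```python
from typing import Optional


class OutputBuffer:
    """Toy model of one code block's output; all names are hypothetical."""

    def __init__(self) -> None:
        self.mode: Optional[str] = None  # "text" or "elements" once chosen
        self.text: list[str] = []
        self.elements: list[object] = []

    def _use(self, mode: str) -> None:
        # The first call fixes the mode; a later call from the other mode fails.
        if self.mode is None:
            self.mode = mode
        elif self.mode != mode:
            raise RuntimeError(f"already writing in {self.mode!r} mode")

    def println(self, text: str = "") -> None:
        self._use("text")
        self.text.append(text + "\n")

    def append_child(self, element: object) -> None:
        self._use("elements")
        self.elements.append(element)


out = OutputBuffer()
out.println("*wooo*")
try:
    out.append_child("a panflute element would go here")
except RuntimeError as err:
    message = str(err)
print(message)
```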

**Examples:**

````markdown {.group}
---
title: Foo
---
```python {.run}
println("*wooo*")
println()
println("The title of this file is: " + ctx.get_metadata("title"))
```
````

```python {.run}
appendChild(pf.Para(pf.Strong(pf.Str("foo"))))
```

### Defining and running commands

Code blocks can also be saved and executed later. Defining is done using the
`define` attribute:

**Example:**

```python {define=commandname}
print("foo")
```

If you try to define the same command twice, you will get an error. To redefine
a command, use the `redefine` attribute instead of `define`.
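
The difference between the two attributes can be modelled as a tiny registry. This is a sketch under assumptions: `CommandRegistry` is a made-up class, the error type is arbitrary, and it assumes that redefining a command that does not exist yet is allowed.

```python
class CommandRegistry:
    """Toy model of defined commands; not Formátítko's internal structure."""

    def __init__(self) -> None:
        self.commands: dict[str, str] = {}

    def define(self, name: str, source: str) -> None:
        # Defining an existing name is an error.
        if name in self.commands:
            raise NameError(f"command {name!r} is already defined")
        self.commands[name] = source

    def redefine(self, name: str, source: str) -> None:
        # Assumption: redefine simply replaces (or creates) the command.
        self.commands[name] = source


registry = CommandRegistry()
registry.define("commandname", 'print("foo")')
try:
    registry.define("commandname", 'print("bar")')
except NameError as err:
    error = str(err)
registry.redefine("commandname", 'print("bar")')
print(error)
print(registry.commands["commandname"])
```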

### Running defined commands

There are multiple ways of running commands. There is the shorthand way:

[!commandname]{}

Or using the `c` attribute on a span or a div:

[Some content]{c=commandname}

:::{c=commandname}
Some content
:::

To access the content or attributes of the div or span the command has been
called on, the `element` variable is available, which contains the `panflute`
representation of the element.

**Example:**

```python {define=index}
appendChild(element.content[int(element.attributes["i"])])
```

[Pick the third element from this span]{c=index i=2}

### Direct metadata print
Metadata can be printed directly using a shorthand. The advantage of this is that it
keeps the formatting from the metadatum's definition.

```markdown {.group}
---
a:
  b: some text with **strong**
---
[$a.b]{}
```

### Syntax highlighting
Formátítko uses [pygments](https://pygments.org/) to highlight syntax in code
blocks. To turn it off for a single block, don't specify a language or set the
`highlight` attribute to `False`. You can also set the `highlight` metadatum to
`false` in the front matter to disable it in a given group. To change the [highlighting
style](https://pygments.org/styles/), you have to set the `highlight-style`
metadatum in the **top-level document**. This avoids the need for many
inline style definitions.

**Examples:**
```python
print("cool")
```

```zsh {highlight=False}
./formatitko.py README.md
```

### Language awareness
Formátítko is language aware, which means that the `language` metadatum is
somewhat special. When set using the front matter, it is also popped out to TeX
as a `\languagexx` macro. Only `cs` and `en` are currently supported for
internal uses, but the value can be set to anything.

### NBSP
Formátítko automatically inserts no-break spaces according to its sorta smart
rules (see the `whitespace.py` file for more info). These rules **depend on the
chosen language** (`cs` has some additional rules).

To insert a literal no-break space, you can either insert the Unicode no-break
space or use the HTML escape (`&nbsp;`).

Enforcing a breakable space is not as painless: you should insert a zero-width
space beside the normal​ space.
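
As an illustration of what a language-dependent rule can look like, the sketch below ties one-letter Czech prepositions and conjunctions to the following word, which is a common Czech typographic convention. It is not taken from `whitespace.py`; the function name, the word list and the regular expression are all assumptions made for this example.

```python
import re

NBSP = "\u00a0"
# One-letter Czech prepositions and conjunctions that should not end a line.
CS_SHORT_WORDS = "aikosuvzAIKOSUVZ"
# A standalone one-letter word, a plain space, then the start of the next word.
CS_PATTERN = re.compile(rf"(?<!\w)([{CS_SHORT_WORDS}]) (?=\w)")


def tie_short_words(text: str, language: str) -> str:
    """Replace the space after short Czech words with a no-break space."""
    if language != "cs":
        return text  # this sketch has no rule for other languages
    return CS_PATTERN.sub(lambda match: match.group(1) + NBSP, text)


tied = tie_short_words("Sešli jsme se v lese u potoka.", "cs")
print(tied.replace(NBSP, "~"))  # show the inserted no-break spaces as "~"
```

Printing with `~` in place of the no-break space makes the inserted ties visible in a terminal.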

### Smart quotes
Quotes get automatically converted to the slanted ones according to the current
language.

**Examples:**

```markdown {.group}
---
language: cs
---
"Uvozovky se v českém textu píší 'jinak' než v angličtině."
```

"In Czech texts, quotes are written 'differently' than in English"
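
The idea can be sketched as a per-language table of opening and closing marks. The code below is a deliberately naive illustration: it simply alternates between opening and closing for each quote character, so apostrophes inside words would be mishandled, and it says nothing about how Formátítko or pandoc actually detect quotes. `QUOTES` and `smarten` are hypothetical names.

```python
# Opening and closing marks per language, for double and single quotes.
QUOTES = {
    "cs": {'"': ("„", "“"), "'": ("‚", "‘")},
    "en": {'"': ("“", "”"), "'": ("‘", "’")},
}


def smarten(text: str, language: str) -> str:
    """Naively alternate opening and closing marks for each quote character."""
    marks = QUOTES[language]
    is_open = {'"': False, "'": False}
    out = []
    for ch in text:
        if ch in marks:
            opening, closing = marks[ch]
            out.append(closing if is_open[ch] else opening)
            is_open[ch] = not is_open[ch]
        else:
            out.append(ch)
    return "".join(out)


print(smarten("\"quotes are written 'differently'\"", "cs"))
print(smarten("\"quotes are written 'differently'\"", "en"))
```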

### Math
Math blocks get automatically converted to HTML using $Ka\TeX$ and fall out
unchanged into TeX output.

To make KaTeX as consistent with TeX as possible, the `\begingroup` and
`\endgroup` that are produced by [Groups](#groups) are also emulated in the
KaTeX environment, so macro definitions should be isolated as you expect.

### Images

#### Figures
Pandoc's [implicit
figures](https://pandoc.org/MANUAL.html#extension-implicit_figures) are enabled,
so images which are alone in a paragraph are automatically converted to figures:

![A single pixel image, wow!](test/1px.png "This is the alt text shown to screen readers (it defaults to the caption)"){width=10em}

To prevent this, add a backslash at the end of the line with the image:

![A single pixel image, wow!](test/1px.png "This is the alt text shown to screen readers"){width=10em}\

#### Image gathering
Images are automatically searched for in the directory of each markdown file
(including partials) and also in the directories listed in the `--img-lookup-dirs`
command line parameter. After processing, they're all put into the folder
specified with `--img-public-dir`.

#### Image processing
Images are automatically processed so that they can be successfully used in both
output formats. This includes generating multiple sizes and providing a
[srcset](https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images).

To customize this, the `file-width`, `file-height`, `file-dpi`, `file-quality`
and `no-srcset` attributes are available. All but the last one should be
integers.

Keep in mind that the processing tries to be as lazy as possible, so it never
overwrites any files, and if it finds the right format or resolution (judging
only by the filenames) in the lookup directories, it will just copy that. This
means that any automatic attempt at conversion can be overridden by converting
the file yourself, naming it accordingly and placing it either in the public
directory or in one of the lookup directories.
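
The lazy strategy described above can be sketched as "reuse, then copy, then convert". Everything in this example is hypothetical: `ensure_image`, the `convert` callback and the file name `logo_600.png` are invented for illustration, and the real naming scheme for processed images is not shown here.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable, Iterable


def ensure_image(name: str, public_dir: Path, lookup_dirs: Iterable[Path],
                 convert: Callable[[Path], None]) -> Path:
    """Return the path of `name` in `public_dir`, doing as little work as possible."""
    target = public_dir / name
    if target.exists():
        return target  # never overwrite an existing file
    public_dir.mkdir(parents=True, exist_ok=True)
    for directory in lookup_dirs:
        candidate = directory / name
        if candidate.exists():
            shutil.copyfile(candidate, target)  # a prepared file wins
            return target
    convert(target)  # only now fall back to an actual conversion
    return target


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    lookup = root / "lookup"
    lookup.mkdir()
    (lookup / "logo_600.png").write_bytes(b"hand-made")
    calls: list[Path] = []  # records conversions; stays empty if none was needed
    result = ensure_image("logo_600.png", root / "public", [lookup], calls.append)
    copied = result.read_bytes()

print(copied, calls)
```

Because a matching file already sits in a lookup directory, the conversion callback is never invoked and the hand-made file is copied as is.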

## Working with the produced output

### HTML
The HTML should be almost usable as-is. The styles for syntax highlighting are
added automatically. The styles for KaTeX, however, are not and should be added in
your `<head>`^[This is taken directly from [KaTeX's docs](https://katex.org/docs/browser.html)]:

```html
<link rel='stylesheet' href='https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css' integrity='sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0' crossorigin='anonymous'>
```

Also, the output HTML is not intended as a standalone file but should be included
as part of a larger template (one that includes a doctype, other CSS, etc.).

### TeX
The TeX output is not usable as-is. Many of the elements are just converted to
macros, which you have to define yourself. There is an example implementation in
`formatitko.tex`, which uses LuaTeX and the ucwmac package, but you should
customize it to your needs (and to the context in which the output is used).