---
language: en
highlight-style: native
---
# Formátítko 2.0
A python program based on [pandoc](https://pandoc.org/) and its python library
[panflute](http://scorreia.com/software/panflute) for converting from markdown
to TeX and HTML with added fancy features like image processing, python-based
macros and much more.
## Requirements
This project requires `panflute 2.3.0`, which itself requires `pandoc 3.0`. If the
installed version of `pandoc` doesn't match, very weird things can happen.
ImageMagick and Inkscape are used for image processing, and Node.js is used for KaTeX.
## Usage
```
usage: formatitko.py [-h] [-l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]] [-p IMG_PUBLIC_DIR] [-i IMG_WEB_PATH] [-w OUTPUT_HTML] [-t OUTPUT_TEX] input_filename

positional arguments:
  input_filename        The markdown file to process.

options:
  -h, --help            show this help message and exit
  -l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...], --img-lookup-dirs IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]
                        Image lookup directories. When processing images, the program will try to find the image in them first. Always looks for images in the same folder as the markdown file. (default: [])
  -p IMG_PUBLIC_DIR, --img-public-dir IMG_PUBLIC_DIR
                        Directory to put processed images into. The program will not overwrite existing images. (default: public)
  -i IMG_WEB_PATH, --img-web-path IMG_WEB_PATH
                        Path where the processed images are available on the website. (default: /)
  -w OUTPUT_HTML, --output-html OUTPUT_HTML
                        The HTML file (for Web) to write into. (default: output.html)
  -t OUTPUT_TEX, --output-tex OUTPUT_TEX
                        The TEX file to write into. (default: output.tex)
```
## Format
Formátítko uses all the default pandoc markdown extensions except for
definition lists and citations. It also adds its own custom features.
## Features
### Hiding and showing elements based on flags
Flags can be set in the Front Matter or with python code. Elements with the `if`
attribute are then only shown if the flag is set to True, and elements with the
`ifn` attribute are only shown if the flag is not set to True.
**Example:**
```markdown {.group}
---
flags:
  foo: true
---
[This will be shown]{if=foo}
[This will not be shown]{if=bar}
[This will be shown]{ifn=bar}
```
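The visibility rule can be summarised as a small predicate. This is a hypothetical sketch, not Formátítko's code: `is_visible` is an illustrative name, and it assumes that an element carrying both `if` and `ifn` must satisfy both conditions.

```python
from typing import Mapping


def is_visible(attributes: Mapping[str, str], flags: Mapping[str, bool]) -> bool:
    """Hypothetical helper: should an element with these attributes be shown?"""
    # `if=flag`: show only when the flag is set to True.
    if "if" in attributes and flags.get(attributes["if"]) is not True:
        return False
    # `ifn=flag`: show only when the flag is NOT set to True.
    if "ifn" in attributes and flags.get(attributes["ifn"]) is True:
        return False
    return True


flags = {"foo": True}
print(is_visible({"if": "foo"}, flags))   # True
print(is_visible({"if": "bar"}, flags))   # False
print(is_visible({"ifn": "bar"}, flags))  # True
```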
### Including other files
There are two ways of including files.
#### Importing
The first is importing, which takes only the state (defined commands, metadata,
etc.) from the file and omits any content. This is useful for creating
libraries of commands. The syntax is as follows:
[#test/empty.md]{}
The curly braces are required for pandoc to parse the import properly and should
be left empty.
#### Partials
Partials are the very opposite of imports: they have their own context, which
inherits everything from the context they're included in but gets reset after
the file ends.
:::{partial=test/empty.md}
:::
If the `untrusted` attribute is set to True, the partial and all its children
will not be able to define commands or run inline blocks (but it will be able to
run commands defined in the parent). ^[Please don't trust this for any security
though, we're playing with *eval* fire, this will never be secure.]
You can also include raw HTML and TeX files by setting the `type` attribute of
the partial to `tex` or `html`.
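The "inherit, then reset" behaviour of partials resembles a stack of scopes. As a rough analogy only (it assumes nothing about how Formátítko stores its context internally, and the keys are illustrative), Python's `collections.ChainMap` shows the idea:

```python
from collections import ChainMap

# Top-level document context (illustrative keys only).
document = ChainMap({"language": "en", "title": "Foo"})

# Entering a partial: a new child scope that can see the parent's values.
partial = document.new_child()
print(partial["title"])      # Foo (inherited from the parent)
partial["language"] = "cs"   # written only into the child scope
print(partial["language"])   # cs

# Leaving the partial: the child scope is dropped, the parent is unchanged.
print(document["language"])  # en
```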
### Groups
Groups are pieces of markdown with their own sandboxed context, in other words,
inline partials. They work exactly like partials; in particular, they can have
their own front matter.
```markdown {.group}
---
language: cs
---
OOOoo český mód
```
If you need to nest groups or have code blocks inside groups, you can increase
the number of backticks around the outer block:
````markdown {.group}
```go
fmt.Println("owo")
```
````
Groups and partials are also enclosed in `\begingroup` and `\endgroup` in the
output TeX.
### Raw HTML and TeX ^[This is a pandoc feature]
If raw HTML or TeX is included in the markdown file, it will automagically pop
out into the respective output file.
<em style="color: red">red text</em>
\vskip1em
This has the advantage and disadvantage of being very *"automagic"*: for
instance, markdown inside HTML will still get interpreted as markdown. However,
it is also very unreliable, so in most cases you should use explicit raw blocks
with the unnamed attribute set to either `html` or `tex`. ^[Still a pandoc
feature.]
``` {=html}
<span style="color: red">red text</span>
```
### Running python code
Formátítko allows you to run Python code directly from your markdown file. Any
`python` code block with the class `run` will be executed.
#### Context
You can access the current context using the `ctx` variable. The context
provides read/write access to the FrontMatter metadata. The context has the
following methods:
`ctx.get_metadata(key: str, simple: bool=True, immediate: bool=False)`
- `key`: The key of the metadatum you want to get. Separate child keys with
dots: `ctx.get_metadata("flags.foo")`
- `simple`: Whether to use python's simple builtin types or panflute's
MetaValues. MetaValues can contain formatted text; simple values lose all
formatting.
- `immediate`: Only get metadatum from the current context, not from its
parents.
`ctx.set_metadata(key: str, value)`
- `key`: The key of the metadatum you want to set. Separate child keys with
dots: `ctx.set_metadata("flags.foo", True)`
- `value`: Any value you want to assign to the metadatum
`ctx.unset_metadata(key: str)`
Delete the metadatum in the current context and allow it to inherit the value
from the parent context.
- `key`: The key of the metadatum you want to unset. Separate child keys with
dots: `ctx.unset_metadata("flags.foo")`
Helper functions for flags exist which work the same way as the metadata ones:
- `ctx.is_flag_set(flag: str) -> bool`
- `ctx.set_flag(flag: str, val: bool)`
- `ctx.unset_flag(flag: str)`
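To make the dotted keys and the parent fallback concrete, here is a hypothetical model of a context. The `Context` class, its internals and the choice to return `None` for missing keys are assumptions for illustration; the `simple` parameter is left out, and the real `ctx` object may behave differently.

```python
from typing import Any, Optional

_MISSING = object()


class Context:
    """Hypothetical model of a context with dotted metadata keys and a parent."""

    def __init__(self, parent: Optional["Context"] = None) -> None:
        self.parent = parent
        self.metadata: dict = {}

    def _lookup(self, key: str) -> Any:
        # Resolve "flags.foo" as metadata["flags"]["foo"] in this context only.
        node: Any = self.metadata
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return _MISSING
            node = node[part]
        return node

    def get_metadata(self, key: str, immediate: bool = False) -> Any:
        value = self._lookup(key)
        if value is not _MISSING:
            return value
        # Not set here: ask the parent, unless only this context was requested.
        if self.parent is not None and not immediate:
            return self.parent.get_metadata(key)
        return None

    def set_metadata(self, key: str, value: Any) -> None:
        *path, last = key.split(".")
        node = self.metadata
        for part in path:
            child = node.get(part)
            if not isinstance(child, dict):
                child = node[part] = {}
            node = child
        node[last] = value

    def unset_metadata(self, key: str) -> None:
        # Removing the local value lets the parent's value show through again.
        *path, last = key.split(".")
        node: Any = self.metadata
        for part in path:
            node = node.get(part) if isinstance(node, dict) else None
        if isinstance(node, dict):
            node.pop(last, None)


root = Context()
root.set_metadata("flags.foo", True)
child = Context(parent=root)
print(child.get_metadata("flags.foo"))                  # True (inherited)
print(child.get_metadata("flags.foo", immediate=True))  # None
child.set_metadata("flags.foo", False)
print(child.get_metadata("flags.foo"))                  # False
child.unset_metadata("flags.foo")
print(child.get_metadata("flags.foo"))                  # True again
```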
#### Writing output
There are two modes of writing output: plaintext and element-based.
Plaintext mode uses the `print(text: str)` and `println(text: str)` functions,
which append text to a buffer that is then interpreted as markdown input.
Element-based mode uses the `appendChild(element: pf.Element)` and
`appendChildren(*elements: List[pf.Element])` functions, which append
`panflute` elements to a list that is then also interpreted as input.
The `panflute` library is available as `pf`.
When one of these functions is called, the mode is set and functions from the
other mode cannot be called within the same block of code.
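The "first call picks the mode" rule can be modelled with a small buffer object. This is a hypothetical sketch: `OutputBuffer` is not part of Formátítko, the method names merely mirror the documented functions, and it assumes `println` is `print` plus a trailing newline.

```python
from typing import Any, List, Optional


class OutputBuffer:
    """Hypothetical model of the two output modes of a `run` block."""

    def __init__(self) -> None:
        self.mode: Optional[str] = None  # "text" or "elements" once chosen
        self.text: List[str] = []
        self.elements: List[Any] = []

    def _use(self, mode: str) -> None:
        # The first output call fixes the mode; mixing modes is an error.
        if self.mode is None:
            self.mode = mode
        elif self.mode != mode:
            raise RuntimeError(f"cannot mix {self.mode!r} and {mode!r} output in one block")

    def print(self, text: str = "") -> None:
        self._use("text")
        self.text.append(text)

    def println(self, text: str = "") -> None:
        # Assumption: println is print plus a trailing newline.
        self._use("text")
        self.text.append(text + "\n")

    def appendChild(self, element: Any) -> None:
        self._use("elements")
        self.elements.append(element)

    def appendChildren(self, *elements: Any) -> None:
        self._use("elements")
        self.elements.extend(elements)


buf = OutputBuffer()
buf.println("*wooo*")
buf.println()
print(repr("".join(buf.text)))  # '*wooo*\n\n'
```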
**Examples:**
````markdown {.group}
---
title: Foo
---
```python {.run}
println("*wooo*")
println()
println("The title of this file is: " + ctx.get_metadata("title"))
```
````
```python {.run}
appendChild(pf.Para(pf.Strong(pf.Str("foo"))))
```
### Defining and running commands
Code blocks can also be saved and executed later. Defining is done using the
`define` attribute:
**Example:**
```python {define=commandname}
print("foo")
```
If you try to define the same command twice, you will get an error. To redefine
a command, use the `redefine` attribute instead of `define`.
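The difference between `define` and `redefine` can be sketched as a tiny registry. This is a hypothetical illustration: `CommandRegistry` is not Formátítko's API, and it assumes `redefine` simply replaces (or creates) the stored code.

```python
from typing import Dict


class CommandRegistry:
    """Hypothetical sketch of the define / redefine rule for commands."""

    def __init__(self) -> None:
        self.commands: Dict[str, str] = {}

    def define(self, name: str, code: str) -> None:
        # `define` refuses to silently replace an existing command.
        if name in self.commands:
            raise ValueError(f"command {name!r} is already defined, use redefine")
        self.commands[name] = code

    def redefine(self, name: str, code: str) -> None:
        # Assumption: `redefine` just replaces (or creates) the stored code.
        self.commands[name] = code


registry = CommandRegistry()
registry.define("commandname", 'print("foo")')
registry.redefine("commandname", 'print("bar")')
print(registry.commands["commandname"])  # print("bar")
```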
### Running defined commands
There are multiple ways of running commands. The first is the shorthand:
[!commandname]{}
Or using the `c` attribute on a span or a div:
[Some content]{c=commandname}
:::{c=commandname}
Some content
:::
To access the content or attributes of the div or span the command has been
called on, the `element` variable is available, which contains the `panflute`
representation of the element.
**Example:**
```python {define=index}
appendChild(element.content[int(element.attributes["i"])])
```
[Pick the third element from this span]{c=index i=2}
### Direct metadata print
Metadata can be printed directly using a shorthand. The advantage of this is that
it keeps the formatting from the metadatum's definition.
```markdown {.group}
---
a:
  b: some text with **strong**
---
[$a.b]{}
```
### Syntax highlighting
Formátítko uses [pygments](https://pygments.org/) to highlight syntax in code
blocks. To turn it off for a single block, don't specify a language or set the
`highlight` attribute to `False`. You can also set the metadatum `highlight` to
`false` in the FrontMatter to disable it in a given group. To change the [highlighting
style](https://pygments.org/styles/), you have to set the `highlight-style`
metadatum in the **top-level document**. This avoids the need for many inline
style definitions.
**Examples:**
```python
print("cool")
```
```zsh {highlight=False}
./formatitko.py README.md
```
### Language awareness
Formátítko is language-aware, which means that the `language` metadatum is
somewhat special. When set using the front matter, it is also popped out to TeX
as a `\languagexx` macro. Currently only `cs` and `en` are used internally, but
the metadatum can be set to any value.
### NBSP
Formátítko automatically inserts no-break spaces according to its sorta smart
rules. (See the `whitespace.py` file for more info) These rules **depend on the
chosen language**. (`cs` has some additional rules)
To insert a literal no-break space, you can either insert the unicode no-break
space or use&nbsp;the html escape.
Enforcing a breakable space is not as painless, you should insert a​ zero-width
space beside the normal&#8203; space.
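As an illustration of the kind of language-dependent rule involved, here is a deliberately simplified sketch of one well-known Czech typographic convention: one-letter prepositions and conjunctions (v, k, s, z, o, u, a, i) should not end a line, so the space after them becomes a no-break space. This is not the rule set from `whitespace.py`; `insert_nbsp` is a hypothetical name.

```python
import re

NBSP = "\u00a0"

# A one-letter Czech preposition/conjunction preceded by whitespace (or the
# start of the text) and followed by a normal space.
_CS_SINGLE_LETTER = re.compile(r"(?<!\S)([vkszouaiVKSZOUAI]) ")


def insert_nbsp(text: str, language: str) -> str:
    """Hypothetical, simplified no-break space insertion."""
    if language == "cs":
        # Glue the one-letter word to the following word.
        text = _CS_SINGLE_LETTER.sub(lambda m: m.group(1) + NBSP, text)
    return text


print(repr(insert_nbsp("Byl v Praze a v Brně", "cs")))  # 'Byl v\xa0Praze a\xa0v\xa0Brně'
```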
### Smart quotes
Quotes get automatically converted to the slanted ones according to the current
language.
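The per-language mapping can be sketched as a table of quote characters, assuming the standard Czech („…“, ‚…‘) and English (“…”, ‘…’) conventions. Pandoc's `smart` extension already parses straight quotes into `Quoted` elements, so a filter would more likely map those; the regex-based `smart_quotes` below is a hypothetical standalone illustration that ignores apostrophes.

```python
import re

# (double open, double close, single open, single close)
QUOTES = {
    "en": ("\u201c", "\u201d", "\u2018", "\u2019"),  # “ ” ‘ ’
    "cs": ("\u201e", "\u201c", "\u201a", "\u2018"),  # „ “ ‚ ‘
}


def smart_quotes(text: str, language: str) -> str:
    """Hypothetical sketch: replace balanced straight quotes with curly ones."""
    d_open, d_close, s_open, s_close = QUOTES[language]
    # Double quotes first, then single quotes nested inside them.
    text = re.sub(r'"([^"]*)"', lambda m: d_open + m.group(1) + d_close, text)
    text = re.sub(r"'([^']*)'", lambda m: s_open + m.group(1) + s_close, text)
    return text


print(smart_quotes('"Uvozovky se v českém textu píší \'jinak\' než v angličtině."', "cs"))
# „Uvozovky se v českém textu píší ‚jinak‘ než v angličtině.“
```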
**Examples:**
```markdown {.group}
---
language: cs
---
"Uvozovky se v českém textu píší 'jinak' než v angličtině."
```
"In Czech texts, quotes are written 'differently' than in English"
### Math
Math blocks get automatically converted to HTML using $Ka\TeX$ and fall out
unchanged into TeX output.
To make KaTeX as consistent with TeX as possible, the `\begingroup` and
`\endgroup` that are produced by [Groups](#groups) are also emulated in the
KaTeX environment, so macro definitions should be isolated as you expect.
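The idea of isolating macro definitions per group can be modelled as a stack of definition tables. This is a hypothetical illustration of the semantics only (`MacroScopes` is an invented name), not how Formátítko or KaTeX implement it.

```python
from typing import Dict, List, Optional


class MacroScopes:
    """Hypothetical model of \\begingroup / \\endgroup macro isolation."""

    def __init__(self) -> None:
        self._stack: List[Dict[str, str]] = [{}]

    def begingroup(self) -> None:
        # A new group starts with a copy of the current definitions.
        self._stack.append(dict(self._stack[-1]))

    def endgroup(self) -> None:
        if len(self._stack) == 1:
            raise RuntimeError("\\endgroup without matching \\begingroup")
        # Definitions made inside the group are discarded.
        self._stack.pop()

    def define(self, name: str, body: str) -> None:
        self._stack[-1][name] = body

    def lookup(self, name: str) -> Optional[str]:
        return self._stack[-1].get(name)


scopes = MacroScopes()
scopes.define(r"\R", r"\mathbb{R}")
scopes.begingroup()
scopes.define(r"\R", r"\mathcal{R}")
print(scopes.lookup(r"\R"))  # \mathcal{R}
scopes.endgroup()
print(scopes.lookup(r"\R"))  # \mathbb{R}
```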
### Images
#### Figures
Pandoc's [implicit
figures](https://pandoc.org/MANUAL.html#extension-implicit_figures) are enabled,
so images which are alone in a paragraph are automatically converted to figures:
![A single pixel image, wow!](test/1px.png "This is the alt text shown to screen readers (it defaults to the caption)"){width=10em}
To prevent this, add a backslash at the end of the line with the image:
![A single pixel image, wow!](test/1px.png "This is the alt text shown to screen readers"){width=10em}\
#### Image gathering
Images are automatically searched for in the directory where each markdown file is
(including partials) and also in the directories listed in the `--img-lookup-dirs`
command line parameter. After processing, they're all put into the folder
specified with `--img-public-dir`.
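A lookup of this kind can be sketched as follows. The search order (lookup directories first, then the markdown file's own folder) is an assumption based on the `--img-lookup-dirs` help text, and `find_image` is a hypothetical name, not Formátítko's function.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_image(name: str, markdown_dir: Path, lookup_dirs: Iterable[Path]) -> Optional[Path]:
    """Hypothetical sketch: return the first existing candidate for `name`."""
    # Assumed order: the lookup directories first, then the markdown file's folder.
    for directory in [*lookup_dirs, markdown_dir]:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None
```

For example, `find_image("1px.png", Path("test"), [Path("img")])` would return `img/1px.png` if it exists, otherwise `test/1px.png` if that exists, otherwise `None`.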
#### Image processing
Images are automatically processed so that they can be successfully used in both
output formats. This includes generating multiple sizes and providing a
[srcset](https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images).
To customize this, the `file-width`, `file-height`, `file-dpi`, `file-quality`
and `no-srcset` attributes are available. All but the last one should be
integers.
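For reference, a `srcset` value lists one candidate per generated size as `<url> <width>w`, separated by commas. The sketch below builds such a value; the `<stem>_<width>.<ext>` naming scheme and the `build_srcset` helper are hypothetical, and `web_path` only plays the role that `--img-web-path` describes.

```python
from typing import Iterable


def build_srcset(stem: str, widths: Iterable[int], ext: str, web_path: str = "/") -> str:
    """Hypothetical sketch: build an HTML srcset value from per-width files."""
    base = web_path.rstrip("/") + "/"
    # Each candidate is "<url> <width>w" (HTML width descriptors).
    return ", ".join(f"{base}{stem}_{w}.{ext} {w}w" for w in widths)


print(build_srcset("1px", [320, 640], "png"))
# /1px_320.png 320w, /1px_640.png 640w
```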
Keep in mind that the processing tries to be as lazy as possible: it never
overwrites any files, and if it finds a file in the right format or resolution
(judging only by the filename) in the lookup directories, it just copies it. This
means that any automatic conversion can be overridden by converting the file
yourself, naming it accordingly and placing it either in the public directory or
in one of the lookup directories.
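The lazy strategy described above can be sketched like this. It is a hypothetical illustration: `ensure_variant` and the `convert` callback are invented names, and the check is purely by filename, as the text describes.

```python
import shutil
from pathlib import Path
from typing import Callable, Iterable


def ensure_variant(target_name: str, public_dir: Path, lookup_dirs: Iterable[Path],
                   convert: Callable[[Path], None]) -> Path:
    """Hypothetical sketch: make sure public_dir/target_name exists, lazily."""
    target = Path(public_dir) / target_name
    if target.exists():
        return target  # never overwrite an existing file
    for directory in lookup_dirs:
        candidate = Path(directory) / target_name
        if candidate.is_file():
            # A file with the right name already exists: trust it and copy it.
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(candidate, target)
            return target
    # Nothing suitable found: fall back to the (hypothetical) converter.
    target.parent.mkdir(parents=True, exist_ok=True)
    convert(target)
    return target
```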
## Working with the produced output
### HTML
The HTML should be almost usable as-is. The styles for syntax highlighting are
added automatically. The styles for KaTeX however are not and should be added in
your `<head>`^[This is taken directly from [KaTeX's docs](https://katex.org/docs/browser.html)]:
```html
<link rel='stylesheet' href='https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css' integrity='sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0' crossorigin='anonymous'>
```
Also, the output HTML is not intended as a standalone file but should be included
as part of a larger template (which provides the doctype, other CSS, etc.).
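One way to do this is to substitute the generated HTML into a page template. The template below is a hypothetical example (Formátítko does not ship it); it uses Python's `string.Template` and includes the KaTeX stylesheet link from above.

```python
import html
from string import Template

# Hypothetical page template; adapt it to your site.
PAGE = Template("""<!DOCTYPE html>
<html lang="$lang">
<head>
<meta charset="utf-8">
<title>$title</title>
<link rel='stylesheet' href='https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css' integrity='sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0' crossorigin='anonymous'>
</head>
<body>
$body
</body>
</html>
""")


def wrap_output(body: str, title: str, lang: str = "en") -> str:
    """Wrap Formátítko's HTML output in a complete page."""
    # The title is escaped; the body is already HTML and is inserted verbatim.
    return PAGE.substitute(body=body, title=html.escape(title), lang=lang)
```

Usage might look like `wrap_output(Path("output.html").read_text(), "Foo")`, writing the result to the final page file.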
### TeX
The TeX output is not usable as-is. Many of the elements are just converted to
macros, which you have to define yourself. There is an example implementation in
`formatitko.tex`, which uses LuaTeX and the ucwmac package, but you should
customize it to your needs (and to the context in which the output is used).