---
language: en
highlight-style: native
---

# Formátítko 2.0
A Python program, based on [pandoc](https://pandoc.org/) and its Python library
[panflute](http://scorreia.com/software/panflute), for converting markdown
to TeX and HTML, with added fancy features like image processing, Python-based
macros and much more.

## Requirements
This project requires `panflute 2.3.0`, which itself requires `pandoc 3.0`. If the
version of `pandoc` doesn't match, very weird things can happen. ImageMagick and
Inkscape are used for image processing. Node.js is used for KaTeX.

## Usage
```
usage: formatitko [-h] [-l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]] [-p IMG_PUBLIC_DIR] [-c IMG_CACHE_DIR] [-i IMG_WEB_PATH] [-w OUTPUT_HTML] [-t OUTPUT_TEX] [-m OUTPUT_MD]
                  [-j OUTPUT_JSON] [--katex-server] [-k KATEX_SOCKET] [--debug]
                  input_filename

positional arguments:
  input_filename        The markdown file to process.

options:
  -h, --help            show this help message and exit
  -l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...], --img-lookup-dirs IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]
                        Image lookup directories. When processing images, the program will try to find the image in them first. Always looks for images in the same folder
                        as the markdown file. (default: [])
  -p IMG_PUBLIC_DIR, --img-public-dir IMG_PUBLIC_DIR
                        Directory to put processed images into. The program will overwrite images, whose dependencies are newer. (default: public)
  -c IMG_CACHE_DIR, --img-cache-dir IMG_CACHE_DIR
                        Directory to cache processed images and intermediate products. The program will overwrite files, whose dependencies are newer. (default: cache)
  -i IMG_WEB_PATH, --img-web-path IMG_WEB_PATH
                        Path where the processed images are available on the website. (default: /)
  -w OUTPUT_HTML, --output-html OUTPUT_HTML
                        The HTML file (for Web) to write into. (default: None)
  -t OUTPUT_TEX, --output-tex OUTPUT_TEX
                        The TEX file to write into. (default: None)
  -m OUTPUT_MD, --output-md OUTPUT_MD
                        The Markdown file to write into. (Uses pandoc to generate markdown) (default: None)
  -j OUTPUT_JSON, --output-json OUTPUT_JSON
                        The JSON file to dump the pandoc-compatible AST into. (default: None)
  --katex-server        Starts a KaTeX server and prints the socket filename onto stdout. Useful for running formatitko many times without starting the KaTeX server each
                        time. (default: False)
  -k KATEX_SOCKET, --katex-socket KATEX_SOCKET
                        The KaTeX server socket filename obtained by running with `--katex-server`. (default: None)
  --debug
```

## Format
Formátítko uses all the default pandoc markdown extensions except for
definition lists and citations. It also adds its own custom features.

## Features

### Hiding and showing elements based on flags

Flags can be set in the Front Matter or with python code. Then, elements with
the `if` attribute will only be shown if the flag is set to True and elements
with the `ifn` attribute will only be shown if the flag is not set to True.

**Example:**

```markdown {.group}
---
flags:
  foo: true
---
[This will be shown]{if=foo}

[This will not be shown]{if=bar}

[This will be shown]{ifn=bar}
```

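The `if`/`ifn` rule can be summarised as a small predicate. The following is only an illustrative sketch in plain Python, not formátítko's actual implementation: the function name `is_visible` and the use of plain dictionaries for attributes and flags are assumptions made for the example.

```python
def is_visible(attributes: dict[str, str], flags: dict[str, bool]) -> bool:
    """Decide whether an element should be shown (hypothetical helper)."""
    # `if=flag`: show only when the flag is set to True.
    if "if" in attributes and flags.get(attributes["if"]) is not True:
        return False
    # `ifn=flag`: show only when the flag is NOT set to True.
    if "ifn" in attributes and flags.get(attributes["ifn"]) is True:
        return False
    return True


flags = {"foo": True}
print(is_visible({"if": "foo"}, flags))   # True
print(is_visible({"if": "bar"}, flags))   # False
print(is_visible({"ifn": "bar"}, flags))  # True
```

Note that a flag which was never set behaves like a flag that is not True, which matches the `bar` lines in the example above.
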
### Including other files

There are two ways of including files.

#### Importing
The first is importing, which only takes the state (defined commands, metadata,
etc.) from the file; any content is omitted. This is useful for creating
libraries of commands.

The following types of imports are available:

##### Python Module (the default)
```markdown
[#ksp_formatitko as ksp]{}
```
or
```markdown
[#ksp_formatitko]{}
```
with an optional `type=module` in the curly brackets, tries to import a python
module as a set of formatitko commands. See below for more details about
commands.

##### JSON Metadata
[#test/test.json]{type=metadata key=orgs}

This will import metadata from a JSON file. The optional `key` argument sets the
key under which the whole JSON file will be placed. Dictionaries are merged,
other values are overwritten.

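The "dictionaries are merged, other values are overwritten" rule can be pictured as a recursive merge. This is a minimal sketch under that reading, not the code formátítko uses; the helper name `merge_metadata` and the sample data are made up for illustration.

```python
def merge_metadata(target: dict, incoming: dict) -> dict:
    """Recursively merge `incoming` into `target` (hypothetical helper)."""
    for key, value in incoming.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are dictionaries: merge them key by key.
            merge_metadata(target[key], value)
        else:
            # Anything else (strings, numbers, lists, ...) is overwritten.
            target[key] = value
    return target


# Hypothetical metadata already in the document and freshly imported data.
existing = {"orgs": {"a": {"name": "A"}}, "title": "Old"}
imported = {"orgs": {"a": {"web": "a.example"}}, "title": "New"}
print(merge_metadata(existing, imported))
# {'orgs': {'a': {'name': 'A', 'web': 'a.example'}}, 'title': 'New'}
```
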
#### Partials
Partials are the opposite of imports: they have their own context, which
inherits everything from the context they're included in but is reset after
the file ends.

:::{partial=test/empty.md}
:::

If the `untrusted` attribute is set to True, the partial and all its children
will not be able to define commands or run inline blocks (but they will be able to
run commands defined in the parent). ^[Please don't trust this for any security
though, we're playing with *eval* fire, this will never be secure.]

You can also include raw HTML and TeX if you set the `type` attribute of the
partial to `tex` or `html`.

### Groups

Groups are pieces of markdown with their own sandboxed context; in other words,
inline partials. Syntax-wise they are pandoc Divs with the `.group` class. All
attributes of the Div will be passed down as metadata to the group.

::: {.group lang=cs}
OOOoo český mód
:::

If you need fancier metadata that can only be specified in a front
matter, you can use the following syntax:

```markdown {.group}
---
lang: cs
---
OOOoo český mód
```

If you need to nest groups or have code blocks inside groups, you can increase
the number of backticks around the outer block:

````markdown {.group}
```go
fmt.Println("owo")
```
````

Note, however, that when this syntax is used, pandoc is executed for each of
these blocks, which can get slow. Using divs is preferred.

Groups and partials are also enclosed in `\begingroup` and `\endgroup` in the
output TeX.

### Raw HTML and TeX ^[This is a pandoc feature]
If raw HTML or TeX is included in the markdown file, it will automagically pop
out into the respective output file.

<em style="color: red">red text</em>

\vskip1em

This is very *"automagic"*, which is both an advantage and a disadvantage: for
instance, markdown inside HTML will still get interpreted as markdown.
It is, however, very unreliable, so in most cases you should use explicit
raw blocks with the unnamed attribute set to either `html` or `tex`. ^[Still a
pandoc feature.]

``` {=html}
<span style="color: red">red text</span>
```

### Running python code

Formátítko allows you to run Python code directly from your MD file. Any
`python` code block with the class `run` will be executed.

#### Command environment

The commands will be executed as functions with the following signature:
```python
def command(element: Command, context: Context) -> list[Element]:
    ...
```
The following names are available as globals; they are defined in `command_env.py`:
```python
import panflute as pf
import formatitko.elements as fe
from formatitko.util import import_md_list
from formatitko.util import parse_string

from formatitko.context import Context
from formatitko.command import Command
from panflute import Element
```
##### `element` parameter

The `element` parameter holds the element the command is currently being executed
on. When running python blocks directly, it is probably not
interesting, but it becomes useful for defined commands (see below).

##### `context` parameter

You can access the current context using the `context` parameter. The context
provides read/write access to the FrontMatter metadata. The context has the
following methods:

`context.get_metadata(key: str, simple: bool=True, immediate: bool=False)`

- `key`: The key of the metadatum you want to get. Separate child keys with
  dots: `context.get_metadata("flags.foo")`
- `simple`: Whether to use python's simple builtin types or panflute's
  MetaValues. MetaValues can contain formatted text, simple values lose all
  formatting.
- `immediate`: Only get the metadatum from the current context, not from its
  parents.

`context.set_metadata(key: str, value)`

- `key`: The key of the metadatum you want to set. Separate child keys with
  dots: `context.set_metadata("flags.foo", True)`
- `value`: Any value you want to assign to the metadatum

`context.unset_metadata(key: str)`

Delete the metadatum in the current context and allow it to inherit the value
from the parent context.

- `key`: The key of the metadatum you want to unset. Separate child keys with
  dots: `context.unset_metadata("flags.foo")`

Helper functions for flags exist which work the same as for metadata:

`context.is_flag_set(flag: str) -> bool`

`context.set_flag(flag: str, val: bool)`

`context.unset_flag(flag: str)`

There are also other useful functions, which you can see for yourself in
`context.py`.

> **WARNING**: Writing to metadata should **only** be done **at the beginning**
> of the document or a group (before any printable content). Writing to metadata
> in other places in the document might cause undefined behaviour (mostly some
> elements might behave as if the metadata was set elsewhere).

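To make the dotted keys, the `immediate` parameter and the "inherit from the parent" behaviour of `unset_metadata` more concrete, here is a toy model in plain Python. It is a sketch only: `SketchContext` is a made-up class, not formátítko's `Context`, and returning `None` for a missing key is an assumption of this example.

```python
from typing import Any, Optional


class SketchContext:
    """Toy stand-in for a context; NOT formátítko's real Context class."""

    def __init__(self, parent: Optional["SketchContext"] = None):
        self.parent = parent
        self.metadata: dict[str, Any] = {}

    def get_metadata(self, key: str, immediate: bool = False) -> Any:
        node: Any = self.metadata
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                # Not found here: ask the parent unless `immediate` is set.
                if self.parent is not None and not immediate:
                    return self.parent.get_metadata(key)
                return None  # assumption: missing keys yield None
            node = node[part]
        return node

    def set_metadata(self, key: str, value: Any) -> None:
        *parents, last = key.split(".")
        node = self.metadata
        for part in parents:
            child = node.get(part)
            if not isinstance(child, dict):
                child = node[part] = {}
            node = child
        node[last] = value

    def unset_metadata(self, key: str) -> None:
        *parents, last = key.split(".")
        node: Any = self.metadata
        for part in parents:
            node = node.get(part)
            if not isinstance(node, dict):
                return  # nothing to unset
        node.pop(last, None)


root = SketchContext()
root.set_metadata("flags.foo", True)
group = SketchContext(parent=root)
print(group.get_metadata("flags.foo"))                  # True (inherited)
print(group.get_metadata("flags.foo", immediate=True))  # None (not set here)
group.set_metadata("flags.foo", False)
print(group.get_metadata("flags.foo"))                  # False (local value)
group.unset_metadata("flags.foo")
print(group.get_metadata("flags.foo"))                  # True (inherited again)
```
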
##### Return value
The function **must** return a list of valid Elements. This list may be empty.
These elements will be placed in the document in the location where the command
was invoked.

The `parse_string` function might be useful: it turns a simple string into a
list of panflute's `Str`s and `Space`s (without any formatting). If you want to
use markdown in your function output, you have to convert it yourself using
`import_md`, but beware: this calls pandoc, is potentially slow and is
discouraged.


**Examples:**

````markdown {.group}
---
title: Foo
---
```python {.run}
return [
    pf.Para(pf.Emph(pf.Str("wooo"))),
    pf.Para(*parse_string("The title of this file is: " + context.get_metadata("title")))
]
```
````

```python {.run}
return [pf.Strong(*parse_string("Hello world!"))]
```

### Defining and running commands

Code blocks can also be saved and executed later. Defining is done using the
`define` attribute:

**Example:**

```python {define=commandname}
return [pf.Str("foo")]
```

If you try to define the same command twice, you will get an error. To redefine
a command, use the `redefine` attribute instead of `define`.

### Running defined commands

There are multiple ways of running commands. There is the shorthand way:

[!commandname]{}

Or you can use the `c` attribute on a span, a div or a code block:

[Some content]{c=commandname}

:::{c=commandname}
Some content
:::

```python {define=bash}
import subprocess
c = subprocess.run(["bash", "-c", element.text], stdout=subprocess.PIPE, check=True, encoding="utf-8")
return [pf.Para(pf.Str(c.stdout))]
```

```bash {c=bash}
cat /etc/hostname
```

To access the content or attributes of the element the command has been
called on, the `element` variable is available, which contains the `panflute`
representation of the element.

**Example:**

```python {define=index}
return [element.content[int(element.attributes["i"])]]
```

[Pick the third element from this span]{c=index i=2}

### Direct metadata print
Metadata can be printed directly using a shorthand. The advantage of this is that it
keeps the formatting from the metadatum's definition.

```markdown {.group}
---
a:
  b: some text with **strong**
---
[$a.b]{}
```

### Syntax highlighting
Formátítko uses [pygments](https://pygments.org/) to highlight syntax in code
blocks. To turn it off for a single block, don't specify a language or set the
`highlight` attribute to `False`. You can also set the metadatum `highlight` to
`false` in the FrontMatter to disable it in a given Group. To change the [highlighting
style](https://pygments.org/styles/), you have to set the `highlight-style`
metadatum or the `style` attribute directly on the element.

**Examples:**
```python
print("cool")
```

```python {style=manni}
print("freezing")
```

```zsh {highlight=False}
./formatitko.py README.md
```

### Language awareness
Formátítko is language-aware: the `lang` metadatum is treated
specially. (It is also special for pandoc.)

### NBSP
Formátítko automatically inserts no-break spaces according to its sorta smart
rules (see the `whitespace.py` file for more info). These rules **depend on the
chosen language** (`cs` has some additional rules).

To insert a literal no-break space, you can either insert the unicode no-break
space or use the HTML escape.

Enforcing a breakable space is not as painless: you should insert a zero-width
space beside the normal​ space.

### Smart quotes
Straight quotes are automatically converted to typographic ones according to the
current language.

**Examples:**

::: {.group lang=cs}
"Uvozovky se v českém testu píší 'jinak' než v angličtině."
:::

"In Czech texts, quotes are written 'differently' than in English"

### Math
Math blocks get automatically converted to HTML using $Ka\TeX$ and fall out
unchanged into TeX output.

To make KaTeX as consistent with TeX as possible, the `\begingroup` and
`\endgroup` that are produced by [Groups](#groups) are also emulated in the
KaTeX environment, so macro definitions should be isolated as you expect.

### Images

#### Figures
Pandoc's [implicit
figures](https://pandoc.org/MANUAL.html#extension-implicit_figures) are enabled,
so images which are alone in a paragraph are automatically converted to figures:

![A single pixel image, wow!](test/1px.png "This is the alt text shown to screen readers (it defaults to the caption)"){width=10em}

To prevent this, add a backslash at the end of the line with the image:

![A single pixel image, wow!](test/1px.png "This is the alt text shown to screen readers"){width=10em}\

#### Image gathering
Images are automatically searched for in the directory where each markdown file is
(including partials) and also in directories listed in the `--img-lookup-dirs`
command line parameter. After processing, they're all put into the folder
specified with `--img-public-dir`.

Formátítko also does dependency management, which means that an image is
regenerated only when one of its dependencies is newer than the existing output.

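The dependency check described above is the usual "rebuild if missing or stale" test based on modification times. The sketch below illustrates that idea with the standard library only; it is not formátítko's code, and the function name `needs_rebuild` and the file names are hypothetical.

```python
import os
import tempfile


def needs_rebuild(target: str, dependencies: list[str]) -> bool:
    """Return True if `target` is missing or older than any dependency."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "image.svg")
    output = os.path.join(tmp, "image.png")
    open(source, "w").close()
    print(needs_rebuild(output, [source]))  # True: output does not exist yet

    open(output, "w").close()
    os.utime(source, (1000, 1000))  # pretend the source is old
    os.utime(output, (2000, 2000))  # ... and the output is newer
    print(needs_rebuild(output, [source]))  # False: output is up to date

    os.utime(source, (3000, 3000))  # the source was edited afterwards
    print(needs_rebuild(output, [source]))  # True: dependency is newer
```
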
#### Image processing
Images are automatically processed so that they can be successfully used in both
output formats. This includes generating multiple sizes and providing a
[srcset](https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images).

To customize this, the `file-width`, `file-height`, `file-dpi`, `file-quality`
and `no-srcset` attributes are available. All but the last one should be
integers.

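For readers unfamiliar with `srcset`, the attribute is a comma-separated list of image URLs, each followed by a width descriptor such as `600w`. The sketch below only shows how such a string is put together; the helper `build_srcset` and the `stem_width.ext` file naming are assumptions for illustration and do not reflect how formátítko names its generated files.

```python
def build_srcset(base_url: str, stem: str, extension: str, widths: list[int]) -> str:
    """Build an HTML srcset value from a list of pixel widths (hypothetical helper)."""
    # One "URL WIDTHw" candidate per generated size, smallest first.
    candidates = [f"{base_url}{stem}_{w}.{extension} {w}w" for w in sorted(widths)]
    return ", ".join(candidates)


print(build_srcset("/img/", "photo", "jpg", [1200, 300, 600]))
# /img/photo_300.jpg 300w, /img/photo_600.jpg 600w, /img/photo_1200.jpg 1200w
```
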
Processing also includes Asymptote images -- you can simply include an asymptote
program as an image and formátítko handles the rest for you.

#### Content headers and footers

If you want formatitko to generate fully formed HTML files for you, you might
want to add an HTML partial with the starting tags and `<head>`. This would
normally not work, because the entire document is wrapped with `<main>`. Using
the special `.header_content` and `.footer_content` classes of divs, you can
append content to a header and footer, which are popped to the output before and
after the document.

:::: {.header_content}
::: {partial="test/test-top.html" type="html"}
:::
::::

## Working with the produced output

### HTML
The HTML should be almost usable as-is. The styles for syntax highlighting are
added automatically. The styles for KaTeX, however, are not and should be added in
your `<head>`^[This is taken directly from [KaTeX's docs](https://katex.org/docs/browser.html)]:

```html
<link rel='stylesheet' href='https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css' integrity='sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0' crossorigin='anonymous'>
```

You can see how this is done in `test/test.md`.

### TeX
The TeX output is not usable as-is. Many of the elements are just converted to
macros, which you have to define yourself. There is an example implementation in
`formatitko.tex`, which uses LuaTeX and the ucwmac package, but you should
customize it to your needs (and to the context in which the output is used).

## More examples

More usage examples can be found (even though a bit chaotically) in the test
directory.