- Inline groups
- Raw partials
- Fixed image paths
- Overall revamped image processing
- Untrusted partials

pull/28/head
Jan Černohorský
2 years ago
23 changed files with 869 additions and 411 deletions
@ -0,0 +1,376 @@ |
|||
--- |
|||
language: en |
|||
highlight-style: native |
|||
--- |
|||
|
|||
# Formátítko 2.0 |
|||
A python program based on [pandoc](https://pandoc.org/) and its python library |
|||
[panflute](http://scorreia.com/software/panflute) for converting from markdown |
|||
to TeX and HTML with added fancy features like image processing, python-based |
|||
macros and much more. |
|||
|
|||
## Requirements |
|||
This project requires `panflute 2.3.0` that itself requires `pandoc 3.0`. If the |
|||
version of `pandoc` doesn't match, very weird things can happen. ImageMagick and |
|||
Inkscape are used for image processing. Node.js is used for KaTeX. |
|||
|
|||
## Usage |
|||
``` |
|||
usage: formatitko.py [-h] [-l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]] [-p IMG_PUBLIC_DIR] [-i IMG_WEB_PATH] [-w OUTPUT_HTML] [-t OUTPUT_TEX] input_filename |
|||
|
|||
positional arguments: |
|||
input_filename The markdown file to process. |
|||
|
|||
options: |
|||
-h, --help show this help message and exit |
|||
-l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...], --img-lookup-dirs IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...] |
|||
Image lookup directories. When processing images, the program will try to find the image in them first. Always looks for images in the same folder as the markdown |
|||
file. (default: []) |
|||
-p IMG_PUBLIC_DIR, --img-public-dir IMG_PUBLIC_DIR |
|||
Directory to put processed images into. The program will not overwrite existing images. (default: public) |
|||
-i IMG_WEB_PATH, --img-web-path IMG_WEB_PATH |
|||
Path where the processed images are available on the website. (default: /) |
|||
-w OUTPUT_HTML, --output-html OUTPUT_HTML |
|||
The HTML file (for Web) to write into. (default: output.html) |
|||
-t OUTPUT_TEX, --output-tex OUTPUT_TEX |
|||
The TEX file to write into. (default: output.tex) |
|||
``` |
|||
|
|||
## Format |
|||
Formátítko uses all the default pandoc markdown extensions except for |
|||
definition lists and citations. It also adds its own custom features. |
|||
|
|||
## Features |
|||
|
|||
### Hiding and showing elements based on flags |
|||
|
|||
Flags can be set in the Front Matter or with python code. Then, elements with |
|||
the `if` attribute will only be shown if the flag is set to True and elements |
|||
with the `ifn` attribute will only be shown if the flag is not set to True. |
|||
|
|||
**Example:** |
|||
|
|||
```markdown {.group} |
|||
--- |
|||
flags: |
|||
foo: true |
|||
--- |
|||
[This will be shown]{if=foo} |
|||
|
|||
[This will not be shown]{if=bar} |
|||
|
|||
[This will be shown]{ifn=bar} |
|||
``` |
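The rule above can be sketched in plain Python. This is only an illustration, assuming flags are kept as a simple name-to-bool mapping; `is_shown` is a hypothetical helper and not a function from Formátítko.

```python
# Hypothetical helper illustrating the `if` / `ifn` rule; not Formátítko's code.
def is_shown(attributes: dict, flags: dict) -> bool:
    """Return True if an element with these attributes should be shown."""
    # `if=name`: shown only when the flag is set to True.
    if "if" in attributes and flags.get(attributes["if"]) is not True:
        return False
    # `ifn=name`: shown only when the flag is NOT set to True.
    if "ifn" in attributes and flags.get(attributes["ifn"]) is True:
        return False
    return True


flags = {"foo": True}
print(is_shown({"if": "foo"}, flags))   # True
print(is_shown({"if": "bar"}, flags))   # False
print(is_shown({"ifn": "bar"}, flags))  # True
```

Note that an unset flag and a flag set to False behave the same way in this sketch: neither counts as "set to True".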
|||
|
|||
### Including other files |
|||
|
|||
There are two ways of including files. |
|||
|
|||
#### Importing |
|||
The first is importing, which only takes the state (defined commands, metadata, |
|||
etc.) from the file and any content is omitted. This is useful for creating |
|||
libraries of commands. The syntax is as follows: |
|||
|
|||
[#test/empty.md]{} |
|||
|
|||
The curly braces are required for pandoc to parse the import properly and should |
|||
be left empty. |
|||
|
|||
#### Partials |
|||
Partials are the very opposite of imports: they have their own context, which |
|||
inherits everything from the context they're included in, but gets reset after |
|||
the file ends. |
|||
|
|||
:::{partial=test/empty.md} |
|||
::: |
|||
|
|||
If the `untrusted` attribute is set to True, the partial and all its children |
|||
will not be able to define commands or run inline blocks (but it will be able to |
|||
run commands defined in the parent). ^[Please don't trust this for any security |
|||
though, we're playing with *eval* fire, this will never be secure.] |
|||
|
|||
You can also import raw HTML and TeX if you set the `type` attribute of the |
|||
partial to `tex` or `html`. |
|||
|
|||
### Groups |
|||
|
|||
Groups are pieces of markdown with their own sandboxed context, in other words, |
|||
inline partials. They function exactly like partials; in particular, they can have |
|||
their own front matter. |
|||
|
|||
```markdown {.group} |
|||
--- |
|||
language: cs |
|||
--- |
|||
OOOoo český mód |
|||
``` |
|||
|
|||
If you need to nest groups or have code blocks inside groups, you can increase |
|||
the amount of backticks around the outer block: |
|||
|
|||
````markdown {.group} |
|||
```go |
|||
fmt.Println("owo") |
|||
``` |
|||
```` |
|||
|
|||
Groups and partials are also enclosed in `\begingroup` and `\endgroup` in the |
|||
output TeX. |
|||
|
|||
### Raw HTML and TeX ^[This is a pandoc feature] |
|||
If raw HTML or TeX is included in the markdown file, it will automagically pop |
|||
out into the respective output file. |
|||
|
|||
<em style="color: red">red text</em> |
|||
|
|||
\vskip1em |
|||
|
|||
This has the advantage and disadvantage of being very *"automagic"*, which means |
|||
that for instance markdown inside HTML will still get interpreted as markdown. |
|||
It is however very very unreliable, so in most cases, you should use explicit |
|||
raw blocks with the unnamed attribute set to either `html` or `tex`. ^[Still a |
|||
pandoc feature.] |
|||
|
|||
``` {=html} |
|||
<span style="color: red">red text</span> |
|||
``` |
|||
|
|||
### Running python code |
|||
|
|||
Formátítko allows you to run Python code directly from your MD file. Any |
|||
`python` code block with the class `run` will be executed. |
|||
|
|||
#### Context |
|||
|
|||
You can access the current context using the `ctx` variable. The context |
|||
provides read/write access to the FrontMatter metadata. The context has the |
|||
following methods: |
|||
|
|||
`ctx.get_metadata(key: str, simple: bool=True, immediate: bool=False)` |
|||
|
|||
- `key`: The key of the metadatum you want to get. Separate child keys with |
|||
dots: `ctx.get_metadata("flags.foo")` |
|||
- `simple`: Whether to use python's simple builtin types or panflute's |
|||
MetaValues. MetaValues can contain formatted text; simple values lose all |
|||
formatting. |
|||
- `immediate`: Only get metadatum from the current context, not from its |
|||
parents. |
|||
|
|||
`ctx.set_metadata(key: str, value)` |
|||
|
|||
- `key`: The key of the metadatum you want to set. Separate child keys with |
|||
dots: `ctx.set_metadata("flags.foo", True)` |
|||
- `value`: Any value you want to assign to the metadatum |
|||
|
|||
`ctx.unset_metadata(key: str)` |
|||
|
|||
Delete the metadatum in the current context and allow it to inherit the value |
|||
from the parent context. |
|||
|
|||
- `key`: The key of the metadatum you want to unset. Separate child keys with |
|||
dots: `ctx.unset_metadata("flags.foo")` |
|||
|
|||
Helper functions for flags exist which work the same as for metadata: |
|||
|
|||
`ctx.is_flag_set(flag: str) -> bool` |
|||
|
|||
`ctx.set_flag(flag: str, val: bool)` |
|||
|
|||
`ctx.unset_flag(flag: str)` |
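The inheritance behaviour described above (dotted keys, `immediate`, and falling back to the parent after an unset) can be modelled roughly as follows. This is a sketch under assumptions: `SketchContext` is a hypothetical stand-in, it stores only plain Python values (so the `simple` parameter is left out), and it creates missing intermediate keys automatically, which the real context may not do.

```python
from typing import Any, Optional

_MISSING = object()  # sentinel: distinguishes "not found" from a stored None


class SketchContext:
    """Hypothetical stand-in for the real context; plain Python values only."""

    def __init__(self, parent: Optional["SketchContext"] = None) -> None:
        self.parent = parent
        self.metadata: dict = {}

    def _lookup(self, key: str) -> Any:
        # Walk the nested dicts along the dot-separated key.
        node: Any = self.metadata
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return _MISSING
            node = node[part]
        return node

    def get_metadata(self, key: str, immediate: bool = False) -> Any:
        value = self._lookup(key)
        # Fall back to the parent unless only this context was asked for.
        if value is _MISSING and not immediate and self.parent is not None:
            return self.parent.get_metadata(key)
        return None if value is _MISSING else value

    def set_metadata(self, key: str, value: Any) -> None:
        *parents, last = key.split(".")
        node = self.metadata
        for part in parents:
            node = node.setdefault(part, {})  # assumption: auto-create parents
        node[last] = value

    def unset_metadata(self, key: str) -> None:
        *parents, last = key.split(".")
        node: Any = self.metadata
        for part in parents:
            node = node.get(part, {}) if isinstance(node, dict) else {}
        if isinstance(node, dict):
            node.pop(last, None)


root = SketchContext()
root.set_metadata("flags.foo", True)
child = SketchContext(parent=root)
child.set_metadata("flags.foo", False)
print(child.get_metadata("flags.foo"))                  # False
child.unset_metadata("flags.foo")
print(child.get_metadata("flags.foo"))                  # True (inherited)
print(child.get_metadata("flags.foo", immediate=True))  # None
```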
|||
|
|||
#### Writing output |
|||
|
|||
There are two modes of writing output, plaintext and element-based. |
|||
|
|||
Plaintext mode uses the `print(text: str)` and `println(text: str)` functions, |
|||
which append text to a buffer that is then interpreted as markdown input. |
|||
|
|||
Element-based mode uses the `appendChild(element: pf.Element)` and |
|||
`appendChildren(*elements: List[pf.Element])` functions which allow you to |
|||
append `panflute` elements to a list which is then again interpreted as input. |
|||
The `panflute` library is available as `pf`. |
|||
|
|||
When one of these functions is called, the mode is set and functions from the |
|||
other mode cannot be called within the same block of code. |
|||
|
|||
**Examples:** |
|||
|
|||
````markdown {.group} |
|||
--- |
|||
title: Foo |
|||
--- |
|||
```python {.run} |
|||
println("*wooo*") |
|||
println() |
|||
println("The title of this file is: " + ctx.get_metadata("title")) |
|||
``` |
|||
```` |
|||
|
|||
```python {.run} |
|||
appendChild(pf.Para(pf.Strong(pf.Str("foo")))) |
|||
``` |
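The rule that one block cannot mix the two output modes can be pictured with a small model. This is a sketch only: `OutputSketch` is a hypothetical class, elements are represented by plain strings instead of `panflute` objects, and the kind of error the real program raises is an assumption.

```python
class OutputSketch:
    """Hypothetical model of one code block's output; not Formátítko's code."""

    def __init__(self) -> None:
        self.mode = None    # None until the first write, then "text" or "elements"
        self.text = ""      # buffer later interpreted as markdown
        self.elements = []  # list later interpreted as input elements

    def _use(self, mode: str) -> None:
        # The first call fixes the mode; a later call from the other mode fails.
        if self.mode is None:
            self.mode = mode
        elif self.mode != mode:
            raise RuntimeError(f"block already writes in {self.mode!r} mode")

    def print(self, text: str = "") -> None:
        self._use("text")
        self.text += text

    def println(self, text: str = "") -> None:
        self.print(text + "\n")

    def appendChild(self, element) -> None:
        self._use("elements")
        self.elements.append(element)


out = OutputSketch()
out.println("*wooo*")
try:
    out.appendChild("an element")  # mixing modes in one block
except RuntimeError as error:
    print(error)
```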
|||
|
|||
### Defining and running commands |
|||
|
|||
Code blocks can also be saved and executed later. Defining is done using the |
|||
`define` attribute: |
|||
|
|||
**Example:** |
|||
|
|||
```python {define=commandname} |
|||
print("foo") |
|||
``` |
|||
|
|||
If you try to define the same command twice, you will get an error. To redefine |
|||
a command, use the `redefine` attribute instead of `define`. |
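A minimal model of this behaviour, assuming that `redefine` is the attribute that replaces an existing command: `CommandRegistry` is a hypothetical class, and the exception type is an assumption rather than what Formátítko actually raises.

```python
class CommandRegistry:
    """Hypothetical registry mapping command names to saved source code."""

    def __init__(self) -> None:
        self._commands: dict = {}

    def define(self, name: str, source: str) -> None:
        # Defining an existing name is an error, as described above.
        if name in self._commands:
            raise ValueError(f"command {name!r} is already defined")
        self._commands[name] = source

    def redefine(self, name: str, source: str) -> None:
        # Assumption: redefining simply replaces whatever was stored before.
        self._commands[name] = source

    def get(self, name: str) -> str:
        return self._commands[name]


registry = CommandRegistry()
registry.define("commandname", 'print("foo")')
registry.redefine("commandname", 'print("bar")')
print(registry.get("commandname"))
```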
|||
|
|||
### Running defined commands |
|||
|
|||
There are multiple ways of running commands. There is the shorthand way: |
|||
|
|||
[!commandname]{} |
|||
|
|||
Or using the `c` attribute on a span or a div: |
|||
|
|||
[Some content]{c=commandname} |
|||
|
|||
:::{c=commandname} |
|||
Some content |
|||
::: |
|||
|
|||
To access the content or attributes of the div or span the command has been |
|||
called on, the `element` variable is available, which contains the `panflute` |
|||
representation of the element. |
|||
|
|||
**Example:** |
|||
|
|||
```python {define=index} |
|||
appendChild(element.content[int(element.attributes["i"])]) |
|||
``` |
|||
|
|||
[Pick the third element from this span]{c=index i=2} |
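How a saved command sees `element` can be sketched with `exec` and an explicit namespace. Everything here is a stand-in: `run_command` and `FakeSpan` are hypothetical, and the span's content is modelled as a list of strings, whereas the real content holds `panflute` elements (words and the spaces between them are separate elements, which is why index 2 is the second word).

```python
def run_command(source: str, element) -> list:
    """Execute saved command source with `element` and `appendChild` in scope."""
    output = []
    namespace = {"element": element, "appendChild": output.append}
    exec(source, namespace)  # the command body runs as ordinary Python
    return output


class FakeSpan:
    """Hypothetical stand-in for a panflute Span."""

    def __init__(self, content: list, attributes: dict) -> None:
        self.content = content
        self.attributes = attributes


source = 'appendChild(element.content[int(element.attributes["i"])])'
span = FakeSpan(["Pick", " ", "the", " ", "third"], {"i": "2"})
print(run_command(source, span))  # ['the']
```

Attribute values arrive as strings, so the command converts `i` with `int` before indexing.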
|||
|
|||
### Direct metadata print |
|||
Metadata can be printed directly using a shorthand. The advantage of this is that it |
|||
keeps the formatting from the metadatum's definition. |
|||
|
|||
```markdown {.group} |
|||
--- |
|||
a: |
|||
b: some text with **strong** |
|||
--- |
|||
[$a.b]{} |
|||
``` |
|||
|
|||
### Syntax highlighting |
|||
Formátítko uses [pygments](https://pygments.org/) to highlight syntax in code |
|||
blocks. To turn it off for a single block, don't specify a language or set the |
|||
`highlight` attribute to `False`. You can also set the metadatum `highlight` to |
|||
`false` in the FrontMatter to disable it in a given Group. To change the [highlighting |
|||
style](https://pygments.org/styles/), you have to set the `highlight-style` |
|||
metadatum in the **top-level document**; this is to prevent the need for many |
|||
inline style definitions. |
|||
|
|||
**Examples:** |
|||
```python |
|||
print("cool") |
|||
``` |
|||
|
|||
```zsh {highlight=False} |
|||
./formatitko.py README.md |
|||
``` |
|||
|
|||
### Language awareness |
|||
Formátítko is language-aware, which means that the `language` metadatum is |
|||
somewhat special. When set using the front matter, it is also popped out to TeX |
|||
as a `\languagexx` macro. Internally, only the values `cs` and `en` are currently |
|||
supported, but the metadatum can be set to anything. |
|||
|
|||
### NBSP |
|||
Formátítko automatically inserts no-break spaces according to its sorta smart |
|||
rules. (See the `whitespace.py` file for more info) These rules **depend on the |
|||
chosen language**. (`cs` has some additional rules) |
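To give a feel for what a language-dependent rule might look like, here is a sketch that ties one-letter Czech prepositions and conjunctions to the following word. The rule and the function name `tie_short_words` are assumptions based on common Czech typographic practice; the actual rules live in `whitespace.py` and are not reproduced here.

```python
import re

NBSP = "\u00a0"  # Unicode no-break space


def tie_short_words(text: str, language: str) -> str:
    """Replace the space after a one-letter word with a no-break space (cs only)."""
    if language != "cs":
        return text  # this sketch has no rules for other languages
    # (?<!\w) makes sure the letter is a whole word, not the end of a longer one.
    return re.sub(r"(?<!\w)([KkSsVvZzOoUuAaIi]) ", r"\1" + NBSP, text)


print(repr(tie_short_words("Pes a kočka v lese", "cs")))
```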
|||
|
|||
To insert a literal no-break space, you can either insert the unicode no-break |
|||
space or use the HTML escape (`&nbsp;`). |
|||
|
|||
Enforcing a breakable space is not as painless, you should insert a zero-width |
|||
space beside the normal​ space. |
|||
|
|||
### Smart quotes |
|||
Quotes get automatically converted to the slanted ones according to the current |
|||
language. |
|||
|
|||
**Examples:** |
|||
|
|||
```markdown {.group} |
|||
--- |
|||
language: cs |
|||
--- |
|||
"Uvozovky se v českém testu píší 'jinak' než v angličtině." |
|||
``` |
|||
|
|||
"In Czech texts, quotes are written 'differently' than in English" |
|||
|
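The difference between the two languages can be shown with a deliberately naive sketch. It only alternates opening and closing marks character by character and does not handle apostrophes; `smarten` and the `QUOTES` table are hypothetical and say nothing about how Formátítko implements the conversion.

```python
# Opening and closing marks per language; a hypothetical table for illustration.
QUOTES = {
    "cs": {'"': ("„", "“"), "'": ("‚", "‘")},
    "en": {'"': ("“", "”"), "'": ("‘", "’")},
}


def smarten(text: str, language: str) -> str:
    """Naively alternate opening and closing quotes for the given language."""
    pairs = QUOTES[language]
    is_open = {'"': False, "'": False}
    out = []
    for ch in text:
        if ch in pairs:
            opening, closing = pairs[ch]
            out.append(closing if is_open[ch] else opening)
            is_open[ch] = not is_open[ch]
        else:
            out.append(ch)
    return "".join(out)


print(smarten("\"a 'b' c\"", "cs"))  # „a ‚b‘ c“
print(smarten("\"a 'b' c\"", "en"))  # “a ‘b’ c”
```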
|||
### Math |
|||
Math blocks get automatically converted to HTML using $Ka\TeX$ and fall out |
|||
unchanged into TeX output. |
|||
|
|||
To make KaTeX as consistent with TeX as possible, the `\begingroup` and |
|||
`\endgroup` that are produced by [Groups](#groups) are also emulated in the |
|||
KaTeX environment, so macro definitions should be isolated as you expect. |
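The isolation that `\begingroup` and `\endgroup` provide can be pictured as a stack of scopes. The README does not describe how the emulation is done, so this is only a conceptual sketch; `MacroScopes` is a hypothetical class.

```python
class MacroScopes:
    """Hypothetical model of group-scoped macro definitions."""

    def __init__(self) -> None:
        self._scopes = [{}]  # the outermost (document-level) scope

    def begingroup(self) -> None:
        self._scopes.append({})

    def endgroup(self) -> None:
        if len(self._scopes) == 1:
            raise RuntimeError("endgroup without a matching begingroup")
        self._scopes.pop()  # definitions made inside the group are forgotten

    def define(self, name: str, body: str) -> None:
        self._scopes[-1][name] = body

    def lookup(self, name: str):
        # Inner scopes shadow outer ones.
        for scope in reversed(self._scopes):
            if name in scope:
                return scope[name]
        return None


macros = MacroScopes()
macros.define("\\R", "\\mathbb{R}")
macros.begingroup()
macros.define("\\R", "\\mathbf{R}")
print(macros.lookup("\\R"))  # \mathbf{R}
macros.endgroup()
print(macros.lookup("\\R"))  # \mathbb{R}
```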
|||
|
|||
### Images |
|||
|
|||
#### Figures |
|||
Pandoc's [implicit |
|||
figures](https://pandoc.org/MANUAL.html#extension-implicit_figures) are enabled, |
|||
so images which are alone in a paragraph are automatically converted to figures: |
|||
|
|||
![A single pixel image, wow!](test/1px.png "This is the alt text shown to screen readers (it defaults to the caption)"){width=10em} |
|||
|
|||
To prevent this, add a backslash at the end of the line with the image: |
|||
|
|||
![A single pixel image, wow!](test/1px.png "This is the alt text shown to screen readers"){width=10em}\ |
|||
|
|||
#### Image gathering |
|||
Images are automatically searched for in the directory where each markdown file is |
|||
(including partials) and also in directories listed in the `--img-lookup-dirs` |
|||
command line parameter. After processing, they're all put into the folder |
|||
specified with `--img-public-dir`. |
|||
|
|||
#### Image processing |
|||
Images are automatically processed so that they can be successfully used in both |
|||
output formats. This includes generating multiple sizes and providing a |
|||
[srcset](https://developer.mozilla.org/en-US/docs/Learn/HTML/Multimedia_and_embedding/Responsive_images). |
|||
|
|||
To customize this, the `file-width`, `file-height`, `file-dpi`, `file-quality` |
|||
and `no-srcset` attributes are available. All but the last one should be |
|||
integers. |
|||
|
|||
Keep in mind that the processing tries to be as lazy as possible, so it never |
|||
overwrites any files and if it finds the right format or resolution (only |
|||
judging by the filenames) in the lookup directories it will just copy that. This |
|||
means that any automatic attempts at conversion can be overridden by converting |
|||
the file yourself, naming it accordingly and placing it either in the public or |
|||
one of the lookup directories. |
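The "never overwrite, copy if found" part of this behaviour can be sketched as follows. This is an illustration under assumptions: `gather_image` is a hypothetical function, it matches on the exact filename only, and it leaves out format and resolution conversion entirely.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def gather_image(name: str, source_dir: Path, lookup_dirs: Iterable[Path],
                 public_dir: Path) -> Optional[Path]:
    """Copy `name` into `public_dir` unless a file of that name is already there."""
    target = public_dir / name
    if target.exists():
        return target  # never overwrite: an existing file wins
    # Lookup directories are tried first, then the markdown file's own folder.
    for directory in [*lookup_dirs, source_dir]:
        candidate = directory / name
        if candidate.is_file():
            public_dir.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(candidate, target)
            return target
    return None  # not found; the real tool would have to convert or fail here
```

Because an existing file in the public directory always wins, placing a hand-converted file there beforehand overrides anything the sketch would otherwise copy.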
|||
|
|||
## Working with the produced output |
|||
|
|||
### HTML |
|||
The HTML should be almost usable as-is. The styles for syntax highlighting are |
|||
added automatically. The styles for KaTeX, however, are not and should be added to |
|||
your `<head>`^[This is taken directly from [KaTeX's docs](https://katex.org/docs/browser.html)]: |
|||
|
|||
```html |
|||
<link rel='stylesheet' href='https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css' integrity='sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0' crossorigin='anonymous'> |
|||
``` |
|||
|
|||
Also, the output HTML is not intended as a standalone file; it should be included |
|||
as part of a larger template (one that provides a doctype, other CSS, etc.). |
|||
|
|||
### TeX |
|||
The TeX output is not usable as is. Many of the elements are just converted to |
|||
macros, which you have to define yourself. There is an example implementation in |
|||
`formatitko.tex`, which uses LuaTeX and the ucwmac package, but you should |
|||
customize it to your needs (and to the context in which the output is used). |
@ -1,8 +0,0 @@ |
|||
from panflute import Block |
|||
from typing import Dict |
|||
|
|||
class Group(Block): |
|||
def __init__(self, *args, identifier='', classes=[], attributes={}, metadata={}): |
|||
self._set_ica(identifier, classes, attributes) |
|||
self._set_content(args, Block) |
|||
self.metadata = metadata |
@ -0,0 +1,4 @@ |
|||
Pygments==2.14.0 |
|||
panflute==2.3.0 |
|||
fontTools==4.38.0 |
|||
Pillow==9.4.0 |
@ -1,162 +0,0 @@ |
|||
--- |
|||
title: 'Wooooo a title' |
|||
subtitle: 'A subtitle' |
|||
are_we_there_yet: False |
|||
language: "en" |
|||
--- |
|||
[#test-import.md]{} |
|||
|
|||
# Hello world! |
|||
|
|||
This is an *example* **yay**! |
|||
|
|||
This is *very **strongly** emphasised* |
|||
|
|||
Příliš žluťoučký kůň pěl dábelské ódy. *Příliš žluťoučký kůň pěl dábelské ódy.* **Příliš žluťoučký kůň pěl dábelské ódy.** ***Příliš žluťoučký kůň pěl dábelské ódy.*** |
|||
|
|||
|
|||
:::{partial=test-partial.md} |
|||
::: |
|||
|
|||
:::{if=cat} |
|||
This should only be shown to cats |
|||
::: |
|||
|
|||
|
|||
```python {.run} |
|||
ctx.set_flag("cat", True) |
|||
``` |
|||
|
|||
```python {.run} |
|||
println(f"The main document's title is '{ctx.get_metadata('title')}'") |
|||
ctx.set_metadata("a", {}) |
|||
ctx.set_metadata("a.b", {}) |
|||
ctx.set_metadata("a.b.c", "Bruh **bruh** bruh") |
|||
``` |
|||
|
|||
```python {style=native} |
|||
def bruh(no): |
|||
wat |
|||
``` |
|||
|
|||
Inline `code` |
|||
|
|||
::::{if=cat} |
|||
This should only be shown to cats the second time |
|||
:::: |
|||
|
|||
# [$are_we_there_yet]{} |
|||
|
|||
![This is a figure, go figure...](/tmp/logo.pdf) |
|||
|
|||
![This is a figure, go figure...](/tmp/logo.jpg){width=10em} |
|||
|
|||
![This is a figure, go figure...](/tmp/logo.png){width=10em} |
|||
|
|||
![Fakt epesní reproduktor](/tmp/reproduktor.jpeg){width=10em} |
|||
|
|||
```python {.run} |
|||
ctx.set_metadata("language", "cs") |
|||
``` |
|||
[!opendatatask]{} |
|||
```python {.run} |
|||
ctx.set_metadata("language","en") |
|||
``` |
|||
[This too!]{if=cat} |
|||
|
|||
[What]{.co} |
|||
|
|||
[An inline command with contents and **bold** and another [!nop]{} inside!]{c=nop} |
|||
|
|||
[!nop]{a=b}<!-- A special command! WOW --> |
|||
|
|||
> OOO a blockquote mate init |
|||
> |
|||
>> Nesting?? |
|||
>> Woah |
|||
|
|||
A non-breakable space bro |
|||
|
|||
A lot of spaces |
|||
|
|||
A text with some inline math: $\sum_{i=1}^nn^2$. Plus some display math: |
|||
|
|||
A link with the link in the link: <https://bruh.com> |
|||
|
|||
H~2~O is a liquid. 2^10^ is 1024. |
|||
|
|||
[Underline]{.underline} |
|||
|
|||
:::{only=html} |
|||
$$ |
|||
\def\eqalign#1{\begin{align*}#1\end{align*}} |
|||
$$ |
|||
::: |
|||
|
|||
$$ |
|||
\eqalign{ |
|||
2 x_2 + 6 x_3 &= 14 \cr |
|||
x_1 - 3 x_2 + 2 x_3 &= 5 \cr |
|||
-x_1 + 4 x_2 + \phantom{1} x_3 &= 2 |
|||
} |
|||
$$ |
|||
|
|||
:::{partial=test-partial.md} |
|||
::: |
|||
|
|||
--- |
|||
|
|||
This should be seen by all.^[This is a footnote] |
|||
|
|||
| Matematicko-fyzikální fakulta University Karlovy |
|||
| Malostranské nám. 2/25 |
|||
| 118 00 Praha 1 |
|||
|
|||
More footnotes.^[I am a foot] |
|||
|
|||
To Do: |
|||
|
|||
- buy eggs |
|||
- buy milk |
|||
- ??? |
|||
- profit |
|||
- also create sublists preferrably |
|||
|
|||
1. Woah |
|||
2. Wooo |
|||
3. no |
|||
|
|||
4) WOO |
|||
|
|||
``` {=html} |
|||
<figure> |
|||
<video src="woah.mp4" autoplay></video> |
|||
<figcaption> This is indeed a video </figcaption> |
|||
</figure> |
|||
``` |
|||
|
|||
#. brum |
|||
#. BRUHHH |
|||
#. woah |
|||
|
|||
i. bro |
|||
ii. wym bro |
|||
|
|||
|
|||
+---------------------+-----------------------+ |
|||
| Location | Temperature 1961-1990 | |
|||
| | in degree Celsius | |
|||
+---------------------+-------+-------+-------+ |
|||
| | min | mean | max | |
|||
+=====================+=======+=======+======:+ |
|||
| Antarctica | -89.2 | N/A | 19.8 | |
|||
+---------------------+-------+-------+-------+ |
|||
| Earth | -89.2 | 14 | 56.7 | |
|||
+---------------------+-------+-------+-------+ |
|||
|
|||
------- ------ ---------- ------- |
|||
12 12 12 12 |
|||
123 123 123 123 |
|||
1 1 1 1 |
|||
------- ------ ---------- ------- |
|||
|
After Width: | Height: | Size: 311 B |
@ -0,0 +1 @@ |
|||
|
@ -0,0 +1,8 @@ |
|||
--- |
|||
title: "I am a little evil md file hehe" |
|||
--- |
|||
```python {.run} |
|||
import sys |
|||
sys.exit(666) |
|||
``` |
|||
I am very innocent wym bro :( |
@ -1,23 +1,25 @@ |
|||
from panflute import Element, Block, Inline, Null, Str, Doc, convert_text, Para, Plain |
|||
import re |
|||
|
|||
# It sometimes happens that an element contains a single paragraph or even a |
|||
# single plaintext line. It can be sometimes useful to extract this single |
|||
# paragraph, which is inline. |
|||
def inlinify(e: Element) -> Element: |
|||
if len(e.content) == 1 and (isinstance(e.content[0], Para) or isinstance(e.content[0], Plain)): |
|||
return e.content[0].content |
|||
|
|||
def replaceEl(e: Element, r: Element) -> Element: |
|||
parent = e.parent |
|||
parent.content[e.index] = r |
|||
r.parent = parent |
|||
return r |
|||
def deleteEl(e: Element): |
|||
del e.parent.content[e.index] |
|||
|
|||
# In transform, inline elements cannot be replaced with Block ones and also |
|||
# cannot be removed from the tree entirely, because that would mess up the |
|||
# iteration process through the tree. We replace them with null elements |
|||
# instead which never make it to the output. |
|||
def nullify(e: Element): |
|||
if isinstance(e, Inline): |
|||
return Str("") |
|||
elif isinstance(e, Block): |
|||
return Null() |
|||
|
|||
# A helper function to import markdown using panflute (which calls pandoc). If |
|||
# we ever want to disable or enable some of panflute's markdown extensions, |
|||
# this is the place to do it. |
|||
def import_md(s: str, standalone: bool=True) -> Doc: |
|||
- return convert_text(s, standalone=standalone) |
|||
+ return convert_text(s, standalone=standalone, input_format="markdown-definition_lists-citations") |
|||
|