Compare commits

2 commits

Author SHA1 Message Date
fa2cf0a5cc Fixed html escape of code blocks. 2023-09-20 23:58:41 +02:00
9fa0cb2582 Updated README #39. 2023-09-20 23:58:30 +02:00
3 changed files with 170 additions and 60 deletions

209
README.md

@ -16,7 +16,9 @@ Inkscape are used for image processing. Nodejs is used for KaTeX.
## Usage
```
usage: formatitko.py [-h] [-l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]] [-p IMG_PUBLIC_DIR] [-i IMG_WEB_PATH] [-w OUTPUT_HTML] [-t OUTPUT_TEX] input_filename
usage: formatitko [-h] [-l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]] [-p IMG_PUBLIC_DIR] [-c IMG_CACHE_DIR] [-i IMG_WEB_PATH] [-w OUTPUT_HTML] [-t OUTPUT_TEX] [-m OUTPUT_MD]
[-j OUTPUT_JSON] [--katex-server] [-k KATEX_SOCKET] [--debug]
input_filename
positional arguments:
input_filename The markdown file to process.
@ -24,16 +26,27 @@ positional arguments:
options:
-h, --help show this help message and exit
-l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...], --img-lookup-dirs IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]
Image lookup directories. When processing images, the program will try to find the image in them first. Always looks for images in the same folder as the markdown
file. (default: [])
Image lookup directories. When processing images, the program will try to find the image in them first. Always looks for images in the same folder
as the markdown file. (default: [])
-p IMG_PUBLIC_DIR, --img-public-dir IMG_PUBLIC_DIR
Directory to put processed images into. The program will not overwrite existing images. (default: public)
Directory to put processed images into. The program will overwrite images, whose dependencies are newer. (default: public)
-c IMG_CACHE_DIR, --img-cache-dir IMG_CACHE_DIR
Directory to cache processed images and intermediate products. The program will overwrite files, whose dependencies are newer. (default: cache)
-i IMG_WEB_PATH, --img-web-path IMG_WEB_PATH
Path where the processed images are available on the website. (default: /)
-w OUTPUT_HTML, --output-html OUTPUT_HTML
The HTML file (for Web) to write into. (default: output.html)
The HTML file (for Web) to write into. (default: None)
-t OUTPUT_TEX, --output-tex OUTPUT_TEX
The TEX file to write into. (default: output.tex)
The TEX file to write into. (default: None)
-m OUTPUT_MD, --output-md OUTPUT_MD
The Markdown file to write into. (Uses pandoc to generate markdown) (default: None)
-j OUTPUT_JSON, --output-json OUTPUT_JSON
The JSON file to dump the pandoc-compatible AST into. (default: None)
--katex-server Starts a KaTeX server and prints the socket filename onto stdout. Useful for running formatitko many times without starting the KaTeX server each
time. (default: False)
-k KATEX_SOCKET, --katex-socket KATEX_SOCKET
The KaTeX server socket filename obtained by running with `--katex-server`. (default: None)
--debug
```
## Format
@ -69,12 +82,28 @@ There are two ways of including files.
#### Importing
The first is importing, which only takes the state (defined commands, metadata,
etc.) from the file and any content is omitted. This is useful for creating
libraries of commands. The syntax is as follows:
libraries of commands.
[#test/empty.md]{}
There are three types of imports:
The curly braces are required for pandoc to parse the import properly and should
be left empty.
##### Python Module (the default)
```markdown
[#ksp_formatitko as ksp]{}
```
or
```markdown
[#ksp_formatitko]{}
```
with an optional `type=module` in the curly brackets, tries to import a python
module as a set of formatitko commands. See below for more details about
commands.
##### JSON Metadata
[#test/test.json]{type=metadata key=orgs}
This will import metadata from a JSON file. The optional `key` argument sets the
key under which the whole JSON file will be placed. Dictionaries are merged,
others overwritten.
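The merge rule above ("dictionaries are merged, others overwritten") can be sketched roughly as follows. This is only an illustration, not formatitko's actual code: the function names `merge_metadata` and `import_json_metadata` are hypothetical, and the sketch assumes that nested dictionaries are merged recursively, which the text does not state explicitly.

```python
import json
from copy import deepcopy
from typing import Optional


def merge_metadata(target: dict, incoming: dict) -> dict:
    """Return a new dict: dict values are merged, all other values are overwritten.

    Assumption: the merge is recursive for nested dictionaries.
    """
    result = deepcopy(target)
    for key, value in incoming.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_metadata(result[key], value)
        else:
            result[key] = deepcopy(value)
    return result


def import_json_metadata(metadata: dict, json_text: str, key: Optional[str] = None) -> dict:
    """Hypothetical helper: place the parsed JSON under `key` (if given), then merge."""
    loaded = json.loads(json_text)
    if key is not None:
        loaded = {key: loaded}
    return merge_metadata(metadata, loaded)


existing = {"title": "Foo", "orgs": {"ksp": {"name": "KSP"}}}
merged = import_json_metadata(existing, '{"ksp": {"url": "example"}}', key="orgs")
print(merged)
```

Under these assumptions, the existing `orgs.ksp.name` entry survives the import and `orgs.ksp.url` is added next to it.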
#### Partials
Partials are the very opposite of imports, they have their own context, which
@ -95,12 +124,19 @@ partial to `tex` or `html`.
### Groups
Groups are pieces of markdown with their own sandboxed context, in other words,
inline partials. They function exactly the same as partials, namely can have
their own front matter.
inline partials. Syntax-wise they are pandoc Divs with the `.group` class. All
attributes of the Div will be passed down as metadata to the group.
::: {.group lang=cs}
OOOoo český mód
:::
If you want fancier metadata that can only be specified in a front
matter, you can use the following syntax:
```markdown {.group}
---
language: cs
lang: cs
---
OOOoo český mód
```
@ -114,6 +150,9 @@ fmt.Pritln("owo")
```
````
Note however, that when this syntax is used, pandoc is executed for each of
these blocks which could get slow. Using divs is preferred.
Groups and partials are also enclosed in `\begingroup` and `\endgroup` in the
output TeX.
@ -138,15 +177,38 @@ pandoc feature.]
### Running python code
Formátítko allows you to run Python code directly from your MD file. Any
`python` code block with the class `run` will be executed:
`python` code block with the class `run` will be executed.
#### Context
#### Command environment
You can access the current context using the `ctx` variable. The context
The commands will be executed as functions with the following signature:
```python
def command(element: Command, context: Context) -> list[Element]:
```
Some global variables may be available; they are defined in `command_env.py`:
```python
import panflute as pf
import formatitko.elements as fe
from formatitko.util import import_md_list
from formatitko.util import parse_string
from formatitko.context import Context
from formatitko.command import Command
from panflute import Element
```
##### `element` parameter
The `element` parameter holds the element the command is currently being executed
on. In the case of running python blocks directly, it is probably not
interesting but will get interesting later.
##### `context` parameter
You can access the current context using the `context` parameter. The context
provides read/write access to the FrontMatter metadata. The context has the
following methods:
`ctx.get_metadata(key: str, simple: bool=True, immediate: bool=False)`
`context.get_metadata(key: str, simple: bool=True, immediate: bool=False)`
- `key`: The key of the metadatum you want to get. Separate child keys with
dots: `ctx.get_metadata("flags.foo")`
@ -156,13 +218,13 @@ following methods:
- `immediate`: Only get metadatum from the current context, not from its
parents.
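To illustrate the dotted-key lookup and the `immediate` parameter described above, here is a minimal sketch. `SketchContext` is a hypothetical stand-in, not the real `Context` class; it assumes that a missing key yields `None` and leaves out the `simple` parameter entirely.

```python
from typing import Any, Optional


class SketchContext:
    """Hypothetical stand-in for a context with an optional parent context."""

    def __init__(self, metadata: dict, parent: Optional["SketchContext"] = None):
        self.metadata = metadata
        self.parent = parent

    def _lookup(self, key: str) -> tuple:
        # Walk the nested dicts one dotted segment at a time.
        node: Any = self.metadata
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                return None, False
            node = node[part]
        return node, True

    def get_metadata(self, key: str, immediate: bool = False) -> Any:
        value, found = self._lookup(key)
        if found:
            return value
        # Fall back to the parent unless only the current context was requested.
        if not immediate and self.parent is not None:
            return self.parent.get_metadata(key)
        return None  # assumption: missing keys yield None


document = SketchContext({"title": "Foo", "flags": {"foo": True}})
group = SketchContext({"lang": "cs"}, parent=document)
print(group.get_metadata("flags.foo"))
print(group.get_metadata("flags.foo", immediate=True))
```

In this sketch the group inherits `flags.foo` from its parent, while the `immediate=True` call only consults the group's own metadata.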
`ctx.set_metadata(key: str, value)`
`context.set_metadata(key: str, value)`
- `key`: The key of the metadatum you want to get. Separate child keys with
dots: `ctx.get_metadata("flags.foo")`
- `value`: Any value you want to assign to the metadatum
`ctx.unset_metadata(key: str)`
`context.unset_metadata(key: str)`
Delete the metadatum in the current context and allow it to inherit the value
from the parent context.
@ -172,26 +234,31 @@ from the parent context.
Helper functions for flags exist which work the same as for metadata:
`ctx.is_flag_set(flag: str) -> bool`
`context.is_flag_set(flag: str) -> bool`
`ctx.set_flag(flag: str, val: bool)`
`context.set_flag(flag: str, val: bool)`
`ctx.unset_flag(flag: str)`
`context.unset_flag(flag: str)`
#### Writing output
There are also other useful functions, which you can see for yourself in
`context.py`.
There are two modes of writing output, plaintext and element-based.
> **WARNING**: Writing to metadata should **only** be done **at the beginning**
> of the document or a group (before any printable content). Writing to metadata
> in other places in the document might cause undefined behaviour (mostly some
> elements might behave as if the metadata was set elsewhere).
Plaintext mode uses the `print(text: str)` and `println(text: str)` functions,
that append text to a buffer which is then interpreted as markdown input.
##### Return value
The function **must** return a list of valid Elements. This list may be empty.
These elements will be placed in the document in the location where the command
was invoked.
Element-based mode uses the `appendChild(element: pf.Element)` and
`appendChildren(*elements: List[pf.Element])` functions which allow you to
append `panflute` elements to a list which is then again interpreted as input.
The `panflute` library is available as `pf`.
The `parse_string` function might be useful: it turns a simple string into a
list of panflute's `Str`s and `Space`s (without any formatting). If you want to
use markdown in your function output, you have to convert it yourself using
`import_md`, but beware that this calls pandoc, is potentially slow, and is
discouraged.
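As a rough picture of what "a list of `Str`s and `Space`s" means, the sketch below splits a string on single spaces. It is not the real `parse_string`: the `Str` and `Space` classes here are small hypothetical stand-ins for the panflute elements of the same name, so that the example runs without panflute installed, and the exact whitespace handling is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Str:
    """Stand-in for panflute's Str element (hypothetical, for illustration)."""
    text: str


@dataclass
class Space:
    """Stand-in for panflute's Space element (hypothetical, for illustration)."""


def parse_string_sketch(text: str) -> list:
    """Turn a plain string into words separated by Space elements."""
    elements = []
    for index, word in enumerate(text.split(" ")):
        if index > 0:
            elements.append(Space())  # one Space per separating space character
        if word:
            elements.append(Str(word))
    return elements


print(parse_string_sketch("Hello world!"))
```

No markdown is interpreted here: `*wooo*` would stay a literal `Str`, which is why formatted output has to be built from elements such as `pf.Emph` instead.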
When one of these functions is called, the mode is set and functions from the
other mode cannot be called within the same block of code.
**Examples:**
@ -200,14 +267,15 @@ other mode cannot be called within the same block of code.
title: Foo
---
```python {.run}
println("*wooo*")
println()
println("The title of this file is: " + ctx.get_metadata("title"))
return [
pf.Para(pf.Emph(pf.Str("wooo"))),
pf.Para(*parse_string("The title of this file is: " + context.get_metadata("title")))
]
```
````
```python {.run}
appendChild(pf.Para(pf.Strong(pf.Str("foo"))))
return [pf.Strong(*parse_string("Hello world!"))]
```
### Defining and running commands
@ -218,7 +286,7 @@ Code blocks can be also saved and executed later. Defining is done using the
**Example:**
```python {define=commandname}
print("foo")
return [pf.Str("foo")]
```
If you try to define the same command twice, you will get an error. To redefine
@ -230,7 +298,7 @@ There are multiple ways of running commands. There is the shorthand way:
[!commandname]{}
Or using the `c` attribute on a span or a div:
Or using the `c` attribute on a span or a div (new: or a codeblock!):
[Some content]{c=commandname}
@ -238,6 +306,16 @@ Or using the `c` attribute on a span or a div:
Some content
:::
```python {define=bash}
import subprocess
c = subprocess.run(["bash", "-c", element.text], stdout=subprocess.PIPE, check=True, encoding="utf-8")
return [pf.Para(pf.Str(c.stdout))]
```
```bash {c=bash}
cat /etc/hostname
```
To access the content or attributes of the div or span the command has been
called on, the `element` variable is available, which contains the `panflute`
representation of the element.
@ -245,7 +323,7 @@ representation of the element.
**Example:**
```python {define=index}
appendChild(element.content[int(element.attributes["i"])])
return [element.content[int(element.attributes["i"])]]
```
[Pick the third element from this span]{c=index i=2}
@ -268,23 +346,24 @@ blocks. To turn it off for a single block, don't specify a language or set the
`highlight` attribute to `False`. You can also set the metadatum `highlight` to
`false` in the FrontMatter to disable it in a given Group. To change the [highlighting
style](https://pygments.org/styles/), you have to set the `highlight-style`
metadatum in the **top-level document** this is to prevent the need for many
inline style definitions.
metadatum or the `style` attribute directly on the element.
**Examples:**
```python
print("cool")
```
```python {style=manni}
print("freezing")
```
```zsh {highlight=False}
./formatitko.py README.md
```
### Language awareness
Formátítko is language aware, this means that the `language` metadatum is
somewhat special. When set using the front matter, it is also popped out to TeX
as a `\languagexx` macro. Currently supported values are `cs` and `en` for
internal uses but can be set to anything.
Formátítko is language aware, which means that the `lang` metadatum is
somewhat special. (It is also special for pandoc.)
### NBSP
Formátítko automatically inserts no-break spaces according to its sorta smart
@ -303,12 +382,9 @@ language.
**Examples:**
```markdown {.group}
---
language: cs
---
::: {.group lang=cs}
"Uvozovky se v českém testu píší 'jinak' než v angličtině."
```
:::
"In Czech texts, quotes are written 'differently' than in English"
@ -339,6 +415,9 @@ Images are automatically searched for in the directory where each markdown file
command line parameter. After processing, they're all put into the folder
specified with `--public-dir`.
Formátítko also does dependency management, which means that all images will be
regenerated only when their dependencies are newer.
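The "regenerate only when dependencies are newer" rule is commonly implemented by comparing file modification times. The following sketch shows that general technique; the function name `needs_rebuild` is hypothetical and the real implementation may track dependencies differently.

```python
import os


def needs_rebuild(target: str, dependencies: list[str]) -> bool:
    """Return True if `target` is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Rebuild as soon as one dependency was modified after the target was written.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

A caller would run the expensive conversion only when `needs_rebuild("public/image.png", ["image.svg"])` returns `True`; both paths here are made up for the example.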
#### Image processing
Images are automatically processed so that they can be successfully used in both
output formats. This includes generating multiple sizes and providing a
@ -348,12 +427,22 @@ To customize this, the `file-width`, `file-height`, `file-dpi`, `file-quality`
and `no-srcset` attributes are available. All but the last one should be
integers.
Keep in mind that the processing tries to be as lazy as possible, so it never
overwrites any files and if it finds the right format or resolution (only
judging by the filenames) in the lookup directories it will just copy that. This
means that any automatic attempts at conversion can be overridden by converting
the file yourself, naming it accordingly and placing it either in the public or
one of the lookup directories.
Processing also includes Asymptote images -- you can simply include an asymptote
program as an image and formátítko handles the rest for you.
#### Content headers and footers
If you want formatitko to generate fully formed HTML files for you, you might
want to add an HTML partial with the starting tags and `<head>`. This would
normally not work, because the entire document is wrapped with `<main>`. Using
the special `.header_content` and `.footer_content` classes of divs, you can
append content to a header and footer, which are popped to the output before and
after the document.
:::: {.header_content}
::: {partial="test/test-top.html" type="html"}
:::
::::
## Working with the produced output
@ -366,11 +455,15 @@ your `<head>`^[This is taken directly from [KaTeX's docs](https://katex.org/docs
<link rel='stylesheet' href='https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css' integrity='sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0' crossorigin='anonymous'>
```
Also the output HTML is not intended as a standalone file but should be included
as part of a larger template. (That includes a doctype, other css, etc.)
You can see how this is done in `test/test.md`
### TeX
The TeX output is not usable as is. Many of the elements are just converted to
macros, which you have to define yourself. There is an example implementation in
`formatitko.tex`, which uses LuaTeX and the ucwmac package, but you should
customize it to your needs (and to the context in which the output is used).
## More examples
More usage examples can be found (even though a bit chaotically) in the test
directory.


@ -1,4 +1,4 @@
from panflute import Cite, Emph, Image, LineBreak, Link, Math, Note, RawInline, SmallCaps, Str, Strikeout, Subscript, Superscript, Underline
from panflute import Cite, Code, Emph, Image, LineBreak, Link, Math, Note, RawInline, SmallCaps, Str, Strikeout, Subscript, Superscript, Underline
from panflute import BulletList, Citation, CodeBlock, Definition, DefinitionItem, DefinitionList, Header, HorizontalRule, LineBlock, LineItem, ListItem, Null, OrderedList, Para, Plain, RawBlock, TableBody, TableFoot, TableHead
from panflute import TableRow, TableCell, Caption, Doc
from panflute import ListContainer, Element
@ -37,7 +37,7 @@ class HTMLGenerator(OutputGenerator):
def escape_special_chars(self, text: str) -> str:
text = text.replace("&", "&amp;")
text = text.replace("<", "&lt;")
text = text.replace(">", "&rt;")
text = text.replace(">", "&gt;")
text = text.replace("\"", "&quot;")
text = text.replace("'", "&#39;")
# text = text.replace(" ", "&nbsp;") # Don't replace no-break spaces with HTML escapes, because we trust unicode?
@ -125,8 +125,13 @@ class HTMLGenerator(OutputGenerator):
result = highlight(e.text, lexer, formatter)
self.writeraw(result)
else:
e.text = self.escape_special_chars(e.text)
self.generate_simple_tag(e, tag="pre")
def generate_Code(self, e: Code):
e.text = self.escape_special_chars(e.text)
self.generate_simple_tag(e)
def generate_Image(self, e: Image):
url = e.url
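The escaping fix in this hunk can be checked in isolation. The sketch below copies the corrected replacements into a standalone function (outside the `HTMLGenerator` class, which is an adaptation made for this example) and round-trips the result through the standard library's `html.unescape`. Note that `&rt;` is not a defined HTML entity, so the old output would not have decoded back to `>`.

```python
import html


def escape_special_chars(text: str) -> str:
    """Standalone copy of the corrected escaping logic, for illustration only."""
    text = text.replace("&", "&amp;")  # must run first so later entities are not re-escaped
    text = text.replace("<", "&lt;")
    text = text.replace(">", "&gt;")
    text = text.replace("\"", "&quot;")
    text = text.replace("'", "&#39;")
    return text


sample = "<div class='x'>a & b</div>"
escaped = escape_special_chars(sample)
print(escaped)
# Decoding the escaped text should give back the original string.
print(html.unescape(escaped) == sample)
```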


@ -56,6 +56,11 @@ def bruh(no):
wat
```
```python {style=dracula}
def bruhec(bruzek):
hah
```
Inline `code`
::::{if=cat}
@ -200,4 +205,11 @@ return [pf.Para(pf.Str(c.stdout))]
cat /etc/hostname
```
```html
<div>
hahahahaah
</div>
```
`<div>`