Jiri Kalvoda 5a61ea5fac UCWTeX: Alternativní figures		2024-05-11 11:50:04 +02:00
Formátítko 2.0

A python program based on pandoc and its python library panflute for converting from markdown to TeX and HTML with added fancy features like image processing, python-based macros and much more.

Requirements

This project requires panflute 2.3.0, which itself requires pandoc 3.0. If the pandoc version doesn't match, very weird things can happen. ImageMagick and Inkscape are used for image processing. Node.js is used for KaTeX.

Usage

usage: formatitko [-h] [-l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]] [-p IMG_PUBLIC_DIR] [-c IMG_CACHE_DIR] [-i IMG_WEB_PATH] [-w OUTPUT_HTML] [-t OUTPUT_TEX] [-m OUTPUT_MD]
                  [-j OUTPUT_JSON] [--katex-server] [-k KATEX_SOCKET] [--debug]
                  input_filename

positional arguments:
  input_filename        The markdown file to process.

options:
  -h, --help            show this help message and exit
  -l IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...], --img-lookup-dirs IMG_LOOKUP_DIRS [IMG_LOOKUP_DIRS ...]
                        Image lookup directories. When processing images, the program will try to find the image in them first. Always looks for images in the same folder
                        as the markdown file. (default: [])
  -p IMG_PUBLIC_DIR, --img-public-dir IMG_PUBLIC_DIR
                        Directory to put processed images into. The program will overwrite images, whose dependencies are newer. (default: public)
  -c IMG_CACHE_DIR, --img-cache-dir IMG_CACHE_DIR
                        Directory to cache processed images and intermediate products. The program will overwrite files, whose dependencies are newer. (default: cache)
  -i IMG_WEB_PATH, --img-web-path IMG_WEB_PATH
                        Path where the processed images are available on the website. (default: /)
  -w OUTPUT_HTML, --output-html OUTPUT_HTML
                        The HTML file (for Web) to write into. (default: None)
  -t OUTPUT_TEX, --output-tex OUTPUT_TEX
                        The TEX file to write into. (default: None)
  -m OUTPUT_MD, --output-md OUTPUT_MD
                        The Markdown file to write into. (Uses pandoc to generate markdown) (default: None)
  -j OUTPUT_JSON, --output-json OUTPUT_JSON
                        The JSON file to dump the pandoc-compatible AST into. (default: None)
  --katex-server        Starts a KaTeX server and prints the socket filename onto stdout. Useful for running formatitko many times without starting the KaTeX server each
                        time. (default: False)
  -k KATEX_SOCKET, --katex-socket KATEX_SOCKET
                        The KaTeX server socket filename obtained by running with `--katex-server`. (default: None)
  --debug
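
When formatitko is called from a script, the command line can be assembled from the options listed above. The following is a minimal sketch: the helper name `build_formatitko_args` is hypothetical and not part of formatitko, and only flags that appear in the help text are used.

```python
from typing import Optional


def build_formatitko_args(
    input_filename: str,
    output_html: Optional[str] = None,
    output_tex: Optional[str] = None,
    katex_socket: Optional[str] = None,
) -> list[str]:
    """Assemble an argv list for the formatitko CLI (hypothetical helper)."""
    args = ["formatitko"]
    if output_html is not None:
        args += ["--output-html", output_html]
    if output_tex is not None:
        args += ["--output-tex", output_tex]
    if katex_socket is not None:
        # Reuse a server that was started earlier with --katex-server.
        args += ["--katex-socket", katex_socket]
    args.append(input_filename)
    return args


print(build_formatitko_args("README.md", output_html="out.html"))
```

The resulting list can be handed to `subprocess.run`. Passing the same socket filename to every call is what lets many runs share one KaTeX server.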

Format

Formátítko uses all the default pandoc markdown extensions except for definition lists and citations. It also adds its own custom features.

Features

Hiding and showing elements based on flags

Flags can be set in the Front Matter or with python code. Elements with the if attribute will only be shown if the flag is set to True, and elements with the ifn attribute will only be shown if the flag is not set to True.

Example:

---
flags:
  foo: true
---
[This will be shown]{if=foo}

[This will not be shown]{if=bar}

[This will be shown]{ifn=bar}
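
The rule behind the example above can be written down in a few lines of Python. This is only a sketch of the described behaviour, not Formátítko's actual code; the function name `is_shown` and the plain dictionaries are assumptions made for illustration.

```python
def is_shown(attributes: dict[str, str], flags: dict[str, bool]) -> bool:
    """Decide visibility from the `if` / `ifn` attributes (illustrative only)."""
    # `if`: shown only when the named flag is set to True.
    if "if" in attributes and flags.get(attributes["if"]) is not True:
        return False
    # `ifn`: shown only when the named flag is NOT set to True.
    if "ifn" in attributes and flags.get(attributes["ifn"]) is True:
        return False
    return True


flags = {"foo": True}
print(is_shown({"if": "foo"}, flags))   # True
print(is_shown({"if": "bar"}, flags))   # False
print(is_shown({"ifn": "bar"}, flags))  # True
```

Note that an unset flag behaves like a flag set to False for both attributes.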

Including other files

There are two ways of including files.

Importing

The first is importing, which only takes the state (defined commands, metadata, etc.) from the file and any content is omitted. This is useful for creating libraries of commands.

The following types of imports are described here:

Python Module (the default)

[#ksp_formatitko as ksp]{}

[#ksp_formatitko]{}

This form, with an optional type=module in the curly brackets, tries to import a python module as a set of formatitko commands. See below for more details about commands.

JSON Metadata

[#test/test.json]{type=metadata key=orgs}

This will import metadata from a JSON file. The optional key argument sets the key under which the whole JSON file will be placed. Dictionaries are merged; other values are overwritten.
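
The merging rule can be sketched as follows. The function `merge_metadata` is hypothetical, and merging nested dictionaries recursively is an assumption; the text above only says that dictionaries are merged and everything else is overwritten.

```python
def merge_metadata(target: dict, source: dict) -> dict:
    """Merge `source` into `target` in place: dicts are merged, other values overwritten."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides are dictionaries: merge them key by key (assumed recursive).
            merge_metadata(target[key], value)
        else:
            # Scalars, lists and type mismatches: the imported value wins.
            target[key] = value
    return target


metadata = {"orgs": {"ksp": {"name": "KSP"}}, "title": "Old"}
imported = {"orgs": {"ksp": {"web": "example.org"}}, "title": "New"}
merge_metadata(metadata, imported)
print(metadata)
```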

Partials

Partials are the very opposite of imports: they have their own context, which inherits everything from the context they're included in but gets reset after the file ends.

:::{partial=test/empty.md} :::

If the untrusted attribute is set to True, the partial and all its children will not be able to define commands or run inline blocks (but it will be able to run commands defined in the parent). ^[Please don't trust this for any security though, we're playing with eval fire, this will never be secure.]

You can also import raw HTML and TeX if you set the type attribute of the partial to tex or html.

Groups

Groups are pieces of markdown with their own sandboxed context, in other words, inline partials. Syntax-wise they are pandoc Divs with the .group class. All attributes of the Div will be passed down as metadata to the group.

::: {.group lang=cs} OOOoo český mód :::

If you want to have more fancy metadata, that can only be specified in a front matter, you can use the following syntax:

---
lang: cs
---
OOOoo český mód

If you need to nest groups or have code blocks inside groups, you can increase the amount of backticks around the outer block:

```go
fmt.Println("owo")
```

Note, however, that with this syntax pandoc is executed once for each of these blocks, which can get slow. Using divs is preferred.

Groups and partials are also enclosed in \begingroup and \endgroup in the output TeX.

Raw HTML and TeX ^[This is a pandoc feature]

If raw HTML or TeX is included in the markdown file, it will automagically pop out into the respective output file.

red text

\vskip1em

This has the advantage and disadvantage of being very "automagic", which means that for instance markdown inside HTML will still get interpreted as markdown. It is however very very unreliable, so in most cases, you should use explicit raw blocks with the unnamed attribute set to either html or tex. ^[Still a pandoc feature.]

<span style="color: red">red text</span>

Running python code

Formátítko allows you to run Python code directly from your MD file. Any python code block with the class run will be executed.

Command environment

The commands will be executed as functions with the following signature:

def command(element: Command, context: Context) -> list[Element]:

Some global names are also available. They are defined in command_env.py:

import panflute as pf
import formatitko.elements as fe
from formatitko.util import import_md_list
from formatitko.util import parse_string

from formatitko.context import Context
from formatitko.command import Command
from panflute import Element

`element` parameter

The element parameter holds the element the command is currently being executed on. In the case of running python blocks directly, it is probably not interesting but will get interesting later.

`context` parameter

You can access the current context using the context parameter. The context provides read/write access to the FrontMatter metadata. The context has the following methods:

context.get_metadata(key: str, simple: bool=True, immediate: bool=False)

key: The key of the metadatum you want to get. Separate child keys with dots: ctx.get_metadata("flags.foo")
simple: Whether to use python's simple builtin types or panflute's MetaValues. MetaValues can contain formatted text; simple values lose all formatting.
immediate: Only get metadatum from the current context, not from its parents.

context.set_metadata(key: str, value)

key: The key of the metadatum you want to set. Separate child keys with dots, as in "flags.foo".
value: Any value you want to assign to the metadatum

context.unset_metadata(key: str)

Delete the metadatum in the current context and allow it to inherit the value from the parent context.

key: The key of the metadatum you want to unset. Separate child keys with dots, as in "flags.foo".
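
To make the dotted keys and the parent fallback concrete, here is a small stand-in written for this README. `ToyContext` is hypothetical and is not formatitko's Context class: it leaves out the simple parameter, and returning None for a missing key is an assumption.

```python
from typing import Any, Optional


class ToyContext:
    """Stand-in for illustration only; not formatitko's Context class."""

    def __init__(self, parent: Optional["ToyContext"] = None):
        self.parent = parent
        self.metadata: dict[str, Any] = {}

    def set_metadata(self, key: str, value: Any) -> None:
        node = self.metadata
        *parents, last = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[last] = value

    def get_metadata(self, key: str, immediate: bool = False) -> Any:
        node: Any = self.metadata
        for part in key.split("."):
            if not isinstance(node, dict) or part not in node:
                # Not found here: ask the parent unless `immediate` is set.
                if self.parent is not None and not immediate:
                    return self.parent.get_metadata(key)
                return None
            node = node[part]
        return node

    def unset_metadata(self, key: str) -> None:
        node: Any = self.metadata
        *parents, last = key.split(".")
        for part in parents:
            node = node.get(part)
            if not isinstance(node, dict):
                return  # nothing to unset in this context
        node.pop(last, None)


root = ToyContext()
root.set_metadata("flags.foo", True)
group = ToyContext(parent=root)
group.set_metadata("flags.foo", False)
print(group.get_metadata("flags.foo"))   # False, the group's own value
group.unset_metadata("flags.foo")
print(group.get_metadata("flags.foo"))   # True, inherited from the parent again
```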

Helper functions for flags exist which work the same as for metadata:

context.is_flag_set(flag: str) -> bool

context.set_flag(flag: str, val: bool)

context.unset_flag(flag: str)

There are also other useful functions, which you can see for yourself in context.py.

WARNING: Writing to metadata should only be done at the beginning of the document or a group (before any printable content). Writing to metadata in other places in the document might cause undefined behaviour (mostly some elements might behave as if the metadata was set elsewhere).

Return value

The function must return a list of valid Elements. This list may be empty. These elements will be placed in the document in the location where the command was invoked.

The parse_string function might be useful: it turns a simple string into a list of panflute's Strs and Spaces (without any formatting). If you want to use markdown in your function output, you have to convert it yourself using import_md, but beware that this calls pandoc, is potentially slow and is discouraged.

Examples:

---
title: Foo
---
```python {.run}
return [
	pf.Para(pf.Emph(pf.Str("wooo"))),
	pf.Para(*parse_string("The title of this file is: " + context.get_metadata("title")))
]
```

return [pf.Strong(*parse_string("Hello world!"))]

Defining and running commands

Code blocks can be also saved and executed later. Defining is done using the define attribute:

Example:

return [pf.Str("foo")]

If you try to define the same command twice, you will get an error. To redefine a command, use the redefine attribute instead of define.

Running defined commands

There are multiple ways of running commands. There is the shorthand way:

[!commandname]{}

Or using the c attribute on a span or a div (new: or a codeblock!):

[Some content]{c=commandname}

:::{c=commandname} Some content :::

import subprocess
c = subprocess.run(["bash", "-c", element.text], stdout=subprocess.PIPE, check=True, encoding="utf-8")
return [pf.Para(pf.Str(c.stdout))]

cat /etc/hostname

To access the content or attributes of the div or span the command has been called on, the element variable is available, which contains the panflute representation of the element.

Example:

return [element.content[int(element.attributes["i"])]]

[Pick the third element from this span]{c=index i=2}

Direct metadata print

Metadata can be printed directly using a shorthand. The advantage of this is that it keeps the formatting from the metadatum's definition.

---
a:
  b: some text with **strong**
---
[$a.b]{}

Syntax highlighting

Formátítko uses pygments to highlight syntax in code blocks. To turn it off for a single block, don't specify a language or set the highlight attribute to False. You can also set the metadatum highlight to false in the FrontMatter to disable it in a given Group. To change the highlighting style, you have to set the highlight-style metadatum or the style attribute directly on the element.

Examples:

print("cool")

print("freezing")

./formatitko.py README.md

Language awareness

Formátítko is language aware, which means that the lang metadatum is somewhat special. (It is also special for pandoc.)

NBSP

Formátítko automatically inserts no-break spaces according to its sorta smart rules. (See the whitespace.py file for more info) These rules depend on the chosen language. (cs has some additional rules)

To insert a literal no-break space, you can either insert the unicode no-break space or use the html escape.

Enforcing a breakable space is not as painless, you should insert a zero-width space beside the normal space.
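
The characters involved are easy to confuse in an editor, so here they are spelled out in Python. This only shows the characters themselves. That `&nbsp;` is the escape meant above, and that the zero-width space goes directly before the normal space, are assumptions.

```python
import html

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# The HTML escape decodes to the same character as the literal no-break space.
assert html.unescape("&nbsp;") == NBSP

glued = f"10{NBSP}km"     # "10" and "km" stay on the same line
breakable = f"a{ZWSP} b"  # zero-width space beside a normal space

# Print the code points, because both characters are invisible in most editors.
print([f"U+{ord(ch):04X}" for ch in glued])
print([f"U+{ord(ch):04X}" for ch in breakable])
```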

Smart quotes

Quotes get automatically converted to the slanted ones according to the current language.

Examples:

::: {.group lang=cs} "Uvozovky se v českém testu píší 'jinak' než v angličtině." :::

"In Czech texts, quotes are written 'differently' than in English"
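
For readers unfamiliar with the two conventions, the sketch below shows which characters are expected in each language. It is a deliberately naive illustration, not Formátítko's algorithm: the `smarten` function and the `QUOTES` table are hypothetical, quotes are simply alternated between opening and closing, and apostrophes are not handled.

```python
# Opening and closing marks for double and single quotes in each language.
QUOTES = {
    "cs": {'"': ("\u201e", "\u201c"), "'": ("\u201a", "\u2018")},
    "en": {'"': ("\u201c", "\u201d"), "'": ("\u2018", "\u2019")},
}


def smarten(text: str, lang: str = "en") -> str:
    """Naively alternate opening and closing quotes for the given language."""
    marks = QUOTES[lang]
    is_open = {'"': False, "'": False}
    out = []
    for ch in text:
        if ch in marks:
            opening, closing = marks[ch]
            out.append(closing if is_open[ch] else opening)
            is_open[ch] = not is_open[ch]
        else:
            out.append(ch)
    return "".join(out)


print(smarten("\"Uvozovky 'jinak'\"", lang="cs"))
print(smarten("\"Quotes 'differently'\"", lang="en"))
```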

Math

Math blocks get automatically converted to HTML using KaTeX and fall out unchanged into TeX output.

To make KaTeX as consistent with TeX as possible, the \begingroup and \endgroup that are produced by Groups are also emulated in the KaTeX environment, so macro definitions should be isolated as you expect.

Images

Figures

Pandoc's implicit figures are enabled, so images which are alone in a paragraph are automatically converted to figures:

{width=10em}

To prevent this, add a backslash at the end of the line with the image:

{width=10em}\

Image gathering

Images are automatically searched for in the directory where each markdown file is (including partials) and also in directories listed in the --img-lookup-dirs command line parameter. After processing, they're all put into the folder specified with --img-public-dir.

Formátítko also does dependency management, which means that all images will be regenerated only when their dependencies are newer.
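
A common way to implement such a check is to compare modification times, as make does. The sketch below shows that technique. Whether Formátítko compares timestamps exactly like this is an assumption, and the `needs_rebuild` name is hypothetical.

```python
import os


def needs_rebuild(target: str, dependencies: list[str]) -> bool:
    """Return True if `target` is missing or older than any of its dependencies."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # Regenerate as soon as a single dependency is newer than the target.
    return any(os.path.getmtime(dep) > target_mtime for dep in dependencies)
```

A processed image in the public directory would be the target, and the source image (plus anything it was generated from) would be its dependencies.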

Image processing

Images are automatically processed so that they can be successfully used in both output formats. This includes generating multiple sizes and providing a srcset.

To customize this, the file-width, file-height, file-dpi, file-quality and no-srcset attributes are available. All but the last one should be integers.
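
For orientation, this is what a srcset value with width descriptors looks like when built for several generated sizes. The `build_srcset` helper and the `<stem>_<width>.<ext>` file naming are hypothetical; the names Formátítko actually gives to generated files may differ.

```python
def build_srcset(web_path: str, stem: str, ext: str, widths: list[int]) -> str:
    """Build an HTML srcset value; the `<stem>_<width>.<ext>` naming is hypothetical."""
    base = web_path.rstrip("/")
    # Each candidate is "<url> <width>w", and candidates are separated by commas.
    return ", ".join(f"{base}/{stem}_{w}.{ext} {w}w" for w in sorted(widths))


print(build_srcset("/", "diagram", "png", [1200, 300, 600]))
```

Here `web_path` plays the role of the --img-web-path option from the usage section.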

Processing also includes Asymptote images -- you can simply include an asymptote program as an image and formátítko handles the rest for you.

Content headers and footers

If you want formatitko to generate fully formed HTML files for you, you might want to add an HTML partial with the starting tags and <head>. This would normally not work, because the entire document is wrapped in <main>. Using the special .header_content and .footer_content classes of divs, you can append content to a header and footer, which are popped to the output before and after the document.

:::: {.header_content} ::: {partial="test/test-top.html" type="html"} ::: ::::

Working with the produced output

HTML

The HTML should be almost usable as-is. The styles for syntax highlighting are added automatically. The styles for KaTeX, however, are not and should be added in your <head>^[This is taken directly from KaTeX's docs]:

<link rel='stylesheet' href='https://cdn.jsdelivr.net/npm/katex@0.16.4/dist/katex.min.css' integrity='sha384-vKruj+a13U8yHIkAyGgK1J3ArTLzrFGBbBc0tDp4ad/EyewESeXE/Iv67Aj8gKZ0' crossorigin='anonymous'>

You can see how this is done in test/test.md.

TeX

The TeX output is not usable as is. Many of the elements are just converted to macros, which you have to define yourself. There is an example implementation in formatitko.tex, which uses LuaTeX and the ucwmac package, but you should customize it to your needs (and to the context in which the output is used).

More examples

More usage examples can be found (even though a bit chaotically) in the test directory.
