Regular Expressions and Finite Automata
**Regular Expressions (Regex)** and **Finite Automata** are foundational concepts in computer science that underpin pattern matching and text processing. In Wikantik, they are used for parsing Markdown, scanning links, and validating metadata.
Finite Automata
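A Finite Automaton is a mathematical model of computation. It consists of a finite set of states, a start state, a set of accepting states, and transitions between states driven by input symbols. It accepts an input string if reading the string one symbol at a time leaves it in an accepting state.
- **Deterministic (DFA):** For every state and input symbol, there is exactly one transition.
- **Non-deterministic (NFA):** A state can have zero, one, or several transitions for the same input symbol, as well as "epsilon" transitions that consume no input. Every NFA can be converted into an equivalent DFA, so both models recognize the same class of languages.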
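The difference between the two models can be sketched in a few lines of Python. The example below is illustrative only: the language (strings over `a` and `b` that end in `ab`), the state names, and the function names are assumptions chosen for the sketch, not anything taken from Wikantik.

```python
# Hypothetical example: a DFA and an NFA for "strings over {a, b} ending in 'ab'".

# DFA: exactly one next state for every (state, symbol) pair.
DFA_START = "q0"
DFA_ACCEPT = {"q2"}
DFA_DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q1", ("q2", "b"): "q0",
}

def dfa_accepts(text):
    """Run the DFA; the machine is always in exactly one state."""
    state = DFA_START
    for symbol in text:
        if (state, symbol) not in DFA_DELTA:
            return False  # symbol outside the alphabet
        state = DFA_DELTA[(state, symbol)]
    return state in DFA_ACCEPT

# NFA: a (state, symbol) pair maps to a set of states; None marks an epsilon move.
NFA_START = "n0"
NFA_ACCEPT = {"n3"}
NFA_DELTA = {
    ("n0", "a"): {"n0", "n1"},  # nondeterministic choice on "a"
    ("n0", "b"): {"n0"},
    ("n1", "b"): {"n2"},
    ("n2", None): {"n3"},       # epsilon transition, consumes no input
}

def epsilon_closure(states):
    """Return all states reachable from `states` using only epsilon moves."""
    closure = set(states)
    stack = list(states)
    while stack:
        state = stack.pop()
        for nxt in NFA_DELTA.get((state, None), set()):
            if nxt not in closure:
                closure.add(nxt)
                stack.append(nxt)
    return closure

def nfa_accepts(text):
    """Run the NFA by tracking the set of all states it could be in."""
    current = epsilon_closure({NFA_START})
    for symbol in text:
        moved = set()
        for state in current:
            moved |= NFA_DELTA.get((state, symbol), set())
        current = epsilon_closure(moved)
    return bool(current & NFA_ACCEPT)
```

Tracking a set of possible states, as `nfa_accepts` does, is the same idea that the subset construction uses to turn an NFA into a DFA.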
Regular Expressions
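A regular expression is a formal notation for describing a set of strings (a regular language). Every regular expression in this formal sense can be converted into an equivalent Finite Automaton, and vice versa. Some features of practical regex engines, such as backreferences, go beyond regular languages and have no finite-automaton equivalent.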
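To make the equivalence concrete, the sketch below compares a regular expression with a hand-written two-state DFA for the same language: binary strings containing an even number of `1`s. Both the language and the names used are assumptions picked for illustration. The exhaustive comparison over short strings is supporting evidence rather than a proof.

```python
import re
from itertools import product

# Regex for "even number of 1s": any number of blocks containing exactly two 1s,
# followed by trailing zeros.
EVEN_ONES = re.compile(r"(0*10*1)*0*")

def regex_accepts(text):
    """True if the whole string matches the regular expression."""
    return EVEN_ONES.fullmatch(text) is not None

def dfa_accepts_even_ones(text):
    """Two-state DFA: the state records the parity of 1s seen so far."""
    state = "even"
    for symbol in text:
        if symbol == "1":
            state = "odd" if state == "even" else "even"
        elif symbol != "0":
            return False  # symbol outside the alphabet {0, 1}
    return state == "even"

def agree_up_to(max_len):
    """Check that regex and DFA agree on every binary string up to max_len."""
    for length in range(max_len + 1):
        for chars in product("01", repeat=length):
            word = "".join(chars)
            if regex_accepts(word) != dfa_accepts_even_ones(word):
                return False
    return True
```

Calling `agree_up_to(8)` compares the two recognizers on all 511 binary strings of length 0 through 8.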
Common Syntax
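- `.` (Any single character; most engines exclude line terminators by default)
- `*` (Zero or more of the preceding element)
- `+` (One or more of the preceding element)
- `[a-z]` (Character class; here, any one lowercase ASCII letter)
- `^` / `$` (Anchors for the start and end of the input or line)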
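The constructs above can be tried out with Python's standard `re` module. This is a small sketch for experimentation; the helper name `matches` and the sample strings are made up for the example, and other engines (including Java's) may differ in details such as how `.` treats line terminators.

```python
import re

def matches(pattern, text):
    """Return True if the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

# `.` stands for exactly one character (not a newline by default).
dot_ok = matches(r"c.t", "cat")
dot_newline = matches(r"c.t", "c\nt")

# `*` allows zero repetitions of "b"; `+` requires at least one.
star_zero = matches(r"ab*", "a")
plus_zero = matches(r"ab+", "a")

# `[a-z]` matches one lowercase ASCII letter, so `[a-z]+` matches a run of them.
class_ok = matches(r"[a-z]+", "wiki")
class_upper = matches(r"[a-z]+", "Wiki")

# `^` and `$` anchor a search to the start and end of the input.
starts_with_page = re.search(r"^Page", "PageName") is not None
ends_with_page = re.search(r"Page$", "PageName") is not None
```

Here `dot_ok`, `star_zero`, `class_ok`, and `starts_with_page` are true, while `dot_newline`, `plus_zero`, `class_upper`, and `ends_with_page` are false.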
Applications in Wikantik
1. Markdown Parsing
The **Flexmark** parser used by Wikantik relies on regular expressions to identify headings, bold text, links, and code blocks within the Markdown source.
2. Link Scanning
The `MarkdownLinkScanner` (in `wikantik-api`) uses regex to find internal wiki links (e.g., `[PageName](PageName)`) and external URLs.
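- **Example:** `\[\[([^|\]]+)(?:\|([^\]]+))?\]\]` matches double-bracket wiki links such as `[[PageName]]` or `[[PageName|text]]`. Group 1 captures the text before the `|`, and the optional group 2 captures the text after it. Note that this pattern targets the double-bracket form, not the `[PageName](PageName)` form.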
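As a rough illustration of how the example pattern behaves, the sketch below applies it with Python's `re` module. It does not reproduce the actual `MarkdownLinkScanner`, whose code is not shown here; the function name `scan_wiki_links` and the sample text are hypothetical.

```python
import re

# The example pattern from above, unchanged.
WIKI_LINK = re.compile(r"\[\[([^|\]]+)(?:\|([^\]]+))?\]\]")

def scan_wiki_links(text):
    """Return (group 1, group 2) for each double-bracket link found.

    Group 2 is None when the link contains no '|' separator.
    """
    return [(m.group(1), m.group(2)) for m in WIKI_LINK.finditer(text)]

sample = "See [[MarkdownLinks]] and [[FrontmatterConventions|the conventions]]."
links = scan_wiki_links(sample)
```

For the sample text, `links` holds two pairs: `("MarkdownLinks", None)` and `("FrontmatterConventions", "the conventions")`. Which side of the `|` is the target and which is the display text depends on the wiki syntax in use; the pattern itself only separates the two parts.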
3. Frontmatter Validation
Wikantik uses regex to validate the format of mandatory fields like `canonical_id` (ensuring it is a 26-character ULID) and `date`.
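A minimal sketch of such checks is shown below. It rests on assumptions not stated above: that `canonical_id` uses the uppercase Crockford Base32 alphabet of the ULID specification (no `I`, `L`, `O`, or `U`, and a first character of at most `7`), and that `date` is written as `YYYY-MM-DD`. The function names are hypothetical, and Wikantik's actual validation rules may differ.

```python
import re
from datetime import date

# Assumption: uppercase Crockford Base32, 26 characters, first character 0-7
# so the value fits in 128 bits.
ULID_RE = re.compile(r"[0-7][0-9A-HJKMNP-TV-Z]{25}")

# Assumption: dates are written as YYYY-MM-DD. The regex checks only the shape.
DATE_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def is_valid_canonical_id(value):
    """True if `value` has the shape of a ULID under the assumptions above."""
    return ULID_RE.fullmatch(value) is not None

def is_valid_date(value):
    """True if `value` looks like YYYY-MM-DD and names a real calendar date."""
    if DATE_RE.fullmatch(value) is None:
        return False
    try:
        date.fromisoformat(value)  # rejects impossible dates such as 2023-02-29
    except ValueError:
        return False
    return True
```

The date check pairs the regex with `date.fromisoformat` because a pattern alone can confirm the shape of a value but not that the day exists in the calendar.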
4. Search and Retrieval
While the primary search is BM25-based, regex can be used in administrative tools to perform "power searches" across the corpus for specific patterns or legacy JSPWiki constructs.
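The idea of a regex "power search" can be sketched as a scan over an in-memory mapping of page names to page text. Everything here is illustrative: the `power_search` function, the sample pages, and the choice of JSPWiki-style plugin calls such as `[{TableOfContents}]` as the legacy construct are assumptions, not a description of Wikantik's administrative tools.

```python
import re

# Assumed legacy construct: JSPWiki-style plugin calls of the form [{...}].
LEGACY_PLUGIN = r"\[\{[^}]*\}\]"

def power_search(pages, pattern):
    """Return {page name: [matched strings]} for pages with at least one match.

    `pages` is a dict mapping page names to their raw text.
    """
    compiled = re.compile(pattern)
    hits = {}
    for name, text in pages.items():
        found = [m.group(0) for m in compiled.finditer(text)]
        if found:
            hits[name] = found
    return hits

# Hypothetical corpus for demonstration.
pages = {
    "Home": "Intro\n[{TableOfContents}]\nMore text",
    "About": "No legacy markup here.",
}
results = power_search(pages, LEGACY_PLUGIN)
```

With the sample corpus, `results` contains a single entry for `Home` listing the matched `[{TableOfContents}]` call.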
See Also
- [Markdown Links](MarkdownLinks) — The syntax powered by these patterns.
- [Frontmatter Conventions](FrontmatterConventions) — How metadata is validated.
- [Search and Retrieval](WikantikSearchAndRetrieval) — The broader context of finding information.