Wikantik Observability System Design
Overview
This document describes an open-source observability stack for Wikantik that builds on the existing Log4j2 logging infrastructure. The design is organized around the three pillars of observability: **Logs**, **Metrics**, and **Traces**. The current design covers logs and metrics; tracing is listed under Future Enhancements.
Architecture
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                         Observability Architecture                          │
└─────────────────────────────────────────────────────────────────────────────┘

┌──────────────────┐     ┌──────────────────┐     ┌──────────────────┐
│     Wikantik     │     │      Tomcat      │     │   System Host    │
│     (Log4j2)     │     │   Access Logs    │     │  (node_exporter) │
└────────┬─────────┘     └────────┬─────────┘     └────────┬─────────┘
         │                        │                        │
         │ JSON logs              │ Combined format        │ Metrics
         ▼                        ▼                        │
┌─────────────────────────────────────────────┐            │
│                  Promtail                   │            │
│            (Log collector agent)            │            │
└────────────────────┬────────────────────────┘            │
                     │                                     │
                     │ Push logs                           │
                     ▼                                     ▼
┌─────────────────────────────────────────────────────────────────────────────┐
│                                Grafana Stack                                │
│  ┌─────────────────┐      ┌─────────────────┐      ┌─────────────────┐      │
│  │      Loki       │      │   Prometheus    │      │     Grafana     │      │
│  │  (Log storage)  │      │ (Metrics store) │      │ (Visualization) │      │
│  └─────────────────┘      └─────────────────┘      └─────────────────┘      │
└─────────────────────────────────────────────────────────────────────────────┘
```
Components
1. Log Collection Layer
Promtail (Log Shipper)
- **Purpose**: Collects logs from files and ships them to Loki.
- **Mechanism**: Watches log files, extracts labels, and pushes batched log entries to Loki over HTTP via the `/loki/api/v1/push` endpoint.
2. Storage Layer
Loki (Log Aggregation)
- **Purpose**: Horizontally scalable, highly available log aggregation.
- **Advantage**: Unlike Elasticsearch, Loki indexes only labels (not the full log text), which keeps the index small and makes it lightweight and cost-effective to run.
Prometheus (Metrics)
- **Purpose**: Time-series metrics storage and alerting.
- **Standard**: The de facto industry standard for metrics, with a multi-dimensional data model (metric name plus key/value labels) and its own query language, PromQL.
3. Visualization Layer
Grafana (Dashboards)
- **Purpose**: Unified dashboards for logs, metrics, and alerts. Provides a single pane of glass for heterogeneous data sources.
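Grafana can pick up both backends at startup through a data source provisioning file. The fragment below is a minimal sketch, not a tested deployment: the file name and the `prometheus:9090` address are assumptions (the Loki address mirrors the `loki:3100` host used in the Promtail configuration), and the directory shown is Grafana's default provisioning path for package installs.

```yaml
# Assumed location: /etc/grafana/provisioning/datasources/wikantik.yaml
apiVersion: 1

datasources:
  # Log queries (LogQL) go to Loki
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100

  # Metric queries (PromQL) go to Prometheus; host and port are assumptions
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
```

Provisioning the data sources from a file, rather than through the UI, keeps the Grafana setup reproducible alongside the other configuration files in this design.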
4. System Metrics
node_exporter (Host Metrics)
- **Purpose**: Exposes Linux system metrics (CPU, memory, disk, network) in Prometheus-compatible format.
-----
Configuration
Step 1: Log4j2 Configuration for JSON Output
Update Wikantik's Log4j2 configuration to output JSON for machine parsing.
- **File**: `/var/jspwiki/log4j2.properties`
```properties
status = warn
name = jspwiki-log4j2-observability

# JsonTemplateLayout requires the log4j-layout-template-json module on the classpath.
appenders = console, rolling

# Console appender for container environments
appender.console.type = Console
appender.console.name = Console
appender.console.layout.type = JsonTemplateLayout
appender.console.layout.eventTemplateUri = classpath:EcsLayout.json

# Rolling file appender with JSON format
appender.rolling.type = RollingFile
appender.rolling.name = RollingFile
appender.rolling.fileName = /var/log/jspwiki/jspwiki.log
appender.rolling.filePattern = /var/log/jspwiki/jspwiki-%d{yyyy-MM-dd}-%i.log.gz
appender.rolling.layout.type = JsonTemplateLayout
appender.rolling.layout.eventTemplateUri = classpath:EcsLayout.json
# Roll over daily or at 50 MB, whichever comes first
appender.rolling.policies.type = Policies
appender.rolling.policies.time.type = TimeBasedTriggeringPolicy
appender.rolling.policies.time.interval = 1
appender.rolling.policies.size.type = SizeBasedTriggeringPolicy
appender.rolling.policies.size.size = 50MB
appender.rolling.strategy.type = DefaultRolloverStrategy
appender.rolling.strategy.max = 14

rootLogger.level = info
rootLogger.appenderRef.console.ref = Console
rootLogger.appenderRef.rolling.ref = RollingFile

# Specific loggers for key components
logger.wiki.name = com.wikantik
logger.wiki.level = info
logger.auth.name = com.wikantik.auth
logger.auth.level = info
logger.pages.name = com.wikantik.pages
logger.pages.level = info
```
Step 2: Tomcat Access Log Configuration
Configure Tomcat to output access logs in a parseable format, including Cloudflare headers for geographic data.
- **File**: `tomcat/conf/server.xml` (inside `<Host>` element)
```xml
<!-- Double quotes inside the pattern attribute must be escaped as &quot; -->
<Valve className="org.apache.catalina.valves.AccessLogValve"
       directory="/var/log/tomcat"
       prefix="access"
       suffix=".log"
       pattern="%{CF-Connecting-IP}i %l %u %t &quot;%r&quot; %s %b %D %{CF-IPCountry}i %{User-Agent}i %{Referer}i" />
```
Step 3: Promtail Configuration
- **File**: `/etc/promtail/promtail.yaml`
```yaml
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /var/lib/promtail/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: jspwiki
    static_configs:
      - targets:
          - localhost
        labels:
          job: jspwiki
          app: jspwiki
          # Promtail reads the files to tail from the special __path__ label
          __path__: /var/log/jspwiki/*.log
    pipeline_stages:
      # EcsLayout.json emits dotted ECS field names; keys containing "." or "@"
      # must be quoted as JMESPath literals.
      - json:
          expressions:
            level: '"log.level"'
            logger: '"log.logger"'
            message: message
            timestamp: '"@timestamp"'
            thread: '"process.thread.name"'
      - labels:
          level:
          logger:
      - timestamp:
          source: timestamp
          format: RFC3339Nano
```
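The architecture diagram also routes the Tomcat access logs through Promtail, which needs a second scrape job. The entry below is a sketch under stated assumptions: the job name `tomcat-access`, the capture group names, and the choice of `status` and `country` as labels are illustrative, and the regular expression assumes the field order of the access log pattern from Step 2. The glob relies on the valve's `access` prefix and `.log` suffix with Tomcat's default date stamp in between.

```yaml
# Append as a second entry under scrape_configs: in /etc/promtail/promtail.yaml
  - job_name: tomcat-access          # hypothetical job name
    static_configs:
      - targets:
          - localhost
        labels:
          job: tomcat-access
          app: jspwiki
          __path__: /var/log/tomcat/access*.log
    pipeline_stages:
      # Field order follows the Step 2 pattern:
      # client IP, %l, %u, [%t], "%r", %s, %b, %D, CF-IPCountry, then the rest
      - regex:
          expression: '^(?P<client_ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\S+) (?P<duration>\d+) (?P<country>\S+) '
      # Only low-cardinality fields become labels; IPs and paths stay in the log line
      - labels:
          status:
          country:
      # %t uses the Common Log Format timestamp, e.g. 18/Sep/2011:19:18:28 -0400
      - timestamp:
          source: ts
          format: "02/Jan/2006:15:04:05 -0700"
```

Keeping `client_ip` and `path` out of the label set is deliberate: Loki creates one stream per unique label combination, so high-cardinality values are better filtered at query time.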
-----
Alerting Rules
Prometheus Alert Rules
- **File**: `/etc/prometheus/rules/jspwiki-alerts.yml`
```yaml
groups:
  - name: jspwiki
    rules:
      - alert: DiskSpaceLow
        expr: (1 - node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk space is running low"
          description: "Disk usage is {{ $value }}%"

      - alert: HighMemoryUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage"
          description: "Memory usage is {{ $value }}%"
```
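Both rules depend on `node_exporter` metrics, so Prometheus has to scrape the exporter and load the rule file before they can fire. A minimal `prometheus.yml` sketch follows; the file location, the intervals, and the `localhost:9100` target are assumptions (9100 is node_exporter's default port), while the rules directory matches the path given above.

```yaml
# Assumed location: /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s        # assumed; how often targets are scraped
  evaluation_interval: 15s    # assumed; how often alert rules are evaluated

rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  # Host metrics used by DiskSpaceLow and HighMemoryUsage
  - job_name: node
    static_configs:
      - targets:
          - localhost:9100    # assumed node_exporter address
```

The rule file can be validated before a reload with `promtool check rules /etc/prometheus/rules/jspwiki-alerts.yml`.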
-----
Future Enhancements
1. **[Distributed Tracing](DistributedTracing)**: Implement OpenTelemetry and Jaeger for request tracing.
2. **JMX Metrics**: Export JVM internal metrics (GC time, heap usage) using `jmx_exporter`.
3. **Synthetic Monitoring**: Use Prometheus `blackbox_exporter` for proactive uptime monitoring.
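
As an illustration of the synthetic monitoring item, a Prometheus scrape job for `blackbox_exporter` could look roughly like the sketch below. The probed URL `https://wiki.example.org/` and the exporter address `localhost:9115` are placeholders (9115 is the exporter's default port), and `http_2xx` is the HTTP probe module from the exporter's stock configuration.

```yaml
# Add under scrape_configs: in prometheus.yml
  - job_name: blackbox-http
    metrics_path: /probe
    params:
      module: [http_2xx]                  # expect an HTTP 2xx response
    static_configs:
      - targets:
          - https://wiki.example.org/     # placeholder URL for the wiki
    relabel_configs:
      # Pass the listed target to the exporter as the ?target= parameter
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed URL as the instance label
      - source_labels: [__param_target]
        target_label: instance
      # Send the actual scrape to the exporter itself
      - target_label: __address__
        replacement: localhost:9115       # assumed blackbox_exporter address
```

With this job in place, an alert on `probe_success == 0` would cover availability from the outside, complementing the host-level rules above.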
---
**See Also:**
- [High Availability](HighAvailability)
- [Distributed Systems Hub](DistributedSystemsHub)
- [Logging Architecture](LoggingConfig)