Operations

This section covers monitoring L1ES plugin health in production, backup and disaster recovery for plugin indices, and safe rolling upgrades with version-pinned plugins.

How do I monitor L1ES plugin health in production?

Health check endpoint:

curl http://localhost:9200/_l1es

Expected response:

{
  "name": "l1es-plugin",
  "version": "0.3.0",
  "elasticsearch_version": "8.17.0",
  "status": "active"
}
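
A monitoring script can apply the same check programmatically. The following is a minimal Python sketch, assuming the response shape shown above. The function names evaluate_health and fetch_health, the default URL, and the 5-second timeout are illustrative choices, not part of the plugin.

```python
import json
import urllib.request

# Assumed default; adjust host and port to your deployment.
L1ES_HEALTH_URL = "http://localhost:9200/_l1es"


def evaluate_health(payload, expected_version=None):
    """Return a list of problems in a /_l1es response (empty list means healthy)."""
    problems = []
    if payload.get("status") != "active":
        problems.append(f"status is {payload.get('status')!r}, expected 'active'")
    if expected_version is not None and payload.get("version") != expected_version:
        problems.append(
            f"plugin version is {payload.get('version')!r}, expected {expected_version!r}"
        )
    return problems


def fetch_health(url=L1ES_HEALTH_URL, timeout=5.0):
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Calling evaluate_health(fetch_health(), expected_version="0.3.0") returns an empty list when the plugin reports an active status at the expected version, and a list of human-readable problems otherwise.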

Set up an Elasticsearch Watcher alert (in Kibana):

POST _watcher/watch/l1es_plugin_health
{
  "trigger": { "schedule": { "interval": "5m" } },
  "input": {
    "http": {
      "request": {
        "url": "http://localhost:9200/_l1es",
        "method": "GET"
      }
    }
  },
  "condition": {
    "compare": { "ctx.payload.status": { "not_eq": "active" } }
  },
  "actions": {
    "notify_team": {
      "email": {
        "to": "ops@company.com",
        "subject": "L1ES Plugin Health Alert"
      }
    }
  }
}

Monitor template index health:

# Check shard status
GET _cat/shards/l1es_dml?v
# All shards should be in state STARTED (which gives the index green health)

# Check index stats
GET l1es_dml/_stats
# Look for: docs.count (template count), store.size_in_bytes
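
The two checks above can also be scripted. This is a minimal Python sketch, assuming the shard listing is requested as JSON (GET _cat/shards/l1es_dml?format=json) and that the template index keeps its default name l1es_dml. The helper names are hypothetical.

```python
def unhealthy_shards(cat_shards_rows):
    """Return the rows of a _cat/shards?format=json listing that are not STARTED."""
    return [row for row in cat_shards_rows if row.get("state") != "STARTED"]


def summarize_index_stats(stats_response, index_name="l1es_dml"):
    """Extract docs.count and store.size_in_bytes (primaries only) from an _stats response."""
    primaries = stats_response["indices"][index_name]["primaries"]
    return {
        "docs_count": primaries["docs"]["count"],
        "store_size_in_bytes": primaries["store"]["size_in_bytes"],
    }
```

An empty result from unhealthy_shards means every shard copy is assigned and started. Tracking docs_count over time gives the template count mentioned above.
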

Backup and disaster recovery for L1ES plugin

Monthly backup procedure:

L1ES stores two critical indices: l1es_dml (templates) and l1es_dml_indices (field mappings).

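# Backup 1: Templates (monthly)
# Note: this saves only the first page (up to 10,000 hits). For larger indices,
# keep requesting pages with the returned _scroll_id and save each one.
curl -X GET "localhost:9200/l1es_dml/_search?size=10000&scroll=5m" \
  -H "Content-Type: application/json" > "templates_backup_$(date +%Y%m%d).json"

# Backup 2: Field mappings (monthly)
curl -X GET "localhost:9200/l1es_dml_indices/_search?size=1000" \
  -H "Content-Type: application/json" > "mappings_backup_$(date +%Y%m%d).json"

# Store both files in S3 or your backup system
aws s3 cp "templates_backup_$(date +%Y%m%d).json" s3://my-backups/l1es/
aws s3 cp "mappings_backup_$(date +%Y%m%d).json" s3://my-backups/l1es/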
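
The files saved by these commands are raw _search responses, while the _bulk API used for restoration expects NDJSON made of alternating action and source lines. The Python sketch below shows one way to bridge that gap, and also how to follow the scroll cursor when the index holds more documents than fit in one page. It is an illustration under stated assumptions: fetch_next_page is a hypothetical callable you supply that sends POST _search/scroll with the given scroll ID and returns the decoded response, and both function names are invented for this example.

```python
import json


def collect_all_hits(first_page, fetch_next_page):
    """Follow _scroll_id from page to page until an empty page is returned."""
    hits = []
    page = first_page
    while page["hits"]["hits"]:
        hits.extend(page["hits"]["hits"])
        # fetch_next_page is a caller-supplied function (hypothetical helper).
        page = fetch_next_page(page["_scroll_id"])
    return hits


def hits_to_bulk_ndjson(hits, index_name):
    """Turn search hits into a _bulk request body that re-indexes them by ID."""
    lines = []
    for hit in hits:
        # Action line: index the document under its original ID.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": hit["_id"]}}))
        # Source line: the document body as it was stored.
        lines.append(json.dumps(hit["_source"]))
    # The _bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""
```

For a single-page backup, loading the saved file with json.load and passing its hits.hits list to hits_to_bulk_ndjson yields a body that can be written to disk and sent with curl --data-binary.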

Recovery: l1es_dml index deleted

  1. Check what's lost:

    GET l1es_dml/_search
    # Returns a 404 index_not_found_exception if the index is gone
    

  2. Restore from backup:

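    # Re-create and bulk-load templates from backup.
    # The _bulk API expects NDJSON (action line + source line per document),
    # not a raw _search response, so convert the backup file first.
    # The .ndjson file name below is an example name for the converted file.
    curl -X POST "localhost:9200/_bulk" \
      -H "Content-Type: application/x-ndjson" \
      --data-binary @templates_backup_20260227.ndjson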
    

  3. Verify restoration:

    GET l1es_dml/_count
    # Should match pre-deletion count
    

Impact during recovery: Queries will NOT expand until templates are restored (encoded events remain encoded). Recovery time: ~5 minutes for 100K templates.

Recovery: l1es_dml_indices deleted

  1. Re-register field mappings:

    POST _l1es/add-dml-index
    {
      "index_name": "my-logs",
      "source": "message",
      "dest": "decoded_message"
    }
    

  2. Verify:

    GET l1es_dml_indices/_search
    # Should show your index mapping
    

Impact: Expansion stops for that index until re-registered. Recovery is immediate (< 1 second).
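
When several indices are registered, re-registration can be scripted from a list kept under version control. The following Python sketch builds the same request shown in step 1 without sending it. The function name, the base_url default, and the idea of keeping mappings as a list of tuples are assumptions made for illustration.

```python
import json
import urllib.request


def build_add_dml_index_request(index_name, source, dest, base_url="http://localhost:9200"):
    """Build (but do not send) a POST _l1es/add-dml-index request for one mapping."""
    body = json.dumps({"index_name": index_name, "source": source, "dest": dest})
    return urllib.request.Request(
        url=f"{base_url}/_l1es/add-dml-index",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_all_requests(mappings, base_url="http://localhost:9200"):
    """Build one request per (index_name, source, dest) tuple."""
    return [build_add_dml_index_request(i, s, d, base_url) for i, s, d in mappings]
```

Each returned request can be sent with urllib.request.urlopen(request). Keeping construction separate from sending makes it easy to review the payloads before touching the cluster.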

Rolling upgrade: Elasticsearch version bump with L1ES plugin

For a 3-node Elasticsearch cluster upgrade (e.g., 8.16 → 8.17):

# Step 1: Disable shard allocation (prevents rebalancing during upgrade)
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "none"
  }
}

# Step 2: Stop and upgrade node 1
# (On node 1; Elasticsearch 8.x has no REST endpoint that stops a node)
systemctl stop elasticsearch

# Upgrade ES binary
# (e.g., apt upgrade elasticsearch)

# Upgrade L1ES plugin
bin/elasticsearch-plugin remove l1es-plugin
bin/elasticsearch-plugin install file:///path/to/l1es-0.3.0.es.8.17.0.zip

# Start node 1
systemctl start elasticsearch

# Step 3: Wait for node 1 to rejoin the cluster
GET _cat/nodes?v
# Node 1 should be listed. While allocation is disabled, replica shards stay
# unassigned, so the cluster may remain yellow until Step 4.

# Step 4: Re-enable shard allocation
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}

# Step 5: Wait for shard recovery and verify cluster is healthy
GET _cluster/health
# Watch until status=green, relocating_shards=0, active_shards=<all shards>

# Repeat steps 1-5 for nodes 2 and 3
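
The health check that gates moving on to the next node can be automated. This is a minimal Python sketch, assuming the standard _cluster/health response fields. The function names, the 30-minute timeout, and the 10-second poll interval are illustrative, and fetch_health is a caller-supplied function that returns the decoded _cluster/health response.

```python
import time


def safe_to_continue(health):
    """True when a _cluster/health response shows a fully recovered cluster."""
    return (
        health.get("status") == "green"
        and health.get("relocating_shards") == 0
        and health.get("initializing_shards") == 0
        and health.get("unassigned_shards") == 0
    )


def wait_until_green(fetch_health, timeout_s=1800.0, poll_s=10.0, sleep=time.sleep):
    """Poll fetch_health() until the cluster is recovered; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if safe_to_continue(fetch_health()):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)
```

Running wait_until_green after re-enabling allocation, and upgrading the next node only when it returns True, avoids taking a second node down while shards are still recovering.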

CRITICAL: Plugin version must match Elasticsearch version

L1ES Version   Elasticsearch   Status
0.3.0          8.17.0          ✓ Tested
0.3.0          8.16.x          ⚠ Likely works, not officially tested
0.2.x          7.10.0          ❌ Legacy

If the plugin version does not match the Elasticsearch version:
- The node will refuse to start, or the cluster may behave unpredictably
- Always verify that the plugin version matches the ES version before starting a node
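
A pre-start check can encode the compatibility table so that an unsupported pairing is caught before a node is restarted. The Python sketch below transcribes the three rows listed in this section. The function names and the "unknown" fallback are assumptions for illustration, and the table should be extended as further combinations are tested.

```python
# Rows transcribed from the compatibility table in this section.
# An "x" component in a pattern matches any value in that position.
COMPATIBILITY = [
    ("0.3.0", "8.17.0", "tested"),
    ("0.3.0", "8.16.x", "untested"),
    ("0.2.x", "7.10.0", "legacy"),
]


def _matches(pattern, version):
    """Compare dotted versions component by component, honoring 'x' wildcards."""
    pattern_parts = pattern.split(".")
    version_parts = version.split(".")
    if len(pattern_parts) != len(version_parts):
        return False
    return all(p == "x" or p == v for p, v in zip(pattern_parts, version_parts))


def compatibility_status(plugin_version, es_version):
    """Return the table status for a plugin/ES pair, or 'unknown' if not listed."""
    for plugin_pattern, es_pattern, status in COMPATIBILITY:
        if _matches(plugin_pattern, plugin_version) and _matches(es_pattern, es_version):
            return status
    return "unknown"
```

The inputs can come from the version and elasticsearch_version fields of the health endpoint response, or from the plugin archive name before installation.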