replication-manager 3.1 — What Shipped in 2025 (3.1.1 to 3.1.16)

The 3.1 branch launched in May 2025 and shipped sixteen versions through the end of the year. This post covers the major user-visible features: a full application provisioning framework, Restic backup management, schema analysis tools, improved failover reliability, and a hardened authentication layer.


Application Provisioning Framework

3.1.7 introduced a complete application provisioning layer on top of cluster management.

Operators can now:

  • Provision and drop applications directly from the cluster API and UI
  • Define HA topology settings per application, with credit-based resource management
  • Use template repositories: save a running configuration as a template, reset from a template, or clone from a Git repository with volume and subpath support
  • Mount S3 storage and configure storage paths per application
  • Run Docker commands as part of application configuration

3.1.14 added post-backup script support, allowing automated actions to trigger after a successful backup — useful for offloading, notification, or chaining backup pipelines.

Why it matters: Managing MariaDB-backed applications at scale requires more than cluster orchestration. The provisioning framework brings application lifecycle — deploy, configure, backup, tear down — under the same control plane as replication management.


Restic Backup Management

Restic integration was progressively hardened across the 3.1 cycle.

Task Queue with Full Lifecycle Control (3.1.16)

The Restic task queue now supports cancel, pause, and resume operations, exposed via dedicated API endpoints. Operators can inspect the queue state, pause non-urgent operations, and cancel tasks that are no longer needed without restarting the manager.
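
The lifecycle described above can be illustrated with a small in-memory model. This is only a sketch of the pause, resume, and cancel semantics, not replication-manager's implementation: the class and method names (TaskQueue, submit, run_next, snapshot) are hypothetical, and it assumes that only queued tasks can be cancelled.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class TaskState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    DONE = "done"
    CANCELLED = "cancelled"


@dataclass
class BackupTask:
    task_id: int
    name: str
    state: TaskState = TaskState.QUEUED


class TaskQueue:
    """Hypothetical in-memory queue illustrating cancel/pause/resume."""

    def __init__(self):
        self._tasks = {}
        self._pending = deque()
        self._next_id = 1
        self.paused = False

    def submit(self, name):
        task = BackupTask(self._next_id, name)
        self._next_id += 1
        self._tasks[task.task_id] = task
        self._pending.append(task.task_id)
        return task.task_id

    def pause(self):
        # While paused, run_next() refuses to start new work.
        self.paused = True

    def resume(self):
        self.paused = False

    def cancel(self, task_id):
        # Assumption: only tasks that have not started can be cancelled.
        task = self._tasks[task_id]
        if task.state is not TaskState.QUEUED:
            return False
        task.state = TaskState.CANCELLED
        return True

    def run_next(self):
        """Run the next queued task and return its id, or None if idle or paused."""
        if self.paused:
            return None
        while self._pending:
            task = self._tasks[self._pending.popleft()]
            if task.state is TaskState.CANCELLED:
                continue  # skip tasks cancelled while they were waiting
            task.state = TaskState.RUNNING
            # A real worker would invoke restic here.
            task.state = TaskState.DONE
            return task.task_id
        return None

    def snapshot(self):
        """Return the state of every task, for inspection."""
        return {tid: t.state.value for tid, t in self._tasks.items()}
```

In this model, pausing leaves queued tasks untouched, so an operator can cancel a stale task and then resume without losing the rest of the queue.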

Password Management (3.1.11)

Restic repository passwords can now be managed directly from the API, including backup and restore of the Restic configuration file itself. This closes the gap where a lost password rendered a repository unrecoverable.

Active-Passive Topology Support (3.1.16)

Restic backup operations are now topology-aware: they correctly handle active-passive setups and route backup tasks to the appropriate server without operator intervention.

Why it matters: Backup pipelines that cannot be monitored or controlled under load are a liability. Queue management, password handling, and topology awareness turn Restic from a configured tool into a managed service.


Schema Analysis and Persistent Analyze

3.1.3 introduced persistent analysis support: operators can trigger ANALYZE TABLE with persistence across restarts, configurable per cluster. The feature includes a dedicated cluster-analyze ACL grant and Swagger documentation.

A DataTable UI component was added with grouping and expanding features for browsing schema analysis results.

3.1.12 added configurable log optimization levels with UI support, and API endpoints to check whether schema analysis tasks need to be performed on a given server — avoiding redundant analysis runs in large clusters.

Why it matters: Table statistics drift silently in busy clusters. Persistent, orchestrated analysis keeps optimizer statistics current, and with them query plans, without manual DBA intervention.


Failover Reliability Improvements

Several releases focused on reducing false positives and improving failover correctness.

Heartbeat and False Positive Reduction (3.1.4)

Heartbeat checks for replicas were refactored to run in goroutines with configurable context timeouts. The context timeout for the heartbeat check was also increased, which reduces false-positive failover triggers in environments with transient network latency.
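
To show why the timeout value matters, here is a rough Python analogue of concurrent heartbeat checks under a shared deadline. It uses a thread pool rather than goroutines and contexts, and the function name check_replicas and the probe callables are assumptions made for the example, not part of replication-manager.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def check_replicas(probes, timeout_s):
    """Run every probe concurrently; a probe that misses the deadline counts as down.

    probes maps a replica name to a zero-argument callable returning True when
    the replica answered. Both the mapping and the callables are hypothetical.
    """
    results = {}
    pool = ThreadPoolExecutor(max_workers=max(1, len(probes)))
    try:
        futures = {name: pool.submit(probe) for name, probe in probes.items()}
        deadline = time.monotonic() + timeout_s
        for name, future in futures.items():
            # All probes share one deadline, like a single context timeout.
            remaining = max(0.0, deadline - time.monotonic())
            try:
                results[name] = bool(future.result(timeout=remaining))
            except FutureTimeout:
                results[name] = False  # too slow: reported as a missed heartbeat
            except Exception:
                results[name] = False  # probe failed outright
    finally:
        # Python threads cannot be cancelled; stop waiting and let them finish.
        pool.shutdown(wait=False)
    return results
```

With a probe that takes 0.4 seconds to answer, a 0.1 second timeout reports the replica as down while a 2 second timeout reports it as alive. That is the kind of false positive a longer timeout avoids, at the cost of slower detection of real failures.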

MySQL 8.4 Compatibility (3.1.15)

A mysqlbinlog option incompatibility that surfaced after rejoin operations on MySQL 8.4 was fixed. SSL mode checks were updated to reflect MySQL 8.0 deprecations.

Process Monitoring Regression (3.1.3)

A process monitoring regression affecting MySQL servers below version 8 was corrected. Maintenance mode now correctly triggers bash script execution on state change.

Why it matters: False-positive failovers are expensive — they cause unnecessary downtime, split traffic, and require manual recovery. Each of these fixes reduces the probability of an unwarranted topology change.


Error and Slow Query Log Management

3.1.10 added cookie-based management for error log and slow query log fetch operations, with configurable fetch levels exposed in both the configuration file and the API. Log levels are independently tunable for each log type.

3.1.12 added API endpoints to check whether a cluster can fetch logs from a given server, preventing fetch attempts against servers that are unavailable or unconfigured.

3.1.16 extended this with audit log fetch level configuration and tailer support, bringing audit logs into the same management model as error and slow query logs.

Why it matters: In regulated environments, audit logs are as important as replication health. Unified, level-configurable log fetching gives operations teams a consistent interface regardless of log type.


OpenSVC Integration Improvements

3.1.13 added OpenSVC workload component integration with stats retrieval exposed in the Home, Top, and cluster views. CPU load visualization and threshold checks were added to the OpenSVC node card.

Access to OpenSVC stats is controlled by ACL, with conditional authentication based on the UseAPI flag.

Why it matters: For teams running replication-manager on OpenSVC infrastructure, unified visibility into node-level resource consumption alongside database health reduces the number of tools needed during an incident.


Authentication and Security

3.1.6 fixed masking of script credentials in log output — script passwords were previously visible in plaintext under certain logging configurations.
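
The general technique behind this kind of fix is to redact secrets before a message reaches any log handler. The sketch below shows one way to do that with Python's standard logging module. The key names in the pattern and the MaskingFilter class are assumptions chosen for illustration and do not reflect how replication-manager masks credentials internally.

```python
import logging
import re

# Assumed secret keys; matches "key=value" and "key: value" pairs.
_SECRET_RE = re.compile(r"(?i)(password|passwd|pwd|secret|token)(\s*[=:]\s*)(\S+)")


def mask_secrets(text):
    """Replace the value of any recognised secret key with asterisks."""
    return _SECRET_RE.sub(lambda m: m.group(1) + m.group(2) + "********", text)


class MaskingFilter(logging.Filter):
    """Logging filter that redacts secrets in the fully formatted message."""

    def filter(self, record):
        # Format first so secrets passed as %-style arguments are also caught.
        record.msg = mask_secrets(record.getMessage())
        record.args = None
        return True
```

Attaching the filter to a handler with handler.addFilter(MaskingFilter()) applies the redaction to every record that handler emits, whatever the log level.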

3.1.7 refactored the authentication flow to use Redux state management with JWT-based cluster subscriptions, removing reliance on localStorage for sensitive session state.

3.1.8 added sponsor user creation during bootstrap and improved Slack alert logging levels.

3.1.11 added environment variable support for username and password in the CLI, enabling credential injection from secrets managers without config file exposure.
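
A wrapper script can follow the same pattern when it prepares credentials for the CLI. The sketch below is illustrative only: the variable names REPLICATION_MANAGER_USER and REPLICATION_MANAGER_PASSWORD are hypothetical placeholders (check the CLI documentation for the names it actually reads), and the precedence of explicit flags over the environment is an assumption.

```python
import os

# Hypothetical variable names, used for illustration only.
ENV_USER = "REPLICATION_MANAGER_USER"
ENV_PASSWORD = "REPLICATION_MANAGER_PASSWORD"


def resolve_credentials(flag_user=None, flag_password=None, env=None):
    """Return (user, password), preferring explicit flags over the environment."""
    env = os.environ if env is None else env
    user = flag_user or env.get(ENV_USER)
    password = flag_password or env.get(ENV_PASSWORD)
    if not user or not password:
        # Fail early instead of falling back to a config file on disk.
        raise ValueError("missing credentials: set flags or environment variables")
    return user, password
```

A secrets manager or container runtime can then inject both variables at start-up, so the password never has to be written to a configuration file.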

3.1.14 added immutability checks for ProxySQL and MDBS Proxy password rotation, preventing accidental rotation of credentials that must remain stable.
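
As a rough illustration of what an immutability check does during rotation, the sketch below skips any account flagged as immutable. The Account type, the immutable flag, and rotate_passwords are hypothetical names for this example and are not taken from replication-manager.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    password: str
    immutable: bool = False  # hypothetical flag: this credential must not be rotated


def rotate_passwords(accounts):
    """Rotate every mutable account in place; return (rotated, skipped) name lists."""
    rotated, skipped = [], []
    for account in accounts:
        if account.immutable:
            skipped.append(account.name)  # leave the credential untouched
            continue
        account.password = secrets.token_urlsafe(24)
        rotated.append(account.name)
    return rotated, skipped
```

Returning the skipped names makes it possible to report which credentials were left alone instead of ignoring them silently.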

Why it matters: Security gaps in orchestrators are high-impact — they sit on top of production databases with administrative credentials. Each of these changes reduces the attack surface and closes a concrete credential exposure path.


App Service Management API

3.1.10 added API endpoints for application service lifecycle management: start, stop, and restart actions for services within a provisioned application. This enables integration with external automation tools without requiring direct access to the underlying infrastructure.


Upgrade Notes

3.1.4: If you rely on heartbeat-based failover in high-latency environments, review the new context timeout configuration. The defaults are conservative; adjust them per cluster if needed.

3.1.7: The authentication refactor moves session state from localStorage to Redux. Custom tooling that reads tokens from localStorage will need to be adapted.

3.1.11: CLI credential injection via environment variables is now supported. Review your deployment scripts to take advantage of this in containerized environments.

3.1.14: Post-backup scripts are opt-in. Configure post-backup-script per cluster if you want automated post-backup actions.


replication-manager 3.1.16 is available on GitHub. Packages for RPM and DEB distributions are available through the standard release pipeline.