The 3.1 branch has matured significantly over six releases. This post summarizes the major user-visible features shipped between 3.1.17 and 3.1.22, covering schema monitoring, a production-ready Restic backup pipeline, table checksums, and operational tooling for DBAs and SRE teams.
Starting in 3.1.17, replication-manager monitors schema differences between master and replicas — tables, columns, and indexes. The feature integrates with the shardproxy layer and surfaces divergences before they become replication failures or application inconsistencies.
3.1.21 significantly hardened this feature.
Why it matters: Schema drift between master and replicas is a common and silent source of replication failures. Detecting it proactively — before a query fails — gives operators time to act.
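To make the failure mode concrete, here is a hand-rolled version of the kind of column-level comparison the monitor automates. This is generic information_schema SQL, not replication-manager's internal query; host names, the schema (`app`), and the table (`orders`) are placeholders:

```shell
# Manually spot column-level drift for one table across two hosts.
# Host, schema, and table names are placeholders.
for h in master.db.local replica1.db.local; do
  mysql -h "$h" -N -e \
    "SELECT COLUMN_NAME, COLUMN_TYPE FROM information_schema.COLUMNS
     WHERE TABLE_SCHEMA = 'app' AND TABLE_NAME = 'orders'
     ORDER BY ORDINAL_POSITION;" > "/tmp/schema.$h"
done
# Any output here is drift the orchestrator would flag.
diff /tmp/schema.master.db.local /tmp/schema.replica1.db.local
```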
3.1.18 introduced table checksum scheduling with schema cache requirements and bounded polling. 3.1.21 and 3.1.22 extended the feature substantially, adding composite-key correctness and an automated repair workflow.
Why it matters: Checksumming at scale requires correctness with composite keys and efficiency under load. The repair workflow closes the loop — detect divergence, then fix it, without manual intervention.
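For background, the classic row-set checksum technique (shown here as a generic illustration, not the exact query replication-manager issues) XORs per-row CRCs; CONCAT_WS with an explicit separator is what keeps composite key columns unambiguous:

```shell
# Generic illustration of row-set checksumming on one replica.
# Schema, table, and column names are placeholders.
mysql -h replica1.db.local -e \
  "SELECT COUNT(*) AS row_cnt,
          BIT_XOR(CRC32(CONCAT_WS('#', order_id, line_no, qty))) AS crc
   FROM app.order_lines;"
# Equal (row_cnt, crc) pairs on master and replica mean the table matches.
```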
The Restic integration received the most development activity across this release cycle. The pipeline is now production-ready for both local and cloud-backed deployments.
Additional environment variable overrides allow fine-grained control of S3 and custom backend configuration without modifying config files — useful in containerized deployments where secrets are injected at runtime.
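As an example of runtime injection, Restic's own documented environment variables cover the repository location and S3 credentials. These are Restic's standard names, not replication-manager-specific override names, and every value is a placeholder:

```shell
# Restic's standard environment variables for an S3-backed repository.
# All values are placeholders; inject real ones as container secrets.
export RESTIC_REPOSITORY="s3:https://s3.example.com/backups/cluster1"
export RESTIC_PASSWORD_FILE="/run/secrets/restic-password"
export AWS_ACCESS_KEY_ID="example-key-id"
export AWS_SECRET_ACCESS_KEY="example-secret"
```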
The API now accepts ?format=legacy for backward-compatible tooling. Restic unmount works across platforms, and mount directories are admin-selectable under strict path checks, eliminating the permission conflicts that affected earlier versions.
Backup sizes are now displayed in human-readable format in both the API responses and the GUI table views.
Why it matters: A backup system that cannot be monitored or integrated is a liability. Progress visibility, S3 flexibility, and human-readable output reduce the operational gap between "backups configured" and "backups trusted."
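For scripts that still consume raw byte counts from the API, similar human-readable output is one GNU numfmt call away:

```shell
# Convert a raw byte count to human-readable IEC units,
# similar to what the API and GUI now display.
echo 5368709120 | numfmt --to=iec
```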
3.1.17 introduced the splitdump CLI tool. 3.1.18 integrated it fully into the cluster backup workflows.
Key capabilities include sharded dump output, parallel restore, and selective per-table recovery.
Why it matters: Large logical backups are difficult to parallelize and recover from selectively. Splitdump addresses both problems — sharded dumps enable parallel restore and selective table recovery without third-party tools.
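Splitdump's own flags are not reproduced here, but the underlying idea can be demonstrated with standard tools: mysqldump marks each table section with a comment line, so cutting the dump at those boundaries yields independently restorable shards:

```shell
# Not splitdump's real syntax; the same sharding idea with coreutils.
# Build a tiny two-table dump, then split it at table boundaries.
cat > full_dump.sql <<'EOF'
-- Table structure for table `users`
CREATE TABLE users (id INT);
-- Table structure for table `orders`
CREATE TABLE orders (id INT);
EOF
csplit -s -z -f shard_ full_dump.sql '/^-- Table structure/' '{*}'
ls shard_*   # one shard per table, restorable selectively
```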
A three-tier preserved variables system allows controlled manual overrides of database variables with UI indicators. The HasConfigDiff field and the Variables tab let operators see at a glance which variables deviate from expected configuration.
Why it matters: In production clusters, variables drift over time — applied hotfixes, emergency changes, vendor recommendations. Tracking that drift inside the orchestrator gives DBAs a single place to audit and correct it.
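A sketch of how that audit could be scripted against the HTTP API. The endpoint path and JSON shape are assumptions made for illustration; only the HasConfigDiff field name comes from the release notes:

```shell
# Hypothetical endpoint and response shape; HasConfigDiff is the
# documented field name, everything else here is illustrative.
curl -s "https://repman.example.com/api/clusters/cluster1/settings" \
  | jq '[.variables[] | select(.HasConfigDiff) | .name]'
```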
The replication master retry count is now configurable from both the configuration file and the UI. This controls how many times a replica attempts to reconnect to the master before declaring it lost.
Why it matters: In unstable network environments, the default retry count leads to premature failovers. Tuning it per-cluster avoids unnecessary topology changes.
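Assuming the orchestrator knob corresponds to the usual replica-side settings, these are the underlying MySQL parameters, shown for context only; the replication-manager option itself is set in its config file or UI:

```shell
# Replica-side knobs this tuning maps onto (MySQL syntax;
# the values are examples only, not recommendations).
mysql -h replica1.db.local -e \
  "CHANGE MASTER TO
     MASTER_CONNECT_RETRY = 10,
     MASTER_RETRY_COUNT = 86400;"
```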
All Docker image variants now ship rootless counterparts running as the repman non-root user with fixed UID/GID 10001:10001.
To use rootless images with volume mounts:
sudo chown -R 10001:10001 /path/to/data /path/to/config
Why it matters: Container security requirements in regulated or multi-tenant environments mandate non-root execution. Fixed UID/GID ensures consistent permissions across hosts and rebuilds.
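Putting the steps together, a sketch of a rootless deployment; the image tag and in-container paths are illustrative, not taken from the release notes:

```shell
# Give the fixed rootless UID/GID ownership of the mounts, then run.
# Image tag and container paths are illustrative placeholders.
sudo chown -R 10001:10001 /srv/repman/data /srv/repman/etc
docker run -d --name repman \
  -v /srv/repman/etc:/etc/replication-manager \
  -v /srv/repman/data:/var/lib/replication-manager \
  signal18/replication-manager:3.1-rootless
```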
The dbhelper package was rewritten with parameterized queries and a vendor abstraction layer, and TLS setup now registers client configurations through mysql.RegisterTLSConfig.
Docker rootless: ensure volume ownership before switching to rootless images.
Restic S3: the new endpoint and prefix fields are optional. Existing configurations continue to work without changes.
Splitdump: shard size defaults are applied per-cluster. Review the new UI control to align with your storage and restore SLA requirements.
Schema monitoring: scan timeout is configurable. Set it based on your largest table sizes to avoid monitoring loop delays.
replication-manager 3.1.22 is available on GitHub. Packages for RPM and DEB distributions are available through the standard release pipeline.