
% SPDX-License-Identifier: Apache-2.0
(database-page)=

# Database

## Overview

The EOS database system provides a flexible, pluggable persistence layer for time-series data
records with automatic lazy loading, dirty tracking, and multi-backend support. The architecture
separates the abstract database interface from concrete storage implementations, allowing seamless
switching between LMDB and SQLite backends.

## Architecture

### Three-Layer Design

**Abstract Interface Layer** (`DatabaseABC`)

- Defines the contract for all database operations
- Provides compression/decompression utilities
- Backend-agnostic API

**Backend Implementation Layer** (`DatabaseBackendABC`)

- Concrete implementations: `LMDBDatabase`, `SQLiteDatabase`
- Singleton pattern ensures single instance per backend
- Thread-safe operations via internal locking

**Record Protocol Layer** (`DatabaseRecordProtocolMixin`)

- Manages in-memory record lifecycle
- Implements lazy loading strategies
- Handles dirty tracking and autosave

## Configuration

### Database Settings (`DatabaseCommonSettings`)

```python
provider: Optional[str] = None        # "LMDB" or "SQLite"
compression_level: int = 9            # 0-9, gzip compression
initial_load_window_h: Optional[int] = None  # Hours, None = full load
keep_duration_h: Optional[int] = None        # Retention period
autosave_interval_sec: Optional[int] = None  # Auto-flush interval
compaction_interval_sec: Optional[int] = 604800  # Compaction interval
batch_size: int = 100                 # Batch operation size
```

### User Configuration Guide

This section explains what each setting does in practical terms and gives
concrete recommendations for common deployment scenarios.

#### `provider` — choosing a backend

Set `provider` to `"LMDB"` or `"SQLite"`. Leave it `None` only during
development or unit testing — with `None` set, nothing is persisted to disk and
all data is lost on restart.

**Use LMDB** for a long-running home server that records data continuously. It
is significantly faster for high-frequency writes and range reads because it
uses memory-mapped files. The trade-off is that it pre-allocates a large file
on disk (default 10 GB) even when mostly empty.

**Use SQLite** when disk space is constrained, for portable single-file
deployments, or when you want to inspect or manipulate the database with
standard SQL tools. SQLite is slightly slower for bulk writes but perfectly
adequate for home energy data volumes.

**Do not** switch backends while data exists in the old backend — records are
not migrated automatically. If you need to switch, vacuum the old database
first, export your data, then reconfigure.

#### `compression_level` — storage size vs. CPU

Values range from `0` (no compression) to `9` (maximum compression). The default of `9` is
appropriate for most deployments: home energy time-series data compresses very well (often
60–80 % reduction) and the CPU overhead is negligible on modern hardware.

**Set to `0`** only if you are running on very constrained hardware (e.g. a single-core ARM
board at full load) and storage space is not a concern.

**Changing this setting** after data has been written is safe: each record is stored with the
compression level active at write time, and the format is auto-detected on read, so mixed levels
can coexist. Existing records keep their original size, however; space is reclaimed only when
they are rewritten by compaction.

#### `initial_load_window_h` — startup memory usage

Controls how much history is loaded into memory when the application first accesses a namespace.

**Set a window** (e.g. `48`) on systems with limited RAM or large databases. Only the most
recent 48 hours are loaded immediately; older data is fetched on demand if a query reaches
outside that window.

**Leave as `None`** (the default) on well-resourced systems or when you need guaranteed
access to all history from the first query. Full load is simpler and avoids the small latency
spike of incremental loads.

**Do not** set this to a very small value (e.g. `1`) if your forecasting or reporting queries
routinely look back further — every out-of-window query triggers a database read, and many
small reads are slower than one full load.

#### `keep_duration_h` — data retention

Sets the age limit (in hours) for the vacuum operation. Records older than
`max_timestamp - keep_duration_h` are permanently deleted when vacuum runs.

**Set this** to match your actual analysis needs. If your forecast models only look back 7 days,
keeping 14 days (`336`) gives a comfortable safety margin without accumulating indefinitely.

**Leave as `None`** only if you have a strong archival requirement and understand that the
database will grow without bound. Even with compaction reducing resolution, old data is not
deleted unless vacuum runs with a retention limit.

**Do not** set `keep_duration_h` shorter than the oldest data your forecast or reporting
queries ever request — vacuum is permanent and irreversible.

#### `autosave_interval_sec` — write durability

Controls how often dirty (modified) records are flushed to disk automatically, in seconds.

**Set to a low value** (e.g. `10`–`30`) on a system that could lose power unexpectedly,
such as a Raspberry Pi without a UPS. A power cut between autosaves loses that window of data.

**Set to a higher value** (e.g. `300`) on stable systems to reduce write amplification. Each
autosave is a full flush of all dirty records, so frequent saves on large dirty sets are
more expensive.

**Leave as `None`** only if you call `db_save_records()` manually at appropriate points in
your application code. With `None`, data written since the last manual save is lost on crash.

#### `compaction_interval_sec` — automatic tiered downsampling

Controls how often the compaction maintenance job runs, in seconds. The default is
`604800` (one week). Set to `None` to disable automatic compaction entirely.

Compaction applies a tiered downsampling policy to old records:

- Records older than **2 hours** are downsampled to **15-minute** resolution
- Records older than **14 days** are downsampled to **1-hour** resolution

This reduces storage and speeds up range queries on historical data while preserving full
resolution for recent data where it matters most. Each tier is processed incrementally —
only the window since the last compaction run is examined, so weekly runs are fast regardless
of total history length.

**Leave at the default weekly interval** for most deployments. Compaction is idempotent and
cheap when run frequently on small new windows.

**Set to a shorter interval** (e.g. `86400`, daily) if your device records at very high
frequency (sub-minute) and disk space is a concern.

**Set to `None`** only if you have a custom retention policy and manage downsampling manually,
or if you store data that must not be averaged (e.g. raw event logs where mean resampling
would be meaningless).

**Do not** set the interval shorter than `autosave_interval_sec` — compaction reads from the
backend and a record that has not been saved yet will not be visible to it.

**Interaction with vacuum:** compaction and vacuum are complementary. Compaction reduces
resolution of old data; vacuum deletes it entirely past `keep_duration_h`. The recommended
pipeline is: compaction runs first (weekly), then vacuum runs immediately after. This means
vacuum always operates on already-downsampled data, which is faster and produces cleaner
storage boundaries.

### Recommended Configurations by Scenario

#### Home server, typical (Raspberry Pi 4, SSD)

```python
provider = "LMDB"
compression_level = 9
initial_load_window_h = 48
keep_duration_h = 720          # 30 days
autosave_interval_sec = 30
compaction_interval_sec = 604800  # weekly
```

#### Home server, low storage (Raspberry Pi Zero, SD card)

```python
provider = "SQLite"
compression_level = 9
initial_load_window_h = 24
keep_duration_h = 168          # 7 days
autosave_interval_sec = 60
compaction_interval_sec = 86400   # daily — reclaim space faster
```

#### Development / testing

```python
provider = "SQLite"            # or None for fully in-memory
compression_level = 0          # faster without compression overhead
initial_load_window_h = None   # always load everything
keep_duration_h = None         # never vacuum automatically
autosave_interval_sec = None   # manual saves only
compaction_interval_sec = None # disable compaction
```

#### High-frequency recording (sub-minute intervals)

```python
provider = "LMDB"
compression_level = 9
initial_load_window_h = 24
keep_duration_h = 336          # 14 days
autosave_interval_sec = 10
compaction_interval_sec = 86400   # daily — essential at high frequency
```

## Storage Backends

### LMDB Backend

**Characteristics:**

- Memory-mapped file database
- Native namespace support via DBIs (Database Instances)
- High-performance reads with MVCC
- Configurable map size (default: 10 GB)

**Configuration:**

```python
map_size: int = 10 * 1024 * 1024 * 1024  # 10 GB
writemap=True, map_async=True             # Performance optimizations
max_dbs=128                                # Maximum namespaces
```

**File Structure:**

```text
data_folder_path/
└── db/
    └── lmdbdatabase/
        ├── data.mdb
        └── lock.mdb
```

### SQLite Backend

**Characteristics:**

- Single-file relational database
- Namespace emulation via `namespace` column
- ACID transactions with autocommit mode
- Cross-platform compatibility

**Schema:**

```sql
CREATE TABLE records (
    namespace TEXT NOT NULL DEFAULT '',
    key BLOB NOT NULL,
    value BLOB NOT NULL,
    PRIMARY KEY (namespace, key)
);

CREATE TABLE metadata (
    namespace TEXT PRIMARY KEY,
    value BLOB
);
```
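
The namespace emulation can be tried out with the standard-library `sqlite3` module. The following is a minimal sketch, not the EOS implementation: it reuses the `records` table from the schema above, while the key encoding (UTC timestamp strings stored as bytes) and the helper `range_query` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    """
    CREATE TABLE records (
        namespace TEXT NOT NULL DEFAULT '',
        key BLOB NOT NULL,
        value BLOB NOT NULL,
        PRIMARY KEY (namespace, key)
    )
    """
)

# Assumed key encoding: sortable UTC timestamp strings stored as bytes.
rows = [
    ("measurement", b"20241027T120000Z", b"a"),
    ("measurement", b"20241027T130000Z", b"b"),
    ("prediction", b"20241027T120000Z", b"c"),
]
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)


def range_query(namespace, start_key, end_key):
    """Return (key, value) pairs with start_key <= key < end_key in one namespace."""
    cur = conn.execute(
        "SELECT key, value FROM records "
        "WHERE namespace = ? AND key >= ? AND key < ? ORDER BY key",
        (namespace, start_key, end_key),
    )
    return cur.fetchall()


result = range_query("measurement", b"20241027T000000Z", b"20241027T123000Z")
```

Because the primary key is `(namespace, key)`, a range query of this shape can be served from the primary-key index, and rows from other namespaces never appear in the result.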

**File Structure:**

```text
data_folder_path/
└── db/
    └── sqlitedatabase/
        └── data.db
```

## Timestamp System

### DatabaseTimestamp

All records are indexed by UTC timestamps in sortable ISO 8601 format:

```python
DatabaseTimestamp.from_datetime(dt)  # dt: DateTime -> "20241027T123456[Z]"
```

**Properties:**

- Always stored in UTC (timezone-aware required)
- Lexicographically sortable
- Bijective conversion to/from `pendulum.DateTime`
- Second-level precision
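
These properties can be illustrated with the standard library alone. The sketch below is not `DatabaseTimestamp` itself: the helpers `to_key` and `from_key` and the exact format string are assumptions modelled on the example value shown above.

```python
from datetime import datetime, timedelta, timezone

# Assumed layout, modelled on the example "20241027T123456[Z]".
FORMAT = "%Y%m%dT%H%M%SZ"


def to_key(dt: datetime) -> str:
    """Convert a timezone-aware datetime to a sortable UTC string."""
    if dt.tzinfo is None or dt.utcoffset() is None:
        raise ValueError("timezone-aware datetime required")
    return dt.astimezone(timezone.utc).strftime(FORMAT)


def from_key(key: str) -> datetime:
    """Inverse of to_key (second-level precision)."""
    return datetime.strptime(key, FORMAT).replace(tzinfo=timezone.utc)


plus_two = timezone(timedelta(hours=2))
a = datetime(2024, 10, 27, 14, 34, 56, tzinfo=plus_two)    # 12:34:56 UTC
b = datetime(2024, 10, 27, 12, 35, 0, tzinfo=timezone.utc)

key_a, key_b = to_key(a), to_key(b)
```

Because every field is fixed-width and ordered from most to least significant, comparing the strings gives the same result as comparing the instants, which is what allows a key-value backend to serve time-range queries with plain key comparisons.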

### Unbounded Sentinels

```python
UNBOUND_START  # Smaller than any timestamp
UNBOUND_END    # Greater than any timestamp
```

Used for open-ended range queries without special-casing `None`.
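
One way to build such sentinels is a small class whose comparison operators always answer the same way. This is a sketch under assumptions: the class `_Unbound` and the helper `in_range` are hypothetical and only illustrate the idea, they are not the EOS definitions.

```python
class _Unbound:
    """Sentinel that sorts before (start) or after (end) every other value."""

    def __init__(self, name, is_end):
        self._name = name
        self._is_end = is_end

    def __lt__(self, other):
        return not self._is_end and other is not self

    def __gt__(self, other):
        return self._is_end and other is not self

    def __le__(self, other):
        return not self.__gt__(other)

    def __ge__(self, other):
        return not self.__lt__(other)

    def __repr__(self):
        return self._name


UNBOUND_START = _Unbound("UNBOUND_START", is_end=False)
UNBOUND_END = _Unbound("UNBOUND_END", is_end=True)


def in_range(ts, start=UNBOUND_START, end=UNBOUND_END):
    """Half-open range test that needs no special case for open ends."""
    return start <= ts < end


ts = "20241027T120000Z"
```

With these defaults, `in_range(ts)` accepts any timestamp, and passing only `start` or only `end` gives a one-sided query using the same comparison code path.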

## Lazy Loading Strategy

### Three-Phase Loading

The system uses a progressive loading model to minimize memory footprint:

#### **Phase 0: NONE**

- No records loaded
- First query triggers either:
  - Initial window load (if `initial_load_window_h` configured)
  - Full database load (if `initial_load_window_h = None`)
  - Targeted range load (if explicit range requested)

#### **Phase 1: INITIAL**

- Partial time window loaded
- `_db_loaded_range` tracks coverage: `[start_timestamp, end_timestamp)`
- Out-of-window queries trigger incremental expansion:
  - Left expansion: load records before current window
  - Right expansion: load records after current window
- Unbounded queries escalate to FULL

#### **Phase 2: FULL**

- All database records in memory
- No further database access needed
- `_db_loaded_range` spans entire dataset
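
The incremental expansion in the INITIAL phase can be sketched as a small planning function. This is an illustration under assumptions, not the EOS code: `plan_expansion` is a hypothetical helper, ranges are half-open tuples, and plain integers stand in for timestamps.

```python
def plan_expansion(loaded, query):
    """Return (ranges_to_load, new_loaded_range) for half-open ranges."""
    loaded_start, loaded_end = loaded
    query_start, query_end = query
    to_load = []
    if query_start < loaded_start:   # left expansion
        to_load.append((query_start, loaded_start))
    if query_end > loaded_end:       # right expansion
        to_load.append((loaded_end, query_end))
    new_loaded = (min(loaded_start, query_start), max(loaded_end, query_end))
    return to_load, new_loaded


inside = plan_expansion((10, 20), (12, 18))
left = plan_expansion((10, 20), (5, 15))
both = plan_expansion((10, 20), (0, 30))
```

In this sketch each expansion reaches up to the edge of the already loaded window, so the loaded coverage always stays one contiguous range and can be tracked with a single start/end pair.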

### Boundary Extension

When loading a range `[start, end)`, the system automatically extends boundaries to include:

- **First record before** `start` (for interpolation/context)
- **First record at or after** `end` (for closing boundary)

This prevents additional database lookups during nearest-neighbor searches.
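
The boundary extension can be sketched with `bisect` over a sorted key list. The function `extend_bounds` is a hypothetical helper and integers stand in for timestamps; the real implementation works against the storage backend.

```python
from bisect import bisect_left


def extend_bounds(sorted_keys, start, end):
    """Return the keys in [start, end) plus one neighbour on each side."""
    lo = bisect_left(sorted_keys, start)  # index of first key >= start
    hi = bisect_left(sorted_keys, end)    # index of first key >= end
    lo = max(lo - 1, 0)                   # add the first record before start
    hi = min(hi + 1, len(sorted_keys))    # add the first record at or after end
    return sorted_keys[lo:hi]


keys = [10, 20, 30, 40, 50]
extended = extend_bounds(keys, 20, 40)
```

Here the requested range `[20, 40)` contains 20 and 30; the result additionally carries 10 (the record before `start`) and 40 (the first record at or after `end`), so a nearest-neighbour search at either edge finds its candidates in memory.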

## Namespace Support

Namespaces provide logical isolation within a single database instance:

```python
# LMDB: uses native DBIs
db.save_records(records, namespace="measurement")

# SQLite: uses the namespace column; the equivalent SQL is:
#   SELECT * FROM records WHERE namespace = 'measurement'
```

**Default Namespace:**

- Can be set during `open(namespace="default")`
- Operations with `namespace=None` use the default
- Each record class typically defines its own namespace via `db_namespace()`

## Record Lifecycle

### Insertion

```python
db_insert_record(record, mark_dirty=True)
```

1. Normalize `record.date_time` to UTC `DatabaseTimestamp`
2. Ensure timestamp range is loaded (lazy load if needed)
3. Check for duplicates (raises `ValueError`)
4. Insert into sorted position in memory
5. Update index: `_db_record_index[timestamp] = record`
6. Mark dirty if `mark_dirty=True`
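
Steps 3 to 6 can be sketched with a sorted list and a dictionary index. The class `RecordStore` below is hypothetical and leaves out timestamp normalization and lazy loading (steps 1 and 2); integers stand in for timestamps.

```python
from bisect import insort


class RecordStore:
    """Hypothetical in-memory store mirroring steps 3 to 6."""

    def __init__(self):
        self.sorted_timestamps = []  # kept in ascending order
        self.index = {}              # timestamp -> record
        self.dirty = set()           # timestamps with pending writes

    def insert(self, timestamp, record, mark_dirty=True):
        if timestamp in self.index:                # step 3: duplicate check
            raise ValueError(f"duplicate record at {timestamp}")
        insort(self.sorted_timestamps, timestamp)  # step 4: sorted insert
        self.index[timestamp] = record             # step 5: update index
        if mark_dirty:                             # step 6: dirty tracking
            self.dirty.add(timestamp)


store = RecordStore()
store.insert(20, "b")
store.insert(10, "a")
store.insert(30, "c", mark_dirty=False)
```

Passing `mark_dirty=False` fits records that were just read from the backend: they enter memory without being queued for a redundant write.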

### Retrieval

```python
db_get_record(target_timestamp, time_window=None)
```

**Search Strategies:**

| `time_window` | Behavior |
|---|---|
| `None` | Exact match only |
| `UNBOUND_WINDOW` | Nearest record (unlimited search) |
| `Duration` | Nearest within symmetric window |

**Memory-First:** Checks in-memory index before querying database.
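
The three search strategies can be sketched over a sorted list of timestamps. This is an illustration under assumptions: `nearest` is a hypothetical helper, integers (seconds) stand in for timestamps, `float("inf")` plays the role of `UNBOUND_WINDOW`, and the window is interpreted as a total width centred on the target, which the text above does not specify.

```python
from bisect import bisect_left


def nearest(sorted_ts, target, window=None):
    """Return the matching timestamp or None.

    window=None         -> exact match only
    window=float("inf") -> nearest record, unlimited distance
    window=<number>     -> nearest within target +/- window / 2 (assumption)
    """
    i = bisect_left(sorted_ts, target)
    if i < len(sorted_ts) and sorted_ts[i] == target:
        return target
    if window is None:
        return None
    # Only the neighbours directly before and after the target can be nearest.
    candidates = sorted_ts[max(i - 1, 0): i + 1]
    if not candidates:
        return None
    best = min(candidates, key=lambda ts: abs(ts - target))
    return best if abs(best - target) <= window / 2 else None


timestamps = [0, 60, 120]
```

Because the list is sorted, the search inspects at most two candidates regardless of how many records are loaded.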

### Deletion

```python
db_delete_records(start_timestamp, end_timestamp)
```

1. Ensure range is fully loaded
2. Remove from memory: `records`, `_db_sorted_timestamps`, `_db_record_index`
3. Add to `_db_deleted_timestamps` (tombstone)
4. Discard from dirty sets (cancel pending writes)
5. Physical deletion deferred until `db_save_records()`

## Dirty Tracking

The system maintains three dirty sets to optimize writes:

```python
_db_dirty_timestamps: set[DatabaseTimestamp]    # Modified records
_db_new_timestamps: set[DatabaseTimestamp]      # Newly inserted
_db_deleted_timestamps: set[DatabaseTimestamp]  # Pending deletes
```

**Write Strategy:**

1. **Saves first:** Insert/update all dirty records
2. **Deletes last:** Remove tombstoned records
3. **Clear tracking sets:** Reset dirty state

**Autosave:** Triggered periodically if `autosave_interval_sec` configured.
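
The write strategy can be sketched with a dictionary standing in for the storage backend. The function `save_records` below is a hypothetical stand-in for `db_save_records()`, not its actual implementation.

```python
def save_records(backend, index, dirty, new, deleted):
    """Flush pending changes: saves first, deletes last, then reset tracking."""
    written = 0
    for ts in sorted(dirty | new):   # 1. insert/update all dirty records
        backend[ts] = index[ts]
        written += 1
    for ts in sorted(deleted):       # 2. remove tombstoned records
        backend.pop(ts, None)
    dirty.clear()                    # 3. clear tracking sets
    new.clear()
    deleted.clear()
    return written


backend = {10: "old", 20: "stale"}        # state on disk before the flush
index = {10: "updated", 30: "fresh"}      # in-memory records
dirty, new, deleted = {10}, {30}, {20}
written = save_records(backend, index, dirty, new, deleted)
```

Since deletion already discards a timestamp from the dirty sets, the save and delete passes never touch the same key in one flush.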

## Compression

Optional gzip compression reduces storage footprint:

```python
import gzip
import pickle

# Serialize
data = pickle.dumps(record.model_dump())
if compression_level > 0:
    data = gzip.compress(data, compresslevel=compression_level)

# Deserialize (auto-detect)
if data[:2] == b'\x1f\x8b':  # gzip magic bytes
    data = gzip.decompress(data)
record_data = pickle.loads(data)
```

**Compression is transparent:** Application code never handles compressed data directly.

## Metadata

Each namespace can store arbitrary metadata (version, creation time, provider):

```python
_db_metadata = {
    "version": 1,
    "created": "2024-01-01T00:00:00Z",
    "provider_id": "LMDB",
    "compression": True,
    "backend": "LMDBDatabase"
}
```

Stored separately from records using reserved key `__metadata__`.

## Compaction

Compaction reduces storage by downsampling old records to a lower time resolution. Unlike
vacuum — which deletes records outright — compaction preserves the full time span of the
data while replacing many fine-grained records with fewer coarse-grained averages.

### Tiered Downsampling Policy

The default policy has two tiers, applied coarsest-first:

| Age threshold | Target resolution | Effect |
|---|---|---|
| Older than 14 days | 1 hour | 15-min records → 1 per hour (75 % reduction) |
| Older than 2 hours | 15 minutes | 1-min records → 1 per 15 min (93 % reduction) |

Records within the most recent 2 hours are never touched.
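
The tier lookup can be sketched as a list of `(age threshold, target resolution)` pairs checked coarsest-first. The names `TIERS` and `target_resolution` are hypothetical; the values are the default policy from the table above.

```python
from datetime import timedelta

# (age threshold, target resolution), ordered coarsest-first.
TIERS = [
    (timedelta(days=14), timedelta(hours=1)),
    (timedelta(hours=2), timedelta(minutes=15)),
]


def target_resolution(age):
    """Return the resolution a record of this age is downsampled to, or None."""
    for threshold, resolution in TIERS:
        if age > threshold:
            return resolution
    return None  # recent data keeps full resolution


recent = target_resolution(timedelta(hours=1))
middle = target_resolution(timedelta(hours=3))
old = target_resolution(timedelta(days=15))
```

Checking the coarsest tier first means a record older than 14 days maps directly to hourly resolution instead of matching the 2-hour tier.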

### How Compaction Works

Each tier is processed incrementally using a stored cutoff timestamp per tier. On each run,
only the window `[last_cutoff, new_cutoff)` is examined — records already compacted in a
previous run are never re-processed. This makes weekly runs fast even on years of history.

For each writable numeric field, records in the window are mean-resampled at the target
interval using time interpolation. The original records are deleted and the downsampled
records are written back. A **sparse-data guard** skips any window where the existing record
count is already at or below the resampled bucket count, preventing compaction from
accidentally *increasing* record count for data that is already coarse or irregular.
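
The resampling step and the sparse-data guard can be sketched in plain Python. This is a simplified illustration, not the EOS algorithm: `compact_window` is a hypothetical helper, timestamps are integer seconds, a single numeric field is assumed, and plain bucket means replace the time interpolation described above.

```python
from collections import defaultdict


def compact_window(records, start, end, interval):
    """Mean-resample (timestamp, value) pairs in [start, end) to `interval`."""
    window = [(ts, v) for ts, v in records if start <= ts < end]
    bucket_count = -(-(end - start) // interval)  # ceiling division
    if len(window) <= bucket_count:
        return window  # sparse-data guard: already coarse, leave untouched
    buckets = defaultdict(list)
    for ts, value in window:
        buckets[ts - ts % interval].append(value)  # align to bucket start
    return [(ts, sum(vs) / len(vs)) for ts, vs in sorted(buckets.items())]


# One hour of 1-minute data; the value equals the minute index.
minute_data = [(m * 60, float(m)) for m in range(60)]
compacted = compact_window(minute_data, 0, 3600, 900)

# Two records in the same hour are already coarser than 15 minutes.
sparse = [(0, 1.0), (1800, 2.0)]
untouched = compact_window(sparse, 0, 3600, 900)
```

The 60 one-minute records collapse into four 15-minute means, while the sparse input is returned unchanged because it holds fewer records than the window has buckets.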

### Customising the Policy per Namespace

Individual data providers can override `db_compact_tiers()` to use a different policy:

```python
class PriceDataProvider(DataProvider):
    def db_compact_tiers(self):
        # Price data is already at 15-min resolution from the source.
        # Skip the first tier; only compact to hourly after 2 weeks.
        return [(to_duration("14 days"), to_duration("1 hour"))]
```

Return an empty list to disable compaction for a specific namespace entirely:

```python
class EventLogProvider(DataProvider):
    def db_compact_tiers(self):
        return []  # Raw events must not be averaged
```

### Manual Invocation

```python
# Compact all providers in the container
data_container.db_compact()

# Compact a single provider
provider.db_compact()

# Use a one-off policy without changing the instance default
provider.db_compact(compact_tiers=[
    (to_duration("7 days"), to_duration("1 hour"))
])
```

### Interaction with Vacuum

Compaction and vacuum are complementary and should always run in this order:

```text
compact → vacuum
```

Compact first so that vacuum operates on already-downsampled records. This produces cleaner
retention boundaries and ensures the vacuum cutoff falls on hour-aligned timestamps rather
than arbitrary sub-minute ones. Running them in reverse order (vacuum then compact) wastes
work: vacuum may delete records that compaction would have downsampled and kept.

The `RetentionManager` registers both jobs and ensures compaction always runs before vacuum
within the same maintenance window.

## Vacuum Operation

Remove old records to reclaim space:

```python
db_vacuum(keep_hours=48)        # Keep last 48 hours
db_vacuum(keep_timestamp=cutoff) # Keep from cutoff onward
```

**Strategy:**

- Computes the cutoff as `max_timestamp - keep_hours`
- Deletes all records before cutoff
- Immediately persists changes via `db_save_records()`
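
The cutoff computation can be sketched with a dictionary keyed by timezone-aware datetimes. The function `vacuum` below is a hypothetical stand-in for `db_vacuum(keep_hours=...)` and omits persistence.

```python
from datetime import datetime, timedelta, timezone


def vacuum(records, keep_hours):
    """Delete records older than max_timestamp - keep_hours; return the count."""
    if not records:
        return 0
    cutoff = max(records) - timedelta(hours=keep_hours)
    stale = [ts for ts in records if ts < cutoff]  # strictly before the cutoff
    for ts in stale:
        del records[ts]
    return len(stale)


newest = datetime(2024, 1, 10, tzinfo=timezone.utc)
# Values record each entry's age in hours relative to the newest record.
records = {newest - timedelta(hours=h): h for h in (0, 24, 47, 48, 72)}
removed = vacuum(records, keep_hours=48)
```

The cutoff is anchored to the newest stored record, as in the strategy above, so in this sketch a database that has stopped receiving data keeps its last `keep_hours` of history.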

## Thread Safety

- **LMDB:** Internal lock protects write transactions; reads are lock-free via MVCC
- **SQLite:** Lock guards all operations (autocommit mode eliminates transaction deadlocks)
- **Record Protocol:** No internal locking (assumes single-threaded access per instance)

## Performance Characteristics

| Operation | LMDB | SQLite |
|---|---|---|
| Sequential read | Excellent (mmap) | Good (indexed) |
| Random read | Excellent (mmap) | Good (B-tree) |
| Bulk write | Excellent (single txn) | Good (batch insert) |
| Range query | Excellent (cursor) | Good (indexed scan) |
| Disk usage | Moderate (pre-allocated) | Compact (auto-grow) |
| Concurrency | High (MVCC readers) | Low (write serialization) |

**Recommendation:** Use LMDB for high-frequency time-series workloads;
SQLite for portability and simpler deployment.

## Example Usage

```python
# Configuration
config.database.provider = "LMDB"
config.database.compression_level = 9
config.database.initial_load_window_h = 24  # Load last 24h initially
config.database.keep_duration_h = 720       # Retain 30 days
config.database.compaction_interval_sec = 604800  # Compact weekly

# Access (automatic singleton initialization)
class MeasurementData(DatabaseRecordProtocolMixin):
    records: list[MeasurementRecord] = []

    def db_namespace(self) -> str:
        return "measurement"

# Operations
measurement = MeasurementData()

# Lazy load on first access
record = measurement.db_get_record(
    DatabaseTimestamp.from_datetime(now),
    time_window=Duration(hours=1)
)

# Insert new record
measurement.db_insert_record(new_record)

# Automatic save (if autosave configured) or manual
measurement.db_save_records()

# Maintenance pipeline (normally handled by RetentionManager)
measurement.db_compact()    # downsample old records first
measurement.db_vacuum(keep_hours=720)  # then delete beyond retention
```
 								map_size: int = 10 * 1024 * 1024 * 1024  # 10 GB
 								writemap = True                            # Performance optimization
 								map_async = True                           # Performance optimization
 								max_dbs = 128                              # Maximum namespaces
 								```
 								**File Structure:**
 								```text
 								data_folder_path/
 								└── db/
 								    └── lmdbdatabase/
 								        ├── data.mdb
 								        └── lock.mdb
 								```
 								### SQLite Backend
 								**Characteristics:**
 								- Single-file relational database
 								- Namespace emulation via `namespace` column
 								- ACID transactions with autocommit mode
 								- Cross-platform compatibility
 								**Schema:**
 								```sql
 								CREATE TABLE records (
 								    namespace TEXT NOT NULL DEFAULT '',
 								    key BLOB NOT NULL,
 								    value BLOB NOT NULL,
 								    PRIMARY KEY (namespace, key)
 								);
 								CREATE TABLE metadata (
 								    namespace TEXT PRIMARY KEY,
 								    value BLOB
 								);
 								```
 								**File Structure:**
 								```text
 								data_folder_path/
 								└── db/
 								    └── sqlitedatabase/
 								        └── data.db
 								```
 								## Timestamp System
 								### DatabaseTimestamp
 								All records are indexed by UTC timestamps in sortable ISO 8601 format:
 								```python
 								DatabaseTimestamp.from_datetime(dt)  # dt: DateTime, returns e.g. "20241027T123456[Z]"
 								```
 								**Properties:**
 								- Always stored in UTC (timezone-aware required)
 								- Lexicographically sortable
 								- Bijective conversion to/from `pendulum.DateTime`
 								- Second-level precision
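
The sortability and round-trip properties can be illustrated with a minimal standard-library sketch. It assumes a fixed-width `YYYYMMDDTHHMMSSZ` layout and uses the hypothetical helpers `to_key` and `from_key`; the actual `DatabaseTimestamp` encoding and API may differ.

```python
from datetime import datetime, timedelta, timezone

KEY_FORMAT = "%Y%m%dT%H%M%SZ"  # assumed fixed-width layout, for illustration only


def to_key(dt: datetime) -> str:
    """Convert a timezone-aware datetime to a sortable UTC key (hypothetical helper)."""
    if dt.tzinfo is None:
        raise ValueError("timezone-aware datetime required")
    return dt.astimezone(timezone.utc).strftime(KEY_FORMAT)


def from_key(key: str) -> datetime:
    """Inverse of to_key at second-level precision (hypothetical helper)."""
    return datetime.strptime(key, KEY_FORMAT).replace(tzinfo=timezone.utc)


early = datetime(2024, 10, 27, 12, 34, 56, tzinfo=timezone.utc)
# One second later, but expressed in a UTC+2 zone.
late = (early + timedelta(seconds=1)).astimezone(timezone(timedelta(hours=2)))

assert to_key(early) < to_key(late)      # string order matches time order
assert from_key(to_key(early)) == early  # round trip at second precision
```

Because every key has the same width and is normalized to UTC, plain string comparison is enough for range scans in a key-value backend.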
 								### Unbounded Sentinels
 								```python
 								UNBOUND_START  # Smaller than any timestamp
 								UNBOUND_END    # Greater than any timestamp
 								```
 								Used for open-ended range queries without special-casing `None`.
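
One way such sentinels can be built is sketched below. The class `Unbound`, the names `START` and `END`, and the helper `in_range` are hypothetical stand-ins for illustration, not the EOS implementation.

```python
from functools import total_ordering


@total_ordering
class Unbound:
    """Illustrative sentinel ordered below (start) or above (end) every other value."""

    def __init__(self, is_start: bool) -> None:
        self.is_start = is_start

    def __eq__(self, other: object) -> bool:
        return self is other

    def __lt__(self, other: object) -> bool:
        # Only the start sentinel is ever "less than" something else.
        return self.is_start and self is not other

    __hash__ = object.__hash__


START, END = Unbound(True), Unbound(False)  # stand-ins for UNBOUND_START / UNBOUND_END


def in_range(ts: str, start=START, end=END) -> bool:
    """Half-open range check that needs no special case for open ends."""
    return start <= ts < end


assert in_range("20240101T000000Z")                            # fully open range
assert not in_range("20240101T000000Z", start="20250101T000000Z")
```

The comparison `start <= ts < end` works unchanged whether a bound is a real timestamp or a sentinel, which is the point of avoiding `None`.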
 								## Lazy Loading Strategy
 								### Three-Phase Loading
 								The system uses a progressive loading model to minimize memory footprint:
 								#### **Phase 0: NONE**
 								- No records loaded
 								- First query triggers either:
 								  - Initial window load (if `initial_load_window_h` configured)
 								  - Full database load (if `initial_load_window_h = None`)
 								  - Targeted range load (if explicit range requested)
 								#### **Phase 1: INITIAL**
 								- Partial time window loaded
 								- `_db_loaded_range` tracks coverage: `[start_timestamp, end_timestamp)`
 								- Out-of-window queries trigger incremental expansion:
 								  - Left expansion: load records before current window
 								  - Right expansion: load records after current window
 								- Unbounded queries escalate to FULL
 								#### **Phase 2: FULL**
 								- All database records in memory
 								- No further database access needed
 								- `_db_loaded_range` spans entire dataset
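
The incremental expansion in Phase 1 can be sketched as a gap computation between the loaded range and the requested range. This is an illustration only: `missing_ranges` is a hypothetical helper, timestamps are plain integers, and it assumes the loaded window is kept contiguous by always loading up to the current window edge.

```python
def missing_ranges(
    loaded: tuple[int, int], requested: tuple[int, int]
) -> list[tuple[int, int]]:
    """Return half-open ranges that must be loaded to cover `requested`."""
    (loaded_start, loaded_end), (req_start, req_end) = loaded, requested
    gaps = []
    if req_start < loaded_start:
        gaps.append((req_start, loaded_start))  # left expansion
    if req_end > loaded_end:
        gaps.append((loaded_end, req_end))      # right expansion
    return gaps


print(missing_ranges((10, 20), (5, 25)))   # [(5, 10), (20, 25)]
print(missing_ranges((10, 20), (12, 18)))  # [] (already covered)
```

A query that is already inside the loaded window produces no gaps and therefore needs no database access.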
 								### Boundary Extension
 								When loading a range `[start, end)`, the system automatically extends boundaries to include:
 								- **First record before** `start` (for interpolation/context)
 								- **First record at or after** `end` (for closing boundary)
 								This prevents additional database lookups during nearest-neighbor searches.
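
A minimal sketch of this extension over a sorted list of timestamps, using `bisect` from the standard library. The function name `extended_slice` and the integer timestamps are assumptions made for illustration.

```python
from bisect import bisect_left


def extended_slice(sorted_ts: list[int], start: int, end: int) -> list[int]:
    """Return timestamps in [start, end) plus one neighbour on each side."""
    lo = bisect_left(sorted_ts, start)  # first index with ts >= start
    hi = bisect_left(sorted_ts, end)    # first index with ts >= end
    lo = max(lo - 1, 0)                 # include the record just before start
    hi = min(hi + 1, len(sorted_ts))    # include the first record at or after end
    return sorted_ts[lo:hi]


timestamps = [0, 10, 20, 30, 40, 50]
print(extended_slice(timestamps, 15, 35))  # [10, 20, 30, 40]
```

Here `10` and `40` lie outside `[15, 35)` but are loaded anyway, so a later nearest-neighbor lookup near either edge can be answered from memory.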
 								## Namespace Support
 								Namespaces provide logical isolation within a single database instance:
 								```python
 								# LMDB: uses native DBIs
 								db.save_records(records, namespace="measurement")
 								# SQLite: uses the namespace column; the equivalent query is
 								#   SELECT * FROM records WHERE namespace = 'measurement'
 								```
 								**Default Namespace:**
 								- Can be set during `open(namespace="default")`
 								- Operations with `namespace=None` use the default
 								- Each record class typically defines its own namespace via `db_namespace()`
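
The column-based emulation can be tried out with the standard `sqlite3` module and the `records` schema shown for the SQLite backend. The sketch below uses an in-memory database; the `prediction` namespace and the key/value bytes are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE records (
        namespace TEXT NOT NULL DEFAULT '',
        key BLOB NOT NULL,
        value BLOB NOT NULL,
        PRIMARY KEY (namespace, key)
    )
    """
)
rows = [
    ("measurement", b"20240101T000000", b"m1"),
    ("prediction", b"20240101T000000", b"p1"),  # same key, different namespace
]
conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)

measurement = conn.execute(
    "SELECT key, value FROM records WHERE namespace = ? ORDER BY key",
    ("measurement",),
).fetchall()
print(measurement)  # [(b'20240101T000000', b'm1')]
```

Because the primary key is `(namespace, key)`, the same timestamp key can exist once per namespace without conflict.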
 								## Record Lifecycle
 								### Insertion
 								```python
 								db_insert_record(record, mark_dirty=True)
 								```
1. Normalize `record.date_time` to UTC `DatabaseTimestamp`
2. Ensure timestamp range is loaded (lazy load if needed)
3. Check for duplicates (raises `ValueError`)
4. Insert into sorted position in memory
5. Update index: `_db_record_index[timestamp] = record`
6. Mark dirty if `mark_dirty=True`
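
The in-memory part of these steps can be sketched with a toy store. `SortedRecords` is a hypothetical class for illustration: it uses integer timestamps and plain dicts as records, and it omits normalization and lazy loading.

```python
from bisect import bisect_left


class SortedRecords:
    """Toy in-memory store keyed by integer timestamps (illustration only)."""

    def __init__(self) -> None:
        self.sorted_timestamps: list[int] = []
        self.index: dict[int, dict] = {}
        self.new_timestamps: set[int] = set()

    def insert(self, timestamp: int, record: dict, mark_dirty: bool = True) -> None:
        if timestamp in self.index:  # duplicate check
            raise ValueError(f"duplicate timestamp: {timestamp}")
        pos = bisect_left(self.sorted_timestamps, timestamp)
        self.sorted_timestamps.insert(pos, timestamp)  # keep the list sorted
        self.index[timestamp] = record                 # update the index
        if mark_dirty:
            self.new_timestamps.add(timestamp)         # schedule for the next save


store = SortedRecords()
for ts in (30, 10, 20):
    store.insert(ts, {"value": ts})
print(store.sorted_timestamps)  # [10, 20, 30]
```

Keeping a sorted timestamp list next to the index makes range queries a matter of two binary searches.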
 								### Retrieval
 								```python
 								db_get_record(target_timestamp, time_window=None)
 								```
 								**Search Strategies:**
 								| `time_window` | Behavior |
 								|---|---|
 								| `None` | Exact match only |
 								| `UNBOUND_WINDOW` | Nearest record (unlimited search) |
 								| `Duration` | Nearest within symmetric window |
 								**Memory-First:** Checks in-memory index before querying database.
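
The three search strategies can be sketched over a sorted timestamp list. This is an illustration under stated assumptions: `nearest` is a hypothetical function, timestamps are integers, the window is read as the maximum allowed distance on either side of the target, and `float("inf")` stands in for `UNBOUND_WINDOW`.

```python
from bisect import bisect_left
from typing import Optional


def nearest(sorted_ts: list[int], target: int, window: Optional[float]) -> Optional[int]:
    """Exact match if window is None, else nearest timestamp within +/- window."""
    pos = bisect_left(sorted_ts, target)
    if pos < len(sorted_ts) and sorted_ts[pos] == target:
        return target                      # exact hit
    if window is None:
        return None                        # exact match only
    # Only the neighbours directly left and right of the insertion point can be nearest.
    candidates = sorted_ts[max(pos - 1, 0):pos + 1]
    best = min(candidates, key=lambda ts: abs(ts - target), default=None)
    if best is None or abs(best - target) > window:
        return None
    return best


ts = [0, 10, 20]
print(nearest(ts, 12, None))          # None
print(nearest(ts, 12, 3))             # 10
print(nearest(ts, 15, float("inf")))  # 10
```

On a tie, as for target `15`, this sketch returns the earlier neighbor; the real tie-breaking rule is not specified here.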
 								### Deletion
 								```python
 								db_delete_records(start_timestamp, end_timestamp)
 								```
1. Ensure range is fully loaded
2. Remove from memory: `records`, `_db_sorted_timestamps`, `_db_record_index`
3. Add to `_db_deleted_timestamps` (tombstone)
4. Discard from dirty sets (cancel pending writes)
5. Physical deletion deferred until `db_save_records()`
 								## Dirty Tracking
 								The system maintains three dirty sets to optimize writes:
 								```python
 								_db_dirty_timestamps: set[DatabaseTimestamp]    # Modified records
 								_db_new_timestamps: set[DatabaseTimestamp]      # Newly inserted
 								_db_deleted_timestamps: set[DatabaseTimestamp]  # Pending deletes
 								```
 								**Write Strategy:**
1. **Saves first:** Insert/update all dirty records
2. **Deletes last:** Remove tombstoned records
3. **Clear tracking sets:** Reset dirty state
 								**Autosave:** Triggered periodically if `autosave_interval_sec` configured.
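
The write strategy can be sketched with a plain dict standing in for the backend. The function `flush` and its parameters are hypothetical; the real `db_save_records()` works on the backend API and serialized records.

```python
def flush(backend: dict, records: dict, dirty: set, new: set, deleted: set) -> None:
    """Persist pending changes: upserts first, then deletes, then reset tracking."""
    for ts in sorted(dirty | new):   # saves first
        backend[ts] = records[ts]
    for ts in sorted(deleted):       # deletes last
        backend.pop(ts, None)
    dirty.clear()                    # clear tracking sets
    new.clear()
    deleted.clear()


backend = {1: "a", 2: "b"}
records = {1: "a*", 3: "c"}          # 1 modified, 3 new, 2 deleted in memory
dirty, new, deleted = {1}, {3}, {2}
flush(backend, records, dirty, new, deleted)
print(backend)  # {1: 'a*', 3: 'c'}
```

Only the timestamps in the three sets are touched, so a save costs work proportional to the number of changes rather than the size of the namespace.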
 								## Compression
 								Optional gzip compression reduces storage footprint:
 								```python
 								import gzip
 								import pickle

 								# Serialize
 								data = pickle.dumps(record.model_dump())
 								if compression_level > 0:
 								    data = gzip.compress(data, compresslevel=compression_level)
 								# Deserialize (auto-detect)
 								if data[:2] == b'\x1f\x8b':  # gzip magic bytes
 								    data = gzip.decompress(data)
 								record_data = pickle.loads(data)
 								```
 								**Compression is transparent:** Application code never handles compressed data directly.
 								## Metadata
 								Each namespace can store arbitrary metadata (version, creation time, provider):
 								```python
 								_db_metadata = {
 								    "version": 1,
 								    "created": "2024-01-01T00:00:00Z",
 								    "provider_id": "LMDB",
 								    "compression": True,
 								    "backend": "LMDBDatabase"
 								}
 								```
 								Stored separately from records using reserved key `__metadata__`.
 								## Compaction
 								Compaction reduces storage by downsampling old records to a lower time resolution. Unlike
 								vacuum — which deletes records outright — compaction preserves the full time span of the
 								data while replacing many fine-grained records with fewer coarse-grained averages.
 								### Tiered Downsampling Policy
 								The default policy has two tiers, applied coarsest-first:
 								| Age threshold | Target resolution | Effect |
 								|---|---|---|
 								| Older than 14 days | 1 hour | 15-min records → 1 per hour (75 % reduction) |
 								| Older than 2 hours | 15 minutes | 1-min records → 1 per 15 min (93 % reduction) |
 								Records within the most recent 2 hours are never touched.
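
Tier selection by record age can be sketched as follows. `TIERS` mirrors the default policy in the table, while `target_resolution` is a hypothetical helper that uses `datetime.timedelta` in place of the project's duration type.

```python
from datetime import timedelta
from typing import Optional

# (age threshold, target resolution), coarsest tier first as in the table.
TIERS = [
    (timedelta(days=14), timedelta(hours=1)),
    (timedelta(hours=2), timedelta(minutes=15)),
]


def target_resolution(age: timedelta) -> Optional[timedelta]:
    """Return the resolution a record of this age is reduced to, or None if untouched."""
    for threshold, resolution in TIERS:
        if age > threshold:
            return resolution
    return None


print(target_resolution(timedelta(days=30)))     # 1:00:00
print(target_resolution(timedelta(days=1)))      # 0:15:00
print(target_resolution(timedelta(minutes=30)))  # None
```

Checking the coarsest tier first means a record older than 14 days is matched by the hourly tier before the 15-minute tier is considered.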
 								### How Compaction Works
 								Each tier is processed incrementally using a stored cutoff timestamp per tier. On each run,
 								only the window `[last_cutoff, new_cutoff)` is examined — records already compacted in a
 								previous run are never re-processed. This makes weekly runs fast even on years of history.
 								For each writable numeric field, records in the window are mean-resampled at the target
 								interval using time interpolation. The original records are deleted and the downsampled
 								records are written back. A **sparse-data guard** skips any window where the existing record
 								count is already at or below the resampled bucket count, preventing compaction from
 								accidentally *increasing* record count for data that is already coarse or irregular.
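
A simplified sketch of one compaction pass over a single numeric field is shown below. It is not the EOS implementation: `compact_window` is a hypothetical function, timestamps are epoch seconds, and plain bucket means replace the time interpolation described above.

```python
from collections import defaultdict
from statistics import fmean


def compact_window(
    records: dict[int, float], start: int, end: int, interval: int
) -> dict[int, float]:
    """Mean-resample records in [start, end) into buckets of `interval` seconds."""
    in_window = {ts: v for ts, v in records.items() if start <= ts < end}
    bucket_count = -(-(end - start) // interval)  # ceiling division
    if len(in_window) <= bucket_count:
        return dict(records)  # sparse-data guard: compaction would not reduce anything
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in in_window.items():
        buckets[start + (ts - start) // interval * interval].append(value)
    # Drop the originals in the window, then write one mean per bucket.
    compacted = {ts: v for ts, v in records.items() if ts not in in_window}
    compacted.update({bucket: fmean(values) for bucket, values in buckets.items()})
    return compacted


minute_data = {ts: ts / 60 for ts in range(0, 1800, 60)}  # 30 one-minute records
print(compact_window(minute_data, 0, 1800, 900))  # {0: 7.0, 900: 22.0}
```

With already-coarse input such as `{0: 1.0, 900: 2.0}`, the guard returns the records unchanged because two records cannot be reduced below two buckets.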
 								### Customising the Policy per Namespace
 								Individual data providers can override `db_compact_tiers()` to use a different policy:
 								```python
 								class PriceDataProvider(DataProvider):
 								    def db_compact_tiers(self):
 								        # Price data is already at 15-min resolution from the source.
 								        # Skip the first tier; only compact to hourly after 2 weeks.
 								        return [(to_duration("14 days"), to_duration("1 hour"))]
 								```
 								Return an empty list to disable compaction for a specific namespace entirely:
 								```python
 								class EventLogProvider(DataProvider):
 								    def db_compact_tiers(self):
 								        return []  # Raw events must not be averaged
 								```
 								### Manual Invocation
 								```python
 								# Compact all providers in the container
 								data_container.db_compact()
 								# Compact a single provider
 								provider.db_compact()
 								# Use a one-off policy without changing the instance default
 								provider.db_compact(compact_tiers=[
 								    (to_duration("7 days"), to_duration("1 hour"))
 								])
 								```
 								### Interaction with Vacuum
 								Compaction and vacuum are complementary and should always run in this order:
 								```text
 								compact → vacuum
 								```
 								Compact first so that vacuum operates on already-downsampled records. This produces cleaner
 								retention boundaries and ensures the vacuum cutoff falls on hour-aligned timestamps rather
 								than arbitrary sub-minute ones. Running them in reverse order (vacuum then compact) wastes
 								work: vacuum may delete records that compaction would have downsampled and kept.
 								The `RetentionManager` registers both jobs and ensures compaction always runs before vacuum
 								within the same maintenance window.
 								## Vacuum Operation
 								Remove old records to reclaim space:
 								```python
 								db_vacuum(keep_hours=48)        # Keep last 48 hours
 								db_vacuum(keep_timestamp=cutoff) # Keep from cutoff onward
 								```
 								**Strategy:**
 								- Computes cutoff relative to `max_timestamp - keep_hours`
 								- Deletes all records before cutoff
 								- Immediately persists changes via `db_save_records()`
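
The cutoff computation can be sketched as below, assuming that `max_timestamp` is the timestamp of the newest stored record and that timestamps are epoch seconds. The function `vacuum` is a hypothetical stand-in for `db_vacuum` and skips persistence.

```python
def vacuum(records: dict[int, str], keep_hours: int) -> dict[int, str]:
    """Drop records older than keep_hours before the newest record (epoch seconds)."""
    if not records:
        return {}
    cutoff = max(records) - keep_hours * 3600
    return {ts: rec for ts, rec in records.items() if ts >= cutoff}


HOUR = 3600
data = {0: "old", 24 * HOUR: "edge", 72 * HOUR: "newest"}
print(vacuum(data, keep_hours=48))  # {86400: 'edge', 259200: 'newest'}
```

Under this assumption, anchoring the cutoff to the newest record rather than to the wall clock means a device that was offline for a while does not lose its most recent history on the next vacuum.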
 								## Thread Safety
 								- **LMDB:** Internal lock protects write transactions; reads are lock-free via MVCC
 								- **SQLite:** Lock guards all operations (autocommit mode eliminates transaction deadlocks)
 								- **Record Protocol:** No internal locking (assumes single-threaded access per instance)
 								## Performance Characteristics
 								| Operation | LMDB | SQLite |
 								|---|---|---|
 								| Sequential read | Excellent (mmap) | Good (indexed) |
 								| Random read | Excellent (mmap) | Good (B-tree) |
 								| Bulk write | Excellent (single txn) | Good (batch insert) |
 								| Range query | Excellent (cursor) | Good (indexed scan) |
 								| Disk usage | Moderate (pre-allocated) | Compact (auto-grow) |
 								| Concurrency | High (MVCC readers) | Low (write serialization) |
 								**Recommendation:** Use LMDB for high-frequency time-series workloads;
 								SQLite for portability and simpler deployment.
 								## Example Usage
 								```python
 								# Configuration
 								config.database.provider = "LMDB"
 								config.database.compression_level = 9
 								config.database.initial_load_window_h = 24  # Load last 24h initially
 								config.database.keep_duration_h = 720       # Retain 30 days
 								config.database.compaction_interval_sec = 604800  # Compact weekly
 								# Access (automatic singleton initialization)
 								class MeasurementData(DatabaseRecordProtocolMixin):
 								    records: list[MeasurementRecord] = []
 								    def db_namespace(self) -> str:
 								        return "measurement"
 								# Operations
 								measurement = MeasurementData()
 								# Lazy load on first access
 								record = measurement.db_get_record(
 								    DatabaseTimestamp.from_datetime(now),
 								    time_window=Duration(hours=1)
 								)
 								# Insert new record
 								measurement.db_insert_record(new_record)
 								# Automatic save (if autosave configured) or manual
 								measurement.db_save_records()
 								# Maintenance pipeline (normally handled by RetentionManager)
 								measurement.db_compact()    # downsample old records first
 								measurement.db_vacuum(keep_hours=720)  # then delete beyond retention
 								```