Rename test_g6_protocol_surface_gaps.py to test_protocol_surface_gaps.py.
Strip G6 from module docstring, and finding IDs (N01, N02, N03, N05,
N09, R04) from docstrings and section comments.
Rename test_g5_asyncio_lifecycle.py to test_asyncio_lifecycle.py.
Strip G5 from module docstring, finding IDs (F05, F07, F08, F19)
from class names and docstrings.
Rename test_g4_transport_symmetry.py to test_transport_symmetry.py.
Strip G4 from module docstring, _g4_ from function names, and finding
IDs (F04, NEW-A, F18, M06, N04) from docstrings and section comments.
Rename test_g2_error_handling.py to test_error_handling.py. Strip G2
prefix from module docstring, _g2_ from function names, and finding
IDs (F22, F21/M01, M02, M04, N06, F14) from docstrings and section
comments. Proposal cross-references removed.
Strip internal forensics finding references (F06, N07, NEW-C, R02)
from docstrings, comments, and assertion messages. Descriptive text
is preserved — only the ID prefixes are removed.
Strip internal forensics finding references (F01, F02, F03, N11)
from docstrings and section comments. The descriptive text is
preserved — only the ID prefixes are removed.
Strip G1/ prefix from docstrings, _g1_ from function names,
and G1 from section comments. Finding IDs (F06, N07, etc.)
are preserved as they are meaningful standalone references.
The mock_dispatcher fixture's fake_subscribe recorded event handlers
but never called them, causing asyncio.wait() to block for the full
DEFAULT_TIMEOUT (15s) on every test that passes expected_events to
send(). With 28 affected tests, the suite wasted ~8 minutes on dead
waits and required an undocumented pytest-timeout plugin to complete.
Add call_soon to the default fake_subscribe so futures resolve on the
next event loop iteration, matching the pattern already used by
setup_event_response(). Override with a non-resolving mock in
test_send_timeout to preserve timeout path coverage.
Suite now completes in <1 second with no --timeout flag.
Why: Five CommandType enum entries had no user-facing SDK method:
SEND_RAW_DATA (25), HAS_CONNECTION (28), GET_CONTACT_BY_KEY (30),
GET_TUNING_PARAMS (43), FACTORY_RESET (51). Added send_raw_data() in
MessagingCommands, has_connection()/get_tuning()/request_factory_reset()/
confirm_factory_reset() in DeviceCommands, get_contact_by_key() in
ContactCommands. FACTORY_RESET uses a two-step token pattern as a
Python-side safety measure against accidental invocation.
Refs: Forensics report finding N09 (also N03 for get_tuning)
Why: Firmware requires strict len > 10 (MyMesh.cpp:1620). When path is
empty, send_trace() builds exactly 10 bytes (cmd+tag+auth+flags), which
is silently rejected. Appending one zero byte when the packet is <= 10
bytes satisfies the firmware gate without changing the semantic content.
Refs: Forensics report finding N05
Why: Three firmware push/response codes had no SDK handler — frames fell
through to the "Unhandled" debug log. CONTACT_DELETED (0x8F) carries a
32-byte pubkey from onContactOverwrite(); CONTACTS_FULL (0x90) is a
1-byte push from onContactsFull(); TUNING_PARAMS (23) is the 9-byte
response to CMD_GET_TUNING_PARAMS carrying rx_delay and airtime_factor.
All three now dispatch typed events. Short frames are guarded.
Refs: Forensics report findings N01, N02, N03
Why: PacketType was missing CONTACTS_FULL (0x90), emitted by
MyMesh::onContactsFull(). CommandType was missing GET_STATS (56),
used by get_stats_core/radio/packets but referenced only as literal
b"\\x38". Adding both enum entries prepares for handler and wrapper
implementations in subsequent commits.
Refs: Forensics report findings N02, R04
10 new tests in tests/unit/test_g5_asyncio_lifecycle.py:
- TestF05: _spawn_background retains tasks in TCP, Serial, and
EventDispatcher; tracked tasks survive gc.collect(); TCP handle_rx
and connection_lost use tracked dispatch.
- TestF07: stop() waits for in-flight async callbacks to complete.
- TestF08: EventDispatcher.queue is None before start(), created on
start(), dispatch() before start() raises RuntimeError;
CommandHandlerBase lock is None before access, created lazily.
- TestF19: send() calls get_running_loop (not get_event_loop).
Refs: Forensics report findings F05, F07, F08, F19
Why: asyncio.get_event_loop() inside an async function emits
DeprecationWarning since Python 3.10 and raises in some contexts on
Python 3.12+. The call in CommandHandlerBase.send() is always inside
a running async context where get_running_loop() is the correct API.
Refs: Forensics report finding F19
Why: On Python 3.9/3.10, asyncio.Queue() and asyncio.Lock() bind to
the running event loop at construction time. If the SDK is instantiated
from a synchronous factory before an event loop exists, both primitives
raise "RuntimeError: ... is bound to a different event loop" on first
use. Fix: EventDispatcher defers Queue creation to start(), with a
guard in dispatch() that raises RuntimeError if called before start().
CommandHandlerBase defers Lock creation via a lazy @property accessor.
Both document the contract change in class docstrings.
Refs: Forensics report finding F08
Why: EventDispatcher._process_events() calls task_done() on the queue
immediately after spawning async callback tasks. await queue.join() in
stop() therefore returns as soon as all items are marked done, even if
their async callbacks are still executing. Any caller that does
"await dispatcher.stop(); cleanup()" could race with still-running
callbacks. Fix: after queue.join(), gather all tracked background tasks
before cancelling the dispatch loop.
Refs: Forensics report finding F07
Why: Python's asyncio holds only weak references to tasks created via
create_task(). Under GC pressure (especially Python < 3.11), unretained
tasks can be silently cancelled mid-execution, and any exceptions are
swallowed as "Task exception was never retrieved." Seven call sites
across TCPConnection, BLEConnection, SerialConnection, and
EventDispatcher used fire-and-forget create_task with no stored
reference. Fix: introduce _background_tasks set and _spawn_background()
helper on each class, following the standard pattern from the asyncio
docs (task.add_done_callback(set.discard)).
Refs: Forensics report finding F05
10 new tests in tests/unit/test_g4_transport_symmetry.py covering all
G4 findings:
- test_g4_tcp_send_write_error_fires_disconnect (F04): TCP write
OSError fires _disconnect_callback.
- test_g4_serial_send_no_transport_fires_disconnect (NEW-A): serial
send on None transport fires _disconnect_callback.
- test_g4_serial_send_write_error_fires_disconnect (F04): serial write
OSError fires _disconnect_callback.
- test_g4_ble_send_no_client_fires_disconnect (F04): BLE send with no
client fires _disconnect_callback.
- test_g4_serial_connect_timeout (F18): connect raises TimeoutError
when connection_made never fires.
- test_g4_tcp_oversize_frame_empty_data_returns (M06): oversize header
with empty trailing data returns without dispatch.
- test_g4_serial_oversize_frame_empty_data_returns (M06): same for
serial transport.
- test_g4_tcp_receive_count_per_frame_not_per_segment (N04): 3 TCP
segments carrying 1 frame yield _receive_count == 1.
- test_g4_tcp_multiple_frames_count_correctly (N04): 2 complete frames
yield _receive_count == 2.
F16 and F17 are covered by the updated pre-existing test in
tests/test_ble_pin_pairing.py (committed with F17).
Refs: Forensics report findings F04, NEW-A, F18, M06, N04
Why: _receive_count incremented inside data_received(), which fires
once per TCP read — not once per completed MeshCore frame. Under TCP
fragmentation a single frame can arrive in multiple segments, inflating
_receive_count relative to _send_count. The disconnect-detection
heuristic (send_count - receive_count >= 5) then never fires because
the receive side is over-counted. Moving the increment into handle_rx
after a complete frame is assembled makes the counter semantically
correct: one increment per MeshCore frame dispatched to the reader.
Refs: Forensics report finding N04
Why: When BLE pairing failed during connect(), the exception was caught
as a warning and connect() continued normally. This left the transport
in a half-usable state — the BLE link was up but the pairing handshake
never completed, so encrypted characteristics could silently fail.
Now the pairing exception disconnects the client and re-raises, giving
the caller a clean failure. Updated the pre-existing
test_ble_connection_with_pin_failed_pairing test to assert the re-raise
behavior instead of the old swallow-and-continue behavior.
Refs: Forensics report finding F17
Why: handle_disconnect resets self.client to self._user_provided_client.
That client's disconnected_callback was not re-registered, so subsequent
BLE disconnects after a successful reconnect cycle were missed —
ConnectionManager never learned the link dropped again. Now re-registers
via set_disconnected_callback with a hasattr guard and try/except for
bleak version compatibility. The next connect() call would also re-create
the client with the callback, but this closes the gap between disconnect
and reconnect.
Refs: Forensics report finding F16
Why: When handle_rx detects an oversize frame header (>300 bytes), it
resets state (header, inframe, frame_expected_size) and re-calls itself
on any remaining data. But when data is empty after the header, the
method fell through to the frame-assembly code with inframe=b"",
eventually dispatching an empty frame to reader.handle_rx. Currently
absorbed by F06's umbrella try/except, but a guaranteed crash if that
guard moves. Adding a bare return after the reset block prevents the
fallthrough in both TCP and serial transports.
Refs: Forensics report finding M06
Why: After create_serial_connection, connect() awaited
_connected_event.wait() with no timeout. If the serial device opened
but connection_made was never called (driver bug, USB adapter glitch),
connect() hung indefinitely. Now wrapped in asyncio.wait_for with a
configurable timeout (default 10s). asyncio.TimeoutError propagates
to the caller for clean failure handling.
Refs: Forensics report finding F18
Why: TCP was the only transport that fired _disconnect_callback on a dead
transport or send failure. Serial and BLE returned silently, leaving
ConnectionManager unaware the link was down. Additionally, transport.write()
(TCP/serial) and write_gatt_char (BLE) had no try/except — any write-time
exception escaped the send() coroutine unhandled. This commit makes all three
transports symmetric: (a) fire _disconnect_callback when transport is None/dead,
(b) wrap the write call in try/except and fire _disconnect_callback on failure.
Refs: Forensics report findings F04, NEW-A
Why: send_appstart() only expected [SELF_INFO]. If firmware returned
RESP_CODE_ERR (version mismatch, unsupported feature flag), wait_for_event
never matched and the command hung until DEFAULT_TIMEOUT (5s) fired.
Bootstrap is called on every initial connect, so a 5s hang on error was
user-visible. Now expects [SELF_INFO, ERROR] so firmware errors are
returned immediately as Event objects instead of timing out.
Refs: Forensics report finding M02
Why: When send_msg() returned an ERROR event (e.g. firmware rejected
the send), the error-check logged the failure but did not return or
continue. Execution fell through to result.payload["expected_ack"],
which raised KeyError because the ERROR payload is {"reason": "..."}.
The retry loop — the entire purpose of this function — never ran.
Now the ERROR path increments attempt counters and continues the
loop, preserving the retry semantics the function name promises.
Refs: Forensics report findings F21, M01
Why: wait_for_event matches a single EventType; when callers pass
[X, ERROR] to send() or wait_for_events, the return value may be an
error response whose payload is {"reason": "..."} — not the command-
specific keys the caller expects. Without a documented contract and
a convenience helper, every call site independently forgets to check
.type before accessing payload keys, leading to KeyError (F21/M01,
M04) or silent fallthrough. The is_error() helper and docstrings on
send()/wait_for_events() establish the contract that subsequent
commits in this branch rely on.
Refs: Forensics report finding F22
Seven new tests in tests/unit/test_connection_manager.py covering all
four G3 findings:
- test_g3_tcp_connect_returns_plain_string (F01): CONNECTED event
payload contains a plain string, not an asyncio.Future.
- test_g3_reconnect_loop_does_not_compound (F03): after max_attempts
failures, exactly that many connect() calls are made — no fan-out.
- test_g3_disconnect_cancels_reconnect_loop (F03): disconnect()
mid-loop cancels the single task cleanly.
- test_g3_reconnect_callback_called_after_reconnect (F02): callback
is invoked after a successful reconnect.
- test_g3_reconnect_callback_failure_does_not_crash_loop (F02):
callback exception is logged, reconnect still succeeds.
- test_g3_connect_none_is_soft_failure (N11): connect() returning
None does not set _is_connected or emit CONNECTED.
- test_g3_no_reconnect_callback_is_noop (N11/F02): no callback
provided — reconnect works, backwards-compatible.
Refs: Forensics report findings F01, F02, F03, N11
ConnectionManager._attempt_reconnect called self.connection.connect()
directly, bypassing MeshCore.connect() which runs send_appstart().
Firmware requires CMD_APP_START after every transport-level connection
to initialize the session. Without it, the reconnected transport has
no active session — sends go unanswered, tcp_no_response fires after
5 attempts, handle_disconnect re-enters _attempt_reconnect, and the
reconnect storm begins.
Fix: add an optional reconnect_callback parameter to
ConnectionManager.__init__. MeshCore passes self._on_reconnect which
calls send_appstart() after the transport reconnects. The callback
is invoked inside _attempt_reconnect immediately after a successful
connect(), before the CONNECTED event is emitted. Callback failures
are logged as warnings but do not break the reconnect — the transport
is up regardless. Default None keeps the API backwards-compatible
for direct ConnectionManager users.
Refs: Forensics report finding F02
_attempt_reconnect previously tail-recursed via asyncio.create_task,
re-assigning self._reconnect_task from inside the running coroutine.
This orphaned the current task — disconnect() cancelled only the
newest pointer, leaving previous-generation attempts in flight. Those
orphaned tasks could set _is_connected = True after the caller thought
the session was closed.
Fix: replace with a single iterative while loop that holds one
persistent task for the entire reconnect session. The task is created
once in handle_disconnect and torn down only on success, max attempts
exhausted, or disconnect() cancellation.
Refs: Forensics report finding F03
The three transports had inconsistent connect() return contracts:
TCP returned an asyncio.Future (fixed by F01), serial returned
self.port (always truthy), BLE returned self.address or None.
ConnectionManager's success check `if result is not None:` was
tautological for TCP and serial. With F01 fixed, the check is now
meaningful for all three.
Add a comprehensive docstring to ConnectionProtocol documenting the
contract: return truthy on success (included in CONNECTED event
payload), return None for soft failure (retry), or raise for hard
failure (also retry, logged). Also import Awaitable for the F02
reconnect_callback type hint that follows.
Refs: Forensics report finding N11
TCPConnection.connect() returned a resolved asyncio.Future wrapping
self.host instead of the plain string. ConnectionManager put this
Future directly into the CONNECTED event payload, which crashed any
downstream serializer (e.g. HA recorder's sanitize_event_data) that
tried to walk the payload dict. BLE and serial already return plain
strings. Fix: delete the Future creation and return self.host
directly.
Refs: Forensics report finding F01
A BATTERY frame with len(data) < 3 caused dbuf.read(2) to return short
bytes; int.from_bytes(b"", ...) silently yielded 0, propagating a bogus
level=0 to HA sensors. Same silent-zero class as N07 (storage fields).
Option B: early-return with debug log, matching the NEW-C pattern for
STATUS_RESPONSE. No BATTERY event is dispatched for malformed frames.
Not in the original forensics report — discovered during G1 N07 work and
logged in issues_log.md. Resolved here because no later branch touches
this handler.
Files changed:
- src/meshcore/reader.py: add `if len(data) < 3: return` guard before
the level read in the BATTERY branch
- tests/unit/test_reader.py: add test_g1_battery_too_short_for_level —
sends a 1-byte frame (type only), asserts no BATTERY event dispatched
and debug log emitted
Adds the four unit tests required by proposal §4.1 verification:
(a) test_g1_handle_rx_malformed_frame_logged_and_swallowed — F06.
Sends a 4-byte CHANNEL_MSG_RECV_V3 frame missing the channel_idx
byte. The handler raises IndexError on dbuf.read(1)[0]; the F06
umbrella must catch it, log "handle_rx parse error" with a
traceback, and return cleanly without dispatching any event.
(b) test_g1_battery_short_frame_omits_storage_fields — N07.
Sends a 3-byte BATTERY frame (type + 2-byte level only). Verifies
that exactly one BATTERY event is dispatched, the payload contains
`level` but does NOT contain `used_kb` or `total_kb`. Pre-fix the
`len(data) > 3` gate would have produced both fields with bogus
silent zeros.
(c) test_g1_status_response_short_frame_skipped — NEW-C.
Sends a 30-byte STATUS_RESPONSE push frame (well below the 60-byte
minimum). Verifies that no STATUS_RESPONSE event is dispatched and
the new "STATUS_RESPONSE push frame too short" debug log fires.
(d) test_g1_parse_packet_payload_txt_type_decodes_high_bits — R02.
Sets up a synthetic channel with a known 16-byte AES key, encrypts
a 16-byte plaintext where byte 4 = (5 << 2) | 1 = 0x15, builds the
full pkt_payload (chan_hash + cipher_mac + ciphertext), wraps it in
a minimal route header, calls parsePacketPayload directly, and
asserts log_data["txt_type"] == 5 and log_data["attempt"] == 1.
Pre-R02-fix `uncrypted[4:4]` was the empty slice, so txt_type was
always 0 — this test would have failed with txt_type=0.
Adds a small _CapturingDispatcher helper class. Adds imports for
logging at the top of the file. The existing test_binary_response is
left untouched.
File: tests/unit/test_reader.py
Three small cleanup fixes bundled per proposal §4.1 commit order.
N08 — CONTROL_DATA empty payload guard. The handler reads
`payload = dbuf.read()` then immediately dereferences `payload[0]`
without checking length. A zero-length payload (firmware truncation
or garbled frame) raises IndexError. Pre-F06 the IndexError would
escape; post-F06 it would log and skip the dispatch via the umbrella.
Adding an explicit `if len(payload) == 0: return` after the read
short-circuits the empty case before it touches `payload[0]`, with a
debug log noting the empty payload. The `return` exits handle_rx
cleanly without engaging the F06 umbrella's parse-error path, which
is the correct behavior — an empty CONTROL_DATA frame is not a parse
error, it's an unusable frame.
F12 — print(res) leftover debug. The RAW_DATA handler had a stray
`print(res)` polluting stdout. Replaced with `logger.debug(res)` to
match the surrounding `logger.debug("Received raw data")` line.
N10 — magic numbers 16 and 17. Two `elif packet_type_value == 16/17`
branches hardcoded the integer values for CONTACT_MSG_RECV_V3 and
CHANNEL_MSG_RECV_V3, both already declared in packets.py:94-95.
Replaced with `PacketType.CONTACT_MSG_RECV_V3.value` and
`PacketType.CHANNEL_MSG_RECV_V3.value` to eliminate drift risk if
the enum is ever renumbered.
Findings: N08 (Info), F12 (Info), N10 (Info)
File: src/meshcore/reader.py
`uncrypted[4:4]` is the empty slice. `int.from_bytes(b"", "little")`
returns 0, so `txt_type` was always 0 for every decrypted channel
message — silently masking the upper 6 bits of byte 4. The line
immediately above (`attempt = uncrypted[4] & 3`) already proves byte
4 is in range, so widening the slice to `[4:5]` is safe.
This is a one-character fix and changes the observable value of
`txt_type` for all callers. Existing consumers that branched on
`txt_type` were effectively dead code; this restores the intended
behavior.
Finding: R02 (Info)
File: src/meshcore/meshcore_parser.py
The ADVERT branch in MeshcorePacketParser.parsePacketPayload reads the
flags byte with `pk_buf.read(1)[0]`, which IndexErrors on a short
advert payload (the minimum advert is 32 + 4 + 64 + 1 = 101 bytes
before any optional fields). Pre-F06, the IndexError would escape as a
swallowed task exception. With F06's umbrella now in place it would
log and skip the dispatch, but the proposal §4.1 NEW-B asks for a
narrower local guard so a malformed advert doesn't poison the rest of
the parse path.
The optional `lat/lon/feat1/feat2` reads after the flags byte also
silently produce zeros on short reads (`int.from_bytes(b"", ...)`
returns 0), which would propagate bogus zero coordinates upstream.
Wrapping the whole branch limits the blast radius to a single
malformed advert.
Wrap the entire body of the ADVERT elif (from `pk_buf = io.BytesIO(...)`
through the final `log_data["adv_feat2"]` assignment) in
`try/except (IndexError, ValueError)` and log a debug message with the
exception type, message, and `pkt_payload` length on failure. This
matches the defensive pattern the proposal specifies.
Finding: NEW-B (S3)
File: src/meshcore/meshcore_parser.py
Add a 2-byte hard-floor length check at the top of
MeshcorePacketParser.parsePacketPayload. The minimum viable payload is 1
header byte + 1 path_byte for a direct route; anything shorter would
crash on `path_byte = pbuf.read(1)[0]` (IndexError on the empty buffer).
The reader.py LOG_DATA branch only requires `len(data) > 3`, so a 4-byte
LOG_DATA frame produces a 1-byte payload here — that path is reachable.
The caller in reader.py dereferences log_data['route_type'],
['payload_type'], ['path_len'], and ['path'] immediately after the
parse, so an empty log_data would KeyError on the dispatch (caught by
the F06 umbrella, but the dispatch would still be skipped). Populate
sentinel values matching the existing route_typename = "UNK" pattern in
the function before the early return so the caller's downstream lookups
don't KeyError, then return early with a debug log.
Finding: R01 (S2)
File: src/meshcore/meshcore_parser.py