Rename test_g5_asyncio_lifecycle.py to test_asyncio_lifecycle.py.
Strip G5 from the module docstring and finding IDs (F05, F07, F08, F19)
from class names and docstrings.
Rename test_g2_error_handling.py to test_error_handling.py. Strip G2
prefix from module docstring, _g2_ from function names, and finding
IDs (F22, F21/M01, M02, M04, N06, F14) from docstrings and section
comments. Remove proposal cross-references.
Strip internal forensics finding references (F01, F02, F03, N11)
from docstrings and section comments. The descriptive text is
preserved — only the ID prefixes are removed.
The mock_dispatcher fixture's fake_subscribe recorded event handlers
but never called them, causing asyncio.wait() to block for the full
DEFAULT_TIMEOUT (15s) on every test that passes expected_events to
send(). With 28 affected tests, the suite wasted ~8 minutes on dead
waits and required an undocumented pytest-timeout plugin to complete.
Add call_soon to the default fake_subscribe so futures resolve on the
next event loop iteration, matching the pattern already used by
setup_event_response(). Override with a non-resolving mock in
test_send_timeout to preserve timeout path coverage.
Suite now completes in <1 second with no --timeout flag.
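
A minimal sketch of the resolving fake_subscribe pattern described
above. The subscribe signature, the SimpleNamespace event stand-in, and
the wait_for() helper are assumptions made for illustration; they are
not the actual fixture or SDK code.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import MagicMock


def make_mock_dispatcher(loop: asyncio.AbstractEventLoop) -> MagicMock:
    """Hypothetical stand-in for the mock_dispatcher fixture."""
    dispatcher = MagicMock()

    def fake_subscribe(event_type, handler, attribute_filters=None):
        # Deliver a synthetic event on the next loop iteration so the
        # future resolved by the handler does not wait for the timeout.
        event = SimpleNamespace(type=event_type, payload={})
        loop.call_soon(handler, event)
        return MagicMock()  # stands in for a subscription object

    dispatcher.subscribe.side_effect = fake_subscribe
    return dispatcher


async def wait_for(dispatcher, event_type, timeout=15.0):
    """Simplified model of how a send() call waits on an expected event."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def handler(event):
        if not future.done():
            future.set_result(event)

    dispatcher.subscribe(event_type, handler)
    return await asyncio.wait_for(future, timeout)


async def demo():
    dispatcher = make_mock_dispatcher(asyncio.get_running_loop())
    return await wait_for(dispatcher, "OK")
```

A test that needs the timeout path would replace subscribe with a mock
that records the handler and never calls it.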
10 new tests in tests/unit/test_g5_asyncio_lifecycle.py:
- TestF05: _spawn_background retains tasks in TCP, Serial, and
EventDispatcher; tracked tasks survive gc.collect(); TCP handle_rx
and connection_lost use tracked dispatch.
- TestF07: stop() waits for in-flight async callbacks to complete.
- TestF08: EventDispatcher.queue is None before start(), created on
start(), dispatch() before start() raises RuntimeError;
CommandHandlerBase lock is None before access, created lazily.
- TestF19: send() calls get_running_loop (not get_event_loop).
Refs: Forensics report findings F05, F07, F08, F19
Why: asyncio.get_event_loop() emits DeprecationWarning since Python
3.10 when called without a running event loop, and raises in some
contexts on Python 3.12+. The call in CommandHandlerBase.send() is
always inside a running async context, where get_running_loop() is the
correct API.
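
For illustration, a small sketch of how get_running_loop() behaves in
and out of a coroutine. The function names are hypothetical and not
part of the SDK.

```python
import asyncio


async def make_future_in_coroutine() -> asyncio.Future:
    # Inside a coroutine there is always a running loop, so
    # get_running_loop() is unambiguous and never creates a new loop.
    loop = asyncio.get_running_loop()
    return loop.create_future()


def called_without_loop() -> bool:
    # Outside a running loop, get_running_loop() fails fast with
    # RuntimeError instead of creating or guessing a loop.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return True
    return False
```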
Refs: Forensics report finding F19
Why: On Python 3.9/3.10, asyncio.Queue() and asyncio.Lock() bind to
the running event loop at construction time. If the SDK is instantiated
from a synchronous factory before an event loop exists, both primitives
raise "RuntimeError: ... is bound to a different event loop" on first
use. Fix: EventDispatcher defers Queue creation to start(), with a
guard in dispatch() that raises RuntimeError if called before start().
CommandHandlerBase defers Lock creation via a lazy @property accessor.
Both document the contract change in class docstrings.
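
The deferred-construction approach can be sketched roughly as follows.
Class names ending in Sketch are illustrative stand-ins, and the exact
attribute names in the SDK may differ.

```python
import asyncio
from typing import Optional


class EventDispatcherSketch:
    """Illustrative only: the queue is created in start(), not __init__()."""

    def __init__(self) -> None:
        self.queue: Optional[asyncio.Queue] = None

    async def start(self) -> None:
        # Created while the loop is running, so it binds to that loop.
        self.queue = asyncio.Queue()

    async def dispatch(self, event) -> None:
        if self.queue is None:
            raise RuntimeError("dispatch() called before start()")
        await self.queue.put(event)


class CommandHandlerSketch:
    """Illustrative only: the lock is created on first access."""

    def __init__(self) -> None:
        self._lock: Optional[asyncio.Lock] = None

    @property
    def lock(self) -> asyncio.Lock:
        if self._lock is None:
            self._lock = asyncio.Lock()
        return self._lock
```

Both objects can be constructed from synchronous code, since no asyncio
primitive exists until a coroutine first needs it.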
Refs: Forensics report finding F08
Why: EventDispatcher._process_events() calls task_done() on the queue
immediately after spawning async callback tasks. await queue.join() in
stop() therefore returns as soon as all items are marked done, even if
their async callbacks are still executing. Any caller that does
"await dispatcher.stop(); cleanup()" could race with still-running
callbacks. Fix: after queue.join(), gather all tracked background tasks
before cancelling the dispatch loop.
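
A simplified model of the stop() ordering described above. The class is
a hypothetical stand-in for EventDispatcher, reduced to one callback.

```python
import asyncio


class DispatcherStopSketch:
    """Illustrative model of the stop() ordering; not the SDK class."""

    def __init__(self, callback) -> None:
        self._callback = callback
        self._background_tasks = set()

    async def start(self) -> None:
        self.queue = asyncio.Queue()
        self._loop_task = asyncio.create_task(self._process_events())

    async def _process_events(self) -> None:
        while True:
            event = await self.queue.get()
            task = asyncio.create_task(self._callback(event))
            self._background_tasks.add(task)
            task.add_done_callback(self._background_tasks.discard)
            # Marked done once the callback is spawned, not finished.
            self.queue.task_done()

    async def stop(self) -> None:
        await self.queue.join()
        # Wait for in-flight callbacks before tearing down the loop task.
        if self._background_tasks:
            await asyncio.gather(*self._background_tasks,
                                 return_exceptions=True)
        self._loop_task.cancel()
        try:
            await self._loop_task
        except asyncio.CancelledError:
            pass
```

Without the gather step, stop() would return while the callback is
still sleeping, which is the race the commit closes.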
Refs: Forensics report finding F07
Why: Python's asyncio holds only weak references to tasks created via
create_task(). Under GC pressure (especially Python < 3.11), unretained
tasks can be garbage-collected mid-execution, and any exceptions
surface only as "Task exception was never retrieved." Seven call sites
across TCPConnection, BLEConnection, SerialConnection, and
EventDispatcher used fire-and-forget create_task with no stored
reference. Fix: introduce a _background_tasks set and a
_spawn_background() helper on each class, following the standard
pattern from the asyncio docs
(task.add_done_callback(self._background_tasks.discard)).
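
A minimal sketch of the retained-task pattern. The mixin class is an
assumption made for illustration; the commit adds the set and helper
directly to each class.

```python
import asyncio
from typing import Coroutine, Set


class BackgroundTaskMixin:
    """Illustrative helper; attribute names follow the commit message."""

    def __init__(self) -> None:
        self._background_tasks: Set[asyncio.Task] = set()

    def _spawn_background(self, coro: Coroutine) -> asyncio.Task:
        task = asyncio.get_running_loop().create_task(coro)
        # A strong reference keeps the task alive until it finishes.
        self._background_tasks.add(task)
        # Drop the reference automatically once the task completes.
        task.add_done_callback(self._background_tasks.discard)
        return task
```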
Refs: Forensics report finding F05
Why: send_appstart() only expected [SELF_INFO]. If firmware returned
RESP_CODE_ERR (version mismatch, unsupported feature flag), wait_for_event
never matched and the command hung until DEFAULT_TIMEOUT (5s) fired.
Bootstrap is called on every initial connect, so a 5s hang on error was
user-visible. Now expects [SELF_INFO, ERROR] so firmware errors are
returned immediately as Event objects instead of timing out.
Refs: Forensics report finding M02
Why: When send_msg() returned an ERROR event (e.g. firmware rejected
the send), the error-check logged the failure but did not return or
continue. Execution fell through to result.payload["expected_ack"],
which raised KeyError because the ERROR payload is {"reason": "..."}.
The retry loop — the entire purpose of this function — never ran.
Now the ERROR path increments attempt counters and continues the
loop, preserving the retry semantics the function name promises.
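
The corrected control flow can be sketched as below. EventType, Event,
and send_with_retry are hypothetical stand-ins; only the "expected_ack"
and "reason" payload keys come from the description above.

```python
import asyncio
from dataclasses import dataclass, field
from enum import Enum, auto


class EventType(Enum):  # hypothetical stand-in for the SDK enum
    MSG_SENT = auto()
    ERROR = auto()


@dataclass
class Event:  # hypothetical stand-in for the SDK Event
    type: EventType
    payload: dict = field(default_factory=dict)


async def send_with_retry(send_msg, max_attempts: int = 3):
    """Retry until send_msg() yields a non-ERROR event or attempts run out."""
    attempt = 0
    while attempt < max_attempts:
        result = await send_msg()
        if result.type == EventType.ERROR:
            # Count the failed attempt and retry instead of falling
            # through to payload keys that an ERROR event does not have.
            attempt += 1
            continue
        return result.payload["expected_ack"]
    return None
```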
Refs: Forensics report findings F21, M01
Why: wait_for_event matches a single EventType; when callers pass
[X, ERROR] to send() or wait_for_events, the return value may be an
error response whose payload is {"reason": "..."}, not the
command-specific keys the caller expects. Without a documented
contract and a convenience helper, each call site has to remember on
its own to check .type before accessing payload keys; forgetting
leads to KeyError (F21/M01, M04) or silent fallthrough. The is_error()
helper and docstrings on
send()/wait_for_events() establish the contract that subsequent
commits in this branch rely on.
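
A sketch of how a call site might use such a helper. Whether is_error()
is a method on Event or a free function is an assumption here, as are
the stand-in Event and EventType definitions and the "name" key.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class EventType(Enum):  # hypothetical stand-in for the SDK enum
    SELF_INFO = auto()
    ERROR = auto()


@dataclass
class Event:  # hypothetical stand-in for the SDK Event
    type: EventType
    payload: dict = field(default_factory=dict)

    def is_error(self) -> bool:
        return self.type == EventType.ERROR


def describe(result: Event) -> str:
    # Check the type before touching command-specific payload keys.
    if result.is_error():
        return "error: " + result.payload.get("reason", "unknown")
    return "name: " + result.payload["name"]
```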
Refs: Forensics report finding F22
Seven new tests in tests/unit/test_connection_manager.py covering all
four G3 findings:
- test_g3_tcp_connect_returns_plain_string (F01): CONNECTED event
payload contains a plain string, not an asyncio.Future.
- test_g3_reconnect_loop_does_not_compound (F03): after max_attempts
failures, exactly that many connect() calls are made — no fan-out.
- test_g3_disconnect_cancels_reconnect_loop (F03): disconnect()
mid-loop cancels the single task cleanly.
- test_g3_reconnect_callback_called_after_reconnect (F02): callback
is invoked after a successful reconnect.
- test_g3_reconnect_callback_failure_does_not_crash_loop (F02):
callback exception is logged, reconnect still succeeds.
- test_g3_connect_none_is_soft_failure (N11): connect() returning
None does not set _is_connected or emit CONNECTED.
- test_g3_no_reconnect_callback_is_noop (N11/F02): no callback
provided — reconnect works, backwards-compatible.
Refs: Forensics report findings F01, F02, F03, N11
ConnectionManager._attempt_reconnect called self.connection.connect()
directly, bypassing MeshCore.connect() which runs send_appstart().
Firmware requires CMD_APP_START after every transport-level connection
to initialize the session. Without it, the reconnected transport has
no active session — sends go unanswered, tcp_no_response fires after
5 attempts, handle_disconnect re-enters _attempt_reconnect, and the
reconnect storm begins.
Fix: add an optional reconnect_callback parameter to
ConnectionManager.__init__. MeshCore passes self._on_reconnect which
calls send_appstart() after the transport reconnects. The callback
is invoked inside _attempt_reconnect immediately after a successful
connect(), before the CONNECTED event is emitted. Callback failures
are logged as warnings but do not break the reconnect — the transport
is up regardless. Default None keeps the API backwards-compatible
for direct ConnectionManager users.
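
One reconnect attempt with the optional callback might look roughly
like this. reconnect_once() is a hypothetical free function used for
illustration; in the SDK the logic lives inside _attempt_reconnect.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Optional

logger = logging.getLogger(__name__)


async def reconnect_once(
    connect: Callable[[], Awaitable[Optional[str]]],
    reconnect_callback: Optional[Callable[[], Awaitable[None]]] = None,
) -> Optional[str]:
    """One reconnect attempt; returns the connect() result or None."""
    result = await connect()
    if result is None:
        return None
    if reconnect_callback is not None:
        try:
            # For example, re-run the application-level handshake.
            await reconnect_callback()
        except Exception as exc:
            # The transport is up regardless, so log and carry on.
            logger.warning("reconnect callback failed: %s", exc)
    return result
```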
Refs: Forensics report finding F02
_attempt_reconnect previously tail-recursed via asyncio.create_task,
re-assigning self._reconnect_task from inside the running coroutine.
This orphaned the current task — disconnect() cancelled only the
newest pointer, leaving previous-generation attempts in flight. Those
orphaned tasks could set _is_connected = True after the caller thought
the session was closed.
Fix: replace with a single iterative while loop that holds one
persistent task for the entire reconnect session. The task is created
once in handle_disconnect and torn down only on success, max attempts
exhausted, or disconnect() cancellation.
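
A simplified sketch of the single-task iterative loop. ReconnectSketch
is a hypothetical stand-in for ConnectionManager; the delay parameter
and the handling of exceptions as retries are assumptions.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class ReconnectSketch:
    """Illustrative single-task reconnect loop; not the SDK class."""

    def __init__(self, connect: Callable[[], Awaitable[Optional[str]]],
                 max_attempts: int = 3, delay: float = 0.0) -> None:
        self._connect = connect
        self._max_attempts = max_attempts
        self._delay = delay
        self._is_connected = False
        self._reconnect_task: Optional[asyncio.Task] = None

    def handle_disconnect(self) -> None:
        self._is_connected = False
        # One task per reconnect session; never re-assigned from inside.
        if self._reconnect_task is None or self._reconnect_task.done():
            self._reconnect_task = asyncio.create_task(
                self._attempt_reconnect())

    async def _attempt_reconnect(self) -> None:
        attempts = 0
        while attempts < self._max_attempts:
            attempts += 1
            try:
                result = await self._connect()
            except Exception:
                result = None  # hard failure, treated as a retry here
            if result is not None:
                self._is_connected = True
                return
            await asyncio.sleep(self._delay)

    async def disconnect(self) -> None:
        task, self._reconnect_task = self._reconnect_task, None
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
        self._is_connected = False
```

Because only one task exists per session, cancelling it in disconnect()
leaves nothing behind that could later set _is_connected.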
Refs: Forensics report finding F03
The three transports had inconsistent connect() return contracts:
TCP returned an asyncio.Future (fixed by F01), serial returned
self.port (always truthy), BLE returned self.address or None.
ConnectionManager's success check `if result is not None:` was
tautological for TCP and serial. With F01 fixed, the check is now
meaningful for all three.
Add a comprehensive docstring to ConnectionProtocol documenting the
contract: return truthy on success (included in CONNECTED event
payload), return None for soft failure (retry), or raise for hard
failure (also retry, logged). Also import Awaitable for the F02
reconnect_callback type hint that follows.
Refs: Forensics report finding N11
TCPConnection.connect() returned a resolved asyncio.Future wrapping
self.host instead of the plain string. ConnectionManager put this
Future directly into the CONNECTED event payload, which crashed any
downstream serializer (e.g. HA recorder's sanitize_event_data) that
tried to walk the payload dict. BLE and serial already return plain
strings. Fix: delete the Future creation and return self.host
directly.
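
To illustrate why a Future in the payload breaks serializers, here is a
small sketch using json.dumps as a stand-in for a downstream
serializer. The "connection_info" key and the example address are
assumptions, not the SDK's actual payload shape.

```python
import asyncio
import json


async def payload_with_future() -> dict:
    # Shape before the fix: a resolved Future wrapping the host.
    fut = asyncio.get_running_loop().create_future()
    fut.set_result("192.0.2.10")
    return {"connection_info": fut}


def payload_with_string() -> dict:
    # Shape after the fix: the plain host string.
    return {"connection_info": "192.0.2.10"}


def is_json_serializable(payload: dict) -> bool:
    try:
        json.dumps(payload)
    except TypeError:
        return False
    return True
```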
Refs: Forensics report finding F01
- Update mock dispatcher to use subscribe-before-send pattern matching
the rewritten CommandHandler.send() method
- Use 32-byte pubkeys in tests for commands that now require
prefix_length=32 (login, logout, statusreq, reset_path, share/export/remove contact)
- Fix send_trace test path format to match flags=1 (2-byte path hashes)
- Update LPP current test to expect signed wrap for values > 32.767
- Fix BinaryReqType import (moved from meshcore.parsing to meshcore.packets)
- Fix register_binary_request call signature (added pubkey_prefix param)
- Update timeout test to expect 'no_event_received' instead of 'timeout'
Add warnings to send_login, send_statusreq, send_telemetry_req, and
send_path_discovery pointing users to their _sync counterparts. The
fire-and-forget versions bypass the mesh request lock and can cause
silent response drops due to firmware clearPendingReqs() behavior.
The companion firmware can only track one outstanding mesh request at a
time — clearPendingReqs() zeros all pending response flags before each
outgoing mesh request. Overlapping mesh commands cause silent response
drops.
Add _mesh_request_lock to CommandHandlerBase and wrap all _sync
methods with it. Also add send_login_sync and send_path_discovery_sync
for complete round-trip serialization of those commands.
Local commands (get_bat, get_channel, set_time, send_msg, etc.) are
unaffected — they don't trigger clearPendingReqs() on the firmware.
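
A rough sketch of round-trip serialization with a shared lock. The
class, the status_sync and telemetry_sync method names, and the
simulated round trip are hypothetical; only _mesh_request_lock comes
from the description above.

```python
import asyncio
from typing import List, Optional


class MeshCommandsSketch:
    """Illustrative only: serializes full request/response round trips."""

    def __init__(self) -> None:
        self._lock: Optional[asyncio.Lock] = None
        self.log: List[str] = []

    @property
    def _mesh_request_lock(self) -> asyncio.Lock:
        if self._lock is None:
            self._lock = asyncio.Lock()
        return self._lock

    async def _round_trip(self, name: str) -> str:
        self.log.append(f"send {name}")
        await asyncio.sleep(0.01)  # stands in for waiting on the reply
        self.log.append(f"recv {name}")
        return name

    async def status_sync(self) -> str:
        # Hold the lock for the whole round trip, not just the send.
        async with self._mesh_request_lock:
            return await self._round_trip("status")

    async def telemetry_sync(self) -> str:
        async with self._mesh_request_lock:
            return await self._round_trip("telemetry")
```

Running both calls concurrently still produces strictly sequential
send/receive pairs, so a second request cannot clear the pending flags
of the first.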