Commit Graph

320 Commits

Author SHA1 Message Date
fdlamotte
1c08569f07 Merge pull request #75 from mwolter805/fix/reconnect-path
fix: resolve reconnect storm — TCP Future return, missing appstart, task overwrite race
2026-04-25 15:02:30 +02:00
fdlamotte
df6cec1d0b Merge pull request #78 from mwolter805/fix/asyncio-lifecycle
fix: track background tasks, defer Queue/Lock construction, use get_running_loop
2026-04-25 15:00:44 +02:00
fdlamotte
ba6dcd459e Merge pull request #73 from mwolter805/fix/test-timeout-waste
test: resolve mock_dispatcher futures to drop suite runtime from ~8 min to <1s
2026-04-25 14:53:27 +02:00
Florent
ac7b035b9e solve issue when login fails 2026-04-21 08:03:33 +02:00
Matthew Wolter
4fddbffa3d Remove internal references from asyncio lifecycle tests
Rename test_g5_asyncio_lifecycle.py to test_asyncio_lifecycle.py.
Strip G5 from module docstring, finding IDs (F05, F07, F08, F19)
from class names and docstrings.
2026-04-12 07:57:23 -07:00
Matthew Wolter
f3aa131019 Remove finding IDs from test_connection_manager.py
Strip internal forensics finding references (F01, F02, F03, N11)
from docstrings and section comments. The descriptive text is
preserved — only the ID prefixes are removed.
2026-04-12 07:54:02 -07:00
Matthew Wolter
9e2fc0d63e Remove internal G-numbering from test_connection_manager.py
Strip G3 from module docstring and _g3_ from function names.
Finding IDs (F01, F02, F03, N11) are preserved.
2026-04-12 07:53:10 -07:00
Matthew Wolter
75c4a58841 Fix test fixture to resolve events immediately instead of blocking
The mock_dispatcher fixture's fake_subscribe recorded event handlers
but never called them, causing asyncio.wait() to block for the full
DEFAULT_TIMEOUT (15s) on every test that passes expected_events to
send(). With 28 affected tests, the suite wasted ~8 minutes on dead
waits and required an undocumented pytest-timeout plugin to complete.

Add call_soon to the default fake_subscribe so futures resolve on the
next event loop iteration, matching the pattern already used by
setup_event_response(). Override with a non-resolving mock in
test_send_timeout to preserve timeout path coverage.

Suite now completes in <1 second with no --timeout flag.
2026-04-12 06:57:53 -07:00
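The fixture change described in 75c4a58841 can be illustrated with a minimal sketch. The names `make_fake_subscribe`, `wait_for_event`, and the handler signature are hypothetical and are not taken from the test suite; only the idea of scheduling the recorded handler with `call_soon` comes from the commit message.

```python
import asyncio


def make_fake_subscribe(event):
    """Build a fake subscribe() that records handlers and fires them soon.

    Hypothetical helper: the real fixture's signature is not shown here.
    """
    handlers = []

    def fake_subscribe(event_type, handler):
        handlers.append(handler)
        # Resolve on the next event loop iteration instead of never.
        asyncio.get_running_loop().call_soon(handler, event)

    return fake_subscribe, handlers


async def wait_for_event(subscribe, timeout):
    """Stand-in for a send() that waits for one expected event."""
    loop = asyncio.get_running_loop()
    future = loop.create_future()

    def on_event(evt):
        if not future.done():
            future.set_result(evt)

    subscribe("OK", on_event)
    done, _ = await asyncio.wait([future], timeout=timeout)
    return future.result() if done else None


async def demo():
    fake_subscribe, _handlers = make_fake_subscribe("fake-ok")
    # The 15 s timeout is never reached because the handler fires at once.
    return await wait_for_event(fake_subscribe, timeout=15)
```

A fake that records the handler without scheduling it would make `wait_for_event` block for the full timeout, which is the behavior the commit removes for every test except the timeout test.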
Matthew Wolter
7b459aa6a5 G5: add verification tests for F05, F07, F08, F19
10 new tests in tests/unit/test_g5_asyncio_lifecycle.py:

- TestF05: _spawn_background retains tasks in TCP, Serial, and
  EventDispatcher; tracked tasks survive gc.collect(); TCP handle_rx
  and connection_lost use tracked dispatch.
- TestF07: stop() waits for in-flight async callbacks to complete.
- TestF08: EventDispatcher.queue is None before start(), created on
  start(), dispatch() before start() raises RuntimeError;
  CommandHandlerBase lock is None before access, created lazily.
- TestF19: send() calls get_running_loop (not get_event_loop).

Refs: Forensics report findings F05, F07, F08, F19
2026-04-12 03:57:35 -07:00
Matthew Wolter
1b404221a2 G5: F19 — replace deprecated get_event_loop with get_running_loop
Why: asyncio.get_event_loop() inside an async function emits
DeprecationWarning since Python 3.10 and raises in some contexts on
Python 3.12+. The call in CommandHandlerBase.send() is always inside
a running async context where get_running_loop() is the correct API.

Refs: Forensics report finding F19
2026-04-12 03:57:20 -07:00
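A small sketch of the API choice in 1b404221a2, assuming nothing about `CommandHandlerBase.send()` beyond what the message states. `asyncio.get_running_loop()` returns the loop only when called from a running async context and raises `RuntimeError` otherwise. The function names are illustrative.

```python
import asyncio


async def make_future():
    # Inside a coroutine there is always a running loop to ask for.
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    loop.call_soon(future.set_result, "ok")
    return await future


def outside_loop():
    # Called from synchronous code with no loop running.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return "no running loop"
    return "unexpected"
```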
Matthew Wolter
b4cd5840ab G5: F08 — defer asyncio.Queue and asyncio.Lock construction
Why: On Python 3.9/3.10, asyncio.Queue() and asyncio.Lock() bind to
the running event loop at construction time. If the SDK is instantiated
from a synchronous factory before an event loop exists, both primitives
raise "RuntimeError: ... is bound to a different event loop" on first
use. Fix: EventDispatcher defers Queue creation to start(), with a
guard in dispatch() that raises RuntimeError if called before start().
CommandHandlerBase defers Lock creation via a lazy @property accessor.
Both document the contract change in class docstrings.

Refs: Forensics report finding F08
2026-04-12 03:57:06 -07:00
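The deferred-construction approach from b4cd5840ab might look roughly like the following. `LazyDispatcher` is a hypothetical class that merges the two ideas (a queue created in `start()` and a lazily created lock) into one place for brevity; it is not the SDK's `EventDispatcher` or `CommandHandlerBase`. The exact loop-binding behavior of `asyncio.Queue` and `asyncio.Lock` differs between Python versions, so treat this as a sketch of the pattern only.

```python
import asyncio
from typing import Optional


class LazyDispatcher:
    """Hypothetical sketch: no asyncio primitives are built in __init__."""

    def __init__(self) -> None:
        self.queue: Optional[asyncio.Queue] = None
        self._lock: Optional[asyncio.Lock] = None

    @property
    def lock(self) -> asyncio.Lock:
        # Created on first access, which happens inside a running loop.
        if self._lock is None:
            self._lock = asyncio.Lock()
        return self._lock

    async def start(self) -> None:
        self.queue = asyncio.Queue()

    async def dispatch(self, event) -> None:
        if self.queue is None:
            raise RuntimeError("dispatch() called before start()")
        await self.queue.put(event)


async def demo(dispatcher: LazyDispatcher):
    try:
        await dispatcher.dispatch("early")
        raised_early = False
    except RuntimeError:
        raised_early = True
    await dispatcher.start()
    await dispatcher.dispatch("event")
    async with dispatcher.lock:
        pass
    return raised_early, dispatcher.queue.qsize()
```

The object can be built from synchronous code, for example `d = LazyDispatcher()` followed later by `asyncio.run(demo(d))`, without touching any event loop at construction time.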
Matthew Wolter
d4581a8e13 G5: F07 — await in-flight async callbacks before stop() returns
Why: EventDispatcher._process_events() calls task_done() on the queue
immediately after spawning async callback tasks. await queue.join() in
stop() therefore returns as soon as all items are marked done, even if
their async callbacks are still executing. Any caller that does
"await dispatcher.stop(); cleanup()" could race with still-running
callbacks. Fix: after queue.join(), gather all tracked background tasks
before cancelling the dispatch loop.

Refs: Forensics report finding F07
2026-04-12 03:56:28 -07:00
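The race described in d4581a8e13 and its fix can be sketched as follows. `MiniDispatcher` is a hypothetical, stripped-down stand-in: it marks queue items done right after spawning the callback task, so `stop()` has to gather the tracked tasks after `queue.join()` before it cancels the loop.

```python
import asyncio


class MiniDispatcher:
    """Hypothetical sketch of a dispatcher whose stop() waits for callbacks."""

    def __init__(self, callback) -> None:
        self._callback = callback
        self._background_tasks = set()
        self._queue = None
        self._loop_task = None

    async def start(self) -> None:
        self._queue = asyncio.Queue()
        self._loop_task = asyncio.create_task(self._process_events())

    async def _process_events(self) -> None:
        while True:
            event = await self._queue.get()
            task = asyncio.create_task(self._callback(event))
            self._background_tasks.add(task)
            task.add_done_callback(self._background_tasks.discard)
            # Marked done while the callback may still be running.
            self._queue.task_done()

    async def stop(self) -> None:
        await self._queue.join()
        # join() alone is not enough: wait for in-flight callbacks too.
        if self._background_tasks:
            await asyncio.gather(*self._background_tasks, return_exceptions=True)
        self._loop_task.cancel()
        try:
            await self._loop_task
        except asyncio.CancelledError:
            pass


async def demo():
    seen = []

    async def slow_callback(event) -> None:
        await asyncio.sleep(0.01)
        seen.append(event)

    dispatcher = MiniDispatcher(slow_callback)
    await dispatcher.start()
    await dispatcher._queue.put("a")
    await dispatcher._queue.put("b")
    await dispatcher.stop()
    # Both callbacks have finished by the time stop() returns.
    return sorted(seen)
```

Without the `gather` step, `stop()` would return while `slow_callback` is still sleeping and `seen` would be empty at that point.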
Matthew Wolter
26141d0353 G5: F05 — track fire-and-forget asyncio.create_task references
Why: Python's asyncio holds only weak references to tasks created via
create_task(). Under GC pressure (especially Python < 3.11), unretained
tasks can be silently cancelled mid-execution, and any exceptions are
swallowed as "Task exception was never retrieved." Seven call sites
across TCPConnection, BLEConnection, SerialConnection, and
EventDispatcher used fire-and-forget create_task with no stored
reference. Fix: introduce _background_tasks set and _spawn_background()
helper on each class, following the standard pattern from the asyncio
docs (task.add_done_callback(set.discard)).

Refs: Forensics report finding F05
2026-04-12 03:56:09 -07:00
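The task-tracking pattern named in 26141d0353 follows the approach shown in the asyncio documentation: keep a strong reference in a set and discard it from a done callback. The class below is a hypothetical stand-in for the connection and dispatcher classes; only the names `_background_tasks` and `_spawn_background` come from the commit message.

```python
import asyncio


class BackgroundTaskOwner:
    """Hypothetical stand-in for a connection or dispatcher class."""

    def __init__(self) -> None:
        # Strong references keep pending tasks alive until they finish.
        self._background_tasks = set()

    def _spawn_background(self, coro) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._background_tasks.add(task)
        # Drop the reference once the task has completed.
        task.add_done_callback(self._background_tasks.discard)
        return task


async def demo():
    owner = BackgroundTaskOwner()

    async def work() -> str:
        await asyncio.sleep(0)
        return "done"

    task = owner._spawn_background(work())
    tracked_while_pending = len(owner._background_tasks)
    result = await task
    await asyncio.sleep(0)  # give the done callback a chance to run
    return tracked_while_pending, len(owner._background_tasks), result
```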
Matthew Wolter
073fa26aa0 G3: add verification tests for F01, F02, F03, N11
Seven new tests in tests/unit/test_connection_manager.py covering all
four G3 findings:

- test_g3_tcp_connect_returns_plain_string (F01): CONNECTED event
  payload contains a plain string, not an asyncio.Future.
- test_g3_reconnect_loop_does_not_compound (F03): after max_attempts
  failures, exactly that many connect() calls are made — no fan-out.
- test_g3_disconnect_cancels_reconnect_loop (F03): disconnect()
  mid-loop cancels the single task cleanly.
- test_g3_reconnect_callback_called_after_reconnect (F02): callback
  is invoked after a successful reconnect.
- test_g3_reconnect_callback_failure_does_not_crash_loop (F02):
  callback exception is logged, reconnect still succeeds.
- test_g3_connect_none_is_soft_failure (N11): connect() returning
  None does not set _is_connected or emit CONNECTED.
- test_g3_no_reconnect_callback_is_noop (N11/F02): no callback
  provided — reconnect works, backwards-compatible.

Refs: Forensics report findings F01, F02, F03, N11
2026-04-11 19:48:02 -07:00
Matthew Wolter
ae0aa33dc8 G3: F02 — inject reconnect callback for send_appstart after reconnect
ConnectionManager._attempt_reconnect called self.connection.connect()
directly, bypassing MeshCore.connect() which runs send_appstart().
Firmware requires CMD_APP_START after every transport-level connection
to initialize the session.  Without it, the reconnected transport has
no active session — sends go unanswered, tcp_no_response fires after
5 attempts, handle_disconnect re-enters _attempt_reconnect, and the
reconnect storm begins.

Fix: add an optional reconnect_callback parameter to
ConnectionManager.__init__.  MeshCore passes self._on_reconnect which
calls send_appstart() after the transport reconnects.  The callback
is invoked inside _attempt_reconnect immediately after a successful
connect(), before the CONNECTED event is emitted.  Callback failures
are logged as warnings but do not break the reconnect — the transport
is up regardless.  Default None keeps the API backwards-compatible
for direct ConnectionManager users.

Refs: Forensics report finding F02
2026-04-11 19:47:49 -07:00
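The callback ordering described in ae0aa33dc8 can be sketched as below. `reconnect_once`, `emit`, and the logger name are hypothetical; the sketch only shows an optional async callback that runs after a successful `connect()` and before the CONNECTED event, with failures logged rather than raised.

```python
import asyncio
import logging

logger = logging.getLogger("reconnect_sketch")


async def reconnect_once(connect, emit, reconnect_callback=None) -> bool:
    """Hypothetical single reconnect attempt with an optional callback."""
    result = await connect()
    if result is None:
        return False  # soft failure: the caller may retry
    if reconnect_callback is not None:
        try:
            await reconnect_callback()
        except Exception as exc:
            # The transport is up regardless, so only warn.
            logger.warning("reconnect callback failed: %s", exc)
    emit("CONNECTED", result)
    return True


async def demo():
    order = []

    async def connect():
        return "host:5000"

    async def on_reconnect():
        order.append("appstart")

    async def failing_callback():
        raise ValueError("session init failed")

    def emit(name, payload):
        order.append(name)

    ok = await reconnect_once(connect, emit, on_reconnect)
    ok_after_failure = await reconnect_once(connect, emit, failing_callback)
    ok_without_callback = await reconnect_once(connect, emit)
    return ok, ok_after_failure, ok_without_callback, order
```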
Matthew Wolter
ab4c27dcae G3: F03 — restructure _attempt_reconnect from tail-recursion to loop
_attempt_reconnect previously tail-recursed via asyncio.create_task,
re-assigning self._reconnect_task from inside the running coroutine.
This orphaned the current task — disconnect() cancelled only the
newest pointer, leaving previous-generation attempts in flight.  Those
orphaned tasks could set _is_connected = True after the caller thought
the session was closed.

Fix: replace with a single iterative while loop that holds one
persistent task for the entire reconnect session.  The task is created
once in handle_disconnect and torn down only on success, max attempts
exhausted, or disconnect() cancellation.

Refs: Forensics report finding F03
2026-04-11 19:47:12 -07:00
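A minimal sketch of the single-task loop described in ab4c27dcae, under the assumption that a `connect()` returning `None` counts as a failed attempt. `ReconnectSketch` and its constructor parameters are hypothetical and do not mirror `ConnectionManager`'s actual signature.

```python
import asyncio


class ReconnectSketch:
    """Hypothetical sketch: one persistent task per reconnect session."""

    def __init__(self, connect, max_attempts: int = 3, delay: float = 0.0) -> None:
        self._connect = connect
        self._max_attempts = max_attempts
        self._delay = delay
        self._is_connected = False
        self._reconnect_task = None

    def handle_disconnect(self) -> None:
        self._is_connected = False
        # The task is created once here; the loop never re-assigns it.
        if self._reconnect_task is None or self._reconnect_task.done():
            self._reconnect_task = asyncio.create_task(self._attempt_reconnect())

    async def _attempt_reconnect(self) -> None:
        for _attempt in range(self._max_attempts):
            try:
                result = await self._connect()
            except Exception:
                result = None  # count as a failed attempt and retry
            if result is not None:
                self._is_connected = True
                return
            await asyncio.sleep(self._delay)

    async def disconnect(self) -> None:
        task = self._reconnect_task
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass
        self._is_connected = False


async def demo_exhausted():
    calls = 0

    async def always_fail():
        nonlocal calls
        calls += 1
        return None

    manager = ReconnectSketch(always_fail, max_attempts=3)
    manager.handle_disconnect()
    await manager._reconnect_task
    return calls, manager._is_connected


async def demo_cancelled():
    async def hang():
        await asyncio.sleep(60)

    manager = ReconnectSketch(hang)
    manager.handle_disconnect()
    await asyncio.sleep(0)  # let the loop reach its first connect()
    await manager.disconnect()
    return manager._reconnect_task.cancelled(), manager._is_connected
```

Because there is only ever one task, cancelling `_reconnect_task` in `disconnect()` stops the whole session and no earlier attempt can flip `_is_connected` afterwards.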
Matthew Wolter
115a402ac2 G3: N11 — document transport connect() return contract
The three transports had inconsistent connect() return contracts:
TCP returned an asyncio.Future (fixed by F01), serial returned
self.port (always truthy), BLE returned self.address or None.
ConnectionManager's success check `if result is not None:` was
tautological for TCP and serial.  With F01 fixed, the check is now
meaningful for all three.

Add a comprehensive docstring to ConnectionProtocol documenting the
contract: return truthy on success (included in CONNECTED event
payload), return None for soft failure (retry), or raise for hard
failure (also retry, logged).  Also import Awaitable for the F02
reconnect_callback type hint that follows.

Refs: Forensics report finding N11
2026-04-11 19:46:50 -07:00
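The three-way contract documented in 115a402ac2 (a value on success, `None` for soft failure, an exception for hard failure) could be consumed as in this sketch. `classify_connect` and `FakeTransport` are hypothetical names introduced for illustration only.

```python
import asyncio


async def classify_connect(connection):
    """Map a transport's connect() outcome onto the documented contract."""
    try:
        result = await connection.connect()
    except Exception:
        return "hard_failure", None  # retried, and logged by the caller
    if result is None:
        return "soft_failure", None  # retried without an error
    return "connected", result  # value goes into the CONNECTED payload


class FakeTransport:
    """Hypothetical transport that returns or raises a preset outcome."""

    def __init__(self, outcome) -> None:
        self._outcome = outcome

    async def connect(self):
        if isinstance(self._outcome, Exception):
            raise self._outcome
        return self._outcome
```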
Matthew Wolter
47f6df4797 G3: F01 — remove asyncio.Future wrap from TCP connect()
TCPConnection.connect() returned a resolved asyncio.Future wrapping
self.host instead of the plain string.  ConnectionManager put this
Future directly into the CONNECTED event payload, which crashed any
downstream serializer (e.g. HA recorder's sanitize_event_data) that
tried to walk the payload dict.  BLE and serial already return plain
strings.  Fix: delete the Future creation and return self.host
directly.

Refs: Forensics report finding F01
2026-04-11 19:45:55 -07:00
Florent
fbf84cbdac v2.3.6 2026-04-09 11:40:02 +02:00
fdlamotte
2d85fe465d Merge pull request #70 from meshcore-dev/feature/mesh-request-lock
Add mesh request lock to serialize firmware-bound commands
2026-04-09 05:15:31 -04:00
Alex Wolden
5e9cb559e7 Use firmware suggested_timeout for login and path discovery sync methods
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-08 21:42:21 -07:00
Alex Wolden
ed96df197a Fix 16 failing unit tests to match current source behavior
- Update mock dispatcher to use subscribe-before-send pattern matching
  the rewritten CommandHandler.send() method
- Use 32-byte pubkeys in tests for commands that now require
  prefix_length=32 (login, logout, statusreq, reset_path, share/export/remove contact)
- Fix send_trace test path format to match flags=1 (2-byte path hashes)
- Update LPP current test to expect signed wrap for values > 32.767
- Fix BinaryReqType import (moved from meshcore.parsing to meshcore.packets)
- Fix register_binary_request call signature (added pubkey_prefix param)
- Update timeout test to expect 'no_event_received' instead of 'timeout'
2026-04-05 18:38:16 -07:00
Alex Wolden
20f3bccb58 Deprecate fire-and-forget mesh request methods
Add warnings to send_login, send_statusreq, send_telemetry_req, and
send_path_discovery pointing users to their _sync counterparts. The
fire-and-forget versions bypass the mesh request lock and can cause
silent response drops due to firmware clearPendingReqs() behavior.
2026-04-05 18:38:06 -07:00
Alex Wolden
ab3e507e1f Add mesh request lock to serialize firmware-bound mesh commands
The companion firmware can only track one outstanding mesh request at a
time — clearPendingReqs() zeros all pending response flags before each
outgoing mesh request. Overlapping mesh commands cause silent response
drops.

Adds _mesh_request_lock to CommandHandlerBase and wraps all _sync
methods with it. Also adds send_login_sync and send_path_discovery_sync
for complete round-trip serialization of those commands.

Local commands (get_bat, get_channel, set_time, send_msg, etc.) are
unaffected — they don't trigger clearPendingReqs() on the firmware.
2026-04-04 23:18:21 -07:00
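The serialization described in ab3e507e1f can be sketched with a plain `asyncio.Lock` held for the whole request/response round trip. `MeshCommands`, `request_sync`, and `_round_trip` are hypothetical; only the `_mesh_request_lock` name comes from the commit message, and the sleep stands in for waiting on a mesh response.

```python
import asyncio


class MeshCommands:
    """Hypothetical sketch: one outstanding mesh request at a time."""

    def __init__(self) -> None:
        self._mesh_request_lock = asyncio.Lock()
        self.log = []

    async def _round_trip(self, name: str) -> None:
        self.log.append(f"send {name}")
        await asyncio.sleep(0.01)  # stand-in for the firmware's response delay
        self.log.append(f"recv {name}")

    async def request_sync(self, name: str) -> None:
        # Hold the lock until the response arrives, not just while sending.
        async with self._mesh_request_lock:
            await self._round_trip(name)


async def demo():
    commands = MeshCommands()
    await asyncio.gather(
        commands.request_sync("login"),
        commands.request_sync("status"),
    )
    return commands.log
```

Without the lock the two sends would interleave ("send login", "send status", ...), which is the overlap the commit says leads to dropped responses.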
Florent
40a70222c8 don't put chan_name in log_rx if we don't know it 2026-03-29 10:53:43 -04:00
Florent
be3aa103c5 adds more min_timeout when fetching lots of neighbours 2026-03-29 07:57:08 -04:00
Florent
fe5096eb9e add hashtag to scope if absent 2026-03-27 20:12:15 -04:00
Florent
4c744888f1 v2.3.2 2026-03-22 12:51:52 -04:00
Florent
eca375dc8a apply frame header fix to tcp as well 2026-03-22 12:51:01 -04:00
fdlamotte
52ad5c201c Merge pull request #67 from jkingsman/respect-found-idx
Use the frame start once we've found it
2026-03-22 12:48:11 -04:00
Jack Kingsman
4df3655752 Use the frame start once we've found it 2026-03-21 21:08:04 -07:00
Florent
1e33dc5c66 ver bump 2026-03-19 06:40:49 -04:00
fdlamotte
cfafaccb5b Merge pull request #66 from jkingsman/fix-three-byte-path-packets
Fix bad bitmask on three byte PATH packets
2026-03-19 06:39:34 -04:00
Jack Kingsman
3ad77d364d Fix three byte path packets 2026-03-18 17:31:17 -07:00
Florent
5bfe63912c set decrypt_channel_logs to False by default 2026-03-11 10:21:29 -04:00
Florent
f507e396e3 v2.3.0 v2.3.0 2026-03-10 16:43:45 -04:00
Florent
3c81f67608 uploading missing file 2026-03-09 20:58:43 -04:00
Florent
18528f2ed3 make a class and module for parsing meshcore packets 2026-03-09 18:22:02 -04:00
Florent
f3fce820fc fix error 2026-03-08 15:11:24 -04:00
Florent
5e4663d058 there is still a strange bug with path_len 2026-03-08 08:21:56 -04:00
Florent
01471c0d24 fix nasty bug when updating contact flags 2026-03-08 07:04:33 -04:00
Florent
cda44ae0a0 and if error message does not exist yet 2026-03-07 21:13:30 -04:00
Florent
0d043bc094 fix 2026-03-07 21:05:55 -04:00
Florent
fe2239a8c6 add code_string to error event 2026-03-07 21:05:00 -04:00
Florent
462c4311d3 implement advert_path 2026-03-07 17:42:41 -04:00
Florent
0769afa475 v2.2.25 2026-03-07 07:04:16 -04:00
fdlamotte
0c00118624 Merge pull request #64 from f3sty/64
f-string nested double quote fix
2026-03-07 06:45:49 -04:00
josh
3358916e4c f-string quote fix 2026-03-07 13:58:03 +11:00
Florent
0bfa8003d5 remove some debug printfs 2026-03-06 11:11:54 -04:00
Florent
8087fe643b v2.2.23 2026-03-06 10:40:37 -04:00