Unrecoverable error when using a Python socket with CAN_ISOTP

Created by: tlalexander

Hello and thank you for your work on this kernel module. I appreciate being able to make a high quality system with it.

I have an issue that is probably a Python issue, but I am hoping you may know something about it. The full write-up is at the following link: https://github.com/pylessard/python-can-isotp/issues/80

Basic details: I am using the python-can-isotp library as a helper to create a connection to the socketcan interface on my Raspberry Pi CM4, which has an MCP2515 chip onboard. This kernel module is running on the Raspberry Pi.

Everything works fine when there is a short delay between successive calls to send. If I send data too quickly, however, the Python socket eventually enters a state where every call to send raises socket.timeout and no further packets are ever sent. If I try to close the socket, the program locks up. While the Python program is repeatedly raising socket.timeout, I can still send frames over the same CAN connection using isotpsend from another terminal window.
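The send pattern described above can be sketched roughly as follows. This is an illustrative reduction, not the actual program from the linked issue: the helper name `send_burst` is hypothetical, and it works on any object with a `send()` method rather than going through python-can-isotp.

```python
import socket
import time


def send_burst(sock, payloads, delay_s=0.0):
    """Send each payload in turn and return how many sends timed out.

    `sock` is anything with a send() method; on the real system it would
    be an ISO-TP socket with a timeout set via settimeout().
    """
    timeouts = 0
    for payload in payloads:
        try:
            sock.send(payload)
        except socket.timeout:
            # In the failure described above, every later send() also
            # ends up here once the first timeout has occurred.
            timeouts += 1
        if delay_s:
            # Optional pause between sends (the "short delay" case).
            time.sleep(delay_s)
    return timeouts
```

With the standard library on Linux, such a socket could be created with `socket.socket(socket.AF_CAN, socket.SOCK_DGRAM, socket.CAN_ISOTP)`, given `settimeout(0.1)`, and bound with `bind(("can1", rx_id, tx_id))`, where `rx_id` and `tx_id` are placeholder CAN IDs.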

If I create the socket with the WAIT_TX_DONE flag, then when this error occurs my entire Python program locks up and I have to kill it from another terminal. The timeout is only 100 milliseconds, so I don't understand why it would lock up, or whether this is a bug or I am simply misusing the flag.
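For reference, a minimal sketch of how the WAIT_TX_DONE flag could be set on a raw ISO-TP socket without python-can-isotp. The numeric constants are copied from `<linux/can/isotp.h>`. The helper names `pack_isotp_opts` and `enable_wait_tx_done` are hypothetical, and the note about calling it before `bind()` is my understanding of the kernel module's behaviour rather than something confirmed in this report.

```python
import struct

# Values from <linux/can/isotp.h>, hard-coded so the sketch does not
# depend on which constants a given Python build exposes.
SOL_CAN_ISOTP = 106            # SOL_CAN_BASE (100) + CAN_ISOTP (6)
CAN_ISOTP_OPTS = 1             # option name for struct can_isotp_options
CAN_ISOTP_WAIT_TX_DONE = 0x400


def pack_isotp_opts(flags, frame_txtime=0, ext_address=0,
                    txpad=0, rxpad=0, rx_ext_address=0):
    """Pack a struct can_isotp_options: two u32 fields, then four u8 fields."""
    return struct.pack("=IIBBBB", flags, frame_txtime, ext_address,
                       txpad, rxpad, rx_ext_address)


def enable_wait_tx_done(sock):
    """Set WAIT_TX_DONE on an ISO-TP socket (assumed: call before bind())."""
    sock.setsockopt(SOL_CAN_ISOTP, CAN_ISOTP_OPTS,
                    pack_isotp_opts(CAN_ISOTP_WAIT_TX_DONE))
```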

Since I can use isotpsend from another terminal window during the lockup, the CAN link itself appears to be fine, so the problem is presumably somewhere between your kernel module and the Python socket. I should probably ask the python-help list for assistance, but I worry that CAN bus questions may be too specialized for that list, or that such an email would only get routed to you anyway.

Here is what the system reports about the link while my Python program is locked up:

acorn@acornv2:~$ ip -details -statistics link show can1
4: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 20000
    link/can  promiscuity 0 minmtu 0 maxmtu 0 
    can state ERROR-ACTIVE restart-ms 0 
	  bitrate 500000 sample-point 0.875 
	  tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
	  mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
	  clock 8000000 
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
    RX: bytes  packets  errors  dropped missed  mcast   
    1509504    188688   0       0       0       0       
    TX: bytes  packets  errors  dropped carrier collsns 
    6484689    891876   0       0       0       0

I would like to find out what is actually happening when it locks up, but I don't know how to debug that. I would also like to know how I can check the socket to confirm it is safe to send without having to hard-code a wait. Finally, I would like to know how to recover from this condition.
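The kind of "safe to send" check meant here could look like the sketch below, which uses `select()` instead of a fixed sleep. It rests on an unverified assumption: that the ISO-TP socket only reports itself writable when a new transfer can actually be started. Whether the kernel module behaves that way is part of the question. The helper names `wait_writable` and `send_when_ready` are hypothetical.

```python
import select


def wait_writable(sock, timeout_s):
    """Return True if select() reports `sock` writable within timeout_s seconds."""
    _, writable, _ = select.select([], [sock], [], timeout_s)
    return bool(writable)


def send_when_ready(sock, payload, timeout_s=0.1):
    """Send only if the socket is reported writable; return True if sent."""
    if not wait_writable(sock, timeout_s):
        # Not writable in time: skip the send instead of blocking in send().
        return False
    sock.send(payload)
    return True
```

A caller would use `send_when_ready(sock, data)` in place of a bare `sock.send(data)` and treat a `False` result as "busy, try again later".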

Any and all help is greatly appreciated. If I should take my question to python-help please let me know. Thank you!