Make "non-reproducible errors" reproducible

ringring · December 28, 2023, 8:14am

In Jami there are a lot of “non-reproducible errors” that make Jami unusable for most users and that the developers have not fixed for several years.

In my opinion, it is necessary to make these errors reproducible so that they can finally be investigated and fixed - for example by building some kind of testing network:

The idea:

Create or use a tool that sets up many different virtual networks with different virtual computers:
- ipv4 only, ipv6 only, both
- tcp and udp, only tcp
- slow, fast, very slow
- with changing ips, without
- Windows, Ubuntu 20.04, Ubuntu 22.04, Fedora, Arch, Android …
- with stable connection, with unstable connection, with connection only from 8:00 AM to 8:00 PM…
Then, official versions of Jami should be installed on the computers. “Official” has two meanings here, and both should be covered to get a large, diverse network like the one normal users would create:
- Official, i.e. builds taken directly from the jami.net website.
- Official, i.e. builds taken directly from the OS’s official repositories (Windows Store, Flatpak, Debian’s apt, Ubuntu’s apt, etc.)
A bot (like Smart Auto Clicker on Android - or maybe a Jami plugin?) should be installed on every OS and perform common user interactions such as creating an account with a username, creating an account without one, adding a user from another host, starting a call, ending a call, sending a text message, sending a file, etc.
Some kind of log collector checks if all messages, files, and calls are delivered and if not, it collects all the logs, stack traces etc.
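
To give a feeling for the size of such a network, here is a minimal Python sketch that enumerates every combination of the properties listed above. The axis values come from the list; the names `HostProfile`, `all_profiles` and the field names are only illustrative and not part of any existing Jami tool.

```python
from dataclasses import dataclass
from itertools import product

# Axis values taken from the list in the post; labels are illustrative.
IP_STACKS = ["ipv4", "ipv6", "dual"]
TRANSPORTS = ["tcp+udp", "tcp-only"]
SPEEDS = ["fast", "slow", "very-slow"]
CHANGING_IP = [True, False]
SYSTEMS = ["Windows", "Ubuntu 20.04", "Ubuntu 22.04", "Fedora", "Arch", "Android"]
STABILITY = ["stable", "unstable", "online 08:00-20:00"]
BUILD_SOURCES = ["jami.net", "os-repository"]


@dataclass(frozen=True)
class HostProfile:
    """One virtual computer in the hypothetical test network."""
    ip_stack: str
    transport: str
    speed: str
    changing_ip: bool
    system: str
    stability: str
    build_source: str


def all_profiles():
    """Return one HostProfile per combination of all axis values."""
    axes = (IP_STACKS, TRANSPORTS, SPEEDS, CHANGING_IP,
            SYSTEMS, STABILITY, BUILD_SOURCES)
    return [HostProfile(*combo) for combo in product(*axes)]


profiles = all_profiles()
print(len(profiles))  # 1296
```

The full matrix is already 1296 hosts with this short list, so in practice a random or pairwise subset would probably have to be picked rather than running every combination.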

sblin · December 28, 2023, 12:19pm

The tools already exist (the problem is not new).

The agent is used to try scenarios such as adding a contact or making a test call, then changing the settings and retrying, in order to test various environments. tc-netem can simulate a slow network, etc.
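
As an illustration of the tc-netem part only, here is a small Python sketch that builds a `tc qdisc ... netem` command line for a degraded link. It only builds and prints the command; running it needs root on a Linux host. The helper name `netem_command`, the interface name `eth0` and the numbers are assumptions for the example, not settings used by the Jami developers.

```python
import shlex


def netem_command(interface, delay_ms=0, jitter_ms=0, loss_percent=0, rate_kbit=None):
    """Build (but do not run) a tc command that attaches a netem qdisc."""
    cmd = ["tc", "qdisc", "add", "dev", interface, "root", "netem"]
    if delay_ms:
        cmd += ["delay", f"{delay_ms}ms"]
        if jitter_ms:
            # netem takes the jitter as a second value after the delay
            cmd.append(f"{jitter_ms}ms")
    if loss_percent:
        cmd += ["loss", f"{loss_percent}%"]
    if rate_kbit:
        cmd += ["rate", f"{rate_kbit}kbit"]
    return cmd


# A "very slow, unstable" link: high latency, some packet loss, low bandwidth.
very_slow = netem_command("eth0", delay_ms=400, jitter_ms=100,
                          loss_percent=5, rate_kbit=256)
print(shlex.join(very_slow))
# tc qdisc add dev eth0 root netem delay 400ms 100ms loss 5% rate 256kbit
```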

Implementing telemetry/log collection is generally unwanted by the community and can be tricky.

This generally does not change the fact that tickets are unclear. But that’s fine. Providing enough information to make your scenario reproducible is hard, and the developer will try to deal with it. If not, and because time is not an infinite resource, the ticket will be closed, and another person with the same problem will probably produce a better ticket if the issue is not fixed.

JamiMan7 · January 1, 2024, 4:22pm

Perhaps there is a more foundational problem not being addressed here.

If the UI is not intuitive, then users will not have a good way to describe the situation, since the way things should work is unclear to them.

If developers want better and more reasoned input, then they must start with a better UI.

pmetras · January 3, 2024, 12:01am

Perhaps manual telemetry would help… Many times, I encounter problems but I’m unable to report them because the logs are not enabled. If the user could easily report a problem when it happens, with extracts from the Jami logs, there would be a better chance of reproducing it.

Let me explain this last sentence…

I wrote problem and not bug to stay more general. For instance, last time, I had to try 5 times to reach my correspondent. What was the reason? I don’t know. Is it a bug in Jami or my firewall being too strict? I’m presently unable to dig without traces…

I also wrote that the user decides whether to report it or not. I’m not sure that log collection is unwanted as long as the user stays in control. We don’t want obscure telemetry or automatic log collection after a crash. What I have in mind is an always-visible button, enabled from the settings, that you can press to report a problem and that attaches the last 5 minutes of logs and sends the report to the Jami developers.
After a crash, when Jami is started again, it could suggest reporting the crash with information about how it happened…
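
To make the “last 5 minutes of logs” idea concrete, here is a minimal Python sketch of a log handler that keeps only recent lines in memory, so nothing accumulates on disk until the user presses the button. It is an illustration built on Python’s standard `logging` module, not Jami’s actual logging code; the class name `RecentLogBuffer` and its methods are made up for the example.

```python
import logging
import time
from collections import deque


class RecentLogBuffer(logging.Handler):
    """Keep only the log lines from the last `window_seconds`, in memory."""

    def __init__(self, window_seconds=300.0, clock=time.time):
        super().__init__()
        self.window_seconds = window_seconds
        self._clock = clock          # injectable so the window can be tested
        self._entries = deque()      # (timestamp, formatted line), oldest first

    def _drop_expired(self):
        cutoff = self._clock() - self.window_seconds
        while self._entries and self._entries[0][0] < cutoff:
            self._entries.popleft()

    def emit(self, record):
        self._entries.append((self._clock(), self.format(record)))
        self._drop_expired()

    def snapshot(self):
        """Return the lines that would be attached to a user report."""
        self._drop_expired()
        return [line for _, line in self._entries]


logger = logging.getLogger("demo")
logger.setLevel(logging.DEBUG)
recent = RecentLogBuffer(window_seconds=300)
logger.addHandler(recent)

logger.info("call attempt 1 failed")
# What a "report a problem" button could attach, only when the user asks:
report_body = "\n".join(recent.snapshot())
```

Memory use is bounded by how much is logged in the time window, and nothing leaves the machine unless the user chooses to send `report_body`.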

The first step in correcting a problem is understanding how it happened. Explaining what users saw is only the tip of the iceberg, because each configuration (platform, network, OS, etc.) is different. When the developers can build a mental model of what happened, they can try to reproduce it, then explain it, and prevent it from happening again, eventually correcting a bug.

ECaptainRaj · January 4, 2024, 12:34pm

Something similar is already in progress:

pmetras · January 17, 2024, 1:11am

The postmortem feature is about reporting after a crash. Most of my problems with Jami don’t come from a crash (or perhaps the server part has crashed and I can’t see it) but, more frequently, from problematic situations. When one happens, I would like to be able to report the situation and have the tool capture the context in which it occurred. And as these situations are frequently transient, it must be easy to start the capture. Presently, keeping Jami logs enabled just in case a problem occurs creates more stress, from the risk of filling the storage with unnecessary files and crashing the whole system, than benefit from having the information to report a bug…

Examples of situations where I would like to easily share what happens on my Jami instance:

- Duplicate accounts in the contacts list.
- Mysterious conversation with contact amarok (@sblin ?) who appears to have attempted to call me 7 times, and where the history of messages is not chronological…
- A swarm created with contact divadlo where I’m supposed to have accepted the conversation, but I can’t remember having done it.
- Being able to call a correspondent even if he is not online (green spot not present)…
- Unable to call a correspondent even if he is online (green spot present)…
- etc.

Should I open another ticket on GitLab for a non-post-mortem reporting tool, or amend the post-mortem one?