-
About usefulness of early detectors
08/03/2017 at 07:39 • 1 comment
When one claims to design an early detector of some illness, it is important to think about early detection in healthcare in general: what benefits it brings, and what inefficiencies and new problems it creates.
As a personal anecdote about the usefulness of early detectors: I had a skin carcinoma that two doctors (the family and the company MDs) saw for years without reacting; one of them even asked me what it was. In the end it was a cardiologist who told me it was probably a carcinoma and that I should quickly consult a specialist.
MDs have to know what to make of the test results of those devices. For example, some medical organizations have started to provide free kits for genetic screening for some conditions [0]. As we know, some drugs work well for some genomes but less well for others, a concept that is a bit odd in itself but very fashionable at the moment.
But those kits do not all work the same way, so their results are not comparable with each other: some may analyze the DNA circulating in the blood, while others may take a sample with a biopsy needle. Neither can claim to capture the full picture of a tumor's mutations. In addition, a tumor's genome evolves very quickly and is not homogeneous; it is as if many mutations were branching out rapidly from a common ancestor cell, so that some time later the tumor is the site of several unrelated mutations.
Tests sometimes provide conflicting or overlapping results for the same patient. Researchers at the University of California, San Diego, published a 168-patient study on discordance in early 2016 showing that there is overlap, as well as differences, between DNA analyses from tissue biopsies and from blood samples.
Some tests even make drug suggestions; studies have shown that different commercial solutions may in some cases suggest different drugs, or fail to suggest drugs that an MD would have prescribed. Those commercial products need to improve, and doctors' professional bodies need to develop guidelines teaching how to cope with these new tools.

Another issue is the false negative. The press recently reported an unfortunate case where a woman felt something was wrong with her baby in the last months of her pregnancy. She then used a fetal Doppler and found a heartbeat; unfortunately the baby was stillborn. It is possible that if she had not used her fetal Doppler, she would have gone straight to the hospital, which might have saved the baby.
False positives are another problem. As an older man, I am regularly reminded by the state health insurance to check my PSA level; PSA (prostate-specific antigen) is a marker of prostate cancer. I am aware of the risk of cancer, but large studies in the US and in Europe found that for a thousand men screened, one will probably be saved, while several dozen will suffer a severe degradation of their quality of life and health in general.

The testing process may also impose travel costs on the patient, lost time and income, discomfort or even suffering, especially in women's healthcare. Unnecessary biopsies and other medical procedures, performed on people who are wrongly diagnosed or whose cancer might never have spread, can also hasten health problems.
While early detectors might seem a good idea in general, one problem is the anxiety they generate: even if everything is fine today, it does not mean everything will stay fine, so there is a constant urge to re-check. Even medical doctors can succumb to cognitive bias: when they find "something" in a mammography, they ask for more tests which come back negative, but nevertheless urge more frequent testing in the future, creating unnecessary anxiety for the patient [1].

What does all this mean for the designer of an early detector of heart failure? Certainly that we must not make big, unwise claims. We also need to collaborate with practicing doctors, not only with scientists.
At the same time, how do we attract people's attention so that they use it, and how do we finance R&D?

[0] http://www.xconomy.com/national/2017/05/31/in-maine-making-cancer-dna-tests-free-and-asking-tough-questions/
[1] https://blogs.scientificamerican.com/cross-check/why-we-overrate-the-lifesaving-power-of-cancer-tests/
-
Refactoring and randomness test
07/12/2017 at 17:07 • 0 comments
The first usable versions of our feature detection code (findbeats.java) were full of hardwired constants and heuristics. The code has now been modularized, with clean method-exit conditions.
We were proud that our design was able to look at each beat and heart sound, which is a far greater achievement than what ML code usually does. Something really interesting was how we used compression to detect heart-sound features automatically in each beat.
Now we introduce something similar in spirit. Until now, our code was sometimes unable to find the correct heart rate when the sound file was heavily polluted by noise. We now use a simple statistical test, akin to the standard deviation, to check the randomness of the beat distribution: if the detected beats are distributed at random, it means our threshold is too low and we are detecting noise in addition to the signal.
This helped us improve the estimation of the heart rate.
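The idea can be sketched as follows (a minimal illustration only, not the actual code in findbeats.java; class and method names here are made up): genuine beats arrive at roughly regular intervals, so the spread of the inter-beat intervals relative to their mean (the coefficient of variation) stays low, while randomly scattered detections produce a high one.

```java
public class BeatRandomnessTest {

    /**
     * Coefficient of variation (standard deviation / mean) of the
     * inter-beat intervals. Assumes at least two detected beats.
     */
    static double intervalCv(double[] beatTimes) {
        int n = beatTimes.length - 1;           // number of intervals
        double mean = 0.0;
        for (int i = 1; i <= n; i++) mean += beatTimes[i] - beatTimes[i - 1];
        mean /= n;
        double var = 0.0;
        for (int i = 1; i <= n; i++) {
            double d = (beatTimes[i] - beatTimes[i - 1]) - mean;
            var += d * d;
        }
        var /= n;
        return Math.sqrt(var) / mean;
    }

    /** Regular beats have a low CV; randomly scattered "beats" a high one. */
    static boolean looksRandom(double[] beatTimes, double cvLimit) {
        return intervalCv(beatTimes) > cvLimit;
    }
}
```

If `looksRandom` fires, the detection threshold is raised and the beat search is run again.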
In an unrelated area, we also started to work on multi-HMMs, which means detecting several concurrent features. An idea we are toying with is to use our compression trick at the beat level, whereas it is currently used at the heart-sound level. This is tricky but interesting in the context of a multi-HMM; indeed, it makes a multi-HMM more similar to unsupervised ML algorithms.
-
Multi-HMM Approach
07/03/2017 at 14:30 • 0 comments
Up to now, feature detection has used something that I find funny, but it works really well. As we use Hidden Markov Models, we must create a list of "observations" from which the HMM infers a model (the hidden states). Creating trustworthy observations is therefore really important, and it was a design decision that those observations would be the "heart sounds" that cardiologists name S1, S2, etc.
In order to detect those events, we first have to find the heart beats, then find the sonic events within each of them. In CinC/PhysioNet 2016, participants used an FFT to find the basic heart rate and, because an FFT cannot inform on heart rate variability, computed various statistical indicators linked to heart rate variability.
It is not a very good approach, as the main frequency found by an FFT is not always the heart beat rate.
Furthermore, this approach is useless at the heart-beat level, let alone at the heart-sound level. So what we did was to detect heart beats (which is harder than one might think) and, from that point, detect heart sounds.

Having a series of observations consisting only of four heart sounds would not be useful at all; after all, an Sn+1 heart sound is simply the heart sound that comes after the Sn heart sound. We needed to capture more information and somehow pre-classify the heart sounds.
This was done (after much effort) by computing a signature based, roughly, on a compressed heart sound. Compression is a much funnier thing than it might seem: to compress, one has to remove as much redundant information as possible, which means that a perfectly compressed signal can serve as a token for that signal, and logical operations can be performed on it.
Some people in AI research fantasize that compression is the Holy Grail of machine learning, in that it would make feature detection automatic. We are far from thinking that: in order to compress, one has to understand how the information is structured, while automatic feature detection implies that we do not know its structure.
It is the same catch-22 problem that the Semantic Web met ten years ago: it could reason on structured data but not on unstructured data, and the only real breakthrough would have been reasoning on unstructured data. That is why we now have unsupervised machine learning with algorithms like Deep Forest. While CinC 2016 submissions used unsupervised ML heavily, we used compression (Run Limited Length) to obtain a "signature" of each heart sound, and it works surprisingly well with our HMM.
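As an illustration of the trick (a sketch only, under my own assumptions; the project's actual encoding surely differs), a heart-sound envelope can be quantized against a threshold and run-length encoded, yielding a short string that serves as a compact signature and can be used as an HMM observation symbol:

```java
public class RllSignature {

    /**
     * Quantize an envelope to high/low against a threshold, then
     * run-length encode the result, e.g. "L2H3L1" for two low samples,
     * three high, one low. Assumes a non-empty envelope.
     * (Illustrative only; not the project's actual encoder.)
     */
    static String signature(double[] envelope, double threshold) {
        StringBuilder sb = new StringBuilder();
        boolean prev = envelope[0] >= threshold;
        int run = 1;
        for (int i = 1; i < envelope.length; i++) {
            boolean cur = envelope[i] >= threshold;
            if (cur == prev) {
                run++;
            } else {
                sb.append(prev ? 'H' : 'L').append(run);  // close the run
                prev = cur;
                run = 1;
            }
        }
        sb.append(prev ? 'H' : 'L').append(run);          // final run
        return sb.toString();
    }
}
```

Two heart sounds with a similar temporal structure produce similar run-length strings, which is what allows them to be pre-classified before the HMM sees them.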
The next step is to implement a multi-HMM approach, because there are other ways to pre-categorize a heart sound than its RLL signature; for example, a heart sound might be early or late, and that characteristic could be used to label it.
-
Heart beat detection and segmentation
06/15/2017 at 21:18 • 0 comments
You can find a good description of the current heart beat detection and segmentation on the Padirac Innovation web site:
https://padiracinnovation.org/2017/06/15/heart-beat-detection-and-segmentation/
Why is segmentation so important? Because we want to be able to explain our classification results, which means sharing understanding and vocabulary with cardiologists. This is opposed to current deep-learning approaches like "deep forest" that fit their internal model with features that might be only remotely connected to the physiology.

Why is it important to understand what is going on at the physiological level? Because the horror stories here are studies claiming to predict, with 98% accuracy, re-hospitalization within the next year for diseases such as diabetes or heart failure, simply by looking at medical records. Obviously one does not need deep forest algorithms to predict that someone with a heart failure condition will be re-hospitalized next year.
We must make credible statements if we want MDs and scientists to take us seriously.
-
A library, a few modifications to GUI and feature detection
06/15/2017 at 06:56 • 0 comments
In order to work toward a controller implementation, I separated the core of Hjerte (early heart failure detection) from its GUI.
The new HjerteLib library can be found at:
https://github.com/Hjertesvikt/HjerteLib/
Minor modifications were made to the GUI and to feature detection:
https://github.com/Hjertesvikt/Hjerte
-
A milestone between phase 0 and phase 1.
06/11/2017 at 10:35 • 0 comments
When I started this project, I stated there would be several phases:
- Phase 0: a fetal Doppler and a Linux box, which you could implement from the onset thanks to the code from PhysioNet 2016.
- Phase 1: a dedicated device (Arduino?) combining a 3 MHz ultrasound probe and the associated software.
- Phase 2: the phase 1 hardware plus fibrous tissue detection, which will also be under a BSD or MIT license.
Today we reached a milestone between phase 0 and phase 1: HMM software for Hjerte has been uploaded to GitHub, on the master branch.
https://github.com/Hjertesvikt/Hjerte
It makes it possible to train an HMM and to classify another heart sound file with respect to this HMM. This software addresses the limitations I perceived in the PhysioNet code (no GUI, huge computing needs, black-box solution). There is still no device, so you still have to use a Linux box (of whatever kind, as long as it runs Java 1.2) with a low-cost commercial fetal Doppler.
I will now take a break to think about my other HaD project, and then resume work toward phase 1. Comments and suggestions are heartily welcome!
-
A GUI to manage HMMs
06/07/2017 at 17:43 • 0 comments
Our tool is intended to explore several HMMs and see how they perform on different training sets. This implies a need to manage HMMs.
An HMM may have an author, an intended usage, and a name, and may belong to a portfolio. It can be saved, loaded, and displayed in a readable way.
However, most of this is not implemented at the moment; it is just the initial effort.
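Since most of this is not implemented yet, here is only a hypothetical sketch of what such a managed HMM record could look like (class, field, and method names are all made up, not taken from the repository):

```java
import java.io.*;

/** Hypothetical metadata record for a managed HMM; names are illustrative. */
public class HmmRecord implements Serializable {
    String name;
    String author;
    String intendedUsage;
    String portfolio;

    HmmRecord(String name, String author, String intendedUsage, String portfolio) {
        this.name = name;
        this.author = author;
        this.intendedUsage = intendedUsage;
        this.portfolio = portfolio;
    }

    /** Serialize the record so it can be reloaded in a later session. */
    boolean save(File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(this);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Reload a previously saved record, or return null on failure. */
    static HmmRecord load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (HmmRecord) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            return null;
        }
    }
}
```

A real version would also hold the trained model parameters alongside the metadata, so that "save" and "load" round-trip the whole HMM.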
https://github.com/Hjertesvikt/Hjerte/tree/Hjertesvikt_draft_1/
Stay tuned!
-
New branch
06/04/2017 at 21:59 • 0 comments
The previous branch ..../draft_0 was removed and replaced by:
https://github.com/Hjertesvikt/Hjerte/tree/Hjertesvikt_draft_1
It brings a slightly different GUI and better detection of small features in heart sounds.