Wednesday, 25 August 2010

Semmelweis: mortality and benchmarking

Semmelweis in 1860
It concerns me that in my previous post I may have given the impression that I thought mortality could never be a good indicator of care quality (or, strictly, of poor care quality). This is by no means the case: what I’m saying is that mortality figures have to be handled with prudence, and there are many areas of care where they are not helpful, if only because mortality is so low.

One of those areas, in the developed world at least, is maternity services. I say in the developed world because in 2008 the maternal mortality rate in the UK was 8.2 per 100,000 live births; in Ghana it was 409 and in Afghanistan it was 1575, so in the latter two countries it is certainly a useful indicator, indeed an indictment of how little we have achieved in improving healthcare across the world.

It used to be a significant indicator in Europe too. In 1846, for example, Ignaz Semmelweis was working in a Viennese hospital in which two clinics provided free maternity services in return for the patients accepting that they would be used for the training of doctors and midwives. Semmelweis was appalled to discover that the two clinics had massively different rates of death from puerperal fever, or childbed fever as it was then called. As the table below shows, the figures over six years showed mortality of just under 10% in the first clinic and just under 4% in the second.

Semmelweis's findings for the two clinics, 1841-1846. Source: Wikipedia

  First clinic:  20,042 births, 1,989 deaths (mortality 9.92%)
  Second clinic: 17,791 births,   691 deaths (mortality 3.88%)


In a fascinating early example of benchmarking, Semmelweis carried out a detailed study of both clinics, gradually eliminating every factor that might explain the difference. One possible cause he was able to exclude early on was overcrowding: women were clamouring to get into the second clinic rather than the first, for obvious reasons, so it was the second clinic that was the more crowded.

Eventually, the only difference he could identify was that the first clinic was used for the training of doctors and the second for the training of midwives. Why did that matter? The medical students also took part in dissection classes, working on putrefying dead bodies, and then attended the women in labour without washing their hands. Semmelweis was able to show that with thorough handwashing in a sterilising solution it was possible to get childbed fever deaths down to under 1%.

Unfortunately, his findings weren’t received with cries of joy. On the contrary, since he seemed to be suggesting that the doctors were causing the deaths, he met considerable resistance. Semmelweis died at the age of 47 in a lunatic asylum (though this may not have been related to the reception of his work). It took Louis Pasteur's research and the adoption of the theory of disease transmission by germs for Semmelweis’s recommendations to win widespread acceptance.

This is a striking illustration of the principle that it isn’t enough to demonstrate the existence of a phenomenon: you also need to propose a plausible mechanism for it. Semmelweis had demonstrated the effect of germ-borne disease transmission, but he had no germ theory to offer as an explanation of how it happened.

Still, Semmelweis’s work remains a brilliant use of mortality as an indicator, and of benchmarking, to achieve a breakthrough in the improvement of healthcare. An excellent early example of healthcare information put to good use.

Thursday, 19 August 2010

Indicators are only useful if they’re useful indicators

We’ve seen that pulling healthcare data from disparate sources, linking it via the patient and building it into pathways of care are the essential first steps in providing useful information for healthcare management. They allow us to analyse what was done to treat a patient, how it was done and when it was done. Paradoxically, however, we have ignored the most fundamental question of all: why did we do it in the first place?

The goal of healthcare is to leave a patient in better health at the end than at the start of the process. What we really need is an idea of what the outcome of care has been.

The reason why we tend to sidestep this issue is that we have so few good indicators of outcome.

In this post we’re going to look at the difficulties of measuring outcome. In another we’ll look at the intelligent use that is being made of existing outcome measures, despite those difficulties, and at initiatives to collect new indicators.

The first thing to say about most existing indicators is that they are at best proxies for outcomes rather than direct measures of health gain. They're also usually negative, in that they represent things that one would want to avoid, such as mortality or readmissions.

Calculating them is also fraught with problems. A readmission, for example, tends to be defined as an emergency admission within a certain time of discharge from a previous stay. The obvious problem is that a patient who had a perfectly successful hernia repair, say, and is then admitted following a road traffic accident a week later will be counted as a readmission unless someone specifically excludes the case from the count.

At first sight, it might seem that we could guard against counting this kind of false positive by insisting that the readmission be to the same specialty as the original stay, or that it have the same primary diagnosis. But if the second admission had been the result of a wound infection, the specialty probably wouldn’t have been the same (it might have been General Surgery for the hernia repair and General Medicine for the treatment of the infection), and the diagnoses would certainly have been different. Yet this would have been a genuine readmission, and excluding this kind of case would massively understate the total.

It’s hard to think of any satisfactory way of excluding false positives with some kind of automatic filter that wouldn’t also exclude real readmissions.
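To see the problem in code, here’s a minimal sketch of the naive rule in Python. The record layout and the 28-day window are illustrative assumptions rather than a standard definition; the point is that the rule has no way of telling the hernia-then-road-accident case from a genuine readmission.

```python
from datetime import date

def count_readmissions(episodes, window_days=28):
    """Naive rule: count emergency admissions occurring within window_days
    of the same patient's previous discharge. Field names are invented."""
    episodes = sorted(episodes, key=lambda e: (e["patient_id"], e["admitted"]))
    count = 0
    for prev, curr in zip(episodes, episodes[1:]):
        if (curr["patient_id"] == prev["patient_id"]
                and curr["method"] == "emergency"
                and 0 <= (curr["admitted"] - prev["discharged"]).days <= window_days):
            count += 1
    return count

episodes = [
    # a perfectly successful hernia repair...
    {"patient_id": 1, "admitted": date(2010, 3, 1),
     "discharged": date(2010, 3, 2), "method": "elective"},
    # ...followed a week later by a road traffic accident
    {"patient_id": 1, "admitted": date(2010, 3, 9),
     "discharged": date(2010, 3, 15), "method": "emergency"},
]
print(count_readmissions(episodes))  # 1 -- a false positive, counted anyway
```

Any filter bolted onto this (same specialty, same diagnosis) runs straight into the wound infection problem described above, and changing the window, as Mental Health would require, only changes which cases are caught.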

Another serious objection to the use of readmission as an indicator is that, in general, hospitals don’t know about readmissions to another hospital. This will depress the readmission count, possibly by quite a substantial number.

Things are just as bad when it comes to mortality. Raw figures can be deeply misleading. The most obvious reason is that clinicians who handle the most difficult cases, precisely because of the quality of their work, may well have a higher mortality rate than others. Some years ago, I worked with a group of clinicians who had an apparently high death rate for balloon angioplasty. As soon as we adjusted for risk of chronic renal failure (by taking haematocrit values into account), it was clear they were performing well. The raw mortality figures were high because they were taking a high proportion of patients at serious risk of renal failure.

This highlights a point about risk adjustment. Most comparative studies of mortality do adjust for risk, but usually only for age, sex and deprivation. That assumes mortality is affected by those three factors in the same way everywhere, and there’s no good evidence that it is. More important still, as the balloon angioplasty case shows, we really need to adjust for risk differently depending on the area of healthcare we’re analysing.
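For what it’s worth, the usual calculation behind comparative mortality figures of this kind is indirect standardisation: apply reference death rates to the hospital’s own casemix to get an expected count, then divide observed deaths by expected. A minimal sketch, with invented strata and figures:

```python
# Reference death rates per stratum (age band x admission type), as might be
# derived from a national baseline. All numbers here are invented.
reference_rates = {
    ("0-64", "elective"):  0.002,
    ("0-64", "emergency"): 0.015,
    ("65+",  "elective"):  0.008,
    ("65+",  "emergency"): 0.060,
}

# One hospital's casemix: (admissions, observed deaths) per stratum.
hospital = {
    ("0-64", "elective"):  (900, 2),
    ("0-64", "emergency"): (400, 7),
    ("65+",  "elective"):  (300, 3),
    ("65+",  "emergency"): (600, 41),
}

observed = sum(deaths for _, deaths in hospital.values())
expected = sum(n * reference_rates[s] for s, (n, _) in hospital.items())

# A ratio above 1 means more deaths than the reference rates predict for this
# casemix -- but only with respect to the factors captured in the strata.
print(f"SMR = {observed / expected:.2f}")  # SMR = 1.15
```

Anything left out of the strata, renal-failure risk in the angioplasty example, silently biases the ratio.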

This is clear in Obstetrics, for instance. Thankfully, in the developed world at least, death in maternity services is rare these days, so mortality is no longer a useful indicator. On the other hand, the rates of episiotomy, caesarean section and perineal tear are all relevant indicators. They need to be adjusted for the specific risk factors that are known to matter in Obstetrics, such as the height and weight of the mother, whether or not she smokes, and so on.

Mental Health is another area where mortality is not a helpful indicator. Equally, readmission has to be handled in a different way, since the concept of an emergency admission doesn't apply to Mental Health, and generally we would be interested in a longer gap between discharge and readmission than in Acute care.

Readmission rates and mortality are indicators that can highlight an underlying problem that deserves investigation. They have, however, to be handled with care and they are only really useful in a limited number of areas. If we want a more comprehensive view of outcome quality, we are going to have to come up with new measures, which is what we’ll consider when we look at this subject next.

Wednesday, 11 August 2010

It’s the patient level that counts

What happens in healthcare happens to patients. Not to wards or theatre or specialties, far less HRGs, DRGs or Clusters. Ultimately, the only way to understand healthcare is by analysing it at the level of the individual patient.

Much of the information we work with is already at that level or even below. An inpatient stay or an outpatient attendance is specific to a particular event for an individual patient. We need to be able to move up a level and link such event records into care pathways.

Much information, on the other hand, is held at a level far above that of the patient. Financial information from a ledger, for example, tells us how much we spent on medical staff generally but not on a specific patient. Most pharmacy systems can tell us what drugs were used and how much they cost, but usually can’t tell us which medications were dispensed for which patients.

On the cost side, one answer to this kind of problem is to build relatively sophisticated systems to apportion values – i.e. to share them out across patient records in as smart a way as possible. For example, an effective Patient Level Costing system uses weights reflecting likely resource usage to assign a higher share of certain costs to some patients than to others. The effect is to take high-level information down to patient level.

In some areas, that’s the only approach that will ever work. For nursing costs, one can imagine a situation where patients have bar-coded wrist bands that nurses read with wands, giving an accurate view of the time taken over each patient. But the approach would be fraught with problems:

  • it would be expensive and impose a new task, designed only to collect information without contributing to patient care, on nurses who already have more than enough to do
  • it would be subject to terrible data quality problems: just think of the picture we’d get if a nurse wanded himself in at a bedside and then forgot to wand out, something I’d expect to happen with monotonous frequency
  • even if all nurses could be persuaded to use the wands and did so accurately and reliably, it’s not clear that we would get a useful view of nurse time use: after all, when staff are under less pressure, a nurse might take longer over a routine task such as administering drugs, but it would be nonsense to assign the patient a higher cost as a result
For resources like nursing, it seems sensible to share the total figure across all the patients, in proportion to the time they spent on a ward, as though they were incurring a flat fee for nursing care irrespective of how much they actually used. This suggests that apportionment actually gives the most appropriate picture.
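In code, apportionment of this sort is almost trivial, which is part of its appeal. A minimal sketch with invented figures, using bed-days as the weight, i.e. the flat-fee approach just described; for costs where usage genuinely varies, one would substitute smarter weights such as acuity scores:

```python
def apportion(total_cost, weights):
    """Share total_cost across patients in proportion to their weights."""
    denominator = sum(weights.values())
    return {patient: total_cost * weight / denominator
            for patient, weight in weights.items()}

# A ward's nursing cost for the period, shared out by bed-days
bed_days = {"patient_a": 2, "patient_b": 7, "patient_c": 1}
print(apportion(50_000.0, bed_days))
# {'patient_a': 10000.0, 'patient_b': 35000.0, 'patient_c': 5000.0}
```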

But with other kinds of cost, we really ought to be getting the information at patient level directly. We ought to know exactly which prosthesis was used by a patient, how much physiotherapy was delivered, precisely which drugs were administered. If we don’t, then that’s because the systems we’re using aren’t clever enough to provide the information. In that case we need cleverer systems.

For example, pathology systems tend to have excellent, patient-level information already. We just need to link the tests to the appropriate activity records, making intelligent use of information such as the identity of the patient and the recorded dates. This has to be done in a flexible way, of course, to allow for such occurrences as a pathology test carried out some time after the clinic attendance at which it was requested.
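As a sketch of what that linkage might look like (field names invented; a real implementation would match on NHS number or a master patient index, with rather more elaborate rules than a simple date window):

```python
from datetime import date, timedelta

def link_tests_to_attendances(tests, attendances, window_days=42):
    """Attach each test to the same patient's most recent attendance in the
    window_days before the test, allowing for tests carried out some time
    after the clinic visit at which they were requested."""
    links = []
    for test in tests:
        candidates = [
            a for a in attendances
            if a["patient_id"] == test["patient_id"]
            and timedelta(0) <= test["date"] - a["date"] <= timedelta(days=window_days)
        ]
        if candidates:
            nearest = max(candidates, key=lambda a: a["date"])
            links.append((test["test_id"], nearest["attendance_id"]))
    return links

tests = [{"test_id": "T1", "patient_id": 42, "date": date(2010, 7, 20)}]
attendances = [{"attendance_id": "A9", "patient_id": 42, "date": date(2010, 7, 5)}]
print(link_tests_to_attendances(tests, attendances))  # [('T1', 'A9')]
```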

Linking in pathology information would immediately make pathway analysis richer. When it comes to costing, we still need an apportionment step, to calculate individual test costs from overall lab figures, but then the calculated value can be directly assigned to the patient record.

The same kind of approach can be applied to diagnostic imaging, the therapies and many other areas. For example, we can calculate a cost for a multi-disciplinary team meeting and then assign that cost to the patient record, as long as the information about the meeting is available in a usable form.

Then, however, there are other areas of work where we should be able to operate this way but generally can’t. Few British hospitals have pharmacy systems that assign medications to individual patients. If they did, and given that the pharmacy knows the price it is being charged for each dose, we could link each prescription to the patient record and assign its actual cost to the patient. Given that pharmacy is the second biggest area of non-pay cost in most acute hospitals, after theatres, this would be a significant step forward.

The same is true of similar services in other departments, such as blood products.

Getting this kind of information right would greatly enhance our understanding both of care pathways and of patient costs. As I’ve already pointed out, that would be a huge improvement in the capacity of hospitals to meet the challenges ahead.

Saturday, 7 August 2010

Navigating the Pathways and Protocols of Care


A major advantage of the pathway view of healthcare is that it allows us to compare the actual process of treatment with agreed protocols. A protocol represents a consensus view among experts on the most appropriate way of treating a particular condition.

Although such consensus often exists, it’s striking how frequently what actually happens differs substantially from accepted best practice. Sometimes it’s for good reasons associated with the particular case, but often it’s simply a failure to stick to guidelines with the result that care falls short of the highest standards. Oddly, poor care may often be more expensive too, since additional work is needed to correct what could have been done right in the first place. So there’s a double reason for trying to eliminate this kind of variation.

Unfortunately, today’s hospital information systems generally find it difficult to support pathway analysis and comparisons with protocols. This is partly because the information needs to be brought in from multiple sources and then integrated, which may seem dauntingly difficult. However, there is actually a great deal that can be done with relatively simple data. There’s a lot to be said for not being put off by the difficulty of doing a fully comprehensive job, when one can start with something more limited now and add to it later.

Take the example of cataract treatment in a hospital that has decided to follow the guidelines of the NHS version of the Map of Medicine. The Map suggests the preferred procedure is phacoemulsification. Routine cases have day surgery with a follow-up phone call within 24 hours and possibly an outpatient review after a week. This allows us to build a pathway for routine cases entirely from PAS data or at worst PAS and other Contacts data.

Most of the information needed to check compliance will be in the hospital Patient Administration System (PAS). The PAS records procedures, so we can check whether phacoemulsification was used or not. In some acute hospitals, the PAS may not record non face-to-face contacts, which would cover the telephone follow-up, but the information is certainly held somewhere and it should not be insuperably difficult to link it with PAS data. All these data items have dates associated with them, so we can apply rules to check that the right actions were taken at the appropriate time.


The Map of Medicine suggests that non-routine cases occur where there is a particular risk of complications, where the patient has only one eye, or where the patient suffers from dementia or learning difficulties. In these instances, we would expect daily home visits in the first week and certainly an outpatient attendance for review within one to four weeks of discharge.

So here’s a second pathway structure:


Again, information about home visits might be in the PAS or might have to be added. We also need to check diagnosis information from the PAS for dementia or learning difficulties.

The two pathway structures shown above correspond to the two protocols. So now we can compare them with similar pathways built for real patients in the hospital. The aim is to limit the number of cases that we investigate further to only those that differ significantly from the guidelines.

So any cases where the pathway is the same as for routine cases can be ignored.

Any cases where the pathway is the same as for non-routine cases can be ignored as long as there is evidence of dementia or of learning difficulties, or the patient had a single eye, or there was a serious risk of complications. It’s possible that the last two pieces of information aren’t routinely collected, in which case we shall find ourselves investigating some cases unnecessarily until we can start to collect them.

Overall, what this approach means is that we can eliminate a lot of cases from examination and concentrate management attention on only those where there may be a real anomaly and where action could lead to an improvement in the future. That has to be a huge step forward over what most hospitals can do today. Yet it involves relatively straightforward work on information systems.
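To make the triage concrete, here’s a minimal sketch in Python, with invented field names and the timing rules hard-coded from the two protocol descriptions above:

```python
from datetime import date

QUALIFYING_RISKS = {"dementia", "learning difficulties",
                    "single eye", "complication risk"}

def triage_cataract_case(discharged, phone_call, review, home_visits, risks):
    """Return 'ok' if the recorded events match one of the two protocol
    shapes, 'investigate' otherwise. Dates are datetime.date values; risks
    is the set of risk factors recorded for the patient."""
    # Routine protocol: follow-up phone call within 24 hours of day surgery,
    # possibly an outpatient review after about a week.
    if phone_call is not None and 0 <= (phone_call - discharged).days <= 1:
        return "ok"

    # Non-routine protocol: daily home visits in the first week and an
    # outpatient review one to four weeks after discharge, acceptable only
    # where a qualifying risk factor is actually recorded.
    if (home_visits >= 7 and review is not None
            and 7 <= (review - discharged).days <= 28
            and risks & QUALIFYING_RISKS):
        return "ok"

    return "investigate"

# A routine case, phoned the day after discharge: nothing to investigate
print(triage_cataract_case(date(2010, 8, 2), date(2010, 8, 3), None, 0, set()))
```

If the single-eye and complication-risk flags aren’t collected yet, dropping them from the risk set reproduces the over-investigation described above.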

Adding other data could improve the analysis. Information from a theatre system would tell us, say, how long the operation took. If patient-level information about medication is available, it can be linked in to check that appropriate drugs are being administered at the right times.

In the meantime, though, we would be working with a system that should be relatively easy to implement and can help us make sure patients are being treated both effectively and cost-effectively.

That sounds like something it would be good to do, given today’s pressure to deliver more while costing less.


Sunday, 1 August 2010

Lies, damned lies and misused healthcare statistics


If you live in Stafford, as I do, it comes as a great relief to see that the Care Quality Commission (CQC) has decided to lift five of the six restrictions it had placed on the Mid Staffordshire Trust, our local acute hospital, following the scandal there over care quality. It was particularly gratifying to read that mortality rates have fallen and numbers of nurses are up to more sensible levels. It’s obviously good news that a hospital that had been having real problems with quality seems to be well on the way to solving them.
On the other hand, much of the original scandal missed the fundamental point. Much was made of the finding that between 400 and 1200 more people had died in the hospital than would have been expected. The implication was that poor quality had led to excess deaths, even though there was no way of linking the deaths to care quality defects. Indeed, the Healthcare Commission, the predecessor of the CQC, had decided to take action over the quality of care at the Trust, but not because of the mortality figures, which it had decided not to publish and was irritated to see leaked.

On 28 May, Tim Harford’s Radio 4 programme More or Less examined the use of statistics about Mid Staffs. David Spiegelhalter, Professor of the Public Understanding of Risk at the University of Cambridge, warned listeners that we need to be careful with the concept of ‘excess deaths’, because it really only means more deaths than the average and ‘half of all hospitals will have excess deaths, half of all hospitals are below average.’

What we need to look out for is exceptionally high values, although even there we have to be careful as there are many reasons why a hospital might be extreme: ‘first of all you’ve just got chance, pure randomness: some years a hospital will be worse than average even if it’s average over a longer period.’

Spiegelhalter also questions the techniques used to make the statistics more relevant, such as risk adjustment. That kind of adjustment aims to take into consideration the extent to which mortality might be affected by factors external to the hospital, such as race, poverty or age. That should give a better way of comparing hospitals, but in reality the procedure is inadequate because ‘there’s always variability between hospitals that isn’t taken into account by this risk adjustment procedure, not least of which is that we assume that factors such as age, ethnicity and deprivation have exactly the same effect in every hospital in the country’, an assumption that we’re not justified in making.

Spiegelhalter’s conclusion? Mortality is ‘a nice piece of statistics and it’s great as a performance indicator, something which might suggest that something needs looking at, but you can’t claim that excess mortality is due to poor quality care.’ Not that such considerations stopped many newspapers making exactly that claim.

Of course, Spiegelhalter could have added that a lot depends on which mortality statistics you measure in the first place. It’s fascinating to see that the body that produced the original mortality figures for Mid Staffs, Dr Foster Intelligence, was later asked to look at a different range of performance indicators, including some more narrowly defined mortality values, and rated Mid Staffordshire the ninth best performing hospital in the country, less than a year after the original scandal broke.

Tim Harford also interviewed Richard Lilford, Professor of Clinical Epidemiology at the University of Birmingham. Lilford suggested a different approach to assessing hospitals: ‘I’ve always felt that we should go for more process-based measurements. What we should look for is whether hospitals are giving the correct treatment.’ Professor Lilford felt this approach had two advantages. The first is that if differences in quality of care can be traced to the processes used, it’s difficult to write them off as a result of statistical bias. Most important of all, though, if we really want to improve the care provided, ‘we need to improve the hospitals that are in the middle of the range not just those that are at the extreme of the range.’ In fact, he finds that there is more to gain from improving the middle range of hospitals than from improving the extremes.

In any case, I don’t think I’ve ever come across a good or bad hospital. Some hospitals are strong in certain specialties and weak in others, or have stronger and weaker clinicians, or even clinicians who are good at certain times or at certain things and bad at others. Lilford makes much the same point: ‘the fact of the matter is that hospitals don’t tend to fail uniformly, they’re seldom bad at everything or good at everything. If you go for process you can be specific. You can improve process wherever you find it to be sub-optimal.’

That’s the key. When we understand processes, we can see where problems are arising. There clearly were problems at Mid Staffs. What was needed was careful analysis of what was being done wrong so that it could be fixed, so that processes could be improved. This is the reason for my enthusiasm for analysing healthcare in terms of pathways, spanning whole processes, rather than isolated events.

It’s really good news that the CQC feels that the work at Mid Staffs has produced results.

How much better things might have been if this work of improvement hadn’t had to start in the atmosphere of scandal and panic set going by wild use of mortality figures.

Last word to Professor Lilford.

‘Using mortality as an indication of overall hospital performance is what we would call, in clinical medicine, a very poor diagnostic test. What we’re really interested in when we measure mortality isn’t mortality, it’s not the overall mortality, for this reason: we all will die some day and most of us will do so in hospital. So what we’re really interested in is preventable or avoidable mortality and, because avoidable mortality is a very small proportion of overall mortality, it’s quixotic to look for the preventable mortality in the overall mortality.’
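A back-of-envelope illustration of that last point, with invented but plausible numbers:

```python
from math import sqrt

total_deaths = 1000   # deaths in a year at a sizeable acute hospital (invented)
avoidable = 50        # suppose 5% of them were genuinely preventable

unavoidable = total_deaths - avoidable
noise = sqrt(unavoidable)  # Poisson standard deviation of the unavoidable count

print(f"avoidable signal: {avoidable}, chance variation in the rest: +/-{noise:.0f}")
# avoidable signal: 50, chance variation in the rest: +/-31
```

Even halving avoidable deaths, 25 lives saved, would shift the total by less than one standard deviation of ordinary year-to-year noise: invisible in the overall figure, just as Lilford says.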

Time we stopped tilting at windmills and took hospital performance a little more seriously.