How Is ICD-9-CM Useful?

Terminology and disease classification have been updated to be consistent with current usage and medical advances. The HIPAA final rule on electronic transactions and code sets designated five medical code sets to be used for assigning diagnoses and procedures.

This final rule was effective on October 16, 2000. Most entities had to be in compliance by October 16, 2002, although some smaller entities had until October 16, 2003, to be compliant. Various organizations have recommended that the Department of Health and Human Services issue a proposed rule requiring facilities to adopt the new ICD-10-CM codes as the national standard code set.

Generally, ICD-10-CM incorporates greater specificity, clinical data, and information relevant to ambulatory and managed care encounters. In addition, the structure of ICD-10-CM allows for greater expansion of code numbers. The classification also extends beyond diseases and injuries to include risk factors that are frequently encountered in a primary care setting.

General terminology, as well as disease classification, has been updated to be consistent with accepted and current clinical practice. The expanded degree of specificity should provide more detailed information, which would assist providers, payers, and policy makers in establishing appropriate reimbursement rates, improving the delivery of healthcare, improving and evaluating the overall quality of patient care, and effectively monitoring both service and resource utilization.

These changes should result in major improvements in both the quality and the uses of data in various healthcare settings. ICD-10 has been in use in other countries for several years. AHIMA surveyed several of these countries regarding their implementation strategies and the obstacles they encountered. AHIMA discovered that many other countries are disgruntled by the failure of the US to adopt the 10th revision of ICD, again noting the resulting inability to compare data accurately worldwide.

As mentioned previously, both Australia and Canada have developed modifications of ICD-10 for use in their respective countries. ICD-10-AM has been fully implemented in Australia for several years, and most of Canada has completed the conversion. Australia conducted two-day training workshops for experienced coding professionals, while Canada provided coding education in a three-phase plan. The first phase consisted of a self-learning package that required about 21 hours to complete.

The second phase consisted of a two-day workshop, with a hands-on program. In the third phase, a self-learning package of 10 case studies was provided to the coders. All of the education in Canada involved the use of coding software and not codebooks. Both countries offer periodic refresher courses.

The average learning curve was four to six months, and coding professionals reported that they did not find ICD-10 any more or less difficult to learn than ICD-9. The information obtained through this study will be used, as appropriate, to move the regulatory process forward. Facilities should begin planning by forming an implementation task force: certainly, upper management should be represented, as well as all departments affected in any way by the change. The frequency of meetings will depend on the individual facility, as will the responsibilities of this task force.

Obviously, coders and physicians will require training, but there are other individuals who will be affected and thus will need some training, depending on their involvement. Training on the new coding system may take many forms, including face-to-face workshops and seminars. Currently, a number of excellent publications are dedicated to coding training, and it is expected that this, too, will be the case for ICD-10-CM.

We ranked all words and phrases according to their relevance to false negative predictions and added the most reliable keywords and phrases to the dictionary of the rule-based classifier. This procedure was repeated until the most significant feature brought fewer than 2 additional true positive predictions. With this approach we managed to extend the rule-based model for 9 labels. The set of terms acquired by this iterative method is twice as large as the one obtained by the decision tree.
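
As a concrete illustration, here is a minimal Python sketch of this iterative loop; the record format, the substring-matching rule trigger, and all names below are simplifying assumptions, not the system's actual code:

```python
# Illustrative sketch of the iterative dictionary extension described above.

def fires(text, base_rules, extra_phrases):
    """Rule-based trigger for a single label: fire if any base rule
    phrase or any learned extra phrase occurs in the text."""
    return any(p in text for p in base_rules) or any(p in text for p in extra_phrases)

def extend_dictionary(label, records, base_rules, candidates, min_gain=2):
    """Greedily add the candidate phrase that fixes the most false
    negatives for `label`; stop when the best remaining phrase would
    add fewer than `min_gain` true positives (the threshold above).
    `records` is an iterable of (text, gold_label_set) pairs."""
    added = []

    def gain(phrase):
        # Number of current false negatives this phrase would fix.
        return sum(1 for text, gold in records
                   if label in gold
                   and not fires(text, base_rules, added)
                   and phrase in text)

    while candidates:
        best = max(candidates, key=gain)
        if gain(best) < min_gain:
            break
        added.append(best)
        candidates.remove(best)
    return added
```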

Even so, the difference between their accuracies on the challenge test dataset is below the level of statistical significance. On the other hand, the construction of systems that use hand-crafted decision rules becomes more laborious and harder to accomplish when the number of codes involved in the task is an order of magnitude bigger than that used in the CMC challenge. To overcome this problem, we examined possible ways of replacing certain phases of the construction of rule-based systems with statistical methods, while keeping the advantages of expert systems.

Our results demonstrate that, after the conversion of the ICD-9-CM coding guides (which were originally designed for human readers and are not machine readable), the major steps of building a high-performance rule-based classifier for coding radiology reports can be replaced by automated procedures that require no human interaction. We studied two aspects of the construction of a purely hand-crafted rule-based system: the modeling of inter-label dependencies, which is a special characteristic of ICD-9-CM coding, and the enrichment of the rule-based system's synonym list with rare transliterations and abbreviations of symptoms or diseases.

The results of our experiments are summarized in Table 2. A webpage where all the systems described here can be accessed and tested online is available at [12]. To perform these tasks with machine learning models, we trained classifiers to predict the errors of a basic rule-based system that relies only on the knowledge found in the coding guide.

We trained C4.5 decision trees for this purpose. Here we found the same dependencies, and obtained the same improvement in performance, as a system with hand-crafted rules for inter-label dependencies. To enrich the list of synonyms used by the rule-based system with additional phrases and abbreviations, we again trained C4.5 decision trees. These statistical models can be used in a cascade following the rule-based system, or the most reliable keywords found can be incorporated as decision rules into the expert system.
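
A sketch of the cascade step, with scikit-learn's DecisionTreeClassifier (a CART implementation) standing in for C4.5; the matrix layout and hyperparameters are illustrative assumptions:

```python
# Cascade sketch: one decision tree per label learns to correct the
# rule-based output while seeing the predictions for *all* labels at
# once, which is how inter-label dependencies can be captured.
# DecisionTreeClassifier (CART) is a stand-in for C4.5.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_correctors(rb_train, gold_train):
    """rb_train, gold_train: (n_docs, n_labels) 0/1 matrices holding
    the rule-based predictions and the gold annotation."""
    correctors = []
    for j in range(gold_train.shape[1]):
        tree = DecisionTreeClassifier(max_depth=5, min_samples_leaf=10)
        tree.fit(rb_train, gold_train[:, j])
        correctors.append(tree)
    return correctors

def cascade_predict(rb_outputs, correctors):
    """Replace each label's rule-based prediction with the output of
    its trained corrector."""
    return np.column_stack([c.predict(rb_outputs) for c in correctors])
```

The if-then paths of the fitted trees are human-readable, so the most reliable ones can be folded back into the expert system as decision rules, which is the hybrid route described above.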

However, the difference in performance between these two machine learning methods was below the level of statistical significance. The extracted synonyms and abbreviations correlated well with the phrases added manually to the hand-crafted system. A small percentage of the phrases were clearly noise, causing the systems to overfit on the training dataset: these systems achieved better performance on the training set than the hand-crafted system but performed somewhat worse on the evaluation set (see Table 2).

The manual filtering of the phrases proposed by the learning models could be performed in a few minutes, and in this way hybrid systems that are more robust and closer to the hand-crafted system could be built with minimal effort. In our experiments we performed the major steps of the construction of a hand-crafted expert system using statistical methods. Evidently, the performance of the hand-crafted system is an upper bound on the performance that can be attained this way. We found that results similar to those of an entirely hand-crafted system could be achieved by using statistical models to improve basic rule-based classifiers.

The main contribution of the study described here is that such automatic systems can be constructed at a lower cost, with less human labour. The results reported here are close to the performance that human expert annotators would achieve for the same task.

The gold standard of the CMC challenge dataset is the majority annotation of three human annotators. The inter-annotator agreement statistics are shown in Table 3. We should mention here that the human annotators had no access to knowledge about the majority labeling, while models trained on the challenge dataset can model the majority labeling directly. Thus, the annotators' agreement with the majority codes would presumably be higher if they had the chance to examine the characteristics of the majority labeling.

On the other hand, the annotators influenced the target labels, as these were created from their individual annotations. This fact explains why every annotator has a higher agreement rate with the majority annotation than with the other human annotators. It would be interesting to see the agreement rate between a fourth human annotator and the majority codes, given that this annotator could examine the characteristics of the majority codes but would have no direct effect on their assignment.

This statistic would provide better insight into the theoretical upper bound for system performance, namely human performance on this task. The significantly lower agreement between single human annotators shows that different health institutes probably have their own individual styles of ICD-9-CM labeling.
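
For concreteness, here is a small sketch of one way such pairwise agreement can be computed, as micro-averaged F1 between two annotators' per-document code sets; the metric choice and the example codes are illustrative assumptions, and Table 3 may use a different statistic:

```python
def micro_f1(reference, other):
    """Micro-averaged F1 agreement between two annotators; each
    argument is a list of per-document code sets, with `reference`
    playing the role of the gold annotation."""
    tp = fp = fn = 0
    for ref_codes, other_codes in zip(reference, other):
        tp += len(ref_codes & other_codes)
        fp += len(other_codes - ref_codes)
        fn += len(ref_codes - other_codes)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Two documents: perfect agreement on the first, one missed code on
# the second (786.2 cough, 486 pneumonia, 780.6 fever).
print(micro_f1([{"786.2"}, {"486", "780.6"}],
               [{"786.2"}, {"486"}]))  # prints 0.8
```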

We also listed the agreement rates of the annotators and the gold standard labeling with our basic rule-based system with label dependencies. This system can be regarded as a hypothetical human annotator in the sense that it models the ICD-9-CM coding guide that an annotator should follow, not the gold standard labeling of the data itself.

The fact that the human annotators agree slightly better with this system than with each other also suggests that they tend to follow specific standards that are not necessarily confirmed by official annotation guidelines. It is also interesting that the majority labeling has a significantly higher agreement with this system than single annotators do. This observation seems to confirm that majority coding by independent annotators indeed approximates the ICD-9-CM coding guidelines better than single expert annotators.

All the above findings hold when we restrict the agreement evaluation to the 45 labels that appear in the gold standard. Agreement between human annotators remains comparable to their agreement with the coding-guide-based basic rule-based (BRB) system.

Each of the annotators has one preferred partner with whom their agreement is slightly better than with the BRB system, and shows markedly lower agreement with the remaining annotator. The current systems have certain limitations when compared to the ICD-9-CM coding of expert annotators. Take, for example, a record from the training set whose text contains nothing of relevance for any ICD code: the annotators conclude from this very absence that it must be the report of a routine chest X-ray and assign the corresponding V code. Such complex inferences are beyond the scope of automated systems.

Still, the obvious advantage of automated coding is that it is less prone to coding errors in simpler and more frequent cases. Some improvement, however, could be achieved by using a more sophisticated method to identify the scope of negation and speculative keywords than we applied here.

Take, for instance, another record in which our current system considers the token pneumonia speculative; in the record's second sentence, however, the phrase right middle refers to the pneumonia as well and occurs in a non-speculative context. The use of syntactic structure to determine the scope of negation and speculative keywords would allow the coding of pneumonia here.

The analysis of classification errors revealed that our results are quite close to the upper limit of performance attainable on the CMC challenge dataset. The similar results we obtained with two different classifiers, and with two different approaches to extending the initial rule-based model, also support this conclusion. The vast majority of classification errors are caused either by very rare cases (single specific usages not covered) or by the lack of handling of temporal aspects.

The labeling of the dataset itself seems to be inconsistent regarding temporality; thus we think there is little hope of building simple rule-based or statistical models that would detect past illnesses reported in the records and improve overall system performance. We should add here that there were 23 records to which our final system could not assign any code. As every medical record contains at least one symptom or disease label, it would be worthwhile to deal with these cases.

Small improvements could also be achieved by using better models for negation and speculative cases or by incorporating richer lists of synonyms as the examples above make clear.

Addressing these two tasks is what we plan to do in the future, but adding very rare terms would probably require the assistance of a physician to avoid overfitting on the labeled data.

In order to perform the classification task accurately, some pre-processing steps have to be performed to convert the text into a consistent form and to remove certain parts. First, we lemmatized the text and converted it to lowercase.

For lemmatization we used the freely available Dragon Toolkit [13]. As a final step, we removed all punctuation marks from the text. According to the official coding guidelines, negated and speculative assertions (also referred to as soft negations) have to be removed from the text, as negative or uncertain diagnoses should not be coded in any case. We used the punctuation marks in the text to determine the scope of keywords: we took the scope of a negation or speculative keyword to be each subsequent token in the sentence.

For a very few specific keywords (such as or) we also used a left scope, namely each token between the nearest punctuation mark to the left and the keyword itself. We deleted every token found to be in the scope of a speculative or negation keyword from the text prior to the ICD-9-CM coding process.
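
A minimal sketch of this punctuation-bounded scope removal, assuming toy keyword lists (the system's real lists were collected manually from the training data) and treating punctuation marks as clause boundaries:

```python
import re

# Toy keyword lists; the real lists were collected manually.
NEGATION = {"no", "without", "denies"}
SPECULATION = {"possible", "may", "suspected"}
LEFT_SCOPE = {"or"}  # keywords whose scope also extends to the left

def strip_scope(clause, keywords):
    """Drop every token from the first keyword to the end of the
    clause; for left-scope keywords, also drop the tokens between the
    previous punctuation mark (the clause start) and the keyword."""
    kept = []
    for tok in clause.split():
        if tok in keywords:
            if tok in LEFT_SCOPE:
                kept = []  # erase the left scope as well
            break          # erase the keyword and everything after it
        kept.append(tok)
    return " ".join(kept)

def remove_soft_negations(text):
    """Split on punctuation (the scope boundary), then strip negated
    and speculative spans clause by clause."""
    clauses = re.split(r"[.,;:]", text.lower())
    cleaned = [strip_scope(strip_scope(c.strip(), NEGATION), SPECULATION)
               for c in clauses]
    return " . ".join(c for c in cleaned if c)

print(remove_soft_negations("Cough and fever. No evidence of pneumonia."))
# prints: cough and fever
```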

Our simple algorithm is similar to NegEx [14], as we use a list of phrases and their context, but we look for punctuation marks to determine the scope of keywords instead of applying a fixed window size. In our experiments we found that a slight improvement on both the training and test sets could be achieved by classifying the speculative parts of the document in cases where the non-speculative text was insufficient to assign any code. This observation suggests that human annotators tend to code an uncertain diagnosis in those cases where they find no clear evidence of any code; that is, they avoid leaving a document blank.

Negative parts of the text were detrimental to accuracy in every case. Our final language processing method was thus the following: first, classify the document based on its non-speculative text, with negated parts removed; second, if the document received no code in step 1, classify it based on the speculative parts. Here we made use of negation and speculative keywords collected manually from the training dataset. The accurate handling of these two phenomena proved to be very important on the challenge dataset: without the negation filter, the performance of our best system decreased markedly, and the handling of speculative text brought a further measurable difference. The above-mentioned language processing approach was used throughout our experiments to permit a fair comparison of different systems (all systems had the same advantages of proper preprocessing and the same disadvantages from preprocessing errors).
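
Putting the two steps together, the control flow looks roughly like this; split_parts and code_with_rules are hypothetical helpers standing in for the preprocessing and the rule-based classifier:

```python
def assign_codes(text, split_parts, code_with_rules):
    """Two-step coding described above. split_parts(text) returns
    (clean, speculative): `clean` is the text with negated and
    speculative spans removed, `speculative` holds only the
    speculative spans; negated text is discarded in both.
    code_with_rules(text) returns a set of ICD-9-CM codes."""
    clean, speculative = split_parts(text)
    codes = code_with_rules(clean)            # step 1: non-speculative text
    if not codes:
        codes = code_with_rules(speculative)  # step 2: fall back to
                                              # the speculative parts
    return codes
```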

As regards its performance on the training data, our method seemed acceptably accurate. On the other hand, more accurate identification of the scope of keywords is a straightforward way of further improving our systems.

When you need to list more than one diagnosis for your patient, prioritize them: code the primary diagnosis first, followed by the next most important, and so on. The primary diagnosis should be the one that receives the most attention during the patient visit.

For example, if a patient you're treating for hypertension presents with an upper respiratory infection, the infection would be considered the primary reason for the visit and should be listed first, followed by hypertension. For example, you're treating a patient with poorly controlled diabetes, hypertension and coronary artery disease. Because you see the patient most often for blood-glucose monitoring, the primary diagnosis would be diabetes followed by hypertension and coronary artery disease unless the patient had active signs or symptoms related to one of the other conditions.

And here's a related tip: Don't code a diagnosis that doesn't affect your care of the patient. For example, if your patient with diabetes is also being treated by an orthopedist for a broken arm, don't code the fracture since it doesn't affect the care you're providing.

Evaluate your coding skills by choosing the correct ICD-9 code(s) for the following two patient visits. The answers and explanations appear below.

Case 1: A patient complains of epigastric pain. You suspect reflux esophagitis and order an upper GI series. What ICD-9 code(s) would you submit for this visit?

Case 2: A female patient complains of dysuria and increased frequency.

A microscopic exam performed in your office reveals the presence of bacteriuria, and you order a culture. During the visit, the patient also asks you for a refill of Synthroid. Reviewing her medical history, you notice that she has not had her thyroid level checked in some time, and you perform a thyroid-stimulating hormone test.

Assuming that this patient has Graves' disease, what ICD-9 code(s) would you submit for this visit?

For case 1, the correct code is the symptom code for epigastric pain: since the upper GI series has not yet confirmed your suspicion of reflux esophagitis, you must code only the symptoms. Remember: code only what you know. Since proper coding requires use of the highest number of digits that best describe your patient's condition, you must use five digits here; the fifth digit describes the location of the pain. For case 2, the answer is less straightforward; this visit may be coded several ways.

You could choose the codes for the presenting symptoms, dysuria and frequency; these symptoms would support the need for the office visit, the microscopic exam and the subsequent culture. Since bacteriuria was present on examination, a definitive diagnosis (cystitis) was actually made at the time of the patient visit, so the visit and the tests could also be correctly coded with the cystitis diagnosis code. According to instructions in ICD-9-CM, the organism should also be coded; however, since you don't have the results of the culture, you can't yet identify the specific organism involved.

One could argue that the diagnosis code shouldn't be used to support the microscopic exam, because the symptoms, not the diagnosis, were the reason for performing it. Next, you'll need to code the hyperthyroidism; but since the patient has Graves' disease, the more specific Graves' disease code applies. Finally, you must properly link the ICD-9 codes to each of the services provided. The primary reason for the visit is the dysuria and frequency. List these codes first, and link them to the CPT codes for the two tests for the urinary complaints.

The hyperthyroidism code should not be linked to the urinary tests; link it to the thyroid-stimulating hormone test instead.

No matter who actually does the coding in your practice, the physicians are legally responsible for the codes selected and submitted to payers. Since it's usually the physicians who have first-hand knowledge of what occurred during the patient visit, the initial code selection should come from them.

Office staff can provide valuable help with the nuances of coding and specific payer requirements. Working as a team, physicians and staff can ensure that coding is done properly. Coding will never be the part of your work that you enjoy most; if it were, you wouldn't have bothered with medical school. But you do need to know the basics and be able to speak the language of coding. The bottom line is your bottom line: Accurate coding of diagnoses, signs and symptoms helps to streamline payment from third-party payers.

Although the coding system may seem confusing at first, it becomes an important management tool once you get used to it.


