Natural language processing techniques for the purpose of sentinel event information extraction

dc.contributor.author: Barrett, Neil
dc.contributor.supervisor: Jahnke, Jens H.
dc.contributor.supervisor: Lau, Francis Yin Yee
dc.date.accessioned: 2012-11-23T23:11:04Z
dc.date.available: 2012-11-23T23:11:04Z
dc.date.copyright: 2012 [en_US]
dc.date.issued: 2012-11-23
dc.degree.department: Department of Computer Science
dc.degree.level: Doctor of Philosophy Ph.D. [en_US]
dc.description.abstract: An approach to biomedical language processing is to apply existing natural language processing (NLP) solutions to biomedical texts. Often, existing NLP solutions perform worse in the biomedical domain than in non-biomedical domains (e.g., on newspaper text). Biomedical NLP is likely best served by methods, information and tools that account for its particular challenges. In this thesis, I describe an NLP system specifically engineered for sentinel event extraction from clinical documents. The NLP system's design accounts for several biomedical NLP challenges. The specific contributions are as follows.
- Biomedical tokenizers differ, lack consensus over output tokens and are difficult to extend. I developed an extensible tokenizer, providing a tokenizer design pattern and implementation guidelines. In evaluation, it performed equivalently to a leading biomedical tokenizer (MedPost).
- Biomedical part-of-speech (POS) taggers are often trained on non-biomedical corpora and applied to biomedical corpora. This results in a decrease in tagging accuracy. I built a token-centric POS tagger, TcT, that is more accurate than three existing POS taggers (mxpost, TnT and Brill) when trained on a non-biomedical corpus and evaluated on biomedical corpora. TcT achieves this increase in tagging accuracy by ignoring previously assigned POS tags and restricting the tagger's scope to the current token, previous token and following token.
- Two parsers, MST and Malt, have been evaluated using perfect POS tag input. Given that perfect input is unlikely in biomedical NLP tasks, I evaluated these two parsers on imperfect POS tag input and compared their results. MST was the more affected by imperfectly POS tagged biomedical text. I attributed MST's drop in performance to verbs and adjectives, where MST had more potential for performance loss than Malt. I attributed Malt's resilience to POS tagging errors to its use of a rich feature set and a local scope in decision making.
- Previous automated clinical coding (ACC) research focuses on mapping narrative phrases to terminological descriptions (e.g., concept descriptions). These methods make little or no use of the additional semantic information available through topology. I developed a token-based ACC approach that encodes tokens and manipulates token-level encodings by mapping linguistic structures to topological operations in SNOMED CT. My ACC method recalled most concepts given their descriptions and performed significantly better than MetaMap.
I extended my contributions for the purpose of sentinel event extraction from clinical letters. The extensions account for negation in text, use medication brand names during ACC and model (coarse) temporal information. My software system's performance is similar to state-of-the-art results. Given all of the above, my thesis is a blueprint for building a biomedical NLP system. Furthermore, my contributions likely apply to NLP systems in general. [en_US]
dc.description.scholarlevel: Graduate [en_US]
dc.identifier.uri: http://hdl.handle.net/1828/4320
dc.language: English [eng]
dc.language.iso: en [en_US]
dc.rights.temp: Available to the World Wide Web [en_US]
dc.subject: natural language processing [en_US]
dc.subject: medical language processing [en_US]
dc.subject: biomedical language processing [en_US]
dc.subject: sentinel event [en_US]
dc.subject: clinical documents [en_US]
dc.subject: NLP [en_US]
dc.subject: MLP [en_US]
dc.subject: CLU [en_US]
dc.subject: clinical language processing [en_US]
dc.title: Natural language processing techniques for the purpose of sentinel event information extraction [en_US]
dc.type: Thesis [en_US]

Files

Original bundle
Name: Barrett_Neil_PhD_2012.pdf
Size: 1.99 MB
Format: Adobe Portable Document Format
Description: thesis
License bundle
Name: license.txt
Size: 1.74 KB
Format: Item-specific license agreed upon to submission