Natural language processing techniques for the purpose of sentinel event information extraction

dc.contributor.author Barrett, Neil
dc.date.accessioned 2012-11-23T23:11:04Z
dc.date.available 2012-11-23T23:11:04Z
dc.date.copyright 2012 en_US
dc.date.issued 2012-11-23
dc.identifier.uri http://hdl.handle.net/1828/4320
dc.description.abstract One approach to biomedical language processing is to apply existing natural language processing (NLP) solutions to biomedical texts. Existing NLP solutions are often less successful in the biomedical domain than in non-biomedical domains (e.g., newspaper text). Biomedical NLP is likely best served by methods, information and tools that account for its particular challenges. In this thesis, I describe an NLP system specifically engineered for sentinel event extraction from clinical documents. The NLP system's design accounts for several biomedical NLP challenges. The specific contributions are as follows.
- Biomedical tokenizers differ, lack consensus over output tokens and are difficult to extend. I developed an extensible tokenizer, providing a tokenizer design pattern and implementation guidelines. In evaluation, it was equivalent to a leading biomedical tokenizer (MedPost).
- Biomedical part-of-speech (POS) taggers are often trained on non-biomedical corpora and applied to biomedical corpora. This results in a decrease in tagging accuracy. I built a token-centric POS tagger, TcT, that is more accurate than three existing POS taggers (mxpost, TnT and Brill) when trained on a non-biomedical corpus and evaluated on biomedical corpora. TcT achieves this increase in tagging accuracy by ignoring previously assigned POS tags and restricting the tagger's scope to the current token, previous token and following token.
- Two parsers, MST and Malt, have been evaluated using perfect POS tag input. Given that perfect input is unlikely in biomedical NLP tasks, I evaluated these two parsers on imperfect POS tag input and compared their results. MST was the more affected by imperfectly POS-tagged biomedical text. I attributed MST's drop in performance to verbs and adjectives, for which MST had more potential for performance loss than Malt. I attributed Malt's resilience to POS tagging errors to its use of a rich feature set and a local scope in decision making.
- Previous automated clinical coding (ACC) research focuses on mapping narrative phrases to terminological descriptions (e.g., concept descriptions). These methods make little or no use of the additional semantic information available through topology. I developed a token-based ACC approach that encodes tokens and manipulates token-level encodings by mapping linguistic structures to topological operations in SNOMED CT. My ACC method recalled most concepts given their descriptions and performed significantly better than MetaMap.
I extended my contributions for the purpose of sentinel event extraction from clinical letters. The extensions account for negation in text, use medication brand names during ACC and model (coarse) temporal information. My software system's performance is similar to state-of-the-art results. Given all of the above, my thesis is a blueprint for building a biomedical NLP system. Furthermore, my contributions likely apply to NLP systems in general. en_US
dc.language English eng
dc.language.iso en en_US
dc.subject natural language processing en_US
dc.subject medical language processing en_US
dc.subject biomedical language processing en_US
dc.subject sentinel event en_US
dc.subject clinical documents en_US
dc.subject NLP en_US
dc.subject MLP en_US
dc.subject CLU en_US
dc.subject clinical language processing en_US
dc.title Natural language processing techniques for the purpose of sentinel event information extraction en_US
dc.type Thesis en_US
dc.contributor.supervisor Jahnke, Jens H.
dc.contributor.supervisor Lau, Francis Yin Yee
dc.degree.department Dept. of Computer Science en_US
dc.degree.level Doctor of Philosophy Ph.D. en_US
dc.rights.temp Available to the World Wide Web en_US
dc.description.scholarlevel Graduate en_US
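
The abstract states that TcT ignores previously assigned POS tags and restricts its scope to the current, previous and following token. The sketch below illustrates only that idea of a three-token feature window; it is not the thesis's implementation. The function names (`token_features`, `train`, `tag`), the specific features (lowercased tokens, a three-character suffix, capitalization), the count-based scoring and the `NN` fallback tag are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def token_features(tokens, i):
    """Features from the previous, current and following tokens only.

    No previously assigned POS tags are consulted (hypothetical feature set).
    """
    prev_tok = tokens[i - 1].lower() if i > 0 else "<s>"
    next_tok = tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>"
    cur = tokens[i]
    return [
        "cur=" + cur.lower(),
        "prev=" + prev_tok,
        "next=" + next_tok,
        "suffix3=" + cur[-3:].lower(),
        "is_capitalized=" + str(cur[:1].isupper()),
    ]


def train(tagged_sentences):
    """Count how often each feature co-occurs with each tag."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        tokens = [tok for tok, _ in sentence]
        for i, (_, pos) in enumerate(sentence):
            for feat in token_features(tokens, i):
                counts[feat][pos] += 1
    return counts


def tag(tokens, counts, default="NN"):
    """Tag each token independently by summing feature/tag counts.

    The scoring rule is an assumption; any classifier over the same
    three-token window would fit the description in the abstract.
    """
    tags = []
    for i in range(len(tokens)):
        scores = Counter()
        for feat in token_features(tokens, i):
            scores.update(counts.get(feat, {}))
        tags.append(scores.most_common(1)[0][0] if scores else default)
    return tags
```

Because each token is tagged from its own window alone, an error on one token cannot propagate through a chain of earlier tag decisions, which is one plausible reading of why such a design might transfer better across domains.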
