Practical Earley parsing and the SPARK toolkit

dc.contributor.authorAycock, John Daniel
dc.contributor.supervisorHorspool, R. Nigel of Computer Scienceen_US of Philosophy Ph.D.en_US
dc.description.abstractDomain-specific, “little” languages are commonplace in computing. So too is the need to implement such languages; to meet this need, we have created SPARK (Scanning, Parsing, And Rewriting Kit), a toolkit for little language implementation in Python, an object-oriented scripting language. SPARK greatly simplifies the task of little language implementation. It requires little code to be written, and accommodates a wide range of users—even those without a background in compiler theory. Our toolkit is seeing increasing use on a variety of diverse projects. SPARK was designed to be easy-to-use with few limitations, and relies heavily on Earley's general parsing algorithm internally, which helps in meeting these design goals. Earley's algorithm, in its standard form, can be hard to use; indeed, experience with SPARK has highlighted several problems with the practical use of Earley's algorithm. Our research addresses and provides solutions for these problems, making some significant improvements to the implementation and use of Earley's algorithm. First, Earley's algorithm suffers from the performance problem . Even under optimum conditions, a standard Earley parser is burdened with overhead. We extend directly-executable parsing techniques for use in Earley parsers, the results of which run in time comparable to the much-more-specialized LALR(1) parsing algorithm. Second is what we call the delayed action problem. General parsers like Earley must, in the worst case, read the entire input before executing any semantic actions associated with the grammar rules. We attack this problem in two ways. We have identified conditions under which it is safe to execute semantic actions on the fly during recognition; as a side effect, this has yielded space savings of over 90% for some grammars. The other approach to the delayed action problem deals with the difficulty of handling context-dependent tokens. Such tokens are easy to handle using what we call “Schrödinger's tokens,” a superposition of token types. Finally, Earley parsers are complicated by the need to process grammar rules with empty right-hand sides. We present a simple, efficient way to handle these empty rules, and prove that our new method is correct. We also show how our method may be used to create a new type of LR(0) automaton which is ideally suited for use in Earley parsers. Our work has made Earley parsing faster and more space-efficient, turning it into an excellent candidate for practical use in many applications.en_US
dc.rightsAvailable to the World Wide Weben_US
dc.subjectParsing (Computer grammar)en_US
dc.subjectSPARK (Computer program language)en_US
dc.subjectProgramming languages (Electronic computers)en_US
dc.titlePractical Earley parsing and the SPARK toolkiten_US


Original bundle
Now showing 1 - 1 of 1
Thumbnail Image
3.33 MB
Adobe Portable Document Format
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
1.71 KB
Item-specific license agreed upon to submission