Practical Earley parsing and the SPARK toolkit

Aycock, John Daniel

dc.contributor.author	Aycock, John Daniel
dc.contributor.supervisor	Horspool, R. Nigel
dc.date.accessioned	2018-05-23T18:44:07Z
dc.date.available	2018-05-23T18:44:07Z
dc.date.copyright	2001	en_US
dc.date.issued	2018-05-23
dc.degree.department	Department of Computer Science
dc.degree.level	Doctor of Philosophy Ph.D.	en_US
dc.description.abstract	Domain-specific, “little” languages are commonplace in computing. So too is the need to implement such languages; to meet this need, we have created SPARK (Scanning, Parsing, And Rewriting Kit), a toolkit for little language implementation in Python, an object-oriented scripting language. SPARK greatly simplifies the task of little language implementation. It requires little code to be written, and accommodates a wide range of users, even those without a background in compiler theory. Our toolkit is seeing increasing use on a variety of diverse projects. SPARK was designed to be easy to use with few limitations, and relies heavily on Earley's general parsing algorithm internally, which helps in meeting these design goals. Earley's algorithm, in its standard form, can be hard to use; indeed, experience with SPARK has highlighted several problems with the practical use of Earley's algorithm. Our research addresses and provides solutions for these problems, making some significant improvements to the implementation and use of Earley's algorithm. First, Earley's algorithm suffers from the performance problem. Even under optimum conditions, a standard Earley parser is burdened with overhead. We extend directly-executable parsing techniques for use in Earley parsers, the results of which run in time comparable to the much-more-specialized LALR(1) parsing algorithm. Second is what we call the delayed action problem. General parsers like Earley must, in the worst case, read the entire input before executing any semantic actions associated with the grammar rules. We attack this problem in two ways. We have identified conditions under which it is safe to execute semantic actions on the fly during recognition; as a side effect, this has yielded space savings of over 90% for some grammars. The other approach to the delayed action problem deals with the difficulty of handling context-dependent tokens. Such tokens are easy to handle using what we call “Schrödinger's tokens,” a superposition of token types. Finally, Earley parsers are complicated by the need to process grammar rules with empty right-hand sides. We present a simple, efficient way to handle these empty rules, and prove that our new method is correct. We also show how our method may be used to create a new type of LR(0) automaton which is ideally suited for use in Earley parsers. Our work has made Earley parsing faster and more space-efficient, turning it into an excellent candidate for practical use in many applications.	en_US
dc.description.scholarlevel	Graduate	en_US
dc.identifier.uri	http://hdl.handle.net/1828/9392
dc.language	English	eng
dc.language.iso	en	en_US
dc.rights	Available to the World Wide Web	en_US
dc.subject	Parsing (Computer grammar)	en_US
dc.subject	SPARK (Computer program language)	en_US
dc.subject	Programming languages (Electronic computers)	en_US
dc.title	Practical Earley parsing and the SPARK toolkit	en_US
dc.type	Thesis	en_US

Files

Original bundle

Name: Aycock_JohnDaniel_PhD_2001.pdf
Size: 3.33 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Electronic Theses and Dissertations (ETD)