Dimensional Analysis of Data Flow Programs


dc.contributor.author Shennat, Abdulmonem Ibrahim
dc.date.accessioned 2022-05-25T00:12:04Z
dc.date.available 2022-05-25T00:12:04Z
dc.date.copyright 2022 en_US
dc.date.issued 2022-05-24
dc.identifier.uri http://hdl.handle.net/1828/13965
dc.description.abstract Our main objective is to design Dimensional Analysis (DA) algorithms for PyLucid, a multidimensional dialect of the equational data flow language Lucid. DA is significant because it is indispensable for an efficient implementation of multidimensional Lucid, and it should aid the implementation of other data flow systems, such as Google’s TensorFlow. Data flow is a form of computation in which components of multidimensional datasets (MDDs) travel on communication lines in a network of processing stations. Each processing station incrementally transforms its input MDDs into an output MDD, possibly a very different one. MDDs are very common in Health Information Systems and in data science generally. An important concept is that of a relevant dimension: a dimension is relevant if the coordinate of that dimension is required to extract a value. When calculating with MDDs it is very important to avoid non-relevant dimensions; otherwise we duplicate entries (say, in a cache) and waste time and space. Suppose, for example, that we are measuring rainfall in a region. Each individual measurement (say, of an hour’s worth of rain) is determined by location (one dimension), day (a second dimension), and time of day (a third dimension). All three dimensions are a priori relevant. Now suppose we want the total rainfall for each day. In this MDD (call it N) the relevant dimensions are location and day; time of day is no longer relevant and must be removed. Normally this is done manually. Can the process be automated? We answer this question affirmatively by devising and testing algorithms that produce useful and reliable approximations (specifically, upper bounds) for the dimensionalities of the variables in a program. By dimensionality we mean the set of relevant dimensions. For example, if M is the MDD of raw rain measurements, its dimensionality is {location, day, hour}, and that of N is {location, day}.
Note that dimensionality carries more information than rank, which is simply the number of dimensions. There has been extensive previous research on dataflow itself, which we summarize. However, an exhaustive literature search uncovered no relevant previous DA work other than that of the GLU (Granular Lucid) project in the 1990s. Unfortunately, the GLU project was privately funded and remains proprietary; not even the author has access to it. Our methodology was incremental: we solved increasingly difficult instances of DA corresponding to increasingly sophisticated language features. We solved the cases of one dimension (time), two dimensions (time and space), and multiple dimensions. We also solved the difficult problem (which the GLU team never solved) of determining the dimensionality of programs that include user-defined functions, including recursively defined functions. We do this by adapting the PyLucid interpreter into the DAM interpreter, which evaluates the entire program over the (finite) domain of dimensionalities. As a result, the experimentally validated algorithms in our dissertation can produce useful upper bounds for the dimensionalities of the variables in multidimensional PyLucid programs, including those with user-defined functions. en_US
dc.language English eng
dc.language.iso en en_US
dc.rights Available to the World Wide Web en_US
dc.subject Dataflow en_US
dc.subject Dimensional Analysis en_US
dc.subject PyLucid Interpreter en_US
dc.subject Multidimensional en_US
dc.subject DAM Interpreter en_US
dc.subject User Defined Functions en_US
dc.subject Lucid en_US
dc.title Dimensional Analysis of Data Flow Programs en_US
dc.type Thesis en_US
dc.contributor.supervisor Kuo, Alex
dc.contributor.supervisor Wadge, W. W.
dc.degree.department Department of Computer Science en_US
dc.degree.level Doctor of Philosophy Ph.D. en_US
dc.description.scholarlevel Graduate en_US
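The abstract describes obtaining upper bounds on dimensionalities by evaluating a whole program over the finite domain of dimensionalities. The Python sketch below illustrates that general idea on the rainfall example; it is not the DAM interpreter and does not use PyLucid syntax. The expression forms Input, Var, Sum, Combine and Prev, the functions bound and solve, and the recursive running total R are all hypothetical, chosen only for illustration. Every variable starts with the empty dimensionality and the rules are reapplied until nothing changes; because each rule is monotone and there are only finitely many sets of dimension names, this terminates even when a definition refers to itself.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Union

Dims = FrozenSet[str]


# Hypothetical expression forms for illustration only (not PyLucid syntax).
@dataclass(frozen=True)
class Input:
    dims: Dims  # raw data whose relevant dimensions are declared up front


@dataclass(frozen=True)
class Var:
    name: str  # reference to another defined variable


@dataclass(frozen=True)
class Sum:
    dim: str  # aggregate over every coordinate of `dim`, e.g. daily totals
    arg: "Expr"


@dataclass(frozen=True)
class Combine:
    left: "Expr"  # pointwise combination such as left + right
    right: "Expr"


@dataclass(frozen=True)
class Prev:
    dim: str  # value of `arg` one step back along `dim`; `init` at coordinate 0
    init: "Expr"
    arg: "Expr"


Expr = Union[Input, Var, Sum, Combine, Prev]


def bound(e: Expr, env: Dict[str, Dims]) -> Dims:
    """Upper bound on the dimensionality of e, given current bounds in env."""
    if isinstance(e, Input):
        return e.dims
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Sum):
        # Summing over every coordinate of e.dim makes that dimension irrelevant.
        return bound(e.arg, env) - {e.dim}
    if isinstance(e, Combine):
        return bound(e.left, env) | bound(e.right, env)
    if isinstance(e, Prev):
        # The result may depend on its position along e.dim (start vs. later).
        return bound(e.init, env) | bound(e.arg, env) | {e.dim}
    raise TypeError(f"unknown expression: {e!r}")


def solve(program: Dict[str, Expr]) -> Dict[str, Dims]:
    """Iterate from empty dimensionalities up to a fixed point.

    Every rule is monotone and the set of dimension names is finite,
    so the loop terminates, even for recursive definitions.
    """
    env: Dict[str, Dims] = {name: frozenset() for name in program}
    while True:
        new = {name: bound(e, env) for name, e in program.items()}
        if new == env:
            return env
        env = new


program = {
    "M": Input(frozenset({"location", "day", "hour"})),  # raw hourly rainfall
    "N": Sum("hour", Var("M")),  # total rainfall per day
    # Hypothetical recursive definition: running total of N over days.
    "R": Combine(Var("N"), Prev("day", Input(frozenset()), Var("R"))),
}

for name, dims in solve(program).items():
    print(name, sorted(dims))
```

Running the sketch prints M ['day', 'hour', 'location'], N ['day', 'location'] and R ['day', 'location'], which matches the dimensionalities the abstract gives for M and N. The result names which dimensions are relevant, not merely how many there are (the rank).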
