forked from h2oai/h2o-3
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathrapids.txt
537 lines (351 loc) · 14.9 KB
/
rapids.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
Rapids is a functional language that can reduce multi-statement programs to a single statement.
It supports indexed reads and writes of keys, with a robust definition of allowed index operations, user functions, and a large set of built in functions. Functions can have arbitrary static number of parameters.
No recursive function definition is allowed
# multiple statements allowed now (what is put in the json response?)
(, (...) (...) (...) ...)
every (...) is a statement
# this functionality is properly part of Rapids now. Strings are supported by surrounding characters with either " or '
Currently not supported are ops like:
a[,] = "asdf"
(will/should blow with IAE)
# colnames
here's an example of how to do set colnames in Rapids
(colnames= %asdf {cols} {names})
# thinking about lhs
([ $a "null" "null" ])
on lhs means write to the whole frame
!a
on lhs means write to the whole frame too
so !a is shorthand for the top line.
I should be able to replace all my !a assigns with the top lhs.
Oh: but the top case requires a to exist
the bottom doesn't
# on figuring out when ! is needed:
On 12/04/2014 01:48 AM, Kevin wrote:
>
> (= ([ (+ #5 ([ $iris.hex "null" #0)) {#0;#1;#2;#3;#4} #0) ([ $iris.hex (: #4 #8) #0))
the mind boggling thing is there is no ! in your lhs above
if ! is not needed above then it's odd that it's ever needed
I guess it's only needed if there is no indexing?
ya that's the case,
! only needed if I'm setting into some named thing
= !asdjfhalsjdf (...)
if Rapids sees:
(= (...) (...) )
then the first (...) is either:
case 1: !sadfhaas
OR
case 2: ([ (...))
(...) == http://www.nasuni.com/writable/images/Blog_Images/turtles-1-1-1.jpg
----------------
On 12/04/2014 01:48 AM, Kevin wrote:
> (= ([ (+ #5 ([ $iris.hex "null" #0)) {#0;#1;#2;#3;#4} #0) ([ $iris.hex (: #4 #8) #0))
the mind boggling thing is there is no ! in your lhs above
if ! is not needed above then it's odd that it's ever needed
----------------
I guess it's only needed if there is no indexing?
using [0] to write to a col or row in a frame on lhs
can only be used if the key exists. col or row may not exist, if next N+1th row/col
----------------
using [0] to write to a col or row in a frame on lhs
can only be used if the key exists. col or row may not exist, if next N+1th row/col
----------------
Exec is an interpreter of abstract syntax trees.
Trees have a Lisp-like structure with the following "reserved" special characters:
( is required to have no space to the function definition
works
'= !x1 (sum ([ $r1 "null" #0) $TRUE)'
doesn't work
'= !x2 ( sum ([ $r1 "null" #0) $TRUE )'
it means (function ... ) is the right/only way to specify a function, right?
space between ( and next thing is not allowed
'(' signals the parser to begin a function application, next token is an identifier or a (single char) flag
'#' signals the parser to parse a double: attached_token
'"' signals the parser to parse a String (double quote): attached_token
"'" signals the parser to parse a String (single quote): attached_token
'$' signals a variable lookup: attached_token
'!' signals a variable set: attached_token
'[' signals a column slice by index - R handles all named to int conversions (as well as 1-based to 0-based)
# NO!
# 'f' signals the parser to a parse a function: (f name args body).
'=' signals the parser to assign the RHS to the LHS.
'g' signals >
'G' signals >=
'l' signals <
'L' signals <=
'n' signals ==
'N' signals !=
'_' signals negation (!)
'{' signals the parser to begin parsing a ';'-separated array of things (ASTSeries is the resulting AST)
In the above, attached_token signals that the special char has extra chars that must be parsed separately. These are
variable names (in the case of $ and !), doubles (in the case of #), or Strings (in the case of ' and ").
Everything else is a function call (prefix/infix/func) and has a leading char of '('.
#************************************************
all operators need to be separated by space?
[ $r1 "null" #0
I don't think we have any a[0] kind of thing in R..all data frames are two dimensional?
(even single columns)?
can you say a[5] for the fifth element of a column? what's that look like to you if so?
any ( is always paired with a )
The ASTFunc Object
An ASTFunc pulls the function ast produced by the front-end and creates a reference to this function.
A function has a body (which may be empty), and a body is a list of statements.
[] for multiple functions (single too?)
(def f {args} {body})
# got npe when I didn't follow spacing rules
# yup that's all, just the space between `anon` and `{x}` was needed, doesn't matter about spaces after the `{x}`
Separator is ';;' final statement ends with ';;;'
Statements that are possible:
if statements
else statements
for statements
while statements
switch statement
declarative statements
operative statements
return statements
The last statement of a function will return the result of that statement.
Some Rules:
-----------
Every function defines a new Environment that inherits from the current one. Put another way, the calling scope
provides the context for the function to be executed in. Environments can be captured with the `capture` call.
No function shall modify state in any of its parent environments (which includes the DKV store). A function may only
return values to a parent scope.
// All of the special chars (see Exec.java)
SYMBOLS.put("=", new ASTAssign());
SYMBOLS.put("'", new ASTString('\'', ""));
SYMBOLS.put("\"",new ASTString('\"', ""));
SYMBOLS.put("$", new ASTId('$', ""));
SYMBOLS.put("!", new ASTId('!', ""));
SYMBOLS.put("#", new ASTNum(0));
SYMBOLS.put("g", new ASTGT());
SYMBOLS.put("G", new ASTGE());
SYMBOLS.put("l", new ASTLT());
SYMBOLS.put("L", new ASTLE());
SYMBOLS.put("N", new ASTNE());
SYMBOLS.put("n", new ASTEQ());
SYMBOLS.put("[", new ASTSlice());
SYMBOLS.put("{", new ASTSeries(null, null));
SYMBOLS.put(":", new ASTSpan(new ASTNum(0),new ASTNum(0)));
SYMBOLS.put("_", new ASTNot());
SYMBOLS.put("if", new ASTIf());
SYMBOLS.put("else", new ASTElse());
SYMBOLS.put("for", new ASTFor());
SYMBOLS.put("while", new ASTWhile());
SYMBOLS.put("return", new ASTReturn());
// Unary infix ops
putUniInfix(new ASTNot());
// Binary infix ops
putBinInfix(new ASTPlus());
putBinInfix(new ASTSub());
putBinInfix(new ASTMul());
putBinInfix(new ASTDiv());
putBinInfix(new ASTPow());
putBinInfix(new ASTPow2());
putBinInfix(new ASTMod());
putBinInfix(new ASTAND());
putBinInfix(new ASTOR());
putBinInfix(new ASTLT());
putBinInfix(new ASTLE());
putBinInfix(new ASTGT());
putBinInfix(new ASTGE());
putBinInfix(new ASTEQ());
putBinInfix(new ASTNE());
putBinInfix(new ASTLA());
putBinInfix(new ASTLO());
// Unary prefix ops
putPrefix(new ASTIsNA());
putPrefix(new ASTNrow());
putPrefix(new ASTNcol());
putPrefix(new ASTLength());
putPrefix(new ASTAbs ());
putPrefix(new ASTSgn ());
putPrefix(new ASTSqrt());
putPrefix(new ASTCeil());
putPrefix(new ASTFlr ());
putPrefix(new ASTLog ());
putPrefix(new ASTExp ());
putPrefix(new ASTScale());
putPrefix(new ASTFactor());
putPrefix(new ASTIsFactor());
putPrefix(new ASTAnyFactor()); // For Runit testing
putPrefix(new ASTCanBeCoercedToLogical());
putPrefix(new ASTAnyNA());
putPrefix(new ASTRound());
putPrefix(new ASTSignif());
putPrefix(new ASTTrun());
// Trigonometric functions
putPrefix(new ASTCos());
putPrefix(new ASTSin());
putPrefix(new ASTTan());
putPrefix(new ASTACos());
putPrefix(new ASTASin());
putPrefix(new ASTATan());
putPrefix(new ASTCosh());
putPrefix(new ASTSinh());
putPrefix(new ASTTanh());
// More generic reducers
putPrefix(new ASTMin ());
putPrefix(new ASTMax ());
putPrefix(new ASTSum ());
putPrefix(new ASTSdev());
putPrefix(new ASTVar());
putPrefix(new ASTMean());
// Misc
putPrefix(new ASTMatch());
putPrefix(new ASTRename()); //TODO
putPrefix(new ASTSeq ()); //TODO
putPrefix(new ASTSeqLen()); //TODO
putPrefix(new ASTRepLen()); //TODO
putPrefix(new ASTQtile ()); //TODO
putPrefix(new ASTCbind ());
putPrefix(new ASTTable ());
// putPrefix(new ASTReduce());
// putPrefix(new ASTIfElse());
putPrefix(new ASTApply());
putPrefix(new ASTSApply());
// putPrefix(new ASTddply ());
// putPrefix(new ASTUnique());
putPrefix(new ASTXorSum ());
putPrefix(new ASTRunif ());
putPrefix(new ASTCut ());
putPrefix(new ASTLs ());
yeah it's unclear if [ requires ([...)
or if the number of arguments is implied without ()
On 11/17/2014 10:04 PM, Spencer Aiello wrote:
> In R, this translates to
> r1[,1]
> to get the 5th element:
> r1[5,1] => ([ $r1 #4 #0)
yes '[' always takes 3 args
arg1 is the frame
arg2 is the rows
arg3 is the cols
On 11/17/2014 10:06 PM, Spencer Aiello wrote:
> '{' are used in cases like this:
> r1[c(1,5,8,10,33),] => ([ $r1 {#1;#5;#8;#10;#33} "null")
okay, {} does a cbind, maybe only for constants? (can there be variables inside {} ?
or can it be anything?
'(' is meant to signal that the next token is some sort of function. So the input is a token that will be looked up in the big list of Ops.
'{' "things" are flagged inputs:
A flag is a special char: ' " $ # are the main ones:
$x means lookup x
"x... means parse the String with expected end quote "
'x.... means parse the String with expected end quote '
# means parse a number
These are the "things"
All functions will be named. -- I should make this a more explicit rule actually -- good point.
Number of args are defined in parsing of a new function. The rules here are more complicated and I should really write them down so here goes:
H2O learns about a new function as follows:
An AST must be passed to H2O under the "funs" end point.
Here's an example:
f <- function(x,y,z) {
x <- z[,1]
y <- x * x * z[,3]
y - 1
}
ASTs in funs have the following form:
(def name {args} body)
args are a ';'-separated array of tokens -- no special flags necessary here.
body is more complicated:
body is a list of statements in the function body.
statements are separated with ';;'
The last statement ends with ';;;'
Currently disallowing nested function definition at the AST level -- the front end must correctly swap out defs with an invocation.
#***********************************************************************
var is variance
it's a function in R:
var(x, y = NULL, na.rm = FALSE, use)
will look like this:
(var $rf "null" $FALSE "everything")
`use` just says what to do with NAs:
from the R help:
use: an optional character string giving a method for computing
covariances in the presence of missing values. This must be
(an abbreviation of) one of the strings ‘"everything"’,
‘"all.obs"’, ‘"complete.obs"’, ‘"na.or.complete"’, or
‘"pairwise.complete.obs"’.
We support "everything" (default), "complete.obs", and "all.obs"
On 11/25/2014 12:19 AM, Spencer Aiello wrote:
> not sure what the deal is with this one, currently debugging it
I had a question about whether I needed () around the body of the function statements.
It seemed to work with or without
i.e. is the syntax this
(def anon {x} ( (var ...));;;)
or this:
(def anon {x} (var ...);;;)
it matters for deciding what the 2 expression syntax is
this
(def anon {x} ( (var ...);;(var ...));;;)
or this
(def anon {x} (var ...);;(var ...);;;)
or is the syntax something else
you used to have {} in the function body in an old email, I think
that's what was new to me in your example. the function body syntax.
only the last statement becomes global,
example:
function(x) {
z <- x[,1]
y <- z[1,1]
}
technically neither "z" nor "y" should be global, but the value z[1,1] is returned to the outer scope. Looks like I'm allowing key creation within functions to leak to the global space, which is not the intention. I will fix that
The rules are:
Doing an apply over columns: (apply ... #2 ...)
each column has the function applied to it. the result from each column must not produce more than 1 column
Doing an apply over rows: (apply ... #1 ...)
each row may produce 1 or more columns of data
last stmnt can be with or without LHS shouldn't matter
prior statements can have LHS as well. Those LHS should even be able to be referenced in later statements inside the same fcn
#************************************
# cut
#************************************
library(h2o)
h <- h2o.init()
hex <- as.h2o(h, iris)
a <- hex[,1]
cut(a, breaks = c(min(a), mean(a), max(a)), labels = c("a", "b"))
(cut $a {4.3;5.84333333333333;7.9} {"a";"b"} $FALSE $TRUE #3))
arg1: single column vector
arg2: the cuts to make in the vector
arg3: labels for the cuts (always 1 fewer than length of cuts)
arg4: include lowest? (default FALSE)
arg5: right? (default TRUE)
arg6: dig.lab (default 3)
More info on the args here (see ?cut):
x: a numeric vector which is to be converted to a factor by
cutting.
breaks: either a numeric vector of two or more unique cut points or a
single number (greater than or equal to 2) giving the number
of intervals into which ‘x’ is to be cut.
labels: labels for the levels of the resulting category. By default,
labels are constructed using ‘"(a,b]"’ interval notation. If
‘labels = FALSE’, simple integer codes are returned instead
of a factor.
include.lowest: logical, indicating if an ‘x[i]’ equal to the lowest
(or highest, for ‘right = FALSE’) ‘breaks’ value should be
included.
right: logical, indicating if the intervals should be closed on the
right (and open on the left) or vice versa.
dig.lab: integer which is used when labels are not given. It
determines the number of digits used in formatting the break
#**********************
# objects in h2o-dev
#**********************
Well not entirely:
Acceptable args are:
numbers: #127882
strings: "asdf", 'asdf'
Frames: $v2, $iris.hex
anything else gets called a vector, but that's not the right name...
#*****************************************8888
okay, so it looks like:
(if (expr)...)
and
(else ..)
are syntax like functions
expr is the same syntax as any expression (although it shoudn't have a lhs? it could be a function that returns a logical value)
any (else ..) relates to the previous (if ..) within the same level
if another (if ...) happens at the same level before an (else ..) to a prior (if ..) that prior (if ..) loses it's ability to have an (else ...)
(else ..) acts as if it has an if clause that's the logical opposite of the (if ..) it's paired with
You can't have two sequential (else..) at the same level..need an intermidiate (if ...)
(return expr) ends the function and returns the expr value