
Response Error when committing large numbers of nodes/edges at once. #75

Open
danieljue opened this issue May 26, 2020 · 2 comments

@danieljue

I'm running the current Docker image on Windows. I discovered this problem when my program encountered a large book and tried to ingest its sentences. I was able to reproduce the problem with a simpler example in a Jupyter notebook:

import random
from string import ascii_letters

import redis
from redisgraph import Graph, Node, Edge

r = redis.Redis(host='localhost', port=6379)

def get_mutated_string(s):
    # Just so we have some way of generating lots of strings, each different
    # from the last: swap three random non-whitespace characters for letters.
    inds = [i for i, c in enumerate(s) if not c.isspace()]
    sam = random.sample(inds, 3)
    letts = iter(random.sample(ascii_letters, 3))
    lst = list(s)
    for ind in sam:
        lst[ind] = next(letts)
    return "".join(lst)

# Reset the graph when trying different numbers of nodes
# (rg2 doesn't exist yet on the first run, hence the try/except).
try:
    rg2.delete()
except Exception:
    pass

rg2 = Graph('bulk_test', r)
nodes = []
edges = []
last = None
s = "Jean Piaget"
for i in range(9041):
    new = Node(label='name', properties={'w': s})
    nodes.append(new)
    if last is not None:
        edges.append(Edge(last, 'mutate', new))
    s = get_mutated_string(s)
    last = new

for n in nodes:
    rg2.add_node(n)

for e in edges:
    rg2.add_edge(e)

# Like commit, but sets nodes and edges to empty. Multiple flushes don't cause duplicates.
rg2.flush()

If the range is only 9040, there are no errors. If I increase it to 9041 or higher, I get this error:

Error
ResponseError: errMsg: Invalid input 'J': expected '.', AND, OR, XOR, NOT, '=~', '=', '<>', '+', '-', '*', '/', '%', '^', IN, CONTAINS, STARTS WITH, ENDS WITH, '<=', '>=', '<', '>', IS NULL, IS NOT NULL, '[', '{', a label, ',' or '}' line: 1, column: 1048604, offset: 1048603 errCtx: ...e{w:"GVNjJCREATE (pkuvboenoi:name{w:"Jean Piaget"}),(tabnluexcb:name{w:"Ye... errCtxOffset: 40

Depending on your machine or the size of the nodes, the threshold could differ (e.g. 10000 or 20000).
I was also able to cause this problem with a single node that had a large property.
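
For reference, a minimal sketch of that single-node case (the 2 MB value is hypothetical, chosen to exceed the apparent limit):

# Hypothetical repro: one node whose lone property is bigger than the
# largest query the server will parse.
big = Node(label='name', properties={'w': 'x' * 2_000_000})
rg3 = Graph('big_prop_test', r)
rg3.add_node(big)
rg3.flush()  # fails the same way once the query exceeds the limit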

Expected behavior: either tell me that the content of the commit is too large, or handle it gracefully.
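
In the meantime, a rough pre-flight check is possible client-side. This is only a sketch: it assumes the commit is sent as a single CREATE query built by joining str(node)/str(edge) fragments (which the errCtx above suggests), and the ~1 MB threshold is inferred from the reported offset:

# Approximate the size of the single CREATE query that flush()/commit()
# will send, and warn before hitting the server-side buffer.
QUERY_LIMIT = 1_000_000  # assumed from the ~1 MiB offset in the error

approx_len = len("CREATE ") + sum(len(str(x)) + 1 for x in nodes + edges)
if approx_len > QUERY_LIMIT:
    print(f"commit would be ~{approx_len} bytes; needs to be split into batches")
else:
    rg2.flush()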

@swilly22
Contributor

swilly22 commented May 26, 2020

Hi @danieljue, thank you for reporting.
You've probably hit RedisGraph's parser buffer size limit (the offset in your error is just over 1 MiB, i.e. 2^20 bytes).
Let me check what the consequences of enlarging this buffer would be.

@danieljue
Author

danieljue commented May 26, 2020

Thanks! I looked on GitHub, and maybe there are some ideas that can be borrowed from here:

https://github.com/RedisGraph/redisgraph-bulk-loader/blob/master/redisgraph_bulk_loader/bulk_insert.py

Their actual bulk-insert library doesn't work well for my use case, because my data isn't in a CSV-like format, but I saw some code in there relating to buffer size.

Perhaps there's a way to allow an "unsafe" commit for the edges, where the code doesn't check for the existence of the nodes. That way we could insert nodes in smaller batches and insert the edges later, putting the responsibility for those nodes' existence on the developer.
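
A rough client-side sketch of that two-phase idea, using a unique id property so edges can be attached with an explicit MATCH once the nodes exist (the batch size is a guess; anything that keeps each query under the limit works):

BATCH = 1000  # hypothetical batch size

rg4 = Graph('bulk_batched', r)

# Phase 1: create the nodes in batches, tagging each with a unique id.
for i, n in enumerate(nodes):
    n.properties['id'] = i
for start in range(0, len(nodes), BATCH):
    for n in nodes[start:start + BATCH]:
        rg4.add_node(n)
    rg4.flush()

# Phase 2: attach edges by matching on the id property. Nothing here
# verifies the endpoints beyond the MATCH itself, so their existence
# really is on the developer, as suggested above.
for e in edges:
    rg4.query(
        "MATCH (a:name {id: %d}), (b:name {id: %d}) "
        "CREATE (a)-[:mutate]->(b)"
        % (e.src_node.properties['id'], e.dest_node.properties['id'])
    )

One query per edge keeps the sketch short; in practice the MATCH/CREATE statements could be grouped, and an index (CREATE INDEX ON :name(id)) would make the matches cheap.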
