Then, create the indexing instructions directly in the node.csv and rels.csv files, so we don't need the ...index.csv files anymore, see https://github.com/jexp/batch-import -> automatic indexing
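| kind:string:mb | comment | status | position | name:string:mb | area | gender | format | barcode | number | ended | length | end_date_year | begin_date_year | mbid:string:mbid | type:string:mb | pk |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| artist | | | | Talkshow Boy | | | | | | f | | | | e8d94cf5-fafa-48fc-a6fa-aa50cf54d7f3 | | 288762 |
| artist | | | | Vibulator | | | | | | f | | | | 735bfaad-6eb1-4f9c-b21d-cbaef7c79a92 | | 97944 |
| artist | | | | Eat Me | | | | | | f | | | | c38a93e8-2ecf-4848-b1d2-364202d9dc0c | Group | 499198 |
| artist | | | | Uffe Andersen | | | | | | f | | | | a7f3c871-3ba3-40b1-ba58-d08b40312789 | Person | 514886 |
| artist | | | | Headust | | | | | | f | | | | eda60727-7036-437b-b53d-ae472818ee3a | | 212148 |
| artist | | | | Sons Of The Subway | | | | | | f | | | | 232d5716-c2b2-47e1-aa0c-264ec69e6a18 | | 100774 |
| artist | | | | The Poe Boy Family | | | | | | f | | | | 672d599e-6a6c-456e-98ba-dac5a45e3ed8 | | 43132 |
| artist | | | | Ralph Gusovius | Germany | Male | | | | f | | | 1950 | 6ecfcea1-677d-427b-a38b-9c76ce92e313 | Person | 295052 |
| artist | | | | Elastik Band | | | | | | f | | | | 46e0639c-1ccf-45f5-b886-4cbf5549a2a1 | | 61467 |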
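To make the header convention above a bit more concrete, here is a minimal Python sketch of how such a header line could be read. It assumes the `name:type:index` field layout described in the batch-import README and tab-separated columns; `Column`, `parse_header_field` and `parse_header` are hypothetical helper names for illustration, not part of batch-import.

```python
from typing import List, NamedTuple, Optional


class Column(NamedTuple):
    """One header field, split into its (assumed) name:type:index parts."""
    name: str
    type: Optional[str]   # None when the header gives no explicit type
    index: Optional[str]  # None when the column is not indexed


def parse_header_field(field: str) -> Column:
    # "mbid:string:mbid" -> name="mbid", type="string", index="mbid"
    # "pk"               -> name="pk",   type=None,     index=None
    parts = field.split(":")
    name = parts[0]
    type_ = parts[1] if len(parts) > 1 else None
    index = parts[2] if len(parts) > 2 else None
    return Column(name, type_, index)


def parse_header(line: str, sep: str = "\t") -> List[Column]:
    # Assumption: columns are tab-separated, as in the sample file.
    return [parse_header_field(f) for f in line.rstrip("\n").split(sep)]


# Header fields copied from the sample nodes file above.
HEADER = "\t".join([
    "kind:string:mb", "comment", "status", "position", "name:string:mb",
    "area", "gender", "format", "barcode", "number", "ended", "length",
    "end_date_year", "begin_date_year", "mbid:string:mbid",
    "type:string:mb", "pk",
])

columns = parse_header(HEADER)

# Map each indexed column to the index it is sent to.
indexed = {c.name: c.index for c in columns if c.index}
print(indexed)
```

With this header, only `kind`, `name` and `type` go to the "mb" index and `mbid` goes to the "mbid" index; every other column is stored as a plain property.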
It happens that I had already started a branch, "multi_nodescsv", in this very direction: https://github.com/redapple/sql2graph/tree/multi_nodescsv
This branch also uses separate CSV node files (another recent feature of batch-import);
these CSV files can be generated directly by the database engine (at least PostgreSQL, in the case of MusicBrainz).
The branch is not very clean yet. It does use automatic indexing for MusicBrainz, but in a different (and more naïve) way. By comparison, your "mbid" exact index for all MBIDs is smart; in the branch I am using one index per entity (artists, labels...) and indexing "mbid" in each, which is definitely less elegant.
I should be back online in the coming days, so I can work on updating the "multi_nodescsv" branch with your ideas and merge them into master if we converge.
As for the processing times you have, I'm afraid I don't have that much RAM on my laptop or server :) (I'm running out of memory when importing too many entities).
Great to hear you've been able to import all of MusicBrainz!
Hi there,
I imported the MusicBrainz database into Neo4j using the following approach, helped by @jexp:
Define 2 indexes (one "mbid" exact index, for MBIDs, and one "mb" fulltext index, for everything else) in batch.properties:
Then, create the indexing instructions directly in the node.csv and rels.csv files, so we don't need the ...index.csv files anymore, see https://github.com/jexp/batch-import -> automatic indexing
And then import the two files with something like
WDYT? It would make the output a lot simpler, and the import took about 10 min on my machine for 160M properties, 75M relationships ...