The process of converting a MARC record into an Instance object is usually called MARC-to-Instance mapping.
Conversion logic is defined by mapping rules and these rules are described in JSON. Mapping rules basically have functions for data normalization (trimming leading whitespaces, removing slashes, removing ending punctuation) and the assigning target Inventory fields to the values of the incoming MARC record.
Basically, a mapping rule is a simple key-value JSON element. The key serves a MARC record's field (tag). The value is a rule configuration.
Rule:
{
"001":[
{
"target":"hrid",
"description":"The human readable ID"
}
]
}
This rule belongs to the "001" field and handles all the "001" fields from incoming record. It takes value from "001" field and puts it into Instance "hrid" field.
SRM has own schema for Inventory Instance. The schema describes target fields, data types, restrictions and other internal details of Instance. Please, be careful while writing rules - you can put into "target" only fields from the schema, take a look at the Instance schema for clear understanding. If the "target" field is specified wrong then the RuleProcessor does not take this rule for mapping.
NOTE: MARC records and mapping rules below have been taken intentionally just for demonstration purposes.
In most cases the record value needs to be normalized before being imported into an Instance field, because MARC record data is often raw and mixed. For this purpose we have to use such structure:
MARC Record: "001": "393/89/3"
Rule:
"001":[
{
"target":"hrid",
"description":"The human readable ID",
"rules":[
{
"conditions":[
{
"type":"remove_substring",
"parameter":{
"substring":"/"
}
}
]
}
]
}
]
remove_substring
is normalization function that removes the given substring from the field’s value. The function is just doing a job and returns a string that ends up in the Instance “hrid” field.
An outcome Instance looks like this in Json:
Instance:
{
"hrid": "393893"
}
Here are all of the formatting functions defined. The most useful ones are: trim, capitalize, remove_ending_punc
.
In most cases there are subfields present within MARC fields. This is important for mapping. Here is an example of a MARC “250” field that contains a “subfield a”, a “subfield b”, and a “subfield 6.”
MARC Record: "250":{"ind1":"", "ind2":"", "subfields":[ { "a":" fifth ed." }, { "b":"Editor in chief Lord Mackay of Clashfern. " } , {"6":"880-02"}]}
Rule:
"250":[
{
"target":"edition",
"description":"Edition",
"subfield":["a", "b"],
"rules":[
{
"conditions":[
{
"type":"capitalize, trim"
}
]
}
]
}
]
The RuleProcessor triggers the rule above only for each allowed subfield ("a" and "b"), that causes a call for normalization functions ("capitalize" and "trim"). Subfield "6" is not allowed, the RuleProcessor just skips calling normalization functions for "6" subfield. The result is concatenated in one string and written to the Instance "edition" field. An outcome Instance looks like this in Json:
Instance:
{
"edition": "Fifth ed. Editor in chief Lord Mackay of Clashfern."
}
Sometimes you can encounter Instance fields with sub-fields, for example, Electronic access (sub-fields: linked text, public note, uri), Contributors (name, type, primary), Publication (place, publisher, date of publication).
To map "264" record into the "place of publication" we have to use period-delimited syntax: "target": "publication.place"
. Period serves a sub-field delimiter to reach a "place" sub-field of the "publication" field.
We can write the mapping rules for the Publication field as shown below:
MARC Record: "264":{"subfields":[{"a":"Chicago, Illinois :"}, {"b":"The HistoryMakers,"}, {"c":"[2016]"}], "ind1":" ", "ind2":"1"}
Rule:
"264": [
{
"target": "publication.place",
"description": "Place of publication",
"subfield": ["a"],
"rules": []
},
{
"target": "publication.publisher",
"description": "Publisher of publication",
"subfield": ["b"],
"rules": []
},
{
"target": "publication.dateOfPublication",
"description": "Date of publication",
"subfield": ["c"],
"rules": []
}
]
An outcome Instance looks like this in Json:
Instance:
{
"publication":[
{
"place":"Chicago, Illinois :",
"publisher":"The HistoryMakers,",
"dateOfPublication":"[2016]"
}
]
}
If there are repeated MARC fields in a single record, then the Instance will receive several elements in its target field. To skip mapping for repeated fields and take only the first occurrence, we can use the ignoreSubsequentFields
flag:
MARC Record:
"336":{"subfields":[{"a":"text"}, {"b":"txt"}, {"2":"rdacontent"}], "ind1":" ", "ind2":" "}, ...
"336":{"subfields":[{"a":"performed music"}, {"b":"prm"}, {"2":"rdacontent"}], "ind1":" ", "ind2":" "}
Rule:
"336": [
{
"target": "instanceTypeId",
"description": "Instance type ID",
"ignoreSubsequentFields": true,
"subfield": ["b"],
"rules": []
}
]
An outcome Instance looks like this in Json:
Instance:
{
"instanceTypeId": "txt"
}
Usually, the Rule Processor creates only one instance of the 'target' field for each record field. What if we need to create several objects from single record field ?
In the example below we map several 'publication' elements from a single "264" record field. To do so we have to wrap a rule structure into an entity
. Let's consider the example below:
MARC Record:
"264": {
"subfields":[
{"a":"Chicago, Illinois :"},
{"b":"The HistoryMakers,"},
{"c":"[2016]"},
{"f":"Nashville, Tennessee"},
{"g":"Revenant Records"},
{"h":"[2015]"}
],
"ind1":" ",
"ind2":"1"
}
Rule:
"264": [
{
"entity": [
{
"target": "publication.place",
"description": "Place of publication",
"subfield": ["a"],
"rules": []
},
{
"target": "publication.publisher",
"description": "Publisher of publication",
"subfield": ["b"],
"rules": []
},
{
"target": "publication.dateOfPublication",
"description": "Date of publication",
"subfield": ["c"],
"rules": []
}
]
},
{
"entity": [
{
"target": "publication.place",
"description": "Place of publication",
"subfield": ["f"],
"rules": []
},
{
"target": "publication.publisher",
"description": "Publisher of publication",
"subfield": ["g"],
"rules": []
},
{
"target": "publication.dateOfPublication",
"description": "Date of publication",
"subfield": ["h"],
"rules": []
}
]
}
]
An outcome Instance looks like this in Json:
Instance:
{
"publication":[
{
"place":"Chicago, Illinois :",
"publisher":"The HistoryMakers,",
"dateOfPublication":"[2016]"
},
{
"place":"Nashville, Tennessee :",
"publisher":"Revenant Records,",
"dateOfPublication":"[2015]"
}
]
}
If there are several repeated sub-fields in one single record, then entity
will concatenate them. To create a new object per each sub-field even if they are repeated, we can use entityPerRepeatedSubfield
flag:
MARC Record:
"264": {
"subfields":[
{"a":"Chicago, Illinois :"},
{"a":"Nashville, Tennessee"},
{"f": "Austin Texas"}
],
"ind1":" ",
"ind2":"1"
}
Rule:
"264": [
{
"entityPerRepeatedSubfield": true,
"entity": [
{
"target": "publication.place",
"description": "Place of publication",
"subfield": ["a", "f"],
"rules": []
},
{
"target": "publication.publisher",
"description": "Publisher of publication",
"subfield": ["a", "f"],
"rules": [
{
"conditions": [],
"value": "STUB publisher"
}
]
},
{
"target": "publication.dateOfPublication",
"description": "Date of publication",
"subfield": ["a", "f"],
"rules": [
{
"conditions": [],
"value": "STUB date"
}
]
}
]
}
]
An outcome Instance looks like this in Json:
Instance:
{
"publication":[
{
"place":"Chicago, Illinois :",
"publisher":"STUB publisher,",
"dateOfPublication":"STUB date"
},
{
"place":"Nashville, Tennessee :",
"publisher":"STUB publisher",
"dateOfPublication":"STUB date"
},
{
"place":"Austin Texas",
"publisher":"STUB publisher",
"dateOfPublication":"STUB date"
}
]
}
By default rules run on the data in a single sub-field. In order to concatenate un-normalized data, and run the rules on the concatenated data add the following field: applyRulesOnConcatedData: true
. This can be used when punctuation should only be removed from the end of a concatenated string.
Rule:
"500": [
{
"subfield": ["a"],
"applyRulesOnConcatenatedData": true
}
]
Grouping sub-fields "subfield": [ "a", "f"]
will concatenate (space delimited) the values in those sub-fields and place the result in the target. However, if there is a need to declare different delimiters per set of sub-fields, the following can be declared using the subFieldDelimiter
array:
MARC Record:
"264": {
"subfields":[
{"a":"Chicago, Illinois"},
{"a":"Nashville, Tennessee"},
{"f": "Austin Texas"}
],
"ind1":" ",
"ind2":"1"
}
Rule:
"265": [
{
"target": "publication.place",
"subfield": ["a","f"],
"description": "Place of publication",
"subFieldDelimiter": [
{
"value": " & ",
"subfields": ["a","f"]
}
],
"rules": []
}
]
An outcome Instance looks like this in Json:
Instance:
{
"publication":[
{
"place":"Chicago, Illinois & Nashville, Tennessee & Austin Texas"
}
]
}
If there is a need to concat special subfields to processed one, its needed to add to the rule subfieldsToConcat
array
and list subfields names to be concatenated and use concat_subfields_by_name
normalization function:
MARC Record:
"024": {
"subfields":[
{"a":"Chicago, Illinois"},
{"c":"Nashville, Tennessee"},
{"f": "Austin Texas"}
],
"ind1":"2",
"ind2":"1"
}
Rule:
"024": [{
"target": "identifiers.value",
"description": "Invalid UPC",
"subfield": ["a"],
"rules": [{
"conditions": [{
"type": "concat_subfields_by_name",
"parameter": {
"subfieldsToConcat": [
"f"
]
}
}]
}]
}]
An outcome Instance looks like this in Json:
Instance:
{
"identifiers":[
{
"value":"Chicago, Illinois Austin Texas"
}
]
}
In case you have to stop the concatenation before some subfields, its needed to add the parameter subfieldsToStopConcat
and a list of subfields before which the concatenation should stop:
MARC Record:
"020": {
"subfields":[
{"a":"a9780471622673"},
{"c":"(acid-free paper)"},
{"z": "z0471622672"},
{"q":"(acid-free paper)"}
],
"ind1":" ",
"ind2":" "
}
Rule:
"020": [{
"target": "identifiers.value",
"description": "Valid ISBN",
"subfield": ["a"],
"rules": [{
"conditions": [{
"type": "concat_subfields_by_name, remove_ending_punc, trim",
"parameter": {
"subfieldsToConcat": [
"c", "q"
],
"subfieldsToStopConcat": [
"z"
]
}
}]
}]
}]
An outcome Instance looks like this in Json:
Instance:
{
"identifiers":[
{
"value":"a9780471622673 (acid-free paper)"
}
]
}
Sometimes the existence of a MARC subfield will dictate whether or not a target field is presented in Inventory. We use requiredSubfield
to define the required subfield needed to trigger the appearance of a target field. In this example, the presence of an 020 subfield "z" in a MARC record is needed in order for the target field, “Invalid ISBN” to appear in the Inventory record.
MARC Record:
{
"020":{
"subfields":[
{
"z":"9780190494889"
},
{
"q":"hardcover ;"
},
{
"c":"alkaline paper"
}
],
"ind1":" ",
"ind2":" "
}
}
Rule:
"020": [
{
"entity": [
{
"target": "identifiers.value",
"description": "Invalid ISBN",
"subfield": ["z","q","c"],
"requiredSubfield": ["z"],
"rules": []
}
]
}
]
- If there is no "z" in record sub-fields then target field does not appear at all.
- If there is "z" among record sub-fields then target field gets filled by all the
["z","q","c"].
Sometimes the existence of a MARC subfield will dictate whether or not a target field is presented in Inventory. We use exclusiveSubfield
to define the exclusive subfield needed to NOT trigger the appearance of a target field. In this example, the presence of an 100 subfield "t" in a MARC record is needed in order for the target field, “personalName” to not appear in the Inventory record, even if other subfields are present.
MARC Record:
{
"200":{
"subfields":[
{
"a":"9780190494889"
},
{
"b":"hardcover ;"
},
{
"t":"alkaline paper"
}
],
"ind1":" ",
"ind2":" "
}
}
Rule:
"100": [
{
"entity": [
{
"target": "personalName",
"subfield": ["a","b"],
"exclusiveSubfield": ["t"],
"rules": []
},
{
"target": "personalNameTitle",
"subfield": ["t"],
"rules": []
}
]
}
]
- If there is "t" among record sub-fields then target "personalName" field does not appear at all.
- If there is no "t" in record sub-fields then target field gets filled by all the
["a","b"].
There are "fieldReplacement" properties. They need for changing field number for specific fields, based on field replacement rules. For example, if 880 field need to be changed on 711 based on some condition, "fieldReplacement" properties can help with it. Example:
Rule:
"880": [
{
"target": "",
"description": "Field for target post-processing based on first 3 digits in subfield 6",
"subfield": [
"6"
],
"fieldReplacementBy3Digits" : true,
"fieldReplacementRule" : [
{
"sourceDigits": "100",
"targetField": "700"
},
{
"sourceDigits": "110",
"targetField": "710"
},
{
"sourceDigits": "111",
"targetField": "711"
},
{
"sourceDigits": "245",
"targetField": "246"
}
]
}
]
fieldReplacementBy3Digits
property indicates that "fieldReplacement"-logic should be applied here. (There can be added and implemented other properties, not only ...3Digits
)
fieldReplacementRule
property contains some additional data for "fieldReplacement"-logic.
If there is a need to use matches for indicators of record fields and rules, its needed to add an object indicators
to a rule
which process a record field:
Indicators object to process:
"indicators": {
"ind1": "1",
"ind2": "*"
}
MARC Record:
"024": {
"subfields":[
{"a":"Chicago, Illinois"},
{"c":"Nashville, Tennessee"},
{"f": "Austin Texas"}
],
"ind1":"2",
"ind2":"1"
}
Rule:
"024": [{
"indicators": {
"ind1": "2",
"ind2": "*"
},
{
"target": "identifiers.value",
"description": "Invalid UPC",
"subfield": ["a"],
"rules": [{
"conditions": [{
"type": "concat_subfields_by_name",
"parameter": {
"subfieldsToConcat": [
"f"
]
}
}]
}]
}}]
An outcome Instance looks like this in Json:
Instance:
{
"identifiers":[
{
"value":"Chicago, Illinois Austin Texas"
}
]
}
In our case, indicator value of record field "ind1" matched with indicator "ind1" value of the rules. Indicator value of record field "ind2" is "1" but it also matched because the rule indicator "ind2" value is "*" - wildcard indicator, so any record field indicator value matches to it. In case of such match, the rule will be proceeded.
This mechanism executes in Processor, in data-import-processing-core, while parsing these rules.
If there is a need to use an alternative target and subfield if the result of the running function is empty, there can be used "alternativeMapping" rule: which process a record field:
Alternative rules:
"alternativeMapping": {
"target": "contributors.contributorTypeText",
"subfield": [
"e"
],
"ignoreSubsequentSubfields": true
},
"applyRulesOnConcatenatedData": true
},
For the function "set_contributor_type_id_by_code_or_name" there should be set additional parameters: "contributorCodeSubfield" and "contributorNameSubfield" for searching from subfield "4" as code for Contributors and from subfield "e" as name for Contributors. Also, there was added a brand new function called "trim_punctuation". It will provide the same functionality as the "trim"+"trim_period" functions. Moreover, it provides additional mechanism for the punctuations: If ending punctuation of the last mapped subfield of the field is a period or comma, then it will be removed, EXCEPT 1.If the last mapped text ends with a single letter and then a period (e.g. Brown, Sterling K.), then ending period will not removed. 2.If the last mapped text ends with a period, followed by a comma, (e.g. Brown, Sterling K,.), then the comma will be removed, but the period will leave. 3.If the last mapped text ends with a hyphen (e.g. Kaluuya, Daniel, 1989-), then the hyphen will not removed.
Rule:
"710": [
{
"entity": [
{
"target": "contributors.authorityId",
"description": "Authority ID that controlling the contributor",
"applyRulesOnConcatenatedData": true,
"subfield": [
"9"
]
},
{
"target": "contributors.contributorNameTypeId",
"description": "Type for Corporate Name",
"applyRulesOnConcatenatedData": true,
"subfield": [
],
"rules": [
{
"conditions": [
{
"type": "set_contributor_name_type_id",
"parameter": {
"name": "Corporate name"
}
}
]
}
]
},
{
"rules": [
{
"conditions": [
{
"type": "set_contributor_type_id_by_code_or_name",
"parameter": {
"contributorCodeSubfield": "4",
"contributorNameSubfield": "e"
}
}
]
}
],
"target": "contributors.contributorTypeId",
"subfield": [
"4",
"e"
],
"description": "Type of contributor",
"alternativeMapping": {
"target": "contributors.contributorTypeText",
"subfield": [
"e"
],
"ignoreSubsequentSubfields": true
},
"applyRulesOnConcatenatedData": true
},
{
"target": "contributors.primary",
"description": "Primary contributor",
"applyRulesOnConcatenatedData": true,
"subfield": [
],
"rules": [
{
"conditions": [],
"value": "false"
}
]
},
{
"target": "contributors.name",
"description": "Corporate Name",
"subfield": [
"a",
"b",
"c",
"d",
"f",
"g",
"k",
"l",
"n",
"p",
"t",
"u"
],
"applyRulesOnConcatenatedData": true,
"rules": [
{
"conditions": [
{
"type": "trim_punctuation",
}
]
}
]
}
]
}
],
NOTE: Regarding ending punctuation - if the mapped text ends with (".", ",", ";") then it will be(the last symbol) removed for the matching with Contributor Type.
If there is a need to map not arrays or string but json object with simple fields inside (strings), there can be used "createSingleObject" rule, which will create a single json object with specified fields:
Building Dates json object:
{
"target": "dates.dateTypeId",
"description": "Date type ID",
"subfield": [],
"createSingleObject": true,
"rules": [
{
"conditions": [
{
"type": "set_date_type_id"
}
]
}
]
},
{
"target": "dates.date1",
"description": "Date 1",
"subfield": [],
"createSingleObject": true,
"rules": [
{
"conditions": [
{
"type": "char_select",
"parameter": {
"from": 7,
"to": 11
}
}
]
}
]
},
{
"target": "dates.date2",
"description": "Date 2",
"subfield": [],
"createSingleObject": true,
"rules": [
{
"conditions": [
{
"type": "char_select",
"parameter": {
"from": 11,
"to": 15
}
}
]
}
]
}
When the source-record-manager starts up, it performs initialization for default mapping rules for given tenant. There are 3 REST methods to work with rules.
Method | URL | Content type | Description |
---|---|---|---|
GET | /mapping-rules/{recordType} | application/json | Get rules for given tenant by record type |
PUT | /mapping-rules/{recordType} | application/json | Update rules for given tenant |
PUT | /mapping-rules/{recordType}/restore | application/json | Restore rules to default |
An endpoint support recordType path parameter with 3 possible values:
- marc-bib
- marc-holdings
- marc-authority (only GET)
Before working with API make sure you have an HTTP token that is required for sending requests. If you have already logged in the system using UI, just copy token from Apps/Settings/Developer/Set token/Authentication token
field.
Also you can log in the system using CLI tools, response returns x-okapi-token
header with HTTP token.
Login request:
curl -w '\n' -X POST \
--header "Content-Type: application/json" \
--header "x-okapi-tenant: {tenant}" \
--data @credentials.json \
https://folio-snapshot-load-okapi.dev.folio.org/bl-users/login
credentials.json:
{
"username": "{username}",
"password": "{password}"
}
To get rules you can send a request using GET method
curl -w '\n' -X GET \
--header "Content-type: application/json" \
--header "x-okapi-tenant: {tenant}" \
--header "x-okapi-token: {token}" \
https://folio-snapshot-load-okapi.dev.folio.org/mapping-rules
A response returns existing rules:
{
"001": [
{
"rules": [],
"target": "hrid",
"subfield": [],
"description": "The human readable ID"
}
],
"008": [
{
"rules":
...
}
If you would like to update rules just get existing rules using GET method and combine it with your updates, use PUT method:
curl -w '\n' -X PUT \
--header "Content-type: application/json" \
--header "Accept: text/plain, application/json" \
--header "x-okapi-tenant: {tenant}" \
--header "x-okapi-token: {token}" \
--data @rules.json \
https://folio-snapshot-load-okapi.dev.folio.org/mapping-rules
rules.json with updated list of subfields for 001 :
{
"001": [
{
"rules": [],
"target": "hrid",
"subfield": ["a", "b", "c"],
"description": "The human readable ID"
}
],
"008": [
{
"rules":
...
}
A response returns updated rules, content should be the same you sent in body of the PUT request:
{
"001": [
{
"rules": [],
"target": "hrid",
"subfield": ["a", "b", "c"],
"description": "The human readable ID"
}
],
"008": [
{
"rules":
...
}
Before sending an updates, please, make sure your file has valid JSON format, otherwise the system returns response with 400 error code (Bad Request). To validate JSON file there are online free tools: Json Formatter.
To revert the current state of rules to default, as it was at the system startup, use PUT method with 'restore' suffix:
curl -w '\n' -X PUT \
--header "Content-type: application/json" \
--header "Accept: text/plain, application/json" \
--header "x-okapi-tenant: {tenant}" \
--header "x-okapi-token: {token}" \
https://folio-snapshot-load-okapi.dev.folio.org/mapping-rules/restore
A response returns rules in default state:
{
"001": [
{
"rules": [],
"target": "hrid",
"subfield": [],
"description": "The human readable ID"
}
],
"008": [
{
"rules":
...
}