Merge branch 'develop'
Theodor Tolstoy committed Oct 27, 2021
2 parents 6407893 + af8bfce commit 5c60e49
Showing 42 changed files with 2,388 additions and 1,990 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-app.yml
@@ -33,6 +33,6 @@ jobs:
- name: Make sure the code can run
run: |
python main_bibs.py -h
python main_holdings.py -h
python main_holdings_marc.py -h
python main_holdings_csv.py -h
python main_items.py -h
26 changes: 19 additions & 7 deletions README.md
@@ -2,9 +2,9 @@
![example workflow](https://github.com/FOLIO-FSE/MARC21-To-FOLIO/actions/workflows/python-app.yml/badge.svg)
A set of Python3 scripts transforming MARC21 records and items in delimited files into FOLIO inventory objects.

The scripts require a FOLIO tenant with reference data set. They will log messages telling you what reference data is missing.
The scripts require a FOLIO tenant with reference data properly set up. They will log messages telling you what reference data is missing.

When the files have been created, post them to FOLIO using the [service_tools](https://github.com/FOLIO-FSE/service_tools) set of programs.
When the files have been created, post them to FOLIO using the [service_tools](https://github.com/FOLIO-FSE/service_tools) set of programs, preferably BatchPoster.

## Relevant FOLIO community documentation
* [Instance Metadata Elements](https://docs.google.com/spreadsheets/d/1RCZyXUA5rK47wZqfFPbiRM0xnw8WnMCcmlttT7B3VlI/edit#gid=952741439)
@@ -17,7 +17,7 @@ When the files have been created, post them to FOLIO using the [service_tools](h
* [MARC Mappings Information](https://wiki.folio.org/display/FOLIOtips/MARC+Mappings+Information)

# FOLIO Inventory data migration process
This template plays a vital part in a process that, together with other repos, allows you to perform bibliographic data migration from a legacy ILS into FOLIO. For more information on the process, head over to the linked repos below.
[This template repository](https://github.com/FOLIO-FSE/migration_repo_template) plays a vital part in a process that, together with other repos, allows you to perform bibliographic data migration from a legacy ILS into FOLIO. For more information on the process, head over to the linked repos below.
In order to perform migrations according to this process, you need to clone the following repositories:
* [MARC21-to-FOLIO](https://github.com/FOLIO-FSE/MARC21-To-FOLIO)
* [service_tools](https://github.com/FOLIO-FSE/service_tools)
@@ -33,26 +33,38 @@ The scripts also relies on a folder with a set of mapping files. There is a [tem
MARC mapping for Bib level records is based on the mapping-rules residing in a FOLIO tenant.
Read more on this in the Readme in the [Source record manager Module repo](https://github.com/folio-org/mod-source-record-manager/blob/25283ebabf402b5870ae4b3846285230e785c17d/RuleProcessorApi.md).
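
If you want to inspect the rules your tenant will apply before running a transformation, a minimal sketch along the following lines can fetch them. The `/mapping-rules` endpoint and the `okapi_url`/`okapi_headers` attributes of FolioClient are assumptions based on common mod-source-record-manager and folioclient usage, so verify them against your versions.

```python
# Minimal sketch (not part of this repo): fetch the tenant's MARC-to-Instance
# mapping rules. The /mapping-rules endpoint and the okapi_url/okapi_headers
# attributes are assumptions; check your FOLIO and folioclient versions.
import json

import requests
from folioclient.FolioClient import FolioClient

folio_client = FolioClient(
    "https://okapi.example.org", "example_tenant", "example_user", "example_password"
)
response = requests.get(
    f"{folio_client.okapi_url}/mapping-rules", headers=folio_client.okapi_headers
)
response.raise_for_status()
mapping_rules = response.json()

# Show how one MARC field (245, the title statement) is mapped.
print(json.dumps(mapping_rules.get("245", []), indent=2))
```

This is the same rule set the transformation relies on, so it is a quick way to confirm that any tenant-side customizations are in place.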

The trigger for this process is main_bibs.py. To see what parameters are needed, run `pipenv run python main_bibs.py -h`.

![image](https://user-images.githubusercontent.com/1894384/137994473-10fea92f-1966-41d5-bd41-d6be00594b58.png)
In the picture above, you can see the files needed and the files created as part of the process.

### MFHD-to-Inventory
#### Mapping rules
This processing does not store the MARC records anywhere, since this is not yet available in FOLIO. Only FOLIO Holdings records are created.
This processing does not store the MARC records anywhere, since this is not yet available in FOLIO (planned for the Kiwi release). Only FOLIO Holdings records are created.
MFHD-to-Inventory mapping also relies on a similar JSON mapping structure. This is not stored in the tenant and must be maintained by you. A template/example is available in [migration_repo_template](https://github.com/FOLIO-FSE/migration_repo_template).

If you do not have MFHD records available, you can build a mapping file with [this web tool](https://data-mapping-file-creator.folio.ebsco.com/data_mapping_creation) from the item data. This will generate Holdings records to connect to the items.
There are two scripts, depending on what source data you have: main_holdings_csv.py and main_holdings_marc.py.
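
For orientation, one entry in such a mapping file pairs a legacy field or column with a FOLIO field. The sketch below shows an assumed shape only (the `folio_field`/`legacy_field`/`value` keys are not guaranteed); the files in migration_repo_template are the authoritative reference.

```python
# Illustrative only: an assumed shape for one entry in a holdings mapping file,
# written as a Python dict. Check migration_repo_template for real examples.
example_mapping_entry = {
    "folio_field": "permanentLocationId",  # target field on the FOLIO holdings record
    "legacy_field": "LOCATION",            # column header (or MFHD-derived field) in the source data
    "value": "",                           # optional fixed value used instead of a legacy field
}
```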

![image](https://user-images.githubusercontent.com/1894384/137994847-f27f5e09-329e-4f75-a9fd-a83423d73068.png)


#### Location mapping
For holdings mapping, you also need to map legacy locations to FOLIO locations. An example map file is available in [migration_repo_template](https://github.com/FOLIO-FSE/migration_repo_template).
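
As a rough sketch of how such a map is used (the tab-separated layout and the `legacy_code`/`folio_code` column names are assumptions for illustration; follow the example file in the template repo):

```python
# Assumed illustration of loading a legacy-to-FOLIO location map.
# Column names and file layout are guesses; the example file in
# migration_repo_template is the source of truth.
import csv


def load_location_map(path: str) -> dict:
    with open(path, encoding="utf-8") as map_file:
        reader = csv.DictReader(map_file, delimiter="\t")
        return {row["legacy_code"].strip(): row["folio_code"].strip() for row in reader}


location_map = load_location_map("mapping_files/locations.tsv")
print(location_map.get("MAIN", "<unmapped - needs a fallback location>"))
```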

## Items-to-Inventory
Items-to-Inventory mapping is based on a JSON structure where the CSV headers are matched against the target fields in the FOLIO items. To create a mapping file, use the [web tool](https://data-mapping-file-creator.folio.ebsco.com/data_mapping_creation).
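
To make the header-matching idea concrete, here is a hedged sketch of applying such a map to a single CSV row. The header names, target fields, and file name are invented for illustration; the actual transformation is done by main_items.py together with your mapping file.

```python
# Illustrative sketch only: match legacy CSV headers to FOLIO item fields.
# Headers, target fields, and file name are assumptions, not this repo's API.
import csv

header_to_folio_field = {
    "BARCODE": "barcode",
    "CALLNO": "itemLevelCallNumber",
    "LOCATION": "permanentLocationId",
}

with open("items.tsv", encoding="utf-8") as items_file:
    for row in csv.DictReader(items_file, delimiter="\t"):
        folio_item = {
            folio_field: row[header]
            for header, folio_field in header_to_folio_field.items()
            if row.get(header)
        }
        print(folio_item)
        break  # only demonstrate the first row
```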

![image](https://user-images.githubusercontent.com/1894384/137995011-dd6a78a7-61d7-46d8-a35c-363f65c33ce0.png)


# Tests
There is a test suite for Bibs-to-Instance mapping.
## Running the tests for the Rules mapper

* Install the packages in the Pipfile
* Make a copy of the test_config.json.template and add the necessary credentials.
* Run ```pipenv run python3 -m unittest test_rules_mapper.TestRulesMapper```
* Run ```pipenv run python3 -m pytest```

Since you need to point your tests towards a FOLIO tenant, the test suite is somewhat unstable. But it is still very useful for ironing out complex mapping issues.

# Running the scripts
For information on what files are needed and produced by the tools, refer to the documentation and example files in the [template repository](https://github.com/FOLIO-FSE/migration_repo_template).
85 changes: 44 additions & 41 deletions main_bibs.py
@@ -1,33 +1,36 @@
'''Main "script."'''
import json
import logging
import os
import sys
import time
from datetime import datetime as dt
from os import listdir
from os.path import dirname, isfile
from os.path import isfile

import requests
from argparse_prompt import PromptParser
from folioclient.FolioClient import FolioClient
from pymarc import MARCReader
from pymarc.record import Record
import requests

from marc_to_folio import main_base
from marc_to_folio.bibs_processor import BibsProcessor
from marc_to_folio.custom_exceptions import (
from migration_tools import main_base
from migration_tools.colors import Bcolors
from migration_tools.custom_exceptions import (
TransformationProcessError,
TransformationRecordFailedError,
)
from marc_to_folio.folder_structure import FolderStructure
from marc_to_folio.rules_mapper_bibs import BibsRulesMapper
from migration_tools.folder_structure import FolderStructure
from migration_tools.helper import Helper
from migration_tools.marc_rules_transformation.bibs_processor import BibsProcessor
from migration_tools.marc_rules_transformation.rules_mapper_bibs import BibsRulesMapper


class Worker(main_base.MainBase):
"""Class that is responsible for the actual work"""

def __init__(self, folio_client, folder_structure: FolderStructure, args):
# msu special case
super().__init__()
self.args = args
self.folder_structure = folder_structure
self.files = [
@@ -36,8 +39,9 @@ def __init__(self, folio_client, folder_structure: FolderStructure, args):
if isfile(os.path.join(folder_structure.legacy_records_folder, f))
]
self.folio_client = folio_client
logging.info(f"# of files to process: {len(self.files)}")
logging.info(json.dumps(self.files, sort_keys=True, indent=4))
logging.info("# of files to process: %s", len(self.files))
for file_path in self.files:
logging.info("\t%s", file_path)
self.mapper = BibsRulesMapper(self.folio_client, args)
self.processor = None
self.bib_ids = set()
@@ -69,41 +73,40 @@ def work(self):
else:
logging.info("FORCE UTF-8 is set to FALSE")
reader.force_utf8 = False
logging.info(f"running {file_name}")
self.read_records(reader)
logging.info("running %s", file_name)
self.read_records(reader, file_name)
except TransformationProcessError as tpe:
logging.critical(tpe)
exit()
except Exception:
logging.exception(file_name, stack_info=True)
# wrap up
self.wrap_up()

def read_records(self, reader):
def read_records(self, reader, file_name):
for idx, record in enumerate(reader):
self.mapper.add_stats(self.mapper.stats, "Records in file before parsing")
self.mapper.migration_report.add_general_statistics(
"Records in file before parsing"
)
try:
if record is None:
self.mapper.add_to_migration_report(
"Bib records that failed to parse",
f"{reader.current_exception} {reader.current_chunk}",
)
self.mapper.add_stats(
self.mapper.stats,
self.mapper.migration_report.add_general_statistics(
"Records with encoding errors - parsing failed",
)
raise TransformationRecordFailedError(
f"Index in file:{idx}",
f"Index in {file_name}:{idx}",
f"MARC parsing error: {reader.current_exception}",
reader.current_chunk,
)
else:
self.set_leader(record)
self.mapper.add_stats(
self.mapper.stats,
self.mapper.migration_report.add_general_statistics(
"Records successfully parsed from MARC21",
)
self.processor.process_record(idx, record, False)
except TransformationRecordFailedError as error:
logging.error(error)
logging.info(f"Done reading {idx} records from file")
error.log_it()
logging.info("Done reading %s records from file", idx + 1)

@staticmethod
def set_leader(marc_record: Record):
@@ -114,20 +117,19 @@ def wrap_up(self):
logging.info("Done. Wrapping up...")
self.processor.wrap_up()
with open(self.folder_structure.migration_reports_file, "w+") as report_file:
report_file.write(f"# Bibliographic records transformation results \n")
report_file.write("# Bibliographic records transformation results \n")
report_file.write(f"Time Run: {dt.isoformat(dt.utcnow())} \n")
report_file.write(f"## Bibliographic records transformation counters \n")
self.mapper.print_dict_to_md_table(
self.mapper.stats,
Helper.write_migration_report(report_file, self.mapper.migration_report)
Helper.print_mapping_report(
report_file,
"Measure",
"Count",
self.mapper.parsed_records,
self.mapper.mapped_folio_fields,
self.mapper.mapped_legacy_fields,
)
self.mapper.write_migration_report(report_file)
self.mapper.print_mapping_report(report_file)

logging.info(
f"Done. Transformation report written to {self.folder_structure.migration_reports_file}"
"Done. Transformation report written to %s",
self.folder_structure.migration_reports_file.name,
)


@@ -181,7 +183,8 @@ def parse_args():
"-utf8",
help=(
"forcing UTF8 when parsing marc records. If you get a lot of encoding issues, test "
"changing this setting to False"
"changing this setting to False \n"
f"\n{Bcolors.WARNING}WARNING!{Bcolors.ENDC}\nEven though setting this to False might make your migrations run smoother, it might lead to data loss in individual fields"
),
default="True",
)
@@ -207,10 +210,10 @@ def main():
Worker.setup_logging(folder_structure)
folder_structure.log_folder_structure()

logging.info(f"Okapi URL:\t{args.okapi_url}")
logging.info(f"Tenant Id:\t{args.tenant_id}")
logging.info(f"Username: \t{args.username}")
logging.info(f"Password: \tSecret")
logging.info("Okapi URL:\t%s", args.okapi_url)
logging.info("Tenant Id:\t%s", args.tenant_id)
logging.info("Username: \t%s", args.username)
logging.info("Password: \tSecret")
try:
folio_client = FolioClient(
args.okapi_url, args.tenant_id, args.username, args.password
@@ -219,7 +222,7 @@
logging.critical(
"SSL error. Check your VPN or Internet connection. Exiting"
)
exit()
sys.exit()
# Initiate Worker
worker = Worker(folio_client, folder_structure, args)
worker.work()
@@ -228,7 +231,7 @@
except TransformationProcessError as process_error:
logging.critical(process_error)
logging.critical("Halting...")
exit()
sys.exit()


if __name__ == "__main__":
