Merge branch 'develop'
Theodor Tolstoy committed Oct 27, 2021
2 parents 6407893 + af8bfce commit 5c60e49
Showing 42 changed files with 2,388 additions and 1,990 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-app.yml
@@ -33,6 +33,6 @@ jobs:
- name: Make sure the code can run
run: |
python main_bibs.py -h
python main_holdings.py -h
python main_holdings_marc.py -h
python main_holdings_csv.py -h
python main_items.py -h
26 changes: 19 additions & 7 deletions README.md
@@ -2,9 +2,9 @@
![example workflow](https://github.com/FOLIO-FSE/MARC21-To-FOLIO/actions/workflows/python-app.yml/badge.svg)
A set of Python3 scripts transforming MARC21 records and items in delimited files into FOLIO inventory objects.

The scripts require a FOLIO tenant with reference data set. They will log messages telling you what reference data is missing.
The scripts require a FOLIO tenant with reference data properly set up. They will log messages telling you what reference data is missing.

When the files have been created, post them to FOLIO using the [service_tools](https://github.com/FOLIO-FSE/service_tools) set of programs.
When the files have been created, post them to FOLIO using the [service_tools](https://github.com/FOLIO-FSE/service_tools) set of programs, preferably BatchPoster.

## Relevant FOLIO community documentation
* [Instance Metadata Elements](https://docs.google.com/spreadsheets/d/1RCZyXUA5rK47wZqfFPbiRM0xnw8WnMCcmlttT7B3VlI/edit#gid=952741439)
@@ -17,7 +17,7 @@ When the files have been created, post them to FOLIO using the [service_tools](h
* [MARC Mappings Information](https://wiki.folio.org/display/FOLIOtips/MARC+Mappings+Information)

# FOLIO Inventory data migration process
This template plays a vital part in a process that, together with other repos, allows you to perform bibliographic data migration from a legacy ILS into FOLIO. For more information on the process, head over to the linked repos below.
[This template repository](https://github.com/FOLIO-FSE/migration_repo_template) plays a vital part in a process that, together with other repos, allows you to perform bibliographic data migration from a legacy ILS into FOLIO. For more information on the process, head over to the linked repos below.
In order to perform migrations according to this process, you need to clone the following repositories:
* [MARC21-to-FOLIO](https://github.com/FOLIO-FSE/MARC21-To-FOLIO)
* [service_tools](https://github.com/FOLIO-FSE/service_tools)
@@ -33,26 +33,38 @@ The scripts also relies on a folder with a set of mapping files. There is a [tem
MARC mapping for Bib level records is based on the mapping-rules residing in a FOLIO tenant.
Read more on this in the Readme in the [Source record manager Module repo](https://github.com/folio-org/mod-source-record-manager/blob/25283ebabf402b5870ae4b3846285230e785c17d/RuleProcessorApi.md).
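
If you want to inspect the rules your tenant will apply before running a transformation, a minimal sketch along the following lines can fetch them. The `/mapping-rules` endpoint and the `okapi_url`/`okapi_headers` attributes of FolioClient are assumptions based on common mod-source-record-manager and folioclient usage, so verify them against your versions.

```python
# Minimal sketch (not part of this repo): fetch the tenant's MARC-to-Instance
# mapping rules. The /mapping-rules endpoint and the okapi_url/okapi_headers
# attributes are assumptions; check your FOLIO and folioclient versions.
import json

import requests
from folioclient.FolioClient import FolioClient

folio_client = FolioClient(
    "https://okapi.example.org", "example_tenant", "example_user", "example_password"
)
response = requests.get(
    f"{folio_client.okapi_url}/mapping-rules", headers=folio_client.okapi_headers
)
response.raise_for_status()
mapping_rules = response.json()

# Show how one MARC field (245, the title statement) is mapped.
print(json.dumps(mapping_rules.get("245", []), indent=2))
```

This is the same rule set the transformation relies on, so it is a quick way to confirm that any tenant-side customizations are in place.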

The trigger for this process is main_bibs.py. To see what parameters are needed, run `pipenv run python main_bibs.py -h`.

![image](https://user-images.githubusercontent.com/1894384/137994473-10fea92f-1966-41d5-bd41-d6be00594b58.png)
In the picture above, you can see the files needed and the files created as part of the process.

### MFHD-to-Inventory
#### Mapping rules
This processing does not store the MARC records anywhere, since this is not yet available in FOLIO. Only FOLIO Holdings records are created.
This processing does not store the MARC records anywhere, since this is not yet available in FOLIO (planned for the Kiwi release). Only FOLIO Holdings records are created.
MFHD-to-Inventory mapping also relies on a similar JSON mapping structure. This is not stored in the tenant and must be maintained by you. A template/example is available in [migration_repo_template](https://github.com/FOLIO-FSE/migration_repo_template).

If you do not have MFHD records available, you can build a mapping file with [this web tool](https://data-mapping-file-creator.folio.ebsco.com/data_mapping_creation) from the item data. This will generate Holdings records to connect to the items.
There are two scripts, depending on what source data you have: main_holdings_csv.py and main_holdings_marc.py.
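
For orientation, one entry in such a mapping file pairs a legacy field or column with a FOLIO field. The sketch below shows an assumed shape only (the `folio_field`/`legacy_field`/`value` keys are not guaranteed); the files in migration_repo_template are the authoritative reference.

```python
# Illustrative only: an assumed shape for one entry in a holdings mapping file,
# written as a Python dict. Check migration_repo_template for real examples.
example_mapping_entry = {
    "folio_field": "permanentLocationId",  # target field on the FOLIO holdings record
    "legacy_field": "LOCATION",            # column header (or MFHD-derived field) in the source data
    "value": "",                           # optional fixed value used instead of a legacy field
}
```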

![image](https://user-images.githubusercontent.com/1894384/137994847-f27f5e09-329e-4f75-a9fd-a83423d73068.png)


#### Location mapping
For holdings mapping, you also need to map legacy locations to FOLIO locations. An example map file is available in [migration_repo_template](https://github.com/FOLIO-FSE/migration_repo_template).
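
As a rough sketch of how such a map is used (the tab-separated layout and the `legacy_code`/`folio_code` column names are assumptions for illustration; follow the example file in the template repo):

```python
# Assumed illustration of loading a legacy-to-FOLIO location map.
# Column names and file layout are guesses; the example file in
# migration_repo_template is the source of truth.
import csv


def load_location_map(path: str) -> dict:
    with open(path, encoding="utf-8") as map_file:
        reader = csv.DictReader(map_file, delimiter="\t")
        return {row["legacy_code"].strip(): row["folio_code"].strip() for row in reader}


location_map = load_location_map("mapping_files/locations.tsv")
print(location_map.get("MAIN", "<unmapped - needs a fallback location>"))
```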

## Items-to-Inventory
Items-to-Inventory mapping is based on a JSON structure where the CSV headers are matched against the target fields in the FOLIO items. To create a mapping file, use the [web tool](https://data-mapping-file-creator.folio.ebsco.com/data_mapping_creation).
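
To make the header-matching idea concrete, here is a hedged sketch of applying such a map to a single CSV row. The header names, target fields, and file name are invented for illustration; the actual transformation is done by main_items.py together with your mapping file.

```python
# Illustrative sketch only: match legacy CSV headers to FOLIO item fields.
# Headers, target fields, and file name are assumptions, not this repo's API.
import csv

header_to_folio_field = {
    "BARCODE": "barcode",
    "CALLNO": "itemLevelCallNumber",
    "LOCATION": "permanentLocationId",
}

with open("items.tsv", encoding="utf-8") as items_file:
    for row in csv.DictReader(items_file, delimiter="\t"):
        folio_item = {
            folio_field: row[header]
            for header, folio_field in header_to_folio_field.items()
            if row.get(header)
        }
        print(folio_item)
        break  # only demonstrate the first row
```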

![image](https://user-images.githubusercontent.com/1894384/137995011-dd6a78a7-61d7-46d8-a35c-363f65c33ce0.png)


# Tests
There is a test suite for Bibs-to-Instance mapping.
## Running the tests for the Rules mapper

* Install the packages in the Pipfile
* Make a copy of the test_config.json.template and add the necessary credentials.
* Run ```pipenv run python3 -m unittest test_rules_mapper.TestRulesMapper```
* Run ```pipenv run python3 -m pytest```

Since you need to point your tests towards a FOLIO tenant, the test suite is somewhat unstable. But it is still very useful for ironing out complex mapping issues.

# Running the scripts
For information on what files are needed and produced by the tools, refer to the documentation and example files in the [template repository](https://github.com/FOLIO-FSE/migration_repo_template).
85 changes: 44 additions & 41 deletions main_bibs.py
@@ -1,33 +1,36 @@
'''Main "script."'''
import json
import logging
import os
import sys
import time
from datetime import datetime as dt
from os import listdir
from os.path import dirname, isfile
from os.path import isfile

import requests
from argparse_prompt import PromptParser
from folioclient.FolioClient import FolioClient
from pymarc import MARCReader
from pymarc.record import Record
import requests

from marc_to_folio import main_base
from marc_to_folio.bibs_processor import BibsProcessor
from marc_to_folio.custom_exceptions import (
from migration_tools import main_base
from migration_tools.colors import Bcolors
from migration_tools.custom_exceptions import (
TransformationProcessError,
TransformationRecordFailedError,
)
from marc_to_folio.folder_structure import FolderStructure
from marc_to_folio.rules_mapper_bibs import BibsRulesMapper
from migration_tools.folder_structure import FolderStructure
from migration_tools.helper import Helper
from migration_tools.marc_rules_transformation.bibs_processor import BibsProcessor
from migration_tools.marc_rules_transformation.rules_mapper_bibs import BibsRulesMapper


class Worker(main_base.MainBase):
"""Class that is responsible for the actual work"""

def __init__(self, folio_client, folder_structure: FolderStructure, args):
# msu special case
super().__init__()
self.args = args
self.folder_structure = folder_structure
self.files = [
@@ -36,8 +39,9 @@ def __init__(self, folio_client, folder_structure: FolderStructure, args):
if isfile(os.path.join(folder_structure.legacy_records_folder, f))
]
self.folio_client = folio_client
logging.info(f"# of files to process: {len(self.files)}")
logging.info(json.dumps(self.files, sort_keys=True, indent=4))
logging.info("# of files to process: %s", len(self.files))
for file_path in self.files:
logging.info("\t%s", file_path)
self.mapper = BibsRulesMapper(self.folio_client, args)
self.processor = None
self.bib_ids = set()
@@ -69,41 +73,40 @@ def work(self):
else:
logging.info("FORCE UTF-8 is set to FALSE")
reader.force_utf8 = False
logging.info(f"running {file_name}")
self.read_records(reader)
logging.info("running %s", file_name)
self.read_records(reader, file_name)
except TransformationProcessError as tpe:
logging.critical(tpe)
exit()
except Exception:
logging.exception(file_name, stack_info=True)
# wrap up
self.wrap_up()

def read_records(self, reader):
def read_records(self, reader, file_name):
for idx, record in enumerate(reader):
self.mapper.add_stats(self.mapper.stats, "Records in file before parsing")
self.mapper.migration_report.add_general_statistics(
"Records in file before parsing"
)
try:
if record is None:
self.mapper.add_to_migration_report(
"Bib records that failed to parse",
f"{reader.current_exception} {reader.current_chunk}",
)
self.mapper.add_stats(
self.mapper.stats,
self.mapper.migration_report.add_general_statistics(
"Records with encoding errors - parsing failed",
)
raise TransformationRecordFailedError(
f"Index in file:{idx}",
f"Index in {file_name}:{idx}",
f"MARC parsing error: {reader.current_exception}",
reader.current_chunk,
)
else:
self.set_leader(record)
self.mapper.add_stats(
self.mapper.stats,
self.mapper.migration_report.add_general_statistics(
"Records successfully parsed from MARC21",
)
self.processor.process_record(idx, record, False)
except TransformationRecordFailedError as error:
logging.error(error)
logging.info(f"Done reading {idx} records from file")
error.log_it()
logging.info("Done reading %s records from file", idx + 1)

@staticmethod
def set_leader(marc_record: Record):
@@ -114,20 +117,19 @@ def wrap_up(self):
logging.info("Done. Wrapping up...")
self.processor.wrap_up()
with open(self.folder_structure.migration_reports_file, "w+") as report_file:
report_file.write(f"# Bibliographic records transformation results \n")
report_file.write("# Bibliographic records transformation results \n")
report_file.write(f"Time Run: {dt.isoformat(dt.utcnow())} \n")
report_file.write(f"## Bibliographic records transformation counters \n")
self.mapper.print_dict_to_md_table(
self.mapper.stats,
Helper.write_migration_report(report_file, self.mapper.migration_report)
Helper.print_mapping_report(
report_file,
"Measure",
"Count",
self.mapper.parsed_records,
self.mapper.mapped_folio_fields,
self.mapper.mapped_legacy_fields,
)
self.mapper.write_migration_report(report_file)
self.mapper.print_mapping_report(report_file)

logging.info(
f"Done. Transformation report written to {self.folder_structure.migration_reports_file}"
"Done. Transformation report written to %s",
self.folder_structure.migration_reports_file.name,
)


@@ -181,7 +183,8 @@ def parse_args():
"-utf8",
help=(
"forcing UTF8 when parsing marc records. If you get a lot of encoding issues, test "
"changing this setting to False"
"changing this setting to False \n"
f"\n{Bcolors.WARNING}WARNING!{Bcolors.ENDC}\nEven though setting this to False might make your migrations run smoother, it might lead to data loss in individual fields"
),
default="True",
)
@@ -207,10 +210,10 @@ def main():
Worker.setup_logging(folder_structure)
folder_structure.log_folder_structure()

logging.info(f"Okapi URL:\t{args.okapi_url}")
logging.info(f"Tenant Id:\t{args.tenant_id}")
logging.info(f"Username: \t{args.username}")
logging.info(f"Password: \tSecret")
logging.info("Okapi URL:\t%s", args.okapi_url)
logging.info("Tenant Id:\t%s", args.tenant_id)
logging.info("Username: \t%s", args.username)
logging.info("Password: \tSecret")
try:
folio_client = FolioClient(
args.okapi_url, args.tenant_id, args.username, args.password
@@ -219,7 +222,7 @@
logging.critical(
"SSL error. Check your VPN or Internet connection. Exiting"
)
exit()
sys.exit()
# Initiate Worker
worker = Worker(folio_client, folder_structure, args)
worker.work()
@@ -228,7 +231,7 @@
except TransformationProcessError as process_error:
logging.critical(process_error)
logging.critical("Halting...")
exit()
sys.exit()


if __name__ == "__main__":
