Merge pull request #104 from CMSgov/detect-schema-version
Detect schema version
shaselton-usds authored Oct 6, 2023
2 parents a138e1d + b1d760d commit ac7a636
Showing 13 changed files with 506 additions and 128 deletions.
50 changes: 26 additions & 24 deletions README.md
@@ -69,10 +69,10 @@ Options:
-h, --help display help for command
Commands:
- validate [options] <data-file> <schema-version> Validate a file against a specific published version of a CMS schema.
- from-url [options] <data-url> <schema-version> Validate the file retrieved from a URL against a specific published version of a CMS schema.
- update Update the available schemas from the CMS repository.
- help [command] display help for command
+ validate [options] <data-file> Validate a file against a specific published version of a CMS schema.
+ from-url [options] <data-url> Validate the file retrieved from a URL against a specific published version of a CMS schema.
+ update Update the available schemas from the CMS repository.
+ help [command] display help for command
```

### Update available schemas
@@ -92,16 +92,16 @@ Validating a file against one of the provided schemas is the primary usage of th
From the installed directory:

```
- cms-mrf-validator validate <data-file> <schema-version> [-o out] [-t target]
+ cms-mrf-validator validate <data-file> [options]
```

Example usages:

```bash
- # basic usage, printing output directly and using the default in-network-rates schema
- cms-mrf-validator validate my-data.json v1.0.0
- # output will be written to a file. validate using allowed-amounts schema
- cms-mrf-validator validate my-data.json v1.0.0 -o results.txt -t allowed-amounts
+ # basic usage, printing output directly and using the default in-network-rates schema with the version specified in the file
+ cms-mrf-validator validate my-data.json
+ # output will be written to a file. validate using specific version of allowed-amounts schema
+ cms-mrf-validator validate my-data.json --schema-version v1.0.0 -o results.txt -t allowed-amounts
```

Further details:
@@ -110,14 +110,15 @@ Further details:
Validate a file against a specific published version of a CMS schema.
Arguments:
- data-file path to data file to validate
- schema-version version of schema to use for validation
+ data-file path to data file to validate
Options:
- -o, --out <out> output path
- -t, --target <schema> name of schema to use (choices: "allowed-amounts", "in-network-rates", "provider-reference", "table-of-contents", default: "in-network-rates")
- -s, --strict enable strict checking, which prohibits additional properties in data file
- -h, --help display help for command
+ --schema-version <version> version of schema to use for validation
+ -o, --out <out> output path
+ -t, --target <schema> name of schema to use (choices: "allowed-amounts", "in-network-rates", "provider-reference", "table-of-contents",
+ default: "in-network-rates")
+ -s, --strict enable strict checking, which prohibits additional properties in data file
+ -h, --help display help for command
```

The purpose of the `strict` option is to help detect when an optional attribute has been spelled incorrectly. Because additional properties are allowed by the schema, a misspelled optional attribute does not normally cause a validation failure.
@@ -127,7 +128,7 @@ The purpose of the `strict` option is to help detect when an optional attribute
It is also possible to specify a URL to the file to validate. From the installed directory:

```
- cms-mrf-validator from-url <data-url> <schema-version> [-o out] [-t target]
+ cms-mrf-validator from-url <data-url> [options]
```

The only difference in arguments is that a URL should be provided instead of a path to a file. All options from the `validate` command still apply. The URL must return a file that is one of the following:
@@ -142,14 +143,15 @@ Further details:
Validate the file retrieved from a URL against a specific published version of a CMS schema.
Arguments:
- data-url URL to data file to validate
- schema-version version of schema to use for validation
+ data-url URL to data file to validate
Options:
- -o, --out <out> output path
- -t, --target <schema> name of schema to use (choices: "allowed-amounts", "in-network-rates", "provider-reference", "table-of-contents", default: "in-network-rates")
- -s, --strict enable strict checking, which prohibits additional properties in data file
- -h, --help display help for command
+ --schema-version <version> version of schema to use for validation
+ -o, --out <out> output path
+ -t, --target <schema> name of schema to use (choices: "allowed-amounts", "in-network-rates", "provider-reference", "table-of-contents",
+ default: "in-network-rates")
+ -s, --strict enable strict checking, which prohibits additional properties in data file
+ -h, --help display help for command
```

### Test file validation
@@ -161,7 +163,7 @@ Running the command from the root of the project:
#### Running a valid file:

```bash
- cms-mrf-validator validate test-files/in-network-rates-fee-for-service-sample.json v1.0.0
+ cms-mrf-validator validate test-files/in-network-rates-fee-for-service-sample.json --schema-version v1.0.0
```

Output:
@@ -173,7 +175,7 @@ Input JSON is valid.
#### Running an invalid file:

```bash
- cms-mrf-validator validate test-files/allowed-amounts-error.json v1.0.0 -t allowed-amounts
+ cms-mrf-validator validate test-files/allowed-amounts-error.json --schema-version v1.0.0 -t allowed-amounts
```

Output:
28 changes: 28 additions & 0 deletions package-lock.json


2 changes: 2 additions & 0 deletions package.json
@@ -43,6 +43,8 @@
      "typescript": "^4.5.5"
    },
    "dependencies": {
+     "@streamparser/json": "^0.0.17",
+     "@streamparser/json-node": "^0.0.17",
      "axios": "^1.2.1",
      "chalk": "^3.0.0",
      "commander": "^8.3.0",
121 changes: 115 additions & 6 deletions src/SchemaManager.ts
@@ -5,11 +5,16 @@ import util from 'util';
  import { config } from './utils';
  import temp from 'temp';
  import { logger } from './logger';
+ import { JSONParser } from '@streamparser/json-node';
+
+ const VERSION_TIME_LIMIT = 3000; // three seconds
+ const BACKWARDS_BYTES = 1000; // read the last 1000 bytes during backwards search

  export class SchemaManager {
-   private version: string;
+   private _version: string;
    private storageDirectory: string;
    public strict: boolean;
+   public shouldDetectVersion: boolean;

    constructor(
      private repoDirectory = config.SCHEMA_REPO_FOLDER,
@@ -19,14 +24,18 @@ export class SchemaManager {
      this.storageDirectory = temp.mkdirSync('schemas');
    }

+   public get version() {
+     return this._version;
+   }
+
    async ensureRepo() {
      if (!fs.existsSync(path.join(this.repoDirectory, '.git'))) {
        return util.promisify(exec)(`git clone ${this.repoUrl} "${this.repoDirectory}"`);
      }
    }

    async useVersion(version: string): Promise<boolean> {
-     if (this.version === version) {
+     if (this._version === version) {
        return true;
      }
      const tagResult = await util.promisify(exec)(
@@ -38,25 +47,25 @@ export class SchemaManager {
        .filter(tag => tag.length > 0);
      if (tags.includes(version)) {
        await util.promisify(exec)(`git -C "${this.repoDirectory}" checkout ${version}`);
-       this.version = version;
+       this._version = version;
        return true;
      } else {
        // we didn't find your tag. maybe you mistyped it, so show the available ones.
-       logger.error(
+       throw new Error(
          `Could not find a schema version named "${version}". Available versions are:\n${tags.join(
            '\n'
          )}`
        );
-       return false;
      }
    }

    async useSchema(schemaName: string): Promise<string> {
      const schemaPath = path.join(
        this.storageDirectory,
-       `${schemaName}-${this.version}-${this.strict ? 'strict' : 'loose'}.json`
+       `${schemaName}-${this._version}-${this.strict ? 'strict' : 'loose'}.json`
      );
      if (fs.existsSync(schemaPath)) {
+       logger.debug(`Using cached schema: ${schemaName} ${this._version}`);
        return schemaPath;
      }
      const contentPath = path.join(this.repoDirectory, 'schemas', schemaName, `${schemaName}.json`);
@@ -73,6 +82,106 @@ export class SchemaManager {
      fs.writeFileSync(schemaPath, schemaContents, { encoding: 'utf-8' });
      return schemaPath;
    }
+
+   async determineVersion(dataFile: string): Promise<string> {
+     return new Promise((resolve, reject) => {
+       logger.debug(`Detecting version for ${dataFile}`);
+       const parser = new JSONParser({ paths: ['$.version'], keepStack: false });
+       const dataStream = fs.createReadStream(dataFile);
+       let foundVersion = '';
+
+       let forwardReject: (reason?: any) => void;
+       const forwardSearch = new Promise<string>((resolve, reject) => {
+         forwardReject = reject;
+         parser.on('data', data => {
+           if (typeof data.value === 'string') {
+             foundVersion = data.value;
+           }
+           dataStream.unpipe();
+           dataStream.destroy();
+           parser.end();
+         });
+         parser.on('close', () => {
+           if (foundVersion) {
+             logger.debug(`Found version: ${foundVersion}`);
+             resolve(foundVersion);
+           } else {
+             reject('No version property available.');
+           }
+         });
+         parser.on('error', () => {
+           // an error gets thrown when closing the stream early, but that's not an actual problem.
+           // it'll get handled in the close event
+           if (!foundVersion) {
+             reject('Parse error when detecting version.');
+           }
+         });
+         dataStream.pipe(parser);
+       });
+
+       let backwardReject: (reason?: any) => void;
+       const backwardSearch = new Promise<string>((resolve, reject) => {
+         backwardReject = reject;
+         fs.promises
+           .open(dataFile, 'r')
+           .then(async fileHandle => {
+             try {
+               const stats = await fileHandle.stat();
+               const lastStuff = await fileHandle.read({
+                 position: Math.max(0, stats.size - BACKWARDS_BYTES),
+                 length: BACKWARDS_BYTES
+               });
+               if (lastStuff.bytesRead > 0) {
+                 const lastText = lastStuff.buffer.toString('utf-8');
+                 const versionRegex = /"version"\s*:\s*("(?:\\"|\\\\|[^"])*")/;
+                 const versionMatch = lastText.match(versionRegex);
+                 if (versionMatch) {
+                   const foundVersion = JSON.parse(versionMatch[1]);
+                   logger.debug(`Found version during backwards search: ${foundVersion}`);
+                   resolve(foundVersion);
+                 } else {
+                   reject('No version found during backwards search');
+                 }
+               } else {
+                 reject('No bytes read during backwards search');
+               }
+             } finally {
+               fileHandle.close();
+             }
+           })
+           .catch(err => {
+             logger.debug(`Something went wrong during backwards search: ${err}`);
+             reject('Something went wrong during backwards search');
+           });
+       });
+
+       const timeLimit = setTimeout(() => {
+         logger.debug('Could not find version within time limit.');
+         if (forwardReject) {
+           forwardReject('Forward timeout cancellation');
+         }
+         if (backwardReject) {
+           backwardReject('Backward timeout cancellation');
+         }
+         reject('Could not find version within time limit.');
+       }, VERSION_TIME_LIMIT);
+
+       Promise.any([forwardSearch, backwardSearch])
+         .then(foundVersion => {
+           resolve(foundVersion);
+         })
+         .catch(() => {
+           reject();
+         })
+         .finally(() => {
+           logger.debug('Cleaning up from version search.');
+           clearTimeout(timeLimit);
+           dataStream.unpipe();
+           dataStream.destroy();
+           parser.end();
+         });
+     });
+   }
  }

  // note that this only sets additionalProperties to false at the top level, and at the first level of definitions.
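
Note on the backwards search in `determineVersion`: rather than parsing the whole document, it reads only the last `BACKWARDS_BYTES` of the file and looks for a `"version"` key with a regular expression. The block below is a minimal standalone sketch of that matching step, assuming the tail of the file has already been read into a string. The helper name `findVersionInTail` and the sample inputs are hypothetical and not part of this commit; only the regular expression is copied from the diff.

```typescript
// Same pattern as in the diff: a "version" key followed by a JSON string literal.
const versionRegex = /"version"\s*:\s*("(?:\\"|\\\\|[^"])*")/;

// Hypothetical helper (not in the commit): returns the version string found
// in a chunk of text, or undefined when no "version" key is present.
function findVersionInTail(tailText: string): string | undefined {
  const match = tailText.match(versionRegex);
  if (!match || match[1] === undefined) {
    return undefined;
  }
  // The captured group is a quoted JSON string, so JSON.parse unescapes it.
  const parsed: unknown = JSON.parse(match[1]);
  return typeof parsed === 'string' ? parsed : undefined;
}

// Made-up sample tails standing in for the last bytes of a data file.
const found = findVersionInTail('"negotiated_rates": [] }], "version": "1.0.0" }');
const notFound = findVersionInTail('{ "in_network": [] }');

console.log(found, notFound);
```

Because the match is purely textual, any `"version"` key that happens to fall inside the tail will match, including a nested one, so this step is a heuristic rather than a structural lookup.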