Add Async $text Operator for LLM-Integrated Data Generation with Ollama #42

Open
wants to merge 20 commits into base: master

20 commits
be68a2e
Converting internal calls to async to support HTTP calls
omkarkhair Jul 27, 2024
e438332
Added template.json to gitignore for local testing
omkarkhair Jul 27, 2024
babc1fc
Updated test to refactor based on tests
omkarkhair Jul 27, 2024
56e1f1e
refactored index.js and array, coordinates, integer operators to meet…
omkarkhair Jul 27, 2024
65d8516
commiting operators modified to meet general tests
omkarkhair Jul 27, 2024
bd9bd5b
Modified operator tests to support testing of the async implementation
omkarkhair Jul 28, 2024
fb99cb9
Fixed an inefficient mapping during async implementation
omkarkhair Jul 28, 2024
3755b42
added await on two tests that we missed earlier
omkarkhair Jul 28, 2024
1ee259a
Converted operators that require nested evaluations to async implemen…
omkarkhair Jul 28, 2024
924d37b
Converted timestamp operator to async
omkarkhair Jul 28, 2024
d7ad926
Updated templates tests with async call
omkarkhair Jul 28, 2024
af70aaf
Added support for $text operator with a default response
omkarkhair Jul 28, 2024
754927a
Added a .env to gitignore to support local testing with a .env file w…
omkarkhair Jul 28, 2024
49dd355
First prototype of a functioning generative text with $text operator
omkarkhair Jul 28, 2024
54e0a3e
removed unnecessary logs
omkarkhair Jul 28, 2024
dd9b6da
Updated readme for text operator
omkarkhair Jul 28, 2024
0bc28ff
Fixed formating for experimental section
omkarkhair Jul 28, 2024
4c2a1b6
Merge pull request #1 from omkarkhair/dev
omkarkhair Jul 28, 2024
a0e5041
Updated package json with axios dependency
omkarkhair Jul 28, 2024
425936f
Merge pull request #2 from omkarkhair/dev
omkarkhair Jul 28, 2024
2 changes: 2 additions & 0 deletions .gitignore
@@ -1,3 +1,5 @@
.DS_Store
node_modules/
npm-debug.log
template.json
.env
24 changes: 23 additions & 1 deletion README.md
@@ -1,4 +1,4 @@
# mgeneratejs [![travis][travis_img]][travis_url] [![npm][npm_img]][npm_url]
# mgenerativejs [![travis][travis_img]][travis_url] [![npm][npm_img]][npm_url]

_mgeneratejs_ generates structured, semi-random JSON data according to a
template object. It offers both a command line script and a JavaScript API.
@@ -152,6 +152,10 @@ mgeneratejs '{"ip_addresses": {"$array": {"of": "$ip", "number": {"$integer": {"
- [`$regex`](#regex): Returns a Regular Expression object.
- [`$timestamp`](#timestamp): Returns a MongoDB Timestamp.

#### (Experimental) Generative Data

- [`$text`](#text): Returns text generated based on a prompt (requires an [Ollama API](https://github.com/ollama/ollama) endpoint).

### All Built-in Operators in Alphabetical Order

### `$array`
@@ -589,6 +593,24 @@ _Options_
>
> Returns `{"expr":{"$regex":"^ab+c$","$options":"i"}}`.

### `$text`

**Experimental Feature** | Returns text generated from the provided `prompt`. This operator depends on a reachable Ollama endpoint. Configure the endpoint and the model used to generate the text through the environment variables `MGENERATIVEJS_OLLAMA_ENDPOINT` and `MGENERATIVE_OLLAMA_MODEL`. See the [Ollama documentation](https://github.com/ollama/ollama?tab=readme-ov-file#start-ollama) for setup instructions.

_Options_

- `prompt` (required) Topical prompt used to generate the text (example: Designation or job title found in Tech).
- `minWordCount` (optional) Minimum word count for the generated text. This constraint is included in the prompt, so it only holds if the model respects it.
- `maxWordCount` (optional) Maximum word count for the generated text. This limit is included in the prompt, so it only holds if the model respects it.

> **Example**
>
> ```
> {"jobTitle": {"$text": {"prompt": "Designation or job title found in Tech"}}}
> ```
>
> Returns `{"jobTitle": "DevOps Engineer"}`.
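The implementation of the `$text` operator is not shown in this part of the diff. As a rough sketch only, the options above and the two environment variables could be combined into a request for Ollama's `/api/generate` endpoint along these lines. `buildTextRequest` is a hypothetical helper, the model name `llama3` is just an example value, and both the prompt wording and the treatment of the endpoint variable as a base URL are assumptions.

```javascript
// Hypothetical helper, not taken from the PR: builds the URL and JSON body
// for a non-streaming call to Ollama's /api/generate endpoint.
function buildTextRequest(options, env) {
  if (!options || typeof options.prompt !== 'string') {
    throw new Error('$text requires a "prompt" option');
  }
  // Word-count limits are only hints; the model may ignore them.
  const hints = [];
  if (options.minWordCount !== undefined) {
    hints.push('Use at least ' + options.minWordCount + ' words.');
  }
  if (options.maxWordCount !== undefined) {
    hints.push('Use at most ' + options.maxWordCount + ' words.');
  }
  // Assumption: the endpoint variable holds a base URL such as
  // http://localhost:11434 (Ollama's default port).
  const base = (env.MGENERATIVEJS_OLLAMA_ENDPOINT || 'http://localhost:11434')
    .replace(/\/+$/, '');
  return {
    url: base + '/api/generate',
    body: {
      model: env.MGENERATIVE_OLLAMA_MODEL,
      prompt: [options.prompt].concat(hints).join(' '),
      stream: false // ask for a single JSON reply instead of a stream
    }
  };
}

const request = buildTextRequest(
  { prompt: 'Designation or job title found in Tech', maxWordCount: 3 },
  {
    MGENERATIVEJS_OLLAMA_ENDPOINT: 'http://localhost:11434/',
    MGENERATIVE_OLLAMA_MODEL: 'llama3' // example value only
  }
);
console.log(request.url);
console.log(request.body.prompt);
```

With `stream: false`, Ollama replies with a single JSON object whose `response` field carries the generated text.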

### `$timestamp`

Returns a MongoDB Timestamp object.
7 changes: 5 additions & 2 deletions bin/mgenerate.js
@@ -87,8 +87,11 @@ function generate() {
if (count >= argv.number) {
return this.emit('end');
}
this.emit('data', mgenerate(template));
callback();
let _self = this;
mgenerate(template).then(function(doc) {
_self.emit('data', doc);
callback();
});
})
.pipe(stringifyStream)
.pipe(process.stdout);
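One thing this hunk leaves open is what happens when `mgenerate(template)` rejects, for example when the Ollama endpoint is unreachable: without a rejection handler, `callback()` is never invoked. The sketch below shows one way to forward failures, using a plain `EventEmitter` as a stand-in for the stream. `emitNext` and `fakeGenerate` are hypothetical names, and the real stream's error convention may differ.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper: generate one document and emit it, or emit 'error'
// when generation fails. Arrow functions keep `emitter` in scope, so no
// `_self` alias is needed.
function emitNext(emitter, generate, template, callback) {
  return generate(template).then(
    doc => {
      emitter.emit('data', doc);
      callback();
    },
    err => {
      emitter.emit('error', err);
    }
  );
}

// Stand-in for mgenerate(): fails when the template asks it to.
const fakeGenerate = async template => {
  if (template.fail) {
    throw new Error('endpoint unreachable');
  }
  return { ok: true };
};

const events = [];
const stream = new EventEmitter();
stream.on('data', doc => events.push(['data', doc]));
stream.on('error', err => events.push(['error', err.message]));

// One successful document, then one failure.
const done = emitNext(stream, fakeGenerate, {}, () => events.push(['callback']))
  .then(() => emitNext(stream, fakeGenerate, { fail: true }, () => {}))
  .then(() => console.log(events));
```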
128 changes: 83 additions & 45 deletions lib/index.js
@@ -27,8 +27,8 @@ var evalValue;
* value, false otherwise.
* @return {Any} return the value after evaluation
*/
var evalTemplate = function(template, isValue) {
return isValue ? evalValue(template) : evalObject(template);
var evalTemplate = async function(template, isValue) {
return isValue ? await evalValue(template) : await evalObject(template);
};

/**
@@ -38,7 +38,7 @@ var evalTemplate = function(template, isValue) {
* @param {object} opts options passed to the operator
* @return {any} return value after operator is resolved
*/
var callOperator = function(op, opts) {
var callOperator = async function(op, opts) {
opts = opts || {};
// exception for $missing values, handled in evalObject()
if (op === 'missing') {
@@ -50,11 +50,11 @@ var callOperator = function(op, opts) {
}
// known built-in operator, call with `evalTemplate` function and options
if (_.includes(opNames, op)) {
return operators[op](evalTemplate, opts);
return await operators[op](evalTemplate, opts);
}
// not a known operator, try chance.js
try {
return chance[op](evalObject(opts));
return chance[op](await evalObject(opts));
} catch (e) {
throw new Error('unknown operator: $' + op + '\nMessage: ' + e);
}
@@ -68,8 +68,30 @@ var callOperator = function(op, opts) {
* @param {object} template template to evaluate
* @return {any} return value after template is evaluated
*/
evalObject = function(template) {
var result = _.mapValues(template, evalValue);

// TODO: Move asyncMapValues into its own module
async function asyncMapValues(obj, asyncFn) {
const entries = Object.entries(obj);
const mappedEntries = await Promise.all(
entries.map(async ([key, value]) => {
const newValue = await asyncFn(value);
return [key, newValue];
})
);
return Object.fromEntries(mappedEntries);
}

// TODO: Move asyncMap into its own module
async function asyncMap(array, asyncFn) {
// Map over the array to create an array of promises
const promises = array.map(asyncFn);
// Wait for all the promises to resolve and return the results
return Promise.all(promises);
}

evalObject = async function(template) {
//var result = await new Promise(_.mapValues(template, evalValue));
var result = await asyncMapValues(template, evalValue);
result = _.omitBy(result, function(val) {
return val === '$missing';
});
@@ -84,48 +106,64 @@ evalObject = function(template) {
* @param {object} template template to evaluate
* @return {any} return value after template is evaluated
*/
evalValue = function(template) {
if (_.isString(template)) {
if (_.startsWith(template, '$')) {
return callOperator(template.slice(1));
evalValue = async function(template) {
let promise = new Promise((resolve, reject) => {
if (_.isString(template)) {
if (_.startsWith(template, '$')) {
callOperator(template.slice(1)).then(function(t) {
resolve(t);
});
return;
}
// check if the string can be interpreted as mustache template
if (_.includes(template, '{{')) {
var compiled = _.template(template, {
imports: {
chance: chance,
faker: faker
}
});
return resolve(compiled());
}
// string constant
return resolve(template);
}
// check if the string can be interpreted as mustache template
if (_.includes(template, '{{')) {
var compiled = _.template(template, {
imports: {
chance: chance,
faker: faker
}
if (_.isPlainObject(template)) {
// check if this is an object-style operator
var objKeys = _.keys(template);
var op = objKeys[0];
if (_.startsWith(op, '$')) {
op = op.slice(1);
assert.equal(
objKeys.length,
1,
'operator object cannot have more than one key.'
);
var options = _.values(template)[0];
callOperator(op, options).then(function(t) {
resolve(t);
});
return;
}
evalObject(template).then(function(t) {
resolve(t);
});
return compiled();
return;
}
// string constant
return template;
}
if (_.isPlainObject(template)) {
// check if this is an object-style operator
var objKeys = _.keys(template);
var op = objKeys[0];
if (_.startsWith(op, '$')) {
op = op.slice(1);
assert.equal(
objKeys.length,
1,
'operator object cannot have more than one key.'
);
var options = _.values(template)[0];
return callOperator(op, options);
// handle arrays recursively, skip $missing values
if (_.isArray(template)) {
asyncMap(template, evalValue).then(result => {
let filtered = _.filter(result, function(v) {
return v !== '$missing';
});
resolve(filtered);
});
return;
}
return evalObject(template);
}
// handle arrays recursively, skip $missing values
if (_.isArray(template)) {
return _.filter(_.map(template, evalValue), function(v) {
return v !== '$missing';
});
}
// don't know how to evalute, leave alone
return template;
// don't know how to evalute, leave alone
resolve(template);
});
return promise;
};

module.exports = evalObject;
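Because `evalValue` is already declared `async`, wrapping its body in `new Promise` is not strictly needed. As written, the `.then(...)` calls that invoke `resolve` have no rejection path, so a failing operator appears to leave the outer promise pending. Below is a simplified sketch of the same control flow using `await` and `return` directly. It omits lodash, the mustache-template branch, `$missing` filtering in objects, and the chance.js fallback, and the `operators` table is a hypothetical stand-in.

```javascript
// Hypothetical operator table standing in for lib/operators.
const operators = {
  upper: async (evaluate, opts) => String(await evaluate(opts.of)).toUpperCase(),
  missing: async () => '$missing'
};

const isPlainObject = v =>
  v !== null && typeof v === 'object' && !Array.isArray(v);

async function callOperator(op, opts) {
  if (!Object.prototype.hasOwnProperty.call(operators, op)) {
    throw new Error('unknown operator: $' + op);
  }
  return operators[op](evalValue, opts || {});
}

// Same branches as the diff, but each branch simply returns or awaits, so a
// rejection from an operator propagates to the caller.
async function evalValue(template) {
  if (typeof template === 'string') {
    return template.startsWith('$') ? callOperator(template.slice(1)) : template;
  }
  if (Array.isArray(template)) {
    const items = await Promise.all(template.map(item => evalValue(item)));
    return items.filter(v => v !== '$missing');
  }
  if (isPlainObject(template)) {
    const keys = Object.keys(template);
    if (keys.length > 0 && keys[0].startsWith('$')) {
      if (keys.length !== 1) {
        throw new Error('operator object cannot have more than one key.');
      }
      return callOperator(keys[0].slice(1), template[keys[0]]);
    }
    const entries = await Promise.all(
      Object.entries(template).map(async ([k, v]) => [k, await evalValue(v)])
    );
    return Object.fromEntries(entries);
  }
  return template; // numbers, booleans, null: leave alone
}

const demo = evalValue({ name: { $upper: { of: 'ada' } }, tags: ['a', '$missing'] });
demo.then(result => console.log(result));

// An unknown operator now rejects instead of hanging.
const failing = evalValue('$nope');
failing.catch(err => console.log(err.message));
```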
6 changes: 3 additions & 3 deletions lib/operators/array.js
@@ -14,12 +14,12 @@ var _ = require('lodash');
* @param {Object} options options to configure the array operator
* @return {Array} array of `number` elements
*/
module.exports = function(evaluator, options) {
module.exports = async function(evaluator, options) {
var item = options.of;
var number = evaluator(options.number, true);
var number = await evaluator(options.number, true);
var replacement = _.map(_.range(number), function() {
return item;
});
var result = evaluator(replacement, true);
var result = await evaluator(replacement, true);
return result;
};
4 changes: 2 additions & 2 deletions lib/operators/binary.js
@@ -12,9 +12,9 @@ var _ = require('lodash');
* @param {Object} options options to configure the array operator
* @return {Array} array of `number` elements
*/
module.exports = function(evaluator, options) {
module.exports = async function(evaluator, options) {
// default options
options = evaluator(
options = await evaluator(
_.defaults(options, {
length: 10,
subtype: '00'
4 changes: 2 additions & 2 deletions lib/operators/choose.js
@@ -23,13 +23,13 @@ var chance = require('chance').Chance();
* @param {Object} options options to configure the choose operator
* @return {Any} chosen value, evaluated
*/
module.exports = function(evaluator, options) {
module.exports = async function(evaluator, options) {
var replacement;
if (options.weights) {
replacement = chance.weighted(options.from, options.weights);
} else {
replacement = chance.pickone(options.from);
}
var result = evaluator(replacement, true);
var result = await evaluator(replacement, true);
return result;
};
4 changes: 2 additions & 2 deletions lib/operators/coordinates.js
@@ -21,8 +21,8 @@ var _ = require('lodash');
* @return {Array} array of `number` elements
*/

module.exports = function(evaluator, options) {
options = evaluator(
module.exports = async function(evaluator, options) {
options = await evaluator(
_.defaults(options, {
long_lim: [-180, 180],
lat_lim: [-90, 90]
4 changes: 2 additions & 2 deletions lib/operators/date.js
@@ -14,9 +14,9 @@ var chance = require('chance').Chance();
* @param {Object} options options to configure the date operator
* @return {Array} random date between `min` and `max`
*/
module.exports = function(evaluator, options) {
module.exports = async function(evaluator, options) {
// default options
options = evaluator(
options = await evaluator(
_.defaults(options, {
min: '1900-01-01T00:00:00.000Z',
max: '2099-12-31T23:59:59.999Z'
4 changes: 2 additions & 2 deletions lib/operators/decimal.js
@@ -13,9 +13,9 @@ var bson = require('bson');
* @return {Array} Decimal128 value
*/

module.exports = function(evaluator, options) {
module.exports = async function(evaluator, options) {
// default options
options = evaluator(
options = await evaluator(
_.defaults(options, {
min: 0,
max: 1000,
15 changes: 11 additions & 4 deletions lib/operators/geometries.js
@@ -17,7 +17,14 @@ var _ = require('lodash');
* @return {Array} array of `number` elements
*/

module.exports = function(evaluator, options) {
async function asyncMap(array, asyncFn) {
// Map over the array to create an array of promises
const promises = array.map(asyncFn);
// Wait for all the promises to resolve and return the results
return Promise.all(promises);
}

module.exports = async function(evaluator, options) {
// default options
options = _.defaults(options, {
types: ['Polygon', 'LineString', 'Point'],
@@ -31,14 +38,14 @@ module.exports = function(evaluator, options) {
};

// evaluate options first
options = evaluator(options);
options = await evaluator(options);

// remove corners from options and produce `corners` coordinate pairs
var geometries = _.map(_.range(options.number), function() {
var geometries = await asyncMap(_.range(options.number), async function() {
var op = nameToOperator[chance.pickone(options.types)];
var geometry = {};
geometry[op] = _.omit(options, ['types', 'number']);
return evaluator(geometry, true);
return await evaluator(geometry, true);
});

var result = {
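The `asyncMap` helper in this file duplicates the one added to `lib/index.js`, where a TODO already suggests moving it into its own module. A shared module might look like the sketch below. The file name `lib/async-utils.js` is hypothetical, and unlike the version in the diff, this `asyncMap` passes only the element to the callback. Both helpers start all calls at once through `Promise.all`, so a template with many `$text` fields would issue its HTTP requests concurrently.

```javascript
// Hypothetical lib/async-utils.js: one home for both helpers.

// Apply asyncFn to every element; results keep the input order.
// Only the element is passed on, not the index or the array.
function asyncMap(array, asyncFn) {
  return Promise.all(array.map(item => asyncFn(item)));
}

// Apply asyncFn to every value of obj; keys are preserved.
async function asyncMapValues(obj, asyncFn) {
  const entries = await Promise.all(
    Object.entries(obj).map(async ([key, value]) => [key, await asyncFn(value)])
  );
  return Object.fromEntries(entries);
}

module.exports = { asyncMap, asyncMapValues };

// Usage with a trivial async function.
const double = async n => n * 2;
const results = Promise.all([
  asyncMap([1, 2, 3], double),
  asyncMapValues({ a: 1, b: 2 }, double)
]);
results.then(([list, object]) => console.log(list, object));
```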
8 changes: 5 additions & 3 deletions lib/operators/inc.js
@@ -16,11 +16,13 @@

var counter = null;

module.exports = function(evaluator, options) {
module.exports = async function(evaluator, options) {
if (counter === null) {
counter = options.start !== undefined ? evaluator(options.start, true) : 0;
counter =
options.start !== undefined ? await evaluator(options.start, true) : 0;
} else {
var step = options.step !== undefined ? evaluator(options.step, true) : 1;
var step =
options.step !== undefined ? await evaluator(options.step, true) : 1;
counter += step;
}
return counter;
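With arrays and objects now evaluated through `Promise.all`, several `$inc` calls can be in flight at once. As far as this hunk shows, when `start` is given, every call that begins before the first `await` resolves sees `counter === null` and returns the start value. The sketch below reproduces the pattern with a stand-in evaluator and shows one possible fix, serializing calls through a promise chain. `makeRacyInc`, `makeSerialInc` and `evaluator` are hypothetical names used only for illustration.

```javascript
// Hypothetical stand-in for the evaluator: resolves a constant asynchronously.
const evaluator = async value => value;

// Mirrors the logic of the hunk above.
function makeRacyInc() {
  let counter = null;
  return async function inc(options) {
    if (counter === null) {
      counter = options.start !== undefined ? await evaluator(options.start) : 0;
    } else {
      const step = options.step !== undefined ? await evaluator(options.step) : 1;
      counter += step;
    }
    return counter;
  };
}

// Possible fix: each call waits for the previous one to finish.
function makeSerialInc() {
  let counter = null;
  let pending = Promise.resolve();
  return function inc(options) {
    const run = pending.then(async () => {
      if (counter === null) {
        counter = options.start !== undefined ? await evaluator(options.start) : 0;
      } else {
        const step = options.step !== undefined ? await evaluator(options.step) : 1;
        counter += step;
      }
      return counter;
    });
    // Keep the chain alive even if one call fails.
    pending = run.catch(() => {});
    return run;
  };
}

const racy = makeRacyInc();
const serial = makeSerialInc();
const opts = { start: 5 };
const outcome = Promise.all([
  Promise.all([racy(opts), racy(opts), racy(opts)]),
  Promise.all([serial(opts), serial(opts), serial(opts)])
]);
// racy: [5, 5, 5], serial: [5, 6, 7]
outcome.then(([racyValues, serialValues]) => console.log(racyValues, serialValues));
```

Serializing keeps the counter semantics of the synchronous version at the cost of running `$inc` calls one at a time, which should be cheap unless `start` or `step` themselves involve slow operators.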
1 change: 1 addition & 0 deletions lib/operators/index.js
@@ -11,6 +11,7 @@ module.exports = {
inc: require('./inc'),
date: require('./date'),
now: require('./now'),
text: require('./text'),

/*
* Geospatial data
4 changes: 2 additions & 2 deletions lib/operators/integer.js
@@ -14,9 +14,9 @@ var chance = require('chance').Chance();

var MAX_VAL = Math.pow(2, 31);

module.exports = function(evaluator, options) {
module.exports = async function(evaluator, options) {
// default options
options = evaluator(
options = await evaluator(
_.defaults(options, {
min: -MAX_VAL,
max: MAX_VAL