Add generate embeddings #5

Merged
merged 1 commit into from
Nov 26, 2024
79 changes: 79 additions & 0 deletions README.md
@@ -0,0 +1,79 @@
# MomConnect Intent Classifier

Model that classifies the intent of inbound messages.

## Development
This project uses [poetry](https://python-poetry.org/docs/#installation) for packaging and dependency management, so install that first.

Make sure you are running Python 3.11 or later; check with `python --version`.

Then install the dependencies:
```bash
~ poetry install
```

To run a local worker, set the `NLU_USERNAME` and `NLU_PASSWORD` environment variables, then start the Flask worker:
```bash
~ poetry run flask --app src.application run
```

To run autoformatting, linting, and type checking:
```bash
~ ruff format && ruff check && mypy --install-types
```

For the test runner, we use [pytest](https://docs.pytest.org/):
```bash
~ pytest
```

## Regenerating the embeddings json file

1. Delete the JSON embeddings file, `src/data/intent_embeddings.json`
1. Update `src/data/nlu.yaml` with your changes
1. Run the Flask app, which should regenerate the embeddings file: `poetry run flask --app src.application run`
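
The steps above suggest that the embeddings file acts as a cache: it is reused while it exists and rebuilt only once it has been deleted. The following is a minimal sketch of that pattern, not the project's actual code. The function `load_or_generate` and its `generate` callback are hypothetical names, and the assumption that the app checks for the file at startup is inferred from the steps rather than confirmed by the source.

```python
import json
from pathlib import Path
from typing import Callable

Embeddings = dict[str, list[float]]


def load_or_generate(
    embeddings_path: Path, generate: Callable[[], Embeddings]
) -> Embeddings:
    """Reuse the cached JSON file if present; otherwise build and save it."""
    if embeddings_path.exists():
        # Cache hit: the NLU data is not consulted, which would explain
        # why the first step is to delete the existing file.
        return json.loads(embeddings_path.read_text())
    # Hypothetical callback: stands in for embedding the examples in nlu.yaml.
    embeddings = generate()
    embeddings_path.write_text(json.dumps(embeddings))
    return embeddings
```

Under this assumption, editing `nlu.yaml` without deleting the JSON file would have no effect, because the cached embeddings would still be loaded.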

## Editor configuration

If you'd like your editor to handle linting and/or formatting for you, here's how to set it up.

### Visual Studio Code

1. Install the Python and Ruff extensions
1. In settings, check the "Python > Linting: Mypy Enabled" box
1. In settings, set "Python > Formatting: Provider" to "black" (the Python extension does not appear to support "ruff format" yet, and "black" is probably close enough)
1. If you want formatting applied automatically, check the "Editor: Format On Save" checkbox in settings

Alternatively, add the following to your `settings.json`:
```json
{
    "python.linting.mypyEnabled": true,
    "python.formatting.provider": "black",
    "editor.formatOnSave": true
}
```

## Release process

To release a new version, follow these steps:

1. Make sure all relevant PRs are merged and that all necessary QA testing is complete
1. Make sure release notes are up to date and accurate
1. In one commit on the `main` branch:
   - Update the version number in `pyproject.toml` to the release version
   - Replace the UNRELEASED header in `CHANGELOG.md` with the release version and date
1. Tag the release commit with the release version (for example, `v0.2.1` for version `0.2.1`)
1. Push the release commit and tag
1. In one commit on the `main` branch:
   - Update the version number in `pyproject.toml` to the next pre-release version
   - Add a new UNRELEASED header in `CHANGELOG.md`
1. Push the post-release commit

## Running in Production
There is a [docker image](https://github.com/praekeltfoundation/mc-intent-classifier/pkgs/container/mc-intent-classifier) that can be used to easily run this service. It uses the following environment variables for configuration:

| Variable | Description |
| ---------- | ----------- |
| NLU_USERNAME | The username used for API requests |
| NLU_PASSWORD | The password used for API requests |
| SENTRY_DSN | Where to send exceptions to |
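
As a rough illustration of how these variables might be consumed, the sketch below reads them from the environment and fails early if credentials are missing. `load_settings` is a hypothetical helper, not part of the service. Treating `NLU_USERNAME` and `NLU_PASSWORD` as required and `SENTRY_DSN` as optional is an assumption; the table above does not say which variables are mandatory.

```python
import os
from typing import Mapping, Optional

# Assumption: credentials are mandatory, error reporting is optional.
REQUIRED = ("NLU_USERNAME", "NLU_PASSWORD")
OPTIONAL = ("SENTRY_DSN",)


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    """Collect the service configuration from environment variables."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    settings: dict[str, Optional[str]] = {name: env[name] for name in REQUIRED}
    # Optional variables default to None when unset.
    settings.update({name: env.get(name) for name in OPTIONAL})
    return settings
```

Passing the environment in as a mapping keeps the function easy to test with a plain dictionary instead of the real process environment.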
7 changes: 5 additions & 2 deletions src/application.py
@@ -13,7 +13,8 @@

dirname = os.path.dirname(__file__)
DATA_PATH = Path(f"{dirname}/data")
-OUTPUT_FILE_PATH = DATA_PATH / "new_test_intent_embeddings.json"
+NLU_FILE_PATH = DATA_PATH / "nlu.yaml"
+EMBEDDINGS_FILE_PATH = DATA_PATH / "intent_embeddings.json"

app = Flask(__name__)
app.config["BASIC_AUTH_USERNAME"] = os.environ.get("NLU_USERNAME")
@@ -31,7 +32,9 @@
metrics = PrometheusMetrics(app)
metrics.info("app_info", "Application info", version=version)

-classifier = IntentClassifier(json_path=OUTPUT_FILE_PATH)
+classifier = IntentClassifier(
+    embeddings_path=EMBEDDINGS_FILE_PATH, nlu_path=NLU_FILE_PATH
+)


@app.route("/")
2 changes: 1 addition & 1 deletion src/data/intent_embeddings.json

Large diffs are not rendered by default.

1 change: 0 additions & 1 deletion src/data/new_test_intent_embeddings.json

This file was deleted.
