docs(config): add documentation for modifying config.yaml parameters
edemirci-aai authored Sep 24, 2024
1 parent 7f23308 commit 8dffa99
Showing 1 changed file with 146 additions and 0 deletions.
146 changes: 146 additions & 0 deletions README.md
@@ -115,6 +115,152 @@ This document outlines the core functions used in the `VirtualHavruta` class. Th
| Function Name | Purpose | Input Parameters | Output |
|---------------|---------|------------------|--------|
| `graph_traversal_retriever(self, screen_res: str, scripture_query: str, enriched_query: str, filter_mode_nodes: str \| None = None, linker_results: list[dict] \| None = None, semantic_search_results: list[tuple[Document, float]] \| None = None, msg_id: str = '')` | Retrieves related chunks by traversing the graph starting from seed chunks. | - `screen_res: str`: Screen result query<br>- `scripture_query: str`: Scripture query<br>- `enriched_query: str`: Enriched query<br>- `filter_mode_nodes: str \| None = None`: Node filter mode<br>- `linker_results: list[dict] \| None = None`: Linker results<br>- `semantic_search_results: list[tuple[Document, float]] \| None = None`: Semantic search results<br>- `msg_id: str = ''`: Message ID for logging | Tuple `(retrieval_res_kg: list[tuple[Document, float]], total_token_count: int)` |

## Configuration Guide for `config.yaml`

This guide explains how to modify the `config.yaml` file for the Virtual Havruta project. The file controls the environment, database connections, Slack integration, model API setup, LLM chain assignments, and reference settings.
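As a quick sanity check before editing individual values, the sketch below lists which of the top-level sections covered in this guide are missing from a parsed configuration. It is a minimal illustration, not part of the project: `EXPECTED_SECTIONS` and `missing_sections` are hypothetical names, and the `config` dictionary stands in for whatever a YAML parser would return for `config.yaml`.

```python
# Top-level sections described in this guide.
EXPECTED_SECTIONS = (
    "environment",
    "database",
    "slack",
    "openai_model_api",
    "llm_chain_setups",
    "references",
    "linker_references",
)


def missing_sections(config: dict) -> list[str]:
    """Return the documented top-level sections absent from a parsed config."""
    return [name for name in EXPECTED_SECTIONS if name not in config]


# Placeholder data standing in for a parsed config.yaml (hypothetical values).
config = {"environment": {"log_name": "Virtual-Havruta"}, "slack": {}}
print(missing_sections(config))
```

An empty list means every section from this guide is present; it says nothing about whether the values inside each section are valid.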

---

### 1. Environment-related parameters

These parameters control the application's behavior, logging, and thought process visibility.

    environment:
      use_app_mention: false
      show_thought_process: true
      show_kg_link: true
      log_name: Virtual-Havruta

- `use_app_mention`: Set to `true` to respond only when mentioned in Slack, or `false` to respond to all messages.
- `show_thought_process`: Set to `true` to display the intermediate thought process in Slack responses, or `false` to hide it.
- `show_kg_link`: Set to `true` to include Knowledge Graph (KG) visualization links in responses, or `false` to hide the KG link.
- `log_name`: Name used for logging. Useful for identifying logs from different runs or environments.

---

### 2. Database-related parameters

These settings define the database connections for embedding-based and KG-based queries.

    database:
      embed:
        url: bolt://publicip:7687
        username: user
        password: password@dev
        top_k: 15
        metadata_fields: ['metadata_field_name1', 'metadata_field_name2']
        topic_fields: ['topic_field_name1', 'topic_field_name2']
      kg:
        url: bolt://publicip_kg:7687
        username: user
        password: password@dev
        order: 1
        direction: both_ways
        k_seeds: 5
        max_depth: 2
        name: db_name
        neo4j_deeplink: http://neodash.graphapp.io/xyz

Embed settings:
- `url`: The Neo4j database connection URL.
- `username` / `password`: Database credentials for Neo4j.
- `top_k`: Number of top search results to retrieve.
- `metadata_fields`: Metadata fields used for query filtering.
- `topic_fields`: Topic fields used for expanding queries.

KG settings:
- `url`: Connection URL for the Knowledge Graph database.
- `order`: Specifies search order.
- `direction`: Determines which edge direction the KG search follows. Options are:
  - `incoming`: Search for newer references.
  - `outgoing`: Search for older references.
  - `both_ways`: Search in both directions.
- `k_seeds`: Number of starting seeds for the KG search.
- `max_depth`: Maximum depth for KG traversal, which limits the path length.
- `neo4j_deeplink`: A direct link to the Neo4j visualizer.
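To make the interaction of `direction`, `k_seeds`, and `max_depth` concrete, here is a minimal breadth-first sketch over a toy in-memory graph. It is not the project's retriever, which queries Neo4j. It assumes that an edge points from a text to the text it references, so `incoming` edges arrive from newer texts and `outgoing` edges lead to older ones. The edge list and the `neighbors` and `traverse` helpers are hypothetical.

```python
from collections import deque

# Toy edges (source references target); node names are made up.
EDGES = [("C", "B"), ("B", "A"), ("D", "B"), ("E", "D")]


def neighbors(node: str, direction: str) -> set[str]:
    """Nodes one edge away from `node` in the requested direction."""
    found = set()
    for source, target in EDGES:
        if direction in ("outgoing", "both_ways") and source == node:
            found.add(target)  # node references target (older text)
        if direction in ("incoming", "both_ways") and target == node:
            found.add(source)  # source references node (newer text)
    return found


def traverse(seeds: list[str], direction: str, k_seeds: int, max_depth: int) -> dict[str, int]:
    """Breadth-first search from the first `k_seeds` seeds, at most `max_depth` hops."""
    depth_of = {seed: 0 for seed in seeds[:k_seeds]}
    queue = deque(depth_of)
    while queue:
        node = queue.popleft()
        if depth_of[node] == max_depth:
            continue  # do not expand nodes already at the depth limit
        for nxt in sorted(neighbors(node, direction)):
            if nxt not in depth_of:
                depth_of[nxt] = depth_of[node] + 1
                queue.append(nxt)
    return depth_of


print(traverse(["B"], direction="incoming", k_seeds=5, max_depth=2))
```

Raising `max_depth` or switching to `both_ways` widens the set of reachable nodes, which is why these values also affect how much text ends up in the retrieval result.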

---

### 3. Slack-related parameters

These parameters configure the Slack bot's authentication.

    slack:
      slack_bot_token: slack_bot_token
      slack_app_token: slack_app_token

- `slack_bot_token`: The token for the Slack bot's authentication.
- `slack_app_token`: The application token used for real-time WebSocket communication with Slack.
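A common mistake is to leave the placeholder values in place or to swap the two tokens. Slack bot tokens start with `xoxb-` and app-level tokens start with `xapp-`, so a prefix check catches both cases. The sketch below is only an illustration; `check_slack_tokens` is a hypothetical helper, not a function in the project.

```python
def check_slack_tokens(slack_cfg: dict) -> list[str]:
    """Return a list of problems found in the `slack` section (empty if none)."""
    problems = []
    bot_token = str(slack_cfg.get("slack_bot_token", ""))
    app_token = str(slack_cfg.get("slack_app_token", ""))
    if not bot_token.startswith("xoxb-"):
        problems.append("slack_bot_token should be a bot token (xoxb-...)")
    if not app_token.startswith("xapp-"):
        problems.append("slack_app_token should be an app-level token (xapp-...)")
    return problems


# The placeholder values from the example above fail both checks.
placeholders = {"slack_bot_token": "slack_bot_token", "slack_app_token": "slack_app_token"}
for problem in check_slack_tokens(placeholders):
    print(problem)
```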

---

### 4. Model API parameters

Settings to configure which models the application uses, including main, support, and embedding models.

    openai_model_api:
      api_key: openai_model_api_key
      main_model: main_model_name
      main_model_temperature: 0
      support_model: support_model_name
      support_model_temperature: 0
      embedding_model: embedding_model_name

- `api_key`: The OpenAI API key for accessing models.
- `main_model`: The main model used to generate responses.
- `main_model_temperature`: Sampling temperature for the main model. `0` gives the most deterministic output; higher values, such as `1`, give more varied output.
- `support_model`: A secondary model for additional tasks.
- `support_model_temperature`: Similar to `main_model_temperature`, but for the support model.
- `embedding_model`: Model used for generating embeddings.

---

### 5. LLM Chain Setups

This section defines the sequence of chains used for different tasks handled by the main model and the support model.

    llm_chain_setups:
      main_model: ['chain1', 'chain2']
      main_model_json: ['chain3']
      support_model: ['chain4', 'chain5', 'chain6']
      support_model_json: []

- `main_model`: Chains used by the main model for text responses.
- `main_model_json`: Chains used for JSON-related tasks by the main model.
- `support_model`: Chains used by the support model for auxiliary tasks.
- `support_model_json`: JSON-related tasks handled by the support model.
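When moving a chain between groups, it is easy to leave it listed in two places. The sketch below inverts the `llm_chain_setups` mapping into a chain-to-group lookup and raises an error on duplicates. It assumes each chain should belong to exactly one group, which this guide does not state explicitly, and `chains_to_groups` is a hypothetical helper.

```python
def chains_to_groups(setups: dict[str, list[str]]) -> dict[str, str]:
    """Invert llm_chain_setups into {chain_name: group}, rejecting duplicates."""
    owner: dict[str, str] = {}
    for group, chains in setups.items():
        for chain in chains:
            if chain in owner:
                # Assumption: a chain may appear under only one group.
                raise ValueError(
                    f"{chain!r} is listed under both {owner[chain]!r} and {group!r}"
                )
            owner[chain] = group
    return owner


# Placeholder chain names taken from the example above.
setups = {
    "main_model": ["chain1", "chain2"],
    "main_model_json": ["chain3"],
    "support_model": ["chain4", "chain5", "chain6"],
    "support_model_json": [],
}
print(chains_to_groups(setups)["chain3"])
```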

---

### 6. Reference Settings

Settings related to how primary and secondary references are filtered and cited.

    references:
      primary_source_filter: ['filter1', 'filter2', 'filter3']
      num_primary_citations: 1
      num_secondary_citations: 1

- `primary_source_filter`: Filters applied to primary references during search.
- `num_primary_citations`: Number of primary source citations to include.
- `num_secondary_citations`: Number of secondary source citations to include.

---

### 7. Linker References

Settings for linking references from the database.

    linker_references:
      primary_source_filter: ['filter1', 'filter2', 'filter3', 'filter4', 'filter5']
      num_primary_citations: -1
      num_secondary_citations: -1

- `primary_source_filter`: Additional filters applied to primary sources.
- `num_primary_citations`: Number of primary citations to include from linked references.
- `num_secondary_citations`: Number of secondary citations to include from linked references.
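The example above uses `-1` for both citation counts, but this guide does not say what a negative value means. The sketch below shows one plausible reading, in which a negative count disables the limit. That interpretation is an assumption and should be checked against the project code; `limit_citations` is a hypothetical helper.

```python
def limit_citations(citations: list[str], limit: int) -> list[str]:
    """Keep at most `limit` citations.

    ASSUMPTION: a negative limit (such as -1) means "no limit".
    This guide does not confirm that reading.
    """
    if limit < 0:
        return list(citations)
    return citations[:limit]


# Made-up citation labels, used only to show the behavior.
found = ["source-1", "source-2", "source-3"]
print(limit_citations(found, 1))
print(limit_citations(found, -1))
```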

## Future Directions
While currently focused on Judaic scriptures, the underlying technology of Virtual Havruta has potential for broader applications. Its adaptability to other domains highlights the project's versatility and the promise of RAG technology in various fields.
## Acknowledgments