PS-9181 [DOCS] - document background dictionary cache reload functionality in the Masking Functions component 8.0

	modified:   docs/data-masking-function-list.md
	modified:   docs/data-masking-overview.md
patrickbirch committed Jan 7, 2025
1 parent e7b9f6d commit fd28775
Showing 2 changed files with 111 additions and 5 deletions.
86 changes: 85 additions & 1 deletion docs/data-masking-function-list.md
@@ -2,8 +2,27 @@

The feature is in [tech preview](glossary.md#tech-preview).

| **Name** | **Usage** |
## Permissions

In Percona Server for MySQL 8.0.41, dictionary-related functions no longer run internal queries as the root user without a password. Following MySQL best practices, many admins disable the `root` user, which previously caused these functions to stop working. The server now uses the built-in `mysql.session` user to execute dictionary queries.

For this to work, you must grant the `mysql.session` user the `SELECT`, `INSERT`, `UPDATE`, and `DELETE` privileges on the `masking_dictionaries` table.

```{.bash data-prompt="mysql>"}
mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON mysql.masking_dictionaries TO 'mysql.session'@'localhost';
```

If you change the value of the `component_masking_functions.masking_database` system variable to something other than `mysql`, update the `GRANT` statement to use the new database name.

```{.bash data-prompt="mysql>"}
mysql> GRANT SELECT, INSERT, UPDATE, DELETE ON <component_masking_functions.masking_database>.masking_dictionaries TO 'mysql.session'@'localhost';
```
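
To confirm that the grant took effect, you can inspect the privileges of the `mysql.session` account and check which database the component currently uses. This is a minimal sketch that relies only on standard MySQL statements; the output depends on your server and is not shown here.

```{.bash data-prompt="mysql>"}
mysql> SHOW GRANTS FOR 'mysql.session'@'localhost';
mysql> SHOW VARIABLES LIKE 'component_masking_functions.masking_database';
```

If the second statement reports a database other than `mysql`, the grant should name that database instead.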

## Data masking component functions

| **Name** | **Details** |
|---------------------------------------------------|-------------------------------------------------------|
| [`dictionaries_flush_interval_seconds (integer, unsigned)`](#dictionaries_flush_interval_secondsinteger-unsigned) | The number of seconds between updates to the internal dictionary cache to match changes in the dictionaries table.|
| [`gen_blocklist(str, from_dictionary_name, to_dictionary_name)`](#gen_blockliststr-from_dictionary_name-to_dictionary_name) | Replace a term from a dictionary |
| [`gen_dictionary(dictionary_name)`](#gen_dictionarydictionary_name) | Returns a random term from a dictionary |
| [`gen_range(lower, upper)`](#gen_rangelower-upper) | Returns a number from a range |
@@ -24,14 +43,40 @@ The feature is in [tech preview](glossary.md#tech-preview).
| [`mask_ssn(str [,mask_char])`](#mask_ssnstr-mask_char) | Masks the US Social Security number |
| [`mask_uk_nin(str [,mask_char])`](#mask_uk_ninstr-mask_char) | Masks the United Kingdom National Insurance number |
| [`mask_uuid(str [,mask_char])`](#mask_uuidstr-mask_char) | Masks the Universally Unique Identifier |
| [`masking_database(string)`](#masking_databasestring) | Sets a different database name to use for the dictionaries table. |
| [`masking_dictionaries_flush()`](#masking_dictionaries_flush) | Resyncs the internal dictionary term cache |
| [`masking_dictionary_remove(dictionary_name)`](#masking_dictionary_removedictionary_name) | Removes the dictionary |
| [`masking_dictionary_term_add(dictionary_name, term_name)`](#masking_dictionary_term_adddictionary_name-term_name) | Adds a term to the masking dictionary |
| [`masking_dictionary_term_remove(dictionary_name, term_name)`](#masking_dictionary_term_removedictionary_name-term_name) | Removes a term from the masking dictionary |


## dictionaries_flush_interval_seconds(integer, unsigned)

The number of seconds between synchronizations of the dictionaries table and the internal dictionary cache.

This variable is read-only. Its default value is 0, which means the synchronization operation does not run.

### Version update

Percona Server for MySQL 8.0.41 introduces this variable.

### Parameters

| Parameter | Optional | Description | Type |
| --- | --- | --- | --- |
| `seconds` | Yes | The number of seconds between synchronizations of the internal dictionary cache and the dictionaries table. | Integer, unsigned |
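
As a quick check, the current interval can be read with a standard `SHOW VARIABLES` statement. This sketch assumes the component is installed and uses the fully qualified variable name with the `component_masking_functions` prefix.

```{.bash data-prompt="mysql>"}
mysql> SHOW VARIABLES LIKE 'component_masking_functions.dictionaries_flush_interval_seconds';
```

A value of `0` indicates that the background synchronization does not run.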


## gen_blocklist(str, from_dictionary_name, to_dictionary_name)

Replaces a term from one dictionary with a randomly selected term in another dictionary.

### Version update

Percona Server for MySQL 8.0.41 introduces an internal term cache. Instead of querying the underlying `mysql.masking_dictionaries` table each time a function is executed, the server now utilizes internal in-memory data structures for lookups. This enhancement significantly improves performance, particularly when processing multiple rows.



### Parameters

| Parameter | Optional | Description | Type |
@@ -760,6 +805,45 @@ mysql> SELECT mask_uuid('9a3b642c-06c6-11ee-be56-0242ac120002');
+-------------------------------------------------------+
```

## masking_database(string)

Specifies the name of the database that holds the `masking_dictionaries` table. The default is the `mysql` database.

### Parameters

Name of the database as a string.

### Returns

Returns a string value of `1` (one) when successful.

## masking_dictionaries_flush()

Resyncs the internal dictionary term cache with the underlying dictionaries table.

### Parameters

None

### Returns

Returns a string value of `1` (one) when successful.

### Example

```{.bash data-prompt="mysql>"}
mysql> SELECT masking_dictionaries_flush();
```
??? example "Expected output"

    ```{.text .no-copy}
    +------------------------------+
    | masking_dictionaries_flush() |
    +------------------------------+
    | 1                            |
    +------------------------------+
    ```

## masking_dictionary_remove(dictionary_name)

Removes all of the terms and then removes the dictionary.
30 changes: 26 additions & 4 deletions docs/data-masking-overview.md
@@ -1,6 +1,6 @@
# Data masking overview

Data masking protects sensitive information by blocking unauthorized users from accessing the real data. This process creates altered versions of data for specific uses, like presentations, sales demonstrations, or software testing. The masked data keeps the same format as the original but contains changed values that cannot be reversed to reveal the true information. By making the data worthless to outsiders, masking helps organizations reduce their risk of data breaches or misuse. Companies can safely use masked data in various scenarios without exposing confidential details to unauthorized parties.
Data masking protects sensitive information by restricting data access to authorized users only. When you need to present, demonstrate, or test software without revealing actual data, masking creates safe versions of your data. The masking process changes values while keeping the same data format, making the original values impossible to recover. This security approach reduces organizational risk because any exposed data becomes worthless to unauthorized parties.

Data masking in Percona Server for MySQL is an essential tool for protecting sensitive information in various scenarios:

@@ -14,16 +14,38 @@ Data masking in Percona Server for MySQL is an essential tool for protecting sen

These examples underscore how data masking serves as a crucial safeguard for sensitive information, allowing organizations to leverage their data effectively across diverse functions.

Data masking helps to limit the exposure of sensitive data by preventing access to non-authorized users. Masking provides a way to create a version of the data in situations, such as a presentation, sales demo, or software testing, when the real data should not be used. Data masking changes the data values while using the same format and cannot be reverse engineered. Masking reduces an organization's risk by making the data useless to an outside party.

## Version updates

Percona Server for MySQL 8.0.41 introduces an internal term cache for the
following functions in the [data masking component](data-masking-function-list.md):

* [gen_blocklist](data-masking-function-list.md#gen_blockliststr-from_dictionary_name-to_dictionary_name)

* [gen_dictionary](data-masking-function-list.md#gen_dictionarydictionary_name)

Instead of querying the underlying `mysql.masking_dictionaries` table each time a function is executed, the server now utilizes internal in-memory data structures for lookups. This enhancement significantly improves performance, particularly when processing multiple rows.

With this redesign, the internal dictionary term cache might get out of sync with the underlying dictionaries table (by default, `mysql.masking_dictionaries`). This can happen if you change the table directly instead of using the dedicated dictionary manipulation functions: [`masking_dictionary_term_add()`](data-masking-function-list.md#masking_dictionary_term_adddictionary_name-term_name), [`masking_dictionary_term_remove()`](data-masking-function-list.md#masking_dictionary_term_removedictionary_name-term_name), and [`masking_dictionary_remove()`](data-masking-function-list.md#masking_dictionary_removedictionary_name).

To resync the internal dictionary term cache, we added a new function called [`masking_dictionaries_flush()`](data-masking-function-list.md#masking_dictionaries_flush). This function takes no arguments and returns 1 when it succeeds.
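
The following sketch illustrates the out-of-sync scenario and the manual resync. It assumes the default `mysql.masking_dictionaries` table with `Dictionary` and `Term` columns and a hypothetical dictionary named `us_cities`; adjust the names to match your installation.

```{.bash data-prompt="mysql>"}
mysql> INSERT INTO mysql.masking_dictionaries (Dictionary, Term) VALUES ('us_cities', 'Denver'); -- direct change, bypasses the term cache
mysql> SELECT masking_dictionaries_flush(); -- resync the cache with the table
mysql> SELECT gen_dictionary('us_cities'); -- the new term is now a candidate result
```

Adding the term with `masking_dictionary_term_add('us_cities', 'Denver')` instead of a direct `INSERT` avoids the need for a manual flush on that server.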

This redesign also affects row-based replication. Changes to the dictionaries table, either through dedicated functions or directly on the source, are sent to a replica via the binary log. The applier thread reads these binary log events on the replica and applies them successfully. However, the dictionary term cache on the replica doesn't update automatically.

We introduced a new system variable called `component_masking_functions.dictionaries_flush_interval_seconds` (read-only, unsigned integer, default `0`).

When you set this variable to any value other than 0, the component starts a background thread at startup that periodically syncs the dictionaries table with the internal dictionary term cache. The value specifies the number of seconds between each sync.

If this variable has a non-zero value on a replica, the dictionary term cache eventually syncs with the underlying dictionaries table after receiving those binary log events.
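
One possible way to apply this on a replica is sketched below. `SET PERSIST_ONLY` is the standard MySQL statement for persisting read-only variables so that they take effect at the next restart. Whether this variable can be persisted this way in your deployment, rather than set in the configuration file, is an assumption, and the value of 60 seconds is only illustrative.

```{.bash data-prompt="mysql>"}
mysql> SET PERSIST_ONLY component_masking_functions.dictionaries_flush_interval_seconds = 60; -- takes effect after a restart
mysql> SHOW VARIABLES LIKE 'component_masking_functions.dictionaries_flush_interval_seconds'; -- run after the restart to confirm
```

Until the background thread is active, running `SELECT masking_dictionaries_flush();` on the replica should resync its cache on demand.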

## Data masking techniques

The common data masking techniques are the following:

| Technique | Description |
| --- | --- |
| Custom string | Replaces sensitive data with a specific string, such as a phone number with XXX-XXX-XXXX |
| Data substitution | Replaces sensitive data with realistic alternative values, such as city name with another name from a dictionary |
| Character substitution | Replaces sensitive data with a matching symbol (X,*). For example, a phone number becomes XXX-XXX-XXXX. |
| Value generation | Replaces sensitive data with realistic-looking alternative values. For example, for testing purposes, you can generate a realistic alternative United States Social Security Number. |
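
As an illustration of these techniques, the statements below use functions from the data masking component function list. The input values and the `us_cities` dictionary name are hypothetical, and the output is omitted because generated values are random.

```{.bash data-prompt="mysql>"}
mysql> SELECT mask_ssn('555-55-5555'); -- character substitution with the default mask character
mysql> SELECT mask_ssn('555-55-5555', '#'); -- character substitution with a custom mask character
mysql> SELECT gen_range(1, 100); -- generates a number from a range
mysql> SELECT gen_dictionary('us_cities'); -- substitutes a random term from a dictionary
```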

## Additional resources
