ref: Config Schema (+ tests!) #11

Merged
merged 12 commits into from
Aug 28, 2024
2 changes: 1 addition & 1 deletion .github/workflows/ci.yaml
@@ -30,7 +30,7 @@ jobs:
- name: Checkout code
uses: actions/checkout@v3
- name: go tests
run: go test -v -covermode=count -json ./... > test.json
run: (set -o pipefail && go test -v -covermode=count -json ./... | tee test.json)
- name: annotate go tests
if: always()
uses: guyarb/[email protected]
2 changes: 1 addition & 1 deletion .github/workflows/main.yaml
@@ -33,7 +33,7 @@ jobs:
- name: Checkout code
uses: actions/checkout@v3
- name: go tests
run: go test -v -covermode=count -json ./... > test.json
run: (set -o pipefail && go test -v -covermode=count -json ./... | tee test.json)
- name: annotate go tests
if: always()
uses: guyarb/[email protected]
2 changes: 0 additions & 2 deletions .golangci.yml
@@ -70,9 +70,7 @@ linters:
- bodyclose # checks whether HTTP response body is closed successfully
- durationcheck # check for two durations multiplied together
- errorlint # errorlint is a linter for that can be used to find code that will cause problems with the error wrapping scheme introduced in Go 1.13.
- execinquery # execinquery is a linter about query string checker in Query function which reads your Go src files and warning it finds
- exhaustive # check exhaustiveness of enum switch statements
- exportloopref # checks for pointers to enclosing loop variables
- forbidigo # Forbids identifiers
- gochecknoinits # Checks that no init functions are present in Go code
- goconst # Finds repeated strings that could be replaced by a constant
73 changes: 55 additions & 18 deletions README.md
@@ -2,19 +2,43 @@

# `baton-databricks` [![Go Reference](https://pkg.go.dev/badge/github.com/conductorone/baton-databricks.svg)](https://pkg.go.dev/github.com/conductorone/baton-databricks) ![main ci](https://github.com/conductorone/baton-databricks/actions/workflows/main.yaml/badge.svg)

`baton-databricks` is a connector for Databricks built using the [Baton SDK](https://github.com/conductorone/baton-sdk). It communicates with the Databricks API, to sync data about Databricks identities (users, groups and service principals), roles and workspaces.
`baton-databricks` is a connector for Databricks built using the
[Baton SDK](https://github.com/conductorone/baton-sdk). It communicates with the
Databricks API, to sync data about Databricks identities (users, groups and
service principals), roles and workspaces.

Check out [Baton](https://github.com/conductorone/baton) to learn more about the project in general.

# Prerequisites

To work with the connector, you can choose from multiple ways to run it, but the main requirement is to have a Databricks account and its ID. You can find the ID of an account, after you log into account platform and click on your username in right top corner that will open a dropdown menu with the account ID along other options.

Another requirement is to have valid credentials to run the connector with. This will decide how connector will be executed. You can use either OAuth client credentials flow or Basic auth flow (username and password) or Bearer auth flow. Both OAuth and Basic can be used across account and all workspaces you have access to. Bearer auth can be used only for a specific workspace.

To use the OAuth, you need to create a service principal and add OAuth secret (client id and secret) to it. You can do that by going to the user management tab and clicking on the Service Principals tab. Then click on the Add Service principal button and name it. You then need to add OAuth secret to it by clicking on the Generate secret button. You can use this secret to authenticate across all workspaces that service principal has access to. To use basic auth, you just need to provide a username and password of a user that has access to the Databricks API. Both methods require admin access to the Databricks account and each workspace you want to sync.

To use bearer auth, you need to provide a Databricks workspace access token. You can create a new token by logging into the workspace and going into user settings. Then go to Developer tab and create a new access token. This will try to work with only specified workspaces and their respective tokens. You can provide multiple tokens by separating them with a comma. This method requires admin access to each workspace you want to sync.
To work with the connector, you can choose from multiple ways to run it, but the
main requirement is to have a Databricks account and its ID. To find the account
ID, log into the account platform and click your username in the top-right
corner; the dropdown menu that opens shows the account ID alongside other
options.

Another requirement is to have valid credentials to run the connector with. The
credentials determine how the connector runs. You can use the OAuth client
credentials flow, the Basic auth flow (username and password), or the Bearer
auth flow. Both OAuth and Basic can be used across the account and all
workspaces you have access to. Bearer auth can be used only for a specific
workspace.

To use OAuth, you need to create a service principal and add an OAuth secret
(client ID and secret) to it. To do that, go to the user management tab and
click the Service Principals tab. Then click the Add Service principal button
and name it. Next, add an OAuth secret by clicking the Generate secret button.
You can use this secret to authenticate across all workspaces that the service
principal has access to. To use basic auth, you just need to provide the
username and password of a user that has access to the Databricks API. Both
methods require admin access to the Databricks account and each workspace you
want to sync.

To use bearer auth, you need to provide a Databricks workspace access token. You
can create a new token by logging into the workspace and opening user settings.
Then go to the Developer tab and create a new access token. With bearer auth,
the connector works only with the specified workspaces and their respective
tokens. You can provide multiple tokens by separating them with commas. This
method requires admin access to each workspace you want to sync.

# Getting Started

@@ -55,11 +79,22 @@ baton resources
- Users
- Roles

By default, connector will fetch all resources from the account and all workspaces. You can limit the scope of the sync by providing a list of workspaces to sync with. You can do that by providing a comma-separated list of workspace hostnames to the `--workspaces` flag. You can also provide a list of workspace access tokens to the `--workspace-tokens` flag. This will limit the sync to only workspaces that are associated with those tokens. You can also use both flags at the same time. If you do that, connector will sync with all workspaces that are associated with provided tokens and all workspaces that are in the list of workspaces.
By default, the connector fetches all resources from the account and all
workspaces. You can limit the scope of the sync by passing a comma-separated
list of workspace hostnames to the `--workspaces` flag. You can also pass a
list of workspace access tokens to the `--workspace-tokens` flag, which limits
the sync to the workspaces associated with those tokens. If you use both flags
at the same time, the connector syncs all workspaces associated with the
provided tokens and all workspaces in the list of workspaces.

# Contributing, Support and Issues

We started Baton because we were tired of taking screenshots and manually building spreadsheets. We welcome contributions, and ideas, no matter how small -- our goal is to make identity and permissions sprawl less painful for everyone. If you have questions, problems, or ideas: Please open a Github Issue!
We started Baton because we were tired of taking screenshots and manually
building spreadsheets. We welcome contributions, and ideas, no matter how
small—our goal is to make identity and permissions sprawl less painful for
everyone. If you have questions, problems, or ideas: Please open a GitHub Issue!

See [CONTRIBUTING.md](https://github.com/ConductorOne/baton/blob/main/CONTRIBUTING.md) for more details.

@@ -78,21 +113,23 @@ Available Commands:
help Help about any command

Flags:
--account-id string The Databricks account ID used to connect to the Databricks Account and Workspace API. ($BATON_ACCOUNT_ID)
--account-id string required: The Databricks account ID used to connect to the Databricks Account and Workspace API ($BATON_ACCOUNT_ID)
--client-id string The client ID used to authenticate with ConductorOne ($BATON_CLIENT_ID)
--client-secret string The client secret used to authenticate with ConductorOne ($BATON_CLIENT_SECRET)
--databricks-client-id string The Databricks service principal's client ID used to connect to the Databricks Account and Workspace API. ($BATON_DATABRICKS_CLIENT_ID)
--databricks-client-secret string The Databricks service principal's client secret used to connect to the Databricks Account and Workspace API. ($BATON_DATABRICKS_CLIENT_SECRET)
--databricks-client-id string The Databricks service principal's client ID used to connect to the Databricks Account and Workspace API ($BATON_DATABRICKS_CLIENT_ID)
--databricks-client-secret string The Databricks service principal's client secret used to connect to the Databricks Account and Workspace API ($BATON_DATABRICKS_CLIENT_SECRET)
-f, --file string The path to the c1z file to sync with ($BATON_FILE) (default "sync.c1z")
-h, --help help for baton-databricks
--log-format string The output format for logs: json, console ($BATON_LOG_FORMAT) (default "json")
--log-level string The log level: debug, info, warn, error ($BATON_LOG_LEVEL) (default "info")
--password string The Databricks password used to connect to the Databricks API. ($BATON_PASSWORD)
-p, --provisioning This must be set in order for provisioning actions to be enabled. ($BATON_PROVISIONING)
--username string The Databricks username used to connect to the Databricks API. ($BATON_USERNAME)
--password string The Databricks password used to connect to the Databricks API ($BATON_PASSWORD)
-p, --provisioning This must be set in order for provisioning actions to be enabled ($BATON_PROVISIONING)
--skip-full-sync This must be set to skip a full sync ($BATON_SKIP_FULL_SYNC)
--ticketing This must be set to enable ticketing support ($BATON_TICKETING)
--username string The Databricks username used to connect to the Databricks API ($BATON_USERNAME)
-v, --version version for baton-databricks
--workspace-tokens strings The Databricks access tokens scoped to specific workspaces used to connect to the Databricks Workspace API. ($BATON_WORKSPACE_TOKENS)
--workspaces strings Limit syncing to the specified workspaces. ($BATON_WORKSPACES)
--workspace-tokens strings The Databricks access tokens scoped to specific workspaces used to connect to the Databricks Workspace API ($BATON_WORKSPACE_TOKENS)
--workspaces strings Limit syncing to the specified workspaces ($BATON_WORKSPACES)

Use "baton-databricks [command] --help" for more information about a command.
```
135 changes: 76 additions & 59 deletions cmd/baton-databricks/config.go
@@ -4,71 +4,88 @@ import (
"context"
"fmt"

"github.com/conductorone/baton-sdk/pkg/cli"
"github.com/spf13/cobra"
"github.com/conductorone/baton-sdk/pkg/field"
"github.com/spf13/viper"
)

// config defines the external configuration required for the connector to run.
type config struct {
cli.BaseConfig `mapstructure:",squash"` // Puts the base config options in the same place as the connector options

AccountId string `mapstructure:"account-id"`
DatabricksClientId string `mapstructure:"databricks-client-id"`
DatabricksClientSecret string `mapstructure:"databricks-client-secret"`
Username string `mapstructure:"username"`
Password string `mapstructure:"password"`
Workspaces []string `mapstructure:"workspaces"`
Tokens []string `mapstructure:"workspace-tokens"`
}

func (c *config) IsBasicAuth() bool {
return c.Username != "" && c.Password != ""
}

func (c *config) IsOauth() bool {
return c.DatabricksClientId != "" && c.DatabricksClientSecret != ""
}

func (c *config) AreTokensSet() bool {
return (len(c.Tokens) > 0) && (len(c.Workspaces) == len(c.Tokens))
}

func (c *config) IsAuthReady() bool {
return c.AreTokensSet() || c.IsOauth() || c.IsBasicAuth()
}

// validateConfig is run after the configuration is loaded, and should return an error if it isn't valid.
func validateConfig(ctx context.Context, cfg *config) error {
if cfg.AccountId == "" {
return fmt.Errorf("account ID must be provided, use --help for more information")
}

if !cfg.IsAuthReady() {
return fmt.Errorf("either access token along workspaces or username and password or client id and client secret must be provided, use --help for more information")
}

return nil
}

// cmdFlags sets the cmdFlags required for the connector.
func cmdFlags(cmd *cobra.Command) {
cmd.PersistentFlags().String("account-id", "", "The Databricks account ID used to connect to the Databricks Account and Workspace API. ($BATON_ACCOUNT_ID)")
cmd.PersistentFlags().String(
var (
AccountIdField = field.StringField(
"account-id",
field.WithDescription("The Databricks account ID used to connect to the Databricks Account and Workspace API"),
field.WithRequired(true),
)
DatabricksClientIdField = field.StringField(
"databricks-client-id",
"",
"The Databricks service principal's client ID used to connect to the Databricks Account and Workspace API. ($BATON_DATABRICKS_CLIENT_ID)",
field.WithDescription("The Databricks service principal's client ID used to connect to the Databricks Account and Workspace API"),
)
cmd.PersistentFlags().String(
DatabricksClientSecretField = field.StringField(
"databricks-client-secret",
"",
"The Databricks service principal's client secret used to connect to the Databricks Account and Workspace API. ($BATON_DATABRICKS_CLIENT_SECRET)",
field.WithDescription("The Databricks service principal's client secret used to connect to the Databricks Account and Workspace API"),
)
cmd.PersistentFlags().String("username", "", "The Databricks username used to connect to the Databricks API. ($BATON_USERNAME)")
cmd.PersistentFlags().String("password", "", "The Databricks password used to connect to the Databricks API. ($BATON_PASSWORD)")
cmd.PersistentFlags().StringSlice("workspaces", []string{}, "Limit syncing to the specified workspaces. ($BATON_WORKSPACES)")
cmd.PersistentFlags().StringSlice(
UsernameField = field.StringField(
"username",
field.WithDescription("The Databricks username used to connect to the Databricks API"),
)
PasswordField = field.StringField(
"password",
field.WithDescription("The Databricks password used to connect to the Databricks API"),
)
WorkspacesField = field.StringSliceField(
"workspaces",
field.WithDescription("Limit syncing to the specified workspaces"),
)
TokensField = field.StringSliceField(
"workspace-tokens",
[]string{},
"The Databricks access tokens scoped to specific workspaces used to connect to the Databricks Workspace API. ($BATON_WORKSPACE_TOKENS)",
field.WithDescription("The Databricks access tokens scoped to specific workspaces used to connect to the Databricks Workspace API"),
)
configurationFields = []field.SchemaField{
AccountIdField,
DatabricksClientIdField,
DatabricksClientSecretField,
PasswordField,
TokensField,
UsernameField,
WorkspacesField,
}
fieldRelationships = []field.SchemaFieldRelationship{
field.FieldsAtLeastOneUsed(
DatabricksClientIdField,
UsernameField,
TokensField,
),
field.FieldsMutuallyExclusive(
DatabricksClientIdField,
UsernameField,
TokensField,
),
field.FieldsRequiredTogether(
DatabricksClientIdField,
DatabricksClientSecretField,
),
field.FieldsRequiredTogether(
UsernameField,
PasswordField,
),
field.FieldsDependentOn(
[]field.SchemaField{TokensField},
[]field.SchemaField{WorkspacesField},
),
}
)

// validateConfig - additional validations that cannot be encoded in relationships (yet!)
func validateConfig(ctx context.Context, cfg *viper.Viper) error {
workspaces := cfg.GetStringSlice(WorkspacesField.FieldName)
tokens := cfg.GetStringSlice(TokensField.FieldName)

// If there are tokens, there must be an equivalent number of workspaces.
if len(tokens) > 0 && len(workspaces) != len(tokens) {
return fmt.Errorf(
"comma-separated list of workspaces and tokens must be the same length. Received %d workspaces and %d tokens",
len(workspaces),
len(tokens),
)
}

return nil
}
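
The field relationships and the extra length check above can be read as one set of rules. The following is a minimal sketch of those rules in plain Go, assuming a flattened struct in place of the SDK schema. The names `connectorSettings` and `checkSettings` are hypothetical and do not come from this PR or from `baton-sdk`, and the error messages are illustrative only.

```go
package main

import (
	"errors"
	"fmt"
)

// connectorSettings is a hypothetical, flattened view of the connector flags.
// It stands in for the viper-backed configuration used in the PR.
type connectorSettings struct {
	AccountID    string
	ClientID     string
	ClientSecret string
	Username     string
	Password     string
	Workspaces   []string
	Tokens       []string
}

// checkSettings restates the schema rules by hand: a required account ID,
// exactly one auth method, paired credentials, and matching list lengths.
func checkSettings(s connectorSettings) error {
	if s.AccountID == "" {
		return errors.New("account-id is required")
	}

	// Count which auth methods were started: OAuth, basic auth, or tokens.
	methods := 0
	for _, used := range []bool{s.ClientID != "", s.Username != "", len(s.Tokens) > 0} {
		if used {
			methods++
		}
	}
	if methods == 0 {
		return errors.New("one of client ID, username, or workspace tokens must be set")
	}
	if methods > 1 {
		return errors.New("client ID, username, and workspace tokens are mutually exclusive")
	}

	// Each credential pair must be given together or not at all.
	if (s.ClientID == "") != (s.ClientSecret == "") {
		return errors.New("client ID and client secret must be set together")
	}
	if (s.Username == "") != (s.Password == "") {
		return errors.New("username and password must be set together")
	}

	// Tokens need workspaces, and both lists must have the same length.
	if len(s.Tokens) > 0 && len(s.Workspaces) != len(s.Tokens) {
		return fmt.Errorf(
			"received %d workspaces and %d tokens, expected equal counts",
			len(s.Workspaces), len(s.Tokens),
		)
	}
	return nil
}

func main() {
	cases := []connectorSettings{
		{AccountID: "1"},
		{AccountID: "1", Username: "u", Password: "p"},
		{AccountID: "1", Workspaces: []string{"a", "b"}, Tokens: []string{"t"}},
	}
	for _, c := range cases {
		fmt.Printf("%+v -> %v\n", c, checkSettings(c))
	}
}
```

Declaring these rules as relationships, as the PR does, keeps them next to the field definitions and lets the SDK report them uniformly. The hand-written version only makes the combined logic easier to follow.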
111 changes: 111 additions & 0 deletions cmd/baton-databricks/config_test.go
@@ -0,0 +1,111 @@
package main

import (
"context"
"testing"

"github.com/conductorone/baton-sdk/pkg/field"
"github.com/conductorone/baton-sdk/pkg/test"
"github.com/conductorone/baton-sdk/pkg/ustrings"
"github.com/spf13/viper"
)

func TestConfigs(t *testing.T) {
ctx := context.Background()

testCases := []test.TestCaseFromExpression{
{
"--account-id 1",
false,
"missing auth method",
},
{
"--username 1 --password 1",
false,
"missing account-id",
},
{
"--account-id 1 --username 1 --password 1",
true,
"username + password",
},
{
"--account-id 1 --username 1",
false,
"missing password",
},
{
"--account-id 1 --databricks-client-id 1 --databricks-client-secret 1",
true,
"client id + secret",
},
{
"--account-id 1 --databricks-client-id 1",
false,
"missing client secret",
},
{
"--account-id 1 --workspaces 1",
false,
"missing auth method, but has workspaces",
},
{
"--account-id 1 --workspaces 1 --username 1 --password 1",
true,
"workspaces + username + password",
},
{
"--account-id 1 --workspaces 1 --workspace-tokens 1",
true,
"auth tokens",
},
{
"--account-id 1 --workspace-tokens 1",
false,
"missing workspaces",
},
}

configurations := field.NewConfiguration(
configurationFields,
fieldRelationships...,
)

extraValidationFunction := func(configs *viper.Viper) error {
return validateConfig(ctx, configs)
}

test.ExerciseTestCasesFromExpressions(
t,
configurations,
extraValidationFunction,
ustrings.ParseFlags,
testCases,
)

t.Run("should validate token list lengths match", func(t *testing.T) {
v := viper.New()
v.Set("account-id", "1")
v.Set("workspaces", []string{"1", "2"})

f := func() error {
err := field.Validate(configurations, v)
if err != nil {
return err
}
return extraValidationFunction(v)
}

t.Run("should fail", func(t *testing.T) {
v.Set("workspace-tokens", []string{"3"})

test.AssertValidation(t, f, false)
})

t.Run("should succeed", func(t *testing.T) {
v.Set("workspace-tokens", []string{"3", "4"})

test.AssertValidation(t, f, true)
})
})
}
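
The test cases above are written as flag expressions such as `--account-id 1 --workspaces 1`. As a rough illustration of how such an expression could be turned into named values, here is a minimal sketch. The function `parseFlagExpression` is hypothetical and is not the SDK's `ustrings.ParseFlags`, whose actual behavior is not shown in this diff. The sketch assumes every flag takes exactly one value and that list values are comma-separated.

```go
package main

import (
	"fmt"
	"strings"
)

// parseFlagExpression is a hypothetical helper that splits an expression like
// "--account-id 1 --workspaces a,b" into flag names and their values.
// Assumption: every flag is followed by exactly one whitespace-free value.
func parseFlagExpression(expr string) (map[string][]string, error) {
	out := map[string][]string{}
	tokens := strings.Fields(expr)
	for i := 0; i < len(tokens); i++ {
		if !strings.HasPrefix(tokens[i], "--") || len(tokens[i]) == 2 {
			return nil, fmt.Errorf("expected a --flag, got %q", tokens[i])
		}
		name := strings.TrimPrefix(tokens[i], "--")

		// The next token must exist and must be a value, not another flag.
		if i+1 >= len(tokens) || strings.HasPrefix(tokens[i+1], "--") {
			return nil, fmt.Errorf("flag --%s has no value", name)
		}
		i++

		// Comma-separated values become a list, mirroring string-slice flags.
		out[name] = append(out[name], strings.Split(tokens[i], ",")...)
	}
	return out, nil
}

func main() {
	for _, expr := range []string{
		"--account-id 1 --workspaces a,b --workspace-tokens x,y",
		"--account-id 1 --username",
	} {
		values, err := parseFlagExpression(expr)
		fmt.Printf("%q -> %v (err: %v)\n", expr, values, err)
	}
}
```

Keeping test inputs as flag expressions makes each case read like a real command line, which is why a small parsing step sits between the table and the validation call.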