Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of DBSCAN model #75

Open
wants to merge 9 commits into
base: develop
Choose a base branch
from

Conversation

Kelang-Tian
Copy link

Implementation of the clustering model.
The test process will be submitted next time.

iris_target = iris.target


def check_model_exist(path):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this function is not used?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the function.

@@ -6,7 +6,7 @@ setup: ## Setup virtual environment for local development
&& $(MAKE) install-requirements

install-requirements:
pip install -U -e .
pip install --default-timeout=1000 -U -e .
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why increasing the timeout threshold ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Timeout occurred while installing the packages because of network instability.

import pandas as pd
from sklearn import datasets, metrics

def optimizer():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that it's not a NN model, should we remove this function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, there is no need to set optimizer.


return self

def _split_dataset(self, dataset):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need this function?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace this function with _read_Dataset_data, which split tf.dataset type data into features and labels(If it exists).

cluster_labels[self.clusters[i][j]] = i
return cluster_labels

def fit(self, X):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of overriding fit function, we can implement sqlflow_train_loop to define the custom train loop logic. You may find that sqlflow_train_loop is not the standard API of Keras model, it's just let SQLFlow runtime to know to run the custom train loop.

c.f. https://github.com/sql-machine-learning/sqlflow/blob/9628487783878e9f019b1bf379e192134344e3a3/python/sqlflow_submitter/tensorflow/train_keras.py#L173

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enables sqlflow_train_loop function to handle tf.dataset type of data, because sqlflow server invokes tf.dataset.
https://github.com/sql-machine-learning/sqlflow/blob/483b8676cf93f373d5073d84b0bee311bb122012/python/runtime/tensorflow/input_fn.py#L72

purity_score(y_df, self.labels_))
'''
if __name__ == '__main__':
from sklearn.datasets.samples_generator import make_blobs
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the comment code and testing this model in the test_db_scan.py? A referenced test script: https://github.com/sql-machine-learning/models/blob/develop/tests/test_arima_with_stl_decomposition.py

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment code removed and rename the test_db_scan.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants