-
Notifications
You must be signed in to change notification settings - Fork 17
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of DBSCAN model #75
base: develop
Are you sure you want to change the base?
Conversation
tests/test_dbscan.py
Outdated
iris_target = iris.target | ||
|
||
|
||
def check_model_exist(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that this function is not used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the function.
@@ -6,7 +6,7 @@ setup: ## Setup virtual environment for local development | |||
&& $(MAKE) install-requirements | |||
|
|||
install-requirements: | |||
pip install -U -e . | |||
pip install --default-timeout=1000 -U -e . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why increasing the timeout threshold ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timeout occurred while installing the packages because of network instability.
import pandas as pd | ||
from sklearn import datasets, metrics | ||
|
||
def optimizer(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that it's not a NN model, should we remove this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, there is no need to set optimizer.
sqlflow_models/dbscan.py
Outdated
|
||
return self | ||
|
||
def _split_dataset(self, dataset): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually need this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace this function with _read_Dataset_data, which split tf.dataset type data into features and labels(If it exists).
sqlflow_models/dbscan.py
Outdated
cluster_labels[self.clusters[i][j]] = i | ||
return cluster_labels | ||
|
||
def fit(self, X): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of overriding fit
function, we can implement sqlflow_train_loop
to define the custom train loop logic. You may find that sqlflow_train_loop
is not the standard API of Keras model, it's just let SQLFlow runtime
to know to run the custom train loop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enables sqlflow_train_loop
function to handle tf.dataset type of data, because sqlflow
server invokes tf.dataset.
https://github.com/sql-machine-learning/sqlflow/blob/483b8676cf93f373d5073d84b0bee311bb122012/python/runtime/tensorflow/input_fn.py#L72
sqlflow_models/dbscan.py
Outdated
purity_score(y_df, self.labels_)) | ||
''' | ||
if __name__ == '__main__': | ||
from sklearn.datasets.samples_generator import make_blobs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the comment code and testing this model in the test_db_scan.py
? A referenced test script: https://github.com/sql-machine-learning/models/blob/develop/tests/test_arima_with_stl_decomposition.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment code removed and rename the test_db_scan.py.
Implementation of the clustering model.
The test process will be submitted next time.