Stanford University CS229 Spring 2022 Project
Contributors: Daniel Huang, Ruth-Ann Armstrong, Radhika Kapoor
To run the code as-is, it is highly recommended to create the environment from `environment.yml` using `conda env create -f environment.yml`.
Two classes for neural networks are implemented: `n_layer_neural_network()` and `two_layer_neural_network()`. The n-layer neural network is composed of a user-specified number of layers with per-layer activation functions (`util.py` contains ReLU and Sigmoid activations with their corresponding derivative functions). More specific documentation can be found in the docstring of each method. An example of creating, training, and evaluating an n-layer neural network is shown below:
```python
from neural_network import n_layer_neural_network
import util

# Specify model sizes
n_features = 100
n_hidden = 10
n_layers = 5
n_classes = 3
nn = n_layer_neural_network(n_features, n_hidden, n_layers, n_classes,
                            [util.sigmoid] * n_layers, [util.dsigmoid] * n_layers)

# Gather data (the load_* helpers stand in for your own data-loading code)
train_data, train_labels = load_train_set()
dev_data, dev_labels = load_dev_set()
test_data, test_labels = load_test_set()

# Specify model training parameters
reg = 0
lr = 0.1
epochs = 50
batch_size = 10

# Fit model
cost_train, accuracy_train, cost_dev, accuracy_dev = nn.fit(
    train_data, train_labels, batch_size=batch_size, num_epochs=epochs,
    dev_data=dev_data, dev_labels=dev_labels, learning_rate=lr, reg=reg,
    print_epochs=True)

# Evaluate model
pred_test = nn.predict_one_hot(test_data)  # predict() for raw output probabilities

# Confusion matrix on the test set
print((pred_test.T @ test_labels).astype(int))
```
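Since `predict_one_hot()` returns one-hot predictions and `test_labels` is one-hot as well (both of shape `n_examples × n_classes`), the product `pred_test.T @ test_labels` is an `n_classes × n_classes` matrix whose `(i, j)` entry counts test examples predicted as class `i` with true class `j`, i.e., a confusion matrix.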
The 2-layer neural network is simply an n-layer neural network with a single hidden layer and a sigmoid activation function. It can be created similarly to an n-layer neural network, with fewer required parameters, as the sketch below illustrates.
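As a minimal sketch (the exact `two_layer_neural_network()` constructor signature is an assumption here; consult its docstring), reusing the sizes from the example above:

```python
from neural_network import two_layer_neural_network

# Hypothetical call: arguments mirror the n-layer example, minus the layer
# count and the activation lists (a sigmoid hidden layer is built in).
nn2 = two_layer_neural_network(n_features, n_hidden, n_classes)
```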
The Naive Bayes model is implemented in the class `naive_bayes_model()`. Since it derives from the general model class `util.classification_model`, the workflow is extremely similar to that shown above for the n-layer neural network, with fewer necessary parameters. See the docstring documentation for more specific information.
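For instance, a hypothetical end-to-end run, assuming the class is importable from a `naive_bayes` module and needs no constructor arguments (both unverified assumptions; see the docstrings):

```python
from naive_bayes import naive_bayes_model  # module name is an assumption

nb = naive_bayes_model()                    # constructor arguments assumed absent
nb.fit(train_data, train_labels)            # shared classification_model workflow
pred_test = nb.predict_one_hot(test_data)
print((pred_test.T @ test_labels).astype(int))  # confusion matrix, as above
```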
The K-means model was imported from `sklearn.cluster.KMeans` with no additional tweaks.
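Since this is the stock scikit-learn estimator, the usual `fit`/`predict` workflow applies; the choice of `n_clusters` below is illustrative:

```python
from sklearn.cluster import KMeans

# Cluster training examples into one group per reading level.
kmeans = KMeans(n_clusters=n_classes, random_state=0)
kmeans.fit(train_data)
cluster_ids = kmeans.predict(test_data)  # nearest-centroid index per test example
```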
- Isolate and refine hyperparameter search
- Create plots for k-means
- Create plots for hyperparameter search
- Clean up k-means
- Unify `util.load_dataset` API with more dataset filter options
  - Group by books (much less data, but more descriptive)
- Append other features into the feature list
  - Total number of words in the book
  - Average length of sentences
  - Unique words
  - Sentence repetition?
- Encode the chunks of data using an NLP vectorizer?
- Implement n-layer model
- Develop code to auto-test multiple hyperparameter configurations
- Complete neural network class (Daniel)
  - `fit()`
  - `forward_prop()`
  - `backward_prop()`
  - `predict()`
- Write basic neural network test
- Complete naive bayes implementation in a class
- Process dataset
  - Create class for each book containing attributes:
    - Title (str)
    - ISBN (int64)
    - Level (int) (0: A, 1: B, etc.)
    - Words (list of separated words stripped of ending punctuation)
    - Other features TBD
  - Create word-to-index mapping of entire dataset (must have all of the relevant words from all batches)
  - Save into a `.csv` file so it can be loaded more easily
- Develop k-means model
- Import other language models?