domain-adaptation-nlp

Dataset

Our Amazon reviews dataset (Blitzer et al., 2007) can be downloaded here. Place the downloaded file in a folder named "data/amazon_reviews".

The file contains 2000 samples of the four categories in the Amazon reviews data:

  • Books
  • Electronics
  • Home and Kitchen (Kitchen)
  • Movies and TV (DVDs)

We chose these categories because they are frequently used in NLP papers on domain adaptation for sentiment analysis.

You can load the data (for example, the Amazon data) with the following code, although every function you need to run should already include this step.

import pickle

# Path is relative to the directory the code is run from.
with open("../data/amazon_reviews/amazon_4.pickle", "rb") as fr:
    all_data = pickle.load(fr)

Each element of the Amazon data, and of the movie data, has the following structure:

  • [0] BERT embeddings ([CLS] token)
  • [1] label y (0 means negative and 1 means positive)
  • [2] domain name
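As a quick sanity check on this structure, the sketch below counts positive and negative labels per domain. It assumes `all_data` is an iterable of three-item sequences laid out as described above; the function name `summarize_by_domain`, the toy data, and the domain strings "books" and "kitchen" are hypothetical stand-ins, not values taken from the pickle file.

```python
from collections import defaultdict


def summarize_by_domain(all_data):
    """Count total, positive, and negative samples for each domain name."""
    summary = defaultdict(lambda: {"total": 0, "positive": 0, "negative": 0})
    for element in all_data:
        # Assumed layout: [0] embedding, [1] label (0 or 1), [2] domain name.
        label, domain = element[1], element[2]
        stats = summary[domain]
        stats["total"] += 1
        if int(label) == 1:
            stats["positive"] += 1
        else:
            stats["negative"] += 1
    return dict(summary)


# Tiny synthetic stand-in for the real data (values are made up).
toy_data = [
    ([0.1, 0.2], 1, "books"),
    ([0.3, 0.4], 0, "books"),
    ([0.5, 0.6], 1, "kitchen"),
]

if __name__ == "__main__":
    for domain, stats in summarize_by_domain(toy_data).items():
        print(domain, stats)
```

Passing the loaded `all_data` instead of `toy_data` should give the per-domain counts, provided the file follows the layout listed above.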

Instructions to run

Balanced Conf Model and Few Labels Models

  • Create an output folder under this root directory if it does not exist.
  • Run src/sentiment_classification_amazon.py from the root directory.
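The first step can also be done from Python. This is a minimal sketch that assumes the folder is literally named "output"; check the script for the path it actually writes to. The helper name `ensure_output_dir` is hypothetical.

```python
from pathlib import Path


def ensure_output_dir(root=".", name="output"):
    """Create <root>/<name> if it is missing and return its path."""
    out_dir = Path(root) / name
    # exist_ok=True makes the call a no-op when the folder already exists.
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir


if __name__ == "__main__":
    # Run from the repository root, before launching the script.
    print(ensure_output_dir("."))
```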

Householder Transformation

  • Adjust n_fold, the number of folds you want to use (default: 1000).
  • Run src/domain_space_alignment.py from the root directory.
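For readers unfamiliar with the term, a Householder transformation is the reflection H = I - 2vvᵀ/(vᵀv). The sketch below is not taken from src/domain_space_alignment.py and makes no claim about how that script uses the transformation; it only illustrates the operation in plain Python. The function names `householder_reflect` and `reflector_between` are hypothetical.

```python
import math


def householder_reflect(x, v):
    """Apply H = I - 2 v v^T / (v^T v) to x without building the matrix H."""
    vv = sum(vi * vi for vi in v)
    if vv == 0.0:
        # A zero reflector defines no reflection; return x unchanged.
        return list(x)
    scale = 2.0 * sum(vi * xi for vi, xi in zip(v, x)) / vv
    return [xi - scale * vi for xi, vi in zip(x, v)]


def reflector_between(a, b):
    """Return v = a - b; if a and b have equal norm, reflecting a with v gives b."""
    return [ai - bi for ai, bi in zip(a, b)]


a = [1.0, 0.0]
b = [0.0, 1.0]
v = reflector_between(a, b)
mapped = householder_reflect(a, v)

if __name__ == "__main__":
    print(mapped)
    print(math.hypot(*householder_reflect([3.0, 4.0], v)))
```

Because a reflection is orthogonal, it preserves vector norms, which is why it is a common building block for aligning one vector space with another.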