Our Amazon dataset (Blitzer et al., 2007) can be downloaded here. Put this file in a folder called "data/amazon_reviews".
This data contains 2000 samples of the four categories in the Amazon reviews data:
- Books
- Electronics
- Home and Kitchen (Kitchen)
- Movies and TV (DVDs)
We chose these categories because they are frequently used in NLP papers on domain adaptation for sentiment analysis.
You can open the data (for example, the Amazon data) with the following code, although this step should already be included in any function you need to run.
import pickle
# Load the pickled Amazon data; the path is relative to the directory you run from.
with open("../data/amazon_reviews/amazon_4.pickle", "rb") as fr:
    all_data = pickle.load(fr)
Each element of the Amazon data, and of the movie data, has the following structure:
- [0] BERT embeddings ([CLS] layer)
- [1] y labels (0 means negative and 1 means positive)
- [2] domain name
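As a rough illustration of the three-position layout above, the sketch below groups elements by their domain name. It relies only on the index positions listed above. The function name `group_by_domain`, the toy data, and the domain strings "books" and "kitchen" are hypothetical stand-ins, not values taken from the real pickle file.

```python
from collections import defaultdict


def group_by_domain(all_data):
    """Group elements by the domain name stored at index 2.

    Assumes every element is indexable and follows the layout
    [0] embeddings, [1] labels, [2] domain name.
    """
    grouped = defaultdict(list)
    for element in all_data:
        embeddings, labels, domain = element[0], element[1], element[2]
        grouped[domain].append((embeddings, labels))
    return dict(grouped)


# Tiny synthetic stand-in with the same three-position layout.
# The real embeddings and domain names may look different.
toy_data = [
    ([0.1, 0.2], 1, "books"),
    ([0.3, 0.4], 0, "books"),
    ([0.5, 0.6], 1, "kitchen"),
]

grouped = group_by_domain(toy_data)
print(sorted(grouped))
```

With the real data, you would pass the object returned by `pickle.load` instead of `toy_data`.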
- Create an output folder under this root directory if it does not exist.
- Run src/sentiment_classification_amazon.py from the root directory.
- Adjust the value of n_fold you want to use (default: 1000).
- Run src/domain_space_alignment.py from the root directory.
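The first step above (creating the output folder only if it is missing) can be sketched as follows. The folder name "output" and the helper name `ensure_output_dir` are assumptions made for illustration; check the scripts for the exact path they write to.

```python
from pathlib import Path


def ensure_output_dir(root=".", name="output"):
    """Create <root>/<name> if it does not exist and return its path.

    The default name "output" is an assumption, not confirmed by the scripts.
    """
    out = Path(root) / name
    # exist_ok=True makes the call a no-op when the folder is already there.
    out.mkdir(parents=True, exist_ok=True)
    return out


if __name__ == "__main__":
    # Run from the repository root so the folder lands next to src/.
    print(ensure_output_dir())
```

Once the folder exists, the two scripts can be run from the root directory as described in the steps above.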