----------
| ANALYSIS |
----------
-------------------------------------------------------------------------------------------
I used the following file from an S3 bucket for this project, an analysis of Amazon video game reviews:
https://s3.amazonaws.com/amazon-reviews-pds/tsv/amazon_reviews_us_Video_Games_v1_00.tsv.gz
-------------------------------------------------------------------------------------------------
Executive Overview
-------------------------------------------------------------------------------------------------
Based on my findings, it is not surprising that non-Vine reviews account for most of the Total
Votes and Helpful Votes: the cleaned data set holds 1,781,596 non-Vine reviews compared to just
4,290 Vine reviews for video games. Even so, the two groups give nearly identical average
ratings of 4.07 (Vine) and 4.06 (non-Vine).

The final table in the appendix shows that the number of Helpful Votes per review is slightly
higher for Vine reviews (2.35) than for non-Vine reviews (2.26), although Total Votes per review
are lower (3.28 versus 3.76).

Non-Vine reviews also hold the lion's share of five-star ratings (1,025,249) and the higher
proportion of five-star ratings (0.58), compared to Vine reviews (1,607 and 0.37).

Vine reviews were, however, less likely to fall on the lower end of the spectrum. The proportion
of one-star ratings was 0.01 for Vine reviews compared to 0.11 for non-Vine reviews.
------------------------------------------------------------------------------------------------
APPENDIX
------------------------------------------------------------------------------------------------
After cleaning the data set, there were 1,785,886 distinct rows of information.
These were some of my findings:
Paid = Vine reviews (vine = Y), Unpaid = non-Vine reviews (vine = N).

Table 1                   Table 2                   Table 3
----------------------    ----------------------    ----------------------
Review Count              Total Votes               Average Rating
----------------------    ----------------------    ----------------------
Paid             4,290    Paid            14,064    Paid              4.07
Unpaid       1,781,596    Unpaid       6,696,252    Unpaid            4.06

Table 4                   Table 5                   Table 6
----------------------    ----------------------    ----------------------
Helpful Votes             5-Star Rating             1-Star Rating
----------------------    ----------------------    ----------------------
Paid            10,076    Paid             1,607    Paid                60
Unpaid       4,024,920    Unpaid       1,025,249    Unpaid         192,088
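
As a quick sanity check on Table 1, the paid and unpaid counts should add up to the 1,785,886
distinct rows reported after cleaning. The following is a minimal sketch in plain Python, with
the figures copied by hand from the tables. The variable names are illustrative, and it assumes
that "Paid" corresponds to vine = Y, which matches the counts in the final table. It is not the
code used to produce the analysis.

```python
# Review counts copied from Table 1 (assumption: Paid = vine Y, Unpaid = vine N).
paid_reviews = 4_290
unpaid_reviews = 1_781_596
distinct_rows = 1_785_886  # row count reported after cleaning

# The two groups should account for every cleaned row.
total_reviews = paid_reviews + unpaid_reviews
rows_match = total_reviews == distinct_rows

# Share of all reviews that come from the paid (vine) group, as a percentage.
paid_share_pct = round(100 * paid_reviews / total_reviews, 2)

print(f"rows match: {rows_match}, paid share: {paid_share_pct}%")
```

The check shows how small the paid group is relative to the whole data set, which is why the
per-review ratios are more informative than the raw totals.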
-------------
Final Table
-------------
+----+-------+----------------+---------+----------------+------------------+-----------------+---------------+-------------------+--------------+------------------+
|vine|  count|sum(total_votes)|votes_per|avg(star_rating)|sum(helpful_votes)|helpful_votes_per|five_star_votes|five_star_votes_per|one_star_votes|one_star_votes_per|
+----+-------+----------------+---------+----------------+------------------+-----------------+---------------+-------------------+--------------+------------------+
|   Y|   4290|           14064|     3.28|            4.07|             10076|             2.35|           1607|               0.37|            60|              0.01|
|   N|1781596|         6696252|     3.76|            4.06|           4024920|             2.26|        1025249|               0.58|        192088|              0.11|
+----+-------+----------------+---------+----------------+------------------+-----------------+---------------+-------------------+--------------+------------------+
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
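
The "_per" columns in the final table appear to be each summed column divided by the review
count for that group, rounded to two decimal places. The sketch below reproduces them in plain
Python from the counts in the table. The dictionary layout and the per_review helper are
hypothetical names chosen for illustration, not the code behind the original table.

```python
# Counts copied from the final table (Y = vine, N = non-vine).
COUNTS = {
    "Y": {"count": 4290, "total_votes": 14064, "helpful_votes": 10076,
          "five_star_votes": 1607, "one_star_votes": 60},
    "N": {"count": 1781596, "total_votes": 6696252, "helpful_votes": 4024920,
          "five_star_votes": 1025249, "one_star_votes": 192088},
}


def per_review(group):
    """Divide every summed column by the group's review count (2 decimals)."""
    n = group["count"]
    return {name: round(value / n, 2)
            for name, value in group.items() if name != "count"}


ratios = {vine: per_review(group) for vine, group in COUNTS.items()}

for vine, values in ratios.items():
    print(vine, values)
```

Dividing by the group size is what makes the two groups comparable: the raw totals are
dominated by the non-vine group simply because it contains far more reviews.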