%!TEX root=paper.tex
\section{Discussion}
\label{sec:discussion}
% \todo{use this subsection to discuss limitations (from the previous evaluation section) and add the need to be able to handle the horizontal scaling of the application; remove sectioning and summarize briefly limitations that have already been discussed in the VISSOFT paper/move material to other sections; consider renaming as limitations or something similar}
The main goal of the \tool design was to allow analytics to be collected and insight to be gained by making the smallest possible changes to a running API.
%In this paper we present the integration with APIs written in Python and Flask hoping that this will not prevent the reader from seeing the more general idea; all the tools we show here for Flask can be applied to other API technologies (e.g. Django) by simply providing a few back-end adapters in the right places.
The presented approach is centered around an API written in Python and Flask but is, in principle, applicable to other technologies. We chose Python and Flask because they are among the most popular web development technologies at the moment.
% we hope that this will not prevent the reader from seeing the more general idea. All
However, all the interactive perspectives we developed can equally be applied to other service technologies (e.g., Django) by simply providing a few back-end adapters in the right places (e.g., endpoint discovery, the deployment of the \code{/dashboard} endpoint, etc.).
As such, our approach is more generic than the technologies that it relies upon for its implementation.
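To make the adapter idea concrete, the following sketch shows what such an interface might look like. The \code{FrameworkAdapter} name and its methods are illustrative assumptions, not part of the current implementation; only \code{app.view\_functions} is an actual Flask registry.
\begin{verbatim}
# Sketch of a hypothetical framework-adapter interface; the names
# are illustrative and not part of the current implementation.
from abc import ABC, abstractmethod

class FrameworkAdapter(ABC):
    @abstractmethod
    def discover_endpoints(self):
        """Return the names of the endpoints exposed by the app."""

    @abstractmethod
    def wrap_endpoint(self, name, hook):
        """Install a measurement hook around the given endpoint."""

class FlaskAdapter(FrameworkAdapter):
    def __init__(self, app):
        self.app = app

    def discover_endpoints(self):
        # Flask keeps its endpoint registry in app.view_functions.
        return list(self.app.view_functions)

    def wrap_endpoint(self, name, hook):
        # hook is assumed to be a context-manager factory that
        # times the wrapped call.
        original = self.app.view_functions[name]
        def wrapped(*args, **kwargs):
            with hook(name):
                return original(*args, **kwargs)
        self.app.view_functions[name] = wrapped
\end{verbatim}
Supporting a new framework would then amount to providing a sibling of \code{FlaskAdapter} with the same two methods.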
At the same time, the presented approach comes with several limitations. Some of these might also represent challenges for future builders of similar tools.
\subsection*{The Dashboard Overhead}
We have provided a mechanism for measuring performance and showed a glimpse of the overhead imposed by the first version of the Dashboard. For our case study the overhead is acceptable, but for performance-critical applications it might be too large.
One downside of the current implementation is that the dashboard runs in the same process as the main application, where Python's global interpreter lock prevents true thread-level parallelism. Performance could be greatly improved if the dashboard instead ran in a separate process, with a message queue decoupling the monitoring itself from the persistence of measurements to the dashboard DB.
However, that would complicate the installation and configuration of the \tool.
%, and for this kind of architecture, there are also other alternatives.
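A rough sketch of the decoupling we have in mind, using only the standard library (the helper names are illustrative):
\begin{verbatim}
# Sketch: move persistence into a separate process so that writing
# to the dashboard DB no longer competes with the application.
import multiprocessing as mp

def save_to_dashboard_db(measurement):
    ...  # stand-in for the actual write to the dashboard DB

def persistence_worker(queue):
    while True:
        measurement = queue.get()
        if measurement is None:   # sentinel: shut down the worker
            break
        save_to_dashboard_db(measurement)

if __name__ == "__main__":
    queue = mp.Queue()
    mp.Process(target=persistence_worker, args=(queue,)).start()
    # The monitoring hook in the application only enqueues:
    queue.put({"endpoint": "/users", "duration": 0.042})
    queue.put(None)
\end{verbatim}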
\subsection*{Limitations of the Measurements}
One limitation of the benchmarking we used in Section~\ref{sec:overhead} is that it does not use a realistic workload mix, but rather measures individual endpoints in isolation. Multiple concurrent and diverse requests might result in a different performance impact. In fact, based on the data in the dashboard, one could construct a statistically relevant workload mix; at this point, this constitutes future work.
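For instance, a statistically grounded mix could be obtained by sampling endpoints in proportion to the request frequencies recorded by the dashboard, roughly as follows (a sketch; \code{request\_counts} stands for data exported from the dashboard DB):
\begin{verbatim}
# Sketch: derive a workload mix from observed request frequencies.
import random

request_counts = {"/users": 5400, "/search": 2100, "/admin": 12}
endpoints = list(request_counts)
weights = [request_counts[e] for e in endpoints]

def next_endpoint():
    # Draw the next endpoint to exercise, proportionally to the
    # traffic observed in production.
    return random.choices(endpoints, weights=weights, k=1)[0]
\end{verbatim}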
% \subsection*{Limitation of Visualizations }
% The visualizations for the user experiene perspectives as presented in Section \ref{sec:user} have been tested with several hundred users (of which about \activeUserCount were active during the course of the study), but the scalability of the visualizations must be further investigated for web services with tens of thousands of users.
\subsection*{Lack of Flexibility in Allowing Multiple Groupings}
To be agile, a tool has to be easily adaptable to different usage scenarios. However, one of the current limitations of \tool is that one can specify only a single type of grouping. This is insufficient for cases where multiple groupings are desirable. In the \zee case study this need emerged in a context where it would be useful to also group requests by the client application that sends them. This feature is currently under development.
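As a purely hypothetical sketch, a future configuration could accept a list of grouping functions and tag every request with each of them (the API shown is illustrative, not the current one):
\begin{verbatim}
# Hypothetical configuration sketch; not the current API.
# current_user_id and the X-Client header are also illustrative.
dashboard.config.group_by = [
    lambda request: ("user", current_user_id(request)),
    lambda request: ("client_app", request.headers.get("X-Client")),
]
\end{verbatim}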
\subsection*{Limitations of Evolution Monitoring}
The advantage of the presented approach to evolution monitoring, based on observing the \code{.git} folder, is that it requires minimal configuration effort, as discussed in the presentation of the \tool. The disadvantage is that it puts the smallest of commits (even one that only modifies a comment) and the shortest-lived of commits (e.g.~one that was deployed for only half an hour before a bug-fix version replaced it) on equal ground with major and minor releases of the software.
A mechanism to control which versions are relevant for monitoring purposes therefore needs to be added to the \tool.
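For reference, the version detection that this mechanism relies on boils down to resolving the current commit from the repository metadata, along the following lines (a simplified sketch, not necessarily the exact implementation; packed refs are ignored for brevity):
\begin{verbatim}
# Simplified sketch: detect the deployed version from .git.
import os

def current_commit(repo_root):
    head = open(os.path.join(repo_root, ".git", "HEAD")).read().strip()
    if head.startswith("ref:"):
        # HEAD names a branch; resolve the ref to a commit hash.
        ref = head.split()[1]
        return open(os.path.join(repo_root, ".git", ref)).read().strip()
    return head  # detached HEAD already contains the hash
\end{verbatim}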
\subsection*{Endpoint Provenance}
The \tool cannot detect endpoint renames, and thus history is normally lost when an endpoint is renamed. The problem can be approached similarly to the {\em provenance problem} in source code evolution analysis~\cite{Davi11a} by inferring that a newly introduced endpoint with a similar pattern of request traffic might be the same as one that is no longer used. Since this would still be a statistical approach, the integration in the tool would have to ask the user for confirmation.
However, this would work better in a system where endpoint tracking is activated by default for all endpoints; otherwise, the user would have to remember to always enable tracking for each new endpoint.
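A possible starting point for such an inference, assuming per-endpoint request-count time series exported from the dashboard (a sketch; the similarity threshold is arbitrary):
\begin{verbatim}
# Sketch: flag a possible rename when a disappeared endpoint's
# traffic profile closely matches a newly introduced one.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def rename_candidates(old_series, new_series, threshold=0.95):
    # old_series/new_series: {endpoint: [hourly request counts]}
    for old, a in old_series.items():
        for new, b in new_series.items():
            if cosine_similarity(a, b) >= threshold:
                yield old, new  # to be confirmed by the user
\end{verbatim}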
% \subsection*{Monitoring Failures}
% A Special Type of Outlier: Exceptions. TODO: probably something for the journal extension ...statistics about which endpoints fail most often might be useful too.
% same information as the outlier maybe?
% - tracking other types of information: exceptions, etc. especially since logs usually get lost... nobody has time to look in them...
\subsection*{Performance Prediction}
In Section~\ref{sec:testing} we provided evidence for our assumption that integration testing can be used for performance triangulation across service versions. The idea is to use a synthetic load based on unit tests to drive endpoints before they are deployed in production and to observe whether performance improves or deteriorates from version to version. The next logical step is to use this information for performance prediction. A full-blown performance prediction feature would require advanced statistical work, but even a simple trend analysis can serve as an early-warning system for the overall performance trend of a given set of endpoints. As discussed in Section~\ref{sec:testing}, however, further evaluation of this feature is currently ongoing.
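As a first approximation, such a warning system could fit a linear trend over the per-version latency medians of each endpoint and flag those with a positive slope (a sketch; the data shape is an assumption):
\begin{verbatim}
# Sketch: flag endpoints whose median latency trends upward
# across versions (ordinary least-squares slope).
def latency_slope(medians):
    # medians: median latency per version, in deployment order
    n = len(medians)
    mean_x = (n - 1) / 2
    mean_y = sum(medians) / n
    num = sum((x - mean_x) * (y - mean_y)
              for x, y in enumerate(medians))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den if den else 0.0

def warn_if_deteriorating(per_endpoint_medians, tolerance=0.0):
    for endpoint, medians in per_endpoint_medians.items():
        if latency_slope(medians) > tolerance:
            print("warning:", endpoint, "latency is trending up")
\end{verbatim}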
\subsection*{Distributed Deployment}
Last but not least, the presented approach and case study assume that the monitored Flask application is deployed as a single node, i.e., without support for horizontal scaling by means of replication~\cite{vaquero2011dynamically}, or at least that the application owner is interested in monitoring each application deployment in isolation. This is a clear limitation of this work, which we are already working to address by allowing multiple application instances to be registered with the same dashboard, essentially providing a ``meta-dashboard'' through which application endpoint performance and utilization can be monitored and visualized in a unified way. Because this adds a dimension orthogonal to the service evolution dimension that the \tool currently supports, it requires a significant re-engineering effort on our part.
\subsection*{Need for Further Evidence of Usefulness and Ease of Adoption}
We designed the dashboard to be easy to adopt, and we think this is to a certain degree apparent. However, we have only presented a single use case in which the API developers used the dashboard. It has been successful: based on the dashboard, the developers became aware of the limitations of their service APIs and started improving them. However, we believe it will be worthwhile to organize a dedicated study to observe the challenges and opportunities of adopting such a dashboard across different applications. It will also be important to see whether the perspectives presented here are relevant for other developers, and which other perspectives are missing.
% - Supporting situations where the API is deployed across multiple containers for example.
% \ml{@V: Do you want to add this? Or can we skip it?}
% - threats to the validity of the constant-load performance testing. performance regressions?
%
%
% - time as measured in days and commits are two sides of the same coin.
%
% \ml{@V: do you want to give a first stab to this?}
% \subsection*{Advanced Outlier Analysis}
% - We need more advanced outlier analysis tools... Instead of manually searching in the page to build stats about Google vs. Microsoft
% - outlier detection and monitoring is important. manual exploration is critical. however, one needs a performant way of analyzing multiple logs. in the example we saw we had to use the search facility to investigate our hypothesis that Google Translate was slower than Microsoft Translate. Moreover, these trends might change too...
% - the endpoint where the unit-testing results can be uploaded is prone to being tampered with by Bob who would want to upload junk information to mess up with the dashboard. to solve this, a UUID can also be added that is generated by the server, and thus prevents anybody else but the rightful owner of the dashboard
% - user management. although not explained in the paper, the dashboard also comes with a user management system. and two classes of users. admins and guests.
% - detecting minor versions which don't need to be tracked, since they didn't touch the performance of the system. E.g. a modification of the README, etc.
% - discuss later one situation in which opt-out might be desirable.
% - branches. the system does not care. it links to whatever commit is the current one. it could be that visualizations could be developed to comapre branches, but this optiona. future work.
%