AnalyticsData, Shareable, DXO #391

Nintorac · 2022-03-30T23:34:01Z

Hi, could you explain the motivation behind the distinctions between these classes?

I would like to understand better why the former two need a data type, and how DXO fits into the picture.

I think DXO ties in quite closely with what I was talking about in the Data Layer discussion, but from my current understanding, DXO should be a little more abstract, e.g. you would subclass it to define conversion to/from bytes.

As for AnalyticsData and Shareable, I don't fully understand their utility. As far as I can see, they are quite thin wrappers that provide to_dxo functionality, but aren't they basically already a DXO?

I would have thought that each data type you have, e.g. weights, weight diffs, AnalyticsDataType.Image, etc., would just be a subclass of DXO. Each subclass would then specify how to safely convert its specific data type to bytes, as well as implement type-specific checks and balances.
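To make the subclass-per-data-type proposal concrete, here is a minimal sketch. TypedDXO and ModelWeights are hypothetical names for illustration, not NVFlare classes, and JSON stands in for whatever byte encoding a real implementation would choose.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List


class TypedDXO(ABC):
    """Hypothetical abstract base: each data kind subclasses it and defines its own byte encoding."""

    @abstractmethod
    def to_bytes(self) -> bytes:
        """Serialize this object's data for transport."""

    @classmethod
    @abstractmethod
    def from_bytes(cls, raw: bytes) -> "TypedDXO":
        """Rebuild an instance from the output of to_bytes()."""


class ModelWeights(TypedDXO):
    """Hypothetical subclass for model weights, stored as layer name -> list of floats."""

    def __init__(self, weights: Dict[str, List[float]]):
        # Type-specific check: every layer must contain only numbers.
        for name, values in weights.items():
            if not all(isinstance(v, (int, float)) for v in values):
                raise TypeError(f"non-numeric value in layer {name!r}")
        self.weights = weights

    def to_bytes(self) -> bytes:
        # JSON keeps the sketch dependency-free; a real implementation might prefer a binary format.
        return json.dumps(self.weights, sort_keys=True).encode("utf-8")

    @classmethod
    def from_bytes(cls, raw: bytes) -> "ModelWeights":
        # Decoding goes back through __init__, so the same checks run on received data.
        return cls(json.loads(raw.decode("utf-8")))
```

Because decoding goes through the constructor, a receiver cannot skip validation; whether that cost is acceptable for large models is a separate design question.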

Additional type-specific convenience methods could also be added to these classes, e.g. ModelWeights.todiff(model: ModelWeights) -> ModelDiff or something similar. These could handle filters and conversions without having to define separate classes for each that rely on specific implementations. For example, PercentilePrivacy currently assumes that you are using a Learnable, but this may not always be the case.
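A minimal sketch of the todiff idea, assuming weights are plain lists of floats keyed by layer name. ModelWeights, ModelDiff and zero_small_deltas are hypothetical; zero_small_deltas is a deliberately simple magnitude threshold, not NVFlare's PercentilePrivacy, included only to show a filter that depends on the ModelDiff type rather than on where the diff came from.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelDiff:
    """Hypothetical type: per-layer difference between two sets of weights."""
    deltas: Dict[str, List[float]]


@dataclass
class ModelWeights:
    """Hypothetical type: per-layer model weights."""
    weights: Dict[str, List[float]]

    def todiff(self, model: "ModelWeights") -> ModelDiff:
        # Type-specific check: both models must have the same layers and lengths.
        if self.weights.keys() != model.weights.keys():
            raise ValueError("layer names differ")
        deltas = {}
        for name, new_vals in self.weights.items():
            old_vals = model.weights[name]
            if len(new_vals) != len(old_vals):
                raise ValueError(f"length mismatch in layer {name!r}")
            # The diff is defined here as self - model (new weights minus base weights).
            deltas[name] = [n - o for n, o in zip(new_vals, old_vals)]
        return ModelDiff(deltas)


def zero_small_deltas(diff: ModelDiff, threshold: float) -> ModelDiff:
    """Hypothetical filter: zero out entries whose magnitude is below threshold."""
    return ModelDiff({
        name: [d if abs(d) >= threshold else 0.0 for d in vals]
        for name, vals in diff.deltas.items()
    })
```

The filter only needs a ModelDiff as input, so it works the same whether the diff came from a Learnable or from any other source.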

In this situation, I could conceivably see AnalyticsData or Shareable coming back as containers of DXOs that are themselves DXOs. These containers could then recursively call their sub-DXOs to combine multiple data packets into a single object to send across the wire. However, I am not sure about this, since IMO keeping things separate makes filtering data across network boundaries easier: you wouldn't need to load and unpack your bytes DXO to perform filtering.
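A rough sketch of the container idea and the trade-off it raises, assuming JSON-serializable payloads. LeafDXO, ContainerDXO and drop_kind are hypothetical names; drop_kind only looks at top-level entries and exists to show that filtering one item out of a combined packet means decoding the whole packet first.

```python
import json
from typing import Any, Dict, Union


class LeafDXO:
    """Hypothetical leaf: a data kind plus a JSON-serializable payload."""

    def __init__(self, kind: str, data: Any):
        self.kind = kind
        self.data = data

    def to_dict(self) -> Dict[str, Any]:
        return {"kind": self.kind, "data": self.data}


class ContainerDXO:
    """Hypothetical container that is itself a DXO and holds named sub-DXOs."""

    def __init__(self, items: Dict[str, Union["LeafDXO", "ContainerDXO"]]):
        self.items = items

    def to_dict(self) -> Dict[str, Any]:
        # Recursively ask each child (leaf or container) to describe itself.
        return {
            "kind": "COLLECTION",
            "data": {name: item.to_dict() for name, item in self.items.items()},
        }

    def to_bytes(self) -> bytes:
        # All sub-DXOs end up in one packet for the wire.
        return json.dumps(self.to_dict(), sort_keys=True).encode("utf-8")


def drop_kind(raw: bytes, kind: str) -> bytes:
    """Remove top-level entries of one kind; note the whole packet must be decoded first."""
    packet = json.loads(raw.decode("utf-8"))
    packet["data"] = {k: v for k, v in packet["data"].items() if v["kind"] != kind}
    return json.dumps(packet, sort_keys=True).encode("utf-8")
```

With separate, unbundled DXOs, a filter could skip an unwanted item by its kind without decoding anything else, which is the concern raised above.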

Anyway, that's just my thought process on this matter. What am I missing that motivated the enum/data type pattern?

yanchengnv · 2022-04-01T16:00:11Z

Maintainer

The simplest way to think of these:

Shareable is like a message that travels from one place to another (server/clients) across the network. As an analogy, think of an HTTP message: it has headers and non-header content. We didn't have DXO before, and people used Shareable to carry content in many different ways (some good, some not so good).
One of the biggest problems in the early days was that content in a Shareable was not self-sufficient or self-descriptive. For example, people just put a dict of numbers in the Shareable without any information about the nature of the data. You simply had to know what it was (model weights, gradients, or something else) and whether the data was clear text or encoded. This made it very hard, if not impossible, to write reusable code that handles the different cases properly. This motivated the creation of DXO, a structure intended to promote self-sufficient content. Essentially, it has a data portion and a meta portion that describes the data. As an analogy, think of JPEG data carried by an HTTP message. Conceptually, JPEG is a good example of a DXO: it has raw data and meta info about the data.
analytix.py is just a wrapper that provides convenience methods for creating and using DXOs for analytics-related content. Conceivably, you could create any number of such wrappers for special-purpose kinds of content.
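A toy sketch of how the three pieces relate in the HTTP/JPEG analogy: a message with headers carries one self-describing DXO, and a special-purpose wrapper builds and reads that DXO for analytics content. Every name here (ToyDXO, ToyShareable, AnalyticsType, AnalyticsWrapper, the reserved keys, the "topic" header) is hypothetical; this is not NVFlare's actual Shareable, DXO, or analytix.py API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, Optional


@dataclass
class ToyDXO:
    """Self-describing content (the "JPEG"): a data portion plus a meta portion describing it."""
    data_kind: str
    data: Dict[str, Any]
    meta: Dict[str, Any] = field(default_factory=dict)


class ToyShareable(dict):
    """Toy message (the "HTTP message"): a header section plus content, here a single DXO."""

    HEADERS = "__headers__"  # hypothetical reserved key for the header section
    CONTENT = "__dxo__"      # hypothetical reserved key for the carried DXO

    def set_header(self, key: str, value: Any) -> None:
        self.setdefault(self.HEADERS, {})[key] = value

    def get_header(self, key: str, default: Optional[Any] = None) -> Any:
        return self.get(self.HEADERS, {}).get(key, default)

    def set_dxo(self, dxo: ToyDXO) -> None:
        self[self.CONTENT] = dxo

    def get_dxo(self) -> ToyDXO:
        return self[self.CONTENT]


class AnalyticsType(str, Enum):
    """Hypothetical analytics sub-types; only analytics code needs to understand them."""
    SCALAR = "SCALAR"
    TEXT = "TEXT"


class AnalyticsWrapper:
    """Hypothetical convenience wrapper: creates and reads DXOs for analytics content."""

    DATA_KIND = "ANALYTIC"  # one generic DXO kind shared by all analytics content

    def __init__(self, tag: str, value: Any, data_type: AnalyticsType):
        self.tag = tag
        self.value = value
        self.data_type = data_type

    def to_dxo(self) -> ToyDXO:
        # The analytics-specific type goes into meta, so the DXO stays self-describing.
        return ToyDXO(self.DATA_KIND, {"tag": self.tag, "value": self.value},
                      {"data_type": self.data_type.value})

    @classmethod
    def from_dxo(cls, dxo: ToyDXO) -> "AnalyticsWrapper":
        # Reusable code can check the description before interpreting the data.
        if dxo.data_kind != cls.DATA_KIND:
            raise ValueError(f"expected {cls.DATA_KIND} content, got {dxo.data_kind}")
        return cls(dxo.data["tag"], dxo.data["value"], AnalyticsType(dxo.meta["data_type"]))
```

In this reading, the receiver never has to guess what the numbers are: it checks data_kind, then lets the wrapper interpret meta. Keeping the analytics enum inside the wrapper means new analytics sub-types would not require changes to the generic DXO kinds.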
