Hi, could you provide some motivation for the distinctions between these classes?
I'd like to understand better why the former two need a data type, and how `DXO` fits into the picture.
I think `DXO` ties in quite closely with what I was talking about in the Data Layer discussion, but from my current understanding I think `DXO` should be a little more abstract, e.g. something you subclass to define the conversion to and from bytes.
As for `AnalyticsData` and `Shareable`, I don't fully understand their utility. As far as I can see they are quite thin wrappers that provide `to_dxo` functionality, but aren't they basically a `DXO` already?
I would have thought that each data type you have, e.g. weights, weight diffs, `AnalyticsDataType.Image`, etc., would just be a subclass of `DXO`. Each subclass would then specify how to safely convert its specific data type to bytes, as well as implement type-specific checks and balances.
Additional type-specific convenience methods could also be added to these classes, e.g. `ModelWeights.todiff(model: ModelWeights) -> ModelDiff` or something similar. This could handle filters and conversions without having to define separate classes for each that rely on specific implementations; for example, `PercentilePrivacy` currently assumes that you are using a learnable, but this may not always be the case.
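A minimal sketch of the subclassing idea, under stated assumptions: `DXO` here is a hypothetical abstract base class, not the project's existing `DXO`; `ModelWeights`, `ModelDiff`, `to_bytes`/`from_bytes`, and `to_diff` (roughly the `todiff` suggested above) are illustrative names; weights are assumed to be plain per-layer float lists; and JSON stands in for whatever wire encoding would actually be used.

```python
import json
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


class DXO(ABC):
    """Hypothetical abstract DXO: every subclass defines its own byte encoding."""

    @abstractmethod
    def to_bytes(self) -> bytes:
        """Serialize this object for sending across the wire."""

    @classmethod
    @abstractmethod
    def from_bytes(cls, raw: bytes) -> "DXO":
        """Rebuild an object of this type, rejecting payloads of another kind."""


def _encode(kind: str, data: Dict[str, List[float]]) -> bytes:
    # JSON is an assumed stand-in for the real wire format.
    return json.dumps({"kind": kind, "data": data}).encode("utf-8")


def _decode(kind: str, raw: bytes) -> Dict[str, List[float]]:
    payload = json.loads(raw.decode("utf-8"))
    if payload.get("kind") != kind:
        raise ValueError(f"expected {kind!r}, got {payload.get('kind')!r}")
    return payload["data"]


@dataclass
class ModelDiff(DXO):
    diff: Dict[str, List[float]]  # per-layer differences

    def to_bytes(self) -> bytes:
        return _encode("weight_diff", self.diff)

    @classmethod
    def from_bytes(cls, raw: bytes) -> "ModelDiff":
        return cls(_decode("weight_diff", raw))


@dataclass
class ModelWeights(DXO):
    weights: Dict[str, List[float]]  # per-layer weights, flattened

    def to_bytes(self) -> bytes:
        return _encode("weights", self.weights)

    @classmethod
    def from_bytes(cls, raw: bytes) -> "ModelWeights":
        return cls(_decode("weights", raw))

    def to_diff(self, base: "ModelWeights") -> ModelDiff:
        # Type-specific checks: same layers, same number of values per layer.
        if self.weights.keys() != base.weights.keys():
            raise ValueError("layer names do not match")
        diff = {}
        for name, values in self.weights.items():
            ref = base.weights[name]
            if len(values) != len(ref):
                raise ValueError(f"size mismatch in layer {name!r}")
            diff[name] = [v - r for v, r in zip(values, ref)]
        return ModelDiff(diff)


old = ModelWeights({"fc": [1.0, 2.0]})
new = ModelWeights({"fc": [1.5, 1.0]})
delta = new.to_diff(old)
print(delta)  # ModelDiff(diff={'fc': [0.5, -1.0]})
```

Because each subclass owns its encoding and validation, a filter written against `ModelDiff` never needs to know how `ModelWeights` is laid out, and decoding a payload as the wrong kind fails loudly instead of silently.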
In this situation I could conceivably see `AnalyticsData` or `Shareable` coming back as a kind of container of `DXO`s that are themselves `DXO`s. These containers could then recursively call their sub-DXOs to combine multiple data packets into a single object to send across the wire. However, I'm not sure about this, since IMO keeping things separate makes filtering data across network boundaries easier: you wouldn't need to load and unpack your bytes DXO to perform filtering.
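The container idea can be sketched as a simple composite. This is hypothetical: `LeafDXO`, `ContainerDXO`, and `decode` are illustrative names, JSON is an assumed encoding, the kind `"collection"` is assumed to be reserved for containers, and leaf payloads are assumed to be JSON-serializable.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class LeafDXO:
    """Hypothetical leaf DXO: a kind label plus a JSON-serializable payload."""
    kind: str
    data: Any

    def to_dict(self) -> dict:
        return {"kind": self.kind, "data": self.data}


@dataclass
class ContainerDXO:
    """Hypothetical container that is itself a DXO and holds named sub-DXOs."""
    children: Dict[str, Union[LeafDXO, "ContainerDXO"]] = field(default_factory=dict)

    def to_dict(self) -> dict:
        # Each child serializes itself, so containers can nest to any depth.
        return {
            "kind": "collection",  # reserved kind marking a container
            "data": {name: child.to_dict() for name, child in self.children.items()},
        }

    def to_bytes(self) -> bytes:
        # One packet for everything, as it would go across the wire.
        return json.dumps(self.to_dict()).encode("utf-8")


def _from_dict(d: dict) -> Union[LeafDXO, ContainerDXO]:
    if d["kind"] == "collection":
        return ContainerDXO({name: _from_dict(c) for name, c in d["data"].items()})
    return LeafDXO(d["kind"], d["data"])


def decode(raw: bytes) -> Union[LeafDXO, ContainerDXO]:
    # The whole packet must be parsed before any single child can be inspected.
    return _from_dict(json.loads(raw.decode("utf-8")))


packet = ContainerDXO({
    "weights": LeafDXO("weights", {"fc": [1.0, 2.0]}),
    "metrics": ContainerDXO({"loss": LeafDXO("scalar", 0.25)}),
})
raw = packet.to_bytes()
restored = decode(raw)
print(restored == packet)  # True
```

The sketch also makes the trade-off raised in the post visible: a filter that only cares about `metrics` still has to decode the entire byte string before it can look at that one child.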
Just my thought process on this matter anyway. What am I missing that motivated the enum/data type pattern?
`Shareable` is like a message that travels across the network from one place to another (server/clients). As an analogy, think of an HTTP message: it has headers and non-header content. We didn't have DXO before, and people used `Shareable` to carry content in many different ways (some good, some not so good).
One of the biggest problems in the early days was that the content in a `Shareable` was not self-sufficient or self-descriptive. For example, people just put a dict of numbers in the `Shareable` without any information about the nature of the data. You simply had to know what it was: model weights, gradients, or something else, and whether the data was clear text or encoded. This made it very hard, if not impossible, to write reusable code that handles the different cases properly. This motivated the creation of DXO, a structure that promotes self-sufficient content. Essentially, it has a data portion and a meta portion that describes the data. As an analogy, think of JPEG data carried by an HTTP message. Conceptually, JPEG is a good example of a DXO: it has raw data plus meta information about that data.
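To make the message/content split concrete, here is a simplified, self-contained sketch. It is not the project's actual `Shareable` or `DXO` code: the field names, the `CONTENT_KEY` storage key, the `"encrypted"` meta flag, the `sender` header, and the `to_shareable`/`from_shareable`/`aggregate` helpers are all assumptions for illustration. The point it shows is that a generic handler like `aggregate` can check what it received instead of having to know it in advance.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class DXO:
    """Simplified DXO: what the data is, the data itself, and meta describing it."""
    data_kind: str  # e.g. "WEIGHTS" or "WEIGHT_DIFF"
    data: Dict[str, List[float]]
    meta: Dict[str, Any] = field(default_factory=dict)


class Shareable(dict):
    """Simplified message: headers (routing info) plus dict-style content."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.headers: Dict[str, Any] = {}


CONTENT_KEY = "DXO"  # hypothetical key under which the DXO travels


def to_shareable(dxo: DXO, **headers: Any) -> Shareable:
    msg = Shareable()
    msg.headers.update(headers)
    msg[CONTENT_KEY] = {"kind": dxo.data_kind, "data": dxo.data, "meta": dict(dxo.meta)}
    return msg


def from_shareable(msg: Shareable) -> DXO:
    content = msg[CONTENT_KEY]
    return DXO(content["kind"], content["data"], content["meta"])


def aggregate(messages: List[Shareable]) -> DXO:
    """Reusable handler: it checks what it received instead of assuming."""
    dxos = [from_shareable(m) for m in messages]
    kinds = {d.data_kind for d in dxos}
    if kinds != {"WEIGHT_DIFF"}:
        raise ValueError(f"expected only WEIGHT_DIFF, got {sorted(kinds)}")
    if any(d.meta.get("encrypted") for d in dxos):
        raise ValueError("refusing to average encrypted payloads as clear text")
    n = len(dxos)
    averaged = {
        # zip(*...) lines up the i-th value of this layer across all clients
        name: [sum(vals) / n for vals in zip(*(d.data[name] for d in dxos))]
        for name in dxos[0].data
    }
    return DXO("WEIGHT_DIFF", averaged)


a = to_shareable(DXO("WEIGHT_DIFF", {"fc": [1.0, 3.0]}), sender="site-1")
b = to_shareable(DXO("WEIGHT_DIFF", {"fc": [3.0, 5.0]}), sender="site-2")
print(aggregate([a, b]).data)  # {'fc': [2.0, 4.0]}
```

Without the `data_kind` and `meta` fields, `aggregate` would have no way to tell a weight diff from full weights, or clear text from encoded data, which is exactly the problem described above.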
`analytix.py` is just a wrapper that provides convenience methods for creating and using DXOs for analytics-related content. Conceivably, you could create any number of such wrappers for special-purpose kinds of content.
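A sketch of what such a special-purpose wrapper might look like, assuming a minimal stand-in `DXO` with a kind, data, and meta. `AnalyticsData` and `AnalyticsDataType` mirror the names used in the question, but their fields here, the `"ANALYTIC"` kind string, and the `"analytics_data_type"` meta key are illustrative assumptions; the real `analytix.py` may differ.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict


@dataclass
class DXO:
    """Minimal stand-in for a generic DXO: kind, data, and meta."""
    data_kind: str
    data: Dict[str, Any]
    meta: Dict[str, Any] = field(default_factory=dict)


class AnalyticsDataType(Enum):
    SCALAR = "SCALAR"
    TEXT = "TEXT"
    IMAGE = "IMAGE"


@dataclass
class AnalyticsData:
    """Thin special-purpose wrapper: validates analytics content and converts it to/from a DXO."""
    tag: str
    value: Any
    data_type: AnalyticsDataType

    def __post_init__(self) -> None:
        # Checks that only make sense for analytics content live here,
        # not in the generic DXO.
        if not isinstance(self.tag, str):
            raise TypeError("tag must be a str")
        if self.data_type is AnalyticsDataType.SCALAR and not isinstance(self.value, (int, float)):
            raise TypeError("a SCALAR value must be a number")
        if self.data_type is AnalyticsDataType.TEXT and not isinstance(self.value, str):
            raise TypeError("a TEXT value must be a str")

    def to_dxo(self) -> DXO:
        # The DXO stays generic; the analytics type is recorded in meta so a
        # receiver knows how to interpret the payload.
        return DXO("ANALYTIC", {self.tag: self.value},
                   {"analytics_data_type": self.data_type.value})

    @classmethod
    def from_dxo(cls, dxo: DXO) -> "AnalyticsData":
        if dxo.data_kind != "ANALYTIC" or len(dxo.data) != 1:
            raise ValueError("not a single analytics tag/value pair")
        tag, value = next(iter(dxo.data.items()))
        return cls(tag, value, AnalyticsDataType(dxo.meta["analytics_data_type"]))


point = AnalyticsData("train_loss", 0.25, AnalyticsDataType.SCALAR)
dxo = point.to_dxo()
print(dxo)  # DXO(data_kind='ANALYTIC', data={'train_loss': 0.25}, meta={'analytics_data_type': 'SCALAR'})
```

In this design the DXO remains the one generic format that filters and transport code inspect, while each wrapper concentrates the validation and convenience methods for one content kind; another kind of content would get its own wrapper rather than a change to `DXO` itself.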
This discussion was converted from issue #371 on April 05, 2022 13:17.