-
Notifications
You must be signed in to change notification settings - Fork 809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implementation of 'com_record' as a subclassable Python type. #2437
base: main
Are you sure you want to change the base?
Conversation
This PR addresses issue #2361 and will allow to create subclasses of It does enable typing of interface methods that expect a particular structure as a parameter. To define a particular subclass of
As an example with the following Type Library IDL:
the record class definitions for
The metaclass in the above code is only used to avoid retyping the The subclasses provide valuable information for type hints.
With the above class definitions, new instances of a particular record type can simply be created with e.g.
This PR does not break any existing code because the On the other hand, COM records returned by a call to a COM interface method will receive the proper subclass type, if their RecordInfo does match the GUID of a |
A 'tp_new' method was added to the 'PyTypeObject' slots of 'com_record'. Replacement new is now used to create an instance of a 'com_record' type, i.e. in 'tp_new' as well as in the factory functions. The 'tp_dealloc' method explicitely calls the destructor before finally calling 'tp_free'. Records returned from a call to a COM method do get the subclass type according to the GUID in the COM RecordInfo. If no subclass with that GUID is found, the generic 'com_record' base class is used. The algorithm that retrieves the list of subclasses of 'com_record' only uses methods and data structures of the public Python API. This is important because in Python 3.12 the type of the 'tp_subclasses' slot of a 'PyTypeObject' was changed to 'void*' to indicate that for some types it does not hold a valid 'PyObject*'. The 'PyRecord_Check' function was modified to test if the object is an instance of 'com_record' or a derived type. The implementation does not break existing code. It is not required to define 'com_record' subclasses, i.e. it is possible to work with the generic 'com_record' type as before using the factory function.
0c591f8
to
ebfaa62
Compare
Happy New Year to everybody! I'm wondering if this PR has a chance to get merged or if it is too special for my particular use case? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any good reason to not take something like this once it's in good shape. LEt me know if I missed the point anywhere above, and I agree we'd want good tests for this.
com/win32com/src/PyRecord.cpp
Outdated
if (S_OK != StringFromCLSID(structguid, &guidString)) | ||
return NULL; | ||
// Obtain a copy of the subclasses list to iterate over. | ||
list = PyObject_CallMethod((PyObject *)type, "__subclasses__", NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure I like this as IIUC it makes the world non-deterministic, in that how this works will depends on what else happens to have been imported. Another alternative might be a function which allows you to register the subclass by GUID, and store them in a global map of some sort? I guess you could argue that's still non-deterministic, but at least requires explicit calls rather than side-effects)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please excuse my ignorance but I'm afraid I don't understand why this does depend on side effects.
Maybe you can point me on it with the help of the following examples.
Let's start with the plain vanilla pythoncom.com_record class:
import pythoncom
TL_CLSID = '{E6F07342-C1F7-4E4E-B021-11BBD54B9F37}'
TL_MJVER = 2
TL_MNVER = 3
TL_LCID = 0
T_ENTITY_CLSID = '{9F2C4E2E-2C5C-4F39-9FDB-840A1E08B165}'
T_LINE_CLSID = '{B1461DD4-8C86-4627-B444-3D833C980111}'
print(pythoncom.com_record.__subclasses__())
[]
and create an instance of the T_LINE COM Record:
line_1 = pythoncom.GetRecordFromGuids(TL_CLSID, TL_MJVER, TL_MNVER, TL_LCID, T_LINE_CLSID)
print(line_1)
print(type(line_1))
com_struct(startX=0.0, startY=0.0, startZ=0.0, endX=0.0, endY=0.0, endZ=0.0)
<class 'com_record'>
If we now define the subclasses of pythoncom.com_record with the appropriate GUIDs for the T_ENTITY and T_LINE COM Records from the TypeLibrary:
class T_ENTITY(pythoncom.com_record):
__slots__ = tuple()
TLBID = TL_CLSID
MJVER = TL_MJVER
MNVER = TL_MNVER
LCID = TL_LCID
GUID = T_ENTITY_CLSID
class T_LINE(pythoncom.com_record):
__slots__ = tuple()
TLBID = TL_CLSID
MJVER = TL_MJVER
MNVER = TL_MNVER
LCID = TL_LCID
GUID = T_LINE_CLSID
print(pythoncom.com_record.__subclasses__())
[<class 'main.T_ENTITY'>, <class 'main.T_LINE'>]
we can do:
line_2 = T_LINE()
print(line_2)
print(type(line_2))
com_struct(startX=0.0, startY=0.0, startZ=0.0, endX=0.0, endY=0.0, endZ=0.0)
<class 'main.T_LINE'>
The type of the COM Record instance line_1, which we had created before the subclasses were defined, of course didn't change:
print(type(line_1))
<class 'com_record'>
However, creating a new T_LINE COM Record instance with the factory function, does now also return an instance of the subclass.
line_3 = pythoncom.GetRecordFromGuids(TL_CLSID, TL_MJVER, TL_MNVER, TL_LCID, T_LINE_CLSID)
print(line_3)
print(type(line_3))
com_struct(startX=0.0, startY=0.0, startZ=0.0, endX=0.0, endY=0.0, endZ=0.0)
<class 'main.T_LINE'>
If we define an invalid COM Record subclass using a GUID that is not contained in the TypeLibrary:
T_BAD_CLSID = '{A50732E7-F546-498B-AF28-E58161F9CA5C}'
class T_BAD(pythoncom.com_record):
__slots__ = tuple()
TLBID = TL_CLSID
MJVER = TL_MJVER
MNVER = TL_MNVER
LCID = TL_LCID
GUID = T_BAD_CLSID
print(pythoncom.com_record.__subclasses__())
[<class 'main.T_ENTITY'>, <class 'main.T_LINE'>, <class 'main.T_BAD'>]
the attempt to create an instance fails with an exception:
bad = T_BAD()
---------------------------------------------------------------------------
com_error Traceback (most recent call last)
----> bad = T_BAD()
com_error: (-2147319765, 'Element nicht gefunden.', None, None)
Of course you are free to define all kinds of dysfunctional subclasses:
class T_STUPID(pythoncom.com_record):
description = 'blabla'
print(pythoncom.com_record.__subclasses__())
[<class 'main.T_ENTITY'>, <class 'main.T_LINE'>, <class 'main.T_BAD'>, <class 'main.T_STUPID'>]
but you cannot instantiate them:
gaga = T_STUPID()
---------------------------------------------------------------------------
SystemError Traceback (most recent call last)
----> gaga = T_STUPID()
SystemError: <class 'main.T_STUPID'> returned NULL without setting an exception
So to define working subclasses of pythoncom.com_record you have to follow the recipe and provide the GUID of the TypeLibrary that containts the COM Record definition, as well as its major and minor version number, plus the LCID and last but not least the GUID of the COM Record type as class variables.
I would argue that the definition of such a subclass is a kind of global registration.
Please excuse if I completely missed the point with the above examples but hopefully they can help to identify where there are side effects or non-determinisms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that these subclasses are presumably going to be in your code rather than in pywin32. Thus:
print(pythoncom.com_record.__subclasses__())
[<class 'main.T_ENTITY'>, <class 'main.T_LINE'>]
Assuming T_ENTITY
/T_LINE
etc are in modules you wrote, then the print above will only generate that output if those modules you wrote were actually imported. Thus, this new capability will implicitly depend on exactly what modules were previously imported and presumably what order they are in when multiple modules try and register the same type.
I would argue that the definition of such a subclass is a kind of global registration.
That's true, but it's implicit and hard to reason about. An explicit registration makes more sense to be because (a) it's easier to locate where these calls are and (b) there's an opportunity to fail if 2 callers try and register the same type. I'm saying that without (b), which type is used is largely non-deterministic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, understood.
Do you think it is necessary to implement the nuts and bolts of this registration in pythoncom or would it be sufficient to keep the implementation as is and just provide an 'official' class-factory function for pythoncom.com_record subclasses that does the registration and checks to prevent double registration?
I mean the whole class machinery of Python itself does not prevent from doing all kinds of weird things and in particular somehow related to your point (b):
class A():
a = 1
class B(A):
b = 2
class B(A):
c = 3
print(A.__subclasses__())
[<class 'main.B'>, <class 'main.B'>]
instantiating B does return an instance of the class that was last defined with this name:
x = B()
print(x.b)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
x = B()
----> print(x.b)
AttributeError: 'B' object has no attribute 'b'
print(x.c)
3
The class registration function could easily be written in Python and use the machinery as implemented.
Of course users could then still exploit the machinery under the hood and invoke the implicit behavior.
However, if you're not using the 'official' method you should know what you do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Python implementation seems fine to me and I'm not bothered that people can arrange to avoid this - it's just the fact that a base class which mediates which subclass is instantiated just smells inverted to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's just the fact that a base class which mediates which subclass is instantiated just smells inverted to me.
The issue for which I couldn't come up with a better solution is how to instantiate the proper pythoncom.com_record subclass for Records returned by a COM method call. Without the 'smelly' part everything was fine as long as the Records were created by client code. However, all Records returned from a COM method call were just generic base class pythoncom.com_record instances. Therefore I introduced this 'identification' logic based on the GUID of the returned Record.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking get_registered_record_class
or similar would hold classes, so the logic around here would be roughly the same - you are still making a call back into Python, but instead of subclasses you would be call the new function with some args. You would then expect the result to be a single class (or None) rather than iterating here - which you'd instantiate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the dictionary 'RecordClasses' to pythoncom.
The function 'PyRecord::new_record' does now check if the COM Record GUID is contained in this dict and uses the value returned for the GUID key as the subclass to instantiate.
The new Python function 'register_record_class' in the client module can be used to populate the dictionary with 'GUID' key / 'com_record subclass' value pairs. It checks that the value to be registered is a proper, instantiable com_record subclass and that the GUID is not already contained in the dictionary thus preventing accidential, potentially conflicting double registration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While working on the tests, I recognized that it might be confusing that you could register a subclass with a name different from the name for that GUID in the TypeLibrary. In general it shouldn't matter what name you give to a struct because it's identity is based on the GUID. However, it might be confusing if you have to use a different name in 'win32com.client.Record' to create the struct and the returned instance is then of a type with the registered name.
There are pros and cons. On the one hand it could be helpful to give COM Record types a more meaningful name with respect to your application than the one from the TypeLibrary. On the other hand it might create inconsistencies when mixing subclass instance creation with the use of 'win32com.client.Record'.
Although one could argue that a user who defines and registers a COM Record type with a different name should be aware of the consequences.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've given this a second thought and would value consistency over flexibility.
Therefore I've added a check for the record type name in the 'register_record_class' Python function.
To enable this in a straight forward way I've also added another dunder attribute '__record_type_name__' to the 'PyRecord::getattro' function that retrieves the record type name via 'IRecordInfo::GetName'.
If someone would really like to give the subclass a different name from the name in the TypeLibrary there is still the possibility to bypass the registration function and enter the class directly into the 'pythoncom.RecordClasses' dictionary.
However, using the "official" way everything should be consistent now.
from which a COM Record instance will be created now looks up the Record GUID in a global map, to only create instances of subclasses that have explicitely been registered in this map.
8c0bb0d
to
bb40a88
Compare
'pythoncom.com_record' to the global map 'pythoncom.RecordClasses'. The function does only add instantiable subclasses of 'pythoncom.com_record' to the map if their GUID is not already contained in the map.
bb40a88
to
0f95786
Compare
Python function. Also added another dunder attribute '__record_type_name__' to the 'PyRecord::getattro' function that retrieves the record type name via 'IRecordInfo::GetName'.
Co-authored-by: Avasam <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great! It should be fairly easy to get a couple of basic tests for this in PyCOMTest?
com/win32com/src/PyRecord.cpp
Outdated
@@ -91,7 +94,7 @@ PyObject *PyObject_FromSAFEARRAYRecordInfo(SAFEARRAY *psa) | |||
hr = info->RecordCopy(source_data, this_dest_data); | |||
if (FAILED(hr)) | |||
goto exit; | |||
PyTuple_SET_ITEM(ret_tuple, i, new PyRecord(info, this_dest_data, owner)); | |||
PyTuple_SET_ITEM(ret_tuple, i, PyRecord::new_record(info, this_dest_data, owner)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably add a null check? It should have already been there for new
, but there are many more error conditions now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. The question is where to put this null check?
There are other places where PyRecord::new_record is used without a null check.
Should we instead raise a Python exception in PyRecord::new_record itself in every place where it currently returns a null without raising an exception?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, it will only currently return null when OOM. In this patch we have a more complicated situation - sometimes when it returns null there will be an exception set, whereas OOM will not. Having those OOM cases call PyErr_NoMem()
seems easy, and it looks to me like all of the callers of PyRecord::new_record
will do the right thing if null/false is returned - but what the docs say will not be Ok is PyTuple_SET_ITEM
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that added a check for the record type name in the 'register_record_class' Python function.
'PyRecord::getattro' function that retrieves the record type GUID via 'IRecordInfo::GetGuid'.
'PyTuple_SET_ITEM' in the 'PyObject_FromSAFEARRAYRecordInfo' function. Also enhanced the treatment of NULL return values in several other places in 'PyRecord.cpp'.
ed5fe41
to
cb675ab
Compare
Added my first shot on the tests to PyCOMTest. |
06105f2
to
0cef441
Compare
'PyTuple_SET_ITEM' in the 'PyRecord::getattro' function. Also 'PyRecord::new_record' will now raise a 'PyErr_NoMemory' exception when the call to 'tp_alloc' returns NULL.
0cef441
to
61662ab
Compare
This is looking good to me - would you like to add a changelog entry (if not I'll make sure one gets added)? |
Thank you Mark, I'll add one. In the meantime I spent some time to think about how those subclasses could be initialized and discovered the following. Initially I thought it would be a good idea to provide the constructor functionality also for the base class com_record in addition to the Record factory function. The result is that the constructor of the base class requires more parameters than the one of the subclasses which becomes a problem for implementing a method for the tp_init slot. Moreover it is a little inconsistent that with the current implementation, the com_record constructor returns either an instance of the com_record base class or an instance of the matching subclass, in case the latter was registered. Therefore I would like to propose, to keep the base class com_record only instantiable via the Record factory function and raise an exception if the constructor is called. In addition I would also raise an exception when the constructor of a subclass is used but this subclass has not been registered. Currently the constructor would return an instance of the base class com_record which is virtually the inverse of the behavior of com_record. I think that this would implement a more deterministic behavior. I'll add the proposed changes to the discussion of this PR shortly. |
instance initialization. Also changed the com_record base class to be only instantiable via the Record factory function and subclasses of com_record only if they are registered. Modified the test in 'testPyComTest.py' to reflect this. Added a test for the initialization of com_record subclasses.
44fdad2
to
aed5d91
Compare
Added the initialization functionality for com_record subclasses and added a test. |
I'm wondering if this functionality and what's required to successfully register com_record subclasses should be further documented? I could write a short explanation but I'm not familiar what would be required to included it in the documentation. |
@mhammond Do you agree with these changes? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking great!
@@ -209,35 +209,38 @@ PyObject *pythoncom_GetRecordFromTypeInfo(PyObject *self, PyObject *args) | |||
// This function creates a new 'com_record' instance with placement new. | |||
// If the particular Record GUID belongs to a registered subclass | |||
// of the 'com_record' base type, it instantiates this subclass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this comment needs an update for the param. Or maybe a new function PyRecord::new_record_with_type
taking the extra param, which helps move the type-finding out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tp_new method is the only code path that would use such a PyRecord::new_record_with_type
function and it would duplicate code that's already in the PyRecord::new_record
function. Therefore I would prefer to keep this extra parameter which shortcuts the type identification procedure and just update the comment.
com/win32com/src/PyRecord.cpp
Outdated
return NULL; | ||
if (!PyWinObject_AsIID(obGuid, &guid)) | ||
} | ||
if (!PyWinObject_AsIID(guidUnicode, &infoGuid)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this overwriting the more useful inner error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intention was to provide an indication if the problem is with the Record GUID or with the TLBID which gets lost when just using the inner error. The latter does also not tell much more in case of a error returned from COM which just says that it's an invalid class identifier.
The best would probably be to follow the Python pattern of raise SomeException() from existing_exception
but it seems to be not that straight forward on the C-API side to chain exceptions according to this Stackoverflow post.
Is there an example somewhere else in the pywin32 code base that I could use to handle this in a better way?
I'm also wondering if I should issue a Py_ErrClear()
before raising my exceptions?
I'm sorry but I don't have much experience with this on the C-API side.
ret_tuple = NULL; | ||
goto array_end; | ||
} | ||
PyTuple_SET_ITEM(ret_tuple, i, rec); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oops, now I look again, this shouldn't even be a tuple 😅 . No need to change that now though.
@@ -2248,6 +2256,10 @@ PYWIN_MODULE_INIT_FUNC(pythoncom) | |||
PyType_Ready(&PyRecord::Type) == -1) | |||
PYWIN_MODULE_INIT_RETURN_ERROR; | |||
|
|||
// Add the PyRecord type as a module attribute | |||
if (PyModule_AddObject(module, "com_record", (PyObject *)&PyRecord::Type) != 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it have "type" in the name? (I've no idea about idiomatic python tbh!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, in the example code on python.org it has the same name as in the tp_name slot. So the module attribute is
pythoncom.com_record
<class 'com_record'>
and the type of an instance is
type(instance)
<class 'com_record'>
which, I would say, makes sense.
Although more precisely, in the example code, the tp_name slot also specifies the module name of the new type.
However, I didn't want to change this to PYWIN_OBJECT_HEAD "pythoncom.com_record"
because I didn't want to break any existing code.
Of course the name of the module attribute could be changed to com_record_type
which would only make a difference in one place as far as I understand:
pythoncom.com_record_type
<class 'com_record'>
Up to you.
A 'tp_new' method was added to the 'PyTypeObject' slots of 'com_record'. Replacement new is now used to create an instance of a 'com_record' type, i.e. in 'tp_new' as well as in the factory functions. The 'tp_dealloc' method explicitely calls the destructor before finally calling 'tp_free'.
Records returned from a call to a COM method do get the subclass type according to the GUID in the COM RecordInfo. If no subclass with that GUID is found, the generic 'com_record' base class is used.
The algorithm that retrieves the list of subclasses of 'com_record' only uses methods and data structures of the public Python API. This is important because in Python 3.12 the type of the 'tp_subclasses' slot of a 'PyTypeObject' was changed to 'void*' to indicate that for some types it does not hold a valid 'PyObject*'.
The 'PyRecord_Check' function was modified to test if the object is an instance of 'com_record' or a derived type.
The implementation does not break existing code.
It is not required to define 'com_record' subclasses, i.e. it is possible to work with the generic 'com_record' type as before using the factory function.
Resolves #2361