<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Articles</title><link href="https://www.psycopg.org/articles/" rel="alternate"></link><link href="https://www.psycopg.org/articles.xml" rel="self"></link><id>urn:uuid:1f583092-c4bb-3a77-9f6f-c8ac41335aab</id><updated>2024-09-23T00:00:00Z</updated><author><name></name></author><entry><title>Automatic async to sync code conversion</title><link href="https://www.psycopg.org/articles/2024/09/23/async-to-sync/" rel="alternate"></link><updated>2024-09-23T00:00:00Z</updated><author><name>Daniele Varrazzo</name></author><id>urn:uuid:765ce6dc-afc5-34ff-bf78-d5f78cc8aed5</id><content type="html"><p>Psycopg 3 provides both a sync and an async Python interface: for each object
used to perform I/O operations, such as <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/connections.html#psycopg.Connection">Connection</a>, <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/cursors.html#psycopg.Cursor">Cursor</a>, there is an
async counterpart: <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/connections.html#psycopg.AsyncConnection">AsyncConnection</a>, <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/cursors.html#psycopg.AsyncCursor">AsyncCursor</a>, with an intuitive
interface: just add the right <tt class="docutils literal">async</tt> or <tt class="docutils literal">await</tt> keyword where needed:</p>
<pre class="code python literal-block">
<span class="c1"># Familiar sync code</span><span class="w">
</span><span class="n">conn</span> <span class="o">=</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">Connection</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="s2">&quot;&quot;</span><span class="p">)</span><span class="w">
</span><span class="n">cur</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">&quot;select now()&quot;</span><span class="p">)</span><span class="w">
</span><span class="nb">print</span><span class="p">(</span><span class="n">cur</span><span class="o">.</span><span class="n">fetchone</span><span class="p">()[</span><span class="mi">0</span><span class="p">])</span><span class="w">
</span><span class="c1"># Similar async code</span><span class="w">
</span><span class="n">aconn</span> <span class="o">=</span> <span class="k">await</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">AsyncConnection</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="s2">&quot;&quot;</span><span class="p">)</span><span class="w">
</span><span class="n">acur</span> <span class="o">=</span> <span class="k">await</span> <span class="n">aconn</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">&quot;select now()&quot;</span><span class="p">)</span><span class="w">
</span><span class="nb">print</span><span class="p">((</span><span class="k">await</span> <span class="n">acur</span><span class="o">.</span><span class="n">fetchone</span><span class="p">())[</span><span class="mi">0</span><span class="p">])</span>
</pre>
<p>The decision to provide both sync and async code <a class="reference external" href="https://www.varrazzo.com/blog/2020/03/26/psycopg3-first-report/">was made early in the
development of Psycopg 3</a> and most of the internal code is written in a way
to be compatible with both sync and async code, in order to keep code
duplication to a minimum. This was achieved by making all the libpq
communication async, and writing the network code as generators,
<tt class="docutils literal">yield</tt>ing at the moment they need to wait, isolating the differences in
the sync/async wait policy all in <tt class="docutils literal">wait()</tt> functions.</p>
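<p>The generator-based approach can be illustrated with a toy sketch under simplifying assumptions. The names <tt class="docutils literal">sum_down_gen()</tt>, <tt class="docutils literal">wait()</tt> and <tt class="docutils literal">wait_async()</tt> are used here only for illustration, and the generator yields a delay in seconds, whereas Psycopg's real generators deal with libpq and the connection socket. The point is that the logic is written once, and only the two small drivers differ in how they wait.</p>

```python
import asyncio
import time


def sum_down_gen(n):
    """I/O-free logic written once: yields whenever it needs to wait.

    Hypothetical stand-in for a libpq communication generator: here the
    yielded value is just a delay in seconds, not a socket wait state.
    """
    total = 0
    while n:
        yield 0.001  # "I cannot proceed yet: wait this long"
        total += n
        n -= 1
    return total


def wait(gen):
    """Sync policy: block the whole thread while waiting."""
    try:
        delay = next(gen)
        while True:
            time.sleep(delay)
            delay = gen.send(None)
    except StopIteration as ex:
        return ex.value  # the generator's return value


async def wait_async(gen):
    """Async policy: suspend only this coroutine, letting the loop run others."""
    try:
        delay = next(gen)
        while True:
            await asyncio.sleep(delay)
            delay = gen.send(None)
    except StopIteration as ex:
        return ex.value


print(wait(sum_down_gen(3)))                     # 6
print(asyncio.run(wait_async(sum_down_gen(3))))  # 6
```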
<p>This helped to minimise the async/sync differences in the code related to the
communication between PostgreSQL and Psycopg. However, the interface between
Psycopg and the Python user is still a lot to maintain: it consists of a lot
of code that is very similar, almost duplicated, between the sync and async
sides. Apart from the obvious <tt class="docutils literal">async</tt>/<tt class="docutils literal">await</tt> keywords, there are
subtle implementation differences, for example:</p>
<ul class="simple">
<li>using <tt class="docutils literal">asyncio</tt> functions instead of blocking counterparts, for instance
<tt class="docutils literal">await asyncio.sleep()</tt> instead of <tt class="docutils literal">time.sleep()</tt>;</li>
<li><tt class="docutils literal">asyncio.create_task(f(arg1, arg2))</tt> is similar to <tt class="docutils literal">threading.Thread(target=f,
args=(arg1, arg2)).start()</tt>;</li>
<li><tt class="docutils literal">threading.Event</tt> has an <tt class="docutils literal">asyncio.Event</tt> counterpart whose <tt class="docutils literal">wait()</tt>
method doesn't have a <tt class="docutils literal">timeout</tt> parameter, so <tt class="docutils literal">event.wait(timeout=10)</tt>
needs to be rewritten as <tt class="docutils literal">await asyncio.wait_for(event.wait(), timeout=10)</tt>.</li>
</ul>
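<p>The three differences listed above can be seen side by side in a small, self-contained sketch (the <tt class="docutils literal">work()</tt>/<tt class="docutils literal">awork()</tt> functions are placeholders invented for the example):</p>

```python
import asyncio
import threading
import time


def work(results, value):
    results.append(value)


async def awork(results, value):
    results.append(value)


def sync_side():
    time.sleep(0.01)  # blocks the whole thread
    results = []
    t = threading.Thread(target=work, args=(results, 1))
    t.start()
    t.join()
    event = threading.Event()
    got_it = event.wait(timeout=0.01)  # returns False if the timeout expires
    return results, got_it


async def async_side():
    await asyncio.sleep(0.01)  # suspends only this coroutine
    results = []
    await asyncio.create_task(awork(results, 1))
    event = asyncio.Event()
    try:
        # asyncio.Event.wait() has no timeout: wrap it in wait_for()
        await asyncio.wait_for(event.wait(), timeout=0.01)
        got_it = True
    except asyncio.TimeoutError:
        got_it = False
    return results, got_it


print(sync_side())                # ([1], False)
print(asyncio.run(async_side()))  # ([1], False)
```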
<p>Up until Psycopg 3.1, the two variants of each object were kept in sync
manually. Every time changes were made on the sync side, they had to be ported
to the async side, producing cumbersome and noisy diffs and occasionally
introducing subtle differences. Even the tests were pretty much duplicated
(with some sync tests being accidentally lost on the async side, or vice
versa). It seemed like a situation that could have been improved.</p>
<div class="section" id="this-is-so-boring-that">
<h2>This is so boring that...</h2>
<p>...a computer should do it for me instead.</p>
<p>Writing the async side starting from the sync side? Actually, the opposite. It
is obvious that the async side has more information than the sync side (every
method definition and call clearly indicates whether it will block or not) and
most of the differences are minimal and repetitive. What we want then is <em>a
script that takes asyncio-based source code as input and outputs
equivalent sync code</em>.</p>
<p>This article describes how we implemented such a script, how we used it
for the initial transformation (replacing manually written sync code with
auto-generated code without loss of quality), and how we are currently using it
to maintain the Psycopg 3 codebase.</p>
</div>
<div class="section" id="abstract-syntax-tree">
<h2>Abstract Syntax Tree</h2>
<p>You might be tempted to write a bunch of regular expressions to just scrub
away every <tt class="docutils literal">async</tt> and <tt class="docutils literal">await</tt> keyword found, but the source code is
probably the wrong level at which to attack the problem: Python knows how to parse
Python itself well and allows us to reason at a higher level.</p>
<p>A better tool to work with is the <a class="reference external" href="https://en.wikipedia.org/wiki/Abstract_syntax_tree">Abstract Syntax Tree</a> (AST): an in-memory
representation of the code obtained after parsing. At this level we manipulate
objects that represent &quot;the for loop&quot;, or &quot;the function call&quot;, and we are not
fooled by unexpected spaces, extra brackets, comments, literal strings, and
other traps.</p>
<p>The <a class="reference external" href="https://docs.python.org/3/library/ast.html">Python 'ast' module</a> is the obvious starting point: if you have a bit
of source code such as:</p>
<pre class="code python literal-block">
<span class="kn">import</span><span class="w"> </span><span class="nn">asyncio</span><span class="w">
</span><span class="k">async</span> <span class="k">def</span><span class="w"> </span><span class="nf">async_square</span><span class="p">(</span><span class="n">n</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">float</span><span class="p">:</span><span class="w">
</span>    <span class="c1"># Squares are slow</span><span class="w">
</span>    <span class="k">await</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span><span class="w">
</span>    <span class="k">return</span> <span class="n">n</span> <span class="o">*</span> <span class="n">n</span>
</pre>
<p>you can pass it to the module to see the AST tree that represents it:</p>
<pre class="literal-block">
$ python -m ast ast1.py
<strong>Module</strong>(
   body=[
      <strong>Import</strong>(
         names=[
            alias(name='asyncio')]),
      <strong>AsyncFunctionDef</strong>(
         name='async_square',
         args=arguments(
            args=[
               arg(
                  arg='n',
                  annotation=Name(id='float'))],
            defaults=[]),
         body=[
            Expr(
               value=<strong>Await</strong>(
                  value=<strong>Call</strong>(
                     func=Attribute(
                        value=Name(id='asyncio'),
                        attr='sleep'),
                     args=[
                        Constant(value=1)]))),
            <strong>Return</strong>(
               value=BinOp(
                  left=Name(id='n'),
                  op=Mult(),
                  right=Name(id='n')))],
         decorator_list=[],
         returns=Name(id='float'))])
</pre>
<p>You can see, highlighted, the nodes in the tree representing the main
statements in the code: the tree represents a <em>module</em>, whose body contains two
<em>statements</em> (an <tt class="docutils literal">import</tt> and an <tt class="docutils literal">async def</tt>), with the function body
containing an <tt class="docutils literal">await</tt> expression and a <tt class="docutils literal">return</tt> statement.</p>
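<p>The same structure can also be navigated from Python code. A minimal sketch, assuming the example module above is available as a string; the <tt class="docutils literal">ast.dump(..., indent=...)</tt> call (available since Python 3.9) prints the same kind of dump as <tt class="docutils literal">python <span class="pre">-m</span> ast</tt>:</p>

```python
import ast

source = """\
import asyncio
async def async_square(n: float) -> float:
    # Squares are slow
    await asyncio.sleep(1)
    return n * n
"""

tree = ast.parse(source)
print([type(stmt).__name__ for stmt in tree.body])  # ['Import', 'AsyncFunctionDef']

func = tree.body[1]
print([type(stmt).__name__ for stmt in func.body])  # ['Expr', 'Return']

# The Expr statement wraps the Await node, which in turn wraps the call
await_node = func.body[0].value
print(type(await_node).__name__, ast.unparse(await_node.value))  # Await asyncio.sleep(1)

# A dump similar to the `python -m ast` output, produced from Python code
print(ast.dump(tree, indent=3))
```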
<p>The same <tt class="docutils literal">ast</tt> module can perform the reverse transformation, converting an
AST tree back to source:</p>
<pre class="literal-block">
$ python -c &quot;import ast; print(ast.unparse(ast.parse(open('ast1.py').read())))&quot;
import asyncio
async def async_square(n: float) -&gt; float:
    await asyncio.sleep(1)
    return n * n
</pre>
<p>As you can see, the transformation back to code is unfortunately not a perfect
reconstruction of the original code: it is only <em>equivalent</em>, with missing
comments and different spacing. This is because the syntax tree is <em>abstract</em>,
and whitespace and comments don't affect it. If you wanted to take those
details into account you would need a <em>concrete</em> syntax tree (<a class="reference external" href="https://pypi.org/project/libcst/">something like
that exists</a>, but I haven't played with it).</p>
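<p>The sense in which the regenerated code is &quot;equivalent&quot; can be checked directly. This sketch round-trips a deliberately badly spaced version of the example and compares the dumps of the two trees (by default <tt class="docutils literal">ast.dump()</tt> omits line and column information, so only the structure is compared):</p>

```python
import ast

original = """\
import asyncio
async def async_square(n: float) -> float:
    # Squares are slow
    await   asyncio.sleep( 1 )
    return n*n
"""

regenerated = ast.unparse(ast.parse(original))
print(regenerated)

# Comments and custom spacing are gone...
print("Squares are slow" in regenerated)  # False
# ...but both sources parse to the same abstract tree
print(ast.dump(ast.parse(original)) == ast.dump(ast.parse(regenerated)))  # True
```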
<p>Changing whitespace is not a problem, but losing comments can be, especially
when they are used to control linters (such as Flake8's <tt class="docutils literal">noqa</tt> or Mypy's
<tt class="docutils literal">type: ignore</tt>), or simply when you happen to be a human being and want to
read the source code. Fortunately there is a simple wrapper module,
<a class="reference external" href="https://pypi.org/project/ast-comments/">ast-comments</a>, which does exactly what it says on the tin: it introduces
<tt class="docutils literal">Comment</tt> nodes as part of an AST. Playing around with it, it turned out to
be a good compromise between an abstract and a concrete syntax tree, after
some taming of the comments placement.</p>
</div>
<div class="section" id="du-ast-mich">
<h2>Du AST Mich</h2>
<p>To perform the code transformation, we will walk over the abstract
syntax tree and we will perform some operation to generate a different tree of
our liking. Typically, this type of operation is performed using an
implementation of the <a class="reference external" href="https://en.wikipedia.org/wiki/Visitor_pattern">visitor pattern</a>.</p>
<p>This pattern can be incredibly useful whenever you need to perform operations
on data structures composed of heterogeneous nodes (I've seen it in
applications ranging from converting UML representations to code, converting
markup language to HTML, converting Kubernetes manifests to Helm charts,
converting annotated lyrics files to Ukulele tab sheets...); unfortunately
many of the descriptions of the pattern you can find online fail to make its
brilliance immediately apparent (the Wikipedia page is pretty bad at it)
because they have historically focused on solving the <em>double-dispatch</em>
problem in static languages such as C++ or Java (which is trivial in a dynamic
language like Python) rather than focusing on the <strong>awesome</strong> things you
can do with it.</p>
<p>In a nutshell, you will have an object that traverses an input data structure,
element by element, building an output structure in the process, allowing you
to run different code and to perform different manipulations depending on the
type of element being traversed.</p>
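<p>As an illustration outside of Python ASTs, here is a minimal visitor over a hypothetical arithmetic expression tree. The dispatch on the class name is the same trick used by <tt class="docutils literal">ast.NodeVisitor</tt>: in a dynamic language, looking up a method named <tt class="docutils literal">visit_</tt> followed by the class name with <tt class="docutils literal">getattr()</tt> replaces the whole double-dispatch machinery. Two visitors traverse the same tree to produce different outputs:</p>

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: float


@dataclass
class Add:
    left: object
    right: object


@dataclass
class Mul:
    left: object
    right: object


class Visitor:
    """Dispatch on the node class name, like ast.NodeVisitor does."""

    def visit(self, node):
        method = getattr(self, "visit_" + type(node).__name__, self.generic_visit)
        return method(node)

    def generic_visit(self, node):
        raise TypeError(f"don't know how to visit {type(node).__name__}")


class Evaluator(Visitor):
    """Compute the value of the expression."""

    def visit_Num(self, node):
        return node.value

    def visit_Add(self, node):
        return self.visit(node.left) + self.visit(node.right)

    def visit_Mul(self, node):
        return self.visit(node.left) * self.visit(node.right)


class Printer(Visitor):
    """Render the expression as source text."""

    def visit_Num(self, node):
        return str(node.value)

    def visit_Add(self, node):
        return f"({self.visit(node.left)} + {self.visit(node.right)})"

    def visit_Mul(self, node):
        return f"{self.visit(node.left)} * {self.visit(node.right)}"


expr = Mul(Add(Num(1), Num(2)), Num(4))
print(Evaluator().visit(expr))  # 12
print(Printer().visit(expr))    # (1 + 2) * 4
```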
<p>In our case, both the input and the output are AST trees, which will happen to
be very similar to each other (since we are just trying to translate some
subtle differences from one Python module to another): for many nodes, the
visitor will just output a copy of it (for example, the <tt class="docutils literal">return</tt> statement
in the above example is unchanged). But, if we see a pattern of interest, we
can tell our visitor to produce a different node.</p>
<p>The <tt class="docutils literal">ast</tt> module provides a base class <a class="reference external" href="https://docs.python.org/3/library/ast.html#ast.NodeTransformer">ast.NodeTransformer</a> which
implements the node traversal and tree production parts. By itself it
doesn't perform any operations on the nodes, so it just produces a copy of the
input tree. However, by subclassing the class and adding visit methods, you
can implement node-specific transformations.</p>
<p>With the AST node transformer, the method called is based on the name of the
node being visited; for example, if you add a method called <tt class="docutils literal">visit_Import</tt>
to your subclass, the visitor will call it whenever it traverses an <tt class="docutils literal">Import</tt>
node, giving you the chance to manipulate an <tt class="docutils literal">import</tt> statement. You can
then decide whether you want to change some of the details of the node (drop
some imports, change some names), or replace the node with something completely
different (such as replacing an async function definition with a sync one).</p>
<p>Let's say that we want to produce a sync version of the above script: the
differences should be the following:</p>
<pre class="code diff literal-block">
<span class="gu">&#64;&#64; -1,7 +1,7 &#64;&#64;</span><span class="w">
</span><span class="gd">-import asyncio</span><span class="w">
</span><span class="gi">+import time</span><span class="w">
</span><span class="gd">-async def async_square(n: float) -&gt; float:</span><span class="w">
</span><span class="gi">+def square(n: float) -&gt; float:</span><span class="w">
</span>     # Squares are slow<span class="w">
</span><span class="gd">-    await asyncio.sleep(1)</span><span class="w">
</span><span class="gi">+    time.sleep(1)</span><span class="w">
</span>     return n * n
</pre>
<p>In our toy example, we want to convert the <tt class="docutils literal">asyncio</tt> module into the <tt class="docutils literal">time</tt>
module (which is obviously not the right thing to do in the general case, but
let's keep the example simple). The following script implements the
transformation and prints the converted module:</p>
<pre class="code python literal-block">
<span class="kn">import</span><span class="w"> </span><span class="nn">ast</span><span class="w">
</span><span class="k">class</span><span class="w"> </span><span class="nc">MyTransformer</span><span class="p">(</span><span class="n">ast</span><span class="o">.</span><span class="n">NodeTransformer</span><span class="p">):</span><span class="w">
</span>    <span class="k">def</span><span class="w"> </span><span class="nf">visit_Import</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span><span class="w">
</span>        <span class="k">for</span> <span class="n">alias</span> <span class="ow">in</span> <span class="n">node</span><span class="o">.</span><span class="n">names</span><span class="p">:</span><span class="w">
</span>            <span class="k">if</span> <span class="n">alias</span><span class="o">.</span><span class="n">name</span> <span class="o">==</span> <span class="s2">&quot;asyncio&quot;</span><span class="p">:</span><span class="w">
</span>                <span class="n">alias</span><span class="o">.</span><span class="n">name</span> <span class="o">=</span> <span class="s2">&quot;time&quot;</span><span class="w">
</span>        <span class="k">return</span> <span class="n">node</span><span class="w">
</span><span class="k">with</span> <span class="nb">open</span><span class="p">(</span><span class="s2">&quot;ast1.py&quot;</span><span class="p">)</span> <span class="k">as</span> <span class="n">f</span><span class="p">:</span><span class="w">
</span>    <span class="n">tree</span> <span class="o">=</span> <span class="n">ast</span><span class="o">.</span><span class="n">parse</span><span class="p">(</span><span class="n">f</span><span class="o">.</span><span class="n">read</span><span class="p">())</span><span class="w">
</span><span class="n">tree</span> <span class="o">=</span> <span class="n">MyTransformer</span><span class="p">()</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">tree</span><span class="p">)</span><span class="w">
</span><span class="nb">print</span><span class="p">(</span><span class="n">ast</span><span class="o">.</span><span class="n">unparse</span><span class="p">(</span><span class="n">tree</span><span class="p">))</span>
</pre>
<p>The script will print the new source, with an <tt class="docutils literal">import time</tt> replacing the
original <tt class="docutils literal">import asyncio</tt>.</p>
<p>Changing the <tt class="docutils literal">async</tt> call is a bit trickier: we want to change the
highlighted parts of the original tree:</p>
<pre class="literal-block">
<strong>value=Await</strong>(   &lt;&lt; this node must be dropped, replaced by its <strong>value</strong>
   value=Call(
      func=Attribute(
         value=Name(id='<strong>asyncio</strong>'),   &lt;&lt; we want <strong>time</strong> here
         attr='sleep'),
      args=[
         Constant(value=1)],
      keywords=[]))),
</pre>
<p>Adding the following two methods to the above class will implement what has
been described.</p>
<pre class="code python literal-block">
<span class="k">def</span><span class="w"> </span><span class="nf">visit_Await</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span><span class="w">
</span>    <span class="c1"># drop the node, continue to operate on its value and return the result</span><span class="w">
</span>    <span class="k">return</span> <span class="bp">self</span><span class="o">.</span><span class="n">visit</span><span class="p">(</span><span class="n">node</span><span class="o">.</span><span class="n">value</span><span class="p">)</span><span class="w">
</span><span class="k">def</span><span class="w"> </span><span class="nf">visit_Call</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">node</span><span class="p">):</span><span class="w">
</span>    <span class="bp">self</span><span class="o">.</span><span class="n">generic_visit</span><span class="p">(</span><span class="n">node</span><span class="p">)</span>  <span class="c1"># also transform nested calls in the arguments</span><span class="w">
</span>    <span class="k">match</span> <span class="n">node</span><span class="o">.</span><span class="n">func</span><span class="p">:</span><span class="w">
</span>        <span class="k">case</span> <span class="n">ast</span><span class="o">.</span><span class="n">Attribute</span><span class="p">(</span><span class="n">value</span><span class="o">=</span><span class="n">ast</span><span class="o">.</span><span class="n">Name</span><span class="p">(</span><span class="nb">id</span><span class="o">=</span><span class="s2">&quot;asyncio&quot;</span><span class="p">),</span> <span class="n">attr</span><span class="o">=</span><span class="s2">&quot;sleep&quot;</span><span class="p">):</span><span class="w">
</span>            <span class="n">node</span><span class="o">.</span><span class="n">func</span><span class="o">.</span><span class="n">value</span><span class="o">.</span><span class="n">id</span> <span class="o">=</span> <span class="s2">&quot;time&quot;</span><span class="w">
</span>    <span class="k">return</span> <span class="n">node</span>
</pre>
<p>To make sense of how these methods operate on their input nodes, and then to
implement your own transformations, you can always look at the output of
<tt class="docutils literal">python <span class="pre">-m</span> ast</tt> in order to see the attributes on each node and how they are
nested.</p>
<p>The <tt class="docutils literal">visit_Call</tt> method shows how the <a class="reference external" href="https://peps.python.org/pep-0636/">structural pattern matching</a>
introduced in Python 3.10 comes in handy for this operation. The method is
called for each function call found in the input code; checking whether the
one just received requires any manipulation would have involved a cascade of
ifs (the value is a <tt class="docutils literal">Name</tt>, its id is <tt class="docutils literal">asyncio</tt>, the attr is <tt class="docutils literal">sleep</tt>...)
which becomes ugly pretty quickly, whereas a <tt class="docutils literal">match</tt>
statement can describe a complex nested test very succinctly.</p>
</div>
<div class="section" id="problems-with-sleep">
<h2>Problems with <tt class="docutils literal">sleep()</tt></h2>
<p>Performing the transformation from <tt class="docutils literal">asyncio.sleep</tt> to <tt class="docutils literal">time.sleep</tt> for
real is much more complex than this. What if our source includes <tt class="docutils literal">from asyncio
import sleep, Event</tt>? We would have to split the import into several parts:</p>
<pre class="code python literal-block">
<span class="kn">from</span><span class="w"> </span><span class="nn">time</span><span class="w"> </span><span class="kn">import</span> <span class="n">sleep</span><span class="w">
</span><span class="kn">from</span><span class="w"> </span><span class="nn">threading</span><span class="w"> </span><span class="kn">import</span> <span class="n">Event</span>
</pre>
<p>and any later use of <tt class="docutils literal">Event</tt> would need special treatment, because the two
<tt class="docutils literal">Event</tt> classes have different <tt class="docutils literal">wait()</tt> signatures.</p>
<p>To help with this operation, in Psycopg 3 we introduced <a class="reference external" href="https://github.com/psycopg/psycopg/blob/d13137aacb82fed79459a9dd487846a2ec972571/psycopg/psycopg/_acompat.py">an internal
'_acompat' module</a> (actually <a class="reference external" href="https://github.com/psycopg/psycopg/blob/d13137aacb82fed79459a9dd487846a2ec972571/psycopg_pool/psycopg_pool/_acompat.py">two</a>, because the pool is released
separately and uses different functions; actually <a class="reference external" href="https://github.com/psycopg/psycopg/blob/d13137aacb82fed79459a9dd487846a2ec972571/tests/acompat.py">three</a>, because the tests
also have their own...) exposing pairs of functions or objects, one to be used in
sync mode and the other in async mode.</p>
<p>For example we can solve the <tt class="docutils literal">sleep()</tt> problem with:</p>
<pre class="code python literal-block">
<span class="c1"># module _acompat.py</span><span class="w">
</span><span class="kn">import</span><span class="w"> </span><span class="nn">time</span><span class="w">
</span><span class="kn">import</span><span class="w"> </span><span class="nn">asyncio</span><span class="w">
</span><span class="kn">from</span><span class="w"> </span><span class="nn">typing</span><span class="w"> </span><span class="kn">import</span> <span class="n">Any</span><span class="p">,</span> <span class="n">Coroutine</span><span class="w">
</span><span class="n">sleep</span> <span class="o">=</span> <span class="n">time</span><span class="o">.</span><span class="n">sleep</span><span class="w">
</span><span class="k">def</span><span class="w"> </span><span class="nf">asleep</span><span class="p">(</span><span class="n">seconds</span><span class="p">:</span> <span class="nb">float</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Coroutine</span><span class="p">[</span><span class="n">Any</span><span class="p">,</span> <span class="n">Any</span><span class="p">,</span> <span class="kc">None</span><span class="p">]:</span><span class="w">
</span>    <span class="sd">&quot;&quot;&quot;
    Equivalent to asyncio.sleep(), converted to time.sleep() by async_to_sync.
    &quot;&quot;&quot;</span><span class="w">
</span>    <span class="k">return</span> <span class="n">asyncio</span><span class="o">.</span><span class="n">sleep</span><span class="p">(</span><span class="n">seconds</span><span class="p">)</span>
</pre>
<p>Now it's easy to use <tt class="docutils literal">from ._acompat import asleep; await asleep(1)</tt> and do
some simple name substitutions in the AST: the resulting statement <tt class="docutils literal">from
._acompat import sleep; sleep(1)</tt> will work as expected.</p>
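<p>A sketch of such name substitutions, assuming a hypothetical mapping table <tt class="docutils literal">NAMES</tt> from async names to their sync counterparts. It only renames names and drops <tt class="docutils literal">await</tt>; converting the <tt class="docutils literal">async def</tt> itself is a separate transformation, so it is still present in the output:</p>

```python
import ast

# Hypothetical mapping from async-side names to their sync counterparts
NAMES = {"asleep": "sleep", "AsyncConnection": "Connection"}

SOURCE = """\
from ._acompat import asleep
async def retry():
    await asleep(1)
"""


class RenameAsyncNames(ast.NodeTransformer):
    def visit_ImportFrom(self, node):
        # rename the imported names: `from ._acompat import sleep`
        for alias in node.names:
            alias.name = NAMES.get(alias.name, alias.name)
        return node

    def visit_Name(self, node):
        # rename every reference to a mapped name
        node.id = NAMES.get(node.id, node.id)
        return node

    def visit_Await(self, node):
        return self.visit(node.value)  # drop the await, keep the call


out = ast.unparse(RenameAsyncNames().visit(ast.parse(SOURCE)))
print(out)
```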
<p>Other goodies we have implemented to help unify async and sync code are
<tt class="docutils literal">aspawn</tt>/<tt class="docutils literal">spawn</tt> and <tt class="docutils literal">agather</tt>/<tt class="docutils literal">gather</tt> to unify thread and
asyncio task creation, <tt class="docutils literal">alist()</tt> to encapsulate <tt class="docutils literal">[x async for x in
iterable]</tt> in a way that can be easily converted to <tt class="docutils literal">list(iterable)</tt>, and
many other helpers to smooth the transition.</p>
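<p>A possible shape for these helpers is sketched below. The signatures are guesses made for illustration and are not necessarily those of Psycopg's <tt class="docutils literal">_acompat</tt> modules; what matters is that each async helper has a sync twin that the conversion script can substitute by name:</p>

```python
import asyncio
import threading


# --- sync side (hypothetical signatures) ---
def spawn(f, args=()):
    """Run f(*args) in a new thread and return the started thread."""
    t = threading.Thread(target=f, args=args, daemon=True)
    t.start()
    return t


def gather(*threads, timeout=None):
    """Wait for the threads to finish."""
    for t in threads:
        t.join(timeout)


# --- async side (hypothetical signatures) ---
def aspawn(f, args=()):
    """Run f(*args) as an asyncio task; requires a running event loop."""
    return asyncio.create_task(f(*args))


async def agather(*tasks, timeout=None):
    """Wait for the tasks to finish."""
    await asyncio.wait_for(asyncio.gather(*tasks), timeout)


async def alist(it):
    """Async counterpart of list(it), for async iterables."""
    return [x async for x in it]


# Usage
def worker(out, n):
    out.append(n * n)


async def aworker(out, n):
    out.append(n * n)


async def agen(n):
    for i in range(n):
        yield i


out = []
gather(spawn(worker, (out, 2)), spawn(worker, (out, 3)))
print(sorted(out))  # [4, 9]


async def main():
    aout = []
    await agather(aspawn(aworker, (aout, 2)), aspawn(aworker, (aout, 3)))
    return sorted(aout), await alist(agen(3))


print(asyncio.run(main()))  # ([4, 9], [0, 1, 2])
```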
</div>
<div class="section" id="when-everything-else-fail">
<h2>When everything else fails</h2>
<p>There may be parts of the codebase where the difference between sync and async
versions is too difficult to handle in a practical way, and where it is not worth
writing complex matching code for a one-off case. What we want is a
simple &quot;if async, do this, else do that&quot;.</p>
<p>We have solved this problem by using a pattern like:</p>
<pre class="code python literal-block">
<span class="k">if</span> <span class="kc">True</span><span class="p">:</span>  <span class="c1"># ASYNC</span><span class="w">
</span>    <span class="n">foo</span><span class="p">()</span><span class="w">
</span><span class="k">else</span><span class="p">:</span><span class="w">
</span>    <span class="n">bar</span><span class="p">()</span>
</pre>
<p>The AST with this code, including the comments, looks like:</p>
<pre class="literal-block">
Module(
   body=[
      <strong>If(</strong>
         <strong>test=Constant(value=True),</strong>
         body=[
            <strong>Comment(value='# ASYNC', inline=True),</strong>
            Expr(
               value=Call(
                  func=Name(id='foo'),
                  args=[],
                  keywords=[]))],
         orelse=[
            Expr(
               value=Call(
                  func=Name(id='bar'),
                  args=[],
                  keywords=[]))])],
   type_ignores=[])
</pre>
<p><a class="reference external" href="https://github.com/psycopg/psycopg/blob/d13137aacb82fed79459a9dd487846a2ec972571/tools/async_to_sync.py#L253-L262">Our transformation</a> will find the <tt class="docutils literal">ASYNC</tt> comment: in this case it will
simply discard the if side of the condition, as well as the <tt class="docutils literal">if</tt> itself, and
will leave only the <tt class="docutils literal">else</tt> branch in the sync code, allowing you to discard
unneeded imports or other code that would simply be invalid in the sync
context.</p>
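<p>The <tt class="docutils literal">ASYNC</tt> marker handling can be sketched with the standard library alone. Since the plain <tt class="docutils literal">ast</tt> module drops comments, this sketch replaces the ast-comments <tt class="docutils literal">Comment</tt> nodes with a different technique: it uses <tt class="docutils literal">tokenize</tt> to find the lines carrying a <tt class="docutils literal"># ASYNC</tt> comment and matches them against the <tt class="docutils literal">lineno</tt> of each <tt class="docutils literal">if True:</tt> statement. Returning a list from a <tt class="docutils literal">NodeTransformer</tt> method splices the <tt class="docutils literal">else</tt> statements in place of the <tt class="docutils literal">if</tt>:</p>

```python
import ast
import io
import tokenize

SOURCE = """\
if True:  # ASYNC
    foo()
else:
    bar()
baz()
"""


def async_marker_lines(source):
    """Return the line numbers carrying an '# ASYNC' comment."""
    lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT and tok.string.strip() == "# ASYNC":
            lines.add(tok.start[0])
    return lines


class DropAsyncBranch(ast.NodeTransformer):
    def __init__(self, marker_lines):
        self.marker_lines = marker_lines

    def visit_If(self, node):
        self.generic_visit(node)  # handle nested markers first
        is_marker = (
            isinstance(node.test, ast.Constant)
            and node.test.value is True
            and node.lineno in self.marker_lines
        )
        if not is_marker:
            return node
        # Replace the whole `if` with the statements of its `else` branch
        return node.orelse or [ast.Pass()]


tree = ast.parse(SOURCE)
tree = DropAsyncBranch(async_marker_lines(SOURCE)).visit(tree)
print(ast.unparse(tree))
```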
<p>This pattern is also efficient, because the Python compiler is able to
recognise that <tt class="docutils literal">if True</tt> will always take the first branch, so it will
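discard the test and the code in the <tt class="docutils literal">else</tt> branch. The <a class="reference external" href="https://docs.python.org/3/library/dis.html">dis</a> (disassembler)
module shows no jump and no reference to the <tt class="docutils literal">bar()</tt> function call (the listing below comes from a Python version before 3.11; newer versions emit different opcodes, such as <tt class="docutils literal">CALL</tt> instead of <tt class="docutils literal">CALL_FUNCTION</tt>):</p>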
<pre class="literal-block">
$ python -m dis ast3.py
  1           0 NOP
  2           2 LOAD_NAME                0 (foo)
              4 CALL_FUNCTION            0
              6 POP_TOP
              8 LOAD_CONST               1 (None)
             10 RETURN_VALUE
</pre>
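<p>The same check can be automated, for instance in a test, by inspecting the compiled bytecode with <tt class="docutils literal">dis.get_instructions()</tt>. A minimal sketch, assuming a CPython version that eliminates the dead branch as described above:</p>

```python
import dis

SOURCE = """\
if True:  # ASYNC
    foo()
else:
    bar()
"""

code = compile(SOURCE, "ast3.py", "exec")
# Collect the arguments of every instruction (names, constants, offsets...)
referenced = [ins.argval for ins in dis.get_instructions(code)]
print("foo" in referenced, "bar" in referenced)  # True False
dis.dis(code)  # the exact listing depends on the Python version
```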
</div>
<div class="section" id="conversion-methodology">
<h2>Conversion methodology</h2>
<p>Once we have our conversion script, how do we use it to actually convert the
code base, making sure to not break it? The process, for us, was
iterative: going module by module and adding features to the script until
all the &quot;duplicated&quot; modules were complete.</p>
<p>For each module to be converted, the procedure was roughly as follows.</p>
<p>First step: refactor the code, without changing any behaviour, to make the
async module as similar as possible to the sync module. This might mean
reorganising some code, renaming some variables, swapping the order of some
function definitions, rediscovering some forgotten skeletons and giving them a
proper burial.</p>
<p>Often we had implemented some non-I/O-related helper function on the
sync side and imported it on the async side:</p>
<pre class="code python literal-block">
<span class="c1"># connection.py</span><span class="w">
</span><span class="k">def</span><span class="w"> </span><span class="nf">clean_up_conninfo</span><span class="p">(</span><span class="n">conninfo</span><span class="p">):</span><span class="w">
</span> <span class="o">...</span> <span class="c1"># hack hack</span><span class="w">
</span> <span class="k">return</span> <span class="n">better_conninfo</span><span class="w">
</span><span class="k">def</span><span class="w"> </span><span class="nf">connect</span><span class="p">(</span><span class="n">conninfo</span><span class="p">):</span><span class="w">
</span> <span class="n">better_conninfo</span> <span class="o">=</span> <span class="n">clean_up_conninfo</span><span class="p">(</span><span class="n">conninfo</span><span class="p">)</span><span class="w">
</span> <span class="n">conn</span> <span class="o">=</span> <span class="n">wait</span><span class="p">(</span><span class="n">connection_gen</span><span class="p">(</span><span class="n">better_conninfo</span><span class="p">))</span><span class="w">
</span> <span class="k">return</span> <span class="n">conn</span><span class="w">
</span><span class="c1"># connection_async.py</span><span class="w">
</span><span class="kn">from</span><span class="w"> </span><span class="nn">.connection</span><span class="w"> </span><span class="kn">import</span> <span class="n">clean_up_conninfo</span><span class="w">
</span><span class="k">async</span> <span class="k">def</span><span class="w"> </span><span class="nf">connect_async</span><span class="p">(</span><span class="n">conninfo</span><span class="p">):</span><span class="w">
</span> <span class="n">better_conninfo</span> <span class="o">=</span> <span class="n">clean_up_conninfo</span><span class="p">(</span><span class="n">conninfo</span><span class="p">)</span><span class="w">
</span> <span class="n">aconn</span> <span class="o">=</span> <span class="k">await</span> <span class="n">async_wait</span><span class="p">(</span><span class="n">connection_gen</span><span class="p">(</span><span class="n">better_conninfo</span><span class="p">))</span><span class="w">
</span> <span class="k">return</span> <span class="n">aconn</span>
</pre>
<p>In this case we would move the shared functionality into a separate
internal module and import the function on both sides:</p>
<pre class="code python literal-block">
<span class="c1"># _connection.py</span><span class="w">
</span><span class="k">def</span><span class="w"> </span><span class="nf">clean_up_conninfo</span><span class="p">(</span><span class="n">conninfo</span><span class="p">):</span><span class="w">
</span> <span class="o">...</span> <span class="c1"># hack hack</span><span class="w">
</span> <span class="k">return</span> <span class="n">better_conninfo</span><span class="w">
</span><span class="c1"># connection.py</span><span class="w">
</span><span class="kn">from</span><span class="w"> </span><span class="nn">._connection</span><span class="w"> </span><span class="kn">import</span> <span class="n">clean_up_conninfo</span><span class="w">
</span><span class="k">def</span><span class="w"> </span><span class="nf">connect</span><span class="p">(</span><span class="n">conninfo</span><span class="p">):</span><span class="w">
</span> <span class="n">better_conninfo</span> <span class="o">=</span> <span class="n">clean_up_conninfo</span><span class="p">(</span><span class="n">conninfo</span><span class="p">)</span><span class="w">
</span> <span class="n">conn</span> <span class="o">=</span> <span class="n">wait</span><span class="p">(</span><span class="n">connection_gen</span><span class="p">(</span><span class="n">better_conninfo</span><span class="p">))</span><span class="w">
</span> <span class="k">return</span> <span class="n">conn</span><span class="w">
</span><span class="c1"># connection_async.py</span><span class="w">
</span><span class="kn">from</span><span class="w"> </span><span class="nn">._connection</span><span class="w"> </span><span class="kn">import</span> <span class="n">clean_up_conninfo</span><span class="w">
</span><span class="k">async</span> <span class="k">def</span><span class="w"> </span><span class="nf">connect_async</span><span class="p">(</span><span class="n">conninfo</span><span class="p">):</span><span class="w">
</span> <span class="n">better_conninfo</span> <span class="o">=</span> <span class="n">clean_up_conninfo</span><span class="p">(</span><span class="n">conninfo</span><span class="p">)</span><span class="w">
</span> <span class="n">aconn</span> <span class="o">=</span> <span class="k">await</span> <span class="n">async_wait</span><span class="p">(</span><span class="n">connection_gen</span><span class="p">(</span><span class="n">better_conninfo</span><span class="p">))</span><span class="w">
</span> <span class="k">return</span> <span class="n">aconn</span>
</pre>
<p>Now that the two modules are more similar we can run the test suite to verify
that the library still works and can commit the current state to git.</p>
<p>Second step: run an async -&gt; sync conversion with the current version of the
script. Even running a no-op script is useful: it produces changes that can be
easily seen with <tt class="docutils literal">git diff</tt>, suggesting which conversion feature is missing,
or what further cleanup we could do in the code to make the sync and
async flavours more similar.</p>
<p>For example, a no-op script, which just copies the async side to the sync
side, would show up in <tt class="docutils literal">git diff</tt> as:</p>
<pre class="code diff literal-block">
<span class="gu">&#64;&#64; -1,6 +1,6 &#64;&#64;</span><span class="w">
</span>from ._connection import clean_up_conninfo<span class="w">
</span><span class="gd">-def connect(conninfo):</span><span class="w">
</span><span class="gi">+async def connect_async(conninfo):</span><span class="w">
</span> better_conninfo = clean_up_conninfo(conninfo)<span class="w">
</span><span class="gd">- conn = wait(connection_gen(better_conninfo))</span><span class="w">
</span><span class="gd">- return conn</span><span class="w">
</span><span class="gi">+ aconn = await async_wait(connection_gen(better_conninfo))</span><span class="w">
</span><span class="gi">+ return aconn</span>
</pre>
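<p>The article relies on <tt class="docutils literal">git diff</tt> to spot the differences. Purely as an illustration of this convert-and-compare loop, here is a self-contained sketch using <tt class="docutils literal">difflib</tt> instead of git; <tt class="docutils literal">noop_convert()</tt> and <tt class="docutils literal">conversion_diff()</tt> are hypothetical names:</p>

```python
import difflib


def noop_convert(async_source: str) -> str:
    # Step zero of the converter: copy the async code unchanged.
    return async_source


def conversion_diff(sync_source: str, generated: str, filename: str) -> str:
    """Return a git-style unified diff: handwritten sync vs generated code."""
    return "".join(
        difflib.unified_diff(
            sync_source.splitlines(keepends=True),
            generated.splitlines(keepends=True),
            fromfile=f"a/{filename}",
            tofile=f"b/{filename}",
        )
    )
```

<p>An empty diff means that the generated code matches the handwritten one and the module is done.</p>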
<p>The first feature to add to the conversion script is the removal of the <tt class="docutils literal">async</tt>
and <tt class="docutils literal">await</tt> keywords. Running the conversion and diffing again shows:</p>
<pre class="code diff literal-block">
<span class="gu">&#64;&#64; -1,6 +1,6 &#64;&#64;</span><span class="w">
</span>from ._connection import clean_up_conninfo<span class="w">
</span><span class="gd">-def connect(conninfo):</span><span class="w">
</span><span class="gi">+def connect_async(conninfo):</span><span class="w">
</span> better_conninfo = clean_up_conninfo(conninfo)<span class="w">
</span><span class="gd">- conn = wait(connection_gen(better_conninfo))</span><span class="w">
</span><span class="gd">- return conn</span><span class="w">
</span><span class="gi">+ aconn = async_wait(connection_gen(better_conninfo))</span><span class="w">
</span><span class="gi">+ return aconn</span>
</pre>
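<p>A minimal sketch of this first feature, assuming the standard library <tt class="docutils literal">ast</tt> module (Python 3.9+): a hypothetical <tt class="docutils literal">RemoveAsync</tt> transformer turns <tt class="docutils literal">async def</tt>, <tt class="docutils literal">async for</tt> and <tt class="docutils literal">async with</tt> nodes into their sync counterparts and replaces each <tt class="docutils literal">await expr</tt> with <tt class="docutils literal">expr</tt>. Async comprehensions and other corner cases are not handled:</p>

```python
import ast


def _as_sync(node: ast.AST, sync_class: type) -> ast.AST:
    # AsyncFunctionDef/AsyncFor/AsyncWith have the same fields as their
    # sync counterparts, so the children can be moved across unchanged.
    new_node = sync_class(**{name: getattr(node, name) for name in node._fields})
    return ast.copy_location(new_node, node)


class RemoveAsync(ast.NodeTransformer):
    def visit_AsyncFunctionDef(self, node: ast.AsyncFunctionDef) -> ast.AST:
        self.generic_visit(node)
        return _as_sync(node, ast.FunctionDef)

    def visit_AsyncFor(self, node: ast.AsyncFor) -> ast.AST:
        self.generic_visit(node)
        return _as_sync(node, ast.For)

    def visit_AsyncWith(self, node: ast.AsyncWith) -> ast.AST:
        self.generic_visit(node)
        return _as_sync(node, ast.With)

    def visit_Await(self, node: ast.Await) -> ast.AST:
        # `await expr` becomes just `expr`.
        self.generic_visit(node)
        return node.value


def remove_async(source: str) -> str:
    return ast.unparse(RemoveAsync().visit(ast.parse(source)))
```

<p>Applied to <tt class="docutils literal">connect_async()</tt>, it produces the code on the <tt class="docutils literal">+</tt> side of the diff above.</p>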
<p>The next step is some renaming. If <tt class="docutils literal">connect()</tt> and <tt class="docutils literal">connect_async()</tt>
are public functions, we don't want to change their names, so the script needs
a name mapping to convert:</p>
<ul class="simple">
<li><tt class="docutils literal">connect_async</tt> -&gt; <tt class="docutils literal">connect</tt></li>
<li><tt class="docutils literal">async_wait</tt> -&gt; <tt class="docutils literal">wait</tt></li>
</ul>
<p>Implementing this renaming in the AST would bring us to the diff:</p>
<pre class="code diff literal-block">
<span class="gu">&#64;&#64; -2,5 +2,5 &#64;&#64;</span><span class="w">
</span>def connect(conninfo):<span class="w">
</span> better_conninfo = clean_up_conninfo(conninfo)<span class="w">
</span><span class="gd">- conn = wait(connection_gen(better_conninfo))</span><span class="w">
</span><span class="gd">- return conn</span><span class="w">
</span><span class="gi">+ aconn = wait(connection_gen(better_conninfo))</span><span class="w">
</span><span class="gi">+ return aconn</span>
</pre>
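<p>The renaming can be sketched as another transformer applying a mapping to names, attributes, function definitions and imported names. The <tt class="docutils literal">RENAMES</tt> dictionary, the <tt class="docutils literal">RenameNames</tt> class and the <tt class="docutils literal">.waiting</tt> module used in the usage example are hypothetical; note that such a blunt approach renames every occurrence of a name, whatever object it refers to:</p>

```python
import ast

# Hypothetical mapping: async name -> sync name.
RENAMES = {"connect_async": "connect", "async_wait": "wait"}


class RenameNames(ast.NodeTransformer):
    def __init__(self, mapping: dict[str, str]) -> None:
        self.mapping = mapping

    def _rename(self, name: str) -> str:
        return self.mapping.get(name, name)

    def visit_Name(self, node: ast.Name) -> ast.Name:
        node.id = self._rename(node.id)
        return node

    def visit_Attribute(self, node: ast.Attribute) -> ast.Attribute:
        # e.g. `module.connect_async` -> `module.connect`
        self.generic_visit(node)
        node.attr = self._rename(node.attr)
        return node

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        self.generic_visit(node)
        node.name = self._rename(node.name)
        return node

    def visit_alias(self, node: ast.alias) -> ast.alias:
        # e.g. `from .waiting import async_wait` -> `... import wait`
        node.name = self._rename(node.name)
        return node


def rename(source: str, mapping: dict[str, str] = RENAMES) -> str:
    return ast.unparse(RenameNames(mapping).visit(ast.parse(source)))
```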
<p>We are getting there. This remaining <tt class="docutils literal">aconn</tt>/<tt class="docutils literal">conn</tt> is actually a
gratuitous difference: we can change the async side and call the local
variable <tt class="docutils literal">conn</tt> without losing readability and obviously without changing
any behaviour.</p>
<p>Committing the change on the async side and re-running the conversion would
show no more differences on the sync side. At this point we can commit the
whole project (any remaining but acceptable change on the sync side, the new
features added to the conversion script, new entries in the renaming
mapping...), run the tests to verify that no regression has been introduced,
and move on to the next module.</p>
<p>This operation, in Psycopg 3, started at commit <a class="reference external" href="https://github.com/psycopg/psycopg/commit/765f663f171bf5d5e4862d5c4a5d572b7e3227d8">765f663f</a> and can be seen in
the git history as a parallel branch that was eventually merged in <a class="reference external" href="https://github.com/psycopg/psycopg/commit/8bb0f9bfef945861e8f671fba9073b3fae45c67f">8bb0f9bf</a>.
The <tt class="docutils literal">diff <span class="pre">--stat</span></tt> shows a whopping:</p>
<pre class="literal-block">
99 files changed, 9697 insertions(+), 8486 deletions(-)
</pre>
<p>which is obviously a monster changeset, but it mostly consists of incremental
refactorings, conversions, and new ways found to minimise the differences. It
could be an interesting ride to follow if you have a project where you need to
introduce a similar automatic conversion.</p>
</div>
<div class="section" id="the-final-result">
<h2>The final result</h2>
<p>Here is the <a class="reference external" href="https://github.com/psycopg/psycopg/blob/3.2.0/tools/async_to_sync.py">Psycopg 3 async to sync conversion script</a> (as of the <a class="reference external" href="https://www.psycopg.org/articles/2024/06/30/psycopg-32-released/">Psycopg
3.2 release</a>). At the time of writing, it processes 27 files and
automatically generates about 25% of the codebase. Some of the features it
boasts:</p>
<ul>
<li><p class="first">the AST transformations described above, including tricks such as recursing
into strings that contain code to transform (for instance, Mypy annotations
expressed as strings), and adjustments to the output and the comments to make
the resulting unparsed code almost as good as the handwritten side;</p>
</li>
<li><p class="first">it inserts non-essential whitespace and runs <a class="reference external" href="https://black.readthedocs.io/">black</a> on the output, in
order to make the resulting code as uniform as possible with the original and
as easy for humans to work with (to read, debug, diff, etc.);</p>
</li>
<li><p class="first">since different Python versions may generate different ASTs and different
output code, it can run in a Docker container, whose image is created on the
fly from the Python image of the reference version;</p>
</li>
<li><p class="first">it adds a useful disclaimer to the top of the file:</p>
<pre class="code python literal-block">
<span class="c1"># WARNING: this file is auto-generated by 'async_to_sync.py'</span><span class="w">
</span><span class="c1"># from the original file 'connection_async.py'</span><span class="w">
</span><span class="c1"># DO NOT CHANGE! Change the original file instead.</span>
</pre>
</li>
<li><p class="first">it has a &quot;check&quot; mode that runs in Github Action upon every commit, as part
of the lint step, and will fail if it finds any files to convert that haven't
been committed;</p>
</li>
<li><p class="first">the check mode has its own check: if any file containing the above disclaimer
is not included in the list of files to be converted, it throws an
error (because it means that a converted file has not been added to the
automatic conversion list);</p>
</li>
<li><p class="first">the check of the check also has its own check! If no file with the
disclaimer is found then it means that something is wrong... Maybe the
disclaimer has been rewritten and the check doesn't work anymore;</p>
</li>
<li><p class="first">it can run in parallel and only on the files that have changed. Almost as
good as <tt class="docutils literal">make</tt> (but for certain tasks it is useful to have all the input
files at once, therefore, &quot;better than <tt class="docutils literal">make</tt>&quot;).</p>
</li>
</ul>
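<p>The two outer layers of checks in the list above can be sketched roughly as follows. This is a hypothetical helper, not the script's code: it assumes that generated files can be recognised by the first line of the disclaimer, and that the conversion list is available as a set of output paths relative to the project root:</p>

```python
from __future__ import annotations

from pathlib import Path

# First line of the disclaimer shown above.
DISCLAIMER = "# WARNING: this file is auto-generated by 'async_to_sync.py'"


def check_generated_files(root: Path, outputs: set[Path]) -> list[str]:
    """Return the problems found; an empty list means all is fine.

    `outputs` holds the sync files, relative to `root`, that the
    conversion list is expected to generate.
    """
    generated = {
        path.relative_to(root)
        for path in root.rglob("*.py")
        if DISCLAIMER in path.read_text(encoding="utf-8")
    }
    if not generated:
        # The check of the check: finding no disclaimer at all is suspicious.
        return ["no auto-generated file found: has the disclaimer changed?"]
    # The check: every generated file must be in the conversion list.
    return [
        f"{path}: auto-generated but missing from the conversion list"
        for path in sorted(generated - outputs)
    ]
```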
<p>The code is specific to the Psycopg 3 codebase and formatting style, so it's
probably not ready to be used as it is in other projects. But it is probably a
good starting point for your own conversion: change the list of files to
process, the name mapping, and you should be good to start.</p>
<p>Hope this helps. Happy hacking!</p>
</div>
</content></entry><entry><title>Psycopg 3.2 released</title><link href="https://www.psycopg.org/articles/2024/06/30/psycopg-32-released/" rel="alternate"></link><updated>2024-06-30T00:00:00Z</updated><author><name>Daniele Varrazzo</name></author><id>urn:uuid:31cb8a57-f930-3fb0-82c6-4067eb76bd76</id><content type="html"><p>It was quite the ride! But we made it!</p>
<p>After almost two years, 846 commits, more than 700 new tests, more than 20000
changes in 310 files (I didn't even realise that there were 310 files in this
project...) Psycopg 3.2 has been released!</p>
<p>This release brings a few new features and hopefully no meaningful
backward-incompatible changes. The whole list of changes is available <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/news.html#psycopg-3-2">in the changelog</a>; here
are some of the major points, explained.</p>
<!-- CUT-HERE -->
<div class="section" id="numpy-scalars-support">
<h2>Numpy scalars support</h2>
<p>In many scientific applications, <a class="reference external" href="https://numpy.org/doc/stable/reference/arrays.scalars.html#built-in-scalar-types">Numpy scalars</a>
are widely used, either by themselves or in conjunction with regular Python
values. However, there was no support for storing them in the database, and
they had to be converted to normal Python values first. Starting from Psycopg
3.2, Numpy scalars can be stored automatically and efficiently.</p>
<p>A natural extension would be to convert between Numpy and PostgreSQL arrays
too. However, there hasn't been much demand for the feature so far, so it's
currently <a class="reference external" href="https://github.com/psycopg/psycopg/issues/336">on the back burner</a>, but it can be implemented if
there is interest.</p>
</div>
<div class="section" id="postgresql-parameters">
<h2>PostgreSQL parameters</h2>
<p>Psycopg uses placeholders such as <tt class="docutils literal">%s</tt> and <tt class="docutils literal">%(name)s</tt> to <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/basic/params.html">pass values to
queries</a>. These
formats are familiar to Python developers, but they are quite foreign to the
PostgreSQL environment because, natively, <a class="reference external" href="https://www.postgresql.org/docs/current/libpq-exec.html#LIBPQ-PQEXECPARAMS">PostgreSQL uses a number-based
placeholder format</a>
(such as <tt class="docutils literal">$1</tt>, <tt class="docutils literal">$2</tt>...). Internally, Psycopg converts the first format
into the second.</p>
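<p>As a rough idea of what that conversion involves, here is a simplified, hypothetical <tt class="docutils literal">to_numbered()</tt> function translating <tt class="docutils literal">%s</tt> and <tt class="docutils literal">%(name)s</tt> placeholders into <tt class="docutils literal">$1</tt>, <tt class="docutils literal">$2</tt>... It is not Psycopg's implementation: for instance, it ignores the <tt class="docutils literal">%b</tt> and <tt class="docutils literal">%t</tt> placeholders that Psycopg also accepts:</p>

```python
from __future__ import annotations

import re

# "%%" (a literal percent sign), "%(name)s" or "%s".
_PLACEHOLDER = re.compile(r"%%|%\((?P<name>[^)]+)\)s|%s")


def to_numbered(query: str) -> tuple[str, list[int | str]]:
    """Convert %s / %(name)s placeholders into $1, $2...

    Return the new query and, for each $n, the key of its parameter:
    an index into a sequence or a name in a mapping.
    """
    keys: list[int | str] = []

    def replace(match: re.Match) -> str:
        if match.group(0) == "%%":
            return "%"
        name = match.group("name")
        if name is None:
            keys.append(len(keys))  # positional: each %s gets a new $n
        elif name in keys:
            return f"${keys.index(name) + 1}"  # repeated name: reuse its $n
        else:
            keys.append(name)
        return f"${len(keys)}"

    query = _PLACEHOLDER.sub(replace, query)
    if len({type(key) for key in keys}) > 1:
        raise ValueError("positional and named placeholders can't be mixed")
    return query, keys
```

<p>The returned keys tell in which order to send the parameters: <tt class="docutils literal">[params[key] for key in keys]</tt> works both for a sequence and for a mapping of parameters. A named parameter used twice maps to the same <tt class="docutils literal">$n</tt>, just as the raw query in the example below reuses <tt class="docutils literal">$1</tt>.</p>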
<p>It is now possible to execute queries using the PostgreSQL format by using the
<a class="reference external" href="https://www.psycopg.org/psycopg3/docs/advanced/cursors.html#raw-query-cursors">raw query cursors</a>,
which should feel more familiar to PostgreSQL developers and may lower the
barrier to converting programs with large bodies of native queries to Python
(the PostgreSQL test suite, maybe?)</p>
<pre class="code python literal-block">
<span class="n">cur</span> <span class="o">=</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">RawCursor</span><span class="p">(</span><span class="n">conn</span><span class="p">)</span><span class="w">
</span><span class="n">cur</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">&quot;SELECT ($1 + $2) * $1&quot;</span><span class="p">,</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span><span class="o">.</span><span class="n">fetchone</span><span class="p">()</span><span class="w">
</span><span class="p">(</span><span class="mi">24</span><span class="p">,)</span>
</pre>
</div>
<div class="section" id="scalar-row-factory">
<h2>Scalar row factory</h2>
<p>The example above shows a pretty common annoyance. How many times have you
needed a single value from the database, only to be returned a tuple?</p>
<p>Psycopg normally emits records as Python tuples; the behaviour can be
customized to return named tuples, dictionaries, or entirely custom objects
with the use of <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/advanced/rows.html">row factories</a>.</p>
<p>In the frequent case of a query returning a single value, the new <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/rows.html#psycopg.rows.scalar_row">scalar_row</a>
factory will return only that:</p>
<pre class="code python literal-block">
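<span class="kn">from</span><span class="w"> </span><span class="nn">psycopg.rows</span><span class="w"> </span><span class="kn">import</span> <span class="n">scalar_row</span><span class="w">
</span><span class="n">cur</span> <span class="o">=</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">RawCursor</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="n">row_factory</span><span class="o">=</span><span class="n">scalar_row</span><span class="p">)</span><span class="w">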
</span><span class="n">cur</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">&quot;SELECT ($1 + $2) * $1&quot;</span><span class="p">,</span> <span class="p">[</span><span class="mi">3</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span><span class="o">.</span><span class="n">fetchone</span><span class="p">()</span><span class="w">
</span><span class="mi">24</span>
</pre>
<p>This is not a feature of <tt class="docutils literal">RawCursor</tt> only: it works independently of the
choice of the cursor class. We just needed to fix the example above!</p>
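<p>Row factories follow a simple protocol: a callable receiving the cursor and returning another callable, which turns the values of each record into a row. As a sketch of that protocol only (not Psycopg's implementation of <tt class="docutils literal">scalar_row</tt>), a hypothetical hand-written equivalent could look like:</p>

```python
from typing import Any, Callable, Sequence


def first_column(cursor: Any) -> Callable[[Sequence[Any]], Any]:
    """Hypothetical row factory returning only the first column of a record."""
    description = cursor.description  # None if there is no result
    if description is not None and len(description) == 0:
        raise ValueError("at least one column expected")

    # The "row maker": called with the values of each record.
    def make_row(values: Sequence[Any]) -> Any:
        return values[0]

    return make_row
```

<p>Such a function could be passed as <tt class="docutils literal">row_factory</tt>, for instance to <tt class="docutils literal">conn.cursor()</tt>. The factory may also be invoked for commands returning no result, which is why a missing <tt class="docutils literal">description</tt> is not treated as an error here.</p>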
</div>
<div class="section" id="libpq-17-features">
<h2>Libpq 17 features</h2>
<p>In the upcoming PostgreSQL 17 release, the libpq (the PostgreSQL client
library used internally by Psycopg) has seen unusually intense activity,
with the introduction of <a class="reference external" href="https://www.postgresql.org/docs/17/release-17.html#RELEASE-17-LIBPQ">several new features</a>.</p>
<p>Our friend Denis Laxalde has been quick to build features and improvements on
top of these new functionalities. So, when Psycopg is used with libpq 17, it
can benefit from features such as:</p>
<ul class="simple">
<li><a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/connections.html#psycopg.Connection.cancel_safe">asynchronous, safe cancellation</a></li>
<li><a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/cursors.html#psycopg.Cursor.stream">chunked stream results</a></li>
<li><a class="reference external" href="https://www.psycopg.org/psycopg3/docs/advanced/prepare.html#pgbouncer">better interaction with PgBouncer</a></li>
</ul>
<p>A new <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/objects.html#psycopg.Capabilities">capabilities object</a>
can help to navigate the differences and to write programs either degrading
gracefully or crashing helpfully if the libpq used doesn't offer a requested
functionality.</p>
</div>
<div class="section" id="easier-interaction-with-notifications">
<h2>Easier interaction with notifications</h2>
<p>Psycopg 3 introduced a <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/advanced/async.html#asynchronous-notifications">notifications generator</a>
to receive asynchronous notifications from the database. However, the generator
turned out to be... difficult to stop! It could be stopped upon receiving a
specific notification as a message, but, because of Python quirks, not easily
from the rest of the program.</p>
<pre class="code python literal-block">
<span class="kn">import</span><span class="w"> </span><span class="nn">psycopg</span><span class="w">
</span><span class="n">conn</span> <span class="o">=</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="s2">&quot;&quot;</span><span class="p">,</span> <span class="n">autocommit</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="w">
</span><span class="n">conn</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">&quot;LISTEN mychan&quot;</span><span class="p">)</span><span class="w">
</span><span class="n">gen</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">notifies</span><span class="p">()</span><span class="w">
</span><span class="k">for</span> <span class="n">notify</span> <span class="ow">in</span> <span class="n">gen</span><span class="p">:</span><span class="w">
</span> <span class="nb">print</span><span class="p">(</span><span class="n">notify</span><span class="p">)</span><span class="w">
</span> <span class="c1"># ehm... please kill me!</span>
</pre>
<p>The new <tt class="docutils literal">timeout</tt> and <tt class="docutils literal">stop_after</tt> parameters allow better control of a
notification-listening task (often a component of a larger application), for
instance to kindly tell it that its services are no longer required, without
having to kill the whole program!</p>
</div>
<div class="section" id="less-work-for-us">
<h2>Less work for us!</h2>
<p>An interesting internal change has helped us to reduce the amount of code to
write and maintain.</p>
<p>All the Psycopg objects interacting with the network come in two flavours: one
implementing &quot;classic&quot; blocking methods (with which concurrency in a process
can be implemented via multi-threading) and one implementing <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/advanced/async.html#asynchronous-operations">asynchronous
methods</a>
to participate in <a class="reference external" href="https://docs.python.org/3/library/asyncio.html">collaborative concurrency</a>.</p>
<p>Thanks to an early design choice, all the libpq I/O interaction only happens
via asynchronous functions and is shared by both the sync and the async
objects; however the code implementing the outermost objects and highest level
behaviour had to be almost entirely duplicated, with the same features
implemented almost identically with and without async/await keywords, bugs to
be tested and fixed on two sides...</p>
<p>We have therefore developed an <a class="reference external" href="https://github.com/psycopg/psycopg/blob/3.2.0/tools/async_to_sync.py">async_to_sync conversion tool</a> to
generate the synchronous code starting from the AST of the asynchronous
counterpart. As a result, about 20-25% of the codebase is now automatically
generated and doesn't require specific maintenance. The process of converting
the sync side from hand-written to auto-generated has also highlighted subtle
differences between async and sync behaviours, which have been addressed. The
conversion covers the tests too.</p>
<p>The technique could be useful for other projects maintaining both sync and
async code, and is interesting enough to deserve an article of its
own...</p>
</div>
<div class="section" id="we-need-your-help">
<h2>We need your help!</h2>
<p>Psycopg, first v2, now v3, is the de-facto standard for the communication
between Python and PostgreSQL, two major components of innumerable businesses
and mission-critical infrastructures.</p>
<p>Maintaining such a critical library to the highest standard of reliability,
completeness and performance requires a lot of care and work.</p>
<p>If you are a Python and PostgreSQL user and would like to make sure that the
interface between the two is well maintained and continuously improved, please
consider <a class="reference external" href="https://github.com/sponsors/dvarrazzo">sponsoring the project</a>
and becoming one of <a class="reference external" href="https://www.psycopg.org/sponsors/">our sponsors</a> 💜</p>
<p>Thank you very much, happy hacking!</p>
</div>
</content></entry><entry><title>Pipeline mode in Psycopg</title><link href="https://www.psycopg.org/articles/2024/05/08/psycopg3-pipeline-mode/" rel="alternate"></link><updated>2024-05-08T00:00:00Z</updated><author><name>Denis Laxalde</name></author><id>urn:uuid:cc006cca-31a0-34b1-a369-6a1f6b9d3fbd</id><content type="html"><p><a class="reference external" href="https://www.psycopg.org/articles/2022/08/30/psycopg-31-released/">Version 3.1</a> of Psycopg added support for <a class="reference external" href="https://www.postgresql.org/docs/current/libpq-pipeline-mode.html">libpq pipeline mode</a>, bringing
a significant performance boost, especially when network latency is high.
In this article, we’ll briefly describe how it works from users’ perspective
and <em>under the hood</em> while also providing a few implementation details.</p>
<!-- CUT-HERE -->
<p>Supporting <a class="reference external" href="https://www.postgresql.org/docs/current/libpq-pipeline-mode.html">libpq pipeline mode</a> involved significant changes to the query
processing logic in the driver. Yet, the challenge was to make it compatible
with the “normal” query mode in order to keep the API almost unchanged and
thus bring performance benefits to users without exposing the complexity of
the batch query mode.</p>
<p>For the impatient, head over to the <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/advanced/pipeline.html">pipeline mode</a> documentation of
Psycopg: it’s self-contained and nicely explains the details of
client/server communication, as well as how things work from the user’s
perspective.</p>
<div class="section" id="using-the-pipeline-mode-in-psycopg">
<h2>Using the pipeline mode in Psycopg</h2>
<p><tt class="docutils literal">Connection</tt> objects gained a <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/connections.html#psycopg.Connection.pipeline">pipeline()</a> method to enable the
pipeline mode through a context manager (<a class="reference external" href="https://docs.python.org/3/reference/datamodel.html#context-managers">&quot;with&quot; statement</a>); so
using it is as simple as:</p>
<pre class="code python literal-block">
<span class="kn">import</span><span class="w"> </span><span class="nn">psycopg</span><span class="w">
</span><span class="n">conn</span> <span class="o">=</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">connect</span><span class="p">()</span><span class="w">
</span><span class="k">with</span> <span class="n">conn</span><span class="o">.</span><span class="n">pipeline</span><span class="p">():</span><span class="w">
</span>    <span class="o">...</span>  <span class="c1"># do work</span>
</pre>
</div>
<div class="section" id="what-is-the-pipeline-mode-for">
<h2>What is the pipeline mode for?</h2>
<p><a class="reference external" href="https://www.postgresql.org/docs/14/libpq-pipeline-mode.html#LIBPQ-PIPELINE-TIPS">Postgres documentation</a> contains advice on when the pipeline mode is
useful. One particular case is when the application is doing many write
operations (<tt class="docutils literal">INSERT</tt>, <tt class="docutils literal">UPDATE</tt>, <tt class="docutils literal">DELETE</tt>).</p>
<p>For instance, let’s consider the following schema:</p>
<pre class="code sql literal-block">
<span class="k">CREATE</span><span class="w"> </span><span class="k">TABLE</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="nb">numeric</span><span class="p">,</span><span class="w"> </span><span class="n">d</span><span class="w"> </span><span class="k">timestamp</span><span class="p">,</span><span class="w"> </span><span class="n">p</span><span class="w"> </span><span class="nb">boolean</span><span class="p">)</span>
</pre>
<p>and assume an application does a lot of queries like:</p>
<pre class="code sql literal-block">
<span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">t</span><span class="w"> </span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">d</span><span class="p">,</span><span class="w"> </span><span class="n">p</span><span class="p">)</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="o">%</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="o">%</span><span class="n">s</span><span class="p">,</span><span class="w"> </span><span class="o">%</span><span class="n">s</span><span class="p">)</span>
</pre>
<p>with distinct values. Maybe the application could make use of batch inserts
such as <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/cursors.html#psycopg.Cursor.executemany">executemany()</a>, maybe not (e.g.&nbsp;because it needs to do some other
operations between inserts, like querying another resource): this does not
matter much.</p>
<p>Let’s put this together into a little <tt class="docutils literal">demo.py</tt> Python program:</p>
<pre class="code python literal-block">
<span class="kn">import</span><span class="w"> </span><span class="nn">math</span><span class="w">
</span><span class="kn">import</span><span class="w"> </span><span class="nn">sys</span><span class="w">
</span><span class="kn">from</span><span class="w"> </span><span class="nn">datetime</span><span class="w"> </span><span class="kn">import</span> <span class="n">datetime</span><span class="w">
</span><span class="kn">import</span><span class="w"> </span><span class="nn">psycopg</span><span class="w">


</span><span class="k">def</span><span class="w"> </span><span class="nf">create_table</span><span class="p">(</span><span class="n">conn</span><span class="p">:</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">Connection</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span><span class="w">
</span>    <span class="n">conn</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">&quot;DROP TABLE IF EXISTS t&quot;</span><span class="p">)</span><span class="w">
</span>    <span class="n">conn</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">&quot;CREATE UNLOGGED TABLE t (x numeric, d timestamp, p boolean)&quot;</span><span class="p">)</span><span class="w">


</span><span class="k">def</span><span class="w"> </span><span class="nf">do_insert</span><span class="p">(</span><span class="n">conn</span><span class="p">:</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">Connection</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span> <span class="n">pipeline</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span> <span class="n">count</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span><span class="w">
</span>    <span class="n">query</span> <span class="o">=</span> <span class="s2">&quot;INSERT INTO t (x, d, p) VALUES (</span><span class="si">%s</span><span class="s2">, </span><span class="si">%s</span><span class="s2">, </span><span class="si">%s</span><span class="s2">)&quot;</span><span class="w">
</span>    <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">count</span><span class="p">):</span><span class="w">
</span>        <span class="n">params</span> <span class="o">=</span> <span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">factorial</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">(),</span> <span class="n">pipeline</span><span class="p">)</span><span class="w">
</span>        <span class="n">conn</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">prepare</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="w">


</span><span class="k">with</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">connect</span><span class="p">(</span><span class="n">autocommit</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span> <span class="k">as</span> <span class="n">conn</span><span class="p">:</span><span class="w">
</span>    <span class="n">create_table</span><span class="p">(</span><span class="n">conn</span><span class="p">)</span><span class="w">
</span>    <span class="k">if</span> <span class="s2">&quot;--pipeline&quot;</span> <span class="ow">in</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">:</span><span class="w">
</span>        <span class="k">with</span> <span class="n">conn</span><span class="o">.</span><span class="n">pipeline</span><span class="p">():</span><span class="w">
</span>            <span class="n">do_insert</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="n">pipeline</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span><span class="w">
</span>    <span class="k">else</span><span class="p">:</span><span class="w">
</span>        <span class="n">do_insert</span><span class="p">(</span><span class="n">conn</span><span class="p">,</span> <span class="n">pipeline</span><span class="o">=</span><span class="kc">False</span><span class="p">)</span><span class="w">
</span>    <span class="n">row_count</span> <span class="o">=</span> <span class="n">conn</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="s2">&quot;select count(*) from t&quot;</span><span class="p">)</span><span class="o">.</span><span class="n">fetchone</span><span class="p">()[</span><span class="mi">0</span><span class="p">]</span><span class="w">
</span>    <span class="nb">print</span><span class="p">(</span><span class="sa">f</span><span class="s2">&quot;→ </span><span class="si">{</span><span class="n">row_count</span><span class="si">}</span><span class="s2"> rows&quot;</span><span class="p">)</span>
</pre>
<p>We’ll run our script as <tt class="docutils literal">python demo.py <span class="pre">[--pipeline]</span></tt>, where the
<tt class="docutils literal"><span class="pre">--pipeline</span></tt> flag enables pipeline mode. Note that we
pass <tt class="docutils literal">prepare=True</tt> to <tt class="docutils literal">Connection.execute()</tt> so that the query is
prepared on the server right away (the protocol-level equivalent of a <tt class="docutils literal">PREPARE</tt> statement), as we emit the same query many times.</p>
<p>In general, each <tt class="docutils literal">INSERT</tt> query is fast to execute server-side.
Without pipeline mode, the client typically issues the
query and then waits for its result (even though the result is unused here): thus, the
client/server round-trip time is likely to be much larger than the
server-side execution time. With pipeline mode, we avoid most of
these round-trips.</p>
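<p>To put rough numbers on this, here is a back-of-the-envelope cost model (a sketch, not a benchmark). Without pipelining, every query pays one round-trip plus its execution time; with pipelining, we assume queries travel in batches and pay roughly one round-trip per batch. The function names, timings and batch size are hypothetical values chosen for illustration, not measurements of <tt class="docutils literal">demo.py</tt>, and the model ignores client-side work such as computing the parameters.</p>
<pre class="code python literal-block">
import math


def serial_cost(count: int, rtt: float, exec_time: float) -> float:
    """Each query waits for its result: one round-trip per query."""
    return count * (rtt + exec_time)


def pipelined_cost(count: int, rtt: float, exec_time: float, batch: int) -> float:
    """Queries are sent in batches: roughly one round-trip per batch.

    Transfer time and client-side work are ignored; the point is only
    to show how the round-trip term shrinks.
    """
    round_trips = math.ceil(count / batch)
    return round_trips * rtt + count * exec_time


# Hypothetical figures: 0.1 ms round-trip, 0.02 ms per INSERT, 300 queries per batch.
rtt, exec_time = 1e-4, 2e-5
print(f"serial:    {serial_cost(1000, rtt, exec_time) * 1000:.1f} ms")
print(f"pipelined: {pipelined_cost(1000, rtt, exec_time, batch=300) * 1000:.1f} ms")
</pre>
<p>The real gain is smaller than such a model suggests (we measure a 2x speed-up below) because the client also spends time building parameters and encoding messages, which pipelining does not remove.</p>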
</div>
<div class="section" id="interlude-tracing">
<h2>Interlude: tracing</h2>
<p>When working on optimizing client/server communication, it’s essential
to be able to monitor this communication at a reasonably <em>low level</em>.
From Psycopg’s perspective, the boundary is libpq. Fortunately, the
library provides a tracing mechanism through the <a class="reference external" href="https://www.postgresql.org/docs/14/libpq-control.html#LIBPQ-PQTRACE">PQtrace</a> function and
friends.</p>
<p>The output of this function looks like this (example taken from the
<a class="reference external" href="https://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/test/modules/libpq_pipeline/traces/prepared.trace;h=1a7de5c3e65e35da3f711e0eeea961cb0b77c5cd;hb=278273ccbad27a8834dfdf11895da9cd91de4114">PostgreSQL test suite</a>):</p>
<pre class="literal-block">
F 68 Parse &quot;select_one&quot; &quot;SELECT $1, '42', $1::numeric, interval '1 sec'&quot; 1 NNNN
F 16 Describe S &quot;select_one&quot;
F 4 Sync
B 4 ParseComplete
B 10 ParameterDescription 1 NNNN
B 113 RowDescription 4 &quot;?column?&quot; NNNN 0 NNNN 4 -1 0 &quot;?column?&quot; NNNN 0 NNNN 65535 -1 0 &quot;numeric&quot; NNNN 0 NNNN 65535 -1 0 &quot;interval&quot; NNNN 0 NNNN 16 -1 0
B 5 ReadyForQuery I
F 10 Query &quot;BEGIN&quot;
B 10 CommandComplete &quot;BEGIN&quot;
B 5 ReadyForQuery T
F 43 Query &quot;DECLARE cursor_one CURSOR FOR SELECT 1&quot;
B 19 CommandComplete &quot;DECLARE CURSOR&quot;
B 5 ReadyForQuery T
F 16 Describe P &quot;cursor_one&quot;
F 4 Sync
B 33 RowDescription 1 &quot;?column?&quot; NNNN 0 NNNN 4 -1 0
B 5 ReadyForQuery T
F 4 Terminate
</pre>
<p>Each row contains the “direction indicator” (<tt class="docutils literal">F</tt> for messages from
client to server or <tt class="docutils literal">B</tt> for messages from server to client), the
message length, the <a class="reference external" href="https://www.postgresql.org/docs/14/protocol-message-formats.html">message type</a>, and its content. This example shows
messages from the <a class="reference external" href="https://www.postgresql.org/docs/14/protocol-flow.html#PROTOCOL-FLOW-EXT-QUERY">Extended Query</a> protocol.</p>
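<p>Since every line follows the same layout, trace output is easy to post-process. Here is a small sketch that splits a line (with timestamps suppressed, as above) into its fields; <tt class="docutils literal">TraceMessage</tt> and <tt class="docutils literal">parse_trace_line()</tt> are hypothetical helpers of ours, not part of libpq or Psycopg.</p>
<pre class="code python literal-block">
from typing import NamedTuple


class TraceMessage(NamedTuple):
    direction: str  # "F" (client to server) or "B" (server to client)
    length: int     # message length, as printed in the trace
    msg_type: str   # protocol message type, e.g. "Parse" or "ReadyForQuery"
    content: str    # remaining message-specific fields (may be empty)


def parse_trace_line(line: str) -> TraceMessage:
    """Split a timestamp-free libpq trace line into its fields."""
    parts = line.strip().split(" ", 3)
    direction, length, msg_type = parts[0], int(parts[1]), parts[2]
    content = parts[3] if len(parts) == 4 else ""
    if direction not in ("F", "B"):
        raise ValueError(f"unexpected direction indicator: {direction!r}")
    return TraceMessage(direction, length, msg_type, content)


msg = parse_trace_line('F 16 Describe S "select_one"')
print(msg.direction, msg.length, msg.msg_type, msg.content)
</pre>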
<p>In Psycopg, we have access to the low-level <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/pq.html#psycopg.pq.PGconn">PGconn</a> object,
representing the libpq connection, through the <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/connections.html#psycopg.Connection.pgconn">Connection.pgconn</a>
attribute.</p>
<p>Here’s how to enable tracing to <tt class="docutils literal">stderr</tt> in our <tt class="docutils literal">demo.py</tt> program
above:</p>
<pre class="code python literal-block">
<span class="kn">from</span><span class="w"> </span><span class="nn">contextlib</span><span class="w"> </span><span class="kn">import</span> <span class="n">contextmanager</span><span class="w">
</span><span class="kn">from</span><span class="w"> </span><span class="nn">typing</span><span class="w"> </span><span class="kn">import</span> <span class="n">Iterator</span><span class="w">
</span><span class="kn">from</span><span class="w"> </span><span class="nn">psycopg</span><span class="w"> </span><span class="kn">import</span> <span class="n">pq</span><span class="w">
</span><span class="nd">&#64;contextmanager</span><span class="w">
</span><span class="k">def</span><span class="w"> </span><span class="nf">trace_to_stderr</span><span class="p">(</span><span class="n">conn</span><span class="p">:</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">Connection</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="n">Iterator</span><span class="p">[</span><span class="kc">None</span><span class="p">]:</span><span class="w">
</span>    <span class="sd">&quot;&quot;&quot;Enable tracing of the client/server communication to STDERR.&quot;&quot;&quot;</span><span class="w">
</span>    <span class="n">conn</span><span class="o">.</span><span class="n">pgconn</span><span class="o">.</span><span class="n">trace</span><span class="p">(</span><span class="n">sys</span><span class="o">.</span><span class="n">stderr</span><span class="o">.</span><span class="n">fileno</span><span class="p">())</span><span class="w">
</span>    <span class="n">conn</span><span class="o">.</span><span class="n">pgconn</span><span class="o">.</span><span class="n">set_trace_flags</span><span class="p">(</span><span class="n">pq</span><span class="o">.</span><span class="n">Trace</span><span class="o">.</span><span class="n">SUPPRESS_TIMESTAMPS</span> <span class="o">|</span> <span class="n">pq</span><span class="o">.</span><span class="n">Trace</span><span class="o">.</span><span class="n">REGRESS_MODE</span><span class="p">)</span><span class="w">
</span>    <span class="k">try</span><span class="p">:</span><span class="w">
</span>        <span class="k">yield</span><span class="w">
</span>    <span class="k">finally</span><span class="p">:</span><span class="w">
</span>        <span class="n">conn</span><span class="o">.</span><span class="n">pgconn</span><span class="o">.</span><span class="n">untrace</span><span class="p">()</span><span class="w">


</span><span class="k">def</span><span class="w"> </span><span class="nf">do_insert</span><span class="p">(</span><span class="n">conn</span><span class="p">:</span> <span class="n">psycopg</span><span class="o">.</span><span class="n">Connection</span><span class="p">,</span> <span class="o">*</span><span class="p">,</span> <span class="n">pipeline</span><span class="p">:</span> <span class="nb">bool</span><span class="p">,</span> <span class="n">count</span><span class="p">:</span> <span class="nb">int</span> <span class="o">=</span> <span class="mi">1000</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="kc">None</span><span class="p">:</span><span class="w">
</span>    <span class="c1"># ...</span><span class="w">
</span>    <span class="k">with</span> <span class="n">trace_to_stderr</span><span class="p">(</span><span class="n">conn</span><span class="p">):</span><span class="w">
</span>        <span class="k">for</span> <span class="n">n</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">count</span><span class="p">):</span><span class="w">
</span>            <span class="n">params</span> <span class="o">=</span> <span class="p">(</span><span class="n">math</span><span class="o">.</span><span class="n">factorial</span><span class="p">(</span><span class="n">n</span><span class="p">),</span> <span class="n">datetime</span><span class="o">.</span><span class="n">now</span><span class="p">(),</span> <span class="n">pipeline</span><span class="p">)</span><span class="w">
</span>            <span class="n">conn</span><span class="o">.</span><span class="n">execute</span><span class="p">(</span><span class="n">query</span><span class="p">,</span> <span class="n">params</span><span class="p">,</span> <span class="n">prepare</span><span class="o">=</span><span class="kc">True</span><span class="p">)</span>
</pre>
</div>
<div class="section" id="to-pipeline-or-not-to-pipeline">
<h2>To pipeline or not to pipeline</h2>
<p>If we run our demo script (without pipeline mode), we’ll typically get
the following output:</p>
<pre class="literal-block">
F 69 Parse &quot;_pg3_0&quot; &quot;INSERT INTO t (x, d, p) VALUES ($1, $2, $3)&quot; 3 NNNN NNNN NNNN
F 4 Sync
B 4 ParseComplete
B 5 ReadyForQuery I
F 49 Bind &quot;&quot; &quot;_pg3_0&quot; 3 1 1 1 3 2 '\x00\x01' 8 '\x00\x02\xffffff8b\xffffff8fp~WN' 1 '\x00' 1 0
F 6 Describe P &quot;&quot;
F 9 Execute &quot;&quot; 0
F 4 Sync
B 4 BindComplete
B 4 NoData
B 15 CommandComplete &quot;INSERT 0 1&quot;
B 5 ReadyForQuery I
F 49 Bind &quot;&quot; &quot;_pg3_0&quot; 3 1 1 1 3 2 '\x00\x01' 8 '\x00\x02\xffffff8b\xffffff8fp~^\xffffff80' 1 '\x00' 1 0
F 6 Describe P &quot;&quot;
F 9 Execute &quot;&quot; 0
F 4 Sync
B 4 BindComplete
B 4 NoData
B 15 CommandComplete &quot;INSERT 0 1&quot;
B 5 ReadyForQuery I
[ ... and so forth ~1000 more times ... ]
</pre>
<p>We can indeed see the client/server <em>round-trips</em>: for each query, a sequence
of <tt class="docutils literal">F</tt> messages is followed by a sequence of <tt class="docutils literal">B</tt> messages.</p>
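<p>Counting round-trips in a trace can be automated: each time the direction switches from <tt class="docutils literal">F</tt> to <tt class="docutils literal">B</tt>, the client has stopped sending and is waiting for the server. The <tt class="docutils literal">count_round_trips()</tt> helper is a hypothetical sketch, run here on an abridged version of the non-pipelined trace (the <tt class="docutils literal">Bind</tt> line is left out).</p>
<pre class="code python literal-block">
from typing import Iterable, Optional


def count_round_trips(lines: Iterable[str]) -> int:
    """Count F-to-B transitions in a libpq trace (one per client wait)."""
    round_trips = 0
    previous: Optional[str] = None
    for line in lines:
        line = line.strip()
        if not line or line[0] not in "FB":
            continue  # skip blank lines and anything that is not a message
        direction = line[0]
        if previous == "F" and direction == "B":
            round_trips += 1
        previous = direction
    return round_trips


trace = """\
F 69 Parse "_pg3_0" "INSERT INTO t (x, d, p) VALUES ($1, $2, $3)" 3 NNNN NNNN NNNN
F 4 Sync
B 4 ParseComplete
B 5 ReadyForQuery I
F 6 Describe P ""
F 9 Execute "" 0
F 4 Sync
B 4 BindComplete
B 4 NoData
B 15 CommandComplete "INSERT 0 1"
B 5 ReadyForQuery I
"""
print(count_round_trips(trace.splitlines()))  # 2
</pre>
<p>Applied to a pipeline-mode trace, the same function reports one round-trip per batch of queries rather than one per query.</p>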
<p>The first message sequence, <tt class="docutils literal">Parse</tt>+<tt class="docutils literal">ParseComplete</tt>, corresponds
to preparing the statement (named <tt class="docutils literal">_pg3_0</tt>). Subsequent ones only contain
<tt class="docutils literal">Bind</tt>/<tt class="docutils literal">Describe</tt>/<tt class="docutils literal">Execute</tt> client messages (plus <tt class="docutils literal">Sync</tt>), followed by the server
response.</p>
<p>Now using the pipeline mode (run the script with <tt class="docutils literal"><span class="pre">--pipeline</span></tt>), we get
the following trace:</p>
<pre class="literal-block">
F 69 Parse &quot;_pg3_0&quot; &quot;INSERT INTO t (x, d, p) VALUES ($1, $2, $3)&quot; 3 NNNN NNNN NNNN
F 49 Bind &quot;&quot; &quot;_pg3_0&quot; 3 1 1 1 3 2 '\x00\x00' 8 '\x00\x02\xffffff8e&lt;k\x0c\x16\xffffffe6' 1 '\x01' 1 0
F 6 Describe P &quot;&quot;
F 9 Execute &quot;&quot; 0
F 49 Bind &quot;&quot; &quot;_pg3_0&quot; 3 1 1 1 3 2 '\x00\x01' 8 '\x00\x02\xffffff8e&lt;k\x0c\x18\xffffffcd' 1 '\x01' 1 0
F 6 Describe P &quot;&quot;
F 9 Execute &quot;&quot; 0
F 49 Bind &quot;&quot; &quot;_pg3_0&quot; 3 1 1 1 3 2 '\x00\x02' 8 '\x00\x02\xffffff8e&lt;k\x0c\x19\xffffff8a' 1 '\x01' 1 0
F 6 Describe P &quot;&quot;
F 9 Execute &quot;&quot; 0
[ ... ~300 more of those ... ]
B 4 ParseComplete
B 4 BindComplete
B 4 NoData
B 15 CommandComplete &quot;INSERT 0 1&quot;
B 4 BindComplete
B 4 NoData
B 15 CommandComplete &quot;INSERT 0 1&quot;
B 4 BindComplete
B 4 NoData
B 15 CommandComplete &quot;INSERT 0 1&quot;
[ ... ~300 more of those ... ]
F 49 Bind &quot;&quot; &quot;_pg3_0&quot; 3 1 1 1 3 2 '\x01&lt;' 8 '\x00\x02\xffffff8e&lt;k\x0c\xffffff96\xffffff8a' 1 '\x01' 1 0
F 6 Describe P &quot;&quot;
F 9 Execute &quot;&quot; 0
F 49 Bind &quot;&quot; &quot;_pg3_0&quot; 3 1 1 1 3 2 '\x01=' 8 '\x00\x02\xffffff8e&lt;k\x0c\xffffff9c'' 1 '\x01' 1 0
F 6 Describe P &quot;&quot;
F 9 Execute &quot;&quot; 0
F 49 Bind &quot;&quot; &quot;_pg3_0&quot; 3 1 1 1 3 2 '\x01&gt;' 8 '\x00\x02\xffffff8e&lt;k\x0c\xffffff9c\xffffff85' 1 '\x01' 1 0
F 6 Describe P &quot;&quot;
F 9 Execute &quot;&quot; 0
[ ... ]
</pre>
<p>We can see that the client sends more than 900 messages before the
server replies (with the same number of messages). Clearly, this can
have a huge impact on performance, especially when network latency
matters. And indeed, this runs twice as fast, even though the Postgres
server is on <tt class="docutils literal">localhost</tt>!</p>
<p>What actually happens is that the client sends as many queries as
possible, until the server cannot take more (in general
because its output buffer is full, here typically because of the large
integers we’re inserting), at which point the server sends back the
results of all these queries; rinse and repeat. Instead of producing small and
frequent client/server round-trips, pipeline mode optimizes network
communication by producing large and infrequent ones. The “downside”
(remember we got a 2x speed-up) is that the client program generally needs to
hold more data in memory.</p>
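<p>The following toy model illustrates how a bounded buffer turns a stream of queries into a few large round-trips, and why larger messages (such as our big integers) leave room for fewer queries per batch. It is our own simplification, not how libpq or the server actually decide: we assume a fixed capacity in bytes, and that the client keeps sending until the next message would not fit, then waits for the results of the whole batch. The <tt class="docutils literal">batch_sizes()</tt> function and the capacity values are hypothetical.</p>
<pre class="code python literal-block">
from typing import List


def batch_sizes(message_sizes: List[int], capacity: int) -> List[int]:
    """Return how many messages go in each round-trip (toy model)."""
    batches: List[int] = []
    count, used = 0, 0
    for size in message_sizes:
        if count and used + size > capacity:
            batches.append(count)  # buffer full: wait for results (one round-trip)
            count, used = 0, 0
        count += 1  # add the message to the current batch (even if it alone exceeds capacity)
        used += size
    if count:
        batches.append(count)  # final, possibly partial, batch
    return batches


print(batch_sizes([100] * 10, capacity=350))  # [3, 3, 3, 1]
print(batch_sizes([200] * 10, capacity=350))  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
</pre>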
</div>
<div class="section" id="how-does-it-work">
<h2>How does it work?</h2>
<p>As mentioned earlier, the entry point for pipeline mode is the
<a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/connections.html#psycopg.Connection.pipeline">pipeline()</a> method of the <tt class="docutils literal">Connection</tt> object, which enters and exits
pipeline mode. But what does this mean? Basically, it involves
calling the underlying <a class="reference external" href="https://www.postgresql.org/docs/14/libpq-pipeline-mode.html#LIBPQ-PQENTERPIPELINEMODE">PQ{enter,exit}PipelineMode</a> functions.</p>
<p>But this does not tell much about how things work in Psycopg.</p>
<p>To actually understand how things work, we need to step back and read the
<a class="reference external" href="https://www.postgresql.org/docs/current/libpq-pipeline-mode.html">libpq pipeline mode</a> documentation, whose section “Interleaving
Result Processing and Query Dispatch” states:</p>
<blockquote>
The client application should generally maintain a queue of work remaining
to be dispatched and a queue of work that has been dispatched but not yet
had its results processed. When the socket is writable it should dispatch
more work. When the socket is readable it should read results and process
them, matching them up to the next entry in its corresponding results
queue.</blockquote>
<p>As often with PostgreSQL, everything is there, although this paragraph is
somewhat enigmatic. It does, in fact, describe the heart of the
algorithm for the Psycopg driver (though it took us a while to grasp all
the details implied by these few sentences…).</p>
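<p>The quoted paragraph translates fairly directly into code. Here is a deliberately simplified, single-threaded sketch of the two queues it describes, with a fake in-memory “server” standing in for the socket. All names are hypothetical and this is not Psycopg’s implementation; it only shows that results are matched, in order, with the oldest dispatched query.</p>
<pre class="code python literal-block">
from collections import deque
from typing import Deque, List, Tuple


class FakeServer:
    """Stand-in for the socket: answers queries in the order they arrive."""

    def __init__(self) -> None:
        self.inbox: Deque[str] = deque()

    def send(self, query: str) -> None:
        self.inbox.append(query)

    def read_result(self) -> str:
        return f"result of {self.inbox.popleft()}"


def run_pipeline(queries: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Dispatch queries and match results using two FIFO queues (window must be >= 1)."""
    server = FakeServer()
    to_dispatch: Deque[str] = deque(queries)  # work remaining to be dispatched
    in_flight: Deque[str] = deque()  # dispatched, results not yet processed
    matched: List[Tuple[str, str]] = []
    while to_dispatch or in_flight:
        # "Socket writable": dispatch more work, up to `window` in-flight queries.
        while to_dispatch and window > len(in_flight):
            query = to_dispatch.popleft()
            server.send(query)
            in_flight.append(query)
        # "Socket readable": read one result and match it with the oldest pending query.
        result = server.read_result()
        matched.append((in_flight.popleft(), result))
    return matched


for query, result in run_pipeline(["q1", "q2", "q3"]):
    print(query, "->", result)
</pre>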
<div class="section" id="socket-communication">
<h3>Socket communication</h3>
<blockquote>
When the socket is writable it should dispatch more work. When the
socket is readable it should read results […].</blockquote>
<p>In Psycopg, socket communication for exchanging libpq messages is
implemented through <em>waiting functions</em> and <em>generators</em> that are tied
together by the I/O layer (either blocking or async): this is explained
in detail in <a class="reference external" href="https://www.varrazzo.com/blog/2020/03/26/psycopg3-first-report/">Daniele’s blog post</a>.</p>
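<p>To illustrate the split between generators and waiting functions, here is a self-contained sketch using only the standard library. The generator holds the protocol logic and merely yields the event it needs (<tt class="docutils literal">selectors.EVENT_WRITE</tt> or <tt class="docutils literal">selectors.EVENT_READ</tt>), while a blocking <tt class="docutils literal">wait()</tt> function does the actual waiting with <tt class="docutils literal">selectors</tt>. The function names and the ping/pong exchange over <tt class="docutils literal">socket.socketpair()</tt> are our own assumptions; Psycopg’s real generators yield its own <tt class="docutils literal">Wait</tt> values and talk to a <tt class="docutils literal">PGconn</tt>.</p>
<pre class="code python literal-block">
import selectors
import socket
from typing import Generator


def request_gen(sock: socket.socket, request: bytes, reply_size: int) -> Generator[int, int, bytes]:
    """Protocol logic only: say what to wait for, then do the I/O once ready."""
    yield selectors.EVENT_WRITE  # wait until the socket is writable
    sock.send(request)
    reply = b""
    while reply_size > len(reply):
        yield selectors.EVENT_READ  # wait until there is something to read
        reply += sock.recv(reply_size - len(reply))
    return reply


def wait(gen: Generator[int, int, bytes], sock: socket.socket) -> bytes:
    """Blocking waiting function: performs the actual waiting, with selectors."""
    with selectors.DefaultSelector() as sel:
        try:
            events = next(gen)
            while True:
                sel.register(sock, events)
                key_events = sel.select()  # blocks until the socket is ready
                sel.unregister(sock)
                events = gen.send(key_events[0][1])  # hand the ready mask back
        except StopIteration as stop:
            return stop.value


client, server = socket.socketpair()
server.sendall(b"pong")  # pretend the server has already replied
print(wait(request_gen(client, b"ping", reply_size=4), client))  # b'pong'
print(server.recv(4))  # b'ping'
client.close()
server.close()
</pre>
<p>Since the generator never blocks by itself, an asynchronous waiting function could drive the very same generator; this is what lets one piece of protocol logic serve both the blocking and the async I/O layers.</p>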
<p>What matters most for pipeline mode is the generator part,
as it is responsible for <em>dispatching queries to</em> or <em>reading results
from</em> the socket. In contrast with normal query mode, where these steps
are handled sequentially by independent logic, pipeline mode requires
<em>interleaving result processing and query dispatch</em>: this is implemented
by the <a class="reference external" href="https://github.com/psycopg/psycopg/blob/3.1/psycopg/psycopg/generators.py#L180">pipeline_communicate()</a> generator. Without going too much into
the details, we can notice that:</p>
<ul class="simple">
<li>the function takes a <em>queue</em> of “commands”, e.g. <a class="reference external" href="https://www.postgresql.org/docs/14/libpq-async.html#LIBPQ-PQSENDQUERYPARAMS">pgconn.send_query_params()</a> or similar;</li>
<li>it continuously waits for the socket to be either Read- or Write-ready (or
both) (<tt class="docutils literal">ready = yield Wait.RW</tt>);</li>
<li>when the socket is Read-ready (<tt class="docutils literal">if ready &amp; Ready.R:</tt>), results are fetched (calling
<a class="reference external" href="https://www.postgresql.org/docs/14/libpq-async.html#LIBPQ-PQGETRESULT">pgconn.get_result()</a>);</li>
<li>when the socket is Write-ready (<tt class="docutils literal">if ready &amp; Ready.W:</tt>), commands are sent (calling <a class="reference external" href="https://www.postgresql.org/docs/14/libpq-async.html#LIBPQ-PQFLUSH">pgconn.flush()</a>
to flush previously sent commands, and then sending any pending one);</li>
<li>this goes on until the queue of commands is empty.</li>
</ul>
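<p>A heavily simplified sketch of such a generator follows. It mimics the shape described above (yield a wait request, receive readiness flags, then read and/or write accordingly), but the <tt class="docutils literal">Wait</tt>/<tt class="docutils literal">Ready</tt> flags, the <tt class="docutils literal">FakeConn</tt> class and the <tt class="docutils literal">drive()</tt> loop are hypothetical stand-ins defined locally; the real generator works on a <tt class="docutils literal">PGconn</tt>, flushes its output buffer and handles many more cases.</p>
<pre class="code python literal-block">
from collections import deque
from enum import IntFlag
from functools import partial
from typing import Callable, Deque, Generator, List


class Wait(IntFlag):
    """What the generator waits for (hypothetical stand-in)."""
    R = 1
    W = 2
    RW = 3


class Ready(IntFlag):
    """What the socket turned out to be ready for (hypothetical stand-in)."""
    R = 1
    W = 2


class FakeConn:
    """Hypothetical connection: each sent query produces one result, in order."""

    def __init__(self) -> None:
        self.sent: Deque[str] = deque()

    def send(self, query: str) -> None:
        self.sent.append(query)

    def get_result(self) -> str:
        return f"OK {self.sent.popleft()}"


def communicate(
    conn: FakeConn, commands: Deque[Callable[[], None]], expected: int
) -> Generator[Wait, Ready, List[str]]:
    """Interleave dispatching queued commands and fetching `expected` results."""
    results: List[str] = []
    while commands or expected > len(results):
        ready = yield Wait.RW  # let the caller wait until the socket is R and/or W ready
        if ready & Ready.R and conn.sent:
            results.append(conn.get_result())  # read side: fetch one result
        if ready & Ready.W and commands:
            commands.popleft()()  # write side: dispatch the next pending command
    return results


def drive(gen: Generator[Wait, Ready, List[str]]) -> List[str]:
    """Toy waiting loop: pretend the socket is always readable and writable."""
    try:
        next(gen)
        while True:
            gen.send(Ready.R | Ready.W)
    except StopIteration as stop:
        return stop.value


conn = FakeConn()
commands = deque(partial(conn.send, q) for q in ["q1", "q2", "q3"])
print(drive(communicate(conn, commands, expected=3)))  # ['OK q1', 'OK q2', 'OK q3']
</pre>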
</div>
<div class="section" id="queueing-work-processing-results">
<h3>Queueing work, processing results</h3>
<p>Around the <tt class="docutils literal">pipeline_communicate()</tt> generator described above, we need
to handle the commands queue as well as the queue of results pending
processing. The first part, filling the commands queue, is simply
managed by enqueueing commands instead of calling them directly, along with
keeping a reference to the cursor used for <tt class="docutils literal">execute()</tt>. The second
part involves handling the output of the <a class="reference external" href="https://github.com/psycopg/psycopg/blob/3.1/psycopg/psycopg/generators.py#L180">pipeline_communicate()</a> generator
described above, a list of <a class="reference external" href="https://www.psycopg.org/psycopg3/docs/api/pq.html#psycopg.pq.PGresult">PGresult</a>. Each fetched result item:</p>
<ul class="simple">
<li>is possibly bound back to its respective cursor (the one from which the
corresponding <tt class="docutils literal">execute()</tt> originated);</li>
<li>might trigger an error if its status is non-<tt class="docutils literal">OK</tt> (e.g.&nbsp;<tt class="docutils literal">FATAL_ERROR</tt>).</li>
</ul>
<p>All this is handled in methods of the <a class="reference external" href="https://github.com/psycopg/psycopg/blob/3.1/psycopg/psycopg/_pipeline.py#L37">BasePipeline</a> class (see the methods
prefixed with <tt class="docutils literal">_</tt>, at the end of the class).</p>
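<p>The result-handling side can be sketched as follows: each fetched result is paired, in order, with the cursor (if any) that queued the corresponding command, and an error status raises. The <tt class="docutils literal">Result</tt>, <tt class="docutils literal">Cursor</tt> and <tt class="docutils literal">PipelineError</tt> classes and the status strings are hypothetical simplifications, not Psycopg’s <tt class="docutils literal">BasePipeline</tt> API.</p>
<pre class="code python literal-block">
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


class PipelineError(Exception):
    """Raised when a result reports an error (hypothetical error class)."""


@dataclass
class Result:
    status: str  # e.g. "COMMAND_OK", "TUPLES_OK", "FATAL_ERROR", "PIPELINE_ABORTED"
    message: str = ""


@dataclass
class Cursor:
    name: str
    results: List[Result] = field(default_factory=list)


ERROR_STATUSES = {"FATAL_ERROR", "PIPELINE_ABORTED"}


def process_results(pending: Deque[Optional[Cursor]], fetched: List[Result]) -> None:
    """Match fetched results, in order, with the cursors that queued the commands."""
    for result in fetched:
        cursor = pending.popleft()  # results come back in dispatch order
        if result.status in ERROR_STATUSES:
            raise PipelineError(result.message or result.status)
        if cursor is not None:  # commands sent without a cursor have nothing to bind
            cursor.results.append(result)


c1, c2 = Cursor("c1"), Cursor("c2")
process_results(deque([c1, None, c2]), [Result("COMMAND_OK"), Result("COMMAND_OK"), Result("TUPLES_OK")])
print(len(c1.results), len(c2.results))  # 1 1
</pre>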
</div>
</div>
<div class="section" id="integration-with-high-level-features-transactions">
<h2>Integration with high-level features: transactions</h2>
<p>Besides the low-level logic described above, implementing pipeline mode
in Psycopg required handling some Psycopg-specific features, such as
transactions.</p>
<p>Transactions need special attention because of how <a class="reference external" href="https://www.postgresql.org/docs/14/libpq-pipeline-mode.html#LIBPQ-PIPELINE-ERROS">error handling</a>
works in pipeline mode. There are a few distinct cases that need to
be handled properly, depending on whether the pipeline uses an <em>implicit
transaction</em> or contains <em>explicit transactions</em>. But the general
rule is that when an error occurs, the pipeline enters an <em>aborted</em> state,
meaning that subsequent commands are skipped (until the next synchronization point),
while prior statements may or may not be persisted (depending on whether
explicit transactions are used).</p>
<p>Consider the following statements, executed within a pipeline:</p>
<pre class="code sql literal-block">
<span class="k">BEGIN</span><span class="p">;</span><span class="w"> </span><span class="c1">-- transaction 1</span><span class="w">
</span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'abc'</span><span class="p">);</span><span class="w">
</span><span class="k">COMMIT</span><span class="p">;</span><span class="w">
</span><span class="k">BEGIN</span><span class="p">;</span><span class="w"> </span><span class="c1">-- transaction 2</span><span class="w">
</span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">no_such_table</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'x'</span><span class="p">);</span><span class="w">
</span><span class="k">ROLLBACK</span><span class="p">;</span><span class="w">
</span><span class="k">BEGIN</span><span class="p">;</span><span class="w"> </span><span class="c1">-- transaction 3</span><span class="w">
</span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'xyz'</span><span class="p">);</span><span class="w">
</span><span class="k">COMMIT</span><span class="p">;</span><span class="w">
</span><span class="k">SELECT</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="k">from</span><span class="w"> </span><span class="n">s</span><span class="p">;</span><span class="w">
</span><span class="o">-&gt;</span><span class="w"> </span><span class="n">abc</span>
</pre>
<p>The <tt class="docutils literal">INSERT INTO no_such_table</tt> statement produces an error,
putting the pipeline in <strong>aborted</strong> state; accordingly, the following explicit
<tt class="docutils literal">ROLLBACK</tt> is not executed, and the next statements (“transaction
3”) are skipped as well.</p>
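<p>This behaviour can be reproduced with a tiny simulation of the aborted state. It is a toy model of this first example only, assuming a single synchronization point at the end of the pipeline: once a statement fails, the open transaction is lost and every later statement is skipped, while transactions committed before the error stay committed. The statement tuples and the <tt class="docutils literal">simulate()</tt> function are hypothetical.</p>
<pre class="code python literal-block">
from typing import List, Set, Tuple


def simulate(statements: List[Tuple[str, str]], existing_tables: Set[str]) -> List[str]:
    """Return the values committed after running `statements` in one pipeline.

    Statements are ("BEGIN", ""), ("COMMIT", ""), ("ROLLBACK", "")
    or ("INSERT", "table:value").
    """
    committed: List[str] = []
    pending: List[str] = []  # values inserted by the currently open transaction
    aborted = False
    for command, arg in statements:
        if aborted:
            continue  # aborted pipeline: the statement is skipped, not executed
        if command == "BEGIN":
            pending.clear()
        elif command == "INSERT":
            table, value = arg.split(":")
            if table not in existing_tables:
                aborted = True  # error: the open transaction is rolled back
                pending.clear()
            else:
                pending.append(value)
        elif command == "COMMIT":
            committed.extend(pending)
            pending.clear()
        elif command == "ROLLBACK":
            pending.clear()
    return committed


statements = [
    ("BEGIN", ""), ("INSERT", "s:abc"), ("COMMIT", ""),  # transaction 1
    ("BEGIN", ""), ("INSERT", "no_such_table:x"), ("ROLLBACK", ""),  # transaction 2
    ("BEGIN", ""), ("INSERT", "s:xyz"), ("COMMIT", ""),  # transaction 3
]
print(simulate(statements, existing_tables={"s"}))  # ['abc']
</pre>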
<p>Another example:</p>
<pre class="code sql literal-block">
<span class="k">BEGIN</span><span class="p">;</span><span class="w"> </span><span class="c1">-- main transaction</span><span class="w">
</span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'abc'</span><span class="p">);</span><span class="w">
</span><span class="k">BEGIN</span><span class="p">;</span><span class="w"> </span><span class="c1">-- sub-transaction</span><span class="w">
</span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">no_such_table</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'x'</span><span class="p">);</span><span class="w">
</span><span class="k">ROLLBACK</span><span class="p">;</span><span class="w">
</span><span class="k">INSERT</span><span class="w"> </span><span class="k">INTO</span><span class="w"> </span><span class="n">s</span><span class="w"> </span><span class="k">VALUES</span><span class="w"> </span><span class="p">(</span><span class="s1">'xyz'</span><span class="p">);</span><span class="w">
</span><span class="k">COMMIT</span><span class="p">;</span><span class="w">