Planet Python
Last update: February 17, 2019 07:49 PM UTC
February 17, 2019
Reuven Lerner
Python’s str.isdigit vs. str.isnumeric
Let’s say that I want to write some Python code that invites the user to enter a number, and then prints that number, tripled. We could say:
>>> n = input("Enter a number: ")
>>> print(f"{n} * 3 = {n*3}")
The good news is that this code works just fine. The bad news is that it probably doesn’t do what you might expect. If I run this program, I’ll see:
Enter a number: 5
5 * 3 = 555
The reason for this output is that the “input” function always returns a string. So sure, we asked the user for a number, but we got the string ‘5’, rather than the integer 5. The ‘555’ output is thanks to the fact that you can multiply strings in Python by integers, getting a longer string back. So ‘a’ * 5 will give us ‘aaaaa’.
Of course, we can always create an integer from a string by applying the “int” class to the user’s input:
>>> n = input("Enter a number: ")
>>> n = int(n)
>>> print(f"{n} * 3 = {n*3}")
Sure enough, we get the following output:
Enter a number: 5
5 * 3 = 15
Great, right? But what happens if the user gives us something that isn’t numeric? The program will blow up:
Enter a number: abcd
ValueError: invalid literal for int() with base 10: 'abcd'
Clearly, we want to avoid this problem. You could make a good argument that in this case, it’s probably best to run the conversion inside of a “try” block, and trap any exception that we might get.
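That exception-based approach might look something like this (a sketch of my own; the helper name `triple` is hypothetical, and the post itself goes on to use a different technique):

```python
def triple(text):
    """Parse text as an integer and return it tripled, or None on bad input.
    (Hypothetical helper - the post itself sticks to input()/print().)"""
    try:
        return int(text) * 3
    except ValueError:
        return None

print(triple('5'))     # → 15
print(triple('abcd'))  # → None
```

Note that this version also accepts negative numbers like `'-3'`, which a digits-only check would reject.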
But there’s another way to test this, one which I use in my intro Python classes before we’ve covered exceptions: strings have a great method called “isdigit” that we can run to tell us whether a string contains only digits (0-9), or whether it contains something else. For example:
>>> '1234'.isdigit()
True
>>> '1234 '.isdigit() # space at the end
False
>>> '1234a'.isdigit() # letter at the end
False
>>> 'a1234'.isdigit() # letter at the start
False
>>> '12.34'.isdigit() # decimal point
False
>>> ''.isdigit() # empty string
False
If you know regular expressions, you can think of str.isdigit as returning True for strings matching ‘^\d+$’, which can be very useful, as we can see here:
>>> n = input("Enter a number: ")
>>> if n.isdigit():
...     n = int(n)
...     print(f"{n} * 3 = {n*3}")
But wait: Python also includes another method, str.isnumeric. And it’s not at all obvious, at least at first, what the difference is between them, because they would seem to give the same results:
>>> n = input("Enter a number: ")
>>> if n.isnumeric():
...     n = int(n)
...     print(f"{n} * 3 = {n*3}")
So, what’s the difference? It’s actually pretty straightforward, but it took me some time to find out: basically, str.isdigit only returns True for what I said before, strings containing solely the digits 0-9.
By contrast, str.isnumeric returns True if the string contains only numeric characters. When I first read this, I figured that it would mean decimal points and minus signs — but no! It’s just the digits 0-9, plus any character from another language that’s used in place of digits.
For example, we’re used to writing numbers with Arabic numerals. But there are other languages that traditionally use other characters. In Chinese, for example, we count 1, 2, 3, 4, 5 as 一, 二, 三, 四, 五. It turns out that the Chinese characters for numbers return False for str.isdigit, but True for str.isnumeric, behaving differently from their 0-9 counterparts:
>>> '12345'.isdigit()
True
>>> '12345'.isnumeric()
True
>>> '一二三四五'.isdigit()
False
>>> '一二三四五'.isnumeric()
True
So, which should you use? For most people, “isdigit” is probably a better choice, simply because it’s more clearly what you likely want. Of course, if you want to accept other types of numerals and numeric characters, then “isnumeric” is better. But if you’re interested in turning strings into integers, then you’re probably safer using “isdigit”, just in case someone tries to enter something else:
>>> int('二')
ValueError: invalid literal for int() with base 10: '二'
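As an aside (not in the original post): int() refuses to parse such a character, but the stdlib unicodedata module can still recover the value the Unicode database assigns to it:

```python
import unicodedata

# int('二') raises ValueError, but the character still carries
# a numeric value in the Unicode database:
print(unicodedata.numeric('二'))  # → 2.0
```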
I tried to determine whether there was any difference in speed between the two methods, just in case, but after numerous tests with “%timeit” in Jupyter, I found that I was getting roughly the same speeds from both methods.
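You can reproduce that kind of comparison with the stdlib timeit module; this is a minimal sketch, and the absolute numbers will vary by machine:

```python
import timeit

# Time one million calls of each method on the same input.
t_digit = timeit.timeit("'12345'.isdigit()", number=1_000_000)
t_numeric = timeit.timeit("'12345'.isnumeric()", number=1_000_000)
print(f"isdigit:   {t_digit:.3f}s")
print(f"isnumeric: {t_numeric:.3f}s")
```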
If you’re like me, and ever wondered how a language that claims to have “one obvious way to do it” can have two different, seemingly identical methods… well, now you know!
The post Python’s str.isdigit vs. str.isnumeric appeared first on Lerner Consulting Blog.
February 17, 2019 07:32 PM UTC
Stefan Behnel
Speeding up basic object operations in Cython
Raymond Hettinger published a nice little micro-benchmark script for comparing basic operations like attribute or item access across CPython versions. Unsurprisingly, Cython performs quite well in comparison to the latest CPython 3.8-pre development version, executing most operations 30-50% faster. But the script allowed me to tune some more performance out of certain less well performing operations. The timings are shown below: first CPython 3.8-pre as a baseline; then, for comparison, the Cython timings with all optimisations disabled that can be controlled by C macros (gcc -DCYTHON_...=0); then the normal (optimised) Cython timings; and finally the now improved version.
| | CPython 3.8 (pre) | Cython 3.0 (no opt) | Cython 3.0 (pre) | Cython 3.0 (tuned) |
|---|---:|---:|---:|---:|
| **Variable and attribute read access:** | | | | |
| read_local | 5.5 ns | 0.2 ns | 0.2 ns | 0.2 ns |
| read_nonlocal | 6.0 ns | 0.2 ns | 0.2 ns | 0.2 ns |
| read_global | 17.9 ns | 13.3 ns | 2.2 ns | 2.2 ns |
| read_builtin | 21.0 ns | 0.2 ns | 0.2 ns | 0.1 ns |
| read_classvar_from_class | 23.7 ns | 16.1 ns | 14.1 ns | 14.1 ns |
| read_classvar_from_instance | 20.9 ns | 11.9 ns | 11.2 ns | 11.0 ns |
| read_instancevar | 31.7 ns | 22.3 ns | 20.8 ns | 22.0 ns |
| read_instancevar_slots | 25.8 ns | 16.5 ns | 15.3 ns | 17.0 ns |
| read_namedtuple | 23.6 ns | 16.2 ns | 13.9 ns | 13.5 ns |
| read_boundmethod | 32.5 ns | 23.4 ns | 22.2 ns | 21.6 ns |
| **Variable and attribute write access:** | | | | |
| write_local | 6.4 ns | 0.2 ns | 0.1 ns | 0.1 ns |
| write_nonlocal | 6.8 ns | 0.2 ns | 0.1 ns | 0.1 ns |
| write_global | 22.2 ns | 13.2 ns | 13.7 ns | 13.0 ns |
| write_classvar | 114.2 ns | 103.2 ns | 113.9 ns | 94.7 ns |
| write_instancevar | 49.1 ns | 34.9 ns | 28.6 ns | 29.8 ns |
| write_instancevar_slots | 33.4 ns | 22.6 ns | 16.7 ns | 17.8 ns |
| **Data structure read access:** | | | | |
| read_list | 23.1 ns | 5.5 ns | 4.0 ns | 4.1 ns |
| read_deque | 24.0 ns | 5.7 ns | 4.3 ns | 4.4 ns |
| read_dict | 28.7 ns | 21.2 ns | 16.5 ns | 16.5 ns |
| read_strdict | 23.3 ns | 10.7 ns | 10.5 ns | 12.0 ns |
| **Data structure write access:** | | | | |
| write_list | 28.0 ns | 8.2 ns | 4.3 ns | 4.2 ns |
| write_deque | 29.5 ns | 8.2 ns | 6.3 ns | 6.4 ns |
| write_dict | 32.9 ns | 24.0 ns | 21.7 ns | 22.6 ns |
| write_strdict | 29.2 ns | 16.4 ns | 15.8 ns | 16.0 ns |
| **Stack (or queue) operations:** | | | | |
| list_append_pop | 63.6 ns | 67.9 ns | 20.6 ns | 20.5 ns |
| deque_append_pop | 56.0 ns | 81.5 ns | 159.3 ns | 46.0 ns |
| deque_append_popleft | 58.0 ns | 56.2 ns | 88.1 ns | 36.4 ns |
| **Timing loop overhead:** | | | | |
| loop_overhead | 0.4 ns | 0.2 ns | 0.1 ns | 0.2 ns |
Some things that are worth noting:
- There is always a bit of variance across the runs, so don't get excited about a couple of percent difference.
- The read/write access to local variables is not reasonably measurable in Cython since it uses local/global C variables, and the C compiler discards any useless access to them. But don't worry, they are really fast.
- Builtins (and module global variables in Py3.6+) are cached, which explains the "close to nothing" timings for them above.
- Even with several optimisations disabled, Cython code is still visibly faster than CPython.
- The write_classvar benchmark revealed a performance problem in CPython that is being worked on.
- The deque related benchmarks revealed performance problems in Cython that are now fixed, as you can see in the last column.
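The timings above come from a timeit-style harness; a minimal sketch of such a micro-benchmark (an assumed shape, not Hettinger's actual script) could look like:

```python
from timeit import Timer

def bench_ns(stmt, setup='pass', loops=1_000_000, repeat=5):
    """Best-of-N nanoseconds per execution of stmt."""
    best = min(Timer(stmt, setup).repeat(repeat, loops))
    return best * 1e9 / loops

# e.g. the "read_dict" case: index a dict with an int key
read_dict_ns = bench_ns('d[5]', setup='d = {5: 5}')
print(f'read_dict: {read_dict_ns:.1f} ns')
```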
February 17, 2019 06:24 PM UTC
gamingdirectional
Continue with the boy boundary detection mechanism
In the previous article, we successfully made the boy climb up the ladder, but the boy would continue climbing even when there was no more ladder for him to climb. In this article, we will solve that problem by introducing the following rules. When the boy is climbing the ladder, he can only move upward or downward, not sideways. There will be no return...
February 17, 2019 10:42 AM UTC
PyBites
PyBites Twitter Digest - Issue 01, 2019
It has been too long 😞 but we're excited to bring you today: 🐍 PyBites Twitter Digest - Issue 01, 2019 😎
Python Developers Survey 2018 results
Hey Pythonistas, have you already seen the Python Developers Survey 2018 results? https://t.co/N0bBPVUXZ0 #pythondevsurvey
— Python Software (@ThePSF) February 16, 2019
Cool: Jupyter for every high schooler (4 steps of inquiry-based learning)
Jupyter for every high schooler- Rob Newton (Trinity School): https://t.co/BlCSz7rzaH. Find out about the 4 steps of inquiry-based learning.
— Python Software (@ThePSF) February 15, 2019
The Ultimate List of Data Science Podcasts by Real Python
🐍🎙 The ultimate list of data science podcasts! Over a dozen shows that discuss topics in big data, data analysis, s… https://t.co/tOZCWUJlnX
— Real Python (@realpython) February 14, 2019
Always interesting interviews on Talk Python
RT @TalkPython: Deeply interesting conversation with @pwang from Anaconda Inc on this episode of @talkpython, check it out! https://t.co/og…
— Pybites (@pybites) February 16, 2019
We started to use carbon to share out our tips on Twitter
Given a list of friends how many pairs can be formed? #Python's itertools.combinations is your friend: 🐍 Check ou… https://t.co/TNj9BoHnEH
— Pybites (@pybites) February 15, 2019
We hit 171 Bite exercises on our platform (= 3 new ones / week streak since New Year!)
We just cracked Bite of Py 171. Make a terminal spinner animation - and now We Challenge You!… https://t.co/ovyMKXXeNh
— Pybites (@pybites) February 15, 2019
Virtual envs and more news on Python Bytes
RT @pythonbytes: Is this the end of Python virtual environments? We cover some pretty big news around python isolation this week on @python…
— Pybites (@pybites) February 15, 2019
Hope you had a happy Valentine's day!
Roses are red Violets are blue Python is awesome R is great too https://t.co/ZSu7FHJ3hO (the code will run both in… https://t.co/bzeI0YMwHh
— Daily Python Tip (@python_tip) February 14, 2019
And some more @python_tip goodness
perfplot: measure execution time of code snippets with different input parameters and plot the results. #Python… https://t.co/dcHmHI6Eqp
— Daily Python Tip (@python_tip) February 11, 2019
Congrats to Ant and others that had their talks accepted. If rejected: never give up!
RT @anthonypjshaw: Yey!!! Jumping up and down in my room. First ever #PyCon talk accepted! Now I only have a few months to find a fox onesi…
— Real Python (@realpython) February 17, 2019
Confused by super() in Python?
🐍📰 Supercharge Your Classes With Python super() In this step-by-step tutorial, you will learn how to leverage sing… https://t.co/XeFOsafDLo
— Real Python (@realpython) February 16, 2019
The state of Python Packaging
An awesome series of articles by Bernat @gjbernat about Python packaging: https://t.co/dAwRtX1gwZ Very useful for P… https://t.co/WFs9Pz71I3
— Dmitry Figol (@dmfigol) February 16, 2019
Anybody tried out 3.8's new Walrus Operator?
Trying Out the `:=` "Walrus Operator" in Python 3.8 https://t.co/WtALzSb0uL
— PyCoder’s Weekly (@pycoders) February 16, 2019
Python 2->3 migration of Dropbox's desktop client
Incrementally Migrating Over One Million Lines of Code From Python 2 to Python 3 https://t.co/Ypm3AZ1jYL
— PyCoder’s Weekly (@pycoders) February 16, 2019
Interesting question on Twitter by Diane Chen:
Do you follow any convention for the order of methods in class definitions?
Hey #python peeps! Do you follow any convention for the order of methods in class definitions? Dunder methods all… https://t.co/lC1H6Jq7Dl
— Diane Chen (@PurpleDiane88) February 10, 2019
Teachingpython podcast: using turtle (stdlib!) to teach Python
Episode 10: Teaching with Python Turtle is now live! With Kelly in the Florida Keys for a field trip, we wanted to… https://t.co/791AyCu7IP
— Teachingpython (@teachingpython) February 07, 2019
VIM and Python – A Match Made in Heaven
Still referring back to this article to configure proper PEP 8 indentation in Vim
🐍💻 VIM and Python – A Match Made in Heaven This article details how to set up a powerful VIM environment for Pyth… https://t.co/PpwvwexEOA
— Real Python (@realpython) February 16, 2019
Nice Hacktoberfest t-shirt Bryan :)
RT @brnkimani: Just received my parcel for #hactoberfest. My 5+ repos were all #python based. @github @twilio @digitalocean @pybites https:…
— Pybites (@pybites) February 14, 2019
Test & Code 64: Practicing Programming!
You are a knowledge worker. Your tool is your brain.
(sponsored by PyBites)
RT @brianokken: Test & Code 64: Practicing Programming You are a knowledge worker. Your tool is your brain. Keep it sharp. https://t.co/jL…
— Test & Code (@testandcode) February 07, 2019
And finally a good reminder:
Take regular breaks, people! https://t.co/lflAiWPrgO
— Real Python (@realpython) February 17, 2019
>>> from pybites import Bob, Julian
Keep Calm and Code in Python!
February 17, 2019 09:00 AM UTC
Thomas Guest
Aligning the first line of a triple-quoted string in Python
Python’s triple-quoted strings are a convenient syntax for strings where the contents span multiple lines. Unescaped newlines are allowed in triple-quoted strings. So, rather than write:
song = ("Happy birthday to you\n"
"Happy birthday to you\n"
"Happy birthday dear Gail\n"
"Happy birthday to you\n")
you can write:
song = """Happy birthday to you
Happy birthday to you
Happy birthday dear Gail
Happy birthday to you
"""
The only downside here is that the first line doesn’t align nicely with the lines which follow. The way around this is to end the opening line with a backslash-newline escape sequence, meaning both the backslash and the newline are ignored.
song = """\
Happy birthday to you
Happy birthday to you
Happy birthday dear Gail
Happy birthday to you
"""
February 17, 2019 12:00 AM UTC
February 16, 2019
Python Sweetness
Threadless mode in Mitogen 0.3
Mitogen has been explicitly multi-threaded since the design was first conceived. This choice is hard to regret, as it aligns well with the needs of operating systems like Windows, makes background tasks like proxying possible, and allows painless integration with existing programs where the user doesn't have to care how communication is implemented. Easy blocking APIs simply work as documented from any context, and magical timeouts, file transfers and routing happen in the background without effort.
The story has for the most part played out well, but as work on the Ansible extension revealed, this thread-centric worldview is more than somewhat idealized, and scenarios exist where background threads are not only problematic, but a serious hazard that works against us.
For that reason a new operating mode will hopefully soon be included, one where relatively minor structural restrictions are traded for no background thread at all. This article documents the reasoning behind threadless mode, and a strange set of circumstances that allow such a major feature to be supported with the same blocking API as exists today, and surprisingly minimal disruption to existing code.
Recap
Above is a rough view of Mitogen's process model, revealing a desirable symmetry as it currently exists. In the master program and replicated children, the user's code maintains full control of the main thread, with library communication requirements handled by a background thread using an identical implementation in every process.
Keeping the user in control of the main thread is important, as it possesses certain magical privileges. In Python it is the only thread from which signal handlers can be installed or executed, and on Linux some niche system interfaces require its participation.
When a method like remote_host.call(myfunc) is invoked, an outgoing message is constructed and enqueued with the Broker thread, and a callback handler is installed to cause any return value response message to be posted to another queue created especially to receive it. Meanwhile the thread that invoked Context.call(..) sleeps waiting for a message on the call's dedicated reply queue.
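As a toy illustration of this pattern (the names here are illustrative, not Mitogen's actual API): the caller enqueues work with a broker thread and sleeps on a per-call reply queue.

```python
import queue
import threading

broker_inbox = queue.Queue()

def broker_loop():
    # Broker thread: receive one request, execute it, and post the reply
    # to the queue created especially for that call.
    func, reply_q = broker_inbox.get()
    reply_q.put(func())

def call(func):
    reply_q = queue.Queue()            # per-call reply queue
    broker_inbox.put((func, reply_q))  # enqueue with the broker
    return reply_q.get()               # caller sleeps awaiting the reply

threading.Thread(target=broker_loop, daemon=True).start()
result = call(lambda: 42)
print(result)  # → 42
```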
Latches
Those queues aren't simply Queue.Queue
, but a custom
reimplementation
added early during Ansible extension development, as deficiencies in Python 2.x
threading began to manifest. Python 2 permits the choice between up to 50 ms
latency added to each Queue.get()
, or for waits to execute with UNIX signals
masked, thus preventing CTRL+C from interrupting the program. Given these
options a reimplementation made plentiful sense.
The custom queue is called Latch
, a name chosen simply because it was short
and vaguely fitting. To say its existence is a great discomfort would be an
understatement: reimplementing synchronization was never desired, even if just
by leveraging OS facilities. True to tribal wisdom, the folly of Latch
has
been a vast time sink, costing many days hunting races and subtle
misbehaviours, yet without it, good performance and usability is not possible
on Python 2, and so it remains.
Due to this, when any thread blocks waiting for a result from a remote process,
it always does so within Latch
, a detail that will soon become important.
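The basic behaviour can be approximated in a few lines of Python 3. This is only a sketch: the real Latch additionally builds on OS facilities such as a UNIX self-pipe so that waits stay signal-friendly on Python 2.

```python
import collections
import threading

class Latch:
    """Minimal blocking queue in the spirit of Mitogen's Latch (sketch only;
    the real implementation also uses a UNIX self-pipe for signal-safe
    waits on Python 2)."""
    def __init__(self):
        self._lock = threading.Lock()
        self._cond = threading.Condition(self._lock)
        self._items = collections.deque()

    def put(self, obj):
        with self._lock:
            self._items.append(obj)
            self._cond.notify()

    def get(self, timeout=None):
        with self._lock:
            while not self._items:
                # Condition.wait returns False if the timeout expired.
                if not self._cond.wait(timeout):
                    raise TimeoutError('Latch.get() timed out')
            return self._items.popleft()

latch = Latch()
latch.put(42)
value = latch.get()
print(value)  # → 42
```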
The Broker
Threading requirements are mostly due to Broker, a thread whose role has often changed over time. Today its main function is to run an I/O multiplexer, like Twisted or asyncio. Except for some local file IO in master processes, broker thread code is asynchronous and non-reentrant, regardless of whether it is communicating with a remote machine via an SSH subprocess or a local thread via a Latch.

When a user's thread is blocked on a reply queue, that thread isn't really blocked on a remote process - it is waiting for the broker thread to receive and decode any reply, then post it to the queue (or Latch) the thread is sleeping on.
Performance
Having a dedicated IO thread in a multi-threaded environment simplifies reasoning about communication, as events like unexpected disconnection always occur in a consistent location far from user code. But as is evident, it means every IO requires interaction of two threads in the local process, and when that communication is with a remote Mitogen process, a further two in the remote process.
It may come as no surprise that poor interaction with the OS scheduler often manifests: load balancing pushes related communicating threads out across distinct cores, their execution schedule bears no resemblance to the inherent lock-step communication pattern created by the request-reply structure of RPCs, and threads of the same process contend on the Global Interpreter Lock. The range of undesirable effects defies simple description; it is sufficient to say that poor behaviour here can be disastrous.
To cope with this, the Ansible extension introduced CPU pinning. This feature locks related threads to one core, so that as a user thread enters a wait on the broker after sending it a message, the broker has much higher chance of being scheduled expediently, and for its use of shared resources (like the GIL) to be uncontended and exist in the cache of the CPU it runs on.
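On Linux, the basic pinning mechanism is available directly from the standard library. This is a sketch only; the extension's actual pinning policy is more elaborate than "pin to one core".

```python
import os

# Pin the calling process to a single CPU, where the platform supports it.
# os.sched_setaffinity is Linux-only, so guard the call.
if hasattr(os, 'sched_setaffinity'):
    allowed = os.sched_getaffinity(0)        # CPUs we may currently run on
    os.sched_setaffinity(0, {min(allowed)})  # restrict to one core
    print(os.sched_getaffinity(0))
```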
| Pinned? | Round-trip delay | |
|---|---|---|
| No | 960 usec | Average 848 usec ± 111 usec |
| | 782 usec | |
| | 803 usec | |
| Yes | 198 usec | Average 197 usec ± 1 usec |
| | 197 usec | |
| | 197 usec | |
It is hard to overstate the value of pinning, as revealed by the more than 4x speedup visible in this stress test, but enabling it is a double-edged sword, as the scheduler loses the freedom to migrate processes to balance load, and no general pinning strategy is possible that does not approach the complexity of an entirely new scheduler. As a simple example, if two uncooperative processes (such as Ansible and, say, a database server) were to pin their busiest workers to the same CPU, both would suffer disastrous contention for resources that a scheduler could alleviate if it were permitted.
While performance loss due to scheduling could be considered a scheduler bug, it could be argued that expecting consistently low latency lock-step communication between arbitrary threads is unreasonable, and so it is desirable that threading rather than scheduling be considered at fault, especially as one and not the other is within our control.
The desire is not to remove threading entirely, but instead to provide an option to disable it where it makes sense. For example in Ansible, it would be possible to almost halve the running threads if worker processes were switched to a threadless implementation, since there is no benefit to the otherwise single-threaded WorkerProcess in having a distinct broker thread.
UNIX fork()
In its UNIX manifestation, fork() is a defective abstraction protected by religious symbolism and dogma, conceived at a time long predating the 1984 actualization of the problem it failed to solve. It has remained obsolete ever since. A full description of this exceeds any one paragraph, and an article in drafting since October, already in excess of 8,000 words, has not yet succeeded in fully capturing it.
For our purposes it is sufficient to know that, as with most UNIX facilities, mixing fork() with threads is extremely unsafe, but many UNIX programs presently rely on it, such as Ansible's forking of per-task worker processes. For that reason, in the Ansible extension Mitogen cannot be permanently active in the top-level process, but only after fork, within a "connection multiplexer" subprocess and within the per-task workers.
In upcoming work, there is a renewed desire for a broker to be active in the top-level process, but this is extremely difficult while remaining compatible with Ansible's existing forking model. A threadless mode would be immediately helpful there.
Python 2.4
Another manifestation of fork() trouble comes in Python 2.4, where the youthful threading implementation makes no attempt to repair its state after fork, leading to incurable deadlocks across the board. For this reason, when running on Python 2.4 the Ansible extension disables its internal use of fork for isolation of certain tasks, but that is not enough, as deadlocks while starting subprocesses are also possible.
A common idea would be to forget about Python 2.4 as it is too old, much as it is tempting to imagine HTTP 0.9 does not exist, but as in that case, Mitogen treats Python not just as a language runtime, but as an established network protocol that must be complied with in order to communicate with infrastructure that will continue to exist long into the future.
Implementation Approach
Recall that it is not possible for a user thread to block without waiting on a Latch. With threadless mode, we can instead reinterpret the presence of a waiting Latch as the user's indication that some network IO is pending, and since the user cannot become unblocked until that IO is complete, and has given up forward execution in favour of waiting, Latch.get() becomes the only location where the IO loop must run, and only until the Latch that caused it to run has some result posted to it by the previous iteration.
@mitogen.main(threadless=True)
def main(router):
    host1 = router.ssh(hostname='a.b.c')
    host2 = router.ssh(hostname='c.b.a')

    call1 = host1.call_async(os.system, 'hostname')
    call2 = host2.call_async(os.system, 'hostname')

    print call1.get().unpickle()
    print call2.get().unpickle()
In the example, after the (presently blocking) connection procedure completes, neither call_async() wakes any broker thread, as none exists. Instead they enqueue messages for the broker to run, but the broker implementation does not start execution until call1.get(), where get() is internally synchronized using Latch.
The broker loop ceases after a result becomes available for the Latch that is executing it, only to be restarted again for call2.get(), where it again runs until its result is available. In this way asynchronous execution progresses opportunistically, and only when the calling thread has indicated it cannot progress until a result is available.

Owing to the inconvenient existence of Latch, an initial prototype was functional with only a 30 line change. In this way, an ugly and undesirable custom synchronization primitive has accidentally become the centrepiece of an important new feature.
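A toy model of this flow (all names illustrative, not Mitogen's API) shows how a waiting get() can drive the broker loop inline, with no background thread at all:

```python
from collections import deque

class Broker:
    """Toy broker: a queue of pending work, run only when someone waits."""
    def __init__(self):
        self.pending = deque()

    def run_one_iteration(self):
        func, latch = self.pending.popleft()
        latch.put(func())            # "IO completes": post the result

class Latch:
    _EMPTY = object()

    def __init__(self, broker):
        self._broker = broker
        self._value = self._EMPTY

    def put(self, value):
        self._value = value

    def get(self):
        # The only place the IO loop runs: iterate the broker inline
        # until a result has been posted to *this* latch.
        while self._value is self._EMPTY:
            self._broker.run_one_iteration()
        return self._value

broker = Broker()

def call_async(func):
    latch = Latch(broker)
    broker.pending.append((func, latch))  # enqueue; nothing executes yet
    return latch

call1 = call_async(lambda: 'host1')
call2 = call_async(lambda: 'host2')
print(call1.get())  # broker loop runs here until call1's reply arrives
print(call2.get())  # and is restarted here for call2
```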
Size Benefit
The intention is that threadless mode will become the new default in a future version. As it has much lower synchronization requirements, it becomes possible to move large pieces of code out of the bootstrap, including anything relating to implementing the UNIX self-pipe trick, as required by Latch, and to waking the broker thread from user threads.

Instead this code can be moved to a new mitogen.threads module, where it can progressively upgrade an existing threadless mitogen.core, much like mitogen.parent already progressively upgrades it with an industrial-strength Poller as required.
Any code that can be removed from the bootstrap has an immediate benefit on cold start performance with large numbers of targets, as the bottleneck during cold start is often a restriction on bandwidth.
Performance Benefit
Threadless mode tallies in well with existing desires to lower latency and resource consumption, such as the plan to reduce context switches.
| | Threaded+Pinned | Threadless |
|---|---:|---:|
| Average Round-trip Time | 201 usec | 131 usec (-34.82%) |
| Elapsed Time | 4.220 sec | 3.243 sec (-23.15%) |
| Context Switches | 304,330 | 40,037 (-86.84%) |
| Instructions | 10,663,813,051 | 8,876,096,105 (-16.76%) |
| Branches | 2,146,781,967 | 1,784,930,498 (-15.85%) |
| Page Faults | 6,412 | 17,529 (+173.37%) |
Because no broker thread exists, no system calls are required to wake it when a message is enqueued, nor are any necessary to wake the user thread when a reply is received, nor any futex() calls due to one just-woke thread contending on a GIL that has not yet been released by a just-about-to-sleep peer. The effect across two communicating processes is a huge reduction in kernel/user mode switches, contributing to vastly reduced round-trip latency.
In the table an as-yet undiagnosed jump in page faults is visible. One possibility is that either the Python or C library allocator employs a different strategy in the absence of threads, the other is that a memory leak exists in the prototype.
Restrictions
Naturally this will place some restraints on execution. Transparent routing will no longer be quite so transparent, as it is not possible to execute a function call in a remote process that is also acting as a proxy to another process: proxying will not run while Dispatcher is busy executing the function call.
One simple solution is to start an additional child of the proxying process in which function calls will run, leaving its parent dedicated just to routing, i.e. exclusively dedicated to running what was previously the broker thread. It is expected this will require only a few lines of additional code to support in the Ansible extension.
For children of a threadless master, import statements will hang while the master is otherwise busy, but this is not much of a problem, since import statements usually happen once, shortly after the first parent->child call, when the master will be waiting in a Latch.

For threadless children, no background thread exists to notice a parent has disconnected and to ensure the process shuts down gracefully in case the main thread has hung. Some options are possible, including starting a subprocess for the task, or supporting SIGIO-based asynchronous IO, so the broker can run from the signal handler and notice the parent is gone.
Another restriction is that when threadless mode is enabled, Mitogen primitives cannot be used from multiple threads. After some consideration, while possible to support, it does not seem worth the complexity, and would prevent the aforementioned reduction of bootstrap code size.
Ongoing Work
Mitogen has quite an ugly concept of Services, added in a hurry during the initial Ansible extension development. Services represent a bundle of a callable method exposed to the network, a security policy determining who may call it, and an execution policy governing its concurrency requirements. Service execution always happens in a background thread pool, and is used to implement things like file transfer in the Ansible extension.
Despite heavy use, it has always been an ugly feature as it partially duplicates the normal parent->child function call mechanism. Looking at services from the perspective of threadless mode reveals some notion of a "threadless service", and how such a threadless service looks even more similar to a function call than previously.
It is possible that as part of the threadless work, the unification of function calls and services may finally happen, although no design for it is certain yet.
Summary
There are doubtlessly many edge cases left to discover, but threadless mode looks very doable, and promises to make Mitogen suitable in even more scenarios than before.
Until next time!
Just tuning in?
- 2017-09-15: Mitogen, an infrastructure code baseline that sucks less
- 2018-03-06: Quadrupling Ansible performance with Mitogen
- 2018-07-10: Mitogen released!
February 16, 2019 10:00 PM UTC
Python Insider
Python 2.7.16 release candidate 1 available
A release candidate for the upcoming 2.7.16 bug fix release is now available for download.
February 16, 2019 07:56 PM UTC
Weekly Python StackOverflow Report
(clxv) stackoverflow python report
These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2019-02-16 19:24:24 GMT
- Why does Python allow out-of-range slice indexes for sequences? - [55/2]
- How to efficiently use asyncio when calling a method on a BaseProxy? - [11/2]
- Python multiprocessing crashes docker container - [10/2]
- Python 3 pandas.groupby.filter - [9/5]
- Apply list of regex pattern on list python - [9/2]
- max/min function on list with strings and integers - [8/3]
- re.sub(".*", "(replacement)", "text") doubles replacement on Python 3.7 - [8/1]
- Fill Bounding Boxes in 2d array - [7/2]
- Slider button click in selenium python - [7/1]
- One line, three variables - [6/2]
February 16, 2019 07:25 PM UTC
Codementor
Getting Started with Pathlib
Introduction This tutorial will guide you on how to use the Pathlib module for working with filesystem paths, the benefits, and understand the problem it solves since the Python standard library…
February 16, 2019 07:23 PM UTC
gamingdirectional
Detect the player’s boundary
In this article, we will start to create the boundary detection mechanism that helps the boy move around the canvas. We will go slowly; this topic will take a few chapters to complete. In this chapter, we will focus on the issues below. The boy will not be able to move past the horizontal boundary of either 0 or 576 pixels, which is the physical boundary for the boy sprite.
February 16, 2019 12:49 PM UTC
Vasudev Ram
pprint.isrecursive: Check if object requires recursive representation
- By Vasudev Ram - Online Python training / SQL training / Linux training
Tree image attribution
Hi, readers,
I was using the pprint module to pretty-print some Python data structures in a program I was writing. Then I saw that it has a function called isrecursive.
The docstring for pprint.isrecursive says:

>>> print pprint.isrecursive.__doc__
Determine if object requires a recursive representation.

Here is a Python 3 shell session that shows what the isrecursive function does, with a list:

>>> import pprint
>>> print(pprint.pprint.__doc__)
Pretty-print a Python object to a stream [default is sys.stdout].
>>> a = []
>>> a
[]
>>> pprint.isrecursive(a)
False
>>> a.append(a)
>>> a
[[...]]
>>> pprint.isrecursive(a)
True

How about for a dict?

>>> b = {}
>>> pprint.isrecursive(b)
False
>>> b[1] = b
>>> b
{1: {...}}
>>> id(b) == id(b[1])
True
>>> pprint.isrecursive(b)
True

How about if an object is recursive, but not directly, as in the above two examples? Instead, it is recursive via a chain of objects:

>>> c = []
>>> d = []
>>> e = []
>>> c.append(d)
>>> d.append(e)
>>> c
[[[]]]
>>> pprint.isrecursive(c)
False
>>> e.append(c)
>>> c
[[[[...]]]]
>>> pprint.isrecursive(c)
True

So we can see that isrecursive is useful to detect some recursive Python object structures.

Interestingly, if I compare c with c[0] (after making c a recursive structure), I get:

>>> c == c[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded in comparison

In Python 2, I get:

RuntimeError: maximum recursion depth exceeded in cmp
Also, relevant XKCD-clone.
The image at the top of the post is of a tree created in the LOGO programming language using recursion.
- Enjoy.
- Vasudev Ram - Online Python training and consulting
I conduct online courses on Python programming, Unix / Linux commands and shell scripting and SQL programming and database design, with course material and personal coaching sessions.
The course details and testimonials are here.
Contact me for details of course content, terms and schedule.
February 16, 2019 01:47 AM UTC
February 15, 2019
PyCharm
PyCharm 2019.1 EAP 4
Our fourth Early Access Program (EAP) version for PyCharm 2019.1 is now available on our website.
New in This Version
Parallel and concurrent testing with pytest
PyCharm makes it easy to run tests quickly using multiprocessing (parallelism) and multithreading (concurrency). All you need to do to run your pytest tests in parallel is: install the pytest-xdist plugin as a normal Python package using PyCharm’s package manager, specify pytest as the project testing framework, and create a pytest run/debug configuration where you can specify the number of CPUs to run the tests on. Then you’re good to go.
Read more about setting up and running pytest tests in parallel in our help
Further Improvements
- Many JavaScript improvements: PyCharm Professional Edition bundles all JavaScript features from WebStorm. You can read more about the new JavaScript features on the WebStorm blog.
- Many platform improvements: PyCharm bundles new features and bug fixes coming from the IntelliJ platform
- And more, read the release notes for details.
Interested?
Download this EAP from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date throughout the entire EAP.
With PyCharm 2019.1 we’re moving to a new runtime environment: this EAP build already bundles the brand new JetBrains Runtime Environment (a customized version of JRE 11). Unfortunately, since this build uses the brand-new platform, the patch-update from previous versions is not available this time. Please use the full installation method instead.
If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm EAP, and stay up to date. You can find the installation instructions on our website.
PyCharm 2019.1 is in development during the EAP phase, therefore not all new features are already available. More features will be added in the coming weeks. As PyCharm 2019.1 is pre-release software, it is not as stable as the release versions. Furthermore, we may decide to change and/or drop certain features as the EAP progresses.
All EAP versions will ship with a built-in EAP license, which means that these versions are free to use for 30 days after the day that they are built. As EAPs are released weekly, you’ll be able to use PyCharm Professional Edition EAP for free for the duration of the EAP program, as long as you upgrade at least once every 30 days.
February 15, 2019 10:54 AM UTC
February 14, 2019
Continuum Analytics Blog
Intake released on Conda-Forge
Intake is a package for cataloging, finding and loading your data. It has been developed recently by Anaconda, Inc., and continues to gain new features. To read general information about Intake and how to use…
The post Intake released on Conda-Forge appeared first on Anaconda.
February 14, 2019 09:26 PM UTC
PyCon
Eighth Annual PyLadies Auction at PyCon 2019
Photo Courtesy of Mike Pirnat
The Python Software Foundation (PSF) is proud to announce the Eighth Annual PyCon Charity Auction for 2019.
PyCon 2018’s auction was a huge success, raising over $30K! More than 40 items from sponsors and fellow attendees were auctioned. Attendance was overwhelming and, rather than turn more people away for 2019, we have decided to increase capacity this year!
The PSF subsidizes this event each year by covering the cost of the venue, food, and beverages. In addition, the PSF adds a substantial donation to the event after everything is auctioned off.
If you are interested in donating an item for the auction, send the information to pycon-auction@python.org
Thinking about becoming a Sponsor?
We are hoping to find one or two companies who love what the PyLadies are doing and are willing to sponsor this wonderful event. It’s a great opportunity to let our community know that you support them. Sponsorships start at $7500. More information can be found here, or you can contact pycon-sponsors@python.org.
The auction is a fun and entertaining way to support the PyLadies community.
We hope to see you in Cleveland!
February 14, 2019 06:43 PM UTC
Made With Mu
A GPIOZero Theramin for Valentine’s Day
Thanks to Ben Nuttall, one of the maintainers of the wonderful GPIOZero library, love is in the air.
Why not serenade your Valentine with Mu, a distance sensor, speaker, Raspberry Pi and some nifty Python code which uses GPIOZero to turn the hardware into a romantic Theramin..?
It’s only four lines of code, thus showing how easy it is to make cool hardware hacks with Mu, Python and GPIOZero:
from gpiozero import TonalBuzzer, DistanceSensor
buzzer = TonalBuzzer(20)  # tonal buzzer on GPIO pin 20
ds = DistanceSensor(14, 26)  # ultrasonic distance sensor on pins 14 and 26
buzzer.source = ds  # drive the buzzer's pitch from the sensor's distance readings
The end result is full of love (for GPIO related shenanigans):
February 14, 2019 09:00 AM UTC
Talk Python to Me
#199 Automate all the things with Python at Zapier
Do your applications call a lot of APIs? Maybe you have a bunch of microservices driving your app. You probably don't have the crazy combinatorial explosion that Zapier does for connecting APIs! They have millions of users automating things with 1,000s of APIs. It's pretty crazy. And they are doing it all with Python. Join me and Bryan Helmig, the CTO and co-founder of Zapier as we discuss how they pull this off with Python.
February 14, 2019 08:00 AM UTC
Python Bytes
#117 Is this the end of Python virtual environments?
February 14, 2019 08:00 AM UTC
Codementor
What You Don't Know About Python Variables
The first time you are introduced to Python’s variables, they are usually defined as “parts of your computer’s memory where you store some information.” Some define them as a “storage placeholder for texts and numbers.” Python variables are more than the above definitions.
February 14, 2019 12:14 AM UTC
February 13, 2019
Dataquest
How to Learn Python for Data Science In 5 Steps
Why Learn Python For Data Science?

Before we explore how to learn Python for data science, we should briefly answer why you should learn Python in the first place.
In short, understanding Python is one of the valuable skills needed for a data science career.
Though it hasn’t always been, Python is the programming language of choice for data science. Here’s a brief history:
- In 2016, it overtook R on Kaggle, the premier platform for data science competitions.
- In 2017, it overtook R on KDNuggets’s annual poll of data scientists’ most used tools.
- In 2018, 66% of data scientists reported using Python daily, making it the number one tool for analytics professionals.
Data science experts expect this trend to continue with increasing development in the Python ecosystem. And while your journey to learn Python programming may be just beginning, it’s nice to know that employment opportunities are abundant (and growing) as well.
According to Indeed, the average salary for a Data Scientist is $127,918.
The good news? That number is only expected to increase. The experts at IBM predicted a 28% increase in demand for data scientists by the year 2020.
So, the future is bright for data science, and Python is just one piece of the proverbial pie. Fortunately, learning Python and other programming fundamentals is as attainable as ever. We’ll show you how in five simple steps.
But remember – just because the steps are simple doesn’t mean you won’t have to put in the work. If you apply yourself and dedicate meaningful time to learning Python, you have the potential to not only pick up a new skill, but potentially bring your career to a new level.
How to Learn Python for Data Science
First, you’ll want to find the right course to help you learn Python programming. Dataquest’s courses are specifically designed for you to learn Python for data science at your own pace.
In addition to learning Python in a course setting, your journey to becoming a data scientist should also include soft skills. Plus, there are some complementary technical skills we recommend you learn along the way.
Step 1: Learn Python Fundamentals
Everyone starts somewhere. This first step is where you’ll learn Python programming basics. You’ll also want an introduction to data science.
One of the important tools you should start using early in your journey is Jupyter Notebook, which comes prepackaged with Python libraries to help you learn these two things.
Kickstart your learning by: Joining a community
By joining a community, you’ll put yourself around like-minded people and increase your opportunities for employment. According to the Society for Human Resource Management, employee referrals account for 30% of all hires.
Create a Kaggle account, join a local Meetup group, and participate in Dataquest’s members-only Slack discussions with current students and alums.
Related skills: Try the Command Line Interface
The Command Line Interface (CLI) lets you run scripts more quickly, allowing you to test programs faster and work with more data.
Step 2: Practice Mini Python Projects
We truly believe in hands-on learning. You may be surprised by how soon you’ll be ready to build small Python projects.
Try programming things like calculators for an online game, or a program that fetches the weather from Google in your city. Building mini projects like these will help you learn Python. Programming projects like these are standard for all languages, and a great way to solidify your understanding of the basics.
You should start to build your experience with APIs and begin web scraping. Beyond helping you learn Python programming, web scraping will be useful for you in gathering data later.
Kickstart your learning by: Reading
Enhance your coursework and find answers to the Python programming challenges you encounter. Read guidebooks, blog posts, and even other people’s open source code to learn Python and data science best practices - and get new ideas.
Automate The Boring Stuff With Python by Al Sweigart is an excellent and entertaining resource.
Related skills: Work with databases using SQL
SQL is used to talk to databases to alter, edit, and reorganize information. SQL is a staple in the data science community, as 40% of data scientists report consistently using it.*
Step 3: Learn Python Data Science Libraries
Unlike some other programming languages, in Python, there is generally a best way of doing something. The three best and most important Python libraries for data science are NumPy, Pandas, and Matplotlib.
NumPy and Pandas are great for exploring and playing with data. Matplotlib is a data visualization library that makes graphs like you’d find in Excel or Google Sheets.
Kickstart your learning by: Asking questions
You don’t know what you don’t know!
Python has a rich community of experts who are eager to help you learn Python. Resources like Quora, Stack Overflow, and Dataquest’s Slack are full of people excited to share their knowledge and help you learn Python programming. We also have an FAQ for each mission to help with questions you encounter throughout your programming courses with Dataquest.
Related skills: Use Git for version control
Git is a popular tool that helps you keep track of changes made to your code, which makes it much easier to correct mistakes, experiment, and collaborate with others.
Step 4: Build a Data Science Portfolio as you Learn Python
For aspiring data scientists, a portfolio is a must.
These projects should include several different datasets and should leave readers with interesting insights that you’ve gleaned. Your portfolio doesn’t need a particular theme; find datasets that interest you, then come up with a way to put them together.
Displaying projects like these gives fellow data scientists something to collaborate on and shows future employers that you’ve truly taken the time to learn Python and other important programming skills.
One of the nice things about data science is that your portfolio doubles as a resume while highlighting the skills you’ve learned, like Python programming.
Kickstart your learning by: Communicating, collaborating, and focusing on technical competence
During this time, you’ll want to make sure you’re cultivating those soft skills required to work with others, making sure you really understand the inner workings of the tools you’re using.
Related skills: Learn beginner and intermediate statistics
While learning Python for data science, you’ll also want to get a solid background in statistics. Understanding statistics will give you the mindset you need to focus on the right things, so you’ll find valuable insights (and real solutions) rather than just executing code.
Step 5: Apply Advanced Data Science Techniques
Finally, aim to sharpen your skills. Your data science journey will be full of constant learning, but there are advanced courses you can complete to ensure you’ve covered all the bases.
You’ll want to be comfortable with regression, classification, and k-means clustering models. You can also step into machine learning - bootstrapping models and creating neural networks using scikit-learn.
At this point, programming projects can include creating models using live data feeds. Machine learning models of this kind adjust their predictions over time.
Remember to: Keep learning!
Data science is an ever-growing field that spans numerous industries.
At the rate that demand is increasing, there are exponential opportunities to learn. Continue reading, collaborating, and conversing with others, and you’re sure to maintain interest and a competitive edge over time.
How Long Will It Take To Learn Python?
After reading these steps, the most common question we have people ask us is: “How long does all this take?”
There are a lot of estimates for the time it takes to learn Python. For data science specifically, estimates range from 3 months to a year of consistent practice.
We’ve watched people move through our courses at lightning speed and others who have taken it much slower.
Really, it all depends on your desired timeline, the free time you can dedicate to learning Python programming, and the pace at which you learn.
Dataquest’s courses are created for you to go at your own speed. Each path is full of missions, hands-on learning, and opportunities to ask questions so that you can gain in-depth mastery of data science fundamentals.
Get started for free. Learn Python with our Data Scientist path and start mastering a new skill today.
Resources and studies cited:
February 13, 2019 03:57 PM UTC
Kushal Das
Tracking my phone's silent connections
My phone has more friends than me. It talks to more peers (computers) than the number of human beings I talk to on average. In this age of smartphones and mobile apps for A-Z things, we are dependent on these technologies. However, at the same time, we don’t know much of what is going on inside these computers equipped with powerful cameras, GPS devices, and microphones that we carry all the time. All these apps are talking to their respective servers (or should we call them masters?), but there is no easy way to track them.
These questions bothered me for a long time: I wanted to see the servers my phone is connecting to, and I wanted to block those connections as I wish. However, I never managed to work on this. A few weeks ago, I finally sat down and started building a system, reusing already available open source projects and tools, that will allow me to track what my phone is doing. Maybe not in full detail, but at least it should shed some light on the network traffic from the phone.
Initial trial
I tried to create a wifi hotspot at home using a Raspberry Pi and then started capturing all the packets from the device using standard tools (dumpcap) and later reading through the logs using Wireshark. This procedure meant that I could only capture when I am connected to the network at home. What about when I am not at home?
Next round
This time I took a bit different approach. I chose algo to create a VPN server. Using WireGuard, it became straightforward to connect my iPhone to the VPN. This process also allows capturing all the traffic from the phone very easily on the VPN server. A few days in the experiment, Kashmir started posting her experiment named Life Without the Tech Giants, where she started blocking all the services from 5 big technology companies. With her help, I contacted Dhruv Mehrotra, who is a technologist behind the story. After talking to him, I felt that I am going in the right direction. He already posted details on how they did the blocking, and you can try that at home :)
Looking at the data after 1 week
After capturing the data for the first week, I moved the captured pcap files into my computer. Wrote some Python code to put the data into a SQLite database, enabling me to query the data much faster.
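The pcap-to-SQLite step can be sketched roughly like this (a minimal illustration with an invented table layout and sample domains, not the actual code from the post):

```python
import sqlite3

# In-memory database for illustration; the real one would be a file
# populated with records parsed out of the captured pcap files.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dns_queries (domain TEXT, ts INTEGER)")

# Invented sample data standing in for a week of parsed DNS queries
sample = ["apple.com"] * 12 + ["twitter.com"] * 11 + ["rare.example"] * 2
conn.executemany("INSERT INTO dns_queries (domain, ts) VALUES (?, 0)",
                 [(d,) for d in sample])

# Domains queried at least 10 times, like the plot described below
rows = conn.execute(
    "SELECT domain, COUNT(*) AS n FROM dns_queries "
    "GROUP BY domain HAVING n >= 10 ORDER BY n DESC"
).fetchall()
print(rows)  # [('apple.com', 12), ('twitter.com', 11)]
```

Once the data is in SQLite, aggregations like this run in milliseconds instead of re-reading the raw captures each time.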
Domain Name System (DNS) data
The Domain Name System (DNS) is a decentralized system which helps translate human-memorable domain names (like kushaldas.in) into Internet Protocol (IP) addresses (like 192.168.1.1). Computers talk to each other using these IP addresses, so we don’t have to remember so many names. When developers build their applications for the phone, they generally use those domain names to specify where the app should connect.
If I plot all the different domains (including any subdomain) which got queried at least 10 times in a week, we see the following graph.
The first thing to notice is how the phone is trying to find servers from Apple, which makes sense as this is an iPhone. I use the mobile Twitter app a lot, so we also see many queries related to Twitter. Lookout deserves a special mention; it was suggested to me by friends who understand these technologies and security better than I do. Third place is taken by Google; though I sometimes watch Youtube videos, the phone queried many other Google domains as well.
There are also many queries to the Akamai CDN service, and I could not find any easy way to identify those hosts; the same goes for Amazon AWS related hosts. If you know a better way, please drop me a note.
You can see a lot of data analytics related companies were also queried. dev.appboy.com is a major one, and thankfully algo already blocked that domain at the DNS level. I don’t know which app is trying to connect to which servers; I found out about a few of the apps on my phone by searching the client lists of the above-mentioned analytics companies. Next, in the coming months, I will start blocking those hosts/domains one by one and see which apps stop working.
Looking at data flow
The number of DNS queries is an easy start, but, next I wanted to learn more about the actual servers my phone is talking to. The paranoid part inside of me was pushing for discovering these servers.
If we put all of the major companies the phone is talking to, we get the following graph.
Apple leads the chart with 44% of all the connections, 495225 in total. Twitter is in second place, and Edgecastcdn is third. My phone talked to Google servers 67344 times, which is about 7 times fewer than it talked to Apple.
In the next graph, I removed the big players (including Google and Amazon).
Then, I can see that analytics companies like nflxso.net and mparticle.com have 31% of the connections, which is a lot. Most probably I will start by blocking these two first. The 3 other CDN companies, Akamai, Cloudfront, and Cloudflare, have 8%, 7%, and 6% respectively. Do I know what all things these companies are tracking? Nope, and that is scary enough that one of my friends commented, “It makes me think about throwing my phone in the garbage.”
What about encrypted vs unencrypted traffic? What protocols are being used? I tried to find the answer to the first question, and the answer looks like the following graph. Maybe the number will come down if I try to refine the query and add other parameters; that is a future task.
What next?
As I said earlier, I am working on creating a set of tools, which then can be deployed on the VPN server, that will provide a user-friendly way to monitor, and block/unblock traffic from their phone. The major part of the work is to make sure that the whole thing is easy to deploy, and can be used by someone with less technical knowledge.
How can you help?
The biggest thing we need is the knowledge of “How to analyze the data we are capturing?”. It is one thing to make reports for personal use, but trying to help others is an entirely different game altogether. We will, of course, need all sorts of contributions to the project. Before anything else, we will have to join the random code we have into a proper project structure. Keep following this blog for more updates and details about the project.
Note to self
Do not try to read data after midnight, or else I will again mistake a local address for some random dynamic address in Bangkok and freak out (thank you, reverse-dns).
February 13, 2019 02:47 AM UTC
February 12, 2019
The No Title® Tech Blog
Why and how I have just redesigned my (other) website
Going through a moment of change in my life, I have decided to redesign my other website, using Pelican and other open source tools. The older version was starting to look a bit aged, especially on mobile devices, so it seemed like a good idea to start a complete makeover. As they say, new year… new website.
February 12, 2019 11:15 PM UTC
Codementor
Testing isn't everything, but it's important
A talk that I've been thinking about for the last little while is one by Gary Bernhardt called Ideology (https://www.destroyallsoftware.com/talks/ideology). I highly recommend that you go watch it...
February 12, 2019 09:32 PM UTC
Real Python
Supercharge Your Classes With Python super()
While Python isn’t purely an object-oriented language, it’s flexible enough and powerful enough to allow you to build your applications using the object-oriented paradigm. One of the ways in which Python achieves this is by supporting inheritance, which it does with super()
.
In this tutorial, you’ll learn about the following:
- The concept of inheritance in Python
- Multiple inheritance in Python
- How the super() function works
- How the super() function in single inheritance works
- How the super() function in multiple inheritance works
Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.
An Overview of Python’s super() Function
If you have experience with object-oriented languages, you may already be familiar with the functionality of super().
If not, don’t fear! While the official documentation is fairly technical, at a high level super() gives you access to methods in a superclass from the subclass that inherits from it.
super() alone returns a temporary object of the superclass that then allows you to call that superclass’s methods.
Why would you want to do any of this? While the possibilities are limited by your imagination, a common use case is building classes that extend the functionality of previously built classes.
Calling the previously built methods with super() saves you from needing to rewrite those methods in your subclass, and allows you to swap out superclasses with minimal code changes.
super() in Single Inheritance
If you’re unfamiliar with object-oriented programming concepts, inheritance might be an unfamiliar term. Inheritance is a concept in object-oriented programming in which a class derives (or inherits) attributes and behaviors from another class without needing to implement them again.
For me at least, it’s easier to understand these concepts when looking at code, so let’s write classes describing some shapes:
class Rectangle:
def __init__(self, length, width):
self.length = length
self.width = width
def area(self):
return self.length * self.width
def perimeter(self):
return 2 * self.length + 2 * self.width
class Square:
def __init__(self, length):
self.length = length
def area(self):
return self.length * self.length
def perimeter(self):
return 4 * self.length
Here, there are two similar classes: Rectangle and Square.
You can use them as below:
>>> square = Square(4)
>>> square.area()
16
>>> rectangle = Rectangle(2,4)
>>> rectangle.area()
8
In this example, you have two shapes that are related to each other: a square is a special kind of rectangle. The code, however, doesn’t reflect that relationship and thus has code that is essentially repeated.
By using inheritance, you can reduce the amount of code you write while simultaneously reflecting the real-world relationship between rectangles and squares:
class Rectangle:
def __init__(self, length, width):
self.length = length
self.width = width
def area(self):
return self.length * self.width
def perimeter(self):
return 2 * self.length + 2 * self.width
# Here we declare that the Square class inherits from the Rectangle class
class Square(Rectangle):
def __init__(self, length):
super().__init__(length, length)
Here, you’ve used super() to call the __init__() of the Rectangle class, allowing you to use it in the Square class without repeating code. Below, the core functionality remains after making changes:
>>> square = Square(4)
>>> square.area()
16
In this example, Rectangle is the superclass, and Square is the subclass.
Because the Square and Rectangle .__init__() methods are so similar, you can simply call the superclass’s .__init__() method (Rectangle.__init__()) from that of Square by using super(). This sets the .length and .width attributes even though you just had to supply a single length parameter to the Square constructor.
When you run this, even though your Square class doesn’t explicitly implement it, the call to .area() will use the .area() method in the superclass and print 16. The Square class inherited .area() from the Rectangle class.
Note: To learn more about inheritance and object-oriented concepts in Python, be sure to check out Object-Oriented Programming (OOP) in Python 3.
What Can super() Do for You?
So what can super() do for you in single inheritance?
Like in other object-oriented languages, it allows you to call methods of the superclass in your subclass. The primary use case of this is to extend the functionality of the inherited method.
In the example below, you will create a class Cube that inherits from Square and extends the functionality of .area() (inherited from the Rectangle class through Square) to calculate the surface area and volume of a Cube instance:
class Square(Rectangle):
def __init__(self, length):
super().__init__(length, length)
class Cube(Square):
def surface_area(self):
face_area = super().area()
return face_area * 6
def volume(self):
face_area = super().area()
return face_area * self.length
Now that you’ve built the classes, let’s look at the surface area and volume of a cube with a side length of 3:
>>> cube = Cube(3)
>>> cube.surface_area()
54
>>> cube.volume()
27
Caution: Note that in our example above, super() alone won’t make the method calls for you: you have to call the method on the proxy object itself.
Here you have implemented two methods for the Cube class: .surface_area() and .volume(). Both of these calculations rely on calculating the area of a single face, so rather than reimplementing the area calculation, you use super() to extend the area calculation.
Also notice that the Cube class definition does not have an .__init__(). Because Cube inherits from Square and .__init__() doesn’t really do anything differently for Cube than it already does for Square, you can skip defining it, and the .__init__() of the superclass (Square) will be called automatically.
super() returns a delegate object to a parent class, so you call the method you want directly on it: super().area().
Not only does this save us from having to rewrite the area calculations, but it also allows us to change the internal .area() logic in a single location. This comes in especially handy when you have a number of subclasses inheriting from one superclass.
A super() Deep Dive
Before heading into multiple inheritance, let’s take a quick detour into the mechanics of super().
While the examples above (and below) call super() without any parameters, super() can also take two parameters: the first is the subclass, and the second parameter is an object that is an instance of that subclass.
First, let’s see two examples showing what manipulating the first variable can do, using the classes already shown:
class Rectangle:
def __init__(self, length, width):
self.length = length
self.width = width
def area(self):
return self.length * self.width
def perimeter(self):
return 2 * self.length + 2 * self.width
class Square(Rectangle):
def __init__(self, length):
super(Square, self).__init__(length, length)
In Python 3, the super(Square, self) call is equivalent to the parameterless super() call. The first parameter refers to the subclass Square, while the second parameter refers to a Square object which, in this case, is self. You can call super() with other classes as well:
class Cube(Square):
def surface_area(self):
face_area = super(Square, self).area()
return face_area * 6
def volume(self):
face_area = super(Square, self).area()
return face_area * self.length
In this example, you are setting Square as the subclass argument to super(), instead of Cube. This causes super() to start searching for a matching method (in this case, .area()) at one level above Square in the instance hierarchy, in this case Rectangle.
In this specific example, the behavior doesn’t change. But imagine that Square also implemented an .area() function that you wanted to make sure Cube did not use. Calling super() in this way allows you to do that.
Caution: While we are doing a lot of fiddling with the parameters to super() in order to explore how it works under the hood, I'd caution against doing this regularly. The parameterless call to super() is recommended and sufficient for most use cases, and needing to change the search hierarchy regularly could be indicative of a larger design issue.
What about the second parameter? Remember, this is an object that is an instance of the class used as the first parameter. For example, if cube is an instance of Cube, then isinstance(cube, Square) must return True.
By including an instantiated object, super() returns a bound method: a method that is bound to the object, which gives the method the object's context such as any instance attributes. If this parameter is not included, the method returned is just a function, unassociated with an object's context.
For more information about bound methods, unbound methods, and functions, read the Python documentation on its descriptor system.
Note: Technically, super() doesn't return a method. It returns a proxy object. This is an object that delegates calls to the correct class methods without making an additional object in order to do so.
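To make the proxy behavior concrete, here is a small sketch (reusing the Rectangle and Square definitions from above) that inspects what the two-argument form actually returns:

```python
class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def area(self):
        return self.length * self.width

class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

square = Square(3)

# The two-argument form returns a proxy object of type `super`,
# bound to the instance passed as the second argument.
proxy = super(Square, square)
print(type(proxy))   # <class 'super'>

# Attribute lookups on the proxy skip Square and resolve on Rectangle,
# with `square` supplying the instance context:
print(proxy.area())  # 9

# With the instance omitted, the proxy is unbound and cannot resolve
# instance methods:
try:
    super(Square).area
except AttributeError as exc:
    print(exc)  # 'super' object has no attribute 'area'
```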
super() in Multiple Inheritance
Now that you've worked through an overview and some examples of super() and single inheritance, you will be introduced to an overview and some examples that will demonstrate how multiple inheritance works and how super() enables that functionality.
Multiple Inheritance Overview
There is another use case in which super() really shines, and this one isn't as common as the single inheritance scenario. In addition to single inheritance, Python supports multiple inheritance, in which a subclass can inherit from multiple superclasses that don't necessarily inherit from each other (also known as sibling classes).
I'm a very visual person, and I find diagrams are incredibly helpful to understand concepts like this. The image below shows a very simple multiple inheritance scenario, where one class inherits from two unrelated (sibling) superclasses:

To better illustrate multiple inheritance in action, here is some code for you to try out, showing how you can build a right pyramid (a pyramid with a square base) out of a Triangle and a Square:
class Triangle:
    def __init__(self, base, height):
        self.base = base
        self.height = height

    def area(self):
        return 0.5 * self.base * self.height

class RightPyramid(Triangle, Square):
    def __init__(self, base, slant_height):
        self.base = base
        self.slant_height = slant_height

    def area(self):
        base_area = super().area()
        perimeter = super().perimeter()
        return 0.5 * perimeter * self.slant_height + base_area
Note: The term slant height may be unfamiliar, especially if it’s been a while since you’ve taken a geometry class or worked on any pyramids.
The slant height is the height from the center of the base of an object (like a pyramid) up its face to the peak of that object. You can read more about slant heights at WolframMathWorld.
This example declares a Triangle class and a RightPyramid class that inherits from both Square and Triangle.
You'll see another .area() method that uses super() just like in single inheritance, with the aim of it reaching the .perimeter() and .area() methods defined all the way up in the Rectangle class.
Note: You may notice that the code above isn't using any inherited properties from the Triangle class yet. Later examples will fully take advantage of inheritance from both Triangle and Square.
The problem, though, is that both superclasses (Triangle and Square) define an .area() method. Take a second and think about what might happen when you call .area() on RightPyramid, and then try calling it like below:
>>> pyramid = RightPyramid(2, 4)
>>> pyramid.area()
Traceback (most recent call last):
  File "shapes.py", line 63, in <module>
    print(pyramid.area())
  File "shapes.py", line 47, in area
    base_area = super().area()
  File "shapes.py", line 38, in area
    return 0.5 * self.base * self.height
AttributeError: 'RightPyramid' object has no attribute 'height'
Did you guess that Python will try to call Triangle.area()? This is because of something called the method resolution order.
Note: How did we notice that Triangle.area() was called and not, as we hoped, Square.area()? If you look at the last line of the traceback (before the AttributeError), you'll see a reference to a specific line of code:
return 0.5 * self.base * self.height
You may recognize this from geometry class as the formula for the area of a triangle. Otherwise, if you're like me, you might have scrolled up to the Triangle and Rectangle class definitions and seen this same code in Triangle.area().
Method Resolution Order
The method resolution order (or MRO) tells Python how to search for inherited methods. This comes in handy when you're using super() because the MRO tells you exactly where Python will look for a method you're calling with super() and in what order.
Every class has an .__mro__ attribute that allows us to inspect the order, so let's do that:
>>> RightPyramid.__mro__
(<class '__main__.RightPyramid'>, <class '__main__.Triangle'>,
<class '__main__.Square'>, <class '__main__.Rectangle'>,
<class 'object'>)
This tells us that methods will be searched first in RightPyramid, then in Triangle, then in Square, then Rectangle, and then, if nothing is found, in object, from which all classes originate.
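The MRO is computed with the C3 linearization algorithm. To see it in action outside the shapes example, here is a standalone sketch of a classic diamond hierarchy (the A/B/C/D names are purely illustrative):

```python
class A:
    def hello(self):
        return "A"

class B(A):
    def hello(self):
        return "B"

class C(A):
    def hello(self):
        return "C"

class D(B, C):
    pass

# C3 linearization: each class appears once, before its parents, in an
# order consistent with every base-class list.
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']

# A lookup on a D instance walks that list left to right and stops at
# the first match, so B.hello() wins over C.hello() and A.hello():
print(D().hello())  # B
```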
The problem here is that the interpreter is searching for .area() in Triangle before Square and Rectangle, and upon finding .area() in Triangle, Python calls it instead of the one you want. Because Triangle.area() expects there to be a .height and a .base attribute, Python throws an AttributeError.
Luckily, you have some control over how the MRO is constructed. Just by changing the signature of the RightPyramid class, you can search in the order you want, and the methods will resolve correctly:
class RightPyramid(Square, Triangle):
    def __init__(self, base, slant_height):
        self.base = base
        self.slant_height = slant_height
        super().__init__(self.base)

    def area(self):
        base_area = super().area()
        perimeter = super().perimeter()
        return 0.5 * perimeter * self.slant_height + base_area
Notice that RightPyramid initializes partially with the .__init__() from the Square class. This allows .area() to use the .length on the object, as designed.
Now, you can build a pyramid, inspect the MRO, and calculate the surface area:
>>> pyramid = RightPyramid(2, 4)
>>> RightPyramid.__mro__
(<class '__main__.RightPyramid'>, <class '__main__.Square'>,
<class '__main__.Rectangle'>, <class '__main__.Triangle'>,
<class 'object'>)
>>> pyramid.area()
20.0
You see that the MRO is now what you'd expect, and you can inspect the area of the pyramid as well, thanks to .area() and .perimeter().
There's still a problem here, though. For the sake of simplicity, I did a few things wrong in this example: the first, and arguably most important, was that I had two separate classes with the same method name and signature.
This causes issues with method resolution, because the first instance of .area() that is encountered in the MRO list will be called.
When you're using super() with multiple inheritance, it's imperative to design your classes to cooperate. Part of this is ensuring that your methods are unique so that they get resolved in the MRO, by making sure method signatures are unique, whether by using method names or method parameters.
In this case, to avoid a complete overhaul of your code, you can rename the Triangle class's .area() method to .tri_area(). This way, the area methods can continue using class properties rather than taking external parameters:
class Triangle:
    def __init__(self, base, height):
        self.base = base
        self.height = height
        super().__init__()

    def tri_area(self):
        return 0.5 * self.base * self.height
Let's also go ahead and use this in the RightPyramid class:
class RightPyramid(Square, Triangle):
    def __init__(self, base, slant_height):
        self.base = base
        self.slant_height = slant_height
        super().__init__(self.base)

    def area(self):
        base_area = super().area()
        perimeter = super().perimeter()
        return 0.5 * perimeter * self.slant_height + base_area

    def area_2(self):
        base_area = super().area()
        triangle_area = super().tri_area()
        return triangle_area * 4 + base_area
The next issue here is that the code doesn't have a delegated Triangle object like it does for a Square object, so calling .area_2() will give us an AttributeError since .base and .height don't have any values.
You need to do two things to fix this:
1. All methods that are called with super() need to have a call to their superclass's version of that method. This means that you will need to add super().__init__() to the .__init__() methods of Triangle and Rectangle.
2. Redesign all the .__init__() calls to take a keyword dictionary. See the complete code below.
class Rectangle:
    def __init__(self, length, width, **kwargs):
        self.length = length
        self.width = width
        super().__init__(**kwargs)

    def area(self):
        return self.length * self.width

    def perimeter(self):
        return 2 * self.length + 2 * self.width

# Here we declare that the Square class inherits from
# the Rectangle class
class Square(Rectangle):
    def __init__(self, length, **kwargs):
        super().__init__(length=length, width=length, **kwargs)

class Cube(Square):
    def surface_area(self):
        face_area = super().area()
        return face_area * 6

    def volume(self):
        face_area = super().area()
        return face_area * self.length

class Triangle:
    def __init__(self, base, height, **kwargs):
        self.base = base
        self.height = height
        super().__init__(**kwargs)

    def tri_area(self):
        return 0.5 * self.base * self.height

class RightPyramid(Square, Triangle):
    def __init__(self, base, slant_height, **kwargs):
        self.base = base
        self.slant_height = slant_height
        kwargs["height"] = slant_height
        kwargs["length"] = base
        super().__init__(base=base, **kwargs)

    def area(self):
        base_area = super().area()
        perimeter = super().perimeter()
        return 0.5 * perimeter * self.slant_height + base_area

    def area_2(self):
        base_area = super().area()
        triangle_area = super().tri_area()
        return triangle_area * 4 + base_area
There are a number of important differences in this code:
- kwargs is modified in some places (such as RightPyramid.__init__()): This will allow users of these objects to instantiate them only with the arguments that make sense for that particular object.
- Setting up named arguments before **kwargs: You can see this in RightPyramid.__init__(). This has the neat effect of popping that key right out of the **kwargs dictionary, so that by the time that it ends up at the end of the MRO in the object class, **kwargs is empty.
Note: Following the state of kwargs can be tricky here, so here's a table of .__init__() calls in order, showing the class that owns that call, and the contents of kwargs during that call:
Class | Named Arguments | kwargs
---|---|---
RightPyramid | base, slant_height |
Square | length | base, height
Rectangle | length, width | base, height
Triangle | base, height |
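One way to verify the table yourself is to instrument the classes with print calls (the prints below are tracing additions for illustration, not part of the tutorial's code):

```python
class Rectangle:
    def __init__(self, length, width, **kwargs):
        print(f"Rectangle.__init__: length={length}, width={width}, kwargs={kwargs}")
        self.length = length
        self.width = width
        super().__init__(**kwargs)

class Square(Rectangle):
    def __init__(self, length, **kwargs):
        print(f"Square.__init__: length={length}, kwargs={kwargs}")
        super().__init__(length=length, width=length, **kwargs)

class Triangle:
    def __init__(self, base, height, **kwargs):
        print(f"Triangle.__init__: base={base}, height={height}, kwargs={kwargs}")
        self.base = base
        self.height = height
        super().__init__(**kwargs)

class RightPyramid(Square, Triangle):
    def __init__(self, base, slant_height, **kwargs):
        print(f"RightPyramid.__init__: base={base}, slant_height={slant_height}, kwargs={kwargs}")
        self.base = base
        self.slant_height = slant_height
        kwargs["height"] = slant_height
        kwargs["length"] = base
        super().__init__(base=base, **kwargs)

# Each .__init__() peels off its own named arguments, so kwargs shrinks
# as the call travels down the MRO and is empty by the time it reaches
# object.__init__():
pyramid = RightPyramid(base=2, slant_height=4)
```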
Now, when you use these updated classes, you have this:
>>> pyramid = RightPyramid(base=2, slant_height=4)
>>> pyramid.area()
20.0
>>> pyramid.area_2()
20.0
It works! You've used super() to successfully navigate a complicated class hierarchy while using both inheritance and composition to create new classes with minimal reimplementation.
Multiple Inheritance Alternatives
As you can see, multiple inheritance can be useful but also lead to very complicated situations and code that is hard to read. It's also rare to have objects that neatly inherit everything from more than one other class.
If you see yourself beginning to use multiple inheritance and a complicated class hierarchy, it’s worth asking yourself if you can achieve code that is cleaner and easier to understand by using composition instead of inheritance.
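For instance, the pyramid could hold a Square base rather than inherit from one. Here is a sketch of that idea (my own rework, not the tutorial's design):

```python
class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def area(self):
        return self.length * self.width

    def perimeter(self):
        return 2 * self.length + 2 * self.width

class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

class RightPyramid:
    """Composition: the pyramid *has* a Square base instead of *being* one."""
    def __init__(self, base, slant_height):
        self.base = Square(base)  # delegate to a contained object
        self.slant_height = slant_height

    def area(self):
        base_area = self.base.area()
        perimeter = self.base.perimeter()
        return 0.5 * perimeter * self.slant_height + base_area

pyramid = RightPyramid(2, 4)
print(pyramid.area())  # 20.0 — same result, no MRO puzzles
```

With composition there is no method resolution order to reason about: every call names the object it runs on.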
With composition, you can add very specific functionality to your classes from a specialized, simple class called a mixin.
Since this article is focused on inheritance, I won't go into too much detail on composition and how to wield it in Python, but here's a short example using VolumeMixin to give specific functionality to our 3D objects, in this case a volume calculation:
class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def area(self):
        return self.length * self.width

class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

class VolumeMixin:
    def volume(self):
        return self.area() * self.height

class Cube(VolumeMixin, Square):
    def __init__(self, length):
        super().__init__(length)
        self.height = length

    def area(self):
        return super().area() * 6
In this example, the code was reworked to include a mixin called VolumeMixin. The mixin is then used by Cube and gives Cube the ability to calculate its volume, which is shown below:
>>> cube = Cube(2)
>>> cube.area()
24
>>> cube.volume()
48
This mixin can be used the same way in any class that has an area defined for it and for which the formula area * height returns the correct volume.
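For example, the same mixin could give a volume to a hypothetical Cylinder class (not part of the tutorial's hierarchy), since a cylinder's volume really is its base area times its height:

```python
import math

class VolumeMixin:
    def volume(self):
        return self.area() * self.height

class Cylinder(VolumeMixin):
    """Hypothetical class for illustration only."""
    def __init__(self, radius, height):
        self.radius = radius
        self.height = height

    def area(self):
        # Area of the circular base
        return math.pi * self.radius ** 2

cylinder = Cylinder(radius=1, height=2)
print(cylinder.volume())  # 2 * pi, about 6.2832
```

The mixin never mentions Cylinder; it only assumes an .area() method and a .height attribute exist on whatever class mixes it in.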
A super() Recap
In this tutorial, you learned how to supercharge your classes with super(). Your journey started with a review of single inheritance and then showed how to call superclass methods easily with super().
You then learned how multiple inheritance works in Python, and techniques to combine super() with multiple inheritance. You also learned about how Python resolves method calls using the method resolution order (MRO), as well as how to inspect and modify the MRO to ensure appropriate methods are called at appropriate times.
For more information about object-oriented programming in Python and using super(), check out these resources:
- Official super() documentation
- Python's super() Considered Super by Raymond Hettinger
- Object-Oriented Programming in Python 3
February 12, 2019 09:01 PM UTC
Codementor
Python, For The ❤ of It - part 3 (What I Built With It)
My Journey Into One of World's Most Awesome Languages
February 12, 2019 08:42 PM UTC
PyCoder’s Weekly
Issue #355 (Feb. 12, 2019)
#355 – FEBRUARY 12, 2019
View in Browser »
Goodbye Virtual Environments?
Could an npm-style __pypackages__ in your app's project folder be an alternative to using virtual environments? Chad explores this question in his article and goes over PEP 582 as a sample implementation of this idea.
CHAD SMITH
The State of Python Packaging
Describes where the Python packaging ecosystem is today, and where the Python Packaging Authority hopes it will move next.
BERNAT.TECH
Find a Python Job Through Vettery
Vettery specializes in developer roles and is completely free for job seekers. Interested? Submit your profile, and if accepted, you can receive interview requests directly from top companies seeking Python devs. Get started →
VETTERY sponsor
Python Elects a Steering Council
“After a two-week voting period, which followed a two-week nomination window, Python now has its governance back in place—with a familiar name in the mix.”
JAKE EDGE
Incrementally Migrating Over One Million Lines of Code From Python 2 to Python 3
How the Dropbox team handled their migration from Python 2 → 3. Great list of lessons learned at the end!
DROPBOX.COM
PyCon 2020-2021 Location Announced
The PSF announced that PyCon will be held in Pittsburgh in 2020 and 2021!
PYCON.BLOGSPOT.COM • Shared by Ricky White
Discussions
Conventions for the Order of Methods in Class Definitions?
Dunder methods all first? Alphabetical? How do you do it?
TWITTER.COM/PURPLEDIANE88
Why Does Python Live On Land…?
Ba-dum-tss!
TWITTER.COM/REALPYTHON
Python Jobs
Senior Systems Engineer (Hamilton, ON, Canada)
Python Web Developer (Remote)
Software Developer (Herndon, VA)
Tech Lead / Senior Software Engineer (Seattle, WA)
Python Software Engineer (London, UK)
Pole Star Space Applications Ltd.
Senior Engineer Python & More (Winterthur, Switzerland)
Sr Enterprise Python Developer (Toronto, ON, Canada)
Senior Software Engineer (Santa Monica, CA)
Computer Science Teacher (Pasadena, CA)
Senior Python Engineer (New York, NY)
Software Engineer (Herndon, VA)
Web UI Developer (Herndon, VA)
Articles & Tutorials
The Ultimate List of Data Science Podcasts
Over a dozen shows that discuss topics in big data, data analysis, statistics, machine learning, and artificial intelligence. What’s your pick?
REAL PYTHON
Python Exceptions Considered an Anti-Pattern
Nikita goes over the drawbacks of Python exceptions and makes a case for why they could be considered an anti-pattern in some cases. Oh, and he also proposes a solution… Worth a read!
NIKITA SOBOLEV opinion
Take Control of Your Job Search With Indeed Prime
With Indeed Prime, you’re in the driver’s seat. Tell us about your skills, career goals, and salary requirements and we’ll match you with top companies looking to hire candidates like you. Apply today to get started!
INDEED sponsor
A Successful Python 3 Migration Story
How the Zato engineering team migrated 130,000 lines of code from Python 2 to Python 3.
ZATO.IO
Python in Education: Request for Ideas
The PSF wants to hear your ideas on ways it can fund work to improve Python in education.
PSF
Python 3 Template Strings Instead of External Template Engine
I’ve been a fan of Python’s template strings and this article demonstrates a good use case for them.
ESHLOX.NET
Python Architecture Stuff: Do We Need More?
Some good resources linked in this article if you’re looking to improve the architecture of your Python apps, in order to make them easier to test, for example.
OBEYTHETESTINGGOAT.COM
Trying Out the := "Walrus Operator" in Python 3.8
The first alpha of Python 3.8 was just released. With that comes a major new feature in the form of PEP 572 (Assignment Expressions). Alexander demos this new feature in this short & sweet article.
ALEXANDER HULTNÉR
Bayesian Analysis With Python (Interview With Osvaldo Martin)
Osvaldo Martin is one of the developers of PyMC3 and ArviZ. He is a researcher specialized in Bayesian statistics and data science.
FEDERICO CARRONE
Master Intermediate Python Skills With “Python 201”
If you already know the basics of Python and now you want to go to the next level, then this is the book for you. This book is for intermediate level Python programmers only—there won’t be any beginner chapters here. Learn More →
MIKE DRISCOLL book sponsor
Projects & Code
PythonEXE: How to Create an Executable File From a Python Script?
A simple project that demonstrates how to create an executable from a Python project.
GITHUB.COM/JABBALACI
demoji: Accurately Remove Emojis From Text Strings
Accurately find or remove emojis from a blob of text.
BRAD SOLOMON
Events
Python North East
February 13, 2019
PYTHONNORTHEAST.COM
Python Atlanta
February 14, 2019
MEETUP.COM
PyCon Belarus 2019
February 15 to February 17, 2019
PYCON.ORG
Dominican Republic Python User Group
February 19, 2019
PYTHON.DO
PyCon Namibia 2019
February 19 to February 22, 2019
PYCON.ORG
Happy Pythoning!
This was PyCoder’s Weekly Issue #355.
View in Browser »