
Planet Python

Last update: February 17, 2019 07:49 PM UTC

February 17, 2019


Reuven Lerner

Python’s str.isdigit vs. str.isnumeric

Let’s say that I want to write some Python code that invites the user to enter a number, and then prints that number, tripled. We could say:

>>> n = input("Enter a number: ")
>>> print(f"{n} * 3 = {n*3}")

The good news is that this code works just fine. The bad news is that it probably doesn’t do what you might expect. If I run this program, I’ll see:

Enter a number: 5
5 * 3 = 555

The reason for this output is that the “input” function always returns a string. So sure, we asked the user for a number, but we got the string ‘5’, rather than the integer 5. The ‘555’ output is thanks to the fact that you can multiply strings in Python by integers, getting a longer string back. So ‘a’ * 5 will give us ‘aaaaa’.

Of course, we can always create an integer from a string by applying the “int” class to the user’s input:

>>> n = input("Enter a number: ")
>>> n = int(n)
>>> print(f"{n} * 3 = {n*3}")

Sure enough, we get the following output:

Enter a number: 5
5 * 3 = 15

Great, right? But what happens if the user gives us something that’s no longer numeric? The program will blow up:

Enter a number: abcd

ValueError: invalid literal for int() with base 10: 'abcd'

Clearly, we want to avoid this problem. You could make a good argument that in this case, it’s probably best to run the conversion inside of a “try” block, and trap any exception that we might get.
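
For example, a minimal try/except version (just a sketch, not from the original post) might look like this:

>>> n = input("Enter a number: ")
>>> try:
...     n = int(n)
...     print(f"{n} * 3 = {n*3}")
... except ValueError:
...     print(f"Sorry, {n!r} doesn't look like an integer")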

But there’s another way to test this, one which I use in my intro Python classes before we’ve covered exceptions: strings have a great method called “isdigit” that we can run to tell us whether a string contains only digits (0-9) or something else. For example:

>>> '1234'.isdigit()
True

>>> '1234 '.isdigit() # space at the end
False

>>> '1234a'.isdigit() # letter at the end
False

>>> 'a1234'.isdigit() # letter at the start
False

>>> '12.34'.isdigit() # decimal point
False

>>> ''.isdigit() # empty string
False

If you know regular expressions, then you can see that str.isdigit returns True for ‘^\d+$’. Which can be very useful, as we can see here:

>>> n = input("Enter a number: ")
>>> if n.isdigit():
...     n = int(n)
...     print(f"{n} * 3 = {n*3}")
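
If you want to check that regex claim yourself, here is a small sketch using the re module (the examples are ASCII-only; this isn't from the original post):

>>> import re
>>> for s in ['1234', '1234 ', '12.34', '', 'a1234']:
...     print(repr(s), s.isdigit(), bool(re.fullmatch(r'\d+', s)))
'1234' True True
'1234 ' False False
'12.34' False False
'' False False
'a1234' False False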

But wait: Python also includes another method, str.isnumeric. And it’s not at all obvious, at least at first, what the difference is between them, because they would seem to give the same results:

>>> n = input("Enter a number: ")
>>> if n.isnumeric():
...     n = int(n)
...     print(f"{n} * 3 = {n*3}")

So, what’s the difference? It’s actually pretty straightforward, but it took some time for me to find out: basically, str.isdigit only returns True for what I said before, strings containing solely the digits 0-9.

By contrast, str.isnumeric returns True if the string consists entirely of numeric characters. When I first read this, I figured that it would mean decimal points and minus signs — but no! It’s just the digits 0-9, plus any character from another language that’s used in place of digits.

For example, we’re used to writing numbers with Arabic numerals. But there are other languages that traditionally use other characters. For example, in Chinese, we count 1, 2, 3, 4, 5 as 一,二,三,四, 五. It turns out that the Chinese characters for numbers will return False for str.isdigit, but True for str.isnumeric, behaving differently from their 0-9 counterparts:

>>> '12345'.isdigit()
True

>>> '12345'.isnumeric()
True

>>> '一二三四五'.isdigit()
False

>>> '一二三四五'.isnumeric()
True

So, which should you use? For most people, “isdigit” is probably a better choice, simply because it’s more clearly what you likely want. Of course, if you want to accept other types of numerals and numeric characters, then “isnumeric” is better. But if you’re interested in turning strings into integers, then you’re probably safer using “isdigit”, just in case someone tries to enter something else:

>>> int('二')
ValueError: invalid literal for int() with base 10: '二'
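
Putting the pieces together, a small helper along these lines (my own sketch, not from the post) keeps asking until it gets something that int() will definitely accept:

>>> def read_int(prompt="Enter a number: "):
...     while True:
...         text = input(prompt)
...         if text.isdigit():
...             return int(text)
...         print(f"{text!r} is not a whole number, please try again")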

I tried to determine whether there was any difference in speed between the two methods, just in case, but after numerous tests with “%timeit” in Jupyter, I found that I was getting roughly the same speeds from both methods.

If you’re like me, and ever wondered how a language that claims to have “one obvious way to do it” can have two different, seemingly identical methods… well, now you know!

The post Python’s str.isdigit vs. str.isnumeric appeared first on Lerner Consulting Blog.

February 17, 2019 07:32 PM UTC


Stefan Behnel

Speeding up basic object operations in Cython

Raymond Hettinger published a nice little micro-benchmark script for comparing basic operations like attribute or item access in CPython and comparing the performance across Python versions. Unsurprisingly, Cython performs quite well in comparison to the latest CPython 3.8-pre development version, executing most operations 30-50% faster. But the script allowed me to tune some more performance out of certain less well performing operations. The timings are shown below, first those for CPython 3.8-pre as a baseline, then (for comparison) the Cython timings with all optimisations disabled that can be controlled by C macros (gcc -DCYTHON_...=0), the normal (optimised) Cython timings, and the now improved version at the end.

                                CPython 3.8 (pre)   Cython 3.0 (no opt)   Cython 3.0 (pre)   Cython 3.0 (tuned)
Variable and attribute read access:
  read_local                             5.5 ns                0.2 ns              0.2 ns               0.2 ns
  read_nonlocal                          6.0 ns                0.2 ns              0.2 ns               0.2 ns
  read_global                           17.9 ns               13.3 ns              2.2 ns               2.2 ns
  read_builtin                          21.0 ns                0.2 ns              0.2 ns               0.1 ns
  read_classvar_from_class              23.7 ns               16.1 ns             14.1 ns              14.1 ns
  read_classvar_from_instance           20.9 ns               11.9 ns             11.2 ns              11.0 ns
  read_instancevar                      31.7 ns               22.3 ns             20.8 ns              22.0 ns
  read_instancevar_slots                25.8 ns               16.5 ns             15.3 ns              17.0 ns
  read_namedtuple                       23.6 ns               16.2 ns             13.9 ns              13.5 ns
  read_boundmethod                      32.5 ns               23.4 ns             22.2 ns              21.6 ns
Variable and attribute write access:
  write_local                            6.4 ns                0.2 ns              0.1 ns               0.1 ns
  write_nonlocal                         6.8 ns                0.2 ns              0.1 ns               0.1 ns
  write_global                          22.2 ns               13.2 ns             13.7 ns              13.0 ns
  write_classvar                       114.2 ns              103.2 ns            113.9 ns              94.7 ns
  write_instancevar                     49.1 ns               34.9 ns             28.6 ns              29.8 ns
  write_instancevar_slots               33.4 ns               22.6 ns             16.7 ns              17.8 ns
Data structure read access:
  read_list                             23.1 ns                5.5 ns              4.0 ns               4.1 ns
  read_deque                            24.0 ns                5.7 ns              4.3 ns               4.4 ns
  read_dict                             28.7 ns               21.2 ns             16.5 ns              16.5 ns
  read_strdict                          23.3 ns               10.7 ns             10.5 ns              12.0 ns
Data structure write access:
  write_list                            28.0 ns                8.2 ns              4.3 ns               4.2 ns
  write_deque                           29.5 ns                8.2 ns              6.3 ns               6.4 ns
  write_dict                            32.9 ns               24.0 ns             21.7 ns              22.6 ns
  write_strdict                         29.2 ns               16.4 ns             15.8 ns              16.0 ns
Stack (or queue) operations:
  list_append_pop                       63.6 ns               67.9 ns             20.6 ns              20.5 ns
  deque_append_pop                      56.0 ns               81.5 ns            159.3 ns              46.0 ns
  deque_append_popleft                  58.0 ns               56.2 ns             88.1 ns              36.4 ns
Timing loop overhead:
  loop_overhead                          0.4 ns                0.2 ns              0.1 ns               0.2 ns

Some things that are worth noting:

  • There is always a bit of variance across the runs, so don't get excited about a couple of percent difference.
  • The read/write access to local variables is not reasonably measurable in Cython since it uses local/global C variables, and the C compiler discards any useless access to them. But don't worry, they are really fast.
  • Builtins (and module global variables in Py3.6+) are cached, which explains the "close to nothing" timings for them above.
  • Even with several optimisations disabled, Cython code is still visibly faster than CPython.
  • The write_classvar benchmark revealed a performance problem in CPython that is being worked on.
  • The deque related benchmarks revealed performance problems in Cython that are now fixed, as you can see in the last column.
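
For readers curious how micro-benchmarks of this kind are usually put together, here is a rough, generic timeit sketch in plain Python (this is not Raymond Hettinger's actual script, and the class and attribute names are made up):

from timeit import repeat

class C:
    __slots__ = ('x',)
    def __init__(self):
        self.x = 1

obj = C()

def bench(stmt, number=1_000_000):
    # best of five repeats, reported as nanoseconds per operation
    best = min(repeat(stmt, globals=globals(), number=number, repeat=5))
    return best / number * 1e9

print('read_instancevar_slots:  %.1f ns' % bench('obj.x'))
print('write_instancevar_slots: %.1f ns' % bench('obj.x = 1'))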

February 17, 2019 06:24 PM UTC


gamingdirectional

Continue with the boy boundary detection mechanism

In the previous article, we successfully made the boy climb up the ladder, but the boy will continue climbing even when there is no more ladder for him to climb. In this article, we will solve that problem by introducing the following rules: when the boy is climbing the ladder, he can only move in either the upward or the downward direction, but not sideways. There will be no return...

Source

February 17, 2019 10:42 AM UTC


PyBites

PyBites Twitter Digest - Issue 01, 2019

It has been too long 😞 but we're excited to bring you today: 🐍 PyBites Twitter Digest - Issue 01, 2019 😎

Python Developers Survey 2018 results

Cool: Jupyter for every high schooler (4 steps of inquiry-based learning)

The Ultimate List of Data Science Podcasts by Real Python

Always interesting interviews on Talk Python

We started to use carbon to share out our tips on Twitter

We hit 171 Bite exercises on our platform (= 3 new ones / week streak since New Year!)

Virtual envs and more news on Python Bytes

Hope you had a happy Valentine's day!

And some more @python_tip goodness

Congrats to Ant and others that had their talks accepted. If rejected: never give up!

Confused by super() in Python?

The state of Python Packaging

Anybody tried out 3.8's new Walrus Operator?

Python 2->3 migration of Dropbox's desktop client

Interesting question on Twitter by Diane Chen:

Do you follow any convention for the order of methods in class definitions?

Teachingpython podcast: using turtle (stdlib!) to teach Python

VIM and Python – A Match Made in Heaven

Still referring back to this article to configure proper PEP 8 indentation in Vim

Nice Hacktoberfest t-shirt Bryan :)

Test & Code 64: Practicing Programming!

You are a knowledge worker. Your tool is your brain.

(sponsored by PyBites)

And finally a good reminder:


>>> from pybites import Bob, Julian

Keep Calm and Code in Python!

February 17, 2019 09:00 AM UTC


Thomas Guest

Aligning the first line of a triple-quoted string in Python

Python’s triple-quoted strings are a convenient syntax for strings where the contents span multiple lines. Unescaped newlines are allowed in triple-quoted strings. So, rather than write:

song = ("Happy birthday to you\n"
        "Happy birthday to you\n"
        "Happy birthday dear Gail\n"
        "Happy birthday to you\n")

you can write:

song = """Happy birthday to you
Happy birthday to you
Happy birthday dear Gail
Happy birthday to you
"""

The only downside here is that the first line doesn’t align nicely with the lines which follow. The way around this is to embed a \newline escape sequence, meaning both backslash and newline are ignored.

song = """\
Happy birthday to you
Happy birthday to you
Happy birthday dear Gail
Happy birthday to you
"""

February 17, 2019 12:00 AM UTC

February 16, 2019


Python Sweetness

Threadless mode in Mitogen 0.3

Mitogen has been explicitly multi-threaded since the design was first conceived. This choice is hard to regret, as it aligns well with the needs of operating systems like Windows, makes background tasks like proxying possible, and allows painless integration with existing programs where the user doesn't have to care how communication is implemented. Easy blocking APIs simply work as documented from any context, and magical timeouts, file transfers and routing happen in the background without effort.

The story has for the most part played out well, but as work on the Ansible extension revealed, this thread-centric worldview is more than somewhat idealized, and scenarios exist where background threads are not only problematic, but a serious hazard that works against us.

For that reason a new operating mode will hopefully soon be included, one where relatively minor structural restrictions are traded for no background thread at all. This article documents the reasoning behind threadless mode, and a strange set of circumstances that allow such a major feature to be supported with the same blocking API as exists today, and surprisingly minimal disruption to existing code.

Recap

Above is a rough view of Mitogen's process model, revealing a desirable symmetry as it currently exists. In the master program and replicated children, the user's code maintains full control of the main thread, with library communication requirements handled by a background thread using an identical implementation in every process.

Keeping the user in control of the main thread is important, as it possesses certain magical privileges. In Python it is the only thread from which signal handlers can be installed or executed, and on Linux some niche system interfaces require its participation.

When a method like remote_host.call(myfunc) is invoked, an outgoing message is constructed and enqueued with the Broker thread, and a callback handler is installed to cause any return value response message to be posted to another queue created especially to receive it. Meanwhile the thread that invoked Context.call(..) sleeps waiting for a message on the call's dedicated reply queue.

Latches

Those queues aren't simply Queue.Queue, but a custom reimplementation added early during Ansible extension development, as deficiencies in Python 2.x threading began to manifest. Python 2 forces a choice between adding up to 50 ms of latency to each Queue.get(), or having waits execute with UNIX signals masked, which prevents CTRL+C from interrupting the program. Given these options, a reimplementation made plenty of sense.

The custom queue is called Latch, a name chosen simply because it was short and vaguely fitting. To say its existence is a great discomfort would be an understatement: reimplementing synchronization was never desired, even if just by leveraging OS facilities. True to tribal wisdom, the folly of Latch has been a vast time sink, costing many days hunting races and subtle misbehaviours, yet without it, good performance and usability is not possible on Python 2, and so it remains.

Due to this, when any thread blocks waiting for a result from a remote process, it always does so within Latch, a detail that will soon become important.
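
For illustration only, the interface of a Latch-style queue could be sketched as below. This is not Mitogen's actual implementation, which deliberately avoids a plain Condition wait because of the Python 2 problems described above; the sketch just shows the put/get shape that user threads block on:

import threading

class SimpleLatch:
    """Toy stand-in for a Latch-like queue: something a thread can block on."""

    def __init__(self):
        self._cond = threading.Condition()
        self._items = []

    def put(self, obj):
        # called by the broker thread when a reply arrives
        with self._cond:
            self._items.append(obj)
            self._cond.notify()

    def get(self):
        # called by the user thread waiting for its reply
        with self._cond:
            while not self._items:
                self._cond.wait()
            return self._items.pop(0)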

The Broker

Threading requirements are mostly due to Broker, a thread that has often changed role over time. Today its main function is to run an I/O multiplexer, like Twisted or asyncio. Except for some local file IO in master processes, broker thread code is asynchronous and non-reentrant, regardless of whether it is communicating with a remote machine via an SSH subprocess or a local thread via a Latch.

When a user's thread is blocked on a reply queue, that thread isn't really blocked on a remote process - it is waiting for the broker thread to receive and decode any reply, then post it to the queue (or Latch) the thread is sleeping on.

Performance

Having a dedicated IO thread in a multi-threaded environment simplifies reasoning about communication, as events like unexpected disconnection always occur in a consistent location far from user code. But as is evident, it means every IO requires interaction of two threads in the local process, and when that communication is with a remote Mitogen process, a further two in the remote process.

It may come as no surprise that poor interaction with the OS scheduler often manifests, where load balancing pushes related communicating threads out across distinct cores, their execution schedule bearing no resemblance to the inherent lock-step communication pattern caused by the request-reply structure of RPCs and, between threads of the same process, by the Global Interpreter Lock. The range of undesirable effects defies simple description; it is sufficient to say that poor behaviour here can be disastrous.

To cope with this, the Ansible extension introduced CPU pinning. This feature locks related threads to one core, so that as a user thread enters a wait on the broker after sending it a message, the broker has much higher chance of being scheduled expediently, and for its use of shared resources (like the GIL) to be uncontended and exist in the cache of the CPU it runs on.
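
On Linux, pinning of this kind can be expressed with os.sched_setaffinity(); the snippet below is only a generic sketch of the idea, not the extension's actual pinning code, and the core number is hypothetical:

import os

def pin_to_cpu(cpu_index, pid=0):
    # restrict the given process (0 = the current process) to one core
    os.sched_setaffinity(pid, {cpu_index})

pin_to_cpu(3)                     # hypothetical core assignment
print(os.sched_getaffinity(0))    # -> {3}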

Runs of tests/bench/roundtrip.py with and without pinning.
Pinned?   Round-trip delay (3 runs)       Average
No        960 usec, 782 usec, 803 usec    848 usec ± 111 usec
Yes       198 usec, 197 usec, 197 usec    197 usec ± 1 usec

It is hard to overstate the value of pinning, as revealed by the more than 4x speedup visible in this stress test, but enabling it is a double-edged sword, as the scheduler loses the freedom to migrate processes to balance load, and no general pinning strategy is possible that does not approach the complexity of an entirely new scheduler. As a simple example, if two uncooperative processes (such as Ansible and, say, a database server) were to pin their busiest workers to the same CPU, both will suffer disastrous contention for resources that a scheduler could alleviate if it were permitted.

While performance loss due to scheduling could be considered a scheduler bug, it could be argued that expecting consistently low latency lock-step communication between arbitrary threads is unreasonable, and so it is desirable that threading rather than scheduling be considered at fault, especially as one and not the other is within our control.

The desire is not to remove threading entirely, but instead provide an option to disable it where it makes sense. For example in Ansible, it would be possible to almost halve the number of running threads if worker processes were switched to a threadless implementation, since the otherwise single-threaded WorkerProcess gains no benefit from having a distinct broker thread.

UNIX fork()

In its UNIX manifestation, fork() is a defective abstraction protected by religious symbolism and dogma, conceived at a time long predating the 1984 actualization of the problem it failed to solve. It has remained obsolete ever since. A full description of this exceeds any one paragraph, and an article in drafting since October already in excess of 8,000 words has not yet succeeded in fully capturing it.

For our purposes it is sufficient to know that, as when mixed with most UNIX facilities, mixing fork() with threads is extremely unsafe, but many UNIX programs presently rely on it, such as in Ansible's forking of per-task worker processes. For that reason in the Ansible extension, Mitogen cannot be permanently active in the top-level process, but only after fork within a "connection multiplexer" subprocess, and within the per-task workers.

In upcoming work, there is a renewed desire for a broker to be active in the top-level process, but this is extremely difficult while remaining compatible with Ansible's existing forking model. A threadless mode would be immediately helpful there.

Python 2.4

Another manifestation of fork() trouble comes in Python 2.4, where the youthful implementation makes no attempt to repair its threading state after fork, leading to incurable deadlocks across the board. For this reason when running on Python 2.4, the Ansible extension disables its internal use of fork for isolation of certain tasks, but it is not enough, as deadlocks while starting subprocesses are also possible.

A common idea would be to forget about Python 2.4 as it is too old, much as it is tempting to imagine HTTP 0.9 does not exist, but as in that case, Mitogen treats Python not just as a language runtime, but as an established network protocol that must be complied with in order to communicate with infrastructure that will continue to exist long into the future.

Implementation Approach

Recall it is not possible for a user thread to block without waiting on a Latch. With threadless mode, we can instead reinterpret the presence of a waiting Latch as the user's indication some network IO is pending, and since the user cannot become unblocked until that IO is complete, and has given up forward execution in favour of waiting, Latch.get() becomes the only location where the IO loop must run, and only until the Latch that caused it to run has some result posted to it by the previous iteration.

@mitogen.main(threadless=True)
def main(router):
    host1 = router.ssh(hostname='a.b.c')
    host2 = router.ssh(hostname='c.b.a')

    call1 = host1.call_async(os.system, 'hostname')
    call2 = host2.call_async(os.system, 'hostname')

    print call1.get().unpickle()
    print call2.get().unpickle()

In the example, after the (presently blocking) connection procedure completes, neither call_async() wakes any broker thread, as none exists. Instead they enqueue messages for the broker to run, but the broker implementation does not start execution until call1.get(), where get() is internally synchronized using Latch.

The broker loop ceases after a result becomes available for the Latch that is executing it, only to be restarted again for call2.get(), where it again runs until its result is available. In this way asynchronous execution progresses opportunistically, and only when the calling thread indicated it cannot progress until a result is available.

Owing to the inconvenient existence of Latch, an initial prototype was functional with only a 30 line change. In this way, an ugly and undesirable custom synchronization primitive has accidentally become the centrepiece of an important new feature.

Size Benefit

The intention is that threadless mode will become the new default in a future version. As it has much lower synchronization requirements, it becomes possible to move large pieces of code out of the bootstrap, including any relating to implementing the UNIX self-pipe trick, as required by Latch, and to wake the broker thread from user threads.

Instead this code can be moved to a new mitogen.threads module, where it can progressively upgrade an existing threadless mitogen.core, much like mitogen.parent already progressively upgrades it with an industrial-strength Poller as required.

Any code that can be removed from the bootstrap has an immediate benefit on cold start performance with large numbers of targets, as the bottleneck during cold start is often a restriction on bandwidth.

Performance Benefit

Threadless mode tallies in well with existing desires to lower latency and resource consumption, such as the plan to reduce context switches.

Runs of tests/bench/roundtrip.py with and without threadless
                          Threaded+Pinned     Threadless
Average Round-trip Time   201 usec            131 usec (-34.82%)
Elapsed Time              4.220 sec           3.243 sec (-23.15%)
Context Switches          304,330             40,037 (-86.84%)
Instructions              10,663,813,051      8,876,096,105 (-16.76%)
Branches                  2,146,781,967       1,784,930,498 (-15.85%)
Page Faults               6,412               17,529 (+173.37%)

Because no broker thread exists, no system calls are required to wake it when a message is enqueued, nor are any necessary to wake the user thread when a reply is received, nor any futex() calls due to one just-woke thread contending on a GIL that has not yet been released by a just-about-to-sleep peer. The effect across two communicating processes is a huge reduction in kernel/user mode switches, contributing to vastly reduced round-trip latency.

In the table an as-yet undiagnosed jump in page faults is visible. One possibility is that either the Python or C library allocator employs a different strategy in the absence of threads, the other is that a memory leak exists in the prototype.

Restrictions

Naturally this will place some restraints on execution. Transparent routing will no longer be quite so transparent, as it is not possible to execute a function call in a remote process that is also acting as a proxy to another process: proxying will not run while Dispatcher is busy executing the function call.

One simple solution is to start an additional child of the proxying process in which function calls will run, leaving its parent dedicated just to routing, i.e. exclusively dedicated to running what was previously the broker thread. It is expected this will require only a few lines of additional code to support in the Ansible extension.

For children of a threadless master, import statements will hang while the master is otherwise busy, but this is not much of a problem, since import statements usually happen once shortly after the first parent->child call, when the master will be waiting in a Latch.

For threadless children, no background thread exists to notice a parent has disconnected, and to ensure the process shuts down gracefully in case the main thread has hung. Some options are possible, including starting a subprocess for the task, or supporting SIGIO-based asynchronous IO, so the broker thread can run from the signal handler and notice the parent is gone.

Another restriction is that when threadless mode is enabled, Mitogen primitives cannot be used from multiple threads. After some consideration, while possible to support, it does not seem worth the complexity, and would prevent the aforementioned reduction of bootstrap code size.

Ongoing Work

Mitogen has quite an ugly concept of Services, added in a hurry during the initial Ansible extension development. Services represent a bundle of a callable method exposed to the network, a security policy determining who may call it, and an execution policy governing its concurrency requirements. Service execution always happens in a background thread pool, and is used to implement things like file transfer in the Ansible extension.

Despite heavy use, it has always been an ugly feature as it partially duplicates the normal parent->child function call mechanism. Looking at services from the perspective of threadless mode reveals some notion of a "threadless service", and how such a threadless service looks even more similar to a function call than previously.

It is possible that as part of the threadless work, the unification of function calls and services may finally happen, although no design for it is certain yet.

Summary

There are doubtlessly many edge cases left to discover, but threadless mode looks very doable, and promises to make Mitogen suitable in even more scenarios than before.

Until next time!

Just tuning in?

February 16, 2019 10:00 PM UTC


Python Insider

Python 2.7.16 release candidate 1 available

A release candidate for the upcoming 2.7.16 bug fix release is now available for download.

February 16, 2019 07:56 PM UTC


Weekly Python StackOverflow Report

(clxv) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2019-02-16 19:24:24 GMT


  1. Why does Python allow out-of-range slice indexes for sequences? - [55/2]
  2. How to efficiently use asyncio when calling a method on a BaseProxy? - [11/2]
  3. Python multiprocessing crashes docker container - [10/2]
  4. Python 3 pandas.groupby.filter - [9/5]
  5. Apply list of regex pattern on list python - [9/2]
  6. max/min function on list with strings and integers - [8/3]
  7. re.sub(".*", "(replacement)", "text") doubles replacement on Python 3.7 - [8/1]
  8. Fill Bounding Boxes in 2d array - [7/2]
  9. Slider button click in selenium python - [7/1]
  10. One line, three variables - [6/2]

February 16, 2019 07:25 PM UTC


Codementor

Getting Started with Pathlib

Introduction: This tutorial will guide you on how to use the Pathlib module for working with filesystem paths, its benefits, and the problem it solves, since the Python standard library…
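
As a tiny taste of what such a tutorial covers (the directory and file names below are purely illustrative):

from pathlib import Path

project = Path.home() / "projects" / "demo"   # build paths with the / operator
config = project / "settings.ini"

print(config.suffix)      # '.ini'
print(config.exists())    # True or False, no os.path needed

for py_file in project.glob("*.py"):          # pattern matching is built in
    print(py_file.name)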

February 16, 2019 07:23 PM UTC


gamingdirectional

Detect the player’s boundary

In this article, we will start to create the boundary detection mechanism which can be used to help the boy moving around the canvas. We will go slowly where this topic will take a few chapters to complete. In this chapter, we will focus on below issues. The boy will not be able to move past the horizontal boundary of either 0 or 576 pixels which is the physical boundary for the boy sprite.

Source

February 16, 2019 12:49 PM UTC


Vasudev Ram

pprint.isrecursive: Check if object requires recursive representation



- By Vasudev Ram - Online Python training / SQL training / Linux training


Tree image attribution

Hi, readers,

I was using the pprint module to pretty-print some Python data structures in a program I was writing. Then I saw that it has a function called isrecursive.

The docstring for pprint.isrecursive says:
>>> print pprint.isrecursive.__doc__
Determine if object requires a recursive representation.
Here is a Python 3 shell session that shows what the isrecursive function does, with a list:
>>> import pprint
>>> print(pprint.pprint.__doc__)
Pretty-print a Python object to a stream [default is sys.stdout].
>>> a = []
>>> a
[]
>>> pprint.isrecursive(a)
False
>>>
>>> a.append(a)
>>> a
[[...]]
>>>
>>> pprint.isrecursive(a)
True
How about for a dict?
>>> b = {}
>>> pprint.isrecursive(b)
False
>>>
>>> b[1] = b
>>> b
{1: {...}}
>>> id(b) == id(b[1])
True
>>> pprint.isrecursive(b)
True
How about if an object is recursive, but not directly, like in the above two examples? Instead, it is recursive via a chain of objects:
>>> c = []
>>> d = []
>>> e = []
>>> c.append(d)
>>> d.append(e)
>>> c
[[[]]]
>>> pprint.isrecursive(c)
False
>>> e.append(c)
>>> c
[[[[...]]]]
>>> pprint.isrecursive(c)
True
So we can see that isrecursive is useful to detect some recursive Python object structures.
Interestingly, if I compare c with c[0] (after making c a recursive structure), I get:
>>> c == c[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RecursionError: maximum recursion depth exceeded in comparison
>>>
In Python 2, I get:
RuntimeError: maximum recursion depth exceeded in cmp

Also, relevant XKCD-clone.

The image at the top of the post is of a tree created in the LOGO programming language using recursion.

- Enjoy.

- Vasudev Ram - Online Python training and consulting

I conduct online courses on Python programming, Unix / Linux commands and shell scripting and SQL programming and database design, with course material and personal coaching sessions.

The course details and testimonials are here.

Contact me for details of course content, terms and schedule.

Try FreshBooks: Create and send professional looking invoices in less than 30 seconds.

Getting a new web site or blog, and want to help preserve the environment at the same time? Check out GreenGeeks.com web hosting.

Sell your digital products via DPD: Digital Publishing for Ebooks and Downloads.

Learning Linux? Hit the ground running with my vi quickstart tutorial. I wrote it at the request of two Windows system administrator friends who were given additional charge of some Unix systems. They later told me that it helped them to quickly start using vi to edit text files on Unix. Of course, vi/vim is one of the most ubiquitous text editors around, and works on most other common operating systems and on some uncommon ones too, so the knowledge of how to use it will carry over to those systems too.

Check out WP Engine, powerful WordPress hosting.

Get a fast web site with A2 Hosting.

Creating online products for sale? Check out ConvertKit, email marketing for online creators.

Teachable: feature-packed course creation platform, with unlimited video, courses and students.

Posts about: Python * DLang * xtopdf

My ActiveState Code recipes

Follow me on:



February 16, 2019 01:47 AM UTC

February 15, 2019


PyCharm

PyCharm 2019.1 EAP 4

Our fourth Early Access Program (EAP) version for PyCharm 2019.1 is now available on our website.

New in This Version

Parallel and concurrent testing with pytest


PyCharm makes it easy to run tests quickly using multiprocessing (parallelism) and multithreading (concurrency). All you need to do in order to run your pytest tests in parallel is to install the pytest-xdist plugin as a normal Python package using PyCharm’s package manager, specify pytest as the project testing framework, and create a pytest run/debug configuration where you can specify the number of CPUs to run the tests on, and you’re good to go.

Read more about setting up and running pytest tests in parallel in our help.

Further Improvements

Interested?

Download this EAP from our website. Alternatively, you can use the JetBrains Toolbox App to stay up to date throughout the entire EAP.

With PyCharm 2019.1 we’re moving to a new runtime environment: this EAP build already bundles the brand new JetBrains Runtime Environment (a customized version of JRE 11). Unfortunately, since this build uses the brand-new platform, the patch-update from previous versions is not available this time. Please use the full installation method instead.

If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm EAP, and stay up to date. You can find the installation instructions on our website.

PyCharm 2019.1 is in development during the EAP phase, therefore not all new features are already available. More features will be added in the coming weeks. As PyCharm 2019.1 is pre-release software, it is not as stable as the release versions. Furthermore, we may decide to change and/or drop certain features as the EAP progresses.

All EAP versions will ship with a built-in EAP license, which means that these versions are free to use for 30 days after the day that they are built. As EAPs are released weekly, you’ll be able to use PyCharm Professional Edition EAP for free for the duration of the EAP program, as long as you upgrade at least once every 30 days.

February 15, 2019 10:54 AM UTC

February 14, 2019


Continuum Analytics Blog

Intake released on Conda-Forge

Intake is a package for cataloging, finding and loading your data. It has been developed recently by Anaconda, Inc., and continues to gain new features. To read general information about Intake and how to use…

The post Intake released on Conda-Forge appeared first on Anaconda.

February 14, 2019 09:26 PM UTC


PyCon

Eighth Annual PyLadies Auction at PyCon 2019

Photo Courtesy of Mike Pirnat
PyLadies is an international mentorship community for women that use Python. Started with the help of a grant provided by The Python Software Foundation (PSF)  in 2011, PyLadies has continued to bring women into the Python community through a variety of methods, including hosting events in local PyLadies chapters and offering grant opportunities to attend PyCon. Their mission is to promote, educate and advance a diverse Python community through outreach, education, conferences, events, and social gatherings.

The Python Software Foundation (PSF) is proud to announce the Eighth Annual PyCon Charity Auction for 2019.

PyCon 2018’s auction was a huge success, raising over $30K! More than 40 items from sponsors and fellow attendees were auctioned. Attendance was overwhelming and, rather than turn more people away in 2019, we have decided to increase capacity this year!

The PSF subsidizes this event each year by covering the cost of the venue, food, and beverages. In addition, the PSF adds a substantial donation to the event after everything is auctioned off.  

If you are interested in donating an item for the auction, send the information to pycon-auction@python.org.

Thinking about becoming a Sponsor?

We are hoping to find one or two companies who love what the PyLadies are doing and are willing to sponsor this wonderful event. It’s a great opportunity to let our community know that you support them. Sponsorships start at $7500. More information can be found here, or you can contact pycon-sponsors@python.org.


Photo Courtesy of Mike Pirnat
PyLadies also aims to provide a friendly support network for women and a bridge to the larger Python world. Anyone with an interest in Python is encouraged to participate! Check out local meetups here.

The auction is a fun and entertaining way to support the PyLadies community.

We hope to see you in Cleveland!









February 14, 2019 06:43 PM UTC


Made With Mu

A GPIOZero Theramin for Valentine’s Day

Thanks to Ben Nuttall, one of the maintainers of the wonderful GPIOZero library, love is in the air.

Why not serenade your Valentine with Mu, a distance sensor, speaker, Raspberry Pi and some nifty Python code which uses GPIOZero to turn the hardware into a romantic Theramin..?

It’s only four lines of code, thus showing how easy it is to make cool hardware hacks with Mu, Python and GPIOZero:

from gpiozero import TonalBuzzer, DistanceSensor

buzzer = TonalBuzzer(20)
ds = DistanceSensor(14, 26)

buzzer.source = ds

The end result is full of love (for GPIO related shenanigans):


February 14, 2019 09:00 AM UTC


Talk Python to Me

#199 Automate all the things with Python at Zapier

Do your applications call a lot of APIs? Maybe you have a bunch of microservices driving your app. You probably don't have the crazy combinatorial explosion that Zapier does for connecting APIs! They have millions of users automating things with 1,000s of APIs. It's pretty crazy. And they are doing it all with Python. Join me and Bryan Helmig, the CTO and co-founder of Zapier as we discuss how they pull this off with Python.

February 14, 2019 08:00 AM UTC


Python Bytes

#117 Is this the end of Python virtual environments?

February 14, 2019 08:00 AM UTC


Codementor

What You Don't Know About Python Variables

The first time you get introduced to Python’s variables, they are usually defined as “parts of your computer’s memory where you store some information.” Some define them as a “storage placeholder for texts and numbers.” Python variables are more than the above definitions.

February 14, 2019 12:14 AM UTC

February 13, 2019


Dataquest

How to Learn Python for Data Science In 5 Steps

Why Learn Python For Data Science?

How to Learn Python for Data Science In 5 Steps

Before we explore how to learn Python for data science, we should briefly answer why you should learn Python in the first place.

In short, understanding Python is one of the valuable skills needed for a data science career.

Though it hasn’t always been, Python is the programming language of choice for data science. Here’s a brief history:

  • In 2016, it overtook R on Kaggle, the premier platform for data science competitions.
  • In 2017, it overtook R on KDNuggets’s annual poll of data scientists’ most used tools.
  • In 2018, 66% of data scientists reported using Python daily, making it the number one tool for analytics professionals.

Data science experts expect this trend to continue with increasing development in the Python ecosystem. And while your journey to learn Python programming may be just beginning, it’s nice to know that employment opportunities are abundant (and growing) as well.

According to Indeed, the average salary for a Data Scientist is $127,918.

The good news? That number is only expected to increase. The experts at IBM predicted a 28% increase in demand for data scientists by the year 2020.

So, the future is bright for data science, and Python is just one piece of the proverbial pie. Fortunately, learning Python and other programming fundamentals is as attainable as ever. We’ll show you how in five simple steps.

But remember – just because the steps are simple doesn’t mean you won’t have to put in the work. If you apply yourself and dedicate meaningful time to learning Python, you have the potential to not only pick up a new skill, but potentially bring your career to a new level.

How to Learn Python for Data Science

How to Learn Python for Data Science In 5 Steps

First, you’ll want to find the right course to help you learn Python programming. Dataquest’s courses are specifically designed for you to learn Python for data science at your own pace.

In addition to learning Python in a course setting, your journey to becoming a data scientist should also include soft skills. Plus, there are some complementary technical skills we recommend you learn along the way.

Step 1: Learn Python Fundamentals

Everyone starts somewhere. This first step is where you’ll learn Python programming basics. You’ll also want an introduction to data science.

One of the important tools you should start using early in your journey is Jupyter Notebook, which comes prepackaged with Python libraries to help you learn these two things.

Kickstart your learning by: Joining a community

By joining a community, you’ll put yourself around like-minded people and increase your opportunities for employment. According to the Society for Human Resource Management, employee referrals account for 30% of all hires.

Create a Kaggle account, join a local Meetup group, and participate in Dataquest’s members-only Slack discussions with current students and alums.

Related skills: Try the Command Line Interface

The Command Line Interface (CLI) lets you run scripts more quickly, allowing you to test programs faster and work with more data.

Step 2: Practice Mini Python Projects

We truly believe in hands-on learning. You may be surprised by how soon you’ll be ready to build small Python projects.

Try programming things like calculators for an online game, or a program that fetches the weather from Google for your city. Building mini projects like these will help you learn Python. Programming projects like these are standard for all languages, and a great way to solidify your understanding of the basics.

You should start to build your experience with APIs and begin web scraping. Beyond helping you learn Python programming, web scraping will be useful for you in gathering data later.

Kickstart your learning by: Reading

Enhance your coursework and find answers to the Python programming challenges you encounter. Read guidebooks, blog posts, and even other people’s open source code to learn Python and data science best practices - and get new ideas.

Automate The Boring Stuff With Python by Al Sweigart is an excellent and entertaining resource.

Related skills: Work with databases using SQL

SQL is used to talk to databases to alter, edit, and reorganize information. SQL is a staple in the data science community, as 40% of data scientists report consistently using it.*

Step 3: Learn Python Data Science Libraries

Unlike some other programming languages, in Python, there is generally a best way of doing something. The three best and most important Python libraries for data science are NumPy, Pandas, and Matplotlib.

NumPy and Pandas are great for exploring and playing with data. Matplotlib is a data visualization library that makes graphs like you’d find in Excel or Google Sheets.
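
To give a feel for how the three libraries fit together, here is a tiny, made-up example (the data is invented and not part of the original article):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

hours = np.arange(1, 11)                                      # NumPy: fast numeric arrays
scores = hours * 8 + np.random.normal(0, 5, size=hours.size)  # noisy fake exam scores

df = pd.DataFrame({"hours_studied": hours, "exam_score": scores})
print(df.describe())                                          # Pandas: explore the data

df.plot.scatter(x="hours_studied", y="exam_score")            # Matplotlib does the drawing
plt.show()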

Kickstart your learning by: Asking questions

You don’t know what you don’t know!

Python has a rich community of experts who are eager to help you learn Python. Resources like Quora, Stack Overflow, and Dataquest’s Slack are full of people excited to share their knowledge and help you learn Python programming. We also have an FAQ for each mission to help with questions you encounter throughout your programming courses with Dataquest.

Related skills: Use Git for version control

Git is a popular tool that helps you keep track of changes made to your code, which makes it much easier to correct mistakes, experiment, and collaborate with others.

Step 4: Build a Data Science Portfolio as you Learn Python

For aspiring data scientists, a portfolio is a must.

These projects should include several different datasets and should leave readers with interesting insights that you’ve gleaned. Your portfolio doesn’t need a particular theme; find datasets that interest you, then come up with a way to put them together.

Displaying projects like these gives fellow data scientists something to collaborate on and shows future employers that you’ve truly taken the time to learn Python and other important programming skills.

One of the nice things about data science is that your portfolio doubles as a resume while highlighting the skills you’ve learned, like Python programming.

Kickstart your learning by: Communicating, collaborating, and focusing on technical competence

During this time, you’ll want to make sure you’re cultivating those soft skills required to work with others, making sure you really understand the inner workings of the tools you’re using.

Related skills: Learn beginner and intermediate statistics

While learning Python for data science, you’ll also want to get a solid background in statistics. Understanding statistics will give you the mindset you need to focus on the right things, so you’ll find valuable insights (and real solutions) rather than just executing code.

Step 5: Apply Advanced Data Science Techniques

Finally, aim to sharpen your skills. Your data science journey will be full of constant learning, but there are advanced courses you can complete to ensure you’ve covered all the bases.

You’ll want to be comfortable with regression, classification, and k-means clustering models. You can also step into machine learning - bootstrapping models and creating neural networks using scikit-learn.

At this point, programming projects can include creating models using live data feeds. Machine learning models of this kind adjust their predictions over time.

Remember to: Keep learning!

Data science is an ever-growing field that spans numerous industries.

At the rate that demand is increasing, there are exponential opportunities to learn. Continue reading, collaborating, and conversing with others, and you’re sure to maintain interest and a competitive edge over time.

How Long Will It Take To Learn Python?

After reading these steps, the most common question we have people ask us is: “How long does all this take?”

There are a lot of estimates for the time it takes to learn Python. For data science specifically, estimates range from three months to a year of consistent practice.

We’ve watched people move through our courses at lightning speed and others who have taken it much slower.

Really, it all depends on your desired timeline, the free time you can dedicate to learning Python programming, and the pace at which you learn.

Dataquest’s courses are created for you to go at your own speed. Each path is full of missions, hands-on learning, and opportunities to ask questions so that you can get an in-depth mastery of data science fundamentals.

Get started for free. Learn Python with our Data Scientist path and start mastering a new skill today.

Resources and studies cited:

February 13, 2019 03:57 PM UTC


Kushal Das

Tracking my phone's silent connections

My phone has more friends than I do. It talks to more peers (computers) than the number of human beings I talk to on an average day. In this age of smartphones and mobile apps for everything from A to Z, we depend on these technologies. At the same time, we don’t know much about what is going on inside these computers, equipped with powerful cameras, GPS devices, and microphones, that we carry all the time. All these apps are talking to their respective servers (or should we call them masters?), but there is no easy way to track them.

These questions bothered me for a long time: I wanted to see the servers my phone connects to, and I wanted to block those connections as I wish. However, I never managed to work on this. A few weeks ago, I finally sat down and started building a system, by reusing already available open source projects and tools, that will allow me to track what my phone is doing. Maybe not in full detail, but it should at least shed some light on the network traffic from the phone.

Initial trial

I tried to create a wifi hotspot at home using a Raspberry Pi and then started capturing all the packets from the device using standard tools (dumpcap), later reading through the logs using Wireshark. This procedure meant that I could only capture traffic while I was connected to the network at home. What about when I am not at home?

Next round

This time I took a bit different approach. I chose algo to create a VPN server. Using WireGuard, it became straightforward to connect my iPhone to the VPN. This process also allows capturing all the traffic from the phone very easily on the VPN server. A few days into the experiment, Kashmir started posting her own experiment, Life Without the Tech Giants, in which she blocked all the services from 5 big technology companies. With her help, I contacted Dhruv Mehrotra, who is a technologist behind the story. After talking to him, I felt that I am going in the right direction. He has already posted details on how they did the blocking, and you can try that at home :)

Looking at the data after 1 week

After capturing the data for the first week, I moved the captured pcap files onto my computer. I wrote some Python code to put the data into a SQLite database, enabling me to query it much faster.
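
The code below is a rough sketch of that idea using scapy and sqlite3; it is not the author's actual script, and the file names are made up:

import sqlite3
from scapy.all import rdpcap, DNSQR   # scapy can read pcap files and parse DNS

conn = sqlite3.connect("traffic.db")
conn.execute("CREATE TABLE IF NOT EXISTS dns_queries (qname TEXT)")

for packet in rdpcap("week1.pcap"):
    if packet.haslayer(DNSQR):         # keep only DNS question records
        qname = packet[DNSQR].qname.decode().rstrip(".")
        conn.execute("INSERT INTO dns_queries VALUES (?)", (qname,))
conn.commit()

# domains queried at least 10 times during the week
for name, count in conn.execute(
        "SELECT qname, COUNT(*) AS c FROM dns_queries "
        "GROUP BY qname HAVING c >= 10 ORDER BY c DESC"):
    print(count, name)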

Domain Name System (DNS) data

The Domain Name System (DNS) is a decentralized system which helps translate human-memorable domain names (like kushaldas.in) into Internet Protocol (IP) addresses (like 192.168.1.1). Computers talk to each other using these IP addresses, so we don’t have to remember so many names. When developers build their applications for the phone, they generally use those domain names to specify where the app should connect.

If I plot all the different domains (including any subdomain) which got queried at least 10 times in a week, we see the following graph.

The first thing to notice is how the phone is trying to find servers from Apple, which makes sense as this is an iPhone. I use the mobile Twitter app a lot, so we also see many queries related to Twitter. Lookout deserves a special mention; it was suggested to me by friends who understand these technologies and security better than I do. The third position is taken by Google: I sometimes watch YouTube videos, but the phone also queried many other Google domains.

There are also many queries to the Akamai CDN service, and I could not find any easy way to identify those hosts; the same goes for Amazon AWS related hosts. If you know any better way, please drop me a note.

You can see that a lot of data analytics related companies were also queried. dev.appboy.com is a major one, and thankfully algo already blocks that domain at the DNS level. I don’t know which app is trying to connect to which servers; I found out about a few of the apps on my phone by searching the client lists of the above-mentioned analytics companies. In the coming months, I will start blocking those hosts/domains one by one and see which apps stop working.

Looking at data flow

The number of DNS queries is an easy start, but next I wanted to learn more about the actual servers my phone is talking to. The paranoid part of me was pushing to discover these servers.

If we put all of the major companies the phone is talking to, we get the following graph.

Apple leads the chart with 44% of all the connections, 495225 in total. Twitter is in second place, and Edgecastcdn is third. My phone talked to Google servers 67344 times, roughly 7 times fewer than Apple.

In the next graph, I removed the big players (including Google and Amazon). Then I can see that analytics companies like nflxso.net and mparticle.com account for 31% of the connections, which is a lot. Most probably I will start by blocking these two first. The three other CDN companies, Akamai, Cloudfront, and Cloudflare, have 8%, 7%, and 6% respectively. Do I know what these companies are tracking? Nope, and that is scary enough that one of my friends commented, “It makes me think about throwing my phone in the garbage.”

What about encrypted vs unencrypted traffic? Which protocols are being used? I tried to find the answer to the first question, and the answer looks like the following graph. Maybe the numbers will come down if I refine the query and add other parameters; that is a future task.

What next?

As I said earlier, I am working on creating a set of tools that can be deployed on the VPN server and will provide a user-friendly way to monitor and block/unblock traffic from a phone. The major part of the work is to make sure that the whole thing is easy to deploy and can be used by someone with less technical knowledge.

How can you help?

The biggest thing we need is knowledge of how to analyze the data we are capturing. It is one thing to make reports for personal use, but trying to help others is an entirely different game altogether. We will, of course, need all sorts of contributions to the project. Before anything else, we will have to combine the random code we have into a proper project structure. Keep following this blog for more updates and details about the project.

Note to self

Do not try to read data after midnight, or else I will again mistake a local address for some random dynamic address in Bangkok and freak out (thank you reverse-dns).

February 13, 2019 02:47 AM UTC

February 12, 2019


The No Title® Tech Blog

Why and how I have just redesigned my (other) website

Going through a moment of change in my life, I have decided to redesign my other website, using Pelican and other open source tools. The older version was starting to look a bit aged, especially on mobile devices, so it seemed like a good idea to start a complete makeover. As they say, new year… new website.

February 12, 2019 11:15 PM UTC


Codementor

Testing isn't everything, but it's important

A talk that I've been thinking about for the last little while is one by Gary Bernhardt called Ideology (https://www.destroyallsoftware.com/talks/ideology). I highly recommend that you go watch it...

February 12, 2019 09:32 PM UTC


Real Python

Supercharge Your Classes With Python super()

While Python isn’t purely an object-oriented language, it’s flexible enough and powerful enough to allow you to build your applications using the object-oriented paradigm. One of the ways in which Python achieves this is by supporting inheritance, which it does with super().

In this tutorial, you’ll learn about the following:

Free Bonus: 5 Thoughts On Python Mastery, a free course for Python developers that shows you the roadmap and the mindset you'll need to take your Python skills to the next level.

An Overview of Python’s super() Function

If you have experience with object-oriented languages, you may already be familiar with the functionality of super().

If not, don’t fear! While the official documentation is fairly technical, at a high level super() gives you access to methods in a superclass from the subclass that inherits from it.

super() alone returns a temporary object of the superclass that then allows you to call that superclass’s methods.

Why would you want to do any of this? While the possibilities are limited by your imagination, a common use case is building classes that extend the functionality of previously built classes.

Calling the previously built methods with super() saves you from needing to rewrite those methods in your subclass, and allows you to swap out superclasses with minimal code changes.

super() in Single Inheritance

If you’re unfamiliar with object-oriented programming concepts, inheritance might be an unfamiliar term. Inheritance is a concept in object-oriented programming in which a class derives (or inherits) attributes and behaviors from another class without needing to implement them again.

For me at least, it’s easier to understand these concepts when looking at code, so let’s write classes describing some shapes:

class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def area(self):
        return self.length * self.width

    def perimeter(self):
        return 2 * self.length + 2 * self.width

class Square:
    def __init__(self, length):
        self.length = length

    def area(self):
        return self.length * self.length

    def perimeter(self):
        return 4 * self.length

Here, there are two similar classes: Rectangle and Square.

You can use them as below:

>>> square = Square(4)
>>> square.area()
16
>>> rectangle = Rectangle(2,4)
>>> rectangle.area()
8

In this example, you have two shapes that are related to each other: a square is a special kind of rectangle. The code, however, doesn’t reflect that relationship and thus has code that is essentially repeated.

By using inheritance, you can reduce the amount of code you write while simultaneously reflecting the real-world relationship between rectangles and squares:

class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def area(self):
        return self.length * self.width

    def perimeter(self):
        return 2 * self.length + 2 * self.width

# Here we declare that the Square class inherits from the Rectangle class
class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

Here, you've used super() to call the __init__() of the Rectangle class, allowing you to use it in the Square class without repeating code. As shown below, the core functionality remains intact after these changes:

>>> square = Square(4)
>>> square.area()
16

In this example, Rectangle is the superclass, and Square is the subclass.

Because the Square and Rectangle .__init__() methods are so similar, you can simply call the superclass’s .__init__() method (Rectangle.__init__()) from that of Square by using super(). This sets the .length and .width attributes even though you just had to supply a single length parameter to the Square constructor.
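
You can confirm this with a quick check in the REPL (an added illustration, not part of the original article):

>>> square = Square(4)
>>> square.length, square.width  # both set by Rectangle's .__init__()
(4, 4)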

When you run this, even though your Square class doesn’t explicitly implement it, the call to .area() will use the .area() method in the superclass and print 16. The Square class inherited .area() from the Rectangle class.

Note: To learn more about inheritance and object-oriented concepts in Python, be sure to check out Object-Oriented Programming (OOP) in Python 3.

What Can super() Do for You?

So what can super() do for you in single inheritance?

Like in other object-oriented languages, it allows you to call methods of the superclass in your subclass. The primary use case of this is to extend the functionality of the inherited method.

In the example below, you will create a class Cube that inherits from Square and extends the functionality of .area() (inherited from the Rectangle class through Square) to calculate the surface area and volume of a Cube instance:

class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

class Cube(Square):
    def surface_area(self):
        face_area = super().area()
        return face_area * 6

    def volume(self):
        face_area = super().area()
        return face_area * self.length

Now that you’ve built the classes, let’s look at the surface area and volume of a cube with a side length of 3:

>>> cube = Cube(3)
>>> cube.surface_area()
54
>>> cube.volume()
27

Caution: Note that in our example above, super() alone won’t make the method calls for you: you have to call the method on the proxy object itself.

Here you have implemented two methods for the Cube class: .surface_area() and .volume(). Both of these calculations rely on calculating the area of a single face, so rather than reimplementing the area calculation, you use super() to extend the area calculation.

Also notice that the Cube class definition does not have an .__init__(). Because Cube inherits from Square and .__init__() doesn’t really do anything differently for Cube than it already does for Square, you can skip defining it, and the .__init__() of the superclass (Square) will be called automatically.
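
As a quick sanity check (an added example using the classes above), Cube picks up both .__init__() and .area() without defining either:

>>> cube = Cube(3)
>>> cube.length  # set by Square's .__init__(), which delegates to Rectangle's
3
>>> cube.area()  # resolved on Rectangle, by way of Square
9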

super() returns a delegate object to a parent class, so you call the method you want directly on it: super().area().

Not only does this save us from having to rewrite the area calculations, but it also allows us to change the internal .area() logic in a single location. This is especially handy when you have a number of subclasses inheriting from one superclass.

A super() Deep Dive

Before heading into multiple inheritance, let’s take a quick detour into the mechanics of super().

While the examples above (and below) call super() without any parameters, super() can also take two parameters: the first is the subclass, and the second parameter is an object that is an instance of that subclass.

First, let’s see two examples showing what manipulating the first variable can do, using the classes already shown:

class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def area(self):
        return self.length * self.width

    def perimeter(self):
        return 2 * self.length + 2 * self.width

class Square(Rectangle):
    def __init__(self, length):
        super(Square, self).__init__(length, length)

In Python 3, the super(Square, self) call is equivalent to the parameterless super() call. The first parameter refers to the subclass Square, while the second parameter refers to a Square object which, in this case, is self. You can call super() with other classes as well:

class Cube(Square):
    def surface_area(self):
        face_area = super(Square, self).area()
        return face_area * 6

    def volume(self):
        face_area = super(Square, self).area()
        return face_area * self.length

In this example, you are setting Square as the subclass argument to super(), instead of Cube. This causes super() to start searching for a matching method (in this case, .area()) one level above Square in the method resolution order, which here is Rectangle.

In this specific example, the behavior doesn’t change. But imagine that Square also implemented an .area() function that you wanted to make sure Cube did not use. Calling super() in this way allows you to do that.
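
Here is a minimal sketch of that scenario (with a hypothetical Square.area() override that is not part of the tutorial's running example), reusing the Rectangle class from above:

class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

    def area(self):
        # Hypothetical override that we don't want Cube to use
        print("Square.area() called")
        return self.length * self.length

class Cube(Square):
    def surface_area(self):
        # Start the method search above Square, so Rectangle.area() runs
        # and Square's print() never fires
        face_area = super(Square, self).area()
        return face_area * 6

Calling Cube(2).surface_area() now returns 24 without printing anything, because the lookup skipped Square.area() and found Rectangle.area() instead.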

Caution: While we are doing a lot of fiddling with the parameters to super() in order to explore how it works under the hood, I’d caution against doing this regularly.

The parameterless call to super() is recommended and sufficient for most use cases, and needing to change the search hierarchy regularly could be indicative of a larger design issue.

What about the second parameter? Remember, this is an object that is an instance of the class used as the first parameter. For example, with super(Square, self), the object passed as self must satisfy isinstance(self, Square).

By including an instantiated object, super() returns a bound method: a method that is bound to the object, which gives the method the object's context, such as any instance attributes. If this parameter is not included, super() returns an unbound super object that isn't tied to any instance, so it generally can't be used to call instance methods directly.
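
Here is a quick illustration (an added sketch reusing the Square class above) of what you get with and without the second argument:

>>> s = Square(4)
>>> bound_area = super(Square, s).area  # bound to s, so it can read s.length
>>> bound_area()
16
>>> super(Square).area  # no instance given: an unbound super object
Traceback (most recent call last):
  ...
AttributeError: 'super' object has no attribute 'area'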

For more information about bound methods, unbound methods, and functions, read the Python documentation on its descriptor system.

Note: Technically, super() doesn’t return a method. It returns a proxy object. This is an object that delegates calls to the correct class methods without making an additional object in order to do so.

super() in Multiple Inheritance

Now that you’ve worked through an overview and some examples of super() and single inheritance, you will be introduced to an overview and some examples that will demonstrate how multiple inheritance works and how super() enables that functionality.

Multiple Inheritance Overview

There is another use case in which super() really shines, and this one isn’t as common as the single inheritance scenario. In addition to single inheritance, Python supports multiple inheritance, in which a subclass can inherit from multiple superclasses that don’t necessarily inherit from each other (also known as sibling classes).

I’m a very visual person, and I find diagrams are incredibly helpful to understand concepts like this. The image below shows a very simple multiple inheritance scenario, where one class inherits from two unrelated (sibling) superclasses:

A diagrammed example of multiple inheritance (Image: Kyle Stratis)

To better illustrate multiple inheritance in action, here is some code for you to try out, showing how you can build a right pyramid (a pyramid with a square base) out of a Triangle and a Square:

class Triangle:
    def __init__(self, base, height):
        self.base = base
        self.height = height

    def area(self):
        return 0.5 * self.base * self.height

class RightPyramid(Triangle, Square):
    def __init__(self, base, slant_height):
        self.base = base
        self.slant_height = slant_height

    def area(self):
        base_area = super().area()
        perimeter = super().perimeter()
        return 0.5 * perimeter * self.slant_height + base_area

Note: The term slant height may be unfamiliar, especially if it’s been a while since you’ve taken a geometry class or worked on any pyramids.

The slant height is the height from the center of the base of an object (like a pyramid) up its face to the peak of that object. You can read more about slant heights at WolframMathWorld.

This example declares a Triangle class and a RightPyramid class that inherits from both Square and Triangle.

You’ll see another .area() method that uses super() just like in single inheritance, with the aim of it reaching the .perimeter() and .area() methods defined all the way up in the Rectangle class.

Note: You may notice that the code above isn’t using any inherited properties from the Triangle class yet. Later examples will fully take advantage of inheritance from both Triangle and Square.

The problem, though, is that both superclasses (Triangle and Square) define a .area(). Take a second and think about what might happen when you call .area() on RightPyramid, and then try calling it like below:

>>> pyramid = RightPyramid(2, 4)
>>> pyramid.area()
Traceback (most recent call last):
  File "shapes.py", line 63, in <module>
    print(pyramid.area())
  File "shapes.py", line 47, in area
    base_area = super().area()
  File "shapes.py", line 38, in area
    return 0.5 * self.base * self.height
AttributeError: 'RightPyramid' object has no attribute 'height'

Did you guess that Python will try to call Triangle.area()? This is because of something called the method resolution order.

Note: How did we notice that Triangle.area() was called and not, as we hoped, Square.area()? If you look at the last line of the traceback (before the AttributeError), you’ll see a reference to a specific line of code:

return 0.5 * self.base * self.height

You may recognize this from geometry class as the formula for the area of a triangle. Otherwise, if you’re like me, you might have scrolled up to the Triangle and Rectangle class definitions and seen this same code in Triangle.area().

Method Resolution Order

The method resolution order (or MRO) tells Python how to search for inherited methods. This comes in handy when you’re using super() because the MRO tells you exactly where Python will look for a method you’re calling with super() and in what order.

Every class has an .__mro__ attribute that allows us to inspect the order, so let’s do that:

>>> RightPyramid.__mro__
(<class '__main__.RightPyramid'>, <class '__main__.Triangle'>, 
 <class '__main__.Square'>, <class '__main__.Rectangle'>, 
 <class 'object'>)

This tells us that methods will be searched first in RightPyramid, then in Triangle, then in Square, then in Rectangle, and then, if nothing is found, in object, from which all classes originate.

The problem here is that the interpreter is searching for .area() in Triangle before Square and Rectangle, and upon finding .area() in Triangle, Python calls it instead of the one you want. Because Triangle.area() expects there to be a .height and a .base attribute, Python throws an AttributeError.

Luckily, you have some control over how the MRO is constructed. Just by changing the signature of the RightPyramid class, you can search in the order you want, and the methods will resolve correctly:

class RightPyramid(Square, Triangle):
    def __init__(self, base, slant_height):
        self.base = base
        self.slant_height = slant_height
        super().__init__(self.base)

    def area(self):
        base_area = super().area()
        perimeter = super().perimeter()
        return 0.5 * perimeter * self.slant_height + base_area

Notice that RightPyramid initializes partially with the .__init__() from the Square class. This allows .area() to use the .length attribute on the object, as designed.

Now, you can build a pyramid, inspect the MRO, and calculate the surface area:

>>> pyramid = RightPyramid(2, 4)
>>> RightPyramid.__mro__
(<class '__main__.RightPyramid'>, <class '__main__.Square'>, 
<class '__main__.Rectangle'>, <class '__main__.Triangle'>, 
<class 'object'>)
>>> pyramid.area()
20.0

You see that the MRO is now what you’d expect, and you can inspect the area of the pyramid as well, thanks to .area() and .perimeter().

There's still a problem here, though. For the sake of simplicity, I did a few things wrong in this example: the first, and arguably most important, is that I had two separate classes with the same method name and signature.

This causes issues with method resolution, because the first instance of .area() that is encountered in the MRO list will be called.

When you’re using super() with multiple inheritance, it’s imperative to design your classes to cooperate. Part of this is ensuring that your methods are unique so that they get resolved in the MRO, by making sure method signatures are unique—whether by using method names or method parameters.

In this case, to avoid a complete overhaul of your code, you can rename the Triangle class’s .area() method to .tri_area(). This way, the area methods can continue using class properties rather than taking external parameters:

class Triangle:
    def __init__(self, base, height):
        self.base = base
        self.height = height
        super().__init__()

    def tri_area(self):
        return 0.5 * self.base * self.height

Let’s also go ahead and use this in the RightPyramid class:

class RightPyramid(Square, Triangle):
    def __init__(self, base, slant_height):
        self.base = base
        self.slant_height = slant_height
        super().__init__(self.base)

    def area(self):
        base_area = super().area()
        perimeter = super().perimeter()
        return 0.5 * perimeter * self.slant_height + base_area

    def area_2(self):
        base_area = super().area()
        triangle_area = super().tri_area()
        return triangle_area * 4 + base_area

The next issue here is that the code doesn’t have a delegated Triangle object like it does for a Square object, so calling .area_2() will give us an AttributeError since .base and .height don’t have any values.

You need to do two things to fix this:

  1. All methods that are called with super() need to have a call to their superclass’s version of that method. This means that you will need to add super().__init__() to the .__init__() methods of Triangle and Rectangle.

  2. Redesign all the .__init__() calls to take a keyword dictionary. See the complete code below.

class Rectangle:
    def __init__(self, length, width, **kwargs):
        self.length = length
        self.width = width
        super().__init__(**kwargs)

    def area(self):
        return self.length * self.width

    def perimeter(self):
        return 2 * self.length + 2 * self.width

# Here we declare that the Square class inherits from 
# the Rectangle class
class Square(Rectangle):
    def __init__(self, length, **kwargs):
        super().__init__(length=length, width=length, **kwargs)

class Cube(Square):
    def surface_area(self):
        face_area = super().area()
        return face_area * 6

    def volume(self):
        face_area = super().area()
        return face_area * self.length

class Triangle:
    def __init__(self, base, height, **kwargs):
        self.base = base
        self.height = height
        super().__init__(**kwargs)

    def tri_area(self):
        return 0.5 * self.base * self.height

class RightPyramid(Square, Triangle):
    def __init__(self, base, slant_height, **kwargs):
        self.base = base
        self.slant_height = slant_height
        kwargs["height"] = slant_height
        kwargs["length"] = base
        super().__init__(base=base, **kwargs)

    def area(self):
        base_area = super().area()
        perimeter = super().perimeter()
        return 0.5 * perimeter * self.slant_height + base_area

    def area_2(self):
        base_area = super().area()
        triangle_area = super().tri_area()
        return triangle_area * 4 + base_area

There are a number of important differences in this code: every .__init__() now accepts **kwargs and forwards them with super().__init__(**kwargs), and the Triangle class's area method has been renamed to .tri_area() so that it no longer collides with the other classes.

Note: Following the state of kwargs can be tricky here, so here’s a table of .__init__() calls in order, showing the class that owns that call, and the contents of kwargs during that call:

Class         Named Arguments       kwargs
RightPyramid  base, slant_height
Square        length                base, height
Rectangle     length, width         base, height
Triangle      base, height
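
You can also watch this hand-off in miniature with a pair of throwaway classes (hypothetical names, purely for illustration): each .__init__() consumes its own named arguments and forwards the rest with super().

class A:
    def __init__(self, a, **kwargs):
        print(f"A consumed a={a}, passing on {kwargs}")
        super().__init__(**kwargs)

class B:
    def __init__(self, b, **kwargs):
        print(f"B consumed b={b}, passing on {kwargs}")
        super().__init__(**kwargs)

class C(A, B):
    pass

>>> c = C(a=1, b=2)
A consumed a=1, passing on {'b': 2}
B consumed b=2, passing on {}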

Now, when you use these updated classes, you have this:

>>> pyramid = RightPyramid(base=2, slant_height=4)
>>> pyramid.area()
20.0
>>> pyramid.area_2()
20.0

It works! You’ve used super() to successfully navigate a complicated class hierarchy while using both inheritance and composition to create new classes with minimal reimplementation.

Multiple Inheritance Alternatives

As you can see, multiple inheritance can be useful but can also lead to very complicated situations and code that is hard to read. It's also rare to have objects that neatly inherit everything from multiple other objects.

If you see yourself beginning to use multiple inheritance and a complicated class hierarchy, it’s worth asking yourself if you can achieve code that is cleaner and easier to understand by using composition instead of inheritance.

With composition, you can add very specific functionality to your classes from a specialized, simple class called a mixin.

Since this article is focused on inheritance, I won’t go into too much detail on composition and how to wield it in Python, but here’s a short example using VolumeMixin to give specific functionality to our 3D objects—in this case, a volume calculation:

class Rectangle:
    def __init__(self, length, width):
        self.length = length
        self.width = width

    def area(self):
        return self.length * self.width

class Square(Rectangle):
    def __init__(self, length):
        super().__init__(length, length)

class VolumeMixin:
    def volume(self):
        return self.area() * self.height

class Cube(VolumeMixin, Square):
    def __init__(self, length):
        super().__init__(length)
        self.height = length

    def area(self):
        return super().area() * 6

In this example, the code was reworked to include a mixin called VolumeMixin. The mixin is then used by Cube and gives Cube the ability to calculate its volume, which is shown below:

>>> cube = Cube(2)
>>> cube.area()
24
>>> cube.volume()
48

This mixin can be used the same way in any class that has an area defined for it and for which the formula area * height returns the correct volume.
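
For example, here is a hypothetical Box class (not part of the original example) that reuses the mixin unchanged, together with the Rectangle class from above:

class Box(VolumeMixin, Rectangle):
    def __init__(self, length, width, height):
        super().__init__(length, width)  # Rectangle's .__init__() sets length and width
        self.height = height

>>> box = Box(2, 3, 4)
>>> box.volume()  # area() * height = (2 * 3) * 4
24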

A super() Recap

In this tutorial, you learned how to supercharge your classes with super(). Your journey started with a review of single inheritance and then showed how to call superclass methods easily with super().

You then learned how multiple inheritance works in Python, and techniques to combine super() with multiple inheritance. You also learned about how Python resolves method calls using the method resolution order (MRO), as well as how to inspect and modify the MRO to ensure appropriate methods are called at appropriate times.

For more information about object-oriented programming in Python and using super(), check out these resources:



February 12, 2019 09:01 PM UTC


Codementor

Python, For The ❤ of It - part 3 (What I Built With It)

My Journey Into One of World's Most Awesome Languages

February 12, 2019 08:42 PM UTC


PyCoder’s Weekly

Issue #355 (Feb. 12, 2019)

#355 – FEBRUARY 12, 2019
View in Browser »



Goodbye Virtual Environments?

Could an npm-style __pypackages__ in your app’s project folder be an alternative to using virtual environments? Chad explores this question in his article and goes over PEP 582 as a sample implementation of this idea.
CHAD SMITH

The State of Python Packaging

Describes where the Python packaging ecosystem is today and where the Python Packaging Authority hopes it will move next.
BERNAT.TECH

Find a Python Job Through Vettery


Vettery specializes in developer roles and is completely free for job seekers. Interested? Submit your profile, and if accepted, you can receive interview requests directly from top companies seeking Python devs. Get started →
VETTERY sponsor

Python Elects a Steering Council

“After a two-week voting period, which followed a two-week nomination window, Python now has its governance back in place—with a familiar name in the mix.”
JAKE EDGE

Incrementally Migrating Over One Million Lines of Code From Python 2 to Python 3

How the Dropbox team handled their migration from Python 2 → 3. Great list of lessons learned at the end!
DROPBOX.COM

PyCon 2020-2021 Location Announced

The PSF announced that PyCon will be held in Pittsburgh in 2020 and 2021!
PYCON.BLOGSPOT.COM • Shared by Ricky White

Discussions

Conventions for the Order of Methods in Class Definitions?

Dunder methods all first? Alphabetical? How do you do it?
TWITTER.COM/PURPLEDIANE88

Which Python Packages Should I Study in Order to Develop My Python Skills?

REDDIT

Moving Away From Pipenv

REDDIT

Why Does Python Live On Land…?

Ba-dum-tss!
TWITTER.COM/REALPYTHON

Python Jobs

Senior Systems Engineer (Hamilton, ON, Canada)

Preteckt

Python Web Developer (Remote)

Premiere Digital Services

Software Developer (Herndon, VA)

L2T, LLC

Tech Lead / Senior Software Engineer (Seattle, WA)

Indeed.com Incubator

Python Software Engineer (London, UK)

Pole Star Space Applications Ltd.

Senior Engineer Python & More (Winterthur, Switzerland)

DEEP IMPACT AG

Sr Enterprise Python Developer (Toronto, ON, Canada)

Kognitiv

Senior Software Engineer (Santa Monica, CA)

GoodRX

Computer Science Teacher (Pasadena, CA)

ArtCenter College of Design

Senior Python Engineer (New York, NY)

15Five

Software Engineer (Herndon, VA)

Charon Technologies

Web UI Developer (Herndon, VA)

Charon Technologies

More Python Jobs >>>

Articles & Tutorials

The Ultimate List of Data Science Podcasts

Over a dozen shows that discuss topics in big data, data analysis, statistics, machine learning, and artificial intelligence. What’s your pick?
REAL PYTHON

Python Exceptions Considered an Anti-Pattern

Nikita goes over the drawbacks of Python exceptions and makes a case for why they could be considered an anti-pattern in some cases. Oh, and he also proposes a solution… Worth a read!
NIKITA SOBOLEV opinion

Take Control of Your Job Search With Indeed Prime


With Indeed Prime, you’re in the driver’s seat. Tell us about your skills, career goals, and salary requirements and we’ll match you with top companies looking to hire candidates like you. Apply today to get started!
INDEED sponsor

A Successful Python 3 Migration Story

How the Zato engineering team migrated 130,000 lines of code from Python 2 to Python 3.
ZATO.IO

Python in Education: Request for Ideas

The PSF wants to hear your ideas on ways it can fund work to improve Python in education.
PSF

Python 3 Template Strings Instead of External Template Engine

I’ve been a fan of Python’s template strings and this article demonstrates a good use case for them.
ESHLOX.NET

Python Architecture Stuff: Do We Need More?

Some good resources linked in this article if you’re looking to improve the architecture of your Python apps, in order to make them easier to test, for example.
OBEYTHETESTINGGOAT.COM

Trying Out the := “Walrus Operator” in Python 3.8

The first alpha of Python 3.8 was just released. With that comes a major new feature in the form of PEP 572 (Assignment Expressions). Alexander demos this new feature in this short & sweet article.
ALEXANDER HULTNÉR

Python Itertools: For a Faster and Memory Efficient Code

KANOKI.ORG

Bayesian Analysis With Python (Interview With Osvaldo Martin)

Osvaldo Martin is one of the developers of PyMC3 and ArviZ. He is a researcher specialized in Bayesian statistics and data science.
FEDERICO CARRONE

Master Intermediate Python Skills With “Python 201”

If you already know the basics of Python and now you want to go to the next level, then this is the book for you. This book is for intermediate level Python programmers only—there won’t be any beginner chapters here. Learn More →
MIKE DRISCOLL book sponsor

Projects & Code

PythonEXE: How to Create an Executable File From a Python Script?

A simple project that demonstrates how to create an executable from a Python project.
GITHUB.COM/JABBALACI

Django Bugfix Releases: 2.1.7, 2.0.12 and 1.11.20

DJANGOPROJECT.COM

Dry-Python: Libraries for Pluggable Business Logic Components

DRY-PYTHON.ORG

PyPy V7.0.0: Triple Release of 2.7, 3.5 and 3.6-Alpha

MOREPYPY.BLOGSPOT.COM

python-o365: Interact With Microsoft Graph and Office 365 API

GITHUB.COM/O365

UnrealEnginePython: Embed Python in Unreal Engine 4

GITHUB.COM/20TAB • Shared by Mike Kennedy

art: ASCII Art Library for Python

GITHUB.COM/SEPANDHAGHIGHI

demoji: Accurately Remove Emojis From Text Strings

Accurately find or remove emojis from a blob of text.
BRAD SOLOMON

pipelines: Scripting Massively Parallel Pipelines With Python

GITHUB.COM/CALEBWIN

Events

Python North East

February 13, 2019
PYTHONNORTHEAST.COM

Python Atlanta

February 14, 2019
MEETUP.COM

PyCon Belarus 2019

February 15 to February 17, 2019
PYCON.ORG

Dominican Republic Python User Group

February 19, 2019
PYTHON.DO

PyCon Namibia 2019

February 19 to February 22, 2019
PYCON.ORG


Happy Pythoning!
This was PyCoder’s Weekly Issue #355.
View in Browser »



February 12, 2019 08:30 PM UTC