
Planet Python

Last update: December 14, 2018 07:48 AM UTC

December 14, 2018

Test and Code

58: Testing REST APIs with Docker containers and pytest

Let's say you've got a web application you need to test.
It has a REST API that you want to use for testing.

Can you use Python for this testing even if the application is written in some other language? Of course.
Can you use pytest? duh. yes. what else?
What if you want to spin up docker instances, get your app running in that, and run your tests against that environment?
How would you use pytest to do that?
Well, there, I'm not exactly sure. But I know someone who does.

Dima Spivak is the Director of Engineering at StreamSets, and he and his team are doing just that.
He's also got some great advice on utilizing code reviews across teams for test code, and a whole lot more.

Special Guest: Dima Spivak.

Sponsored By:

  • DigitalOcean: Get started with a free $100 credit

Support Test & Code

Links:

  • Introducing the StreamSets Test Framework
  • pytest-benchmark · PyPI
  • StreamSets Test Framework-based tests for StreamSets Data Collector
  • StreamSets: Where DevOps Meets Data Integration

December 14, 2018 07:00 AM UTC

December 13, 2018

Python Engineering at Microsoft

Python in Visual Studio Code – December 2018 Release

We are pleased to announce that the December 2018 release of the Python Extension for Visual Studio Code is now available. You can download the Python extension from the Marketplace, or install it directly from the extension gallery in Visual Studio Code. You can learn more about Python support in Visual Studio Code in the documentation.

This release was a short release, where we primarily focused on two top-requested features for the data science experience shipped in November: remote Jupyter support and export Python files as Jupyter Notebooks. We have also fixed many issues reported on GitHub, and you can see the full list in our changelog.

Keep on reading to learn more!

Remote Jupyter Support

This release enables connecting to remote Jupyter servers for execution, so you can offload intensive computation to other machines that have more compute power or the hardware spec you need.

To connect to a remote Jupyter server, run the new command "Python: Specify Jupyter server URI" from the Visual Studio Code Command Palette.

Then you will be asked to enter the hostname and a token for authentication.

You can find the token in the terminal when you start a Jupyter Notebook server with token authentication enabled. The token is generated and logged to the terminal. For example:

[I 11:59:16.597 NotebookApp] The Jupyter Notebook is running at:

Copy the URL in the terminal and then paste into VS Code.
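If you want to script around this step, the server address and token can be separated with the standard library. This is only a sketch: the URL below is a made-up placeholder (not the one from the log line above), and split_jupyter_url is a hypothetical helper name, not part of the extension or of Jupyter.

```python
from urllib.parse import urlparse, parse_qs


def split_jupyter_url(url):
    """Return (base_url, token) for a URL shaped like http://host:port/?token=..."""
    parts = urlparse(url)
    # parse_qs maps each query key to a list of values; take the first token if any.
    token = parse_qs(parts.query).get("token", [None])[0]
    base_url = f"{parts.scheme}://{parts.netloc}"
    return base_url, token


# Placeholder URL, for illustration only.
base, token = split_jupyter_url("http://example-host:8888/?token=abc123")
print(base)   # http://example-host:8888
print(token)  # abc123
```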

Cells will now be executed using the remote Jupyter server, and you can see that information in the Interactive window. Here is an example of a Windows machine connecting to a Linux VM for execution (the real URI is blurred out for security reasons 😊):

Token-based authentication is on by default for notebook 4.3 and later. This document has more details on Security in the Jupyter notebook server.

Export Python files as Jupyter Notebooks

This update added two commands for exporting Python files as Jupyter Notebooks. Along with the export run results command that was shipped in the previous release, the Python extension now offers three Export as Jupyter Notebook options and you can choose the one that is right for your use-case.

After exporting, you will see a message box with an “Open in browser” button, which opens the exported notebook file locally in a browser.

Other Changes and Enhancements

We have also added small enhancements and fixed issues requested by users that should improve your experience working with Python in Visual Studio Code. The full list of improvements is listed in our changelog; some notable changes include:

Be sure to download the Python extension for Visual Studio Code now to try out the above improvements. If you run into any problems, please file an issue on the Python VS Code GitHub page.


December 13, 2018 10:53 PM UTC


Create an exit button on the main game menu scene

In this article we will continue with our previous pygame project by creating an exit button on the main game menu scene; when the user clicks it, the program will close the game. We will create the score scene in the next chapter, as I promised in the previous article, but in this article we will create the exit button, which is also very important for this project. This button will only...


December 13, 2018 10:58 AM UTC

Jeff Knupp

How To Do Just About Anything With Python Lists

Python's list, one of the built-in sequence types (i.e. "it holds a sequence of things"), is a wonderfully useful tool. I figured I'd try to determine what people are most often trying to do with lists (by analyzing Google's query data on the topic) and just bang out examples of "How do I do X with a list in Python?"

Reverse/Sort A List In Python

There are two ways to reverse a list in Python, and which one you use depends on what you want to do with the resulting reversed data. If you're only going to iterate over the items in the reversed list (say, to print them out), use the Python built-in function reversed(seq). Here's an example of reversed in action:

original_list = [1, 2, 3, 4, 5]
for element in reversed(original_list):
    print(element)

If you need a new list in sorted order instead, use the Python built-in function sorted(iterable, *, key=None, reverse=False); passing reverse=True gives descending order. (To get a reversed copy without sorting, list(reversed(original_list)) or original_list[::-1] works.) Let's see some examples:

In [10]: original_list = [3, 1, 2, 5, 4]

In [11]: sorted(original_list)
Out[11]: [1, 2, 3, 4, 5]

In [15]: sorted(original_list, reverse=True)
Out[15]: [5, 4, 3, 2, 1]

But what if your list contains more than just simple integers? How does one sort, say, a list of temperature readings over a given time if those daily readings are each stored as a tuple of the form (<date>, <daily high>, <daily low>)? Look at the following example:

readings = [('1202', 45.0, 28.1), ('1201', 44.0, 33.0), ('1130', 45.0, 32.6)]

Calling sorted(readings) will give us a new list with the elements ordered by the <date> portion of the tuple (the 0-th element, since Python compares tuples lexicographically; each item is compared in order, starting with the first elements). But what if we wanted to sort by <daily high> or <daily low>? Simple! Just pass the key parameter a function that takes a single argument and returns the key for sorted() to use for comparisons. For example, I could sort by daily low temperatures like so:

In [25]: sorted(readings, key=lambda reading: reading[2])
Out[25]: [('1202', 45.0, 28.1), ('1130', 45.0, 32.6), ('1201', 44.0, 33.0)]

In that example, we passed the key parameter a lambda function which accepted one argument and returned a value, in our case the third part of our temperature recording tuple (the reading[2] part). If we had wanted to sort by the daily high in reverse order, we would just change the call like so:

In [26]: sorted(readings, key=lambda reading: reading[1], reverse=True)
Out[26]: [('1202', 45.0, 28.1), ('1130', 45.0, 32.6), ('1201', 44.0, 33.0)]

Accessing elements of a tuple or class is such a common task that Python provides a set of convenience functions in the operator module of the standard library. To access a specific field in the tuple (as we did above for daily high as reading[1] and daily low as reading[2]), use the field's index in the tuple as an argument to operator.itemgetter:

In [28]: from operator import itemgetter

In [29]: sorted(readings, key=itemgetter(1), reverse=True)
Out[29]: [('1202', 45.0, 28.1), ('1130', 45.0, 32.6), ('1201', 44.0, 33.0)]

But notice that the first two entries have the same high temp recordings (45.0). What if we wanted to first sort by high temp and then by low temp? itemgetter allows for multiple levels of sorting by simply passing in multiple index values. So let's sort by high temp first, then low temp:

In [31]: sorted(readings, key=itemgetter(1,2), reverse=True)
Out[31]: [('1130', 45.0, 32.6), ('1202', 45.0, 28.1), ('1201', 44.0, 33.0)]

Notice that the ('1130', 45.0, 32.6) tuple is now first, as it had an equal high temp and a greater low temp than ('1202', 45.0, 28.1).
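For objects with named fields rather than positional ones, operator.attrgetter plays the same role as itemgetter. Here is a small sketch, assuming the readings are stored in a namedtuple (the Reading type and its field names are made up for this illustration):

```python
from collections import namedtuple
from operator import attrgetter

# Hypothetical record type for the same temperature data used above.
Reading = namedtuple("Reading", ["date", "high", "low"])

readings = [
    Reading("1202", 45.0, 28.1),
    Reading("1201", 44.0, 33.0),
    Reading("1130", 45.0, 32.6),
]

# Sort by a single attribute.
by_low = sorted(readings, key=attrgetter("low"))
print([r.date for r in by_low])  # ['1202', '1130', '1201']

# Sort by several attributes at once: high first, then low, both descending.
by_high_then_low = sorted(readings, key=attrgetter("high", "low"), reverse=True)
print([r.date for r in by_high_then_low])  # ['1130', '1202', '1201']
```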

Split A Python List Into Chunks

Splitting a list into equally sized sub-lists (for processing data in parallel, perhaps) is a common task. It's so common, in fact, that the itertools module (a module practically begging to be used in these kinds of tasks, by the way) gives actual code for how to accomplish this in Python in the itertools recipes section of the docs. Here is the relevant code:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

The code may look confusing at first, but it's simply creating a list holding n (or 3 in the example in the comments) references to the same iterator over the iterable argument and then cleverly zipping them back together using zip_longest from the itertools module. Because every position in each output tuple pulls from that one shared iterator, the elements of iterable are collected in a series of n-sized chunks.
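A short usage sketch may help here. It repeats the recipe so that it runs on its own, then shows that the n entries in args are one and the same iterator object, which is why consecutive elements end up in the same chunk:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Seven items in chunks of three; the last chunk is padded with the fill value.
chunks = [list(chunk) for chunk in grouper(range(7), 3)]
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, None, None]]

# The list multiplication copies the reference, not the iterator.
shared = iter("ABCDEF")
args = [shared] * 3
print(args[0] is args[1] is args[2])  # True
print(list(zip(*args)))  # [('A', 'B', 'C'), ('D', 'E', 'F')]
```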

Flatten A Python List Of Lists Into One List

itertools recipes FTW again! Straight out of the section of the Python docs that gave us grouper above, the recipe for "flatten" is:

from itertools import chain

def flatten(list_of_lists):
    "Flatten one level of nesting."
    return chain.from_iterable(list_of_lists)

It simply calls itertools.chain.from_iterable() on the list_of_lists. A call to flatten([[1, 2, 3], [4, 5], [6, 7, 8]]) will give us an iterator that yields each element individually. We can say flattened_list = list(flatten([[1, 2, 3], [4, 5], [6, 7, 8]])) if what we need is an actual list and not just an iterator.

Insert Into A List In Python

The word "insert" here is vague (insert where?), but let's roll with it. Here are some of the flavors of list assignment mentioned in the Python docs about operations on "Mutable Sequence Types":

Operation               Result

s[i] = x                item i of s is replaced by x
s[i:j] = t              slice of s from i to j is replaced by the contents of the iterable t
s[i:j:k] = t            the elements of s[i:j:k] are replaced by those of t (as long as t is the same length as the slice it's replacing)
s.append(x)             appends x to the end of the sequence (same as s[len(s):len(s)] = [x])
s.extend(t) or s += t   extends s with the contents of t (for the most part the same as s[len(s):len(s)] = t)
s.insert(i, x)          inserts x into s at the index given by i (same as s[i:i] = [x])

So six different ways to insert into a list in Python. That doesn't seem very Pythonic!? Let's see which we might use on a case by case basis:

Insert Into The Beginning Of A List In Python

s.insert(0, value)

Insert Into The End Of A List In Python

s.append(value)

Insert Into An Existing Index Of A Python List

s[index] = value

Concatenating Two Python Lists

s += t  # Where s and t are both lists
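The "same as" notes in the table of operations can be checked directly. The following is a small sketch with throwaway variable names, comparing each method call with its slice-assignment equivalent:

```python
# append(x) versus s[len(s):len(s)] = [x]
a = [1, 2, 3]
b = [1, 2, 3]
a.append(4)
b[len(b):len(b)] = [4]
print(a == b)  # True

# insert(i, x) versus s[i:i] = [x]
c = ["a", "c"]
d = ["a", "c"]
c.insert(1, "b")
d[1:1] = ["b"]
print(c == d)  # True

# Extended slices must be replaced by an iterable of the same length.
e = [0, 1, 2, 3, 4, 5]
e[::2] = ["x", "y", "z"]
print(e)  # ['x', 1, 'y', 3, 'z', 5]

length_mismatch_rejected = False
try:
    e[::2] = ["only", "two"]  # the slice has three positions, so this fails
except ValueError:
    length_mismatch_rejected = True
print(length_mismatch_rejected)  # True
```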

Python "List Index Out Of Range" Error Message

This entry is less a "how-to" and more of a "what to do when things go wrong" type of entry, but it's nonetheless searched for very often in conjunction with Python lists (for obvious reasons). The message "list index out of range" is brought to you by a specific type of built-in Exception, namely the IndexError. It is simply saying that you tried to access a list, perhaps using code like the following:

def my_function(some_list):
    """Return something really interesting..."""
    third_value = some_list[2]
    # ... possibly more code following

If that line raises the error, 2 is not a valid index to use because some_list does not have three values (we don't know how many it does have, just fewer than 3 in this case).

Want to know how many values some_list has? The Python built-in len(s) function will give you exactly that.

A simple debugging exercise of the "list index out of range" message might look like this:

def my_function(some_list):
    try:
        third_value = some_list[2]
    except IndexError:
        print(f'some_list only has {len(some_list)} entries')
        raise

Here we put our code in a try...except block and catch any IndexError exceptions that are raised. When we see one, we just print out the length of some_list and re-raise the exception (since we can't exactly handle that in any useful way). That gives me the following output:

In [50]: def my_function(some_list):
    ...:         try:
    ...:             third_value = some_list[2]
    ...:         except IndexError:
    ...:             print(f'some_list only has {len(some_list)} entries')
    ...:             raise

In [51]: my_function([1,2])
some_list only has 2 entries
IndexError                                Traceback (most recent call last)
<ipython-input-51-3f735c70ddb9> in <module>
----> 1 my_function([1,2])

<ipython-input-50-137ffeaa7c70> in my_function(some_list)
    1 def my_function(some_list):
    2         try:
----> 3             third_value = some_list[2]
    4         except IndexError:
    5             print(f'some_list only has {len(some_list)} entries')

IndexError: list index out of range

How To Make A List In Python

The section topic is taken directly from search query volume, so rest assured this isn't me just throwing random facts about Python lists at you. There are a few ways to create a list in Python, but the two you'll use the most is using the "square-brace" syntax:
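As a hedged illustration, here are the forms most commonly meant by this. Which ones the text has in mind is an assumption; the variable names are made up:

```python
# 1. Square-brace literal, either empty or with initial items.
empty = []
numbers = [1, 2, 3]

# 2. The list() built-in, which builds a list from any iterable.
letters = list("abc")
copied = list(numbers)  # a new list with the same items

# 3. A list comprehension, which builds the items from an expression.
squares = [n * n for n in range(4)]

print(empty, numbers, letters, copied, squares)
# [] [1, 2, 3] ['a', 'b', 'c'] [1, 2, 3] [0, 1, 4, 9]
```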

Generally speaking, prefer the use of the first two methods unless you can think of a very good reason not to.

Apply A Function To A List In Python

Remember when I said list comprehensions were out of scope for this article? I lied. Even though they're a little trickier to grasp, they're very powerful, and applying a function to each element in a list in Python is basically their bread and butter. Imagine we have a list of the numbers one through five and want to create a second list whose elements are exactly the elements of the original list multiplied by two. To construct such a list using a for loop, we'd need the following code:

original_list = [1,2,3,4,5]
double_list = []
for element in original_list:
    double_list.append(element * 2)

The need to do this occurs so frequently that (especially for performance-sensitive applications) a highly optimized general form of the above is built into the Python language as list comprehensions. Rather than go over the topic in great detail here, I'll simply link to the Python documentation on list comprehensions and note that we could achieve the same as the code above using a list comprehension, like so:

original_list = [1,2,3,4,5]
double_list = [element * 2 for element in original_list]

I've grown to appreciate the precision of the grammar when reading it aloud, but it definitely takes some getting used to (and this is only the simplest use of a list comprehension).

Of course, there's always the Python built-in function map(), which will do much the same thing in a slightly easier to grok way, but there is an important difference between the two: map() yields the values back one-at-a-time (in Python 3 it returns a lazy iterator) whereas a list comprehension always builds a complete new list from the iterable it is given.
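The difference is easy to see in a short sketch (Python 3 behavior; variable names are illustrative). The map object produces values on demand and is exhausted after one pass, while the comprehension gives a list that can be reused:

```python
original_list = [1, 2, 3, 4, 5]

# map() returns a lazy iterator: nothing is computed until values are requested.
doubled_lazy = map(lambda element: element * 2, original_list)
first = next(doubled_lazy)
rest = list(doubled_lazy)
leftover = list(doubled_lazy)  # the iterator is now exhausted
print(first, rest, leftover)  # 2 [4, 6, 8, 10] []

# A list comprehension builds the whole list immediately.
doubled_list = [element * 2 for element in original_list]
print(doubled_list)  # [2, 4, 6, 8, 10]
print(doubled_list)  # [2, 4, 6, 8, 10] (still there on a second pass)
```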

It may seem difficult, but try to get to a place where you're using list comprehensions rather than map() as the former are both more powerful and more Pythonic.


This article was a quick hit, but it's good to go back to basics occasionally and see if there's anything you might have missed your first time through (I'll wager that very few Python developers know and use the itertools module as well and as often as they should). I tentatively plan on adding to this article over time (a change log would be provided). Let me know if you think there's a common "Python list" Google query I missed, or if you have suggestions on the other parts of Python lists that often trip up newcomers.

December 13, 2018 08:02 AM UTC

Yasoob Khalid

Research Writeup: Deanonymization and Proximity Detection Using Wi-Fi

Hi everyone! If you have been following my blog for a while you will know that I did research at Colgate University over the summers. My research was on Wi-Fi and how I can do some interesting stuff using it. The university just published its annual catalogue of all the research projects which happened over the summer.  My research was done under the mentorship of Aaron Gember-Jacobson. I could not have asked for a better advisor. Here is the writeup of my project:

According to RAINN (Rape, Abuse & Incest National Network), 23.1% of female and 5.4% of male undergraduate students experience rape or sexual assault, with only a minute percentage reporting their assault to law enforcement [1]. In certain cases, survivors can forget who the perpetrator was due to trauma and/or intoxication. I want to use technology to counter this problem. My hope is to reduce the number of potential culprits when such an incident occurs to make it easier for the survivor to identify the perpetrator.

This can be made possible by using a device that most people carry at all times – a smartphone. The idea is to save the device identifier and the distance between your phone and that of each person who comes near you in a searchable database. This allows you, the user, to search for which device was near you at a particular time. The research is further divided into two parts. The first involved finding a way to effectively calculate the relative distance between two smartphones and the second involved information storage and querying. I focused mainly on the first part, which turned out to be more difficult and involved than I anticipated.

The cornerstone of this idea is Wi-Fi and the information your smartphone emits when the Wi-Fi is turned on, though not necessarily connected to an access point. The formal requirements of this system are as follows: it should be passive so you don’t have to actively monitor it; it shouldn’t require other people’s smartphones to run any specific application; the error in distance estimation should be less than 1 meter so the algorithm can accurately identify a human interaction; the system needs to work in NLOS (Non-line-of-sight) scenarios since people often have their smartphones in their pockets; finally, it should not require more than three devices, including your smartphone, a nearby smartphone, and a Wi-Fi Access Point to which both phones are connected, because the system should be portable.

Previous research in relative distance estimation offers varying levels of precision. One method involves using RSSI (Received Signal Strength Indication) readings from multiple access points (4+ for accuracy) and triangulating smartphone position based on that. We cannot use this method because 4+ devices are required. Another method involves using Time-of-Flight (ToF) measurements. There are multiple variations of this method, but the basic idea is to send data from your device to the device being localized, and to record the time taken for the data to travel from one device to another and for an acknowledgment to be received. Based on this timing measurement and the required time delay (known as SIFS, or Short Interframe Space) between a device receiving data and sending an acknowledgement, we can estimate the distance between two devices. This gives the best accuracy but is not directly applicable to this situation, because it requires a direct connection between the two smartphones.
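The arithmetic behind the ToF idea can be sketched as follows. This is an idealized model and not the code used in the project: it assumes the measured round trip consists only of two one-way flights plus the SIFS delay, the function name is made up, and the 10 microsecond SIFS in the example is an assumed value that depends on the Wi-Fi standard and band in use.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458


def estimate_distance_m(round_trip_s, sifs_s):
    """Estimate distance from a data/ACK round trip, after removing the SIFS delay."""
    time_in_flight_s = round_trip_s - sifs_s
    if time_in_flight_s < 0:
        raise ValueError("round trip is shorter than the SIFS delay")
    # The signal covers the distance twice (data out, acknowledgement back).
    return SPEED_OF_LIGHT_M_PER_S * time_in_flight_s / 2


# Assumed example values: 10 microseconds of SIFS plus 20 nanoseconds in flight.
assumed_sifs_s = 10e-6
measured_round_trip_s = assumed_sifs_s + 20e-9
print(f"{estimate_distance_m(measured_round_trip_s, assumed_sifs_s):.2f} m")  # 3.00 m
```

Under this model, one meter of distance corresponds to only about 6.7 nanoseconds of round-trip time, which gives a sense of how precise the timing has to be to meet a sub-meter error target.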

I sought to develop a modified version of the ToF method, because it offers the best precision and requires the least number of devices to work effectively. The method I developed was to send unsolicited control packets (a special type of data frame) to the target mobile device and force it to send an acknowledgement (see figure). The major research question is: how do we force the target device to send an acknowledgement even if we are not directly connected to it?

I set up a testbed with three desktops equipped with Wi-Fi cards and running Ubuntu Linux. I used Scapy (a Python program for generating network packets) to generate and send control packets from one desktop to another and tcpdump on the third desktop to monitor and analyze the wireless communication taking place. I was able to send the control packets and solicit an acknowledgment from the target mobile (Ubuntu desktop) without being directly connected to it.

However, there was a bug in the networking drivers of Ubuntu that generated acknowledgments even in cases where no acknowledgment was supposed to be sent by the target device. Currently, I am investigating the bug and trying to figure out the most suitable way forward. Through this research, I found that the process of distance estimation is more complicated than it seems. There are several variables and timing issues that need to be taken into account.  In the future, I plan on finding a workaround for this bug, with the eventual goal of making this system usable in everyday life.

If you have any questions about my research or anything in general please write them in the comments below. Looking forward to hearing your views! Have a great day/night! 🙂 

December 13, 2018 05:54 AM UTC

Matt Layman

Django Tutorial Adventure Part 2

What happens when you take a novice web developer and put him in front of an audience equipped with the Django tutorial only? That’s what we did at Python Frederick. The conversation and learning that resulted was awesome. Check out the video on YouTube to see what you can learn too! Our presenter was Patrick Pierson. Patrick works for IronNet Cybersecurity as a Senior Software Engineer. He uses Python daily to deploy AWS resources and test the IronNet platform.

December 13, 2018 12:00 AM UTC

December 12, 2018

Jean-Louis Fuchs

libchirp or Software is infinite

I want to thank Adfinis-SyGroup who have supported me and allowed me to develop libchirp.


Writing a message-passing library and some of the things I learned. In multiple installments.


It always takes longer

Four years ago I decided we need something like libchirp. I wanted a core that is safe, light-weight, high-performance and as portable as possible. I had this polyglot cloud-software-toolkit in mind that bridges all language barriers in software. I was inspired by what I learned about Erlang. I thought by now we would have many daemons, bindings and upper-layer protocols, but in fact, I have an awesome C99-implementation and good bindings for python. I have created the foundation.

You can't have the penny and the bun

I wanted to have two conflicting properties on more than one occasion. For example:

  • Have a bounded message-queue
  • All peers are synchronous

For performance, memory-safety, and safety against spamming a peer, I designed libchirp using a fixed-size message-queue. It allocates the message-queue with a fixed size when a connection opens and then doesn't have to allocate anything during operation. This means existing connections stay operational in low-memory situations. Generally, libchirp handles failing malloc() gracefully.

Now if all peers are synchronous you can end-up with a lock-up situation:

  • Two peers want to send an ACK message to each other
  • But both peers have a full queue, so they can't accept the ACK message
  • When they can't accept the ACK message they can't free a slot in the queue [1]
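The lock-up can be reproduced in a toy model. The sketch below is not libchirp's implementation or API; Peer and acknowledge_oldest are hypothetical names, and the only rule modeled is that a slot is freed once the other side has accepted the ACK.

```python
from collections import deque


class Peer:
    """Toy peer with a fixed number of message slots."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots
        self.queue = deque()

    def accept(self, message):
        """Store a message if a slot is free; report whether it was accepted."""
        if len(self.queue) >= self.slots:
            return False
        self.queue.append(message)
        return True


def acknowledge_oldest(holder, other):
    """holder frees its oldest slot only after other has accepted the ACK."""
    if not holder.queue:
        return False
    if not other.accept(("ACK", holder.name)):
        return False  # ACK cannot be delivered, so the slot stays occupied
    holder.queue.popleft()
    return True


a, b = Peer("a", slots=2), Peer("b", slots=2)
for peer in (a, b):
    while peer.accept(("DATA", "payload")):
        pass  # fill every slot

print(acknowledge_oldest(a, b))  # False
print(acknowledge_oldest(b, a))  # False: neither side can make progress
```

Any fix that lets the ACK through when the queue is full needs extra memory somewhere, which is the trade-off against the fixed-size property.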

I had about 5 plans to remedy the situation, but everything I tried only moved the problem farther away. I do randomized testing using hypothesis, and it was always able to find new lock-up situations. Yes, hypothesis rules! Other plans would violate my fixed-memory property and open the door for DoS-attacks somehow. You just can't have both.

Since I value the performance, memory-safety and DoS-prevention properties of libchirp, I pondered whether it is indispensable for all peers to be synchronous. Synchronous in this context means you can't lose a message without an error (exception in python).

It turns out that most of the time an asymmetric approach is absolutely sufficient and if it isn't you can always use timeout-based bookkeeping of messages/requests/responses.

Rule of thumb:

  • Consumers (workers) are not synchronous
  • Producers are synchronous if they don't do bookkeeping
  • If you route messages from a synchronous producer, you want to be synchronous too: Timeouts get propagated to the producer.

If the producer requests an acknowledge (it is synchronous), then the consumer signals that it has finished the job after sending the response. So the response is put on the wire before the acknowledge. Therefore, if the producer reads the acknowledge and has not seen a response, it is clearly an error.

If no response is needed, an acknowledge means that the consumer has done its work, for example, committed the data to a database. So for the producer, no error means the transaction was successful.

[1] You might ask, why can't we release the slot of the message that triggered the acknowledge? 1. The information where to send the acknowledge to is stored in that slot. 2. The user will get a callback when the acknowledge has been sent; to identify the callback he needs an identity, which is stored in the slot. Yeah, the memory-safety property really makes things complicated, but it's so worth it, believe me. Chirp is more or less as fast as only calling the needed syscalls, with almost no overhead. Also, it keeps on sending messages when you are out-of-memory. Did I mention that we wanted to use chirp for monitoring? If your server is out-of-memory, it will be able to tell you about it. By default libchirp will allocate more memory if the message is larger than the allocated slot, but you can disable this.

December 12, 2018 11:00 PM UTC

Data School

What's the future of the pandas library?

pandas is a powerful, open source Python library for data analysis, manipulation, and visualization. I've been teaching data scientists to use pandas since 2014, and in the years since, it has grown in popularity to an estimated 5 to 10 million users and become a "must-use" tool in the Python data science toolkit.

I started using pandas around version 0.14.0, and I've followed the library as it has significantly matured to its current version, 0.23.4. But numerous data scientists have asked me questions like these over the years:

Version numbers can be used to signal the maturity of a product, and so I understand why someone might be hesitant to rely on "pre-1.0" software. But in the world of open source, version numbers don't necessarily tell you anything about the maturity or reliability of a library. (Yes, pandas is both mature and reliable!) Rather, version numbers communicate the stability of the API.

In particular, version 1.0 signals to the user: "We've figured out what the API should look like, and so API-breaking changes will only occur with major releases (2.0, 3.0, etc.)" In other words, version 1.0 marks the point at which your code should never break just by upgrading to the next minor release.

So the question remains: What's coming in pandas 1.0, and when is it coming?

Towards pandas 1.0

I recently watched a talk from PyData London called Towards pandas 1.0, given by pandas core developer Marc Garcia. It was an enlightening talk about the future of pandas, and so I wanted to highlight and comment on a few of the items that were mentioned:

If you want to follow along with the full talk slides, they can be found in this Jupyter notebook.

Method chaining 👍

The pandas core team now encourages the use of "method chaining". This is a style of programming in which you chain together multiple method calls into a single statement. This allows you to pass intermediate results from one method to the next rather than storing the intermediate results using variables.

Here's the example Marc used that does not use method chaining:

import pandas  
df = pandas.read_csv('data/titanic.csv.gz')  
df = df[df.Age < df.Age.quantile(.99)]  
df['Age'].fillna(df.Age.median(), inplace=True)  
df['Age'] = pandas.cut(df['Age'],  
                       bins=[df.Age.min(), 18, 40, df.Age.max()],
                       labels=['Underage', 'Young', 'Experienced'])
df['Sex'] = df['Sex'].replace({'female': 1, 'male': 0})  
df = df.pivot_table(values='Sex', columns='Pclass', index='Age', aggfunc='mean')  
df = df.rename_axis('', axis='columns')  
df = df.rename('Class {}'.format, axis='columns')
df.style.format('{:.2%}')

Here is the equivalent code that uses method chaining:

import pandas
(pandas.read_csv('data/titanic.csv.gz')
       .query('Age < Age.quantile(.99)')
       .assign(Sex=lambda df: df['Sex'].replace({'female': 1, 'male': 0}),
               Age=lambda df: pandas.cut(df['Age'].fillna(df.Age.median()),
                                         bins=[df.Age.min(), 18, 40, df.Age.max()],
                                         labels=['Underage', 'Young', 'Experienced']))
       .pivot_table(values='Sex', columns='Pclass', index='Age', aggfunc='mean')
       .rename_axis('', axis='columns')
       .rename('Class {}'.format, axis='columns')
       .style.format('{:.2%}'))

Their primary reasons for preferring method chains are:

Here are my thoughts:

Tom Augspurger, another pandas core developer, also noted:

"One drawback to excessively long chains is that debugging can be harder. If something looks wrong at the end, you don't have intermediate values to inspect."

To be clear, method chaining has always been available in pandas, but support for chaining has increased through the addition of new "chain-able" methods. For example, the query() method (used in the chain above) was previously tagged as "experimental" in the documentation, which is why I haven't been using it or teaching it. That tag was removed in pandas 0.23, which may indicate that the core team is now encouraging the use of query().

I don't think you will ever be required to use method chains, but I presume that the documentation may eventually migrate to using that style.

For a longer discussion of this topic, see Tom Augspurger's Method Chaining post, which was part 2 of his Modern pandas series.

inplace 👎

The pandas core team discourages the use of the inplace parameter, and eventually it will be deprecated (which means "scheduled for removal from the library"). Here's why:

Personally, I'm a fan of inplace and I happen to prefer writing df.reset_index(inplace=True) instead of df = df.reset_index(), for example. That being said, lots of beginners do get confused by inplace, and it's nice to have one clear way to do things in pandas, so ultimately I'd be fine with deprecation.
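To make the two styles concrete, here is a small sketch on a made-up two-row DataFrame (the column names are illustrative). Note that the inplace call returns None, so it cannot be used as a step in a method chain:

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]}).set_index("name")

# Reassignment style: reset_index() returns a new DataFrame.
reset = df.reset_index()
print(list(reset.columns))  # ['name', 'value']
print(list(df.columns))     # ['value'] (the original is untouched)

# inplace style: the DataFrame is modified and the call returns None.
result = df.reset_index(inplace=True)
print(result)               # None
print(list(df.columns))     # ['name', 'value']
```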

If you'd like to learn more about how memory is managed in pandas, I recommend watching this 5-minute section of Marc's talk.

Apache Arrow 👍

Apache Arrow is a "work in progress" to become the pandas back-end. Arrow was created in 2015 by Wes McKinney, the founder of pandas, to resolve many of the underlying limitations of the pandas DataFrame (as well as similar data structures in other languages).

The goal of Arrow is to create an open standard for representing tabular data that natively supports complex data formats and is highly optimized for performance. Although Arrow was inspired by pandas, it's designed to be a shared computational infrastructure for data science work across multiple languages.

Because Arrow is an infrastructure layer, its eventual use as the pandas back-end (likely coming after pandas 1.0) will ideally be transparent to pandas end users. However, it should result in much better performance as well as support for working with "larger-than-RAM" datasets in pandas.

For more details about Arrow, I recommend reading Wes McKinney's 2017 blog post, Apache Arrow and the "10 Things I Hate About pandas", as well as watching his talk (with slides) from SciPy 2018. For details about how Arrow will be integrated into pandas, I recommend watching Jeff Reback's talk (with slides) from PyData NYC 2017.

Extension Arrays 👍

Extension Arrays allow you to create custom data types for use with pandas. The documentation provides a nice summary:

Pandas now supports storing array-like objects that aren’t necessarily 1-D NumPy arrays as columns in a DataFrame or values in a Series. This allows third-party libraries to implement extensions to NumPy’s types, similar to how pandas implemented categoricals, datetimes with timezones, periods, and intervals.

In other words, previously the pandas team had to write a lot of custom code to implement data types that were not natively supported by NumPy (such as categoricals). With the release of Extension Arrays, there is now a generalized interface for creating custom types that anyone can use.

The pandas team has already used this interface to write an integer data type that supports missing values, also known as "NA" or "NaN" values. Previously, integer columns would be converted to floats if you marked any values as missing. The development documentation indicates that the "Integer NA" type will be available in the next release (version 0.24).

Another compelling use for this interface would be a native string type, since strings in pandas are currently represented using NumPy's "object" data type. The fletcher library has already used the interface to enable a native string type in pandas, though the pandas team may eventually build its own string type directly into pandas.

For a deeper look into this topic, check out the following resources:

Other deprecations 👎

Here are a few other deprecations which were discussed in the talk:


According to the talk, here's the roadmap to pandas 1.0:

More details about the roadmap are available in the pandas sprint notes from July 2018, though all of these plans are subject to change.

Learning pandas?

If you're new to pandas, I recommend watching my video tutorial series, Easier data analysis in Python with pandas.

If you're an intermediate pandas user, I recommend watching my tutorial from PyCon 2018, Best practices with pandas.

Want to know when I release new pandas tutorials? Subscribe to my email newsletter.

Let me know your thoughts or questions in the comments section below!

P.S. There is also a discussion of this post on Reddit.

December 12, 2018 03:51 PM UTC

Real Python

Thonny: The Beginner-Friendly Python Editor

Are you a Python beginner looking for a tool that can support your learning? This article is for you! Every programmer needs a place to write their code. This article will discuss an awesome tool called Thonny that will enable you to start working with Python in a beginner-friendly environment.

In this article, you’ll learn:

By the end of this article, you’ll be comfortable with the development workflow in Thonny and ready to use it for your Python learning.

So what is Thonny? Great question!

Thonny is a free Python Integrated Development Environment (IDE) that was especially designed with the beginner Pythonista in mind. Specifically, it has a built-in debugger that can help when you run into nasty bugs, and it lets you step through expression evaluation, among other really awesome features.


Installing Thonny

This article assumes that you have Python 3 installed on your computer. If not, please review Python 3 Installation & Setup.

Web Download

The web download can be accessed via a web browser by visiting the Thonny website. Once on the page, you will see a light gray box in the top right corner like this:

Thonny's Web Download Widget

Once you’ve found the gray box, click the appropriate link for your operating system. This tutorial assumes you’ve downloaded version 3.0.1.

Command Line Download

You can also install Thonny via your system’s command line. On Windows, you can do this by starting a program called Command Prompt, while on macOS and Linux you start a program called Terminal. Once you’ve done that, enter the following command:

$ pip install thonny

The User Interface

Let’s make sure you understand what Thonny has to offer. Think of Thonny as the workroom in which you will create amazing Python projects. Your workroom contains a toolbox containing many tools that will enable you to be a rock star Pythonista. In this section, you’ll learn about each of the features of the UI that’ll help you use each of the tools in your Thonny toolbox.

The Code Editor and Shell

Now that you have Thonny installed, open the application. You should see a window with several icons across the top, and two white areas:

Thonny's Main UI

Notice the two main sections of the window. The top section is your code editor, where you will write all of your code. The bottom half is your Shell, where you will see outputs from your code.

The Icons

Across the top you’ll see several icons. Let’s explore what each of them does. You’ll see an image of the icons below, with a letter above each one. We will use these letters to talk about each of the icons:

The Icons Across the Top of Thonny's UI

Working our way from left to right, below is a description of each of the icons in the image.

A: The paper icon allows you to create a new file. Typically in Python you want to separate your programs into separate files. You’ll use this button later in the tutorial to create your first program in Thonny!

B: The open folder icon allows you to open a file that already exists on your computer. This might be useful if you come back to a program that you worked on previously.

C: The floppy disk icon allows you to save your code. Press this early and often. You’ll use this later to save your first Thonny Python program.

D: The play icon allows you to run your code. Remember that the code you write is meant to be executed. Running your code means you’re telling Python, “Do what I told you to do!” (In other words, “Read through my code and execute what I wrote.”)

E: The bug icon allows you to debug your code. It’s inevitable that you will encounter bugs when you’re writing code. A bug is another word for a problem. Bugs can come in many forms, sometimes appearing when you use inappropriate syntax and sometimes when your logic is incorrect.

Thonny’s bug button is typically used to spot and investigate bugs. You’ll work with this later in the tutorial. By the way, if you’re wondering why they’re called bugs, there’s also a fun story of how it came about!

F-H: The arrow icons allow you to run your programs step by step. This can be very useful when you’re debugging or, in other words, trying to find those nasty bugs in your code. These icons are used after you press the bug icon. You’ll notice as you hit each arrow, a yellow highlighted bar will indicate which line or section Python is currently evaluating:

I: The resume icon allows you to return to play mode from debug mode. This is useful in the instance when you no longer want to go step by step through the code, and instead want your program to finish running.

J: The stop icon allows you to stop running your code. This can be particularly useful if, let’s say, your code runs a program that opens a new window, and you want to stop that program. You’ll use the stop icon later in the tutorial.

Let’s Try It!

Get ready to write your first official Python program in Thonny:

  1. Enter the following code into the code editor:

    print("Hello World")
  2. Click the play button to run your program.

  3. See the output in the Shell window.

  4. Click the play button again to see that it says hello one more time.

Congratulations! You’ve now completed your first program in Thonny! You should see Hello World printed inside the Shell, also known as the console. This is because your program told Python to print this phrase, and the console is where you see the output of this execution.

Other UI Features

To see more of the other features that Thonny has to offer, navigate to the menu bar and select the View dropdown. You should see that Shell has a check mark next to it, which is why you see the Shell section in Thonny’s application window:


Let’s explore some of the other offerings, specifically those that will be useful to a beginning Pythonista:

  1. Help: You’ll select the Help view if you want more information about working with Thonny. Currently this section offers more reading on the following topics: Running Programs Step-wise, how to install 3rd Party Packages, or using Scientific Python Packages.

  2. Variables: This feature can be very valuable. A variable in Python is a value that you define in code. Variables can be numbers, strings, or other complex data structures. This section allows you to see the values assigned to all of the variables in your program.

  3. Assistant: The Assistant is there to give you helpful hints when you hit Exceptions or other types of errors.

The other features will become useful as you advance your skills. Check them out once you get more comfortable with Thonny!

The Code Editor

Now that you have an understanding of the UI, let’s use Thonny to write another little program. In this section, you’ll go through the features of Thonny that will help guide you through your development workflow.

Write Some Code

In the code editor (top portion of the UI), add the following function:

def factorial(num):
    if num == 1:
        return 1
    else:
        return num * factorial(num - 1)

print(factorial(3))  # start with a small number


Save Your Code

Before we move on, let’s save your program. Last time, you were prompted to do this after pressing the play button. You can also do this by clicking the blue floppy disk icon or by going to the menu bar and selecting File > Save. Let’s call the program

Run Your Code

In order to run your code, find and press the play icon. The output should look like this:

Output of factorial function

Debug Your Code

To truly understand what this function is doing, try the step feature. Take a few large and small steps through the function to see what is happening. Remember you can do this by pressing the arrow icons:

Step by step windows

As you can see, the steps will show how the computer is evaluating each part of the code. Each pop up window is like a piece of scratch paper that the computer is using to compute each portion of the code. Without this awesome feature, this may have been hard to conceptualize—but now you’ve got it!
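The scratch-paper idea can also be imitated in plain Python. The sketch below is not how Thonny works internally; it simply records every call and return value of a factorial function. The helper name factorial_trace and its depth and log parameters are made up for this illustration:

```python
def factorial_trace(num, depth=0, log=None):
    """Compute num! and record each call and each return value."""
    if log is None:
        log = []
    indent = "  " * depth  # deeper calls are indented further
    log.append(f"{indent}factorial({num}) called")
    if num == 1:
        result = 1
    else:
        # The recursive call appends its own steps to the same log.
        result = num * factorial_trace(num - 1, depth + 1, log)[0]
    log.append(f"{indent}factorial({num}) returns {result}")
    return result, log


result, log = factorial_trace(3)
print("\n".join(log))
```

Each indented level in the printed log plays the role of one pop up window: a call stays open until the call nested inside it has returned.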

Stop Running Your Code

So far, there hasn’t been a need to hit the stop icon for this program, particularly because it exits as soon as it has executed print(). Try increasing the number being passed to the factorial function to 100:

def factorial(num):
    if num == 1:
        return 1
    else:
        return num * factorial(num - 1)

print(factorial(100))


Then step through the function. After a while, you will notice that you will be clicking for a long time to reach the end. This is a good time to use the stop button. The stop button can be really useful to stop a program that is either intentionally or unintentionally running continuously.

Find Syntax Errors in Your Code

Now that you have a simple program that works, let’s break it! By intentionally creating an error in your factorial program, you’ll be able to see how Thonny handles these types of issues.

We will be creating what is called a syntax error. A syntax error is an error that indicates that your code is syntactically incorrect. In other words, your code does not follow the proper way to write Python. When Python notices the error, it will display a syntax error to complain about your invalid code.

Above the print statement, let’s add another print statement that says print("The factorial of 100 is:"). Now let’s go ahead and create syntax errors. In the first print statement, remove the second quotation mark, and in the other remove the second parenthesis.

As you do this, you should see that Thonny will highlight your SyntaxErrors. Missing quotations are highlighted in green, and missing parentheses are in gray:

syntax errors for factorial function

For beginners, this is a great resource that will allow you to help spot any typos while you’re writing. Some of the most common and frustrating errors when you start programming are missing quotes and mismatched parentheses.
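If you are curious how Python itself reports these two mistakes, here is a minimal sketch that uses the built-in compile() function to check a snippet without running it. The helper name find_syntax_error is hypothetical, and the exact wording of the messages depends on your Python version:

```python
def find_syntax_error(source):
    """Return a short description of the first syntax error, or None if the code parses."""
    try:
        compile(source, "<example>", "exec")  # parse only, nothing is executed
    except SyntaxError as error:
        return f"line {error.lineno}: {error.msg}"
    return None


good = 'print("The factorial of 100 is:")\n'
missing_quote = 'print("The factorial of 100 is:)\n'
missing_parenthesis = 'print("The factorial of 100 is:"\n'

for snippet in (good, missing_quote, missing_parenthesis):
    print(find_syntax_error(snippet))
```

The first snippet prints None because it is valid, while the other two print the message Python would show if you tried to run them.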

If you have your Assistant View turned on, you will also notice that it will give you a helpful message to guide you in the right direction when you are debugging:

Shows assistant showing syntax error help text

As you get more comfortable with Thonny, the Assistant can be a useful tool to help you get unstuck!

The Package Manager

As you continue to learn Python, it can be quite useful to download a Python package to use inside of your code. This allows you to use code that someone else has written inside of your program.

Consider an example where you want to do some calculations in your code. Instead of writing your own calculator, you might want to use a third-party package called simplecalculator. In order to do this, you’ll use Thonny’s package manager.

The package manager will allow you to install packages that you will need to use with your program. Specifically, it allows you to add more tools to your toolbox. Thonny has the built-in benefit of handling any conflicts with other Python interpreters.

To access the package manager, go to the menu bar and select Tools > Manage Packages… This should pop open a new window with a search field. Type simplecalculator into that field and click the Search button.

The output should look similar to this:

Installed simplecalculator package

Go ahead and click Install to install this package. You will see a small window pop up showing the system’s logs while it installs the package. Once it completes, you are ready to use simplecalculator in your code!
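If you ever want to double check from code that a package really is available, one possible approach is to ask Python whether it can find the module. This is only a sketch: the helper name is_importable is made up, and it assumes the package exposes a top-level module named calculator, as the import used later in this tutorial suggests:

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can find a top-level module with this name."""
    return importlib.util.find_spec(module_name) is not None


# Prints True once the package is installed for the interpreter running this code.
print(is_importable("calculator"))
```

Keep in mind that the answer depends on which interpreter runs the check, so run it inside Thonny if that is where you installed the package.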

In the next section, you will use the simplecalculator package along with some of the other skills you’ve learned in this tutorial to create a simple calculator program.

Check Your Understanding

You’ve learned so much about Thonny so far! Here’s what you’ve learned:

Let’s check your understanding of these concepts.

Now that you have simplecalculator installed, let’s create a simple program that will use this package. You’ll also use this as an opportunity to check that you understand some of the UI and development features that you’ve learned thus far in the tutorial.

Part 1: Create a File, Add Some Code, and Understand the Code

In Part 1, you will create a file, and add some code to it! Do your best to try to dig into what the code is actually doing. If you get stuck, check out the Take a Deeper Look window. Let’s get started:

  1. Start a new file.
  2. Add the following code into your Thonny code editor:
 1 from calculator.simple import SimpleCalculator
 2 
 3 my_calculator = SimpleCalculator()
 4 my_calculator.run('2 * 2')
 5 print(my_calculator.lcd)

This code will print out the result of 2 * 2 to the Thonny Shell in the main UI. To understand what each part of the code is doing, check out the Take a Deeper Look section below.

  • Line 1: This code imports the library calculator inside of the package called simplecalculator. From this library, we import the class called SimpleCalculator, which is defined in the simple module. You can see the code here.

  • Line 2: This is a blank line separating the import from the rest of the code, which is generally a preferred style. Read more about Python Code Quality in this article.

  • Line 3: Here we create an instance of the class SimpleCalculator and assign it to a variable called my_calculator. This can be used to run different calculators. If you’re new to classes, you can learn more about object-oriented programming here.

  • Line 4: Here we have the calculator run the operation 2 * 2 by calling run() and passing in the expression as a string.

  • Line 5: Here we print the result of the calculation. You’ll notice in order to get the most recent calculation result, we must access the attribute called lcd.

Great! Now that you know exactly what your calculator code is doing, let’s move on to running this code!

Part 2: Save the File, View the Variables, and Run Your Code

Now it’s time to save and run your code. In this section, you’ll make use of two of the icons we reviewed earlier:

  1. Save your new file as
  2. Open the Variables window and make note of the two variables listed. You should see SimpleCalculator and my_calculator. This section also gives you insight into the value that each variable is pointing to.
  3. Run your code! You should see 4.0 in the output:

Calculator with Simple Expression

Great job! Next you’ll explore how Thonny’s debugger can help you to better understand this code.

Other Great Beginner Features

As you get more comfortable with Thonny, the features in this section will come in quite handy.


Using your script, you’re going to use the debugger to investigate what is happening. Update your code to the following:

from calculator.simple import SimpleCalculator

def create_add_string(x, y):
    '''Returns a string containing an addition expression.'''
    return 'x + y'

my_calculator = SimpleCalculator()
my_calculator.run(create_add_string(2, 2))
print(my_calculator.lcd)

Hit the save icon to save this version.

You’ll notice the code has a new function called create_add_string(). If you’re unfamiliar with Python functions, learn more in this awesome Real Python course!

As you inspect the function, you may notice why this script will not work as expected. If not, that’s okay! Thonny is going to help you see exactly what is going on, and squash that bug! Go ahead and run your program and see what happens. The Shell output should be the following:

>>> %Run

Oh no! Now you can see there is a bug in your program. The answer should be 4! Next, you’ll use Thonny’s debugger to find the bug.

Let’s Try It!

Now that we have a bug in our program, this is a great chance to use Thonny’s debugging features:

  1. Click the bug icon at the top of the window. This enters debugger mode.

  2. You should see the import statements highlighted. Click the small step arrow icon, the yellow arrow in the middle. Keep pressing this to see how the debugger works. You should notice that it highlights each step that Python takes to evaluate your program. Once it hits create_add_string(), you should see a new window pop up.

  3. Examine the pop up window carefully. You should see that it shows the values for x and y. Keep pressing the small step icon until you see the value that Python will return to your program. It will be enclosed in a light-blue box:

     Thonny's Function Debug Pop-up Window

     Oh no! There’s the bug! It looks like Python will return a string containing the letters x and y (meaning 'x + y') and not a string containing the values of those variables, like '2 + 2', which is what the calculator is expecting. Each time you see a light-blue box, you can think of this as Python replacing subexpressions with their values, step by step. The pop up window can be thought of as a piece of scratch paper that Python uses to figure out those values. Continue to step through the program to see how this bug results in a calculation of 0.

  4. The bug here has to do with string formatting. If you are unfamiliar with string formatting, check out this article on Python String Formatting Best Practices. Inside create_add_string(), the f-string formatting method should be used. Update this function to the following:

    def create_add_string(x, y):
        '''Returns a string containing an addition expression.'''
        return f'{x} + {y}'
  5. Run your program again. You should see the following output:

    >>> %Run

Success! You have just demonstrated how the step-by-step debugger can help you find a problem in your code! Next you’ll learn about some other fun Thonny features.
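To see the difference without the calculator package, you can compare the two versions of the function side by side. The names create_add_string_buggy and create_add_string_fixed are only used here to tell the versions apart:

```python
def create_add_string_buggy(x, y):
    '''Returns the literal text, ignoring the arguments.'''
    return 'x + y'


def create_add_string_fixed(x, y):
    '''Returns a string containing an addition expression.'''
    return f'{x} + {y}'


print(create_add_string_buggy(2, 2))  # x + y
print(create_add_string_fixed(2, 2))  # 2 + 2
```

Without the f prefix and the curly braces, Python treats x and y as ordinary letters inside the string, so the arguments never reach the result.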

Variable Scope Highlighting

Thonny offers variable highlighting to remind you that the same name doesn’t always mean the same variable. In order for this feature to work, on the menu bar, go to Thonny > Preferences and ensure that Highlight matching names is checked.

Notice in the code snippet below, that create_add_string() now has a new variable called my_calculator, though this is not the same as the my_calculator on lines 10 and 11. You should be able to tell because Thonny highlights the variables that reference the same thing. This my_calculator inside the function only exists within the scope of that function, which is why it is not highlighted when the cursor is on the other my_calculator variable on line 10:

Calculator with highlighting

This feature can really help you avoid typos and understand the scope of your variables.
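The snippet from the screenshot is not reproduced here, but the scoping rule behind it can be shown with a small stand-in. In this sketch the string values are placeholders chosen for illustration, not the calculator objects from the tutorial:

```python
my_calculator = "module-level calculator"


def create_add_string(x, y):
    # This assignment creates a new local name. It does not change the
    # module-level my_calculator defined above.
    my_calculator = "local value"
    return f'{x} + {y}'


expression = create_add_string(2, 2)
print(expression)      # 2 + 2
print(my_calculator)   # module-level calculator
```

The two names are spelled the same, yet calling the function leaves the outer one untouched, which is exactly the situation the highlighting is meant to make visible.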

Code Completion

Thonny also offers code completion for APIs. Notice in the snapshot below how pressing the Tab key shows the methods available from the random library:

Thonny's Code Complete Feature

This can be very useful when you’re working with libraries and don’t want to look at the documentation to find a method or attribute name.
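You can get a rough idea of what a completion list contains with the built-in dir() function. This is only an approximation and not Thonny's actual implementation; the helper public_names is hypothetical:

```python
import random


def public_names(module, prefix=""):
    """List the public names of a module that start with the given prefix."""
    return sorted(
        name for name in dir(module)
        if not name.startswith("_") and name.startswith(prefix)
    )


# Similar to typing "random.rand" and asking for completions.
print(public_names(random, "rand"))
```

The exact list depends on your Python version, since new functions are added to the standard library over time.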

Working on a Pre-Existing Project

Now that you’ve learned the basic features of Thonny, let’s explore how you can use it to work on a pre-existing project.

Find a File on Your Computer

Opening a file on your computer is as easy as going to the menu bar, selecting File > Open, and using your browser to navigate to the file. You can also use the open folder icon at the top of the screen to do this as well.

If you have a requirements.txt file and pip locally installed, you can pip install these from the Thonny system Shell. If you don’t have pip installed, remember you can use the Package Manager to install it:

$ pip install -r requirements.txt

Work on a Project From GitHub

Now that you are a Thonny expert, you can use it to work on the exercises from Real Python Course 1: Introduction to Python:

  1. Navigate to the Real Python GitHub repo called book1-exercises.

  2. Click the green button labeled Clone or download and select Download Zip.

  3. Click the open folder icon to navigate and find the downloaded files. You should find a folder called book1-exercises1.

  4. Open one of the files and start working!

This is useful because there are tons of cool projects available on GitHub!


Awesome job getting through this tutorial on Thonny!

You can now start using Thonny to write, debug, and run Python code! If you like Thonny, you might also like some of the other IDEs we’ve listed in Python IDEs and Code Editors (Guide).

Thonny is actively maintained, and new features are being added all the time. There are several awesome new features that are currently in beta that can be found on the Thonny Blog. Thonny’s main development takes place at the Institute of Computer Science of the University of Tartu, Estonia, as well as by contributors around the world.


December 12, 2018 02:00 PM UTC


Remove duplicate files project is ready

I am glad to inform you all that the remove duplicate file project written with python is finally complete, and it will now be uploaded to github for you all to enjoy. This is free software and it will always remain free. I would really love to create a linux version of this software, but because I do not have a computer running linux, at the moment it is for windows users only. I have packed the software up so you can just click on it to open it, then search and destroy the duplicate files inside your computer. Here are the steps you need to follow to search and destroy the duplicate files:

  1. Open up the application on your windows desktop.
  2. Click on the select file button to select a file, or use shift to select many files at once.
  3. Select the other folder which you want this program to search and destroy all the duplicate files from, then sit back and enjoy.
  4. This program will search all the files within that selected folder, but make sure you do not select the same folder which you selected the original file from.
  5. You can select multiple files at the same time, and after the first batch you can select another file whose duplicates you want to remove. However, it is not advisable to start several file selections at once if this program is running on a slow computer.

All right, now let's look at the images below for the step by step tutorial on how to use this python application.

Start the application then click on the select file button below

Click shift and select all the files that you wish to search and destroy their duplicate versions from and click the open button

Next is to select the folder which you wish to remove all the duplicate files from then click on the select folder button

That is it, now just sit back and enjoy while watching the program doing its job

If you spot any bug in this application you can leave a comment below this post and I will fix it as fast as possible. If you are not sure about how to use this program you can create a folder then copy and paste some files from one folder to this new folder and start to practice to delete the duplicate files based on the steps I have shown to you earlier. This program is not perfect and thus your feedback is very important for me to improve on the quality of this program.

The application has been uploaded to github and you can now download the Multitas.exe file through this link.  After you have downloaded the file, click on the Multitas.exe file to open it up and use it.

This program is for windows users only, and you may need to have python installed on your desktop before you can use it.

If you intend to run the program by yourself, then you can download the entire package, open up the windows command prompt and type in ‘python path/to/’ to start the application on your windows os laptop. This application runs well with no problem at all!

December 12, 2018 12:15 PM UTC


PyCharm 2018.3.2 RC

PyCharm 2018.3.2 Release Candidate is now available. This version comes with a couple of small improvements. Get it now from our Confluence page.

Improved in This Version

PyCharm splits F-strings for you

In PyCharm 2018.3, we improved a lot of things in F-strings, and we’ve just improved them a little further. When splitting an F-string in previous versions of PyCharm, this is the behavior:

F-strings 2018.3.1

Although this behavior is in line with what you’d expect from a text editor, wouldn’t it be nice if PyCharm could help you a little further? So we’ve made sure that breaking up an F-string in PyCharm 2018.3.2 leaves you with a valid Python file:

F-strings 2018.3.2

Further improvements


Download the RC from our Confluence page

If you’re on Ubuntu 16.04 or later, you can use snap to get PyCharm RC versions and stay up to date. You can find the installation instructions on our website.

The release candidate (RC) is not an early access program (EAP) build, and does not bundle an EAP license. To use PyCharm Professional Edition RC, you will need a currently active PyCharm subscription. If none is available, a free 30-day trial will start.

December 12, 2018 11:47 AM UTC

Mike Driscoll

Python 101: Episode #38 – The Python egg

In this screencast, we learn about the Python egg, which was one of Python’s old formats for distributing code. In modern Python, we now use a wheel.

You can also read the chapter this video is based on here or get the book on Leanpub

Previous Episodes

December 12, 2018 06:05 AM UTC


Open a text file then write the game level onto it with python

Welcome back, let us continue with our previous pygame project. In this article we will first create a text file which will be used to store the game level. This text file will be called level.txt which will be stored within the project folder. We need to create this text file by hand and type in 1 which is the first level of the game onto it, this level number will be rewritten each and every...


December 12, 2018 05:27 AM UTC


Tell the user how many duplicate files the python program has deleted

Welcome back again. In this chapter we will create a message box to tell the user how many duplicate files have been deleted for each selected file. The game plan here is simple: we will attach a message variable to the label widget and pass that widget to each remove thread instance, so the thread can modify the content of that message by adding the selected file name as well as the total number of duplicate files that have been removed from the selected folder. That modified message will then be shown by the messagebox on the main thread at the end of each file deleting process.

The first file we need to edit is the main file, where we pass tkinter’s label object, with its message variable, to each new remove thread instance the program creates. We need not worry too much about whether those threads will compete with each other to modify the content of the message, because one process finishes before the next one begins, so there is always a time gap between them. Here is the new version of the main file.

from tkinter import *
from tkinter import filedialog
from Remove import Remove
import os
from tkinter import messagebox

win = Tk() # 1 Create instance
win.title("Multitas") # 2 Add a title
win.resizable(0, 0) # 3 Disable resizing the GUI
win.configure(background='black') # 4 change background color

# 5 Create a label
aLabel = Label(win, text="Remove duplicate", anchor="center")
aLabel.grid(column=0, row=1)
aLabel.message = ''

# 6 Create a selectFile function to be used by button
def selectFile():

    fullfilenames = filedialog.askopenfilenames(initialdir="/", title="Select a file") # select multiple files from the hard drive

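    if fullfilenames:  # an empty result means that the dialog was cancelled

        fullfilenamelist = list(fullfilenames)  # assumed: the original right-hand side is missing here; askopenfilenames returns the chosen paths as a tuple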
        directory_path = os.path.dirname(os.path.abspath(fullfilenamelist[0]))

        os.chdir(directory_path)  # change the directory to the selected file directory

        folder = filedialog.askdirectory()  # 7 open a folder then create and start a new remove thread to delete the duplicate file
        folder = folder.replace('/', '\\')  # 8 this is for the windows separator only

        if(folder != '' and folder != os.getcwd()):

            for fullfilename in fullfilenamelist:

                if(fullfilename != ''):
                    filename = fullfilename.split('/')[-1]
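                    remove = Remove(folder, filename, fullfilename, directory_path, aLabel)
                    remove.start()  # assumed: start the remove thread
                    remove.join()  # assumed: wait for it to finish so that aLabel.message is complete
                    messagebox.showinfo('Remove duplicate files', aLabel.message)

        else:
            messagebox.showinfo("Error", "Kindly select one folder and it must be a different one")

    else:
        messagebox.showinfo("Select file", "You need to select a file!")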

# 9 Adding a Button
action = Button(win, text="Select File", command=selectFile)
action.grid(column=0, row=0)

win.mainloop()  # 10 start GUI

And here is the edited version of the remove thread class.

import threading
import os
import filecmp
from tkinter import messagebox

class Remove(threading.Thread):

    def __init__(self, massage,  filename, fullfilename, directory_path, aLabel):

        threading.Thread.__init__(self)  # initialise the thread before start() is called
        self.massage = massage  # the folder to search for duplicates
        self.filename, self.file_extension = os.path.splitext(filename)
        self.fullfilename = fullfilename
        self.directory_path = directory_path
        self.count = 0
        self.aLabel = aLabel

    def run(self):

        self.delete_duplicate(self.massage)  # self.massage holds the folder to search

        self.aLabel.message = 'Removed ' + str(self.count) + ' duplicate : ' + self.filename # the main thread shows this message once a set of duplicate files has been removed

    def delete_duplicate(self, folder): # walk the folder and its sub folders

        # make sure that we will not delete the selected file in its own directory
        if(os.path.abspath(folder) == os.path.abspath(self.directory_path)):
            return

        for filepath in os.listdir(folder):
            fullpath = os.path.join(folder, filepath)
            if(os.path.isfile(fullpath)):
                filename, file_extension = os.path.splitext(filepath)
                self.remove_file(file_extension, fullpath)
            elif(os.path.isdir(fullpath)):
                self.delete_duplicate(fullpath)  # search the sub folder as well

    def remove_file(self, file_extension, filepath):
        if (file_extension == self.file_extension):
            # compare the file contents byte by byte, not just the file metadata
            if filecmp.cmp(filepath, self.fullfilename, shallow=False):
                os.remove(filepath)  # delete the duplicate file
                self.count += 1

Now each remove thread manages its own business, so there is no conflict over the count variable, and the old message text is simply replaced by the next process that changes it. That is basically it for this remove duplicate project; we are now fully ready to pack it up and use it on our Windows desktop.
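
The hand-off described above, where a worker thread writes its result into an object it was given and the main thread reads it afterwards, can be tried without tkinter. This is only a minimal sketch: `ResultHolder`, `CountingWorker` and `process` are hypothetical names that stand in for the label widget, the remove thread and the button callback, and the worker counts matching names in a list instead of deleting files.

```python
import threading

class ResultHolder:
    """Hypothetical stand-in for the label: any object with a message attribute."""
    def __init__(self):
        self.message = ''

class CountingWorker(threading.Thread):
    """Hypothetical worker that counts matching names instead of deleting files."""
    def __init__(self, items, target, holder):
        super().__init__()   # the Thread must be initialised before start()
        self.items = items
        self.target = target
        self.holder = holder
        self.count = 0       # each worker keeps its own counter

    def run(self):
        for item in self.items:
            if item == self.target:
                self.count += 1
        self.holder.message = 'Removed ' + str(self.count) + ' duplicate : ' + self.target

def process(items, targets):
    """Run one worker per target and read its message once it has finished."""
    holder = ResultHolder()
    messages = []
    for target in targets:
        worker = CountingWorker(items, target, holder)
        worker.start()
        worker.join()        # wait, so the message is complete before it is read
        messages.append(holder.message)
    return messages

messages = process(['a.txt', 'b.txt', 'a.txt'], ['a.txt', 'b.txt'])
```

Because each worker is joined before the next one starts, only one thread writes to the holder at any time, which is the "time gap" the chapter relies on.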

December 12, 2018 03:38 AM UTC

December 11, 2018

Jean-Louis Fuchs

Announcing libchirp


Message-passing for everyone.

I proudly announce libchirp. I believe queues, message-routers and patterns like pub-sub are the way message-passing should be done. However, I also believe they should be optional and tweak-able. libchirp does only one thing: message-passing with encryption. All other building-blocks should be implemented in upper-layer modules or daemons. libchirp is the basis for modular message-passing and actor-based programming.

I want to thank Adfinis-SyGroup who have supported me and allowed me to develop libchirp.


Here is the mandatory echo-server example:

import asyncio
from libchirp.asyncio import Chirp, Config, Loop

class EchoChirp(Chirp):
    async def handler(self, msg):
        await self.send(msg)

loop = Loop(); config = Config()
# Workers are usually asynchronous
config.SYNCHRONOUS = False
aio_loop = asyncio.get_event_loop()
chirp = EchoChirp(loop, config, aio_loop)

There is also a ThreadPoolExecutor- and a Queue-based interface.

By the way, libchirp for Python is a set of bindings to my C99 implementation. My secondary goal is to build a polyglot message-passing toolkit. You are welcome to contribute bindings for your favorite language.

Project links

December 11, 2018 11:00 PM UTC

Python Insider

Python 3.7.2rc1 and 3.6.8rc1 now available for testing

Python 3.7.2rc1 and 3.6.8rc1 are now available. 3.7.2rc1 is the release preview of the next maintenance release of Python 3.7, the latest feature release of Python. 3.6.8rc1 is the release preview of the next and last maintenance release of Python 3.6, the previous feature release of Python. Assuming no critical problems are found prior to 2018-12-20, no code changes are planned between these release candidates and the final releases. These release candidates are intended to give you the opportunity to test the new security and bug fixes in 3.7.2 and 3.6.8. We strongly encourage you to test your projects and report any issues you find as soon as possible. Please keep in mind that these are preview releases and, thus, their use is not recommended for production environments.

You can find these releases and more information here:

December 11, 2018 10:00 PM UTC

PyCoder’s Weekly

Issue #346 (Dec. 11, 2018)

#346 – DECEMBER 11, 2018

PyCon 2019 Proposal Submission Deadline Is Fast Approaching

If you’re interested in giving a talk or hosting a poster session at PyCon 2019, be sure to get your proposals in before the deadline on January 3rd.

SQLAlchemy for MySQL and Pandas

Running SQL queries and loading the results directly into a Pandas dataframe with SQLAlchemy (read_sql_query()). Using this approach, the 4.5+ seconds it took in Eric’s example to grab data, analyze the data, and return the data was reduced to about 1.5 seconds. Impressive gains for just switching out the connection/management method.

How to Grow a Neat Software Architecture Out of Jupyter Notebooks

“Have you ever been in the situation where you’ve got Jupyter notebooks so huge that you were feeling stuck in your code?” It’s easy to get to that point with any notebook or REPL-based workflow, and Guillame shares some interesting ideas in this article.

Get a Data Science Job in 6 Months, Guaranteed


With 1-on-1 mentorship, career coaching, and personalized support, you’ll gain the skills and experience you need to get hired in a new role with Springboard’s Data Science Career Track. The average reported salary increase was $23k. Launch your new career with Springboard. Apply today →

The Hitchhiker’s Guide to Python

An opinionated guide to the Python programming language and a best practice handbook to the installation, configuration, and usage of Python on a daily basis. Community-powered and contributor-friendly (pull requests welcome).

Pipenv: Promises a Lot, Delivers Very Little

Opinionated piece on Pipenv, the Python packaging tool: “In this post, I will explore the problems with Pipenv. Was it really recommended by Can everyone — or at least, the vast majority of people — benefit from it?” Also see the related discussion on Hacker News.

Teaching Kids to Code: I’m a Developer and I Think It Doesn’t Actually Teach Important Skills

The gist of this opinion piece is that sending kids to code clubs and “summer coding camps” may teach them the wrong skills at the wrong time. That said, I think I would’ve loved to go to a summer coding camp as a kid… Worth a read if you’re thinking about teaching your kids how to code.
JOE MORGAN opinion

Sending Emails With Python

Find out how to send plain-text and HTML messages, add files as attachments, and send personalized emails to multiple people using Python. Covers talking directly to an SMTP server and how to use the APIs of a transactional email service like Sendgrid (which is what we do for PyCoder’s Weekly).
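
As a rough illustration of the plain-text plus HTML case mentioned above, here is a minimal sketch that only builds the messages with the standard library's `email.message.EmailMessage`. The addresses, names and the `build_message` helper are made up for this example, and the actual sending step is left as a comment because it needs a real SMTP server.

```python
from email.message import EmailMessage

def build_message(sender, recipient, name):
    """Build one personalized message with a plain-text and an HTML body."""
    msg = EmailMessage()
    msg["Subject"] = "Hello, " + name
    msg["From"] = sender
    msg["To"] = recipient
    # Plain-text body first, then the HTML alternative
    msg.set_content("Hi " + name + ",\nThis is the plain-text version.")
    msg.add_alternative(
        "<p>Hi <b>" + name + "</b>,<br>This is the HTML version.</p>",
        subtype="html",
    )
    return msg

# Hypothetical recipients, used for illustration only
recipients = [("Ann", "ann@example.com"), ("Bob", "bob@example.com")]
messages = [build_message("me@example.com", addr, name) for name, addr in recipients]

# Sending would look roughly like this, assuming an SMTP server is reachable:
#   import smtplib
#   with smtplib.SMTP("localhost") as server:
#       for msg in messages:
#           server.send_message(msg)
```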


(Possibly) Quitting Python Development After 2 Years of Non-Stop Learning

“I just can’t finish a damn project. I’ve been learning entirely on my own. I can honestly say I’ve received zero assistance apart from help on minor bugs.”

When You Try to Choose a Meaningful Variable Name…


A Love Letter to f-strings


What It’s Like to Be a Moderator on a Python Forum

Brian is a project manager by day, and by night he’s one of the moderators of the Pythonista Café. In this interview, he talks about how Python helps him in his role as a project manager, and how moderating a large forum for Python enthusiasts has impacted his coding chops.

Python Jobs

Software Engineer (Munich, Germany)

Stylight GmbH

Senior Software Engineer (Munich, Germany)

Stylight GmbH

Lead Engineer (Munich, Germany)

Stylight GmbH

Backend Software Engineer (Vancouver, BC)

Gasket Games Corp

Head of Engineering (Remote)


Cybersecurity Firm Seeks Backend Lead (NY or LA)

Aon Cyber Solutions

Senior Software Engineer (Los Angeles, CA)


Senior Developer (Chicago, IL)


More Python Jobs >>>

Articles & Tutorials

How to Fix Your Python Code’s Style

This article demonstrates how to run the flake8 code linter only on files that were modified recently. When you inherit Python code that doesn’t follow the style guidelines that your team prefers for new code, this technique will come in handy.

Using Python to Calculate Monthly Car Payments

How interest rates/APR affect monthly payments, and how the length of a loan affects total interest paid. Nice, practical tutorial with lots of examples and illustrations. If you’re thinking about financing a car, why not put those Python skills to use.
MICHAEL GALARNYK • Shared by Michael Galarnyk
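
For a feel of the arithmetic involved, here is a small sketch using the standard fixed-payment (amortization) formula, payment = P·r / (1 − (1 + r)^−n), where P is the principal, r the monthly rate and n the number of months. It assumes the APR is a nominal annual rate compounded monthly; the function names and the loan figures are made up for illustration and are not taken from the linked tutorial.

```python
def monthly_payment(principal, annual_rate, months):
    """Fixed monthly payment for a fully amortizing loan.

    annual_rate is a fraction, e.g. 0.06 for 6% APR.
    """
    if annual_rate == 0:
        return principal / months          # no interest: just split the principal
    r = annual_rate / 12                   # monthly interest rate
    return principal * r / (1 - (1 + r) ** -months)

def total_interest(principal, annual_rate, months):
    """Interest paid over the whole life of the loan."""
    return monthly_payment(principal, annual_rate, months) * months - principal

payment = monthly_payment(20000, 0.06, 60)        # hypothetical 5-year loan
short_loan = total_interest(20000, 0.06, 36)      # same loan over 3 years
long_loan = total_interest(20000, 0.06, 60)       # same loan over 5 years
```

Comparing `short_loan` with `long_loan` shows the second point of the blurb: stretching the same loan over more months lowers the monthly payment but raises the total interest paid.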

How to Become a (Good) Python Podcast Guest

A podcast episode about being a podcast guest: Brian Okken and Michael Kennedy talk about how you can become a podcast guest and what to expect. This is so meta it’s giving me seizures!
TEST & CODE podcast

Love Python? Show It With Some Python Swag


Show your love for the best programming language in the world and make your day more Pythonic with Python mugs, t-shirts, mouse pads, and stickers. Get 20% off this week only with coupon code “PYCODERS” →

Deciphering Python: How to Use Abstract Syntax Trees (AST) to Understand Code

How does the Python interpreter “know” how to run your code? Matt’s article goes into the basics of parsing and working with Abstract Syntax Trees.
MATT LAYMAN • Shared by Matt Layman
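
To get a first look at a syntax tree yourself, the standard library's `ast` module is enough. The snippet below is a minimal sketch with a made-up one-line program; it is not taken from Matt's article.

```python
import ast

source = "total = price * quantity + 1"
tree = ast.parse(source)          # parse the source text into a Module node

assign = tree.body[0]             # the only statement: an assignment
expression = assign.value         # right-hand side: (price * quantity) + 1

# Collect every variable name that appears anywhere in the tree
names = sorted(node.id for node in ast.walk(tree) if isinstance(node, ast.Name))
```

Here `expression` is an `ast.BinOp` whose operator is `ast.Add` and whose left operand is itself a `BinOp` for the multiplication, which is how the tree encodes operator precedence.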

Save and Load Your Keras Deep Learning Models

As usual, this is another in-depth tutorial from Adrian. Comes with source code examples.

The Rise of Python for Embedded Systems Continues

Obviously these folks have a horse in the race here, but it’s cool to see that Python is getting traction in the embedded programming space. I’d definitely prefer to write my IoT logic in Python than in C, if the performance constraints allow it.

Python at Microsoft: Flying Under the Radar

Many Microsoft products now include Python support and this is Steve’s story of how this shift came about.

Using Pip in a Conda Environment

What you can do to avoid breakage when using Conda and Pip together in the same Python environment. From personal (workshop) experience I know that this is something that new Pythonistas run into.

Dockerizing a Python Django Web Application

How to build a simple “Hello World”-style web application written in Django and run it inside a Docker container.

Create PDF Files With Python and Google Docs

Interesting “hack” for generating templated PDF files using Google Docs and a Python script.

Implementing a Plugin Architecture in a Python Application


Django Software Foundation (DSF) 2019 Board Election Results


Spinning Up a Pong AI With Deep Reinforcement Learning

Dive into Python Deep RL by coding a simple Vanilla Policy Gradient model that plays the beloved early 1970s classic video game Pong.

From Python Software Engineer to Engineering Manager

Swaroop is the author of “A Byte of Python” and an engineering manager at HelpShift. In this interview he talks about his experience and the lessons learned moving from an individual contributor role into dev management.
DEVTOMANAGER.COM • Shared by Siddhant Goel

All Things Being Equal: “is” vs “==”

To test equality, Python allows one to use “==” to test for values and “is” to test for identity. Find out the difference between the two in this easy to understand beginner’s article.
HARLIN SERITT • Shared by Ricky White
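
A quick sketch of the difference, using made-up variable names:

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with equal contents
c = a           # a second name for the same list object

same_value = (a == b)       # True: the two lists hold equal elements
same_object = (a is b)      # False: they are two distinct objects
alias_identity = (c is a)   # True: both names refer to one object

# None is a singleton, so identity is the idiomatic test for it
value = None
is_missing = value is None
```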

Projects & Code

Terminals Are Sexy

A curated list of Terminal frameworks, extensions, and resources for CLI lovers. I had no idea what was out there…

loguru: Python Logging Made (Stupidly) Simple

This is a brand-new project but the README looks promising. If you ever felt lazy about configuring logging and used print() instead, this might be worth checking out.

molten: A Framework for Building HTTP APIs With Python


nginxpy: Embed Python in NGINX

Allows you to run embedded Python in the NGINX web server.

secure: Secure Headers and Cookies for Python Web Frameworks

HTTP response headers that, when set, can enhance the security of your web application by enabling browser security policies. Things like X-Frame-Options, Referrer-Policy, and so on.

python-colorlog: Colored Formatter for Python’s Logging Module


Python Template Snippets (VS Code Extension)

A snippet extension for Django and Jinja2 templating engines.
VISUALSTUDIO.COM • Shared by Ricky White

MyPy 0.650 Released


PyTorch 1.0 Released


nbdime: Local Diffing and Merging of Jupyter Notebooks


Happy Pythoning!
This was PyCoder’s Weekly Issue #346.

December 11, 2018 08:30 PM UTC

Stories in My Pocket

Recommendation: Dash for your docs

Part of what I want to do with this site is recommend tools and resources that I've found valuable, in the hopes that you might benefit from them and enjoy them as I have.

There has been one program, in particular, that I use most times I'm in a coding session. Sadly, I can't use it at work, and I miss it dearly. Whenever I open it, I feel a sense of relief, knowing that I am in good hands.

Today, I recommend you try Dash.

Documentation quick reference

Having the documentation at hand for what you're working on is a great boost for productivity. I can't tell you how many times I've needed to look up the order of arguments to animate something in CSS or the syntax to copy a file using `scp`. Having a way to quickly find the answer you are looking for is crucial for keeping in your flow, and Dash excels at that.

Dash downloads the documentation for languages and frameworks that you use, provides lightning-fast searching for them, and gives you a great user experience to quickly access what you need.

Also, these docs you've downloaded are accessible when you're not on the Internet, perfect for coding while traveling.

Let's say, for example, you were to type "sort" with the docs I have installed. Dash would immediately bring up the docs for python's `list.sort()` method. I can tap the down arrow to see a collection of JavaScript `sort` functions, SQLAlchemy's `MutableList.sort()`, a `man` page for `sort`, jinja's `sort`, and so on. You can see more in the screenshot.

An example of searching for sort


Every result from your search can be quickly accessed and further searched through the keyboard.

Continuing with this example, the first result from searching for "sort" is python's `list.sort`. There's a "3" on the right side to signify that there are three total matches for "sort" in the python standard library. If I hit `return` or `option-return`, it'll pop up a menu to choose between them.

The next result is for three options in the JavaScript language. Tapping the down key will bring up the first, and a `return` will bring up the menu to choose between them.

If I know I'm looking for something in python3, I can type "py:" and Dash will narrow its scope to return the docs I want. (I believe the default binding is "python:", but who wants to type all that when you've got coding to do?)

On top of that, if you're still in the search field, you can hit `space`, start typing in more letters, and Dash will search and highlight the first match for that term in the currently active document. Hitting `return` jumps to the next one.

I can't tell you how significant this has been to my coding productivity.

Other options

Before I go further, I want to pause to mention there are other documentation tools out there that can do similar things. The ones I know of are:

These offer similar functionality: they download the documentation for languages, frameworks, and tools, and make it available to search offline. Please give them a try. They may fill your needs. I even use DevDocs at work.

But in my eyes, they just don't measure up to the experience that I enjoy in Dash. Most of them don't have anywhere near the number of doc sets available (a partial list of the docs you can download is up on the Dash website), and none of them have as great an experience as Dash.

But some of these are free and give you a taste of what you can get if you were to pay for Dash.

Jump around

One of the things I miss the most when I'm not able to use Dash is the way it allows me to quickly explore and jump around the document I'm looking at.

The top of Dash's interface


The top of the window has a breadcrumb-like navigation that allows you to click on a segment and drill in to other items. For example, if I were looking up the documentation for python's `pathlib.Path`, the navigation would display `pathlib` > `Classes` > `Path`. If I click on "Path", it would display a drop-down where I can choose from the other Classes in the `pathlib` module, like `PosixPath` or `PurePath`. Clicking on "Classes" provides a great way to drill down into the modules and methods in `pathlib`, as well as the sections of the documentation.

A scrollable menu with links to jump to the part of the document you want the most.


The same ability to see what else is included in the documentation page is available in the lower-left corner of the window as well.

One of the great things about this solution is that it provides an opportunity for "accidental discovery" of documentation that you might not have known about. I've learned a lot about the libraries I use every day because I went down a documentation rabbit hole that Dash inspired. (My excitement over python's `pathlib` module is a key example: it is a library I have discovered, use, and love, thanks to Dash.)

There's more to love

What I've described so far is the basic functionality for Dash, and it has been enough for me to be very happy with Dash for years.

There are other features that are very useful, including:

  • Setting up search profiles, which allow you to set up a collection of docs to search against that can get activated with triggers. Some profiles I have set up include an Arduino profile, that activates when I start the Arduino coding environment, and project-specific profiles with docs for their particular stack.
  • Integration with coding environments and other programs. This is great, since as you're coding you can hit a key combination and immediately get the docs for the word you're on. Most coding environments can also tell Dash what language you're currently in, to improve the chances of having the best result pop up.
  • "Snippets", which give you the ability to type a few characters and have them replaced with the text you set up. Most code editors have this functionality already, but Dash's works just about everywhere.

Keep you in your flow

When it comes down to it, I really appreciate Dash's ability to keep me in the flow.

I don't have to switch to a web browser and run the risk of getting distracted on my way to or from the documentation I'm looking for. I can be in and out of the documentation in a minimal amount of time, and right back at it.

Second to that, the ability to explore documentation or discover new-to-me functionality in the code I use has been just as valuable.

Dash is available to download for macOS. It has a free trial that allows you to use all its functionality, but after a delay. At the time of writing, Dash is $30.

Do you use another solution for your document lookup? Let me know your story.

December 11, 2018 07:47 PM UTC

Python Piedmont Triad User Group

PYPTUG Monthly Meeting (December): Training Tools - Utilizing Jupyter Notebooks


Come join PYPTUG at our next monthly meeting (December 18th, 2018) to learn more about the Python programming language, modules and tools. Python is the language to learn if you've never programmed before, and at the other end, it is also a tool that no expert would do without.

Main talk:     Training Tools - Utilizing Jupyter Notebooks
presented by Joan Pharr

Joan Pharr is currently a Lead Business Analyst at Valassis Digital. On the side she seeks to save the world with math, facts and a penchant for comedy. Collaborating with others to solve problems and share stories is one of her favorite aspects of PyData.

Jupyter Notebooks can be a great tool for teaching and learning in a corporate setting. In this talk I’ll cover three ways I’ve incorporated Jupyter Notebooks into my daily routine:
(1) Using notebooks as templates with information, questions and examples for training new team members. 
(2) Collecting notebooks to share code for infrequent requests across our team. 
(3)  Using notebooks to try new techniques, play with new packages, and quickly test new code.

Lightning talks!

We will have some time for extemporaneous "lightning talks" of 5-10 minute duration. If you'd like to do one, some suggestions of talks were provided here, if you are looking for inspiration. Or talk about a project you are working on.


Tuesday, December 18th, 2018
Meeting starts at 6:00PM


Wake Forest University, close to Polo Rd and University Parkway:
Manchester Hall
room: Manchester 241
Wake Forest University, Winston-Salem, NC 27109
See also this campus map (PDF) and also the Parking Map (PDF) (Manchester Hall is #20A on the parking map)
And speaking of parking: Parking after 5pm is on a first-come, first-served basis. The official parking policy is:
"Visitors can park in any general parking lot on campus. Visitors should avoid reserved spaces, faculty/staff lots, fire lanes or other restricted area on campus. Frequent visitors should contact Parking and Transportation to register for a parking permit."

Mailing List:

Don't forget to sign up to our user group mailing list:
It is the only step required to become a PYPTUG member.

December 11, 2018 07:32 PM UTC

Mike Driscoll

Python 101: Episode #37 – How to Add Your Code to PyPI

In this episode, you will learn ye olde method of adding code to the Python Package Index. Note that while some of this video is still relevant, you should be using the twine package now for uploading to PyPI. See for more information.

You can also read the chapter this video is based on here or get the book on Leanpub

Previous Episodes

December 11, 2018 07:06 PM UTC


Reminder: Webinar this Thursday, “Automating Build, Test and Release Workflows with tox” with Oliver Bestwalter

Interested in testing and release automation? We’re doing a webinar with Oliver Bestwalter of tox fame, this Thursday. Join us to learn more about test isolation with tox as well as how it fits in with a larger ecosystem of repeatable release processes.


We will look at what is necessary to automate all important workflows involved in building, testing and releasing software using tox.

We’ll cover how to use tox to …

All this can be run and debugged locally from the command line or programmatically.

December 11, 2018 04:01 PM UTC

Python Software Foundation

PyConZA 2018 – a beautiful community in South Africa

This year I attended my second PyConZA, which is held in Johannesburg, South Africa. It is the annual gathering of the South African Python community that uses and develops the open source Python programming language. It's organized by the community for the community, fostering unique solutions to the challenges faced in Africa. For the curious: ZA stands for Zuid-Afrika, a Dutch abbreviation for South Africa.

I keep coming back to South Africa to attend PyConZA. I am from Brazil but I struggle to resist a trip to South Africa to visit amazing friends, the beautiful mountains, beaches, wine farms, great food, safaris, and more.

The South African conference, run entirely by a team of dedicated volunteers, reached its eighth edition this year. As an added success, the conference reached an outstanding number of attendees.

The Numbers

Over five days – which included tutorials, main conference and sprints – the conference received 255 attendees, boasting 100% growth compared to the last time it was held in Johannesburg in 2015.

The main event counted three simultaneous tracks – plus daily open space sessions. Collectively the conference had 41 speakers, 34 talks, 13 lightning talks and 3 keynotes. The Data Science and Typing tutorials gathered 36 people. Roughly 15 attendees with hacker spirits joined the sprints and ate pizza whilst working on various projects.

Women in Tech ZA & PyConZA gathered 13 attendees for their beginner-friendly workshop "Python for Everyone".

Sponsored by 11 entities – including companies such as Microsoft and Oracle – the event had lunch daily, a lounge with really good coffee, juices and mocktails – freshly made by professionals and available at all times – a speaker's dinner and lots of swag in the Birchwood Hotel Conference Center.

If a Python conference wasn't enough, Johannesburg hosted LinuxConf and PostgresConf in the same week and at the same venue, bringing in yet more attendees, diversity and people walking around with three different badges.

Speakers photo <3

Running a Conference Ain't Easy!

Here is what the conference organizers had to say about this year's conference:

David Sharpe, chair of PyConZA 2017, said:
PyConZA is a conference made for the community and by the community. Getting people involved with it is relatively easy – getting people up to speed with how to run a conference is the hard part. The same team has been running the conference for the past seven years, and now our biggest challenge is to spread this knowledge and show other people the ropes, having redundancy in the committee and enabling PyConZA to move around the country more.

Adam Piskorski, chair of this year’s edition, added:
Finding volunteers and chasing sponsors has been especially difficult when most of the organizers are based in Cape Town – a city near the southernmost part of the country. For the next year, we want a larger conference with more optimized planning and execution.


The talk "Python Community Development in East Africa" is proof of how the Python programming language and community are changing the world's landscape and people's lives. I’d encourage you to take 40 minutes of your time and watch it; it's inspirational.

Joshua Kato (PSF Python Ambassador in East Africa), Linus Wamanya and Buwembo Murshid showed us how they are empowering the community in East Africa through training and mentoring kids, students, and people with intellectual or physical disabilities and refugees.

AfroDjango has already trained more than 3000 people since 2015, from basic digital literacy to professional software development. Projects such as home automation, online learning platforms and an online market for hardware and sensors are currently being developed by their students.

Today, AfroDjango has support from a variety of partners, including the PSF. All of this amazing work has been recognized as "Promoting ICT practical skills" by Uganda's Head of State.

Financial Aid

Financial assistance is provided for those who might otherwise not be able to attend the conference. Those potentially eligible were attendees with accepted talks, attendees from South Africa and other African countries (especially those from underprivileged backgrounds) and volunteers helping the conference.

This year, PyConZA was able to provide an amount of R40.000 (about US$2.700) as financial aid for 7 attendees – 2 from South Africa, 3 from Mozambique, 1 from Nigeria and 1 from Uganda, 4 of them being women and 5 being speakers. The organizing team used a points system to reward speakers, giving priority to people from Africa and South Africa. They also wanted to choose people from disadvantaged backgrounds, but the committee mentioned it proved difficult to fairly ascertain that.

The Video Team

Another highlight shared by all three events was the video recording crew. Everything seemed magical and seamless. The video infrastructure organization was led by Carl Karsten, a really cool Pythonista from Chicago wearing Hawaiian shorts, and the Next Day Video team.

They were able to record and livestream three simultaneous tracks using open source software and even open source hardware. The recording interface was so simple that volunteers (including me) could help with the job after just a two-minute tutorial. On top of that, the videos were released on and YouTube in a couple of hours, with minimal manual intervention.

The Python Software Society of South Africa

The PyConZA organizing committee created PSSSA – a non-profit organization – in May 2017. The objective is to support and grow the Python community and events across the country, as well as manage and run PyConZA.

Today it's being used mainly as a legal and financial entity to support the conference infrastructure, but the plans are to spread its influence and facilitate Python groups throughout South Africa.

PyConZA is awesome!

I'd like to say thanks to the PyConZA organizing committee for helping me gather all the information necessary to put this article together. It is always a pleasure to hang out with you folks.

PyConZA 2019 is expected to be hosted once again in Johannesburg, in October 2019. I hope to see you there!

December 11, 2018 10:46 AM UTC

Talk Python to Me

#190 Teaching Django

This episode is part discussion of how to teach and learn Django, and why learning web development can be hard, and part meta: Will Vincent and I discuss the business of creating content and teaching around Python.

December 11, 2018 08:00 AM UTC

Python Bytes

#108 Spilled data? Call the PyJanitor

December 11, 2018 08:00 AM UTC

Test and Code

57: What is Data Science? - Vicki Boykis

Data science, data engineering, data analysis, and machine learning are part of the recent massive growth of Python.

But really what is data science?

Vicki Boykis helps me understand questions like:

  • No really, what is data science?
  • What does a data pipeline look like?
  • What is it like to do data science, data analysis, data engineering?
  • Can you do analysis on a laptop?
  • How big does data have to be to be considered big?
  • What are the challenges in data science?
  • Does it make sense for software engineers to learn data engineering, data science, pipelines, etc?
  • How could someone start learning data science?

Also covered:

  • A type work (analysis) vs B type work (building)
  • data lakes and data swamps
  • predictive models
  • data cleaning
  • development vs experimentation
  • Jupyter Notebooks
  • Kaggle
  • ETL pipelines

I learned a lot about the broad field of data science from talking with Vicki.

Special Guest: Vicki Boykis.

Sponsored By:

  • DigitalOcean: Get started with a free $100 credit toward your first project on DigitalOcean and experience everything the platform has to offer, such as: cloud firewalls, real-time monitoring and alerts, global datacenters, object storage, and the best support anywhere.

Support Test and Code - A Podcast about Software Testing, Software Development, and Python

Links:

  • How to Lie with Statistics : Darrell Huff
  • Should you replace Hadoop with your laptop?
  • Kaggle
  • Project Jupyter
  • Soviet Art Bot: a bot that finds socialist realism paintings and tweets them out

December 11, 2018 07:45 AM UTC