
Planet Python

Last update: February 28, 2017 01:47 PM

February 28, 2017


Ilian Iliev

Django Admin and Celery

A weird race condition

Some background: We have a model that is edited only via the Django admin. The save method of the model fires a celery task to update several other records. The reason for using celery here is that the number of related objects can be pretty big, so we decided that it is best to spawn the update as a background job. We have unit and integration tests, the code was also manually tested, everything looked nice and we deployed it.

The next day we found out that the code was acting weird. There were no errors and everything looked like it had worked, but the updated records actually contained the old data from before the change. The celery task accepts the object ID as an argument and the object is fetched from the database before doing anything else, so the problem was not that we were passing some old state of the object. Then what was going on?

Trying to reproduce it: Things got even weirder. The issue happened only every now and then.

Hm...? Race condition?! Let's take a look at the code:

class MyModel(models.Model):
    def save(self, **kwargs):
        # An object that already has a primary key is an existing one.
        is_existing_object = self.pk is not None
        super(MyModel, self).save(**kwargs)
        if is_existing_object:
            update_related_objects.delay(self.pk)

So the celery task is called after "super().save()" and the changes should already be stored in the database. Unless...

The bigger picture (this is a simplified version of what is happening; for the full one, check Django's source):

def changeform_view(...):
    with transaction.atomic():
        ...
        self.save_model(...)  # here happens the call to MyModel.save()
        self.save_related(...)
        ...

OK, so the save is actually wrapped in a transaction. This explains what is going on. Until the transaction is committed, the changes are not visible to other database connections. So when the celery task is called, we end up in a race: will the task start before or after the transaction is committed? If celery manages to pick up the task before the commit, it reads the old state of the object, and that is the error.

Solution: Honestly, we picked a quick and ugly fix. We added a 60-second countdown to the task call, giving the transaction enough time to complete. Because the call to the task depends on some logic and on which properties of the model instance have changed, moving it out of the save method was a problem. Another option would be to pass all the necessary data to the task itself, but we decided that it would make things too complicated.

However, I am always open to other ideas, so if you have hit this issue before, I would like to know how you solved it.
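
One commonly suggested alternative to a fixed countdown is to enqueue the task only after the surrounding transaction has committed. Django ships a hook for this, transaction.on_commit() (available since Django 1.9), so a call such as transaction.on_commit(lambda: update_related_objects.delay(self.pk)) inside save() might be enough, although it has not been tried in the setup described here. The sketch below is a toy model in plain Python, not Django or Celery code: ToyDatabase, worker_reads and all their methods are hypothetical names, used only to illustrate why a reader on another connection sees the old value until the commit, and why deferring the task to the commit removes the race.

```python
class ToyDatabase:
    """Toy stand-in for a database with transactional visibility (hypothetical)."""

    def __init__(self):
        self.committed = {}    # what other connections (e.g. a worker) can see
        self._pending = None   # uncommitted writes of the open transaction
        self._on_commit = []   # callbacks to run once the commit has happened

    def begin(self):
        self._pending = {}

    def write(self, key, value):
        self._pending[key] = value

    def on_commit(self, callback):
        self._on_commit.append(callback)

    def commit(self):
        # Make the pending writes visible, then run the deferred callbacks.
        self.committed.update(self._pending)
        self._pending = None
        callbacks, self._on_commit = self._on_commit, []
        for callback in callbacks:
            callback()


def worker_reads(db, key, seen):
    """Simulates the task: it uses its own connection, so it sees committed data only."""
    seen.append(db.committed.get(key))


db = ToyDatabase()
db.committed['name'] = 'old'

eager, deferred = [], []

db.begin()
db.write('name', 'new')
worker_reads(db, 'name', eager)                           # task starts before the commit
db.on_commit(lambda: worker_reads(db, 'name', deferred))  # task deferred until the commit
db.commit()

print(eager)     # ['old']
print(deferred)  # ['new']
```

The eager call models the unlucky case where the worker picks up the task before the admin view's transaction finishes; the deferred call can only ever see the committed state.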

February 28, 2017 10:39 AM


Daniel Bader

Context Managers and the “with” Statement in Python


The “with” statement in Python is regarded as an obscure feature by some. But when you peek behind the scenes of the underlying Context Manager protocol you’ll see there’s little “magic” involved.

So what’s the with statement good for? It helps simplify some common resource management patterns by abstracting their functionality and allowing them to be factored out and reused.

In turn this helps you write more expressive code and makes it easier to avoid resource leaks in your programs.

A good way to see this feature used effectively is by looking at examples in the Python standard library. A well-known example involves the open() function:

with open('hello.txt', 'w') as f:
    f.write('hello, world!')

Opening files using the with statement is generally recommended because it ensures that open file descriptors are closed automatically after program execution leaves the context of the with statement. Internally, the above code sample translates to something like this:

f = open('hello.txt', 'w')
try:
    f.write('hello, world')
finally:
    f.close()

You can already tell that this is quite a bit more verbose. Note that the try...finally statement is significant. It wouldn’t be enough to just write something like this:

f = open('hello.txt', 'w')
f.write('hello, world')
f.close()

This implementation won’t guarantee the file is closed if there’s an exception during the f.write() call—and therefore our program might leak a file descriptor. That’s why the with statement is so useful. It makes acquiring and releasing resources properly a breeze.

Another good example where the with statement is used effectively in the Python standard library is the threading.Lock class:

import threading

some_lock = threading.Lock()

# Harmful:
some_lock.acquire()
try:
    pass  # Do something...
finally:
    some_lock.release()

# Better:
with some_lock:
    pass  # Do something...

In both cases using a with statement allows you to abstract away most of the resource handling logic. Instead of having to write an explicit try...finally statement each time, with takes care of that for us.

The with statement can make code dealing with system resources more readable. It also helps avoid bugs or leaks by making it almost impossible to forget cleaning up or releasing a resource after we’re done with it.

Supporting with in Your Own Objects

Now, there’s nothing special or magical about the open() function or the threading.Lock class and the fact that they can be used with a with statement. You can provide the same functionality in your own classes and functions by implementing so-called context managers.

What’s a context manager? It’s a simple “protocol” (or interface) that your object needs to follow so it can be used with the with statement. Basically all you need to do is add __enter__ and __exit__ methods to an object if you want it to function as a context manager. Python will call these two methods at the appropriate times in the resource management cycle.

Let’s take a look at what this would look like in practical terms. Here’s what a simple implementation of the open() context manager might look like:

class ManagedFile:
    def __init__(self, name):
        self.name = name

    def __enter__(self):
        self.file = open(self.name, 'w')
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb):
        if self.file:
            self.file.close()

Our ManagedFile class follows the context manager protocol and now supports the with statement, just like the original open() example did:

>>> with ManagedFile('hello.txt') as f:
...    f.write('hello, world!')
...    f.write('bye now')

Python calls __enter__ when execution enters the context of the with statement and it’s time to acquire the resource. When execution leaves the context again, Python calls __exit__ to free up the resource.
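
To make the order of these calls visible, here is a minimal sketch. The Traced class is a hypothetical helper, not part of the standard library; it simply records each step, both for a normal exit and for an exit caused by an exception:

```python
class Traced:
    """Hypothetical context manager that records when each protocol method runs."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # exc_type is None on a normal exit, or the exception class otherwise.
        self.events.append('exit: ' + (exc_type.__name__ if exc_type else 'no error'))
        return False  # a false value lets any exception propagate to the caller


normal = Traced()
with normal:
    normal.events.append('body')

failing = Traced()
try:
    with failing:
        raise ValueError('boom')
except ValueError:
    failing.events.append('caught outside')

print(normal.events)   # ['enter', 'body', 'exit: no error']
print(failing.events)  # ['enter', 'exit: ValueError', 'caught outside']
```

Note that __exit__ runs in both cases, which is what makes the with statement a replacement for an explicit try...finally.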

Writing a class-based context manager isn’t the only way to support the with statement in Python. The contextlib utility module in the standard library provides a few more abstractions built on top of the basic context manager protocol. This can make your life a little easier if your use case matches what’s offered by contextlib.

For example, you can use the contextlib.contextmanager decorator to define a generator-based factory function for a resource that will then automatically support the with statement. Here’s what rewriting our ManagedFile context manager with this technique looks like:

from contextlib import contextmanager

@contextmanager
def managed_file(name):
    # Open the file before the try block: if open() fails there is nothing to close.
    f = open(name, 'w')
    try:
        yield f
    finally:
        f.close()

>>> with managed_file('hello.txt') as f:
...     f.write('hello, world!')
...     f.write('bye now')

In this case, managed_file() is a generator that first acquires the resource. Then it temporarily suspends its own execution and yields the resource so it can be used by the caller. When the caller leaves the with context, the generator continues to execute so that any remaining clean up steps can happen and the resource gets released back to the system.

The class-based and the generator-based implementations are practically equivalent. Depending on which one you find more readable you might prefer one over the other.
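
The suspend-and-resume behavior can be traced with a small sketch. The traced() function and the events list are hypothetical names used for illustration only; the sketch records what runs before the yield, during the with body, and after the generator resumes:

```python
from contextlib import contextmanager

events = []


@contextmanager
def traced(name):
    events.append('acquire ' + name)      # runs when the with block is entered
    try:
        yield name                        # the generator pauses here; the with body runs
    finally:
        events.append('release ' + name)  # runs when the with block is left, even on error


with traced('resource') as r:
    events.append('using ' + r)

print(events)  # ['acquire resource', 'using resource', 'release resource']
```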

A downside of the @contextmanager-based implementation might be that it requires understanding of advanced Python concepts, like decorators and generators.

Once again, making the right choice here comes down to what you and your team are comfortable using and find the most readable.

Writing Pretty APIs With Context Managers

Context managers are quite flexible and if you use the with statement creatively you can define convenient APIs for your modules and classes.

For example, what if the “resource” we wanted to manage was text indentation levels in some kind of report generator program? What if we could write code like this to do it:

with Indenter() as indent:
    indent.print('hi!')
    with indent:
        indent.print('hello')
        with indent:
            indent.print('bonjour')
    indent.print('hey')

This almost reads like a domain-specific language (DSL) for indenting text. Also, notice how this code enters and leaves the same context manager multiple times to change indentation levels. Running this code snippet should lead to the following output and print neatly formatted text:

hi!
    hello
        bonjour
hey

How would you implement a context manager to support this functionality?

By the way, this could be a great exercise to wrap your head around how context managers work. So before you check out my implementation below you might take some time and try to implement this yourself as a learning exercise.

Ready? Here’s how we might implement this functionality using a class-based context manager:

class Indenter:
    def __init__(self):
        self.level = 0

    def __enter__(self):
        self.level += 1
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.level -= 1

    def print(self, text):
        print('    ' * self.level + text)

Another good exercise would be trying to refactor this code to be generator-based.

Things to Remember

February 28, 2017 12:00 AM

February 27, 2017


GoDjango

You Should do Weird Stuff in Django

We grow the most as developers when we do out-of-the-ordinary things. However, we often don't do enough weird things when we write our software.

I would encourage you to try that odd idea that you have been messing with in your head and aren't sure you can pull off. Ironically, you will grow more as a developer from that than from your day-to-day, getting-stuff-done development.

I am not sure why I started thinking about this, but watch the video to see what clued me in to this idea.

You Should Do Weird Stuff in Django

February 27, 2017 04:00 PM


Doug Hellmann

os — Portable access to operating system specific features — PyMOTW 3

The os module provides a wrapper for platform-specific modules such as posix, nt, and mac. The API for functions available on all platforms should be the same, so using the os module offers some measure of portability. This post is part of the Python Module of the Week series.

February 27, 2017 02:00 PM


Mike Driscoll

PyDev of the Week: Victor Stinner

This week we welcome Victor Stinner as our PyDev of the Week! Victor is quite active in the Python community and is a core Python developer. You can see some of his contributions here. He is the author of eight accepted PEPs which you can also read about at the previous link. If you’re interested in seeing what else Victor has been up to, then I highly recommend checking Github and ReadTheDocs. Victor also has put together some interesting benchmarks for CPython and about FASTCALL optimization. You might also want to check out his latest talk about Python benchmarks here: https://fosdem.org/2017/schedule/event/python_stable_benchmark/

Now let’s spend some time getting to know Victor better!

Can you tell us a little about yourself (hobbies, education, etc):

Hi, my name is Victor Stinner, I’m working for Red Hat on OpenStack, and I have been a CPython core developer since 2010.

I am an engineer from the engineering school Université de Technologie de Belfort-Montbéliard (UTBM), France. When I don’t hack CPython, I play with my two little cute daughters 🙂

What other programming languages do you know and which is your favorite?

I was always programming. I tried a wide range of programming languages, from low-level Intel x86 assembler to high-level languages like JavaScript and BASIC. Even if I now really enjoy writing C code for best performance, Python better fits the requirements of my daily job. Since it’s easy to write Python code, and I’m not bothered by memory management or analyzing crashes, I use the “free” time to write more unit tests, take care of the coding style, and all the tiny things that make software “good software”.

After 10 years of professional programming, I can now say that I have spent more time reading “old” code and fixing old, complex corner cases than writing new code from scratch. Having an extensible test suite makes me more relaxed. Having to work under pressure without a safety net is likely to lead to burnout, or more simply to quitting a job.

What projects are you working on now?

At Red Hat, I have a big project: porting OpenStack to Python 3. OpenStack is made of more than 3 million lines of Python code, and it is growing every day! More than 90% of the unit tests already pass on Python 3; we are now working on fixing the last issues in the functional and integration tests.

On CPython, I spent a lot of time fixing Unicode in the early days of Python 3. Nowadays, I’m working on multiple projects to make CPython faster. The very good news is that CPython 3.6 is now faster than 2.7 on most benchmarks, and CPython 3.7 is already faster than CPython 3.6! In short, Python 3 is finally faster than Python 2!

Last year, I spent a lot of time on a “FASTCALL” optimization which avoids the creation of a temporary tuple to pass positional arguments and a temporary dictionary to pass keyword arguments. More than 3/4 of my FASTCALL work is now merged into CPython. When a function is converted to FASTCALL, it usually becomes 20% faster, and the conversion is straightforward.

While working on FASTCALL and other optimizations, I was blocked by benchmarks which were not reliable. You can see the “How to run stable benchmarks” talk which I just gave at FOSDEM (Brussels, Belgium) which lists all my findings and explains how to get reproducible and reliable results: https://fosdem.org/2017/schedule/event/python_stable_benchmark/

See also the perf project that I created to make benchmarks more reliable. It’s a Python module to write a benchmark in two lines of code. The module provides many tools to check if a benchmark is reliable, compare two benchmarks and check if an optimization is significant, etc.

Which Python libraries are your favorite (core or 3rd party)?

In the Python standard library, I like the asyncio, argparse and datetime modules.

The datetime module does one thing and does it well. It was enhanced recently to support Daylight Saving Time (DST): https://www.python.org/dev/peps/pep-0495/

The argparse module is very complete; it allows building advanced command line interfaces. I used it in my perf module to get subcommands like “python3 -m perf timeit stmt”, “python3 -m perf show --metadata file.json”, …

The asyncio module is a very nice integration of cool things: an efficient event loop for network servers and the new async/await keywords of Python 3. Not only does asyncio have a nice API (no more callback hell!), but it also has a good implementation. For example, few event loop libraries support subprocesses, especially on Windows with IOCP (the most efficient way to do asynchronous programming on Windows).

As a core developer, I care mostly about the modules of the standard library, but in fact the best libraries are maintained on PyPI! Just a few examples: pip, jinja2, django, etc. Sorry, the list is too long to fit here 🙂

Where do you see Python going as a programming language?

My hope is that Python will stop evolving, I’m talking about the language itself. During the slow transition to Python 3 which took years, I realized how much users like that Python 2.7 stopped evolving. Not having to touch their code is seen as an advantage, compared to fast-moving libraries or even programming languages.

Since packaging now runs smoothly with pip, it became easy to have external dependencies. The advantage of external code is that it can move much faster than the Python standard library which is basically only updated every two years with a major Python release.

Even if I dislike change, I have to admit that the recent additions to the language are really cool: generalized unpacking, async/await keywords, f-strings, underscores in numeric literals, etc.

Is there anything else you’d like to say?

When I listen to Twitter, Go, Rust, Javascript, Elm, etc. seem to be much more active than any other language.

In the meanwhile, I’m always impressed by all the work done in each Python release. Even the Python language is still evolving. Facebook decided to use Python 3.5 only to get the new async and await keywords with asyncio! Python 3.6 adds even more things: f-string (PEP 498), Syntax for Variable Annotations (PEP 526) and Underscores in Numeric Literals (PEP 515).

By the way, many people are complaining about type hints. Some see them as “non pythonic”. Others fear that Python will become a boring Java-like language. I also know that type hints are already used in Python in large companies like Dropbox and Facebook, and they are very helpful for very large code bases.

The cool thing with Python is that it doesn’t enforce anything. For example, you can design a whole application without using objects. You can also completely ignore type hints; they are fully optional. That’s a strength of Python!

Thanks so much for doing the interview!

February 27, 2017 01:30 PM


William Minchin

Post Stats Plugin 1.1.0 for Pelican Released

Post Stats is a plugin for Pelican, a static site generator written in Python.

Post Stats calculates various statistics about a post and stores them in an article.stats dictionary.

Installation

The easiest way to install Post Stats is through the use of pip. This will also install the required dependencies automatically.

pip install minchin.pelican.plugins.post_stats

Then, in your pelicanconf.py file, add Post Stats to your list of plugins:

PLUGINS = [
           # ...
           'minchin.pelican.plugins.post_stats',
           # ...
           ]

You may also need to configure your template to make use of the statistics generated.

Requirements

Post Stats depends on (and is really only useful with) Pelican. The plugin also requires Beautiful Soup 4 to process your content. If the plugin is installed with pip, these will be installed automatically. They can also be installed manually with pip:

pip install pelican
pip install beautifulsoup4

Configuration and Usage

This plugin calculates various statistics about a post and stores them in an article.stats dictionary.

Example:

{
    'wc': 2760,
    'fi': '65.94',
    'fk': '7.65',
    'word_counts': Counter({u'to': 98, u'a': 90, u'the': 83, ...}),
    'read_mins': 12
}

This allows you to output these values in your templates, like this, for example:

<p title="~{{ article.stats['wc'] }} words">~{{ article.stats['read_mins'] }} min read</p>
<ul>
    <li>Flesch-Kincaid Index / Reading Ease: {{ article.stats['fi'] }}</li>
    <li>Flesch-Kincaid Grade Level: {{ article.stats['fk'] }}</li>
</ul>

The word_counts variable is a Python Counter dictionary and looks something like this, with each unique word and its frequency:

Counter({u'to': 98, u'a': 90, u'the': 83, u'of': 50, u'karma': 50, .....

and can be used to create a tag/word cloud for a post.
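
As a rough illustration of the tag/word cloud idea, the sketch below maps the most frequent words to font sizes. The cloud_sizes() helper, its parameters and the size range are assumptions made for this example and are not part of the plugin; only collections.Counter and its most_common() method come from the standard library.

```python
from collections import Counter


def cloud_sizes(word_counts, top_n=3, min_size=10, max_size=30):
    """Map the top_n most frequent words to font sizes between min_size and max_size."""
    top = word_counts.most_common(top_n)
    if not top:
        return {}
    highest = top[0][1]
    lowest = top[-1][1]
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: min_size + (count - lowest) * (max_size - min_size) // span
        for word, count in top
    }


# Sample data taken from the example output above.
word_counts = Counter({'to': 98, 'a': 90, 'the': 83, 'of': 50, 'karma': 50})
sizes = cloud_sizes(word_counts)
print(sizes)  # {'to': 30, 'a': 19, 'the': 10}
```

In a real template you would probably filter out common stop words such as "to" and "the" first, so that the cloud highlights the words that characterize the post.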

There are no user-configurable settings.

Known Issues

An issue, as such, is that there is no formal test suite. Testing is currently limited to my in-use observations. Upon uploading the package to PyPI, I also run a basic check that it can be downloaded and loaded into Python.

The package is tested in Python 3.6; compatibility with other versions of Python is unknown, but there should be nothing in particular keeping it from working with other “modern” versions of Python.

Credits

Original plugin by Duncan Lock (@dflock) and posted to the Pelican-Plugins repo.

License

The plugin code is assumed to be under the AGPLv3 license (this is the license of the Pelican-Plugins repo).

February 27, 2017 03:52 AM

Optimize Images Plugin 1.1.0 for Pelican Released

Optimize Images is a plugin for Pelican, a static site generator written in Python.

Optimize Images applies lossless compression on JPEG and PNG images, with no effect on image quality. It uses jpegtran and OptiPNG.

Installation

The easiest way to install Optimize Images is through the use of pip. This will also install the required Python dependencies automatically (currently none beyond Pelican itself).

pip install minchin.pelican.plugins.optimize_images

It is assumed both jpegtran and OptiPNG are installed and available on the system path.

Then, in your pelicanconf.py file, add Optimize Images to your list of plugins:

PLUGINS = [
           # ...
           'minchin.pelican.plugins.optimize_images',
           # ...
           ]

Requirements

Optimize Images depends on (and is really only useful with) Pelican. This can be manually installed with pip:

pip install pelican

It is assumed both jpegtran and OptiPNG are installed and available on the system path. On Windows, installers are available at each respective website. On Ubuntu systems (including Travis-CI), the two can be installed via apt-get.

apt-get install optipng libjpeg-progs

Configuration and Usage

The plugin activates and optimizes images when Pelican emits its finalized signal.

The plugin has no user settings.

Known Issues

Image manipulation like this can take some time to run. You may consider only adding this plugin to your publishconf.py (rather than your base pelicanconf.py), which will then only run this image optimization in preparation for site publication.

An issue, as such, is that there is no formal test suite. Testing is currently limited to my in-use observations. Upon uploading the package to PyPI, I also run a basic check that it can be downloaded and loaded into Python.

The package is tested in Python 3.6; compatibility with other versions of Python is unknown, but there should be nothing in particular keeping it from working with other “modern” versions of Python.

Credits

Original plugin from the Pelican-Plugins repo.

License

The plugin code is assumed to be under the AGPLv3 license (this is the license of the Pelican-Plugins repo).

February 27, 2017 03:00 AM

February 26, 2017


DSPIllustrations.com

How does Quantization Noise Sound?

In the last article, we explained the mathematical effect of quantization and what the resulting quantization noise is. In this article, we will hear how the quantization noise actually sounds. As a teaser, listen to the following:

# For running this code, the code snippets below need to be run beforehand
display(HTML("Original signal:" + Audio(data=data_music, rate=rate)._repr_html_()))
showQuantization(data_music, U=1, bits=4, showSignals=False);
[Audio players: original signal; signal quantized to q=4 bits; quantization noise]
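
The article's showQuantization() helper is not shown in this excerpt, so the following is only a pure-Python sketch of the underlying idea under stated assumptions: a uniform mid-rise quantizer over the range [-U, U], applied to a test sine wave instead of music. The quantize() function and all variable names are hypothetical. The quantization noise is simply the difference between the quantized and the original signal, and for an in-range signal it stays within half a quantization step.

```python
import math


def quantize(samples, U=1.0, bits=4):
    """Uniformly quantize samples from [-U, U] to 2**bits levels (mid-rise, assumed)."""
    levels = 2 ** bits
    step = 2 * U / levels
    quantized = []
    for x in samples:
        index = math.floor(x / step)
        # Clamp to the available levels so out-of-range samples saturate.
        index = max(-levels // 2, min(levels // 2 - 1, index))
        quantized.append((index + 0.5) * step)
    return quantized, step


rate = 8000  # samples per second (assumed)
signal = [0.9 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]

quantized, step = quantize(signal, U=1.0, bits=4)
noise = [q - x for q, x in zip(quantized, signal)]
max_error = max(abs(e) for e in noise)

print(step)                      # 0.125
print(max_error <= step / 2 + 1e-9)  # True
```

With only 4 bits there are at most 16 distinct output values, which is why the noise is clearly audible at this resolution.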

February 26, 2017 11:00 PM


Weekly Python Chat

Emoji: Revisited

Special guest Katie McLaughlin will answer your questions about emoji: unicode, compatibility, support, and more.

February 26, 2017 06:00 PM


Brian Okken

27: Mahmoud Hashemi : unit, integration, and system testing

What is the difference between a unit test, an integration test, and a system test? Mahmoud Hashemi helps me to define these terms, as well as discuss the role of all testing variants in software development. TDD testing pyramid vs […]

The post 27: Mahmoud Hashemi : unit, integration, and system testing appeared first on Python Testing.

February 26, 2017 04:22 PM


Oliver Andrich

cookiecutter-flask-lambda

After my first steps with Lambda, Zappa and Flask, I created some small applications to try various Zappa configuration options and play with some AWS services. Setting up the project got pretty boring after the second or third toy project. But for Python developers there is a solution to automate these tasks: cookiecutter. I […]

February 26, 2017 02:16 PM


qutebrowser development blog

qutebrowser v0.10.0 released

I'm happy to announce the release of qutebrowser v0.10.0!

qutebrowser is a keyboard driven browser with a vim-like, minimalistic interface. It's written using PyQt and is cross-platform.

I haven't announced the v0.9.0 release in this blog (or any patch releases), but for v0.10.0 it definitely makes sense to do so, as it's mostly centered on QtWebEngine!

The full changelog for this release:

Added

  • Userscripts now have a new $QUTE_COMMANDLINE_TEXT environment variable, containing the current commandline contents
  • New ripbang userscript to create a searchengine from a duckduckgo bang
  • QtWebKit Reloaded (also called QtWebKit-NG) is now fully supported
  • Various new functionality with the QtWebEngine backend:
    • Printing support with Qt >= 5.8
    • Proxy support with Qt >= 5.8
    • The general -> print-element-backgrounds option with Qt >= 5.8
    • The content -> cookies-store option
    • The storage -> cache-size option
    • The colors -> webpage.bg option
    • The HTML5 fullscreen API (e.g. youtube videos) with QtWebEngine
    • :download --mhtml
  • New qute:history URL and :history command to show the browsing history
  • Open tabs are now auto-saved on each successful load and restored in case of a crash
  • :jseval now has a --file flag so you can pass a javascript file
  • :session-save now has a --only-active-window flag to only save the active window
  • OS X builds are back, and built with QtWebEngine

Changed

  • PyQt 5.7/Qt 5.7.1 is now required for the QtWebEngine backend
  • Scrolling with the scrollwheel while holding shift now scrolls sideways
  • New way of clicking hints which solves various small issues
  • When yanking a mailto: link via hints, the mailto: prefix is now stripped
  • Zoom level messages are now not stacked on top of each other anymore
  • qutebrowser now automatically uses QtWebEngine if QtWebKit is unavailable
  • :history-clear now asks for a confirmation, unless it's run with --force.
  • input -> mouse-zoom-divider can now be 0 to disable zooming by mouse wheel
  • network -> proxy can also be set to pac+file://... now to use a local proxy autoconfig file (on QtWebKit)

Fixed

  • Various bugs with Qt 5.8 and QtWebEngine:
    • Segfault when closing a window
    • Segfault when closing a tab with a search active
    • Fixed various mouse actions (like automatically entering insert mode) not working
    • Fixed hints sometimes not working
    • Segfault when opening a URL after a QtWebEngine renderer process crash
  • Other QtWebEngine fixes:
    • Insert mode now gets entered correctly with a non-100% zoom
    • Crash reports are now re-enabled when using QtWebEngine
    • Fixed crashes when closing tabs while hinting
    • Using :undo or :tab-clone with a view-source:// or chrome:// tab is now prevented, as it segfaults
  • :enter-mode now refuses to enter modes which can't be entered manually (which caused crashes)
  • :record-macro (q) now doesn't try to record macros for special keys without a text
  • Fixed PAC (proxy autoconfig) not working with QtWebKit
  • :download --mhtml now uses the new file dialog
  • Word hints are now upper-cased correctly when hints -> uppercase is true
  • Font validation is now more permissive in the config, allowing e.g. "Terminus (TTF)" as font name
  • Fixed starting on newer PyQt/sip versions with LibreSSL
  • When downloading files with QtWebKit, a User-Agent header is set when possible
  • Fixed showing of keybindings in the :help completion
  • :navigate prev/next now detects rel attributes on <a> elements, and handles multiple rel attributes correctly
  • Fixed a crash when hinting with target userscript and spawning a non-existing script
  • Lines in Jupyter notebook now trigger insert mode

February 26, 2017 10:05 AM


Import Python

ImportPython 113 - Interview with Guido aka BDFL

Worthy Read

Talkpython interview with Guido van Rossum aka BDFL.
podcast, BDFL

We help companies like Airbnb, Pfizer, and Artsy find great developers. Let us find your next great hire. Get started today.
sponsor

The people who introduced me to Python chose it because of the elegance of the language, and its aesthetic qualities. Would they choose it again, I wonder? Would I?
core-python

This book is for people with some experience in an object oriented programming language. This book will help you get better at module/class level design. Hopefully, it will teach you to identify good design from bad.
book

Python material in data science, analysis, and modeling, and optimization. Here is the youtube video channel of the site https://www.youtube.com/user/APMonitorCom
video

This time, it was different though. My distributed web crawler seemed to be slowing down over time. Adding more nodes only gave a temporary performance boost; the overall crawling speed gradually declined afterwards. So simply put, it couldn't scale. But why? In this post, you'll find out what techniques and tools I used to diagnose scaling issues - and to an extent, more general performance issues - in my Python-based web crawler.
debugging

Code reuse is a very common need. It saves you from writing the same code multiple times and lets you leverage other smart people’s work to make new things happen. Even within a single project, it helps organize code in a modular way so you can maintain each part separately. In Python, this means formatting your project so it can be easily packaged. This is a simple guide on how to go from nothing to a package that you can proudly put in your portfolio for other people to use.
packaging

closures

Note - The video is old, but worth watching for emacs users.
emacs

deep learning

image processing

PyWren, Tfdeploy, Luigi, Kubelib, PyTorch. Note - We used luigi at my previous workplace and it's a solid library for building custom pipelines for batch processing. In our case it was used to enforce database migrations.
machine learning

Type Tracing - as a program runs you trace it and record the types of variables coming in and out of functions, and being assigned to variables.
debugging

I've written a Python package called pdftabextract https://github.com/WZBSocialScienceCenter/pdftabextract that contains several helpful functions for that task and I'm explaining how to use them in that blog post.
data mining


Jobs

Pune, Maharashtra, India
iDatalabs (https://idatalabs.com/) is hiring for a junior data scientist (with 0.5-2 years of work experience).


Projects

tkui - 166 Stars, 11 Fork
A visual introspective GUI maker with live editing of the GUI and its editor at the same time

tweetfeels - 18 Stars, 1 Fork
Real-time sentiment analysis in Python using twitter's streaming api

WallpapersFromReddit - 13 Stars, 3 Fork
Download all the hot images from reddit.com/r/wallpaper subreddit every 24 hours to a local device and set an image from those local files as a wallpaper, which updates automatically every 30 minutes!

lda2vec-tf - 12 Stars, 1 Fork
Tensorflow port of the lda2vec model for unsupervised learning of document + topic + word embeddings.

ieighteen - 10 Stars, 1 Fork
Speed up your Localization/Internationalization efforts by automating translation with single script

fish-hook - 8 Stars, 1 Fork
A console tool which manages your github webhooks efficiently.

ipyaml - 8 Stars, 1 Fork
IPython notebooks with YAML file format

Scrapstagram - 3 Stars, 0 Fork
An Instagram Scrapper

kimo - 3 Stars, 0 Fork
Find OS processes of MySQL queries

February 26, 2017 06:19 AM

February 25, 2017


Catalin George Festila

Linux: OpenCV and using Lucas-Kanade Optical Flow function.

First I installed the OpenCV python module and tried it on Fedora 25.
I used python version 2.7.

[root@localhost mythcat]# dnf install opencv-python.x86_64 
Last metadata expiration check: 0:21:12 ago on Sat Feb 25 23:26:59 2017.
Dependencies resolved.
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
opencv x86_64 3.1.0-8.fc25 fedora 1.8 M
opencv-python x86_64 3.1.0-8.fc25 fedora 376 k
python2-nose noarch 1.3.7-11.fc25 updates 266 k
python2-numpy x86_64 1:1.11.2-1.fc25 fedora 3.2 M

Transaction Summary
================================================================================
Install 4 Packages

Total download size: 5.6 M
Installed size: 29 M
Is this ok [y/N]: y
Downloading Packages:
(1/4): opencv-python-3.1.0-8.fc25.x86_64.rpm 855 kB/s | 376 kB 00:00
(2/4): opencv-3.1.0-8.fc25.x86_64.rpm 1.9 MB/s | 1.8 MB 00:00
(3/4): python2-nose-1.3.7-11.fc25.noarch.rpm 543 kB/s | 266 kB 00:00
(4/4): python2-numpy-1.11.2-1.fc25.x86_64.rpm 2.8 MB/s | 3.2 MB 00:01
--------------------------------------------------------------------------------
Total 1.8 MB/s | 5.6 MB 00:03
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Installing : python2-nose-1.3.7-11.fc25.noarch 1/4
Installing : python2-numpy-1:1.11.2-1.fc25.x86_64 2/4
Installing : opencv-3.1.0-8.fc25.x86_64 3/4
Installing : opencv-python-3.1.0-8.fc25.x86_64 4/4
Verifying : opencv-python-3.1.0-8.fc25.x86_64 1/4
Verifying : opencv-3.1.0-8.fc25.x86_64 2/4
Verifying : python2-numpy-1:1.11.2-1.fc25.x86_64 3/4
Verifying : python2-nose-1.3.7-11.fc25.noarch 4/4

Installed:
opencv.x86_64 3.1.0-8.fc25 opencv-python.x86_64 3.1.0-8.fc25
python2-nose.noarch 1.3.7-11.fc25 python2-numpy.x86_64 1:1.11.2-1.fc25

Complete!
[root@localhost mythcat]#
This is my test script, which uses OpenCV to detect motion with the Lucas-Kanade optical flow function.
It tracks some points in a black and white video.
First you need:
- one black and white video;
- a video file that is not in mp4 format;
- the color args need to be under 4 (here it is 3);
- I used this video:
I used cv2.goodFeaturesToTrack().
We take the first frame, detect some Shi-Tomasi corner points in it, then we iteratively track those points using Lucas-Kanade optical flow.
To the function cv2.calcOpticalFlowPyrLK() we pass the previous frame, the previous points and the next frame.
It returns the next points along with status numbers, which have a value of 1 if the next point is found and 0 otherwise.
We then iteratively pass these next points as the previous points in the next step.
See the code below:
import numpy as np
import cv2

cap = cv2.VideoCapture('candle')

# params for ShiTomasi corner detection
feature_params = dict(maxCorners=77,
                      qualityLevel=0.3,
                      minDistance=7,
                      blockSize=7)

# Parameters for lucas kanade optical flow
lk_params = dict(winSize=(17, 17),
                 maxLevel=1,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

# Create some random colors
color = np.random.randint(0, 255, (100, 3))

# Take first frame and find corners in it
ret, old_frame = cap.read()
old_gray = cv2.cvtColor(old_frame, cv2.COLOR_BGR2GRAY)
p0 = cv2.goodFeaturesToTrack(old_gray, mask=None, **feature_params)

# Create a mask image for drawing purposes
mask = np.zeros_like(old_frame)

while True:
    ret, frame = cap.read()
    if not ret:
        # no more frames to read
        break
    frame_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # calculate optical flow
    p1, st, err = cv2.calcOpticalFlowPyrLK(old_gray, frame_gray, p0, None, **lk_params)

    # Select good points
    good_new = p1[st == 1]
    good_old = p0[st == 1]

    # draw the tracks
    for i, (new, old) in enumerate(zip(good_new, good_old)):
        # drawing functions expect integer pixel coordinates
        a, b = (int(v) for v in new.ravel())
        c, d = (int(v) for v in old.ravel())
        mask = cv2.line(mask, (a, b), (c, d), color[i].tolist(), 2)
        frame = cv2.circle(frame, (a, b), 5, color[i].tolist(), -1)
    img = cv2.add(frame, mask)

    cv2.imshow('frame', img)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break

    # Now update the previous frame and previous points
    old_gray = frame_gray.copy()
    p0 = good_new.reshape(-1, 1, 2)

cv2.destroyAllWindows()
cap.release()
The output of this file is:

February 25, 2017 03:04 PM


Jaime Buelta

Compendium of Wondrous Links vol XI

It has been a while since the last time. More food for thought! Python Python 3 upgrade strategy. The time has come to take migrating to python 3 seriously. Another addition to Python-to-c++ compilers, in a similar way to Cython: Pythran. I tested it with code from my recent post $7.11 in four prices and the … Continue reading Compendium of Wondrous Links vol XI

February 25, 2017 01:22 PM


Coding Diet

unittest.mock small gotcha - a humbling tale of failure

Developing a small web application I recently had reason to upgrade from Python 3.4 to Python 3.6. The reason for the upgrade was regarding ordering of keyword arguments and not related to the bug in my test code that I then found. I should have been more careful writing my test code in the first place, so I am writing this down as some penance for not testing my tests robustly enough.

A Simple example program

So here I have a version of the problem reduced down to the minimum required to demonstrate the issue:

import unittest.mock as mock

class MyClass(object):
    def __init__(self):
        pass
    def my_method(self):
        pass

if __name__ == '__main__':
    with mock.patch('__main__.MyClass') as MockMyClass:
        MyClass().my_method()
        MockMyClass.my_method.assert_called_once()

Of course in reality the line MyClass().my_method() was some test code that indirectly caused the target method to be called.

Output in Python 3.4

$ python3.4 mock_example.py
$

No output, leading me to believe my assertions passed, so I was happy that my code and my tests were working. As it turned out, my code was fine but my test was faulty. Here's the output in two later versions of Python of the exact same program given above.

Output in Python 3.5

$ python3.5 mock_example.py
Traceback (most recent call last):
  File "mock_example.py", line 12, in <module>
    MockMyClass.my_method.assert_called_once()
  File "/usr/lib/python3.5/unittest/mock.py", line 583, in __getattr__
    raise AttributeError(name)
AttributeError: assert_called_once

Attribute error, test failing.

Output in Python 3.6

$ python3.6 mock_example.py
Traceback (most recent call last):
  File "mock_example.py", line 12, in <module>
    MockMyClass.my_method.assert_called_once()
  File "/usr/lib/python3.6/unittest/mock.py", line 795, in assert_called_once
    raise AssertionError(msg)
AssertionError: Expected 'my_method' to have been called once. Called 0 times.

The test is also failing, but with a different error message. Anyone who is (pretty) familiar with the unittest.mock standard library module will know assert_called_once was introduced in version 3.6, which is why version 3.5 is failing with an attribute error.

My test was wrong

The problem was that my original test was not testing anything at all. The 3.4 version of the unittest.mock standard library module did not have an assert_called_once method. The mock simply allows you to call any method on it. To see this, you can try changing the line:

        MockMyClass.my_method.assert_called_once()

to

        MockMyClass.my_method.blahblah()

With python3.4, python3.5, and python3.6 this yields no error. So in the original program you can avoid calling MyClass.my_method at all:

if __name__ == '__main__':
    with mock.patch('__main__.MyClass') as MockMyClass:
        # Missing call to `MyClass().my_method()`
        MockMyClass.my_method.assert_called_once() # In 3.4 this still passes.

This does not change the (original) results, python3.4 still raises no error, whereas python3.5 and python3.6 are raising the original errors.

So although my code turned out to be correct (at least in as much as the desired method was called), had it been faulty (or changed to be faulty) my test would not have complained.
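
To make the failure mode concrete, here is a minimal sketch of why the class-level assertion never sees the call. It builds a MagicMock by hand to stand in for the patched class (an assumption made so the snippet runs on its own, without mock.patch); the variable names are illustrative only.

```python
import unittest.mock as mock

# Stand-in for what mock.patch('__main__.MyClass') hands back.
MockMyClass = mock.MagicMock(name='MyClass')

# Equivalent of MyClass().my_method() while the patch is active:
# calling the class mock returns MockMyClass.return_value, and the
# method call is recorded on that "instance" mock.
MockMyClass().my_method()

# The attribute on the class mock is a different child mock and was never called.
class_level_calls = MockMyClass.my_method.call_count

# The call shows up on the mock returned by calling the class.
instance_level_calls = MockMyClass.return_value.my_method.call_count

print(class_level_calls, instance_level_calls)  # prints: 0 1
```

So an assertion that keeps the class-level patch would have to target MockMyClass.return_value.my_method rather than MockMyClass.my_method.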

The Actual Problem

My mock was wrong. I should instead have been patching the actual method within the class, like so:

if __name__ == '__main__':
    with mock.patch('__main__.MyClass.my_method') as mock_my_method:
        MyClass().my_method()
        mock_my_method.assert_called_once()

Now if we try this in all three versions of python (3.4, 3.5, and 3.6) we get:

$ python3.4 mock_example.py 
$ python3.5 mock_example.py 
Traceback (most recent call last):
  File "mock_example.py", line 12, in <module>
    mock_my_method.assert_called_once()
  File "/usr/lib/python3.5/unittest/mock.py", line 583, in __getattr__
    raise AttributeError(name)
AttributeError: assert_called_once
$ python3.6 mock_example.py 
$ 

So Python 3.4 and 3.6 pass as we expect. But Python3.5 gives an error stating that there is no assert_called_once method on the mock object, which is true since that method was not added until version 3.6. This is arguably what Python3.4 should have done.

It remains to check that the updated test fails in Python3.6, so we comment out the call to MyClass().my_method:

$ python3.6 mock_example.py 
Traceback (most recent call last):
  File "mock_example.py", line 12, in <module>
    mock_my_method.assert_called_once()
  File "/usr/lib/python3.6/unittest/mock.py", line 795, in assert_called_once
    raise AssertionError(msg)
AssertionError: Expected 'my_method' to have been called once. Called 0 times.

This is the test I should have performed with my original test. Had I done this I would have seen that the test passed in Python3.4 regardless of whether the method in question was actually called or not.

So now my test works in python3.6 and fails in python3.5 because I'm using the method assert_called_once, which was introduced in python3.6. Unfortunately it incorrectly passes in python3.4. So if I want my code to work properly for python versions earlier than 3.6, I can essentially implement assert_called_once() with assert len(mock_my_method.mock_calls) == 1. If we do this then my test passes in all three versions of python and fails in all three if we comment out the call MyClass().my_method().
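
As a sketch of that last suggestion, the version-independent check could look like the following. It uses mock.patch.object rather than the '__main__.MyClass.my_method' string target purely so the snippet does not depend on which module it runs in, and called_exactly_once is a hypothetical helper name, not part of unittest.mock.

```python
import unittest.mock as mock


class MyClass(object):
    def my_method(self):
        pass


def called_exactly_once(mock_obj):
    # Similar in spirit to assert_called_once(), but relies only on
    # mock_calls, which exists in Python 3.4, 3.5 and 3.6.
    return len(mock_obj.mock_calls) == 1


# The method is called once: the check succeeds.
with mock.patch.object(MyClass, 'my_method') as mock_my_method:
    MyClass().my_method()
    result_when_called = called_exactly_once(mock_my_method)

# The call is missing: the check fails, as a useful test should.
with mock.patch.object(MyClass, 'my_method') as mock_my_method:
    result_when_not_called = called_exactly_once(mock_my_method)

assert result_when_called
assert not result_when_not_called
```

Running both cases side by side is the point: the check is only trustworthy once it has been seen to fail when the call is removed.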

Conclusions

I made an error in writing my original test, but my real sin was that I was a bit lazy in that I did not make sure that my tests would fail when the code was incorrect. In this instance there was no problem with the code, only the test, but that was luck. So for me, this served as a reminder to check that your tests can fail. It may be that mutation testing would have caught this error.

February 25, 2017 12:36 PM


Django Weekly

Django Weekly 27 - Advanced Django querying, Django Signals, Elasticbeanstalk and more

Worthy Read

Application of Django's Case, When queryset operators for sorting events by date.
orm, Query

We help companies like Airbnb, Pfizer, and Artsy find great developers. Let us find your next great hire. Get started today.
sponsor

Django Signals are extremely useful for decoupling modules. They allow a low-level Django app to send events for other apps to handle without creating a direct dependency. Signals are easy to set up, but harder to test. So in this article, I’m going to walk you through implementing a context manager for testing Django signals, step by step.
signals

Here are the release notes - https://docs.djangoproject.com/en/dev/releases/1.11/
release

Kenneth Love and Trey Hunner will answer your questions about how we use Django's forms.
forms

deployment

What to test and where to test is a common question I get asked. In this video learn about different types of tests you can write in the context of Django.
video

In case you missed the news, DjangoCon US 2017 will take place in beautiful Spokane, Washington, from August 13-18, 2017.
djangocon

In this tutorial I will cover a few strategies to create Django user sign up/registration. Usually I implement it from scratch. You will see it’s very straightforward.
signup

Run your django CMS project as a single-page application (SPA) with vue.js and vue-router.
vuejs

elastic beanstalk


Projects

drf-writable-nested - 11 Stars, 0 Fork
Writable nested model serializer for Django REST Framework.

Django reusable app that uses Celery Inspect command to monitor workers/tasks via the Django REST Framework

djeasy - 5 Stars, 3 Fork
Django simple quick setup

February 25, 2017 10:47 AM


Daniel Bader

Installing Python and Pip on Windows

Installing Python and Pip on Windows

In this tutorial you’ll learn how to set up Python and the Pip package manager on Windows 10, completely from scratch.

Step 1: Download the Python Installer

The best way to install Python on Windows is by downloading the official Python installer from the Python website at python.org.

To do so, open a browser and navigate to https://python.org/. After the page has finished loading, click Downloads.

Under Downloads → Download for Windows, click the “Python 3.X.X” (or “Python 2.X.X”) button to begin downloading the installer.

Sidebar: 64-bit Python vs 32-bit Python

If you’re wondering whether you should use a 32-bit or a 64-bit version of Python then you might want to go with the 32-bit version.

It’s sometimes still problematic to find binary extensions for 64-bit Python on Windows, which means that some third-party modules might not install correctly with a 64-bit version of Python.

My thinking is that it’s best to go with the version currently recommended on python.org. If you click the Python 3 or Python 2 button under “Download for Windows” you’ll get just that.

Remember that if you get this choice wrong and you’d like to switch to another version of Python you can just uninstall Python and then re-install it by downloading another installer from python.org.

Step 2: Run the Python Installer

Once the Python installer file has finished downloading, launch it by double-clicking on it in order to begin the installation.

Be sure to select the Add Python X.Y to PATH checkbox in the setup wizard.

Click Install Now to begin the installation process. The installation should finish quickly and then Python will be ready to go on your system. We’re going to make sure everything was set up correctly in the next step.

Step 3: Verify Python Was Installed Correctly

After the Python installer finished its work Python should be installed on your system. Let’s make sure everything went correctly by testing if Python can be accessed from the Windows Command Prompt:

  1. Open the Windows Command Prompt by launching cmd.exe
  2. Type pip and hit Return
  3. You should see the help text from Python’s “pip” package manager. If you get an error message running pip go through the Python install steps again to make sure you have a working Python installation. Most issues you will encounter here will have something to do with the PATH not being set correctly. Re-installing and making sure that the “Add Python to PATH” option is enabled in the installer should resolve this.
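
If pip is not found, it can help to check what the PATH lookup actually resolves to. The sketch below does that from Python itself (for example from IDLE, or any interpreter that does start), assuming Python 3.3 or later for shutil.which; the helper name find_on_path is made up for this example.

```python
import shutil


def find_on_path(program):
    # Returns the full path of `program` if a directory on PATH contains it,
    # otherwise None.
    return shutil.which(program)


for name in ("python", "pip"):
    location = find_on_path(name)
    if location is None:
        print(name + " was not found on PATH")
    else:
        print(name + " resolves to " + location)
```

A "not found" result here points at the PATH option in the installer rather than at a broken Python installation.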

What Now?

Assuming everything went well and you saw the output from Pip in your command prompt window—Congratulations, you just installed Python on your system!

Wondering where to go from here? Click here to get some pointers for Python beginners.

February 25, 2017 12:00 AM

February 24, 2017


DataCamp

Matplotlib Cheat Sheet: Plotting in Python

Data visualization and storytelling with your data are essential skills that every data scientist needs to communicate insights gained from analyses effectively to any audience out there. 

For most beginners, the first package that they use to get in touch with data visualization and storytelling is, naturally, Matplotlib: it is a Python 2D plotting library that enables users to make publication-quality figures. But, what might be even more convincing is the fact that other packages, such as Pandas, intend to build more plotting integration with Matplotlib as time goes on.

However, what might slow down beginners is the fact that this package is pretty extensive. There is so much that you can do with it and it might be hard to still keep a structure when you're learning how to work with Matplotlib.   

DataCamp has created a Matplotlib cheat sheet for those who might already know how to use the package to their advantage to make beautiful plots in Python, but who still want to keep a one-page reference handy. Of course, for those who don't know how to work with Matplotlib, this might be the extra push needed to be convinced and to finally get started with data visualization in Python.

(By the way, if you want to get started with this Python package, you might want to consider our Matplotlib tutorial.)

You'll see that this cheat sheet presents you with the six basic steps that you can go through to make beautiful plots. 

Check out the infographic by clicking on the button below:

Python Matplotlib cheat sheet

With this handy reference, you'll familiarize yourself in no time with the basics of Matplotlib: you'll learn how you can prepare your data, create a new plot, use some basic plotting routines to your advantage, add customizations to your plots, and save, show and close the plots that you make.

What might have looked difficult before will definitely be more clear once you start using this cheat sheet! 

Also, don't miss out on our other cheat sheets for data science that cover SciPy, NumPy, Scikit-Learn, Bokeh, Pandas and the Python basics.

February 24, 2017 09:11 PM


Weekly Python StackOverflow Report

(lxii) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2017-02-24 19:53:04 GMT


  1. Why is x**4.0 faster than x**4 in Python 3? - [119/2]
  2. How to limit the size of a comprehension? - [20/4]
  3. How to properly split this list of strings? - [13/5]
  4. Why is the __dict__ of instances so small in Python 3? - [11/1]
  5. Python Asynchronous Comprehensions - how do they work? - [10/2]
  6. Immutability in Python - [10/2]
  7. Order-invariant hash in Python - [9/2]
  8. Assign a number to each unique value in a list - [8/7]
  9. How to add a shared x-label and y-label to a plot created with pandas' plot? - [8/3]
  10. What happens when I inherit from an instance instead of a class in Python? - [8/2]

February 24, 2017 07:53 PM


Will Kahn-Greene

Who uses my stuff?

Summary

I work on a lot of different things. Some are applications, some are libraries, some I started, some other people started, etc. I have way more stuff to do than I could possibly get done, so I try to spend my time on things "that matter".

For Open Source software that doesn't have an established community, this is difficult.

This post is a wandering stream of consciousness covering my journey figuring out who uses Bleach.

Read more… (4 mins to read)

February 24, 2017 04:00 PM


Rene Dudfield

setup.cfg - a solution to python config file soup? A howto guide.

Sick of config file soup cluttering up your repo? Me too. However there is a way to at least clean it up for many python tools.


Some of the tools you might use and the config files they support...
  • flake8 - .flake8, setup.cfg, tox.ini, and config/flake8 on Windows
  • pytest - pytest.ini, tox.ini, setup.cfg
  • coverage.py - .coveragerc, setup.cfg, tox.ini
  • mypy - setup.cfg, mypy.ini
  • tox - tox.ini
 Can mypy use setup.cfg as well?
OK, you've convinced me. -- Guido

With that mypy now also supports setup.cfg, and we can all remove many more config files.

The rules for precedence are easy:
  1. read --config-file option - if it's incorrect, exit
  2. read [tool].ini - if correct, stop
  3. read setup.cfg
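
The precedence rules above can be sketched as a small lookup function. This is only an illustration of the ordering, not how any of these tools actually implement it: pick_config_file and its arguments are hypothetical, and both "incorrect" and "correct" are simplified here to whether the file exists.

```python
import os


def pick_config_file(tool, explicit=None, directory="."):
    # 1. An explicit --config-file wins; a bad one is a hard error.
    if explicit is not None:
        if not os.path.isfile(explicit):
            raise SystemExit("config file not found: " + explicit)
        return explicit
    # 2. The tool's own ini file, e.g. mypy.ini, if present.
    tool_ini = os.path.join(directory, tool + ".ini")
    if os.path.isfile(tool_ini):
        return tool_ini
    # 3. Fall back to the shared setup.cfg, if present.
    setup_cfg = os.path.join(directory, "setup.cfg")
    if os.path.isfile(setup_cfg):
        return setup_cfg
    return None


print(pick_config_file("mypy"))
```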

 

How to config with setup.cfg?

Here's a list to the configuration documentation for setup.cfg.

What does a setup.cfg look like now?

Here's an example setup.cfg for you with various tools configured. (note these are nonsensical example configs, not what I suggest you use!)

## http://coverage.readthedocs.io/en/latest/config.html
#[coverage:run]
#timid = True

## http://pytest.org/latest/customize.html#adding-default-options
# [tool:pytest]
# addopts=-v --cov pygameweb pygameweb/ tests/

## http://mypy.readthedocs.io/en/latest/config_file.html
#[mypy]
#python_version = 2.7

#[flake8]
#max-line-length = 120
#max-complexity = 10
#exclude = build,dist,docs/conf.py,somepackage/migrations,*.egg-info

## Run with: pylint --rcfile=setup.cfg somepackage
#[pylint]
#disable = C0103,C0111
#ignore = migrations
#ignore-docstrings = yes
#output-format = colorized



February 24, 2017 02:01 PM


Bhishan Bhandari

Implementing Stack using List in Python – Python Programming Essentials

Intro A stack is a collection of objects inserted and removed in a last-in first-out (LIFO) fashion. Objects can be inserted onto the stack at any time, but only the object inserted last can be accessed or removed, which makes that object the top of the stack. Realization of Stack Operations using List   Methods Realization […]
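
A minimal sketch of the idea, assuming the usual mapping of stack operations onto list methods (append for push, pop for pop, index -1 for the top); the class name Stack and its method names are illustrative and not taken from the full post.

```python
class Stack(object):
    def __init__(self):
        self._items = []

    def push(self, item):
        # The end of the list is the top of the stack.
        self._items.append(item)

    def pop(self):
        # Removes and returns the most recently pushed item.
        return self._items.pop()

    def top(self):
        # Looks at the most recently pushed item without removing it.
        return self._items[-1]

    def is_empty(self):
        return len(self._items) == 0


s = Stack()
s.push(1)
s.push(2)
s.push(3)
last = s.pop()
print(last, s.top(), s.is_empty())
```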

February 24, 2017 05:52 AM


Vasudev Ram

Perl-like "unless" (reverse if) feature in Python

By Vasudev Ram



Flowchart image attribution

I was mentally reviewing some topics to discuss for a Python training program I was running. Among the topics were statements, including the if statement. I recollected that some languages I knew of, such as Perl, have an unless statement, which is like a reverse if statement, in that only the first nested suite (of statements) is executed if the Boolean condition is false, whereas only the second nested suite is executed if the condition is true. See the examples below.

The term "suite" used above, follows the terminology used in Python documentation such as the Python Language Reference; see this if statement definition, for example.

That is, for the if statement:
if condition:
    suite1 # nested suite 1
else:
    suite2 # nested suite 2
results in suite1 being run if condition is true, and suite2 being run if condition is false, whereas, for the unless statement:
unless condition:
    suite1
else:
    suite2
, the reverse holds true.

Of course, there is no unless statement in Python. So I got the idea of simulating it, at least partially, with a function, just for fun and as an experiment. Here is the first version, in file unless.py:
# unless.py v1

# A simple program to partially simulate the unless statement
# (a sort of reverse if statement) available in languages like Perl.
# The simulation is done by a function, not by a statement.

# Author: Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Product store: https://gumroad.com

# Define the unless function.
def unless(cond, func_if_false, func_if_true):
    if not cond:
        func_if_false()
    else:
        func_if_true()

# Test it.
def foo(): print("this is foo")
def bar(): print("this is bar")

a = 1
# Read the call below as:
# Unless a % 2 == 0, call foo, else call bar
unless(a % 2 == 0, foo, bar)
# Results in foo() being called.

a = 2
# Read the call below as:
# Unless a % 2 == 0, call foo, else call bar
unless(a % 2 == 0, foo, bar)
# Results in bar() being called.
Here is the output:
$ python unless.py
this is foo
this is bar
This simulation of unless works because functions are objects in Python (since almost everything is an object in Python, like almost everything in Unix is a file), so functions can be passed to other functions as arguments (by passing just their names, without following the names with parentheses).

Then, inside the unless function, when you apply the parentheses to those two function names, they get called.

This approach to simulation of the unless statement has some limitations, of course. One is that you cannot pass arguments to the functions [1]. (You could still make them do different things on different calls by using global variables (not good), reading from files, or reading from a database, so that their inputs could vary on each call).

[1] You can actually pass arguments to the functions in a few ways, such as using the *args and **kwargs features of Python, as additional arguments to unless() and then forwarding those arguments to the func_if_false() and func_if_true() calls inside unless().
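
Here is a sketch of what footnote [1] describes, with the extra arguments forwarded via *args and **kwargs. The name unless_with_args and the two helper functions are made up for this example, and the functions return strings instead of printing so that the snippet behaves the same on Python 2 and Python 3.

```python
def unless_with_args(cond, func_if_false, func_if_true, *args, **kwargs):
    # Forward any extra positional and keyword arguments to whichever
    # function gets called.
    if not cond:
        return func_if_false(*args, **kwargs)
    else:
        return func_if_true(*args, **kwargs)


def describe_odd(n, suffix=""):
    return "%d is odd%s" % (n, suffix)


def describe_even(n, suffix=""):
    return "%d is even%s" % (n, suffix)


# Unless a % 2 == 0, call describe_odd, else call describe_even.
a = 1
result_odd = unless_with_args(a % 2 == 0, describe_odd, describe_even, a, suffix="!")

a = 2
result_even = unless_with_args(a % 2 == 0, describe_odd, describe_even, a, suffix="!")

print(result_odd)
print(result_even)
```

Both functions receive the same extra arguments in this sketch; giving each its own arguments would need a different calling convention.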

Another limitation is that this simulation does not support the elif clause.

However, none of the above limitations matter, of course, because you can also get the effect of the unless statement (i.e. a reverse if) by just negating the Boolean condition (with the not operator) of an if statement. As I said, I just tried this for fun.

The image at the top of the post is of a flowchart.

For something on similar lines (i.e. simulating a language feature with some other code), but for the C switch statement simulated (partially) in Python, see this post I wrote a few months ago:

Simulating the C switch statement in Python

And speaking of Python language features, etc., here is a podcast interview with Guido van Rossum (creator of the Python language), about the past, present and future of Python.


Enjoy.

- Vasudev Ram - Online Python training and consulting




February 24, 2017 04:33 AM

February 23, 2017


Catalin George Festila

The bad and good urllib.

This is a simple python script:

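import urllib  # Python 2 API; FancyURLopener is deprecated in Python 3

opener = urllib.FancyURLopener({})  # empty dict: use no proxies
f = opener.open("http://www.ra___aer.ro/")
d = f.read()
f.close()
# write the downloaded page to a local file
with open('workfile.txt', 'w') as fo:
    fo.write(d)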
The really bad news comes from here:
http://blog.blindspotsecurity.com/2017/02/advisory-javapython-ftp-injections.html

February 23, 2017 03:11 PM