
Planet Python

Last update: April 20, 2019 07:47 AM UTC

April 19, 2019


Doug Hellmann

imapautofiler 1.8.0

imapautofiler applies user-defined rules to automatically organize messages on an IMAP server. What’s new in 1.8.0?

  • use yaml safe loader
  • drop python 3.5 and add 3.7 support
  • perform substring matches without regard to case

April 19, 2019 06:47 PM UTC


Python Bytes

#126 WebAssembly comes to Python

April 19, 2019 08:00 AM UTC


Low Kian Seong

The Human in Devops

What was significant this week?

This week a mild epiphany came to me right after a somewhat heated and tense meeting with a team of developers and the project owner of a web project. They were angry and they were not afraid to show it. They were miffed that the head had written them an email pretty much forcing them to participate to make our DevOps initiative a success. All kinds of expletives were running through my head to describe this team of flabby, tired-looking individuals in front of me, which belied the cool demeanour and composure that I was trying so hard to maintain.

It happened. In the spur of the moment I too got engulfed in a sea of negativity and for a few minutes lost sight of the most important component, or pillar, of a successful DevOps initiative. The people.

"What a bunch of mule heads !" I thought. It's as plain as day, once this initiative is a success everybody can go home earlier and everything will be more predictable and we can do much much more than we could before. "Why are you fighting this ?!" I was ready to throw my hands up in defeat when it finally dawned on me.

"Codes that power DevOps projects don't write themselves. People write those code" 
"Without people powering our initiative now, we are just a few guys with a bunch of code and tools that are irrelevant"

Boom! These thoughts hit me like lightning, and in that moment I felt an equal measure of wisdom brought by this realisation and disgust at my own stupidity for forgetting one of the main tenets of, and requirements for, a successful DevOps project.

It was then that I realised two very important mistakes I had made so far:


  1. I was reaching out horizontally to push our agenda across. Developers loved what we proposed, and that was pretty much it. It's cool and it's cutting edge, but it stopped there. "Hey, thanks for sharing that cool tool! I will try it in my project when I get the chance!" is pretty much the most you can expect from such an exchange. To gain any traction, you have to sell your proposed solution or improvement to the stakeholders or the decision makers. Efforts that require people to do the right thing, or to go out of their way to do some unplanned kindness, usually result in zilch.
  2. I did not try to see the tool I was proposing through the eyes of the beholders. It was too much of a leap. Much as Abraham teaches that it's impossible to frog-leap from sadness to happiness, so it was with the developers. They knew it was good for them, they could see it was good for them, they felt it had the potential to improve their lives, but alas, they did not internalise it. The proverbial light bulb did not turn on inside them; more correctly, I did not do enough to turn that light on. I could see some people opening up, but when this realisation hit me, I just ended the meeting. I had not done enough to understand where the people I hoped would implement DevOps actually were. I had to do that first.

Do I miss coding? Do I miss hunkering down and prototyping my way to showcase a tool or to get something to work? Of course! Who wouldn't? But the main thing I keep going back to is: what is the main goal and expectation of the people who hired me to lead their DevOps push? Is it to wire together some tools and configure something so they can use it? At a small enough scale that is probably value enough, but when you lead horses to water, you need to give them a reason to drink; just because you are drinking, you can't expect them to follow suit.

I am going to reach out more, I am going to understand more and I am going to engage more. All the people pieces need to be in place before the other pieces start falling into place automatically. Stay tuned if this is interesting ...


April 19, 2019 07:38 AM UTC


Codementor

Why Is Django a Popular Python Framework Among Web Developers?

Web development with Python's Django framework offers many advantages: small projects can be delivered quickly, with better security, less effort and less money invested in the project.

April 19, 2019 06:27 AM UTC


Vasudev Ram

Python's dynamic nature: sticking an attribute onto an object


- By Vasudev Ram - Online Python training / SQL training / Linux training



Hi, readers,

[This is a beginner-level Python post.]

Python, being a dynamic language, has some interesting features that some static languages may not have (and vice versa too, of course).

One such feature, which I noticed a while ago, is that you can add an attribute to a Python object even after it has been created. (Conditions apply.)

I had used this feature some time ago to work around some implementation issue in a rudimentary RESTful server that I created as a small teaching project. It was based on the BaseHTTPServer module.

Here is a (different) simple example program, stick_attrs_onto_obj.py, that demonstrates this Python feature.
My informal term for this feature is "sticking an attribute onto an object" after the object is created.

Since the program is simple, and there are enough comments in the code, I will not explain it in detail.
# stick_attrs_onto_obj.py

# A program to show:
# 1) that you can "stick" attributes onto a Python object after it is created, and
# 2) one use of this technique: to count the number of calls to a function.

# Copyright 2019 Vasudev Ram
# Web site: https://vasudevram.github.io
# Blog: https://jugad2.blogspot.com
# Training: https://jugad2.blogspot.com/p/training.html
# Product store: https://gumroad.com/vasudevram
# Twitter: https://twitter.com/vasudevram

from __future__ import print_function

# Define a function.
def foo(arg):
    # Print something to show that the function has been called.
    print("in foo: arg = {}".format(arg))
    # Increment the "stuck-on" int attribute inside the function.
    foo.call_count += 1

# A function is also an object in Python.
# So we can add attributes to it, including after it is defined.
# I call this "sticking" an attribute onto the function object.
# The statement below defines the attribute with an initial value,
# which is changeable later, as we will see.
foo.call_count = 0

# Print its initial value before any calls to the function.
print("foo.call_count = {}".format(foo.call_count))

# Call the function a few times.
for i in range(5):
    foo(i)

# Print the attribute's value after those calls.
print("foo.call_count = {}".format(foo.call_count))

# Call the function a few more times.
for i in range(3):
    foo(i)

# Print the attribute's value after those additional calls.
print("foo.call_count = {}".format(foo.call_count))

And here is the output of the program:
$ python stick_attrs_onto_obj.py
foo.call_count = 0
in foo: arg = 0
in foo: arg = 1
in foo: arg = 2
in foo: arg = 3
in foo: arg = 4
foo.call_count = 5
in foo: arg = 0
in foo: arg = 1
in foo: arg = 2
foo.call_count = 8

There may be other ways to get the call count of a function, such as using a profiler, a closure or a decorator, but this way is really simple. And as you can see from the code, you can also use it to find the number of calls to the function between any two points in the program. For that, just store the call count in a variable at the first point and subtract that value from the call count at the second point. In the above program, that would be 8 - 5 = 3, which matches the number of calls to foo made by the second for loop.
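
For comparison, here is a sketch of the decorator alternative mentioned above; count_calls is a hypothetical helper, not part of the original post:

import functools

def count_calls(func):
    # Wrap func and stick a call_count attribute onto the wrapper,
    # using the same technique as above.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.call_count += 1
        return func(*args, **kwargs)
    wrapper.call_count = 0
    return wrapper

@count_calls
def bar(arg):
    print("in bar: arg = {}".format(arg))

for i in range(5):
    bar(i)
print(bar.call_count)  # prints 5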

Enjoy.

- Vasudev Ram - Online Python training and consulting

I conduct online courses on Python programming, Unix / Linux commands and shell scripting and SQL programming and database design, with course material and personal coaching sessions.

The course details and testimonials are here.

Contact me for details of course content, terms and schedule.


Learning Linux? Hit the ground running with my vi quickstart tutorial. I wrote it at the request of two Windows system administrator friends who were given additional charge of some Unix systems. They later told me that it helped them to quickly start using vi to edit text files on Unix. Of course, vi/vim is one of the most ubiquitous text editors around, and works on most other common operating systems and on some uncommon ones too, so the knowledge of how to use it will carry over to those systems too.


Posts about: Python * DLang * xtopdf

My ActiveState Code recipes



April 19, 2019 01:48 AM UTC

April 18, 2019


PyCharm

PyCharm at PyCon 2019: The Big Tent

Last week we announced our “big tent” at PyCon 2019 with the blog post PyCharm Hosts Python Content Creators at Expanded PyCon Booth. Next week we’ll announce more on each individual piece.

Today, let’s do an overview of the kinds of activities in “the big tent.”

Workshops

Miguel Grinberg, one of the boothmates, is doing his First Steps in Web Development With Python tutorial Thursday morning, 9AM to 12:20. He’s fantastic at this and a real icon of PyCon tutorials over the years.

Thursday afternoon at 3:30 I’m doing 42 PyCharm Tips and Tricks in Room 13. It’s a hands-on workshop with a secret twist, which I’ll reveal at the event (and after). We’ll have some of the PyCharm team with me to help folks in the audience with questions.

Reception

PyCon’s opening reception starts at 5:30PM on the show floor. It’s got food, it’s got drinks, it’s got…our packed booth with lots of stuff going on. Come meet ten of us from the PyCharm team, along with the Content Creators: Michael Kennedy, Brian Okken, Dan Bader, Miguel Grinberg, Matt Harrison, Anthony Shaw, Luciano Ramalho, Bob Belderbos, Julian Sequeira, and Chris Medina. Perhaps even a FLUFL sighting.


PyCharm Stand

Come meet the PyCharm team! We’ll have ten of us, most from the core team. We go to events not to do sales but to listen. (Some might say, face the consequences of our decisions.) Want to talk to the main developer of our debugger? She’s there. Ditto for the new Jupyter support, vim emulation, etc.

Or if you just want to say hi, then please come by, take a picture and tweet it, and get a retweet from us.

Content Creators Stands

Podcasts, articles, video courses and training, books…as the previous article mentioned, we have a home for many of the key Python “content creators” to share a presence, use the mini-theater and one-on-one space, and just hang out and have fun.

There are two stands for them to share in timeslots throughout the conference. We’ll make the schedule available closer to PyCon. But they’ll all be around for the reception.

Mini-Theater

This is the second big addition this year: booth space for small talks, both scheduled and impromptu, by the PyCharm team, the Content Creators, and even by some others. We’ll announce this in detail later.

Not just talks…we’ll announce some special events as well.

One-on-Ones

“Can you take a look at my project?” We get this a lot at conferences, as well as “I’m really interested in the new Jupyter support”, or “I heard your pytest support is really neat, can you show me?”

The PyCharm booth will have a dedicated area, along with the conference miracle of seating, where we can work one-on-one. Bring your laptop “into the shop” for diagnosis. Show us some big idea you’ve been working on. Get a tour of some PyCharm feature that interests you, from the person who implemented it.

This applies to the Content Creators as well. Saw an article or listened to a podcast and want more? Pick a time to meet up with them in the one-on-one area. Did I mention seating?

Videography

We’ll have a crew hanging around the booth at different times, doing interviews and producing clips. If you’re around and want to give a shoutout to PyCon for the hard (volunteer!) work putting on a great show, let’s get you on camera.

April 18, 2019 08:08 PM UTC


Mike Driscoll

Mozilla Announces Pyodide – Python in the Browser

Mozilla announced a new project called Pyodide earlier this week. The aim of Pyodide is to bring Python’s scientific stack into the browser.

The Pyodide project gives you a full, standard Python interpreter that runs in your browser and also gives you access to the browser’s Web APIs. Currently, Pyodide does not support threading or networking sockets. Python is also quite a bit slower to run in the browser, although it is usable for interactive exploration.

The article mentions other projects, such as Brython and Skulpt. These projects are rewrites of Python’s interpreter in JavaScript. Their disadvantage compared to Pyodide is that they cannot use Python extensions written in C, such as NumPy or Pandas. Pyodide overcomes this issue.

Anyway, this sounds like a really interesting project. I always thought the demos I used to see of Python running in Silverlight in the browser were cool. That project is basically dead at this point, but Pyodide sounds like a really interesting new hack at getting Python into the browser. Hopefully it will go somewhere.

April 18, 2019 08:02 PM UTC

Creating a GUI Application for NASA’s API with wxPython

Growing up, I have always found the universe and space in general to be exciting. It is fun to dream about what worlds remain unexplored. I also enjoy seeing photos from other worlds or thinking about the vastness of space. What does this have to do with Python though? Well, the National Aeronautics and Space Administration (NASA) has a web API that allows you to search their image library.

You can read all about it on their website.

The NASA website recommends getting an Application Programming Interface (API) key. If you go to that website, the form that you will fill out is nice and short.

Technically, you do not need an API key to make requests against NASA’s services. However, they do have rate limiting in place for developers who access their site without an API key. Even with a key, you are limited to a default of 1000 requests per hour. If you go over your allocation, you will be temporarily blocked from making requests. You can contact NASA to request a higher rate limit, though.
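
If you do use a key with the api.nasa.gov endpoints, you pass it as a query parameter. Here is a minimal sketch; DEMO_KEY is NASA's public, heavily rate-limited demo key, so substitute your own:

import requests

# Query the APOD endpoint, passing the API key as a query parameter.
r = requests.get('https://api.nasa.gov/planetary/apod',
                 params={'api_key': 'DEMO_KEY'})
print(r.json()['title'])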

Interestingly, the documentation doesn’t really say how many requests you can make without an API key.

The API documentation disagrees with NASA’s Image API documentation about which endpoints to hit, which makes working with their website a bit confusing.

For example, you will see the API documentation talking about this URL:

  • https://api.nasa.gov/planetary/apod?api_key=API_KEY_GOES_HERE

But in the Image API documentation, the API root is:

  • https://images-api.nasa.gov

For the purposes of this tutorial, you will be using the latter.

This article is adapted from my book:

Creating GUI Applications with wxPython

Purchase now on Leanpub


Using NASA’s API

When you start out using an unfamiliar API, it is always best to begin by reading the documentation for that interface. Another approach is to do a quick Internet search and see if there is a Python package that wraps your target API. Unfortunately, there do not seem to be any maintained NASA libraries for Python. When this happens, you get to create your own.

To get started, try reading the NASA Images API document.

Their API documentation isn’t very long, so it shouldn’t take much time to read, or at least skim, it.

The next step is to take that information and try playing around with their API.

Here are the first few lines of an experiment at accessing their API:

# simple_api_request.py
 
import requests
 
from urllib.parse import urlencode, quote_plus
 
 
base_url = 'https://images-api.nasa.gov/search'
search_term = 'apollo 11'
desc = 'moon landing'
media = 'image'
query = {'q': search_term, 'description': desc, 'media_type': media}
full_url = base_url + '?' + urlencode(query, quote_via=quote_plus)
 
r = requests.get(full_url)
data = r.json()

If you run this in a debugger, you can print out the JSON that is returned.
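
If you prefer not to use a debugger, you can pretty-print the response directly with the standard library's pprint module:

from pprint import pprint
pprint(data)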

Here is a snippet of what was returned:

'items': [{'data': 
              [{'center': 'HQ',
                 'date_created': '2009-07-18T00:00:00Z',
                 'description': 'On the eve of the '
                                'fortieth anniversary of '
                                "Apollo 11's first human "
                                'landing on the Moon, '
                                'Apollo 11 crew member, '
                                'Buzz Aldrin speaks during '
                                'a lecture in honor of '
                                'Apollo 11 at the National '
                                'Air and Space Museum in '
                                'Washington, Sunday, July '
                                '19, 2009. Guest speakers '
                                'included Former NASA '
                                'Astronaut and U.S. '
                                'Senator John Glenn, NASA '
                                'Mission Control creator '
                                'and former NASA Johnson '
                                'Space Center director '
                                'Chris Kraft and the crew '
                                'of Apollo 11.  Photo '
                                'Credit: (NASA/Bill '
                                'Ingalls)',
                 'keywords': ['Apollo 11',
                              'Apollo 40th Anniversary',
                              'Buzz Aldrin',
                              'National Air and Space '
                              'Museum (NASM)',
                              'Washington, DC'],
                 'location': 'National Air and Space '
                             'Museum',
                 'media_type': 'image',
                 'nasa_id': '200907190008HQ',
                 'photographer': 'NASA/Bill Ingalls',
                 'title': 'Glenn Lecture With Crew of '
                          'Apollo 11'}],
       'href': 'https://images-assets.nasa.gov/image/200907190008HQ/collection.json',
       'links': [{'href': 'https://images-assets.nasa.gov/image/200907190008HQ/200907190008HQ~thumb.jpg',
                  'rel': 'preview',
                  'render': 'image'}]}

Now that you know what the format of the JSON is, you can try parsing it a bit.

Let’s add the following lines of code to your Python script:

item = data['collection']['items'][0]
nasa_id = item['data'][0]['nasa_id']
asset_url = 'https://images-api.nasa.gov/asset/' + nasa_id
image_request = requests.get(asset_url)
image_json = image_request.json()
image_urls = [url['href'] for url in image_json['collection']['items']]
print(image_urls)

This will extract the first item in the list of items from the JSON response. Then you can extract the nasa_id, which is required to get all the images associated with this particular result. Now you can add that nasa_id to a new URL endpoint and make a new request.

The request for the image JSON returns this:

{'collection': {'href': 'https://images-api.nasa.gov/asset/200907190008HQ',
                'items': [{'href': 'http://images-assets.nasa.gov/image/200907190008HQ/200907190008HQ~orig.tif'},
                          {'href': 'http://images-assets.nasa.gov/image/200907190008HQ/200907190008HQ~large.jpg'},
                          {'href': 'http://images-assets.nasa.gov/image/200907190008HQ/200907190008HQ~medium.jpg'},
                          {'href': 'http://images-assets.nasa.gov/image/200907190008HQ/200907190008HQ~small.jpg'},
                          {'href': 'http://images-assets.nasa.gov/image/200907190008HQ/200907190008HQ~thumb.jpg'},
                          {'href': 'http://images-assets.nasa.gov/image/200907190008HQ/metadata.json'}],
                'version': '1.0'}}

The last two lines in your Python code will extract the URLs from the JSON. Now you have all the pieces you need to write a basic user interface!


Designing the User Interface

There are many different ways you could design your image downloading application. You will be doing what is simplest as that is almost always the quickest way to create a prototype. The nice thing about prototyping is that you end up with all the pieces you will need to create a useful application. Then you can take your knowledge and either enhance the prototype or create something new with the knowledge you have gained.

Here’s a mockup of what you will be attempting to create:

NASA Image Search Mockup

As you can see, you will want an application with the following features:

  • A search bar
  • A widget to hold the search results
  • A way to display an image when a result is chosen
  • The ability to download the image

Let’s learn how to create this user interface now!


Creating the NASA Search Application

Rapid prototyping is an approach in which you create a small, runnable application as quickly as you can. Rather than spending a lot of time getting all the widgets laid out, let’s add them from top to bottom in the application. This will give you something to work with more quickly than creating a series of nested sizers will.

Let’s start by creating a script called nasa_search_ui.py:

# nasa_search_ui.py
 
import os
import requests
import wx
 
from download_dialog import DownloadDialog
from ObjectListView import ObjectListView, ColumnDefn
from urllib.parse import urlencode, quote_plus

# Search endpoint for NASA's image API (used by on_search() below)
base_url = 'https://images-api.nasa.gov/search'

Here you import a few new items that you haven’t seen yet. The first is the requests package. This is a handy package for downloading files and doing things on the Internet with Python. Many developers feel that it is better than Python’s own urllib. You will need to install it to use it, though. You will also need to install ObjectListView.

Here is how you can do that with pip:

pip install requests ObjectListView

The other new pieces are the imports from urllib.parse. You will be using this module for encoding URL parameters. Lastly, DownloadDialog is a class for a small dialog that you will be creating for downloading NASA images.

Since you will be using ObjectListView in this application, you will need a class to represent the objects in that widget:

class Result:
 
    def __init__(self, item):
        data = item['data'][0]
        self.title = data['title']
        self.location = data.get('location', '')
        self.nasa_id = data['nasa_id']
        self.description = data['description']
        self.photographer = data.get('photographer', '')
        self.date_created = data['date_created']
        self.item = item
 
        if item.get('links'):
            try:
                self.thumbnail = item['links'][0]['href']
            except:
                self.thumbnail = ''

The Result class is what you will use to hold the data that makes up each row in your ObjectListView. The item parameter is a portion of the JSON that you receive from NASA as a response to your query. In this class, you will need to parse out the information you require.

In this case, you want the following fields:

  • Title
  • Location of image
  • NASA’s internal ID
  • Description of the photo
  • The photographer’s name
  • The date the image was created
  • The thumbnail URL

Some of these items aren’t always included in the JSON response, so you will use the dictionary’s get() method to return an empty string in those cases.

Now let’s start working on the UI:

class MainPanel(wx.Panel):
 
    def __init__(self, parent):
        super().__init__(parent)
        self.search_results = []
        self.max_size = 300
        self.paths = wx.StandardPaths.Get()
        font = wx.Font(12, wx.SWISS, wx.NORMAL, wx.NORMAL)
 
        main_sizer = wx.BoxSizer(wx.VERTICAL)

The MainPanel is where the bulk of your code will be. Here you do some housekeeping: you create a search_results list to hold Result objects when the user does a search, set the max_size of the thumbnail image and the font to be used, create the sizer, and get some StandardPaths as well.

Now let’s add the following code to the __init__():

txt = 'Search for images on NASA'
label = wx.StaticText(self, label=txt)
main_sizer.Add(label, 0, wx.ALL, 5)
self.search = wx.SearchCtrl(
    self, style=wx.TE_PROCESS_ENTER, size=(-1, 25))
self.search.Bind(wx.EVT_SEARCHCTRL_SEARCH_BTN, self.on_search)
self.search.Bind(wx.EVT_TEXT_ENTER, self.on_search)
main_sizer.Add(self.search, 0, wx.EXPAND)

Here you create a header label for the application using wx.StaticText. Then you add a wx.SearchCtrl, which is very similar to a wx.TextCtrl except that it has special buttons built into it. You also bind the search button’s click event (EVT_SEARCHCTRL_SEARCH_BTN) and EVT_TEXT_ENTER to a search related event handler (on_search).

The next few lines add the search results widget:

self.search_results_olv = ObjectListView(
    self, style=wx.LC_REPORT | wx.SUNKEN_BORDER)
self.search_results_olv.SetEmptyListMsg("No Results Found")
self.search_results_olv.Bind(wx.EVT_LIST_ITEM_SELECTED,
                             self.on_selection)
main_sizer.Add(self.search_results_olv, 1, wx.EXPAND)
self.update_search_results()

This code sets up the ObjectListView in much the same way as some of my other articles use it. You customize the empty message by calling SetEmptyListMsg() and you also bind the widget to EVT_LIST_ITEM_SELECTED so that you do something when the user selects a search result.

Now let’s add the rest of the code to the __init__() method:

main_sizer.AddSpacer(30)
self.title = wx.TextCtrl(self, style=wx.TE_READONLY)
self.title.SetFont(font)
main_sizer.Add(self.title, 0, wx.ALL|wx.EXPAND, 5)
img = wx.Image(240, 240)
self.image_ctrl = wx.StaticBitmap(self,
                                  bitmap=wx.Bitmap(img))
main_sizer.Add(self.image_ctrl, 0, wx.CENTER|wx.ALL, 5
               )
download_btn = wx.Button(self, label='Download Image')
download_btn.Bind(wx.EVT_BUTTON, self.on_download)
main_sizer.Add(download_btn, 0, wx.ALL|wx.CENTER, 5)
 
self.SetSizer(main_sizer)

These final few lines of code add a title text control and an image widget that will update when a result is selected. You also add a download button to allow the user to select which image size they would like to download. NASA usually gives several different versions of the image from thumbnail all the way up to the original TIFF image.

The first event handler to look at is on_download():

def on_download(self, event):
    selection = self.search_results_olv.GetSelectedObject()
    if selection:
        with DownloadDialog(selection) as dlg:
            dlg.ShowModal()

Here you call GetSelectedObject() to get the user’s selection. If the user hasn’t selected anything, then this method exits. On the other hand, if the user has selected an item, then you instantiate the DownloadDialog and show it to the user to allow them to download something.

Now let’s learn how to do a search:

def on_search(self, event):
    search_term = event.GetString()
    if search_term:
        query = {'q': search_term, 'media_type': 'image'}
        full_url = base_url + '?' + urlencode(query, quote_via=quote_plus)
        r = requests.get(full_url)
        data = r.json()
        self.search_results = []
        for item in data['collection']['items']:
            if item.get('data') and len(item.get('data')) > 0:
                data = item['data'][0]
                if data['title'].strip() == '':
                    # Skip results with blank titles
                    continue
                result = Result(item)
                self.search_results.append(result)
        self.update_search_results()

The on_search() event handler will get the string that the user has entered into the search control, or an empty string. Assuming that the user actually enters something to search for, you use NASA’s general search query, q, and hard-code the media_type to image. Then you encode the query into a properly formatted URL and use requests.get() to request a JSON response.

Next you attempt to loop over the results of the search. Note that if no data is returned, this code will fail and an exception will be thrown. But if you do get data, then you will need to parse it to get the bits and pieces you need.

You will skip items that don’t have the title field set. Otherwise you will create a Result object and add it to the search_results list. At the end of the method, you tell your UI to update the search results.

Before we get to that function, you will need to create on_selection():

def on_selection(self, event):
    selection = self.search_results_olv.GetSelectedObject()
    self.title.SetValue(f'{selection.title}')
    if selection.thumbnail:
        self.update_image(selection.thumbnail)
    else:
        img = wx.Image(240, 240)
        self.image_ctrl.SetBitmap(wx.Bitmap(img))
        self.Refresh()
        self.Layout()

Once again, you get the selected item, but this time you take that selection and update the title text control with the selection’s title text. Then you check to see if there is a thumbnail and update that accordingly if there is one. When there is no thumbnail, you set it back to an empty image as you do not want it to keep showing a previously selected image.

The next method to create is update_image():

def update_image(self, url):
    filename = url.split('/')[-1]
    tmp_location = os.path.join(self.paths.GetTempDir(), filename)
    r = requests.get(url)
    with open(tmp_location, "wb") as thumbnail:
        thumbnail.write(r.content)
 
    if os.path.exists(tmp_location):
        img = wx.Image(tmp_location, wx.BITMAP_TYPE_ANY)
        W = img.GetWidth()
        H = img.GetHeight()
        if W > H:
            NewW = self.max_size
            NewH = self.max_size * H / W
        else:
            NewH = self.max_size
            NewW = self.max_size * W / H
        img = img.Scale(NewW,NewH)
    else:
        img = wx.Image(240, 240)
 
    self.image_ctrl.SetBitmap(wx.Bitmap(img))
    self.Refresh()
    self.Layout()

The update_image() accepts a URL as its sole argument. It takes this URL and splits off the filename. Then it creates a new download location, which is the computer’s temp directory. Your code then downloads the image and checks to be sure the file saved correctly. If it did, then the thumbnail is loaded using the max_size that you set; otherwise you set it to use a blank image.

The last couple of lines call Refresh() and Layout() on the panel so that the widgets appear correctly.

Finally you need to create the last method:

def update_search_results(self):
    self.search_results_olv.SetColumns([
        ColumnDefn("Title", "left", 250, "title"),
        ColumnDefn("Description", "left", 350, "description"),
        ColumnDefn("Photographer", "left", 100, "photographer"),
        ColumnDefn("Date Created", "left", 150, "date_created")
    ])
    self.search_results_olv.SetObjects(self.search_results)

Here you define the columns for the search results widget: each ColumnDefn maps a column title and width to an attribute of your Result objects. Then you call SetObjects() to load the search results into the widget.
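
The frame code itself is not shown in this excerpt; it creates the frame, sets the title and initial size, adds the panel and shows the frame. A minimal sketch, assuming a standard wxPython structure (the class name SearchFrame is an assumption):

class SearchFrame(wx.Frame):

    def __init__(self):
        super().__init__(None, title='NASA Image Search',
                         size=(1024, 800))
        panel = MainPanel(self)
        self.Show()


if __name__ == '__main__':
    app = wx.App(False)
    frame = SearchFrame()
    app.MainLoop()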

This is what the main UI will look like:

NASA Image Search Main App

Now let’s learn what goes into making a download dialog!


The Download Dialog

The download dialog will allow the user to download one or more of the images that they have selected. There are almost always at least two versions of every image and sometimes five or six.

The first piece of code to learn about is the first few lines:

# download_dialog.py
 
import requests
import wx
 
wildcard = "All files (*.*)|*.*"

Here you once again import requests and set up a wildcard that you will use when saving the images.

Now let’s create the dialog’s __init__():

class DownloadDialog(wx.Dialog):
 
    def __init__(self, selection):
        super().__init__(None, title='Download images')
        self.paths = wx.StandardPaths.Get()
        main_sizer = wx.BoxSizer(wx.VERTICAL)
        self.list_box = wx.ListBox(self, choices=[], size=wx.DefaultSize)
        urls = self.get_image_urls(selection)
        if urls:
            choices = {url.split('/')[-1]: url for url in urls if 'jpg' in url}
            for choice in choices:
                self.list_box.Append(choice, choices[choice])
        main_sizer.Add(self.list_box, 1, wx.EXPAND|wx.ALL, 5)
 
        save_btn = wx.Button(self, label='Save')
        save_btn.Bind(wx.EVT_BUTTON, self.on_save)
        main_sizer.Add(save_btn, 0, wx.ALL|wx.CENTER, 5)
        self.SetSizer(main_sizer)

In this example, you create a new reference to StandardPaths and add a wx.ListBox. The list box will hold the variants of the photos that you can download. It will also automatically add a scrollbar should there be too many results to fit on-screen at once. You call get_image_urls() with the passed-in selection object to get a list of URLs. Then you loop over the URLs and keep the ones that have jpg in their name. This does mean that you miss out on alternate image file types, such as PNG or TIFF.

This gives you an opportunity to enhance this code and improve it. The reason that you are filtering the URLs is that the results usually have non-image URLs in the mix and you probably don’t want to show those as potentially downloadable as that would be confusing to the user.

The last widget to be added is the “Save” button. You could add a “Cancel” button as well, but the dialog has an exit button along the top that works, so it’s not required.

Now it’s time to learn what get_image_urls() does:

def get_image_urls(self, item):
    asset_url = f'https://images-api.nasa.gov/asset/{item.nasa_id}'
    image_request = requests.get(asset_url)
    image_json = image_request.json()
    try:
        image_urls = [url['href'] for url in image_json['collection']['items']]
    except:
        image_urls = []
    return image_urls
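
This method builds the asset URL from the selection’s nasa_id, requests the JSON asset listing and returns the list of href values, falling back to an empty list if the response cannot be parsed.

The on_save() handler itself is not shown in this excerpt. Here is a minimal sketch that is consistent with the description below; the exact dialog messages are assumptions:

def on_save(self, event):
    # Sketch of the Save handler described below, not the original listing
    selection = self.list_box.GetSelection()
    if selection == -1:
        with wx.MessageDialog(None,
                              message='You must select an item to save!',
                              caption='No Selection',
                              style=wx.ICON_WARNING) as dlg:
            dlg.ShowModal()
        return
    with wx.FileDialog(self, message='Save image as...',
                       wildcard=wildcard,
                       style=wx.FD_SAVE | wx.FD_OVERWRITE_PROMPT) as dlg:
        if dlg.ShowModal() == wx.ID_OK:
            self.save(dlg.GetPath())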

This event handler is activated when the user presses the “Save” button. When the user tries to save something without selecting an item in the list box, GetSelection() returns -1. Should that happen, you show them a MessageDialog suggesting that they select something. When they do select something, you show them a wx.FileDialog that allows them to choose where to save the file and what to call it.

The event handler calls the save() method, so that is your next project:

def save(self, path):
    selection = self.list_box.GetSelection()
    r = requests.get(
        self.list_box.GetClientData(selection))
    try:
        with open(path, "wb") as image:
            image.write(r.content)
 
        message = 'File saved successfully'
        with wx.MessageDialog(None, message=message,
                              caption='Save Successful',
                              style=wx.ICON_INFORMATION) as dlg:
            dlg.ShowModal()
    except:
        message = 'File failed to save!'
        with wx.MessageDialog(None, message=message,
                              caption='Save Failed',
                              style=wx.ICON_ERROR) as dlg:
            dlg.ShowModal()

Here you get the selection again and use the requests package to download the image. Note that there is no check to make sure that the user has added an extension, let alone the right extension. You can add that yourself when you get a chance.

Anyway, when the file is finished downloading, you will show the user a message letting them know.

If an exception occurs, you can show them a dialog that lets them know that too!

Here is what the download dialog looks like:

NASA Image Download Dialog

Now let’s add some new functionality!


Adding Advanced Search

There are several fields that you can use to help narrow your search. However, you don’t want to clutter your user interface with them unless the user really wants to use those filters. To allow for that, you can add an “Advanced Search” option.

Adding this option requires you to rearrange your code a bit, so let’s copy your nasa_search_ui.py file and your download_dialog.py module to a new folder called version_2.

Now rename nasa_search_ui.py to main.py to make it more obvious which script is the main entry point for your program. To make things more modular, you will extract the search results into their own class and put the advanced search in a separate class. This means that you will have three panels in the end:

  • The main panel
  • The search results panel
  • The advanced search panel

Here is what the main dialog will look like when you are finished:

NASA Image Search with Advanced Search Option

Let’s go over each of these separately.


The main.py Script

The main module is your primary entry point for your application. An entry point is the code that your user will run to launch your application. It is also the script that you would use if you were to bundle up your application into an executable.

Let’s take a look at how your main module starts out:

# main.py
 
import wx
 
from advanced_search import AdvancedSearch
from regular_search import RegularSearch
from pubsub import pub
 
 
class MainPanel(wx.Panel):
 
    def __init__(self, parent):
        super().__init__(parent)
        pub.subscribe(self.update_ui, 'update_ui')
 
        self.main_sizer = wx.BoxSizer(wx.VERTICAL)
        search_sizer = wx.BoxSizer()

This example imports both of your search-related panels:

  • AdvancedSearch
  • RegularSearch

It also uses pubsub to subscribe to an update topic.
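
If pubsub is new to you, here is a minimal, standalone sketch of the publish/subscribe pattern the application relies on (assuming the PyPubSub package, which provides the pub module):

from pubsub import pub

def listener(query):
    print('received query: {}'.format(query))

# Subscribe the listener to a topic, then publish a message to it.
# The keyword arguments of sendMessage() must match the listener's signature.
pub.subscribe(listener, 'search_results')
pub.sendMessage('search_results', query={'q': 'apollo 11'})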

Let’s find out what else is in the __init__():

txt = 'Search for images on NASA'
label = wx.StaticText(self, label=txt)
self.main_sizer.Add(label, 0, wx.ALL, 5)
self.search = wx.SearchCtrl(
    self, style=wx.TE_PROCESS_ENTER, size=(-1, 25))
self.search.Bind(wx.EVT_SEARCHCTRL_SEARCH_BTN, self.on_search)
self.search.Bind(wx.EVT_TEXT_ENTER, self.on_search)
search_sizer.Add(self.search, 1, wx.EXPAND)
 
self.advanced_search_btn = wx.Button(self, label='Advanced Search',
                            size=(-1, 25))
self.advanced_search_btn.Bind(wx.EVT_BUTTON, self.on_advanced_search)
search_sizer.Add(self.advanced_search_btn, 0, wx.ALL, 5)
self.main_sizer.Add(search_sizer, 0, wx.EXPAND)

Here you add the title for the page along with the search control widget as you did before. You also add the new Advanced Search button and use a new sizer to contain the search widget and the button. You then add that sizer to your main sizer.

Now let’s add the panels:

self.search_panel = RegularSearch(self)
self.advanced_search_panel = AdvancedSearch(self)
self.advanced_search_panel.Hide()
self.main_sizer.Add(self.search_panel, 1, wx.EXPAND)
self.main_sizer.Add(self.advanced_search_panel, 1, wx.EXPAND)

In this example, you instantiate the RegularSearch and the AdvancedSearch panels. Since the RegularSearch is the default, you hide the AdvancedSearch from the user on startup.

Now let’s update on_search():

def on_search(self, event):
    search_results = []
    search_term = event.GetString()
    if search_term:
        query = {'q': search_term, 'media_type': 'image'}
        pub.sendMessage('search_results', query=query)

The on_search() method will get called when the user presses “Enter / Return” on their keyboard or when they press the search button icon in the search control widget. If the user has entered a search string into the search control, a search query will be constructed and then sent off using pubsub.

Let’s find out what happens when the user presses the Advanced Search button:

def on_advanced_search(self, event):
    self.search.Hide()
    self.search_panel.Hide()
    self.advanced_search_btn.Hide()
    self.advanced_search_panel.Show()
    self.main_sizer.Layout()

When on_advanced_search() fires, it hides the search widget, the regular search panel and the advanced search button. Next, it shows the advanced search panel and calls Layout() on the main_sizer. This will cause the panels to switch out and resize to fit properly within the frame.

The last method to create is update_ui():

def update_ui(self):
    """
    Hide advanced search and re-show original screen
 
    Called by pubsub when advanced search is invoked
    """
    self.advanced_search_panel.Hide()
    self.search.Show()
    self.search_panel.Show()
    self.advanced_search_btn.Show()
    self.main_sizer.Layout()

The update_ui() method is called when the user does an Advanced Search. This method is invoked by pubsub. It will do the reverse of on_advanced_search() and un-hide all the widgets that were hidden when the advanced search panel was shown. It will also hide the advanced search panel.

The frame code is the same as it was before, so it is not shown here.

Let’s move on and learn how the regular search panel is created!


The regular_search.py Script

The regular_search module is your refactored module that contains the ObjectListView that will show your search results. It also has the Download button on it.

The following methods / classes will not be covered as they are the same as in the previous iteration:

  • on_download()
  • on_selection()
  • update_image()
  • update_search_results()
  • The Result class

Let’s get started by seeing how the first few lines in the module are laid out:

# regular_search.py
 
import os
import requests
import wx
 
from download_dialog import DownloadDialog
from ObjectListView import ObjectListView, ColumnDefn
from pubsub import pub
from urllib.parse import urlencode, quote_plus
 
base_url = 'https://images-api.nasa.gov/search'

Here you have all the imports you had in the original nasa_search_ui.py script from version_1. You also have the base_url that you need to make requests to NASA’s image API. The only new import is for pubsub.

Let’s go ahead and create the RegularSearch class:

class RegularSearch(wx.Panel):
 
    def __init__(self, parent):
        super().__init__(parent)
        self.search_results = []
        self.max_size = 300
        font = wx.Font(12, wx.SWISS, wx.NORMAL, wx.NORMAL)
        main_sizer = wx.BoxSizer(wx.VERTICAL)
        self.paths = wx.StandardPaths.Get()
        pub.subscribe(self.load_search_results, 'search_results')
 
        self.search_results_olv = ObjectListView(
            self, style=wx.LC_REPORT | wx.SUNKEN_BORDER)
        self.search_results_olv.SetEmptyListMsg("No Results Found")
        self.search_results_olv.Bind(wx.EVT_LIST_ITEM_SELECTED,
                                     self.on_selection)
        main_sizer.Add(self.search_results_olv, 1, wx.EXPAND)
        self.update_search_results()

This code will initialize the search_results list to an empty list and set the max_size of the image. It also sets up a sizer and the ObjectListView widget that you use for displaying the search results to the user. The code is actually quite similar to the first iteration of the code when all the classes were combined.

Here is the rest of the code for the __init__():

main_sizer.AddSpacer(30)
self.title = wx.TextCtrl(self, style=wx.TE_READONLY)
self.title.SetFont(font)
main_sizer.Add(self.title, 0, wx.ALL|wx.EXPAND, 5)
img = wx.Image(240, 240)
self.image_ctrl = wx.StaticBitmap(self,
                                  bitmap=wx.Bitmap(img))
main_sizer.Add(self.image_ctrl, 0, wx.CENTER|wx.ALL, 5
               )
download_btn = wx.Button(self, label='Download Image')
download_btn.Bind(wx.EVT_BUTTON, self.on_download)
main_sizer.Add(download_btn, 0, wx.ALL|wx.CENTER, 5)
 
self.SetSizer(main_sizer)

The first item here is to add a spacer to the main_sizer. Then you add the title and the img related widgets. The last widget to be added is still the download button.

Next, you will need to write a new method:

def reset_image(self):
    img = wx.Image(240, 240)
    self.image_ctrl.SetBitmap(wx.Bitmap(img))
    self.Refresh()

The reset_image() method resets the wx.StaticBitmap back to an empty image. This is needed when the user does a regular search, selects an item and then decides to do an advanced search. Resetting the image prevents a previously selected item from lingering on screen, which could confuse the user.

The last method you need to add is load_search_results():

def load_search_results(self, query):
    full_url = base_url + '?' + urlencode(query, quote_via=quote_plus)
    r = requests.get(full_url)
    data = r.json()
    self.search_results = []
    for item in data['collection']['items']:
        if item.get('data') and len(item.get('data')) > 0:
            data = item['data'][0]
            if data['title'].strip() == '':
                # Skip results with blank titles
                continue
            result = Result(item)
            self.search_results.append(result)
    self.update_search_results()
    self.reset_image()

The load_search_results() method is called using pubsub. Both the main and the advanced_search modules call it by passing in a query dictionary. Then you encode that dictionary into a formatted URL. Next you use requests to send a JSON request and you then extract the results. This is also where you call reset_image() so that when a new set of results loads, there is no result selected.

Now you are ready to create an advanced search!


The advanced_search.py Script

The advanced_search module is a wx.Panel that has all the widgets you need to do an advanced search against NASA’s API. If you read their documentation, you will find that there are around a dozen filters that can be applied to a search.

Let’s start at the top:

class AdvancedSearch(wx.Panel):
 
    def __init__(self, parent):
        super().__init__(parent)
 
        self.main_sizer = wx.BoxSizer(wx.VERTICAL)
 
        self.free_text = wx.TextCtrl(self)
        self.ui_helper('Free text search:', self.free_text)
        self.nasa_center = wx.TextCtrl(self)
        self.ui_helper('NASA Center:', self.nasa_center)
        self.description = wx.TextCtrl(self)
        self.ui_helper('Description:', self.description)
        self.description_508 = wx.TextCtrl(self)
        self.ui_helper('Description 508:', self.description_508)
        self.keywords = wx.TextCtrl(self)
        self.ui_helper('Keywords (separate with commas):',
                       self.keywords)

The code to set up the various filters is all pretty similar. You create a text control for the filter, then you pass it into ui_helper() along with a string that is a label for the text control widget. Repeat until you have all the filters in place.

Here are the rest of the filters:

self.location = wx.TextCtrl(self)
self.ui_helper('Location:', self.location)
self.nasa_id = wx.TextCtrl(self)
self.ui_helper('NASA ID:', self.nasa_id)
self.photographer = wx.TextCtrl(self)
self.ui_helper('Photographer:', self.photographer)
self.secondary_creator = wx.TextCtrl(self)
self.ui_helper('Secondary photographer:', self.secondary_creator)
self.title = wx.TextCtrl(self)
self.ui_helper('Title:', self.title)
search = wx.Button(self, label='Search')
search.Bind(wx.EVT_BUTTON, self.on_search)
self.main_sizer.Add(search, 0, wx.ALL | wx.CENTER, 5)
 
self.SetSizer(self.main_sizer)

At the end, you set the sizer to the main_sizer. Note that not all the filters that are in NASA’s API are implemented in this code. For example, I didn’t add media_type because this application will be hard-coded to only look for images. However if you wanted audio or video, you could update this application for that. I also didn’t include the year_start and year_end filters. Feel free to add those if you wish.

Now let’s move on and create the ui_helper() method:

def ui_helper(self, label, textctrl):
    sizer = wx.BoxSizer()
    lbl = wx.StaticText(self, label=label, size=(150, -1))
    sizer.Add(lbl, 0, wx.ALL, 5)
    sizer.Add(textctrl, 1, wx.ALL | wx.EXPAND, 5)
    self.main_sizer.Add(sizer, 0, wx.EXPAND)

The ui_helper() takes in label text and the text control widget. It then creates a wx.BoxSizer and a wx.StaticText. The wx.StaticText is added to the sizer, as is the passed-in text control widget. Finally the new sizer is added to the main_sizer and then you’re done. This is a nice way to reduce repeated code.

The last item to create in this class is on_search():

def on_search(self, event):
    query = {'q': self.free_text.GetValue(),
             'media_type': 'image',
             'center': self.nasa_center.GetValue(),
             'description': self.description.GetValue(),
             'description_508': self.description_508.GetValue(),
             'keywords': self.keywords.GetValue(),
             'location': self.location.GetValue(),
             'nasa_id': self.nasa_id.GetValue(),
             'photographer': self.photographer.GetValue(),
             'secondary_creator': self.secondary_creator.GetValue(),
             'title': self.title.GetValue()}
    pub.sendMessage('update_ui')
    pub.sendMessage('search_results', query=query)

When the user presses the Search button, this event handler gets called. It creates the search query based on what the user has entered into each of the fields. Then the handler will send out two messages using pubsub. The first message will update the UI so that the advanced search is hidden and the search results are shown. The second message will actually execute the search against NASA’s API.

Here is what the advanced search page looks like:

NASA Image Search with Advanced Search Page

Now let’s update the download dialog.


The download_dialog.py Script

The download dialog has a couple of minimal changes to it. Basically you need to add an import of Python’s os module and then update the save() function.

Add the following lines to the beginning of the function:

def save(self, path):
    _, ext = os.path.splitext(path)
    if ext.lower() != '.jpg':
        path = f'{path}.jpg'

This code was added to account for the case where the user does not specify the extension of the image in the saved file name.


Wrapping Up

This article covered a lot of fun new information. You learned one approach for working with an open API that doesn’t have a Python wrapper already around it. You discovered the importance of reading the API documentation and then added a user interface to that API. Then you learned how to parse JSON and download images from the Internet.

While it is not covered here, Python has a json module that you could use as well.

Here are some ideas for enhancing this application:

  • Caching search results
  • Downloading thumbnails in the background
  • Downloading links in the background

You could use threads to download the thumbnails and the larger images, as well as for doing the web requests in general. This would improve the responsiveness of your application. You may have noticed that the application becomes slightly unresponsive, depending on your Internet connectivity. That is because a web request or file download blocks the UI’s main loop. You should give threads a try if you find that sort of thing bothersome.
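
Here is a minimal sketch of one way to push a download onto a worker thread; the helper name is hypothetical and not from the article. wx widgets must only be updated from the main thread, so wx.CallAfter hands the result back:

import threading

import requests
import wx

def download_in_background(url, on_done):
    # Run the blocking request off the main loop, then hand the payload
    # back to the GUI thread via wx.CallAfter.
    def worker():
        r = requests.get(url)
        wx.CallAfter(on_done, r.content)
    threading.Thread(target=worker, daemon=True).start()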


Download the Code


Related Reading

April 18, 2019 05:15 PM UTC


PyPy Development

PyPy 7.1.1 Bug Fix Release

The PyPy team is proud to release a bug-fix release version 7.1.1 of PyPy, which includes two different interpreters:
  • PyPy2.7, which is an interpreter supporting the syntax and the features of Python 2.
  • PyPy3.6-beta: the second official release of PyPy to support 3.6 features.
The interpreters are based on much the same codebase, thus the double release.

This bugfix release fixes bugs related to large lists, dictionaries and sets, some corner cases with unicode, and PEP 3118 memory views of ctypes structures. It also fixes a few issues related to the ARM 32-bit backend. For the complete list, see the changelog.

You can download the v7.1.1 releases here:
http://pypy.org/download.html

As always, this release is 100% compatible with the previous one and fixes several issues and bugs raised by the growing community of PyPy users. We strongly recommend updating.

The PyPy3.6 release is rapidly maturing, but is still considered beta-quality.

The PyPy team

April 18, 2019 04:24 PM UTC


Python Diary

Custom Home Automation System source release

I am happy to announce the release of my generation 1 home automation system source code. I will be releasing Generation 2, the code that is currently in use, in the next couple of days to a week. If you would like to be informed of the Generation 2 code drop, please watch the BitBucket repo.

First, a little bit of history. I originally started writing this code back in 2015 to run exclusively on my Raspberry Pi connected to an external speaker. It was controlled using HTTP URL endpoints, which could be hit using various NFC tags throughout my home. Eventually I bought a 7" touch-screen and an additional Raspberry Pi. This is when my automation system began to grow and mature into what it is today. The first external display was placed in my bedroom and ran PyCARS, another project I wrote for my home automation system. As a result, the original Raspberry Pi running the home automation system no longer needed an attached speaker; instead, a UDP broadcast packet was sent on my home network to notify any listening HUD (a PyCARS device).

For a time this configuration worked great, but as the system got more complex, I began to see more and more thread-locking, which crashed the entire system from time to time. As a result, the system was rewritten to use Gevent, and the UDP broadcast system was replaced by ZeroMQ to ensure the packets were always received.
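
For illustration, the pub/sub pattern that replaced the UDP broadcasts looks roughly like this with pyzmq; this is a hypothetical sketch, not the project's actual code:

import zmq

def hub():
    # Runs on the automation hub: publish notifications to all listeners.
    ctx = zmq.Context()
    pub_socket = ctx.socket(zmq.PUB)
    pub_socket.bind('tcp://*:5556')
    pub_socket.send_string('speak Front door opened')

def hud():
    # Runs on each HUD device: receive and act on notifications.
    ctx = zmq.Context()
    sub_socket = ctx.socket(zmq.SUB)
    sub_socket.connect('tcp://hub.local:5556')  # placeholder hostname
    sub_socket.setsockopt_string(zmq.SUBSCRIBE, '')  # subscribe to everything
    print(sub_socket.recv_string())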

Generation 2, which will be available soon within the repo, has many new features written during 2018 and partly this year. This system is the one currently in use; however, it will eventually be replaced with a newer idea.

If you want to check out the source code for my custom-made home automation system, which is built in Python, you can find it on BitBucket here: Home Automation source code. The code is licensed under the GPLv2.

April 18, 2019 03:48 PM UTC


Codementor

Generators in Python

Basic generator functionality explained
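
For a quick taste of the topic, a minimal generator looks like this:

def countdown(n):
    # Each yield hands back one value and pauses until the next is requested.
    while n > 0:
        yield n
        n -= 1

for value in countdown(3):
    print(value)  # prints 3, 2, 1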

April 18, 2019 03:17 PM UTC


Neckbeard Republic

Immutability in Python

In Python, immutable vs. mutable data types and object types can cause some confusion, and weird bugs. With this course you'll see what the difference between mutable and immutable data types is in Python, and how you can use it to your advantage in your own programs.
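
In a nutshell, the distinction looks like this:

numbers = [1, 2, 3]   # lists are mutable
numbers[0] = 99       # fine: the list is changed in place

point = (1, 2, 3)     # tuples are immutable
# point[0] = 99       # would raise TypeError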

April 18, 2019 02:00 PM UTC


Stefan Scherfke

Packaging Python inside your organization with GitLab and Conda

Python Packaging has recently been discussed a lot, but the articles usually only focus on publishing (open source) code to PyPI.

But what do you do when your organization uses Python for in-house development and you can’t (or don’t want to) make everything Open Source? Where do you store and manage your code? How do you distribute your packages?

In this article, I describe how we solve this problem with GitLab, Conda and a few other tools.

You can find all code and examples referenced in this article under gitlab.com/ownconda. These tools and examples use the own prefix in order to make a clear distinction between our own and third-party code. I will not necessarily update and fix the code, but it is released under the Blue Oak license so you can copy and use it. Any feedback is welcome, nonetheless.


Software selection

In this section I’ll briefly explain the reasons why we are using GitLab and Conda.

Code and issue management

Though you could use private repositories from one of the well-known cloud services, you should probably use a self-hosted service to retain full control over your code. In some countries it may even be forbidden to use a US cloud service for your organization’s data.

There are plenty of competitors in this field: GitLab, Gitea, Gogs, Gitbucket or Kallithea—just to name a few.

Our most important requirements are:

The only tool that (currently) meets these requirements is GitLab. It has a lot more features that are very useful for organization-wide use, e.g., LDAP and Kerberos support, issue labels and boards, Mattermost integration or Git LFS support. And—more importantly—it also has a really nice UX and is one of the few pieces of software that I actually enjoy using.

GitLab has a free core and some paid versions that add more features and support.

The package manager: Pip or Conda?

Pip is the official package installer for Python. It supports Python source distributions and (binary) Wheel packages. Pip only installs files in the current environment's site-packages directory and can optionally create entry points in its bin directory. You can use Virtualenv to isolate different projects from one another, and Devpi to host your own package index. Devpi can both mirror/cache PyPI and store your own packages. The Python packaging ecosystem is overseen by the Python Packaging Authority working group (PyPA).
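
For example, the typical Pip-based workflow looks roughly like this (the package name is illustrative):

$ virtualenv myenv
$ source myenv/bin/activate
(myenv) $ pip install attrs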

Conda stems from the scientific community and is being developed by Anaconda. In contrast to Pip, Conda is a full-fledged package manager similar to apt or dnf. Like virtualenv, Conda can create isolated virtual environments. Conda is not directly compatible with Python’s setup.py or pyproject.toml files. Instead, you have to create a Conda recipe for every package and build it with conda-build. This is a bit more involved because you have to convert every package that you find on PyPI, but it also lets you patch and extend every package. With very little effort you can create a self-extracting Python distribution with a selection of custom packages (similar to the Miniconda distribution).
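
If you have never worked with Conda recipes: a recipe is essentially a meta.yaml file plus an optional build script. A minimal sketch for a PyPI package might look like this (URL, checksum and build script are illustrative placeholders):

package:
  name: attrs
  version: 19.1.0

source:
  url: https://files.pythonhosted.org/packages/.../attrs-19.1.0.tar.gz
  sha256: ...

build:
  number: 0
  script: python setup.py install

requirements:
  host:
    - python
  run:
    - python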

Conda-forge is a (relatively) new project that has a huge library of Conda recipes and packages. However, if you want full control over your own packages you may want to host and build everything on your own.

What to use?

Because we need to package more than just Python, we chose to use Conda. This dates back at least to Conda v2.1, which was released in 2013. At that time, projects like conda-forge weren't even in sight.

Supplementary tools

To aid our work with GitLab and Conda, we developed some supplementary tools. I have released a slightly modified version of them, called ownconda tools, alongside this article.

The ownconda tools are a Click-based collection of commands that reside under the entry point ownconda.

Initially, they were only meant to help with the management of recipes for external packages, and with running the build/test/upload steps in our GitLab pipeline. But they have become a lot more powerful by now and even include a GitLab Runner that lets you run your projects’ pipelines locally (including artifacts handling, which the official gitlab-runner cannot do locally).

$ ownconda --help
Usage: ownconda [OPTIONS] COMMAND [ARGS]...

  Support tools for local development, CI/CD and Conda packaging.

Options:
  --help  Show this message and exit.

Commands:
  build                 Build all recipes in RECIPE_ROOT in the correct...
  check-for-updates     Update check for external packages in RECIPE_ROOT.
  ci                    Run a GitLab CI pipeline locally.
  completion            Print Bash or ZSH completion activation script.
  dep-graph             Create a dependency graph from a number of Conda...
  develop               Install PATHS in develop/editable mode.
  gitlab                Run a task on a number of GitLab projects.
  lint                  Run pylint for PATHS.
  make-docs             Run sphinx-build and upload generated html...
  prune-index           Delete old packages from the local Conda index at...
  pylintrc              Print the built-in pylintrc to stdout.
  pypi-recipe           Create or update recipes for PyPI packages.
  sec-check             Run some security checks for PATHS.
  show-updated-recipes  Show updated recipes in RECIPE_ROOT.
  test                  Run tests in PATHS.
  update-recipes        Update Conda recipes in RECIPE_ROOT.
  upload                Upload Conda packages in PKG_DIR.
  validate-recipes      Check if recipes in RECIPE_ROOT are valid.

I will talk about the various subcommands in more detail in later sections.

How it should work

The subject of packaging consists of several components: The platforms on which your code needs to build and run, the package manager and repository, management of external and internal packages, a custom Python distribution, and means to keep an overview over all packages and their dependencies. I will go into detail about each aspect in the following sections.

Aspects involved in the topic of packaging

Runtime and build environment

Our packages need to run on Fedora desktop systems and on Centos 7. Packages built on Centos also run on Fedora, so we only have a single build environment: Centos 7.

We use different Docker images for our build pipeline and some deployments. The most important ones are centos7-ownconda-runtime and centos7-ownconda-develop. The former only contains a minimal setup to install and run Conda packages while the latter includes all build dependencies, conda-build and the ownconda tools.

If your OS landscape is more heterogeneous, you may need to add more build environments which makes things a bit more complicated—especially if you need to support macOS or even Windows.

To build Docker images in our GitLab pipelines, we use docker-in-docker. That means that the GitLab runners start Docker containers that can access /var/run/docker.sock to run docker build.

GitLab provides a Docker registry that allows any project to host its own images. However, if a project is private, other projects' pipelines cannot access these images. For this reason, we have decided to serve Docker images from a separate host.

3rd party packages

We re-package all external dependencies as Conda packages and host them in our own Conda repository.

This has several benefits:

Recipe organization

We can either put the recipe for every package into its own repository (which is what conda-forge does) or use a single repository for all recipes (which is what we are doing).

The multi-repository approach makes it easier to only build packages that have changed. It also makes it easier to manage access levels if you have a lot of contributors that each only manage a few packages.

The single-repository approach has less overhead if you only have a few maintainers that take care of all the recipes. To identify updated packages that need re-building, we can use ownconda’s show-updated-recipes command.

Linking against system packages

With Conda, we can (and must) decide whether we want to link against system packages (e.g., installed with yum) or use other Conda packages to satisfy a package's dependencies.

One extreme would be to only build Python packages on our own and completely depend on system packages for all C libraries. The other extreme would be to build everything on our own, even glibc and gcc.

The former has a lot less overhead but becomes more fragile as your runtime environments become more heterogeneous. The latter is a lot more complicated and involved but gives you more control and reliability.

We decided to take the middle ground between these two extremes: We build many libraries on our own but rely on the system’s gcc, glibc, and X11 libraries. This is quite similar to what the manylinux standard for Python Wheels does.

Recipes must list the system libraries that they link against. The rules for valid system libraries are encoded in ownconda validate-recipes and enforced by conda-build's --error-overlinking option.

Recipe management

Recipes for Python packages can easily be created with ownconda pypi-recipe. This is similar to conda skeleton pypi but tailored to our needs. Recipes for other packages have to be created manually.

We also implemented an update check for our recipes. Every recipe contains a script called update_check.py which uses one of the update checkers provided by the ownconda tools.

These checkers can query PyPI, GitHub release lists and (FTP) directory listings, or crawl an entire website. The command ownconda check-for-updates runs the update scripts and compares the version numbers they find against the recipes’ current versions. It can also print URLs to the packages’ changelogs:

$ ownconda check-for-updates --verbose .
  [████████████████████████████████████]  100%
Package: latest version (current version)
freetype 2.10.0 (2.9.1):
  https://www.freetype.org/index.html#news

python-attrs 19.1.0 (18.2.0):
  http://www.attrs.org/en/stable/changelog.html

python-certifi 2019.3.9 (2018.11.29):
  https://github.com/certifi/python-certifi/commits/master

...

qt5 5.12.2 (5.12.1):
  https://wiki.qt.io/Qt_5.12.2_Change_Files

readline 8.0.0 (7.0.5):
  https://tiswww.case.edu/php/chet/readline/CHANGES

We can then update all recipes with ownconda update-recipes:

$ ownconda update-recipes python-attrs ...
python-attrs
cd /data/ssd/home/stefan/Projects/ownconda/external-recipes && /home/stefan/ownconda/bin/python -m own_conda_tools pypi-recipe attrs -u
diff --git a/python-attrs/meta.yaml b/python-attrs/meta.yaml
index 7d167a8..9b3ea20 100644
--- a/python-attrs/meta.yaml
+++ b/python-attrs/meta.yaml
@@ -1,10 +1,10 @@
 package:
  name: attrs
-  version: 18.2.0
+  version: 19.1.0

 source:
-  url: https://files.pythonhosted.org/packages/0f/9e/26b1d194aab960063b266170e53c39f73ea0d0d3f5ce23313e0ec8ee9bdf/attrs-18.2.0.tar.gz
-  sha256: 10cbf6e27dbce8c30807caf056c8eb50917e0eaafe86347671b57254006c3e69
+  url: https://files.pythonhosted.org/packages/cc/d9/931a24cc5394f19383fbbe3e1147a0291276afa43a0dc3ed0d6cd9fda813/attrs-19.1.0.tar.gz
+  sha256: f0b870f674851ecbfbbbd364d6b5cbdff9dcedbc7f3f5e18a6891057f21fe399

 build:
-  number: 1
+  number: 0

...

The update process

Our Conda repository has various channels for packages of different maturity, e.g. experimental, testing, staging, and stable.

Updates are first built locally and uploaded to the testing channel for some manual testing.

If everything goes well, the updates are committed into the develop branch, pushed to GitLab and uploaded to the staging channel. We also send a changelog around to notify everyone about important updates and when they will be uploaded into the stable channel.

After a few days in testing, the updates are merged into the master branch and uploaded to the stable channel for production use.

This is a relatively safe procedure which (usually) catches any problems before they go into production.

Example recipes

You can find the recipes for all packages required to run the ownconda tools here. As a bonus, I also added the recipes for NumPy and PyQt5.

Internal projects

Internal packages are structured in a similar way to most projects that you see on PyPI. We put the source code into src, the pytest tests into tests and the Sphinx docs into docs. We do not use namespace packages, as they can lead to various nasty bugs. Instead, we just prefix all packages with own_ to avoid name clashes with other packages and to easily tell internal and external packages apart.

A project usually contains at least these files and directories: .gitignore, .gitlab-ci.yml, conda/meta.yaml, setup.py, setup.cfg, MANIFEST.in, docs/, src/, and tests/.

The biggest difference from "normal" Python projects is the additional Conda recipe in each project. It contains all metadata and the requirements. The setup.py contains only the minimum amount of information to get the package installed via pip:
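
A minimal sketch of what such a setup.py might look like (the package name own_example is a made-up placeholder):

from setuptools import setup, find_packages

setup(
    name='own_example',
    packages=find_packages('src'),
    package_dir={'': 'src'},
    include_package_data=True,
)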

ownconda develop also creates/updates a Conda environment for the current project and installs all requirements that it collects from the project’s recipe.

Projects also contain a .gitlab-ci.yml which defines the GitLab CI/CD pipeline. Most projects have at least a build, a test and an upload stage. The test stage is split into parallel steps for various test tools (e.g., pytest, pylint and bandit). Projects can optionally build documentation and upload it to our docs server. The ownconda tools provide helpers for all of these steps:
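
As a sketch, such a .gitlab-ci.yml could look roughly like this (the image name, subcommand arguments and paths are assumptions based on the tool descriptions above):

image: centos7-ownconda-develop

stages:
  - build
  - test
  - upload

build:
  stage: build
  script: ownconda build conda/

pytest:
  stage: test
  script: ownconda test .

pylint:
  stage: test
  script: ownconda lint src/

upload:
  stage: upload
  script: ownconda upload pkgs/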

We also use our own Git flow:

Visualisation of our Git flow

Package and documentation hosting

Hosting a Conda repository is very easy. In fact, you can just run python -m http.server in your local Conda base directory if you previously built any packages. You can then use it like this: conda search --override-channels --channel=http://localhost:8000/conda-bld PKG.

A Conda repository consists of one or more channels. Each channel is a directory that contains a noarch directory and additional platform directories (like linux-64). You put your packages into these directories and run conda index channel/platform to create an index for each platform (you can omit the platform with newer versions of conda-build). The noarch directory must always exist, even if you put all your packages into the linux-64 directory.
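
For example, creating a minimal channel by hand could look like this (channel name and package file are illustrative):

$ mkdir -p stable/noarch stable/linux-64
$ cp own-example-1.0-0.tar.bz2 stable/linux-64/
$ conda index stable/linux-64
$ conda index stable/noarch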

The base URL for our Conda channels is https://forge.services.own/conda/channel. You can put a static index.html into each channel’s directory that parses the repo data and displays it nicely:

Forge channel view. A JavaScript reads and renders the contents of a channel's repodata.json.

The upload service (for packages created in GitLab pipelines) resides under https://forge.services.own/upload/<channel>. It is a simple web application that stores the uploaded file in channel/linux-64 and runs conda index. For packages uploaded to the stable channel, it also creates a hard link in a special archive channel.
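
A minimal sketch of what such an upload service could look like, assuming Flask (the www root and archive paths are placeholders; the file form field matches the curl upload calls shown further below):

import os
import subprocess

from flask import Flask, request

app = Flask(__name__)
WWW_ROOT = '/srv/www/conda'  # placeholder: adjust to your www root

@app.route('/upload/<channel>', methods=['POST'])
def upload(channel):
    pkg = request.files['file']
    platform_dir = os.path.join(WWW_ROOT, channel, 'linux-64')
    dest = os.path.join(platform_dir, pkg.filename)
    pkg.save(dest)
    # Re-index the platform directory so the new package becomes visible
    subprocess.run(['conda', 'index', platform_dir], check=True)
    if channel == 'stable':
        # Keep a hard link in the archive channel for later restores
        os.link(dest, os.path.join(WWW_ROOT, 'archive', 'linux-64', pkg.filename))
    return 'OK\n'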

Every week, we prune our channels with ownconda prune-index. In case we accidentally prune too aggressively, we have the option to restore packages from the archive.

We also host our own Read the Docs like service. GitLab pipelines can upload Sphinx documentation to https://forge.services.own/docs via ownconda make-docs.

Note

The server name forge does not refer to conda-forge but to SourceForge.net, which was quite popular back in the day.

Python distribution

With Constructor, you can easily create your own self-extracting Python distribution. These distributions are similar to Miniconda, but you can customize them to your needs.

A constructor file is a simple YAML file with some meta data (e.g., the distribution name and version) and the list of packages that should be included. You can also specify a post-install script.
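
A sketch of what such a construct.yaml could look like (the name, channel URL and package list are illustrative):

name: ownconda
version: "3.7"

channels:
  - https://forge.services.own/conda/stable

specs:
  - python 3.7*
  - conda

post_install: post_install.sh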

The command constructor <distdir>/construct.yaml will then download all packages and put them into a self-extracting Bash script. We upload the installer scripts to our Conda index, too.

Instead of managing multiple construct.yaml files manually, we create them dynamically in a GitLab pipeline which makes building multiple similar distributions (e.g., for different Python versions) a bit easier.

Deployment

We are currently on the road from copy-stuff-with-fabric-to-vms to docker-kubernetes-yay-land. I am not going to go too much into detail here—this topic is not directly related to packaging and worth its own article.

Most of our deployments are now Ansible based. Projects contain an ansible directory with the required playbooks and other files. Shared roles are managed in a separate ownsible project. The Ansible deployments are usually part of the GitLab CI/CD pipeline. Some are run automatically, some need to be triggered manually.

Some newer projects are already using Docker based deployments. Docker images are built as part of the pipeline and uploaded into our Docker registry from which they are then pulled for deployments.

Dependency management

It is very helpful if you can build a dependency graph of all your packages.

Not only can it be used to build all packages in the correct order (as we will shortly see), but visualizing your dependencies may also help you to improve your architecture, detect circular dependencies or unused packages.

The command ownconda dep-graph builds such a dependency graph from the packages that you pass to it. It can either output a sorted list of packages or a DOT graph. Since the resulting graph can become quite large, there are several ways to filter packages. For example, you can only show a package’s dependencies or why the package is needed.

The following figure shows the dependency graph for our python recipe. It was created with the command ownconda dep-graph external-recipes/ --implicit --requirements python --out=dot > deps_python.dot:

Dependency graph for Python

These graphs can become quite unclear relatively fast, though. This is the full dependency graph for the ownconda tools:

Dependency graph for the ownconda tools

I do not want to know how this would have looked if these were all JavaScript packages …

Making it work

Now that you know the theory of how everything should work, we can start to bootstrap our packaging infrastructure.

Some of the required steps are a bit laborious and you may need the assistance of your IT department in order to set up the domains and GitLab. Other steps can be automated and should be relatively painless, though:

Set up GitLab and a Conda repo server

  1. Install GitLab. I’ll assume that it will be available under https://git.services.own.
  2. Set up the forge server. I'll assume that it will be available under https://forge.services.own:

    • In your www root, create a conda folder which will contain the channels and their packages.
    • Create the upload service that copies files sent to /upload/channel into www-root/conda/channel/linux-64 and calls conda index.
    • Set up a Docker registry on the server.

Bootstrap Python, Pip and Conda

  1. Clone all repositories that you need for the bootstrapping process:

    $ mkdir -p ~/Projects/ownconda
    $ cd ~/Projects/ownconda
    $ for r in external-recipes ownconda-tools ownconda-dist; do \
    >     git clone git@gitlab.com:ownconda/$r.git; \
    > done
    
  2. Build all packages needed to create your Conda distribution. The ownconda tools provide a script that uses a Docker container to build all packages and upload them into the stable channel:

    $ ownconda-tools/contrib/bootstrap.sh
    

    Note

    The script might fail to build some packages. The most probable causes are HTTP timeouts or unavailable servers. Just re-run the script and hope for the best. If the issue persists, you might need to fix the corresponding Conda recipe, though (sometimes people re-upload a source archive and thereby change its SHA256 value).

  3. Create the initial Conda distributions and upload them:

    $ cd ownconda-dist
    $ python gen_installer.py .. 3.7
    $ python gen_installer.py .. 3.7 dev
    $ cd -
    $ curl -F "file=@ownconda-3.7.sh" https://forge.services.own/upload/stable
    $ curl -F "file=@ownconda-3.7-dev.sh" https://forge.services.own/upload/stable
    $
    $ # Create symlinks for more convenience:
    $ ssh forge.services.own
    # cd www-root/conda/stable
    # ln -s linux-64/ownconda-3.7.sh
    # ln -s linux-64/ownconda-3.7.sh ownconda.sh
    # ln -s linux-64/ownconda-3.7-dev.sh
    # ln -s linux-64/ownconda-3.7-dev.sh ownconda-dev.sh
    

    You can now download the installers from https://forge.services.own/conda/stable/ownconda[-dev][-3.7].sh

  4. Set up your local ownconda environment. You can use the installer that you just built (or (re)download it from the forge if you want to test it):

    $ bash ownconda-3.7.sh
    $ # or:
    $ cd ~/Downloads
    $ wget https://forge.services.own/conda/stable/ownconda-dev.sh
    $ bash ownconda-dev.sh
    $
    $ source ~/.bashrc   # or open a new terminal
    $ conda info
    $ ownconda --help
    

Build the docker images

  1. Create a GitLab pipeline for the centos7-ownconda-runtime project. This will generate your runtime Docker image.
  2. When the runtime image is available, create a GitLab pipeline for the centos7-ownconda-develop project. This will generate your development Docker image used in your projects’ pipelines.

Build all packages

  1. Create a GitLab pipeline for the external-recipes project to build and upload the remaining 3rd party packages.
  2. You can now build the packages for your internal projects. You must create the pipelines in dependency order so that the requirements for each project are built first. The ownconda tools help you with that:

    $ mkdir gl-projects
    $ cd gl-projects
    $ ownconda gitlab update
    $ ownconda dep-graph --no-third-party --out=project . > projects.txt
    $ for p in $(cat projects.txt); do \
    >     ownconda gitlab -p $p run-py ../ownconda-tools/contrib/gl_run_pipeline.py; \
    > done
    

    If a pipeline fails and the script aborts, just remove the successful projects from the projects.txt and re-run the for loop.

Congratulations, you are done! You have built all internal and external packages, you have created your own Conda distribution and you have all Docker images that you need for running and building your packages.

Outlook / Future work and unsolved problems

Managing your organization’s packaging infrastructure like this is a whole lot of work but it rewards you with a lot of independence, control and flexibility.

We have been continuously improving our process during the last years and still have a lot of ideas on our roadmap.

While, for example, GitLab has a very good authentication and authorization system, our Conda repository lacks all of this (apart from IP restrictions for uploading and downloading packages). We do not want users (or automated scripts) to enter credentials when they install or update packages, but we are not aware of a (working) password-less alternative. Combining Conda with Kerberos might work in theory, but in practice this is not yet possible. Currently, we are experimenting with HTTPS client certificates. This might work well enough but it also doesn’t seem to be the Holy Grail of Conda Authorization.

Another big issue is creating more reproducible builds and easier rollback mechanisms in case an update ships broken code. Currently, we are pinning the requirements' versions during a pipeline's test stage. We are also working towards dockerized Blue Green Deployments and are exploring tools for container orchestration (like Kubernetes). On the other hand, we are still delivering GUI applications to client workstations via Bash scripts … (this works quite well, though, and provides us with a good amount of control and flexibility).

We are also keeping an eye on Pip. Conda has the biggest benefits when deploying packages to VMs and client workstations. The more we use Docker, the smaller the benefit might become, and we might eventually switch back to Pip.

But for now, Conda serves us very well.

Comments

You can leave comments and suggestions at Hackernews and Reddit or reach me via Twitter and Mastodon.

April 18, 2019 11:37 AM UTC


The Code Bits

Introduction to Generators in Python

In this post, we will learn what generators are, how to create them, how they work and how to use them in Python.

Generator function

Generators are functions that allow us to create iterators in Python. They provide a convenient, simple and memory-efficient approach to creating iterators. These are useful when dealing with large amounts of data.

Before starting with generators, it would be good to understand how a for-loop works in Python. It will also be useful to know what iterable, iterator and the iterator protocol are.

An example: Generate even numbers using a generator function

Let us start with a simple example. We will be creating a generator function which generates a specific count of even numbers starting from a given value. We will be using this same example throughout this post.

def generate_even_numbers(start, count):
    # Make sure that the first number is even.
    start = start if start % 2 == 0 else start + 1

    while count > 0:
        yield start
        start += 2
        count -= 1

Note that we used a yield statement within the function body to return our data. If you don’t understand it right away, no need to worry, we will get to its roots soon enough!

Let us see how we would use this generator function in a for-loop.

>>> generator_iterator = generate_even_numbers(0, 3)
>>> for num in generator_iterator:
...     print(num)
...
0
2
4

As you can see, we were able to use the value returned by the generator function in a for-loop, so it must have been an iterable.

Generator function returns a generator iterator

Let us check the type of the value returned by the generator function.

>>> generator_iterator = generate_even_numbers(0, 3)
>>> type(generator_iterator)
<class 'generator'>

Okay, so the value returned is of type ‘generator’. This value is usually referred to as the generator iterator, even though the term generator is sometimes used interchangeably to refer to both the generator function as well as the generator iterator.

Now let us confirm that the generator_iterator is indeed an iterator. As per the iterator protocol, an iterator must:

  1. return its elements one by one when the next() method is called on it. When all the elements are exhausted, it must raise StopIteration.
  2. >>> generator_iterator = generate_even_numbers(0, 3)
    >>> next(generator_iterator)
    0
    >>> next(generator_iterator)
    2
    >>> next(generator_iterator)
    4
    >>> next(generator_iterator)
    Traceback (most recent call last):
      File "", line 1, in 
    StopIteration
    
  3. return itself when the iter() method is called on it.
  4. >>> generator_iterator = generate_even_numbers(0, 3)
    >>> generator_iterator
    <generator object generate_even_numbers at 0x10cb431b0>
    >>> iter(generator_iterator)
    <generator object generate_even_numbers at 0x10cb431b0>
    

So now we know that the generator function is a convenient way to create an iterator. But what makes this function different from a normal function in Python? How does it return an iterator? The answer lies in the yield statement.

How does the generator function work?

Let us revisit our generator example, now with some prints so that we can clearly understand how it works.

def generate_even_numbers(start, count):
    print("In the generator function")

    # Make sure that the first number is even.
    start = start if start % 2 == 0 else start + 1

    while count > 0:
        print("[count:{}] Hello! Before I yield....".format(count))
        yield start
        print("[count:{}] Hey! I am back!!".format(count))
        start += 2
        count -= 1
    print("[count:{}] That's all I have got...".format(count))

Now let us see its usage.

>>> generator_iterator = generate_even_numbers(3, 2)
>>> for num in generator_iterator:
...     print("Processing even number: {}".format(num))
...
In the generator function
[count:2] Hello! Before I yield....
Processing even number: 4
[count:2] Hey! I am back!!
[count:1] Hello! Before I yield....
Processing even number: 6
[count:1] Hey! I am back!!
[count:0] That's all I have got...

There are a couple of things you should notice:

  1. Lazy evaluation
  2. How the yield statement works

Let us discuss these.

Lazy evaluation

Calling the function generate_even_numbers(3, 2) just returns a generator iterator. It does not start executing the function. This is called lazy evaluation. Generators start executing and yielding values only when needed, that is, when next() is called. As a result, only one element of the iterator is held in memory at a time. This makes them memory efficient and hence useful when dealing with large amounts of data.

How does the yield statement work?

By now, you may have gathered that the only special thing about a generator function with respect to normal functions is that it uses yield to return its values. However, the yield statement is very different from a normal return statement.

The yield statement makes a function a generator.

When next() is called on the generator iterator, the generator function executes till a yield statement is encountered. When the yield statement is reached, the execution state of the function is remembered (including the local variables and any try statements) and the function’s execution is temporarily suspended. The value associated with the yield statement is returned by the next() method.

When next() is called again, the generator function resumes execution. The saved execution state is restored and the statement after the yield is executed first. The function then continues executing till the next yield statement is encountered, and so the process goes on.

Finally, if there is no more yield in the generator function when next() is called, it ends up raising StopIteration. At this point, the for-loop would exit.

A simpler example: Generator function to yield some strings

Let us make sure that all of that is clear with a simpler example.

def generate_hello_world():
    print("....Started executing the generator function")
    yield "Hello"
    print("....Between yields!")
    yield "World"
    print("....Done with yields!")

Let us see how to use the iterator returned by the generator function using the next() method.

>>> """ We get the generator iterator """
>>> generator_iterator = generate_hello_world()

>>> """ When next() is called, the function executes till the first yield statement """
>>> next(generator_iterator)
....Started executing the generator function
'Hello'

>>> """ When next() is called again, it picks up where it left off and executes till the next yield statement """
>>> next(generator_iterator)
....Between yields!
'World'

>>> """ When there are no more yields, calling next() raises StopIteration """
>>> next(generator_iterator)
....Done with yields!
Traceback (most recent call last):
  File "", line 1, in 
StopIteration

Now let us see how to use the generator function in a for-loop.

>>> for word in generate_hello_world():
...     print(word)
...
....Started executing the generator function
Hello
....Between yields!
World
....Done with yields!
>>>

On a side note, pay attention to how we did not use a separate variable to hold the generator iterator as in our previous examples. We directly called the generator function with the for-loop. This is doable because of how a for-loop works in Python. The expression following “in” is evaluated only once. This expression is expected to result in an iterable. In this case, it will result in the generator iterator. Then the method iter() is called on the iterable to get the iterator associated with it. Then next() is called repeatedly on the iterator until the iterator is exhausted.
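
In other words, the for-loop above behaves roughly like this manual equivalent (a sketch of the protocol, not the exact CPython machinery):

>>> iterator = iter(generate_hello_world())
>>> while True:
...     try:
...         word = next(iterator)
...     except StopIteration:
...         break
...     print(word)
...
....Started executing the generator function
Hello
....Between yields!
World
....Done with yields!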

Conclusion

In this post, we learned how to create generator functions in Python, how they work and how to use them.

April 18, 2019 07:46 AM UTC


Davide Moro

Testing metrics thoughts and examples: how to turn lights on and off through MQTT with pytest-play

In this article I'll share some personal thoughts about test metrics and talk about some technologies and tools playing around a real example: how to turn lights on and off through MQTT collecting test metrics.

By the way the considerations contained in this article are valid for any system, technology, test strategy and test tools so you can easily integrate your existing automated tests with statsd with a couple of lines of code in any language.

I will use the pytest-play tool in this example so that even non-programmers should be able to play with automation collecting metrics, because this tool is based on YAML (this way there are no classes, functions, threads, imports or compilation) and, if Docker is already installed, nothing else is needed. You'll need only a bit of command line knowledge and traces of Python expressions like variables["count"] > 0.

Anyway... yes, you can drive telematics/IoT devices with MQTT using pytest-play collecting and visualizing metrics thanks to:

or any other statsd capable monitoring engine.

In our example we will see step by step how to turn lights on and off using MQTT and pytest-play, using YAML files.

Why test metrics?

"Because we can" (cit. Big Bang Theory, Series 01 Episode 09 - The Cooper-Hofstadter Polarization):
Sheldon: Someone in Szechuan province, China is using his computer to turn our lights on and off.
Penny: Huh, well that’s handy. Um, here's a question... why?!
All together: Because we can!
If the "Because we can" answer doesn't convince your boss, there are several advantages that let you react proactively before something unexpected happens. To be proactive you need knowledge of your system under test, thanks to measurable metrics that let you:
so that you can be proactive and:
Ouch! The effects of a bad release in action
In addition you can:
What should you measure? Everything that is valuable to you.

Some information about statsd/Graphite and MQTT

statsd/Graphite

Some very interesting readings about statsd and the measure-everything approach:


If you are not familiar with statsd and Graphite you can install it (root/root by default):

docker run -d\
 --name graphite\
 --restart=always\
 -p 80:80\
 -p 2003-2004:2003-2004\
 -p 2023-2024:2023-2024\
 -p 8125:8125/udp\
 -p 8126:8126\
 graphiteapp/graphite-statsd

and play with it sending fake metrics using nc:
echo -n "my.metric:320|ms" | nc -u -w0 127.0.0.1 8125
you'll find new metric aggregations available:
stats.timers.$KEY.mean
stats.timers.$KEY.mean_$PCT
stats.timers.$KEY.upper_$PCT
stats.timers.$KEY.sum_$PCT
...
where:

More info, options, configurations and metric types here:

What is MQTT?

From http://mqtt.org/:
MQTT is a machine-to-machine (M2M)/"Internet of Things" connectivity protocol.
It was designed as an extremely lightweight publish/subscribe messaging transport.
It is useful for connections with remote locations where a small code footprint is required and/or network bandwidth is at a premium.
MQTT is the de facto standard for smarthome/IoT/telematics/embedded device communications, even on low performance embedded devices, and it is available on many cloud infrastructures.

Every actor can publish a message for a certain topic and every actor can subscribe to a set of topics, so that it receives every message of interest.

Topics are hierarchical so that you can subscribe to a very specific or wide range of topics coming from devices or sensors (e.g., /house1/room1/temp, /house1/room1/humidity or all messages related to /house1/room1/ etc).

For example in a telematics application every device will listen to any command or configuration sent by a server component through a MQTT broker (e.g., project1/DEVICE_SN/cmd);
server will be notified for any device response or communication subscribing to a particular topic (e.g., project1/DEVICE_SN/data).
So:
If you are not familiar with MQTT you can install the mosquitto utility and play with the mosquitto_sub and mosquitto_pub commands using the public broker iot.eclipse.org. For example you can publish a message for a given topic:
$ mosquitto_pub -t foo/bar -h iot.eclipse.org -m "hello pytest-play!"
and see the message, assuming that you previously subscribed to foo/bar (here we see all messages published on our topics of interest):
$ mosquitto_sub -t foo/bar/# -h iot.eclipse.org -v

Prerequisites

pytest-play is multi-platform because it is based on Python (installation might differ between operating systems).
Using Docker instead, no installation is required: once Docker is installed you are ready to start playing with pytest-play.
As a user you should be comfortable with a shell and command line options.

Steps

And now let's start with our example.

Create a new folder project

Create a new folder (e.g., fridge) and enter inside.

Create a variables file

Create a env.yml file with the following contents:
pytest-play:
  mqtt_host: YOUR_MQTT_HOST
  mqtt_port: 20602
  mqtt_endpoint: foo/bar
You can have one or more configuration files defining variables for your convenience. Typically you have one configuration file for each target environment (e.g., dev.yml, alpha.yml, etc).

We will use later this file for passing variables thanks to the --variables env.yml command line option, so you can switch environment passing different files.

Create the YML script file

Create a YAML file called test_light_on.yml inside the fridge folder or any subfolder. Note well: the *.yml extension and test_ prefix matter; otherwise the file won't be considered executable at this time of writing.

If you need to simulate a command or simulate a device activity you need just one command inside your YAML file:
- comment: send light turn ON command
  provider: mqtt
  type: publish
  host: "$mqtt_host"
  port: "$mqtt_port"
  endpoint: "$mqtt_endpoint/$device_serial_number/cmd"
  payload: '{"Payload":"244,1"}'
where 244 stands for the internal ModBUS registry reference for the fridge light and 1 stands for ON (and 0 for OFF).

But... wait a moment. Until now we are only sending a payload to a MQTT broker, resolving the mqtt_host variable for a given endpoint, and nothing more... pretty much the same business you can do with mosquitto_pub, right? You are right! That's why we are about to implement something more:
Put the following contents inside the test_light_on.yml file and save:
markers:
  - light_on
test_data:
  - device_serial_number: 8931087315095410996
  - device_serial_number: 8931087315095410997
---
- comment: subscribe to device data and store messages to messages variable once received (non blocking subscribe)
  provider: mqtt
  type: subscribe
  host: "$mqtt_host"
  port: "$mqtt_port"
  topic: "$mqtt_endpoint/$device_serial_number"
  name: "messages"
- comment: send light turn ON command
  provider: mqtt
  type: publish
  host: "$mqtt_host"
  port: "$mqtt_port"
  endpoint: "$mqtt_endpoint/$device_serial_number/cmd"
  payload: '{"Payload":"244,1"}'
- comment: start tracking response time (stored in response_time variable)
  provider: metrics
  type: record_elapsed_start
  name: response_time
- comment: wait for a device response
  provider: python
  type: while
  timeout: 12
  expression: 'len(variables["messages"]) == 0'
  poll: 0.1
  sub_commands: []
- comment: store elapsed response time in response_time variable
  provider: metrics
  type: record_elapsed_stop
  name: response_time
- comment: assert that status light response was sent by the device
  provider: python
  type: assert
  expression: 'loads(variables["messages"][0])["measure_id"] == [488]'
- comment: assert that status light response was sent by the device with status ON
  provider: python
  type: assert
  expression: 'loads(variables["messages"][0])["bin_value"] == [1]'
Let's go through the above YAML configuration command by command and section by section.

Metadata, markers and decoupled test data

First of all the --- delimiter splits an optional metadata document from the scenario itself. The metadata section in our example contains:
markers:
  - light_on
You can mark your scripts with one or more markers so that you can select which scenarios will run from the command line using marker expressions like -m light_off or something like -m "light_off and not slow", assuming that you have some script marked with a hypothetical slow marker.

Decoupled test data and parametrization

Assume that you have 2 or more real devices providing different firmware versions always ready to be tested.

In such a case we want to define our scenario once and have it executed multiple times thanks to parametrization. Our scenario will be executed once for each item defined in the test_data array in the metadata section. In our example it will be executed twice:
test_data:
  - device_serial_number: 8931087315095410996
  - device_serial_number: 8931087315095410997
If you want you can track different metrics for different serial numbers so that you are able to compare different firmware versions.

Subscribe to topics where we expect a device response

As stated in the official play_mqtt documentation https://github.com/davidemoro/play_mqtt
you can subscribe to one or more topics using the mqtt provider and type: subscribe. You have to provide the host where the MQTT broker lives (e.g., iot.eclipse.org), the port and, obviously, the topic you want to subscribe to (e.g., foo/bar/$device_serial_number/data/light, where $device_serial_number will be replaced with what you define in environment configuration files or in each test_data section).
- comment: subscribe to device data and store messages to messages variable once received (non blocking subscribe)
  provider: mqtt
  type: subscribe
  host: "$mqtt_host"
  port: "$mqtt_port"
  topic: "$mqtt_endpoint/$device_serial_number"
  name: "messages"
This is a non blocking call, so while the flow continues it will collect in the background every message published on the topics of our interest, storing them in a messages variable.

messages is an array containing all matching messages coming from MQTT, and you can access the messages value in expressions with variables["messages"].

Publish a command

This is self explaining (you can send any payload, even dynamic/parametrized payloads):
- comment: send light turn ON command
  provider: mqtt
  type: publish
  host: "$mqtt_host"
  port: "$mqtt_port"
  endpoint: "$mqtt_endpoint/$device_serial_number/cmd"
  payload: '{"Payload":"244,1"}'
where 244 is the internal reference and 1 stands for ON.

Track time metrics

This command let you start tracking time from now until a record_elapsed_stop will be executed:
- comment: start tracking response time (stored in response_time variable)
  provider: metrics
  type: record_elapsed_start
  name: response_time
... <one or more commands or asynchronous waiters here>
- comment: store elapsed response time in response_time variable
  provider: metrics
  type: record_elapsed_stop
  name: response_time
The time metric will be available under a variable name called in our example response_time (from name: response_time). For a full set of metrics related commands and options see https://github.com/pytest-dev/pytest-play.

You can record key metrics of any type for several reasons:
  • make assertions about some expected timings
  • report key performance metrics or properties in custom JUnit XML reports (in conjunction with the command line option --junit-xml results.xml for example so that you have an historical trend of metrics for each past or present test execution)
  • report key performance metrics on statsd capable third party systems (in conjunction with the command line option --stats-d [--stats-prefix play --stats-host http://myserver.com --stats-port 3000])

While

Here we are waiting until a device response has been collected and stored in the messages variable (do you remember the already discussed MQTT subscribe command in charge of collecting/storing messages of interest?):
- comment: wait for a device response
  provider: python
  type: while
  timeout: 12
  expression: 'len(variables["messages"]) == 0'
  poll: 0.1
  sub_commands: []
You can specify a timeout (e.g., timeout: 12), a poll time (how many seconds to wait between while iterations, in this case poll: 0.1) and an optional list of while sub commands (not needed for this example).

When the expression returns a true-ish value, the while command exits.

Does your device publish different kind of data on the same topic? Modify the while expression restricting to the messages of your interest, for example:
- comment: [4] wait for the expected device response
  provider: python
  type: while
  timeout: 12
  expression: 'len([item for item in variables["messages"] if loads(item)["measure_id"] == [124]]) == 0'
  poll: 0.1
  sub_commands: []
In the above example we are iterating over our array, obtaining only the entries with a given measure_id, where loads is a built-in JSON parser (Python's json.loads).

Assertions

And now it's assertions time:
- comment: assert that status light response was sent by the device
  provider: python
  type: assert
  expression: 'loads(variables["messages"][0])["measure_id"] == [488]'
- comment: assert that status light response was sent by the device with status ON
  provider: python
  type: assert
  expression: 'loads(variables["messages"][0])["bin_value"] == [1]'
Remember that the messages variable is an array of string messages? We take the first message (with variables["messages"][0] you get the first raw payload), parse the JSON payload to obtain a dictionary (loads(variables["messages"][0])) so that assertions are simpler, and then assert that we have the expected values under certain dictionary keys.

As you can see pytest-play is not 100% codeless by design because it requires a very basic Python expressions knowledge, for example:
  • variables["something"] == 0
  • variables["something"] != 5
  • not variables["something"]
  • variables["a_boolean"] is True
  • variables["a_boolean"] is False
  • variables["something"] == "yet another value"
  • variables["response"]["status"] == "OK" and not variables["response"]["error_message"]
  • "VALUE" in variables["another_value"]
  • len([item for item in variables["mylist"] if item > 0]) == 0
  • variables["a_string"].startswith("foo")
One-line protected Python-based expressions let you express any kind of waiters/assertions without having to extend the framework's command syntax by introducing an exotic YAML-based meta language that would never be able to express all the possible use cases. The basic idea behind Python expressions is that even for non-programmers it is easier to learn the basics of Python assertions than to figure out how to express assertions in an obscure meta language.

pytest-play is not related to MQTT only: it lets you write actions and assertions against a real browser with Selenium, API/REST, websockets and more.

So if you have to automate a task for a device simulator, a device driver, some simple API calls with assertions, an asynchronous wait for a condition with timeouts, browser interactions, cross-technology actions (e.g., publish a MQTT message and poll a HTTP response until something happens) or decoupled test data parametrization, pytest-play can help, even if you are not a programmer: you don't have to deal with imports, function or class definitions, and it is always available if you have Docker installed.


And now you can show off with shining metrics!

Run your scenario

And finally, assuming that you are already inside your project folder, let's run our scenario using Docker (remember --network="host" if you want to send metrics to a server listening on localhost):
docker run --rm -it -v $(pwd):/src --network="host" davidemoro/pytest-play --variables env.yml --junit-xml results.xml --stats-d --stats-prefix play test_light_on.yml
The previous command will run our scenario printing the results and if there is a stats server listening on localhost metrics will be collected and you will be able to create live dashboards like the following one:
statsd/Graphite response time dashboard
and metrics are stored in the results.xml file too:
<?xml version="1.0" encoding="utf-8"?><testsuite errors="0" failures="0" name="pytest" skipped="0" tests="1" time="10.664"><testcase classname="test_on.yml" file="test_on.yml" name="test_on.yml[test_data0]" time="10.477"><properties><property name="response_time" value="7.850502967834473"/></properties><system-out>...

Sum up

This was a very long article and we talked about a lot of technologies and tools. So if you are not yet familiar with some of them, it's time to read some documentation and play with some hello world examples.

Any feedback is welcome!

Do you like pytest-play?

Let's get in touch for any suggestions, contributions or comments. Contributions will be very much appreciated!

April 18, 2019 07:39 AM UTC


Catalin George Festila

About psychopy tool.

A good definition of this tool can be found on the Wikipedia website: "2002: PsychoPy was originally written by Peirce as a proof of concept - that a high-level scripting language could generate experimental stimuli in real time (existing solutions, such as Psychtoolbox, had to pre-generate movies or use CLUT animation techniques)." The installation of this Python module is very simple:

C:\Python373\

April 18, 2019 06:30 AM UTC


Wingware Blog

Using Anaconda with Wing Python IDE

In this issue of Wing Tips we take a look at how to use the Anaconda Distribution of Python with Wing.

Anaconda's key advantage is its easy-to-use package management system. Anaconda comes with a large collection of third-party packages that are not in an installation of Python from python.org. Many additional packages can be installed quickly and easily as needed from the command line with conda install, as shown below.
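
For example, to add a package to the current environment (the package name here is just an illustration):

$ conda install requests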

Anaconda's marketing focuses on data science and machine learning applications, but its extensive packages library makes it a good option also for other types of desktop and web development.

There is much ongoing work in the world of Python packaging but, at least for now, Anaconda seems to fail less often than other solutions for resolving dependencies and installing necessary packages automatically.

Configuring the Environment

To use Anaconda with Wing, simply set the Python Executable used in your Wing project to Anaconda's python or python.exe. How you do this depends on whether you are creating a new project or have an existing project that you want to modify.

New Projects

To create a new Wing project that uses Anaconda, select New Project from the Project menu and then under Python Executable select Custom and enter the full path to Anaconda's python or python.exe.

In many cases, Wing will automatically find Anaconda and include it in the drop down menu to the right of the entry area that is enabled when Custom is chosen:


Shown Above: Choose "New Project" from the "Project Menu", select "Custom" for "Python Executable", find Anaconda in the drop down menu, and press "OK" to create the new project.

If Anaconda does not appear in Wing's list of available Python installations, and you don't know the full path to use, then you can start Anaconda from the command line outside of Wing and use the value printed by the following, when run interactively:

import sys
print(sys.executable)

Existing Projects

To change an existing project to use Anaconda Python, the steps are the same as above except that the change is made under Project Properties in the Project menu.

Virtualenv

If you are using virtualenv with Anaconda, use the full path of the virtualenv's python.exe or python instead. When in doubt, you can print sys.executable as already described to find the correct full path to use in Wing's Python Executable configuration.

☕ Et Voila!

In most cases, that is all that you need to do. Wing will start using your Anaconda installation immediately for source intelligence, for the next debug session, and in the integrated Python Shell after it is restarted from its Options menu.

Fixing Import Errors on Windows

On Windows, Anaconda may fail to load DLLs when its python.exe is run directly from the command line or within Wing. This is due to the fact that by default the Anaconda installer no longer sets the PATH that it needs to run, in order to avoid conflicting with different Python installations on the same system.

Simple Solution

A simple solution to fix this problem is to run Scripts\activate (located within your Anaconda installation) on the command line, and then start Wing from within the activated environment, for example with c:\Program Files (x86)\Wing Pro 7\bin\wing.exe. This causes Wing to inherit the necessary PATH that was set up by Anaconda's activate script.

This solution works well if you will be using the same Anaconda installation for all projects that you open in a session. If you change to projects that use a different Python installation, you will need to quit and restart Wing in the correct environment.

Recommended Solutions

Our recommended solutions require a bit more work up front, but once in place they automatically set the necessary PATH without the potential for polluting other Python installations with unwanted environment.

A good one-time fix is to create a small wrapper script called wrap_anaconda.bat and place the following into it:

@echo off
set ANACONDA_DIR=%USERPROFILE%\Anaconda3
set PATH=%PATH%;%ANACONDA_DIR%;%ANACONDA_DIR%\DLLS;%ANACONDA_DIR%\Library;%ANACONDA_DIR%\Library\bin;%ANACONDA_DIR%\Scripts;%ANACONDA_DIR%\mingw-w64\bin
%ANACONDA_DIR%\python.exe %*

You may need to change the value of ANACONDA_DIR to match where Anaconda is installed on your system.

Then set Python Executable in Wing's Project Properties to the full path of this batch file. This sets up the necessary environment and then runs Anaconda's Python. No other configuration is necessary, and this script can also be used on the command line or in other IDEs.

A similar solution that does not require creating a wrapper script is to set the necessary PATH in Wing's Project Properties, from the Project menu. Add the following to Environment under the Environment tab:

ANACONDA_DIR=${USERPROFILE}\Anaconda3
PATH=${PATH};${ANACONDA_DIR};${ANACONDA_DIR}\DLLS;${ANACONDA_DIR}\Library;
  ${ANACONDA_DIR}\Library\bin;${ANACONDA_DIR}\Scripts;${ANACONDA_DIR}\mingw-w64\bin

Again, you may need to change the value of ANACONDA_DIR to match where Anaconda is installed on your system.

Both of these solutions work well if there are multiple Python installations on your system, because it ensures that the correct PATH is always set when the project is open, allowing other projects to use a different environment.



That's it for now! In next week's Wing Tips we'll get back to looking at some of the lesser-known but useful features in Wing.

April 18, 2019 01:00 AM UTC


Matt Layman

Completing Account Deactivation on Building SaaS with Python and Django

In the latest episode of Building SaaS with Python and Django, we completed the account deactivation workflow of the Django app.

This included:

The recording is available on YouTube and the full transcript is below.

April 18, 2019 12:00 AM UTC

April 17, 2019


Catalin George Festila

Update python modules of 3.73 version.

Today we tested an older tool with the new version of Python, 3.7.3. This is a tool that will help you update your Python modules. Here's how to install it:

C:\Python373\Scripts>pip install pip-review
Collecting pip-review
...
Requirement already satisfied: pyparsing>=2.0.2 in c:\python373\lib\site-packages (from packaging->pip-review) (2.4.0)
Installing collected packages: packaging, pip-review

April 17, 2019 03:24 PM UTC


Real Python

How to Work With a PDF in Python

The Portable Document Format or PDF is a file format that can be used to present and exchange documents reliably across operating systems. While the PDF was originally invented by Adobe, it is now an open standard that is maintained by the International Organization for Standardization (ISO). You can work with a preexisting PDF in Python by using the PyPDF2 package.

PyPDF2 is a pure-Python package that you can use for many different types of PDF operations.

By the end of this article, you’ll know how to do the following:

Let’s get started!


History of pyPdf, PyPDF2, and PyPDF4

The original pyPdf package was released way back in 2005. The last official release of pyPdf was in 2010. After a lapse of around a year, a company called Phasit sponsored a fork of pyPdf called PyPDF2. The code was written to be backwards compatible with the original and worked quite well for several years, with its last release being in 2016.

There was a brief series of releases of a package called PyPDF3, and then the project was renamed to PyPDF4. All of these projects do pretty much the same thing, but the biggest difference between pyPdf and PyPDF2+ is that the latter versions added Python 3 support. There is a different fork of the original pyPdf that supports Python 3, but it has not been maintained for many years.

While PyPDF2 was recently abandoned, the new PyPDF4 does not have full backwards compatibility with PyPDF2. Most of the examples in this article will work perfectly fine with PyPDF4, but there are some that do not, which is why PyPDF4 is not featured more heavily in this article. Feel free to swap out the imports for PyPDF2 with PyPDF4 and see how it works for you.

pdfrw: An Alternative

Patrick Maupin created a package called pdfrw that can do many of the same things that PyPDF2 does. You can use pdfrw for all of the same sorts of tasks that you will learn how to do in this article for PyPDF2, with the notable exception of encryption.

The biggest difference when it comes to pdfrw is that it integrates with the ReportLab package so that you can take a preexisting PDF and build a new one with ReportLab using some or all of the preexisting PDF.

Installation

Installing PyPDF2 can be done with pip or conda if you happen to be using Anaconda instead of regular Python.

Here’s how you would install PyPDF2 with pip:

$ pip install pypdf2
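
If you happen to use Anaconda, the conda equivalent should be the following (assuming the package is published on the conda-forge channel):

$ conda install -c conda-forge pypdf2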

The install is quite quick as PyPDF2 does not have any dependencies. You will likely spend as much time downloading the package as you will installing it.

Now let’s move on and learn how to extract some information from a PDF.

How to Extract Document Information From a PDF in Python

You can use PyPDF2 to extract metadata and some text from a PDF. This can be useful when you’re doing certain types of automation on your preexisting PDF files.

Here are the current types of data that can be extracted:

  1. Author
  2. Creator
  3. Producer
  4. Subject
  5. Title
  6. Number of pages

You need to go find a PDF to use for this example. You can use any PDF you have handy on your machine. To make things easy, I went to Leanpub and grabbed a sample of one of my books for this exercise. The sample you want to download is called reportlab-sample.pdf.

Let’s write some code using that PDF and learn how you can get access to these attributes:

# extract_doc_info.py

from PyPDF2 import PdfFileReader

def extract_information(pdf_path):
    with open(pdf_path, 'rb') as f:
        pdf = PdfFileReader(f)
        information = pdf.getDocumentInfo()
        number_of_pages = pdf.getNumPages()

    txt = f"""
    Information about {pdf_path}: 

    Author: {information.author}
    Creator: {information.creator}
    Producer: {information.producer}
    Subject: {information.subject}
    Title: {information.title}
    Number of pages: {number_of_pages}
    """

    print(txt)
    return information

if __name__ == '__main__':
    path = 'reportlab-sample.pdf'
    extract_information(path)

Here you import PdfFileReader from the PyPDF2 package. The PdfFileReader is a class with several methods for interacting with PDF files. In this example, you call .getDocumentInfo(), which will return an instance of DocumentInformation. This contains most of the information that you’re interested in. You also call .getNumPages() on the reader object, which returns the number of pages in the document.

Note: That last code block uses Python 3’s new f-strings for string formatting. If you’d like to learn more, you can check out Python 3’s f-Strings: An Improved String Formatting Syntax (Guide).

The information variable has several instance attributes that you can use to get the rest of the metadata you want from the document. You print out that information and also return it for potential future use.

While PyPDF2 has .extractText(), which can be used on its page objects (not shown in this example), it does not work very well. Some PDFs will return text and some will return an empty string. When you want to extract text from a PDF, you should check out the PDFMiner project instead. PDFMiner is much more robust and was specifically designed for extracting text from PDFs.
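
If you want a quick taste of that alternative, recent releases of the pdfminer.six fork provide a high-level extract_text() helper. Here is a minimal sketch, reusing the sample PDF from earlier:

# extract_text_pdfminer.py

from pdfminer.high_level import extract_text

# Pull all of the text out of the sample PDF in one call.
text = extract_text('reportlab-sample.pdf')
print(text)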

Now you’re ready to learn about rotating PDF pages.

How to Rotate Pages

Occasionally, you will receive PDFs that contain pages that are in landscape mode instead of portrait mode, or perhaps they are even upside down. This can happen when someone scans a document to PDF or email. You could print the document out and read the paper version, or you can use the power of Python to rotate the offending pages.

For this example, you can go and pick out a Real Python article and print it to PDF.

Let’s learn how to rotate a few of the pages of that article with PyPDF2:

# rotate_pages.py

from PyPDF2 import PdfFileReader, PdfFileWriter

def rotate_pages(pdf_path):
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(pdf_path)
    # Rotate page 90 degrees to the right
    page_1 = pdf_reader.getPage(0).rotateClockwise(90)
    pdf_writer.addPage(page_1)
    # Rotate page 90 degrees to the left
    page_2 = pdf_reader.getPage(1).rotateCounterClockwise(90)
    pdf_writer.addPage(page_2)
    # Add a page in normal orientation
    pdf_writer.addPage(pdf_reader.getPage(2))

    with open('rotate_pages.pdf', 'wb') as fh:
        pdf_writer.write(fh)

if __name__ == '__main__':
    path = 'Jupyter_Notebook_An_Introduction.pdf'
    rotate_pages(path)

For this example, you need to import the PdfFileWriter in addition to PdfFileReader because you will need to write out a new PDF. rotate_pages() takes in the path to the PDF that you want to modify. Within that function, you will need to create a writer object that you can name pdf_writer and a reader object called pdf_reader.

Next, you can use .getPage() to get the desired page. Here you grab page zero, which is the first page. Then you call the page object’s .rotateClockwise() method and pass in 90 degrees. Then for page two, you call .rotateCounterClockwise() and pass it 90 degrees as well.

Note: The PyPDF2 package only allows you to rotate a page in increments of 90 degrees. You will receive an AssertionError otherwise.

After each call to the rotation methods, you call .addPage(). This will add the rotated version of the page to the writer object. The last page that you add to the writer object is page 3 without any rotation done to it.

Finally, you write out the new PDF using .write(). It takes a file-like object as its parameter. This new PDF will contain three pages: the first two will be rotated in opposite directions of each other and be in landscape orientation, while the third page keeps its normal orientation.

Now let’s learn how you can merge multiple PDFs into one.

How to Merge PDFs

There are many situations where you will want to take two or more PDFs and merge them together into a single PDF. For example, you might have a standard cover page that needs to go on to many types of reports. You can use Python to help you do that sort of thing.

For this example, you can open up a PDF and print a page out as a separate PDF. Then do that again, but with a different page. That will give you a couple of inputs to use for example purposes.

Let’s go ahead and write some code that you can use to merge PDFs together:

# pdf_merging.py

from PyPDF2 import PdfFileReader, PdfFileWriter

def merge_pdfs(paths, output):
    pdf_writer = PdfFileWriter()

    for path in paths:
        pdf_reader = PdfFileReader(path)
        for page in range(pdf_reader.getNumPages()):
            # Add each page to the writer object
            pdf_writer.addPage(pdf_reader.getPage(page))

    # Write out the merged PDF
    with open(output, 'wb') as out:
        pdf_writer.write(out)

if __name__ == '__main__':
    paths = ['document1.pdf', 'document2.pdf']
    merge_pdfs(paths, output='merged.pdf')

You can use merge_pdfs() when you have a list of PDFs that you want to merge together. You will also need to know where to save the result, so this function takes a list of input paths and an output path.

Then you loop over the inputs and create a PDF reader object for each of them. Next you iterate over all the pages in each PDF file and use .addPage() to add each of those pages to the writer object.

Once you’re finished iterating over all of the pages of all of the PDFs in your list, you will write out the result at the end.

One item I would like to point out is that you could enhance this script a bit by adding in a range of pages to be merged, in case you don’t want to merge all of the pages of each PDF. If you’d like a challenge, you could also create a command line interface for this function using Python’s argparse module.
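
As a sketch of that challenge, a thin argparse wrapper might look something like this. Note that pdf_merging_cli.py is a hypothetical file name, and it assumes the merging script above was saved as pdf_merging.py:

# pdf_merging_cli.py

import argparse

from pdf_merging import merge_pdfs

parser = argparse.ArgumentParser(description='Merge several PDFs into one.')
parser.add_argument('paths', nargs='+', help='paths of the PDFs to merge, in order')
parser.add_argument('-o', '--output', default='merged.pdf',
                    help='path of the merged PDF (default: merged.pdf)')

args = parser.parse_args()
merge_pdfs(args.paths, output=args.output)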

Let’s find out how to do the opposite of merging!

How to Split PDFs

There are times where you might have a PDF that you need to split up into multiple PDFs. This is especially true of PDFs that contain a lot of scanned-in content, but there are a plethora of good reasons for wanting to split a PDF.

Here’s how you can use PyPDF2 to split your PDF into multiple files:

# pdf_splitting.py

from PyPDF2 import PdfFileReader, PdfFileWriter

def split(path, name_of_split):
    pdf = PdfFileReader(path)
    for page in range(pdf.getNumPages()):
        pdf_writer = PdfFileWriter()
        pdf_writer.addPage(pdf.getPage(page))

        output = f'{name_of_split}{page}.pdf'
        with open(output, 'wb') as output_pdf:
            pdf_writer.write(output_pdf)

if __name__ == '__main__':
    path = 'Jupyter_Notebook_An_Introduction.pdf'
    split(path, 'jupyter_page')

In this example, you once again create a PDF reader object and loop over its pages. For each page in the PDF, you will create a new PDF writer instance and add a single page to it. Then you will write that page out to a uniquely named file. When the script is finished running, you should have each page of the original PDF split into separate PDFs.

Now let’s take a moment to learn how you can add a watermark to your PDF.

How to Add Watermarks

Watermarks are identifying images or patterns on printed and digital documents. Some watermarks can only be seen in special lighting conditions. The reason watermarking is important is that it allows you to protect your intellectual property, such as your images or PDFs. Another term for watermark is overlay.

You can use Python and PyPDF2 to watermark your documents. You need to have a PDF that only contains your watermark image or text.

Let’s learn how to add a watermark now:

# pdf_watermarker.py

from PyPDF2 import PdfFileWriter, PdfFileReader

def create_watermark(input_pdf, output, watermark):
    watermark_obj = PdfFileReader(watermark)
    watermark_page = watermark_obj.getPage(0)

    pdf_reader = PdfFileReader(input_pdf)
    pdf_writer = PdfFileWriter()

    # Watermark all the pages
    for page in range(pdf_reader.getNumPages()):
        page = pdf_reader.getPage(page)
        page.mergePage(watermark_page)
        pdf_writer.addPage(page)

    with open(output, 'wb') as out:
        pdf_writer.write(out)

if __name__ == '__main__':
    create_watermark(
        input_pdf='Jupyter_Notebook_An_Introduction.pdf', 
        output='watermarked_notebook.pdf',
        watermark='watermark.pdf')

create_watermark() accepts three arguments:

  1. input_pdf: the PDF file path to be watermarked
  2. output: the path you want to save the watermarked version of the PDF
  3. watermark: a PDF that contains your watermark image or text

In the code, you open up the watermark PDF and grab just the first page from the document as that is where your watermark should reside. Then you create a PDF reader object using the input_pdf and a generic pdf_writer object for writing out the watermarked PDF.

The next step is to iterate over the pages in the input_pdf. This is where the magic happens. You will need to call .mergePage() and pass it the watermark_page. When you do that, it will overlay the watermark_page on top of the current page. Then you add that newly merged page to your pdf_writer object.

Finally, you write the newly watermarked PDF out to disk, and you’re done!

The last topic you will learn about is how PyPDF2 handles encryption.

How to Encrypt a PDF

PyPDF2 currently only supports adding a user password and an owner password to a preexisting PDF. In PDF land, an owner password will basically give you administrator privileges over the PDF and allow you to set permissions on the document. On the other hand, the user password just allows you to open the document.

As far as I can tell, PyPDF2 doesn’t actually allow you to set any permissions on the document even though it does allow you to set the owner password.

Regardless, this is how you can add a password, which will also inherently encrypt the PDF:

# pdf_encrypt.py

from PyPDF2 import PdfFileWriter, PdfFileReader

def add_encryption(input_pdf, output_pdf, password):
    pdf_writer = PdfFileWriter()
    pdf_reader = PdfFileReader(input_pdf)

    for page in range(pdf_reader.getNumPages()):
        pdf_writer.addPage(pdf_reader.getPage(page))

    pdf_writer.encrypt(user_pwd=password, owner_pwd=None, 
                       use_128bit=True)

    with open(output_pdf, 'wb') as fh:
        pdf_writer.write(fh)

if __name__ == '__main__':
    add_encryption(input_pdf='reportlab-sample.pdf',
                   output_pdf='reportlab-encrypted.pdf',
                   password='twofish')

add_encryption() takes in the input and output PDF paths as well as the password that you want to add to the PDF. It then opens a PDF writer and a reader object, as before. Since you will want to encrypt the entire input PDF, you will need to loop over all of its pages and add them to the writer.

The final step is to call .encrypt(), which takes the user password, the owner password, and whether or not 128-bit encryption should be added. The default is for 128-bit encryption to be turned on. If you set it to False, then 40-bit encryption will be applied instead.

Note: PDF encryption uses either RC4 or AES (Advanced Encryption Standard) to encrypt the PDF according to pdflib.com.

Just because you have encrypted your PDF does not mean it is necessarily secure. There are tools to remove passwords from PDFs. If you’d like to learn more, Carnegie Mellon University has an interesting paper on the topic.
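
If you want to confirm that the encryption took effect, you can open the new file back up and unlock it with the reader's .decrypt() method. Here is a minimal sketch using the file produced above:

# pdf_decrypt_check.py

from PyPDF2 import PdfFileReader

pdf_reader = PdfFileReader('reportlab-encrypted.pdf')

# An encrypted PDF has to be decrypted before its pages can be read.
if pdf_reader.isEncrypted:
    pdf_reader.decrypt('twofish')

print(pdf_reader.getNumPages())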

Conclusion

The PyPDF2 package is quite useful and is usually pretty fast. You can use PyPDF2 to automate large jobs and leverage its capabilities to help you do your job better!

In this tutorial, you learned how to do the following:

  1. Extract document information from a PDF
  2. Rotate pages
  3. Merge PDFs
  4. Split PDFs
  5. Add watermarks
  6. Encrypt a PDF

Also keep an eye on the newer PyPDF4 package, as it will likely replace PyPDF2 soon. You might also want to check out pdfrw, which can do many of the same things that PyPDF2 can do.

Further Reading

If you’d like to learn more about working with PDFs in Python, you should check out some of the following resources for more information:



April 17, 2019 02:00 PM UTC


codingdirectional

Sum the factorial of a list object with python

Before we leave CodeWars for a while, here is another quick solution to one of the 7 Kyu questions. The question goes like this: given a list of numbers, find the total of all the factorials of those numbers. For example,

sum_factorial([6, 4, 2]) # will return 746

Because of the complexity of the question, we will create two functions: the first one accepts the list input and the second one does the actual factorial calculation; the first one then sums the factorials up.

def sum_factorial(lst):
    # Sum up the factorial of every number in the list.
    total = 0
    for num in lst:
        total += factorial(num)
    return total

def factorial(num):
    # Recursive factorial: num! = num * (num - 1)!, and 1! = 0! = 1.
    if num > 1:
        return num * factorial(num - 1)
    return 1
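
For comparison, the hand-rolled recursion can also be replaced by the standard library. This one-liner is not part of the CodeWars submission above, just an alternative using math.factorial:

import math

def sum_factorial(lst):
    # Sum the factorial of every number in the list with the stdlib.
    return sum(math.factorial(num) for num in lst)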

That will do the job! In the next article, I will start to write about my Python journey, because I believe you have seen enough code and would enjoy an interesting series of stories about Python programming. Next month we will start a new Python project together. Follow me on Twitter or share this post if you want to. If you really like this post, please consider donating a dollar to help this site out!

See you in the next post!

April 17, 2019 01:02 PM UTC


Codementor

Selenium Using Python: All You Need to Know

This article on Selenium Using Python will give you an overview of the bindings between Selenium and Python, as well as how to locate elements in Selenium using Python.

April 17, 2019 07:20 AM UTC


codingdirectional

Reverse a number with Python

In this snippet, we are going to create a Python function that reverses the digits of a number. This is one of the questions on Codewars. If you enter -123 into the function you will get -321. If you enter 1000 you will get 1. Below is the entire solution.

def reverse_number(n):
    num_list = list(str(n))
    num_list.reverse()

    # For a negative number, the minus sign ends up at the end of
    # the reversed list; move it back to the front.
    if "-" in num_list:
        num_list.pop(len(num_list) - 1)
        num_list.insert(0, "-")

    return int("".join(num_list))
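
For comparison, here is a shorter alternative that reverses the digits with slice notation. It is not the submitted solution, and reverse_number_slice is just an illustrative name:

def reverse_number_slice(n):
    # str(abs(n))[::-1] reverses the digit string; the sign is
    # reattached afterwards.
    reversed_digits = str(abs(n))[::-1]
    return int(reversed_digits) * (-1 if n < 0 else 1)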

If you follow my website, you know that I always write simple Python code and post it here. Starting from the next article, though, I will stop posting Python code for a while and start to talk about my Python journey and the cool software related to Python. I hope you will appreciate this new style of writing, which should make learning Python a lot more fun for everyone than just staring at boring (and sometimes long) Python snippets.

Like, share or follow me on Twitter.

If you have any solution for this problem do comment below.

April 17, 2019 05:56 AM UTC


Quansight Labs Blog

MOA: a theory for composable and verifiable tensor computations

Python-moa (mathematics of arrays) is an approach to a high-level tensor compiler that is based on the work of Lenore Mullins and her dissertation. A high-level compiler is necessary because there are many optimizations that a low-level compiler such as gcc will miss. It is trying to solve many of the same problems as other technologies, such as the taco compiler and the xla compiler. However, it takes a much different approach than the others, guided by the following principles:

  1. What is the shape? Everything has a shape: scalars, vectors, arrays, operations, and functions.
  2. What are the given indices and operations required to produce a given index in the result? (See the sketch just below.)
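
As a hedged illustration of principle 2 (plain NumPy-flavored Python, not python-moa's actual API): if the compiler knows that result[i] == (a[i] + b[i]) * c[i], every output element can be produced directly from the inputs, with no intermediate a + b array materialized.

import numpy as np

# Naive evaluation materializes the intermediate (a + b) array in full.
def add_then_scale_naive(a, b, c):
    tmp = a + b
    return tmp * c

# Index-level ("fused") evaluation computes each output element directly.
def add_then_scale_fused(a, b, c):
    out = np.empty_like(a)
    for i in range(a.shape[0]):
        out[i] = (a[i] + b[i]) * c[i]
    return out

a = b = c = np.arange(4.0)
assert np.allclose(add_then_scale_naive(a, b, c),
                   add_then_scale_fused(a, b, c))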

Having a compiler guided by these principles allows for high-level reductions that other compilers will miss, and allows for optimization of algorithms as a whole. Keep in mind that MOA is NOT a compiler. It is a theory that guides compiler development. Since python-moa is based on theory, we get unique properties that other compilers cannot guarantee:

Read more… (5 min remaining to read)

April 17, 2019 05:00 AM UTC


Ned Batchelder

Startup.py

Someone recently asked how to permanently change the prompt in the Python interactive REPL. The answer is you can point the PYTHONSTARTUP environment variable at a Python file, and that file will be executed every time you enter the interactive prompt.

I use this to import modules I often want to use, define helpers, and configure my command history.

In my .bashrc I have:

export PYTHONSTARTUP=~/.startup.py

Then my .startup.py file is:

# Ned's startup.py file, loaded into interactive python prompts.
# Has to work on both 2.x and 3.x

print("(.startup.py)")

import collections, datetime, itertools, math, os, pprint, re, sys, time
print("(imported collections, datetime, itertools, math, os, pprint, re, sys, time)")

pp = pprint.pprint

# A function for pasting code into the repl.
def paste():
    import textwrap
    exec(textwrap.dedent(sys.stdin.read()), globals())

# Readline and history support
def hook_up_history():
    try:
        # Not sure why this module is missing in some places, but deal with it.
        import readline
    except ImportError:
        print("No readline, use ^H")
    else:
        import atexit
        import os
        import rlcompleter

        history_path = os.path.expanduser(
            "~/.pyhistory{0}".format(sys.version_info[0])
        )

        def save_history(history_path=history_path):
            import readline
            readline.write_history_file(history_path)

        if os.path.exists(history_path):
            readline.read_history_file(history_path)

        atexit.register(save_history)

# Don't do history stuff if we are IPython, it has its own thing.
is_ipython = 'In' in globals()
if not is_ipython:
    hook_up_history()

# Get rid of globals we don't want.
del is_ipython, hook_up_history

A few things could use an explanation. The paste() function lets me paste code into the REPL that has blank lines in it, or is indented. Basically, I can copy code from somewhere and use paste() to paste it into the prompt without having to fix those things first. Run paste(), then paste the code, then type an EOF indicator (Ctrl-D or Ctrl-Z, depending on your OS). The pasted code will be run as if it had been entered correctly.
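
For example, a paste() session might look something like this (the greet() function is just a made-up illustration):

>>> paste()
    def greet(name):

        print("Hello, " + name)
<Ctrl-D>
>>> greet("world")
Hello, world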

The history stuff gives me history that persists across Python invocations, and keeps the Python 2 history separate from the Python 3 history. “pp” is very handy to have as a short alias.

Of course, you can put anything you want in your own .startup.py file. It’s only run for interactive sessions, not when you are running programs, so you don’t have to worry that you will corrupt important programs.

April 17, 2019 12:48 AM UTC