
Planet Python

Last update: August 25, 2016 01:49 AM

August 24, 2016


PyCon Sweden

Thanks to the diver world I was able to meet like minded people like me who use all the products

August 24, 2016 10:41 PM

Wow this is amazing!!!!

August 24, 2016 10:36 PM


Nikola

Automating Nikola rebuilds with Travis CI

In this guide, we’ll set up Travis CI to rebuild a Nikola website and host it on GitHub Pages.

Why?

By using Travis CI to build your site, you can easily blog from anywhere you can edit text files, which means you can blog with only a web browser and GitHub.com, or try a service like Prose.io. You also won’t need to install Nikola and Python to write. You won’t even need a real computer: a mobile phone could probably access one of those services and post something.

Caveats

  • The build might take a couple of minutes to finish (1:30 for the demo site; YMMV)
  • When you commit and push to GitHub, the site will be published unconditionally. If you don’t have a copy of Nikola for local use, there is no way to preview your site.

What you need

  • A computer for the initial setup that can run Nikola and the Travis CI command-line tool (written in Ruby) — you need a Unix-like system (Linux, OS X, *BSD, etc.); Windows users should try Bash on Ubuntu on Windows (available in Windows 10 starting with Anniversary Update) or a Linux virtual machine.
  • A GitHub account (free)
  • A Travis CI account linked to your GitHub account (free)

Setting up Nikola

Start by creating a new Nikola site and customizing it to your liking. Follow the Getting Started guide. You might also want to add support for other input formats, namely Markdown, but this is not a requirement (unless you want to use Prose.io).

After you’re done, you must configure deploying to GitHub in Nikola. Make your first deployment from your local computer and make sure your site works right. Don’t forget to set up .gitignore. Moreover, you must set GITHUB_COMMIT_SOURCE = False — otherwise, Travis CI will go into an infinite loop.
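For reference, these are the relevant settings in conf.py. This is a minimal sketch using Nikola’s default branch names, which may differ for your repository:

GITHUB_SOURCE_BRANCH = 'master'    # branch holding the site sources
GITHUB_DEPLOY_BRANCH = 'gh-pages'  # branch served by GitHub Pages
GITHUB_REMOTE_NAME = 'origin'
# Must be False: if Nikola also committed the sources, every deploy
# would push a new commit and trigger another Travis CI build, forever.
GITHUB_COMMIT_SOURCE = False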

If everything works, you can make some change to your site (so you see that rebuilding works), but don’t commit it just yet.

Setting up Travis CI

Next, we need to set up Travis CI. To do that, make sure you have the ruby and gem tools installed on your system. If you don’t have them, install them from your OS package manager.

First, download/copy the .travis.yml file (note the dot in the beginning; the downloaded file doesn’t have it!) and adjust the real name and e-mail (used for commits; lines 12-13) and the username/repo name on line 21. If you want to render your site in another language besides English, add the appropriate Ubuntu language pack to the list in this file.

.travis.yml

# Travis CI config for automated Nikola blog deployments
language: python
cache: apt
sudo: false
addons:
  apt:
    packages:
    - language-pack-en-base
python:
- 3.5
before_install:
- git config --global user.name 'Travis CI'
- git config --global user.email 'travis@invalid'
- git config --global push.default 'simple'
- pip install --upgrade pip wheel
- echo -e 'Host github.com\n    StrictHostKeyChecking no' >> ~/.ssh/config
- eval "$(ssh-agent -s)"
- chmod 600 id_rsa
- ssh-add id_rsa
- git remote rm origin
- git remote add origin git@github.com:USERNAME/REPO.git
- git fetch origin master
- git branch master FETCH_HEAD
install:
- pip install 'Nikola[extras]'
script:
- nikola build && nikola github_deploy -m 'Nikola auto deploy [ci skip]'
notifications:
    email:
        on_success: change
        on_failure: always

Next, we need to generate an SSH key for Travis CI.

echo id_rsa >> .gitignore
echo id_rsa.pub >> .gitignore
ssh-keygen -C TravisCI -f id_rsa -N ''

Open the id_rsa.pub file and copy its contents. Go to GitHub → your page repository → Settings → Deploy keys and add it there. Make sure Allow write access is checked.

And now, time for our venture into the Ruby world. Install the travis gem:

gem install --user-install travis

You can then use the travis command if you have configured your $PATH for RubyGems; if you haven’t, the tool will output a path to use (e.g. ~/.gem/ruby/2.0.0/bin/travis).

We’ll use the Travis CI command-line client to log in (using your GitHub password), enable the repository and encrypt our SSH key. Run the following three commands, one at a time (they are interactive):

travis login
travis enable
travis encrypt-file id_rsa --add

Commit everything to GitHub:

git add .
git commit -am "Automate builds with Travis CI"

Hopefully, Travis CI will now build and deploy your site. Check the Travis CI website or your e-mail for a notification. If there are any errors, make sure you followed this guide to the letter.

August 24, 2016 06:05 PM


Python Piedmont Triad User Group

PYPTUG Monthly meeting August 30 2016 (flask-restplus, openstreetmap)

Come join PYPTUG at our next monthly meeting (August 30th 2016) to learn more about the Python programming language, modules and tools. Python is the perfect language to learn if you've never programmed before, and at the other end it is also the perfect tool that no expert would do without. Monthly meetings are in addition to our project nights.




What

Meeting will start at 6:00pm.

We will open on an Intro to PYPTUG and on how to get started with Python, PYPTUG activities and members projects, in particular some updates on the Quadcopter project, then on to News from the community.

Then on to the main talk.

 
Main Talk: Building a RESTful API with Flask-Restplus and Swagger
by Manikandan Ramakrishnan
Bio:
Manikandan Ramakrishnan is a Data Engineer with Inmar Inc.

Abstract:
Building an API and documenting it properly is like having your cake and eating it too. Flask-RESTPlus is a great Flask extension that makes it really easy to build robust REST APIs quickly with minimal setup. With its built-in Swagger integration, it is extremely simple to document the endpoints and to enforce request/response models.
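For a taste of what that looks like, here is a minimal, hypothetical sketch (the Todo model and route are illustrative, not taken from the talk):

from flask import Flask
from flask_restplus import Api, Resource, fields

app = Flask(__name__)
api = Api(app, title='Demo API', doc='/docs')  # Swagger UI served at /docs

todo = api.model('Todo', {
    'id': fields.Integer(readonly=True),
    'task': fields.String(required=True),
})

@api.route('/todos/<int:todo_id>')
class TodoItem(Resource):
    @api.marshal_with(todo)  # enforces the response model
    def get(self, todo_id):
        return {'id': todo_id, 'task': 'write the talk'}

if __name__ == '__main__':
    app.run(debug=True)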

Lightning talks! 


We will have some time for extemporaneous "lightning talks" of 5-10 minutes. If you'd like to do one, some talk suggestions were provided here if you are looking for inspiration, or talk about a project you are working on.
One lightning talk will cover OpenStreetMap

When

Tuesday, August 30th 2016
Meeting starts at 6:00PM

Where

Wake Forest University, close to Polo Rd and University Parkway:

Wake Forest University, Winston-Salem, NC 27109

 Map this

See also this campus map (PDF) and also the Parking Map (PDF) (Manchester hall is #20A on the parking map)

And speaking of parking: parking after 5pm is on a first-come, first-served basis. The official parking policy is:
"Visitors can park in any general parking lot on campus. Visitors should avoid reserved spaces, faculty/staff lots, fire lanes or other restricted area on campus. Frequent visitors should contact Parking and Transportation to register for a parking permit."

Mailing List


Don't forget to sign up to our user group mailing list:

https://groups.google.com/d/forum/pyptug?hl=en

It is the only step required to become a PYPTUG member.

RSVP on meetup:
https://www.meetup.com/PYthon-Piedmont-Triad-User-Group-PYPTUG/events/233095834/

August 24, 2016 03:28 PM


Machinalis

Searching for aliens

First contact

Have you ever seen through a plane’s window, or in Google Maps, some precisely defined circles on the Earth? Typically many of them, close to each other? Something like this:

Do you know what they are? If you are thinking of irrigation circles, you are wrong. Do not believe the lies of the conspirators. Those are, undoubtedly, proof of extraterrestrial visitors on Earth.

As I want to be ready for the first contact I need to know where these guys are working. It should be easy with so many satellite images at hand.

So I asked the machine learning experts around here to lend me a hand. Surprisingly, they refused. Mumbling I don’t know what about irrigation circles. Very suspicious. But something else they mentioned is that a better initial approach would be to use some computer-vision detection technique.

So, there you go. Those damn conspirators gave me the key.

Aliens

Circles detection

Now, in the Python ecosystem, computer vision means OpenCV. And as it happens, this library has a HoughCircles function which finds circles in an image. Not surprising: OpenCV has a bazillion useful functions like that.

Let's make it happen.

First of all, I’m going to use Landsat 8 data. I’ll choose scene 229/82 for two reasons:

  • I know it includes circles, and
  • it includes my house (I want to meet the extraterrestrials living close by, not those in Area 51)
Crop of the Landsat 8 scene 229/82

The first issue I have to solve is that the HoughCircles function

finds circles in a grayscale image using a modification of the Hough transform

Well, grayscale does not exactly match multi-band Landsat 8 data, but each one of the bands can be treated as a single grayscale image. Now, a circle can express itself differently in different bands, because each band has its own way to sense the earth. So, the detector can define slightly different center coordinates for the same circle. For that reason, if two centers are too close then I’m going to keep only one of them (and discard the other as repeated).
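A minimal sketch of that de-duplication idea (the distance threshold is a guess to tune against the data):

import numpy as np

def merge_close_centers(centers, min_dist=13):
    # Keep a detected (x, y, radius) circle only if its center isn't
    # within min_dist pixels of a circle we already kept.
    kept = []
    for x, y, r in centers:
        if all(np.hypot(x - kx, y - ky) >= min_dist for kx, ky, kr in kept):
            kept.append((x, y, r))
    return kept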

Next, I need to determine the maximum and minimum circle radius. Typically those circles' sizes vary from 400 m up to 800 m; that is between 13 and 26 Landsat pixels (at 30 m per pixel). That's a starting point. For the rest of the parameters I'll just play around and try different values (not very scientific, I'm sorry).
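Translated into an OpenCV call (using the OpenCV 3 constant name), the detection step might look something like this. It's a sketch which assumes a single band has already been exported as an 8-bit grayscale image at 30 m per pixel:

import cv2
import numpy as np

band = cv2.imread('landsat_band.png', cv2.IMREAD_GRAYSCALE)

circles = cv2.HoughCircles(
    band,
    cv2.HOUGH_GRADIENT,  # the Hough-transform variant OpenCV implements
    dp=1,                # accumulator at full image resolution
    minDist=13,          # minimum distance between detected centers
    param1=100,          # upper threshold for the internal Canny detector
    param2=30,           # accumulator threshold: lower it to find more circles
    minRadius=13,        # ~400 m at 30 m per pixel
    maxRadius=26,        # ~800 m at 30 m per pixel
)

if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        cv2.circle(band, (x, y), r, 255, 2)  # draw detections for inspection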

So I run my script (which you can see in this Jupyter notebook) and without too much effort I can see that the circles are detected:

Crop of the Landsat 8 scene 229/82, with the detected circles (the colors of the circles correspond to the size).

By changing the parameters I can detect more circles (getting more false positives) or fewer circles (missing some real ones). As usual, there's a trade-off there.

Filter-out false positives

These circles only make sense in farming areas. If I configure the program not to miss real circles, then I get a lot of false positives. There are too many detected circles in cities, clouds, mountains, around rivers, etc.

Crop of the Landsat 8 scene 229/82, with detected circles and labels

That's a whole new problem that I will need to solve. I could use vegetation indices, texture computation, or machine learning; there's a whole battery of possibilities to explore. Intuition, experience, domain knowledge, good data-science practices and ufology will help me out here. Unluckily, all of that is out of the scope of this post.

So, my search for aliens will continue.

Mental disorder disclaimer

I hope it's clear that the whole search-for-aliens story is fictional. Just an amusing way to present the subject.

With that clarified, the technical aspects of the post are still valid.

To help our friends at Kilimo, we developed an irrigation-circle detector prototype. As hinted before, instead of approaching the problem with machine learning we attacked it using computer vision techniques.

Please, feel free to comment, contact us or whatever. I’m @py_litox on Twitter.

August 24, 2016 02:19 PM


Hynek Schlawack

Hardening Your Web Server’s SSL Ciphers

There are many wordy articles on configuring your web server’s TLS ciphers. This is not one of them. Instead I will share a configuration which is both compatible enough for today’s needs and scores a straight “A” on Qualys’s SSL Server Test.
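The configuration itself lives in the full article and has been updated over time. As a flavor only, here is a hedged sketch of the general shape such an nginx setup took around then (not necessarily the article's exact values):

ssl_protocols TLSv1 TLSv1.1 TLSv1.2;  # no SSLv3
ssl_prefer_server_ciphers on;
ssl_ciphers 'ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:RSA+AESGCM:RSA+AES:!aNULL:!MD5:!DSS';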

August 24, 2016 01:49 PM


pythonwise

Generate Relation Diagram from GAE ndb Model

Working with GAE, we wanted to create a relation diagram from our ndb model. By deferring the rendering to dot and using Python's reflection this became an easy task. Some links are still missing since we're using ancestor queries, but this can be handled by some class docstring syntax or just by manually editing the resulting dot file.
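As a hedged sketch of the general idea (illustrative, not the post's actual code): walk each model class's _properties registry via reflection, emit one dot record node per model, and add an edge for every KeyProperty reference, then defer rendering to Graphviz:

from google.appengine.ext import ndb

def models_to_dot(models):
    lines = ['digraph model {', '    node [shape=record];']
    for model in models:
        props = model._properties  # ndb's declared-property registry
        label = '{%s|%s}' % (model.__name__, '|'.join(sorted(props)))
        lines.append('    %s [label="%s"];' % (model.__name__, label))
        for name, prop in props.items():
            # KeyProperty references become edges to the target kind
            if isinstance(prop, ndb.KeyProperty) and prop._kind:
                lines.append('    %s -> %s;' % (model.__name__, prop._kind))
    lines.append('}')
    return '\n'.join(lines)

Feeding the output through "dot -Tpng" produces the diagram.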

August 24, 2016 06:51 AM


Codementor

Asynchronous Tasks using Celery with Django


Prerequisites

Introduction

Celery is a task queue based on distributed message passing. It is used to handle long-running asynchronous tasks. RabbitMQ, on the other hand, is a message broker which Celery uses to send and receive messages. Celery is perfectly suited for tasks which will take some time to execute when we don't want our requests to be blocked while those tasks are processed. Cases in point are sending emails or SMSes and making remote API calls.

Target

Using the Local Django Application

Local File Directory Structure

We are going to use a Django application called mycelery. The root of our Django application is 'mycelery'.

Add the following lines to the settings.py file to tell Celery that we will use RabbitMQ as our message broker and accept data in JSON format.

BROKER_URL = 'amqp://guest:guest@localhost//'
CELERY_ACCEPT_CONTENT = ['json']
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'

CELERY_ACCEPT_CONTENT is the list of content types the worker is allowed to receive.
CELERY_TASK_SERIALIZER is a string identifying the default serialization method for task messages.
CELERY_RESULT_SERIALIZER is the serialization format for task results.

After adding the message broker, add the following lines to a new file, celery.py, telling Celery to use the settings defined in settings.py above.

from __future__ import absolute_import
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mycelery.settings')

from django.conf import settings
from celery import Celery

app = Celery('mycelery',
             backend='amqp',
             broker='amqp://guest@localhost//')

# This reads, e.g., CELERY_ACCEPT_CONTENT = ['json'] from settings.py:
app.config_from_object('django.conf:settings')

# For autodiscover_tasks to work, you must define your tasks in a file called 'tasks.py'.
app.autodiscover_tasks(lambda: settings.INSTALLED_APPS)

@app.task(bind=True)
def debug_task(self):
    print("Request: {0!r}".format(self.request))

Create tasks in Celery

Create an app named myceleryapp and make a tasks.py file in this app’s folder. All the tasks will be defined in this file.

In tasks.py we are just playing with a number, so that it serves as a long-running task for now.

from celery import shared_task, current_task
from numpy import random
from scipy.fftpack import fft

@shared_task
def fft_random(n):
    for i in range(n):
        x = random.normal(0, 0.1, 2000)
        y = fft(x)
        if i % 30 == 0:
            process_percent = int(100 * float(i) / float(n))
            current_task.update_state(state='PROGRESS',
                                      meta={'process_percent': process_percent})
    return random.random()

Using the current_task.update_state() method, we can report how much of the task has completed every 30 iterations.

Calling tasks in Django

To call the above task, the following lines of code are required. You can put them in whichever files you want to call the task from.

from .tasks import fft_random
job = fft_random.delay(int(n))

Import the method and make the call. That's it! Now your operation is running in the background.
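Note that a Celery worker must be running to pick the task up. Assuming the mycelery project layout above, a minimal invocation from the project root would be:

celery -A mycelery worker --loglevel=info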

Get the status of the task

To get the status of the task above, define the following method in views.py:

import json

from celery.result import AsyncResult
from django.http import HttpResponse

def task_state(request):
    data = 'Fail'
    if request.is_ajax():
        if 'task_id' in request.POST.keys() and request.POST['task_id']:
            task_id = request.POST['task_id']
            task = AsyncResult(task_id)
            data = task.result or task.state
        else:
            data = 'No task_id in the request'
    else:
        data = 'This is not an ajax request'

    json_data = json.dumps(data)
    return HttpResponse(json_data, content_type='application/json')

The task_id is sent to this method from the JavaScript. The method checks the status of the task with id task_id and returns it in JSON format to the JavaScript, which can then display a corresponding progress bar.

Conclusion

We can use Celery to run many different types of tasks, from sending emails to scraping a website. For long-running tasks, we'd like to show the status of the task to our user, and we can use a simple JavaScript progress bar which polls the task-status URL. With the help of Celery, a user's experience on Django websites can be improved dramatically.


August 24, 2016 06:36 AM


Reuven Lerner

Fun with floats

I’m in Shanghai, and before I left to teach this morning, I decided to check the weather.  I knew that it would be hot, but I wanted to double-check that it wasn’t going to rain — a rarity during Israeli summers, but not too unusual in Shanghai.

I entered “shanghai weather” into DuckDuckGo, and got the following:

Never mind that it gave me a weather report for the wrong Chinese city. Take a look at the humidity reading!  What’s going on there?  Am I supposed to worry that it’s ever-so-slightly more humid than 55%?

The answer, of course, is that many programming languages have problems with floating-point numbers.  Just as there's no terminating decimal representation of 1/3, lots of numbers have no terminating representation in binary, which is what computers use.

As a result, floats are inexact.  Just add 0.1 + 0.2 in many programming languages, and prepare to be astonished.  Wait, you don't want to fire up a lot of languages? Here, someone has done it for you: http://0.30000000000000004.com/ (I really love this site.)
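For example, in Python:

>>> 0.1 + 0.2
0.30000000000000004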

If you’re working with numbers that are particularly sensitive, then you shouldn’t be using floats. Rather, you should use integers, or use something like Python’s decimal.Decimal, which guarantees accuracy at the expense of time and space. For example:

>>> from decimal import Decimal
>>> x = Decimal('0.1')
>>> y = Decimal('0.2')
>>> x + y
Decimal('0.3')
>>> float(x+y)
0.3

Of course, you should be careful not to create your decimals with floats:

>>> x = Decimal(0.1)
>>> y = Decimal(0.2)
>>> x + y
Decimal('0.3000000000000000166533453694')

Why is this the case? Let’s take a look:

>>> x
Decimal('0.1000000000000000055511151231257827021181583404541015625')

>>> y
Decimal('0.200000000000000011102230246251565404236316680908203125')

So, if you’re dealing with sensitive numbers, be sure not to use floats! And if you’re going outside in Shanghai today, it might be ever-so-slightly less humid than your weather forecast reports.

The post Fun with floats appeared first on Lerner Consulting Blog.

August 24, 2016 03:16 AM

August 23, 2016


Yann Larrivée

ConFoo Montreal 2017 Calling for Papers

ConFoo Montreal: March 8th-10th 2017

Want to get your web development ideas in front of a live audience? The call for papers for the ConFoo Montreal 2017 web developer conference is open! If you have a burning desire to hold forth about PHP, Java, Ruby, Python, or any other web development topics, we want to see your proposals. The window is open only from August 21 to September 20, 2016, so hurry. An added benefit: If your proposal is selected and you live outside of the Montreal area, we will cover your travel and hotel.

You’ll have 45 minutes to wow the crowd, with 35 minutes for your topic and 10 minutes for Q&A. We can’t wait to see your proposals. Knock us out!

ConFoo Montreal will be held on March 8-10, 2017. For those of you who already know about our conference, be aware that this annual tradition will still be running in addition to ConFoo Vancouver. Visit our site to learn more about both events.

August 23, 2016 04:30 PM


Enthought

Webinar: Introducing the NEW Python Integration Toolkit for LabVIEW

Python Integration Toolkit for LabVIEW from Enthought - Webinar
Dates: Thursday, August 25, 2016, 1:00-1:45 PM CT or Wednesday, August 31, 2016, 9:00-9:45 AM CT / 3:00-3:45 PM BST

Register now (if you can’t attend, we’ll send you a recording)

LabVIEW is a software platform made by National Instruments, used widely in industries such as semiconductors, telecommunications, aerospace, manufacturing, electronics, and automotive for test and measurement applications. Earlier this month, Enthought released the Python Integration Toolkit for LabVIEW, which is a “bridge” between the LabVIEW and Python environments.

In this webinar, we’ll demonstrate:

  1. How the new Python Integration Toolkit for LabVIEW from Enthought seamlessly brings the power of the Python ecosystem of scientific and engineering tools to LabVIEW
  2. Examples of how you can extend LabVIEW with Python, including using Python for signal and image processing, cloud computing, web dashboards, machine learning, and more
Python Integration Toolkit for LabVIEW

Quickly and efficiently access scientific and engineering tools for signal processing, machine learning, image and array processing, web and cloud connectivity, and much more. With only minimal coding on the Python side, this extraordinarily simple interface provides access to all of Python’s capabilities.

Try it with your data, free for 30 days

Download a free 30 day trial of the Python Integration Toolkit for LabVIEW from the National Instruments LabVIEW Tools Network.

How LabVIEW users can benefit from Python:

Register

August 23, 2016 02:44 PM


Mike Driscoll

Python 201 Releasing in 2 Weeks!

My second book, Python 201: Intermediate Python is releasing two weeks from today on September 6th, 2016. I just wanted to remind anyone who is interested that you can pre-order a signed paperback copy of the book here right up until release day. You will also receive a copy of the book in the following digital formats: PDF, EPUB and MOBI (Kindle format).

My book is also available for early release digitally on Gumroad and Leanpub.

Check out either of those links for more information!

August 23, 2016 01:55 PM


Chris Moffitt

Lessons Learned from Analyze This! Challenge

Introduction

I recently had the pleasure of participating in a crowd-sourced data science competition in the Twin Cities called Analyze This! I wanted to share some of my thoughts and experiences on the process - especially how this challenge helped me learn more about how to apply data science theory and open source tools to real world problems.

I also hope this article can encourage others in the Twin Cities to participate in future events. For those of you not in the Minneapolis-St. Paul metro area, then maybe this can help motivate you to start up a similar event in your area. I thoroughly enjoyed the experience and got a lot out of the process. Read on for more details.

Background

Analyze This! is a crowd-sourced data science competition. Think of it as a mashup of an in-person Kaggle competition, plus a data science user group, mixed in with a little bit of Toastmasters. The result is a really cool series of events that accomplishes two things. First, it helps individuals build their data science skills on a real-world problem. Second, it helps an organization get insight into its data challenges.

The process starts when the Analyze This organizers partner with a host organization to identify a real-world problem that could be solved with data analytics. Once the problem is defined and the data gathered, it is turned over to a group of eager volunteers who spend a couple of months analyzing the data and developing insights and actionable next steps for solving the defined problem. Along the way, there are periodic group meetings where experts share their knowledge on a specific data science topic. The process culminates in a friendly competition where the teams present the results to the group. The host organization and event organizers judge the results based on a pre-defined rubric. A final winning team typically wins a modest financial reward (more than enough for a dinner but not enough to pay the rent for the month).

In this specific case, Analyze This! partnered with the Science Museum of Minnesota to gather and de-identify data related to membership activity. The goal of the project was to develop a model to predict whether or not a member would renew their membership and use this information to increase membership renewal rates for the museum.

Observations

As I mentioned earlier, the entire process was really interesting, challenging and even fun. Here are a few of my learnings and observations that I took away from the events that I can apply to future challenges and real life data science projects:

The Best Way to Learn is By Doing

I came into the event with a good familiarity with python but not as much real-world experience with machine learning algorithms. I have spent time learning about various ML tools and have played with some models but at some point, you can only look at Titanic or Iris data sets for so long!

The best analogy I can think of is that it is like taking a math class and looking at the solution in the answer key. You may think you understand how to get to the solution but “thinking you can” is never the same as spending time wrestling with the problem on your own and “knowing you can.”

Because the data set was brand new to us all, it forced us to dig in and struggle with understanding the data and divining insights. There was no “right answer” that we could look at in advance. The only way to gain insight was to wrestle with the data and figure it out with your team. This meant researching the problem and developing working code examples.

Descriptive Analytics Still Matter

Many people have seen some variation of the chart that looks like this:

Descriptive to Prescriptive Analytics

source

Because I wanted to learn about ML, I tended to jump ahead in this chart and go straight for the predictive model without spending time on the descriptive analytics. After sitting through the presentations from each group, I realized that I should have spent more time looking at the data from a standard stats perspective and used some of those basic insights to help inform the eventual model. I also realized that the descriptive analytics were really useful in helping to tell the story around the final recommendations. In other words, it’s not all about a fancy predictive model.

Speaking of Models

In this specific case, all the teams developed models to predict a member’s likely renewal based on various traits. Across the group, the teams tried pretty much every model available in the python or R ecosystem. Despite how fancy everyone tried to get, a simple logistic regression model won out. I think the moral of the story is that sometimes a relatively simple model with good results beats a complex model with marginally better results.

Python Served Me Well

My team (and several others) used python for much of the analysis. In addition to pandas and scikit-learn, I leveraged Jupyter notebooks for a lot of exploratory data analysis. Of course, I used conda to set up a python3 virtual environment for this project, which made it really nice to play around with various tools without messing up other python environments.

I experimented with folium to visualize geographic data. I found it fairly simple to build interesting, data-rich maps with this tool. If there is interest, I may write more about it in the future.

I also took TPOT for a spin. It worked well and I think it generated some useful models. We eventually used a different model but I plan to continue learning more about TPOT and look forward to seeing how it continues to improve.

Presenting Results is a Skill

One of the key aspects of the Analyze This challenge that I enjoyed is that each team had to present their solutions during a 10 minute presentation. Because we had all spent time with the same data set, we were all starting from a similar baseline. It was extremely interesting to see how the teams presented their results and used various visualizations to explain their process and provide actionable insight. We all tended to identify several common features that drove renewal rates but it was interesting to see how different teams attacked a similar problem from different angles.

Several of the groups scored results that were very close to each other. The scoring rubric placed more weight on the presentation than on the actual model results, which I think is a wise move and separates this challenge from something like a Kaggle competition.

The other interesting/challenging part of presenting the results was the wide range of knowledge in the room. On one end of the spectrum there were PhDs, Data Scientists and very experienced statisticians. On the other end were people just learning some of these concepts who had little or no training in data science or statistics. This wide spread of knowledge meant that each group had to think carefully about how to present their information in a way that would appeal to the entire audience.

Community is important

One of the goals of the Analyze This organizers is to foster a community for data science learning. I felt like they did a really nice job of making everyone feel welcome. Even though this was a competition, the more experienced members were supportive of the less knowledgeable individuals. There was a lot of formal and informal knowledge sharing.

I have seen several variations of this venn diagram to describe data scientists.

Data Science Venn Diagram

During the competition, I noticed that the pool of participants fit into many of these categories. We had everything from people that do data science as a full time job to web developers to people just interested in learning more. The really great thing was that it was a supportive group and people were willing to share knowledge and help others.

My experience with this cross-section of people reinforced my belief that the “perfect data scientist” does lie at the intersection of these multiple functions.

I hope the Analyze This! group can continue building on the success of this competition and encourage even more people to participate in the process.

Networking

I am really excited about the people I met through this process. I ended up working with a great group of guys on my team. I also got to learn a little more about how others are doing Data Science in the Twin Cities. Of course, I used this as an opportunity to expand my network.

Conclusion

I am sure you can tell that I’m a big supporter of Analyze This!, its mission and the people that are leading the program. Pedro, Kevin, Jake, Mitchell, Daniel and Justin did a tremendous amount of work to make this happen, and I am very impressed with their knowledge and dedication. They are doing this to help others and build up the community, and they receive no pay for the countless hours of work they put into it.

The process was a great way to learn more about data science and hone my skills in a real-world test. I got to meet some smart people and help a worthy organization (hopefully) improve their membership renewal rates. I highly encourage those of you who might be at FARCON 2016 to stop by and listen to the group presentations. I also encourage you to look for the next challenge and find some time to participate. I am confident you will find it time well spent.

August 23, 2016 12:25 PM


Andrew Dalke

Fragment by copy and trim

This is part of a series of essays on how to fragment a molecular graph using RDKit. These are meant to describe the low-level steps that go into fragmentation, and the ways to think about testing. To fragment a molecule in RDKit, use FragmentOnBonds(). The essays in this series are:

Why another fragmentation method?

How do you tell if an algorithm is correct? Sometimes you can inspect the code. More often you have test cases where you know the correct answer. For complex algorithms, especially when using real-world data, testing is even harder. It's not always clear where the edge cases are, and the real world is often more complicated than you think.

Another validation solution is to write two different algorithms which accomplish the same thing, and compare them with each other. I did that in my essay about different ways to evaluate the parity of a permutation order. That cross-validation testing was good enough, because it's easy to compute all possible input orders.

In the previous essay in this series, I cross-validated that fragment_chiral() and fragment_on_bonds() give the same results. They did. That isn't surprising because they implement essentially the same algorithm. Not only that, but I looked at the FragmentOnBonds() implementation when I found out my first fragment_chiral() didn't give the same answers. (I didn't know I needed to ClearComputedProps() after I edit the structure.)

Cross-validation works better when the two algorithm are less similar. The way you think about a problem and implement a solution are connected. Two similar solutions may be the result of thinking about things the same way, and come with the same blind spots. (This could also be a design goal, if you want the new implementation to be "bug compatible" with the old.)

I've made enough hints that you likely know what I'm leading up to. RDKit's FragmentOnBonds() code currently (in mid-2016) converts directional bonds (the "/" and "\" bonds which encode E-Z stereochemistry around double bonds) into non-directional single bonds.

>>> from rdkit import Chem
>>> mol = Chem.MolFromSmiles("F/C=C/F")
>>> fragmented_mol = Chem.FragmentOnBonds(mol, [0], dummyLabels=[(0,0)])
>>> Chem.MolToSmiles(fragmented_mol, isomericSmiles=True)
'[*]C=CF.[*]F'
when I expected the last line to be
'[*]/C=C/F.[*]F'

This may or may not be what you want. Just in case, I've submitted a patch to discuss changing it in RDKit, so your experience in the future might be different than mine in the past.

I bring it up now to show how cross-validation of similar algorithms isn't always enough to figure out if the algorithms do as you expect.

Validate though re-assembly

As a reminder, in an earlier essay I did a much stronger validation test. I took the fragments, reassembled them via SMILES syntax manipulation, and compared the canonicalized result to the canonicalized input structure. These should always match, to within limits of the underlying chemistry support.

They didn't, because my original code didn't support directional bonds. Instead, in that essay I pre-processed the input SMILES string to replace the "/" and "\" characters with a "-". That gives the equivalent chemistry without directional bonds.

I could use that validation technique again, but I want to explore more of what it means to cross-validate by using a different fragmentation implementations.

Fragment by 'copy and trim'

Dave Cosgrove, on the RDKit mailing list, described the solution he uses to cut a bond and preserve chirality in toolkits which don't provide a high-level equivalent to FragmentOnBonds(). This method only works on non-ring bonds, which isn't a problem for me as I want to fragment R-groups. (As I discovered while doing the final testing, my implementation of the method also assumes there is only one molecule in the structure.)

To make a fragment from a molecule, trim away all the atoms which aren't part of the fragment, except for the atom connected to the fragment. Convert that atom into the wildcard atom "*" by setting its atomic number, charge, and number of hydrogens to 0, and setting the chirality and aromaticity flags correctly. The image below shows the steps applied to cutting the ester off of aspirin:

steps to trim a fragment of aspirin

This method never touches the bonds connected to a chiral atom of a fragment, so chirality or other bond properties (like bond direction!) aren't accidentally removed by breaking the bond and making a new one in its place.

A downside is that I need to copy and trim twice in order to get both fragments after cutting a chain bond. I can already predict that the performance won't be as fast as FragmentOnBonds().

Let's try it out.

Trim using the interactive shell

I'll use aspirin as my reference structure and cut the bond to the ester (the R-OCOCH3), which is the bond between atoms 2 and 3.

aspirin with atoms labeled by index
I want the result to be the same as using FragmentOnBonds() to cut that bond:
>>> from rdkit import Chem
>>> aspirin = Chem.MolFromSmiles("O=C(Oc1ccccc1C(=O)O)C")
>>> bond_idx = aspirin.GetBondBetweenAtoms(2, 3).GetIdx()
>>> bond_idx
2
>>> fragmented_mol = Chem.FragmentOnBonds(aspirin, [bond_idx], dummyLabels=[(0,0)])
>>> Chem.MolToSmiles(fragmented_mol, isomericSmiles=True)
'[*]OC(C)=O.[*]c1ccccc1C(=O)O'

I'll need a way to find all of the atoms which are on the ester side of the bond. I don't think there's a toolkit function to get all the atoms on one side of a bond, so I'll write one myself.

Atom 2 is the connection point on the ester to the rest of the aspirin structure. I'll start with that atom, show that it's an oxygen, and get information about its bonds:

>>> atom = aspirin.GetAtomWithIdx(2)
>>> atom.GetSymbol()
'O'
>>> [bond.GetBondType() for bond in atom.GetBonds()]
[rdkit.Chem.rdchem.BondType.SINGLE, rdkit.Chem.rdchem.BondType.SINGLE]
I'll use the bond object to get to the atom on the other side of the bond from atom 2, and report more detailed information about the atom at the other end of each bond:
>>> for bond in atom.GetBonds():
...   other_atom = bond.GetOtherAtom(atom)
...   print(other_atom.GetIdx(), bond.GetBondType(), other_atom.GetSymbol(), other_atom.GetIsAromatic())
... 
1 SINGLE C False
3 SINGLE C True
This says that atom 1 is an aliphatic carbon, and atom 3 is an aromatic carbon. This matches the structure diagram I used earlier.

I know that atom 3 is the other end of the bond which was cut, so I can stop there. What about atom 1? What is it connected to?

>>> next_atom = aspirin.GetAtomWithIdx(1)
>>> [b.GetOtherAtom(next_atom).GetIdx() for b in next_atom.GetBonds()]
[0, 2, 12]
I've already processed atom 2, so only atoms 0 and 12 are new.

What are those atoms connected to?

>>> next_atom = aspirin.GetAtomWithIdx(0)
>>> [b.GetOtherAtom(next_atom).GetIdx() for b in next_atom.GetBonds()]
[1]
>>> next_atom = aspirin.GetAtomWithIdx(12)
>>> [b.GetOtherAtom(next_atom).GetIdx() for b in next_atom.GetBonds()]
[1]
Only atom 1, which I've already seen, so I've found all of the atoms on the ester side of the bond.

Semi-automated depth-first search

There are more atoms on the other side of the bond, so I'll have to automate it somewhat. This sort of graph search is often implemented as a depth-first search (DFS) or a breadth-first search (BFS). Python lists easily work as stacks, which makes DFS slightly more natural to implement.
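As a quick illustration of the list-as-stack idiom: append() pushes onto the end and pop() removes from that same end, giving last-in, first-out order:

>>> stack = ['a', 'b']
>>> stack.append('c')  # push
>>> stack.pop()        # pop
'c'
>>> stack
['a', 'b']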

To give you an idea of what I'm about to explain, here's an animation of the aspirin depth-first search:

depth-first search of an aspirin fragment

If there is the chance of a ring (called a "cycle" in graph theory), then there are two ways to get to the same atom. To prevent processing the same atom multiple times, I'll set up a set named "seen_ids", which contains the atom indices I've seen before. Since it's the start, I've only seen the two atoms which are at the ends of the bond to cut.

>>> seen_ids = set([2, 3])
I also store a list of the atoms which are on this side of the bond, which at this point is only atom 3, and a stack (a Python list) of the atoms I need to process further:
>>> atom_ids = [3]
>>> stack = [aspirin.GetAtomWithIdx(3)]
I'll start by getting the top of the stack (the last element of the list) and looking at its neighbors:
>>> atom = stack.pop()
>>> neighbor_ids = [b.GetOtherAtom(atom).GetIdx() for b in atom.GetBonds()]
>>> neighbor_ids
[2, 4, 8]
I'll need to filter out the neighbors I've already seen. I'll write a helper function for that. I'll need both the atom objects and the atom ids, so this will return two lists, one for each:
def get_atoms_to_visit(atom, seen_ids):
    neighbor_ids = []
    neighbor_atoms = []
    for bond in atom.GetBonds():
        neighbor_atom = bond.GetOtherAtom(atom)
        neighbor_id = neighbor_atom.GetIdx()
        if neighbor_id not in seen_ids:
            neighbor_ids.append(neighbor_id) 
            neighbor_atoms.append(neighbor_atom)
    
    return neighbor_ids, neighbor_atoms
and use it:
>>> atom_ids_to_visit, atoms_to_visit = get_atoms_to_visit(atom, seen_ids)
>>> atom_ids_to_visit
[4, 8]
>>> [a.GetSymbol() for a in atoms_to_visit]
['C', 'C']
These atom ids are connected to the original atom, so add them all to atom_ids.
>>> atom_ids.extend(atom_ids_to_visit) 
>>> atom_ids
[3, 4, 8]
These haven't been seen before, so add the atoms (not the atom ids) to the stack of items to process:
>>> stack.extend(atoms_to_visit)
>>> stack
[<rdkit.Chem.rdchem.Atom object at 0x111bf37b0>, <rdkit.Chem.rdchem.Atom object at 0x111bf3760>]
>>> [a.GetIdx() for a in stack]
[4, 8]
Now that they've been seen, I don't need to process them again, so add the new ids to the set of seen ids:
>>> seen_ids.update(atom_ids_to_visit)
>>> seen_ids
{8, 2, 3, 4}

That's the basic loop. The stack isn't empty, so pop the top of the stack to get a new atom object, and repeat the earlier steps:

>>> atom = stack.pop()
>>> atom.GetIdx()
8
>>> atom_ids_to_visit, atoms_to_visit = get_atoms_to_visit(atom, seen_ids)
>>> atom_ids_to_visit
[7, 9]
>>> atom_ids.extend(atom_ids_to_visit) 
>>> atom_ids 
[3, 4, 8, 7, 9]
>>> stack.extend(atoms_to_visit)
>>> [a.GetIdx() for a in stack]
[4, 7, 9]
>>> seen_ids.update(atom_ids_to_visit)
>>> seen_ids
{2, 3, 4, 7, 8, 9}
Then another loop. I'll stick it in a while loop to process the stack until it's empty, and I'll only have it print the new atom ids added to the list:
>>> while stack:
...     atom = stack.pop()
...     atom_ids_to_visit, atoms_to_visit = get_atoms_to_visit(atom, seen_ids)
...     atom_ids.extend(atom_ids_to_visit)
...     stack.extend(atoms_to_visit)
...     seen_ids.update(atom_ids_to_visit)
...     print("added atoms", atom_ids_to_visit)
... 
added atoms [10, 11]
added atoms []
added atoms []
added atoms [6]
added atoms [5]
added atoms []
added atoms []
The final set of atoms is:
>>> atom_ids
[3, 4, 8, 7, 9, 10, 11, 6, 5]

Fully automated - fragment_trim()

In this section I'll put all the pieces together to make fragment_trim(), a function which implements the trim algorithm.

Find atoms to delete

The trim algorithm needs to know which atoms to delete. That will be all of the atoms on one side of the bond, except for the atom which is at the end of the bond. (I'll turn that atom into a "*" atom.) I'll use the above code to get the atom ids to delete:

# Look for neighbor atoms of 'atom' which haven't been seen before  
def get_atoms_to_visit(atom, seen_ids):
    neighbor_ids = []
    neighbor_atoms = []
    for bond in atom.GetBonds():
        neighbor_atom = bond.GetOtherAtom(atom)
        neighbor_id = neighbor_atom.GetIdx()
        if neighbor_id not in seen_ids:
            neighbor_ids.append(neighbor_id) 
            neighbor_atoms.append(neighbor_atom)
    
    return neighbor_ids, neighbor_atoms

# Find all of the atoms connected to 'start_atom_id', except for 'ignore_atom_id'
def find_atom_ids_to_delete(mol, start_atom_id, ignore_atom_id):
    seen_ids = {start_atom_id, ignore_atom_id}
    atom_ids = [] # Will only delete the newly found atoms, not the start atom
    stack = [mol.GetAtomWithIdx(start_atom_id)]
    
    # Use a depth-first search to find the connected atoms
    while stack:
        atom = stack.pop()
        atom_ids_to_visit, atoms_to_visit = get_atoms_to_visit(atom, seen_ids)
        atom_ids.extend(atom_ids_to_visit)
        stack.extend(atoms_to_visit)
        seen_ids.update(atom_ids_to_visit)
    
    return atom_ids
and try it out:
>>> from rdkit import Chem
>>> aspirin = Chem.MolFromSmiles("O=C(Oc1ccccc1C(=O)O)C")
>>> find_atom_ids_to_delete(aspirin, 3, 2)
[1, 0, 12]
>>> find_atom_ids_to_delete(aspirin, 2, 3)
[4, 8, 7, 9, 10, 11, 6, 5]
That looks right. (I left out the other testing I did to get this right.)

Trim atoms

The trim function has two parts. The first is to turn the attachment point into the wildcard atom "*", which I'll do by removing any charges, hydrogens, chirality, or other atom properties. This atom will not be deleted. The second is to remove the other atoms on that side of the bond.

Removing atoms from a molecule is slightly tricky. The atom indices are reset after each atom is removed. (While I often use a variable name like "atom_id", the atom id is not a permanent id, but simply the current atom index.) When atom "i" is deleted, all of the atoms with a larger id "j" get the new id "j-1". That is, the larger ids are all shifted down one to fill in the gap.

This becomes a problem because I have a list of ids to delete, but as I delete them the real ids change. For example, if I delete atoms 0 and 1 from a two atom molecule, in that order, I will get an exception:

>>> rwmol = Chem.RWMol(Chem.MolFromSmiles("C#N"))
>>> rwmol.RemoveAtom(0)
>>> rwmol.RemoveAtom(1)
[13:54:02] 

****
Range Error
idx
Violation occurred on line 155 in file /Users/dalke/ftps/rdkit-Release_2016_03_1/Code/GraphMol/ROMol.cpp
Failed Expression: 1 <= 0
****

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Range Error
	idx
	Violation occurred on line 155 in file Code/GraphMol/ROMol.cpp
	Failed Expression: 1 <= 0
	RDKIT: 2016.03.1
	BOOST: 1_60
This is because after RemoveAtom(0) the old atom with id 1 gets the id 0. As another example, if I remove atoms 0 and 1 from a three atom molecule, I'll end up with what was originally the second atom:
>>> rwmol = Chem.RWMol(Chem.MolFromSmiles("CON"))
>>> rwmol.RemoveAtom(0)
>>> rwmol.RemoveAtom(1)
>>> Chem.MolToSmiles(rwmol)
'O'
Originally the ids were [C=0, O=1, N=2]. The RemoveAtom(0) removed the C and renumbered the remaining atoms, giving [O=0, N=1]. The RemoveAtom(1) then removed the N, leaving [O=0] as the remaining atom.

While you could pay attention to the renumbering and adjust the index of the atom to delete, the far easier solution is to sort the atom ids, and delete starting from the largest id.

Here is the function to turn the attachment atom into a wildcard and to delete the other atoms:

def trim_atoms(mol, wildcard_atom_id, atom_ids_to_delete):
    rwmol = Chem.RWMol(mol)
    # Change the attachment atom into a wildcard ("*") atom
    atom = rwmol.GetAtomWithIdx(wildcard_atom_id)
    atom.SetAtomicNum(0)
    atom.SetIsotope(0)
    atom.SetFormalCharge(0)
    atom.SetIsAromatic(False)
    atom.SetNumExplicitHs(0)
    
    # Remove the other atoms.
    # RDKit will renumber after every RemoveAtom() so
    # remove from highest to lowest atom index.
    # If not sorted, may get a "RuntimeError: Range Error"
    for atom_id in sorted(atom_ids_to_delete, reverse=True):
        rwmol.RemoveAtom(atom_id)
    
    # Don't need to ClearComputedProps() because that will
    # be done by CombineMols()
    return rwmol.GetMol()
(If this were a more general-purpose function I would need to call ClearComputedProps(), which is needed after any molecular editing in RDKit. I don't need to do this here because it will be done during CombineMols(), which is coming up.)

How did I figure out which properties needed to be changed to make a wildcard atom? I tracked them down during cross-validation tests of an earlier version of this algorithm.

I'll try out the new code:

>>> fragment1 = trim_atoms(aspirin, 2, [1, 0, 12])
>>> Chem.MolToSmiles(fragment1, isomericSmiles=True)
'[*]c1ccccc1C(=O)O'
>>> fragment2 = trim_atoms(aspirin, 3, [4, 8, 7, 9, 10, 11, 6, 5])
>>> Chem.MolToSmiles(fragment2, isomericSmiles=True)
'[*]OC(C)=O'

fragment_trim()

The high-level fragment code is straightforward, and I don't think it requires explanation:

def fragment_trim(mol, atom1, atom2):
    # Find the fragments on either side of a bond
    delete_atom_ids1 = find_atom_ids_to_delete(mol, atom1, atom2)
    fragment1 = trim_atoms(mol, atom1, delete_atom_ids1)
    
    delete_atom_ids2 = find_atom_ids_to_delete(mol, atom2, atom1)
    fragment2 = trim_atoms(mol, atom2, delete_atom_ids2)
    
    # Merge the two fragments.
    # (CombineMols() does the required ClearComputedProps())
    return Chem.CombineMols(fragment1, fragment2)

I'll check that it does what I think it should do:

>>> new_mol = fragment_trim(aspirin, 2, 3)
>>> Chem.MolToSmiles(new_mol)
'[*]OC(C)=O.[*]c1ccccc1C(=O)O'
This matches the FragmentOnBonds() output at the start of this essay.

Testing

I used the same cross-validation method as in my previous essay: I parsed structures from ChEMBL and checked that fragment_trim() produces the same results as FragmentOnBonds(). It doesn't always. The first failure mode surprised me:

fragment_trim() only works with a single connected structure

Here's one of the failure reports:

Mismatch: record: 61 id: CHEMBL1203109 smiles: Cl.CNC(=O)c1cc(C(O)CNC(C)CCc2ccc3c(c2)OCO3)ccc1O cut: 1 2
   smiles_trim: Cl.Cl.[*]C.[*]NC(=O)c1cc(C(O)CNC(C)CCc2ccc3c(c2)OCO3)ccc1O
smiles_on_bond: Cl.[*]C.[*]NC(=O)c1cc(C(O)CNC(C)CCc2ccc3c(c2)OCO3)ccc1O
Somehow the "Cl" appears twice. After thinking about it I realized that my implementation of the copy and trim algorithm assumes that the input structure is strongly connected, that is that it only contains a single molecule. Each copy contains a chlorine atom, so when I CombineMols() I end up with two chlorine atoms.

I could fix my implementation to handle this, but as I'm only interested in cross-validation, the easier solution is to never process a structure with multiple molecules. I decided that if the input SMILES contains multiple molecules (that is, if the SMILES contains the dot disconnection symbol ".") then I would choose the component which has the most characters, using the following:

if "." in smiles:
    # The fragment_trim() method only works on a connected molecule.
    # Arbitrarily pick the component with the longest SMILES string.
    smiles = max(smiles.split("."), key=len)

Directional bonds

After 1000 records and 31450 tests there were 362 mismatches. All of them appeared to be related to directional bonds, like this mismatch report:

Mismatch: record: 124 id: CHEMBL445177 smiles: CCCCCC/C=C\CCCCCCCc1cccc(O)c1C(=O)O cut: 7 8
   smiles_trim: [*]/C=C\CCCCCC.[*]CCCCCCCc1cccc(O)c1C(=O)O
smiles_on_bond: [*]C=CCCCCCC.[*]CCCCCCCc1cccc(O)c1C(=O)O
I knew this would happen coming into this essay, but it's still nice to see. It demonstrates that fragment_trim() is a better way to cross-validate FragmentOnBonds() than my earlier fragment_chiral() algorithm.

It's possible that other mismatches are hidden in the hundreds of reports, so I removed directional bonds from the SMILES and processed again:

if "/" in smiles:
    smiles = smiles.replace("/", "")
if "\\" in smiles:
    smiles = smiles.replace("\\", "")
That testing showed I forgot to include "atom.SetIsotope(0)" when I converted the attachment atom into a wildcard atom. Without it, the algorithm turned an "[18F]" into an "[18*]". It surprised me because I copied it from working code I made for an earlier version of the algorithm. Looking into it, I realized I left that line out during the copy&paste. This is why regression tests using diverse real-world data are important.

I stopped the testing after 100,000 records. Here is the final status line:
Processed 100000 records, 100000 molecules, 1120618 tests, 0 mismatches.  T_trim: 2846.79 T_on_bond 258.18
While the trim code is slower than the built-in function, the point of this exercise isn't to figure out the fastest implementation but to have a better way to cross-validate the original code, which it has done.

The final code

Here is the final code:

from __future__ import print_function

import datetime
import time

from rdkit import Chem

# Look for neighbor atoms of 'atom' which haven't been seen before  
def get_atoms_to_visit(atom, seen_ids):
    neighbor_ids = []
    neighbor_atoms = []
    for bond in atom.GetBonds():
        neighbor_atom = bond.GetOtherAtom(atom)
        neighbor_id = neighbor_atom.GetIdx()
        if neighbor_id not in seen_ids:
            neighbor_ids.append(neighbor_id) 
            neighbor_atoms.append(neighbor_atom)
    
    return neighbor_ids, neighbor_atoms

# Find all of the atoms connected to 'start_atom_id', except for 'ignore_atom_id'
def find_atom_ids_to_delete(mol, start_atom_id, ignore_atom_id):
    seen_ids = {ignore_atom_id, start_atom_id}
    atom_ids = [] # Will only delete the newly found atoms, not the start atom
    stack = [mol.GetAtomWithIdx(start_atom_id)]

    # Use a depth-first search to find the connected atoms
    while stack:
        atom = stack.pop()
        atom_ids_to_visit, atoms_to_visit = get_atoms_to_visit(atom, seen_ids)
        atom_ids.extend(atom_ids_to_visit)
        stack.extend(atoms_to_visit)
        seen_ids.update(atom_ids_to_visit)
    
    return atom_ids

def trim_atoms(mol, wildcard_atom_id, atom_ids_to_delete):
    rwmol = Chem.RWMol(mol)
    # Change the attachment atom into a wildcard ("*") atom
    atom = rwmol.GetAtomWithIdx(wildcard_atom_id)
    atom.SetAtomicNum(0)
    atom.SetIsotope(0)
    atom.SetFormalCharge(0)
    atom.SetIsAromatic(False)
    atom.SetNumExplicitHs(0)
    
    # Remove the other atoms.
    # RDKit will renumber after every RemoveAtom() so
    # remove from highest to lowest atom index.
    # If not sorted, may get a "RuntimeError: Range Error"
    for atom_id in sorted(atom_ids_to_delete, reverse=True):
        rwmol.RemoveAtom(atom_id)
    
    # Don't need to ClearComputedProps() because that will
    # be done by CombineMols()
    return rwmol.GetMol()
    
def fragment_trim(mol, atom1, atom2):
    # Find the fragments on either side of a bond
    delete_atom_ids1 = find_atom_ids_to_delete(mol, atom1, atom2)
    fragment1 = trim_atoms(mol, atom1, delete_atom_ids1)
    
    delete_atom_ids2 = find_atom_ids_to_delete(mol, atom2, atom1)
    fragment2 = trim_atoms(mol, atom2, delete_atom_ids2)

    # Merge the two fragments.
    # (CombineMols() does the required ClearComputedProps())
    return Chem.CombineMols(fragment1, fragment2)

####### Cross-validation code

def fragment_on_bond(mol, atom1, atom2):
    bond = mol.GetBondBetweenAtoms(atom1, atom2)
    return Chem.FragmentOnBonds(mol, [bond.GetIdx()], dummyLabels=[(0, 0)])


def read_records(filename):
    with open(filename) as infile:
        for recno, line in enumerate(infile):
            smiles, id = line.split()
            if "*" in smiles:
                continue
            yield recno, id, smiles
            
def cross_validate():
    # Match a single bond not in a ring
    BOND_SMARTS = "[!#0;!#1]-!@[!#0;!#1]"
    single_bond_pat = Chem.MolFromSmarts(BOND_SMARTS)
    
    num_mols = num_tests = num_mismatches = 0
    time_trim = time_on_bond = 0.0

    # Helper function to print status information. Since
    # the function is defined inside of another function,
    # it has access to variables in the outer function.
    def print_status():
        print("Processed {} records, {} molecules, {} tests, {} mismatches."
              "  T_trim: {:.2f} T_on_bond {:.2f}"
              .format(recno, num_mols, num_tests, num_mismatches,
                      time_trim, time_on_bond))
    
    filename = "/Users/dalke/databases/chembl_20_rdkit.smi"
    start_time = datetime.datetime.now()
    for recno, id, smiles in read_records(filename):
        if recno % 100 == 0:
            print_status()

        if "." in smiles:
            # The fragment_trim() method only works on a connected molecule.
            # Arbitrarily pick the component with the longest SMILES string.
            smiles = max(smiles.split("."), key=len)
            
        ## Remove directional bonds to see if there are mismatches
        ## which are not caused by directional bonds.
        #if "/" in smiles:
        #    smiles = smiles.replace("/", "")
        #if "\\" in smiles:
        #    smiles = smiles.replace("\\", "")
        
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            continue
        num_mols += 1

        # Find all of the single bonds
        matches = mol.GetSubstructMatches(single_bond_pat)

        for a1, a2 in matches:
            # For each pair of atom indices, split on that bond
            # and determine the canonicalized fragmented molecule.
            # Do it for both fragment_trim() ...
            t1 = time.time()
            mol_trim = fragment_trim(mol, a1, a2)
            t2 = time.time()
            time_trim += t2-t1
            smiles_trim = Chem.MolToSmiles(mol_trim, isomericSmiles=True)

            # ... and for fragment_on_bond()
            t1 = time.time()
            mol_on_bond = fragment_on_bond(mol, a1, a2)
            t2 = time.time()
            time_on_bond += t2-t1
            smiles_on_bond = Chem.MolToSmiles(mol_on_bond, isomericSmiles=True)

            # Report any mismatches and keep on going.
            num_tests += 1
            if smiles_trim != smiles_on_bond:
                print("Mismatch: record: {} id: {} smiles: {} cut: {} {}".format(
                      recno, id, smiles, a1, a2))
                print("   smiles_trim:", smiles_trim)
                print("smiles_on_bond:", smiles_on_bond)
                num_mismatches += 1
    
    end_time = datetime.datetime.now()
    elapsed_time = str(end_time - start_time)
    elapsed_time = elapsed_time.partition(".")[0]  # don't display subseconds
    
    print("Done.")
    print_status()
    print("elapsed time:", elapsed_time)

if __name__ == "__main__":
    cross_validate()

August 23, 2016 12:00 PM


Python Software Foundation

The Python Software Foundation is seeking a blogger!

Interview prominent Pythonistas, connect with the community, expand your circle of friends and learn about events in the Python world!


The Python Software Foundation (PSF) is seeking a blogger to contribute to the PSF blog located at http://pyfound.blogspot.com/. As a PSF blogger you will work with the PSF Communication Officers to brainstorm blog content, communicate activities, and provide updates on content progression. Examples of content include PSF community service awardee profiles, details about global Python events and PSF grants, and recent goings-on within the PSF itself. One goal of the 2016 - 2017 PSF Board of Directors is to increase transparency around PSF activities by curating more frequent blog content.


The Python Software Foundation is a 501(c)(3) non-profit corporation that holds the intellectual property rights behind the Python programming language. We also run the North American PyCon conference annually, support other Python conferences/workshops around the world, and fund Python related development with our grants program. To see more info on our grants program, please read: https://www.python.org/psf/grants/.


Job Description
----------------------------


Needed Experience
----------------------------


Bloggers for the Python Software Foundation receive a fixed fee per post they produce.

To apply please email two to three examples of recent articles (e.g. personal blog, contribution to professional publication) as well as a brief account of your writing experience to psf-blog@python.org. If you have questions, direct them to psf-blog@python.org as well. UPDATE: The Python Software Foundation will be accepting applications until 11:59:59pm Pacific Standard Time Thursday, August 25th, 2016.

August 23, 2016 10:26 AM


S. Lott

On Generator Functions, Yield and Return

Here's the question, lightly edited to remove the garbage. (Sometimes I'm charitable and call it "rambling". Today, I'm not feeling charitable about the garbage writing style filled with strange assumptions instead of questions.)

someone asked if you could have both a yield and a return in the same ... function/iterator. There was debate and the senior people said, let's actually write code. They wrote code and proved that couldn't have both a yield and a return in the same ... function/iterator. .... 
The meeting moved on w/out anyone asking the why question. Why doesn't it make sense to have both a yield and a return. ...

The impact of the yield statement can be confusing. Writing code to mess around with it was somehow unhelpful. And the shocking "proved that couldn't have both a yield and a return in the same ... function" is a serious problem.

(Or a seriously incorrect summary of the conversation; a very real possibility considering the garbage-encrusted email. Or a sign that Python 3 isn't widely-enough used and the email omitted this essential fact. And yes, I'm being overly sensitive to the garbage. But there's a better way to come to grips with reality and it involves asking questions and parsing details instead of repeating assumptions and writing garbage.)

An example


>>> def silly(n, stop=None):
...     for i in range(n):
...         if i == stop: return
...         yield i


>>> list(silly(5))
[0, 1, 2, 3, 4]
>>> list(silly(5, stop=3))
[0, 1, 2]

This works in both Python 3.5.1 and 2.7.10.

Some discussion

A function with no yield is a conventional function: the parameters from some domain are mapped to a return value in some range. Each mapping is a single evaluation of the function with concrete argument values.

A function with a yield statement becomes an iterable generator of (potentially) multiple values. The return statement changes its behavior slightly. It no longer defines the one (and only) return value. In a generator function (one that has a yield) the return statement can be thought of as if it raised the StopIteration exception as a way to exit from the generator.

As can be seen in the example above, both statements are in one function. They both work to provide expected semantics.

The code which gets an error is this:

>>> def silly(n, stop=3):
...     for i in range(n):
...         if i == stop: return "boom!"
...         yield i


The "why?" question is should -- perhaps -- be obvious at this point.  The return raises an exception; it doesn't provide a value.

The topic, however, remains troubling. The phrase "have both a yield and a return" is bothersome because it fails to recognize that the yield statement has a special role. The yield statement transforms the semantics of the function to make it into a different object with similar syntax.

It's not a matter of having them "both". It's a matter of having a return in a generator. This is an entirely separate and trivial-to-answer question.

A Long Useless Rant

The email seems to contain an implicit assumption. It's the notion that programming language semantics are subtle and slippery things. And even "senior people" can't get it right. Because all programming languages (other than the email sender's personal favorite) are inherently confusing. The confusion cannot be avoided.

There are times when programming language semantics are confusing.  For example, the ++ operator in C is confusing. Nothing can be done about that. The original definition was tied to the PDP-11 machine instructions. Since then... Well.... Aspects of the generated code are formally undefined.  Many languages have one or more places where the semantics are "undefined" or only defined by example.

This is not one of those times.

Here's the real problem I have with the garbage aspect of the email.

If you bring personal baggage to the conversation -- i.e., assumptions based on a comparison between some other language and Python -- confusion will erupt all over the place. Languages are different. Concepts don't map from language to language very well. Yes, there are simple abstract principles which have different concrete realizations in different languages. But among the various concrete realizations, there may not be a simple mapping.

It's essential to discard all knowledge of all previous favorite programming languages when learning a new language.

I'll repeat that for the author of the email.

Don't Go To The Well With A Full Bucket.

You won't get anything.

In this specific case, the notion of "function" in Python is expanded to include two superficially similar things. The syntax is nearly identical. But the behaviors are remarkably different. It's essential to grasp the idea that the two things are different, and can't be casually lumped together as "function/iterator".
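
The difference is concrete and easy to see: calling a def that contains a yield doesn't run its body at all; it builds a generator object.

>>> def plain(): return 1

>>> def gen(): yield 1

>>> plain()
1
>>> gen()
<generator object gen at 0x...>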

The crux of the email appears to be a failure to get the Python language rules in a profound way. 

August 23, 2016 08:00 AM

August 22, 2016


PyCharm

Next Batch of In-Depth Screencasts: VCS

In January we recorded a series of screencasts that introduced the major features of PyCharm — an overview, installation, editing, running, debugging, etc. In April we did our first “in-depth” screencast, focusing on testing.

We’re happy to announce the next in-depth screencasts, and note the plural: this is a 3-part series on using version control systems (VCS) in PyCharm. The JetBrains IDEs, including PyCharm, have worked very hard over the years to put a productive, easy UI atop version control. These videos spend more time showing the features that are available.

In the first video we cover Getting Started with VCS, going over: Versioning without a VCS using Local History, Setting Up a Local Git Repository, Uploading to GitHub, and Getting a Checkout from GitHub.

Next, we go over Core VCS: Color Coding, Adding Files, Committing Changes, Using Refactor, then Diff, History, Revert, and Update.

The last video concentrates on Branching, Merging, and Pushing.

We’re very happy to release these in-depth screencasts, which we’ve been working on for some time and were highly requested. And again, if you have any topics that you’d like to see get expanded screencast attention, let us know.

August 22, 2016 03:45 PM


Python 4 Kids

Python for Kids: Python 3 – Project 7

Using Python 3 in Project 7 of Python For Kids For Dummies

In this post I talk about the changes that need to be made to the code of
Project 7 in order for it to work with Python 3. Most of the code in project 7 will work without changes. However, in a lot of cases what Python outputs in Python 3 is different from the output in Python 2.7 and it’s those changes that I am mainly identifying below.

Disclaimer

Some people want to use my book Python for Kids for Dummies to learn Python 3.
I am working through the code in the existing book, highlighting changes from Python 2 to Python 3 and providing code that will work in Python 3. If you are using Python 2.7 you can ignore this post. This post is only for people who want to take the code in my book Python for Kids for Dummies and run it in Python 3.

Page 178

All code on this page is the same, and all outputs from the code are the same in Python 3 as in Python 2.7.

Pages 179-180

The code and syntax on these pages are the same, but the outputs are different in Python 3. This is because, in Python 3, the range builtin does not create a list as it does in Python 2.7 (see Python3/Project 5).

#Python 2.7 code: 

>>> test_string = '0123456789'
>>> test_string[0:1]
'0'
>>> test_string[1:3]
'12'
>>> # range(10) is a list of the numbers from 0 to 9 inclusive
>>> range(10)[0:1]
[0]
>>> range(10)[1:3]
[1, 2]
>>> test_string[:3]
'012'
>>> test_string[3:]
'3456789'
#Python 3 code:
>>> test_string = '0123456789'
>>> test_string[0:1]
'0'
>>> test_string[1:3]
'12'
>>> # range(10) is no longer a list. It's a.... errr... range 
>>> # so the [:] operator slices. You can use list()
>>> # to see what it corresponds to.
>>> range(10)[0:1]
range(0, 1)
>>> list(range(10)[0:1])
[0]
>>> # note same output as in Python 2.7 from range(10)[0:1]
>>> range(10)[1:3]
range(1, 3)
>>> list(range(10)[1:3])
[1, 2]
>>> test_string[:3]
'012'
>>> test_string[3:]
'3456789'
>>> 

Pages 180-196
All code on these pages is the same, and all outputs from the code are the same in Python 3 as in Python 2.7.

Page 199

The code on this page uses raw_input, which has been renamed to input in Python 3.
Either change all occurrences or add a line

raw_input = input 

at the start of the relevant code.

#Python 2.7 code: 
#### Input and Output Section
message = raw_input("Type the message to process below:\n")
ciphertext = encrypt_msg(message, ENCRYPTION_DICT)
plaintext = decrypt_msg(message, DECRYPTION_DICT)
print("This message encrypts to")
print(ciphertext)
print # just a blank line for readability
print("This message decrypts to")
print(plaintext)

#Python 3 code: 
#### Input and Output Section
message = input("Type the message to process below:\n")
ciphertext = encrypt_msg(message, ENCRYPTION_DICT)
plaintext = decrypt_msg(message, DECRYPTION_DICT)
print("This message encrypts to")
print(ciphertext)
print() # just a blank line for readability
print("This message decrypts to")
print(plaintext)


>>> ================================== RESTART ================================
>>>
Type the message you'd like to encrypt below:
I love learning Python. And my teacher is smelly. And I shouldn't start a sentence with and.
This message encrypts to
F|ilsb|ib7okfkd|Mvqelk+|xka|jv|qb79ebo|fp|pjbiiv+||xka|F|pelriak$q|pq7oq|7|pbkqbk9b|tfqe|7ka+
This message decrypts to
L2oryh2ohduqlqj2SBwkrq;2Dqg2pB2whdfkhu2lv2vphooB;22Dqg2L2vkrxogq*w2vwduw2d2vhqwhqfh2zlwk2dqg;

                             
>>> ================================== RESTART ================================
>>>
Type the message you'd like to encrypt below:
F|ilsb|ib7okfkd|Mvqelk+|xka|jv|qb79ebo|fp|pjbiiv+||xka|F|pelriak$q|pq7oq|7|pbkqbk9b|tfqe|7ka+
This message encrypts to
C_fip8_f84lhcha_Jsnbih(_uh7_gs_n846b8l_cm_mg8ffs(__uh7_C_mbiof7h!n_mn4ln_4_m8hn8h68_qcnb_4h7(
This message decrypts to
I love learning Python. And my teacher is smelly.  And I shouldn't start a sentence with and.

Page 200

This code works as is in both Python 2.7 and Python 3. However, the way the open() builtin works has changed in Python 3, and this will cause some issues in later projects. In Python 3, open() has the same syntax as in Python 2.7 but uses a different way to get data out of the file and into your hands. As a practical matter, this means that some Python 2.7 code will cause problems when run in Python 3. If you run into such a problem (open code that works in Python 2.7 but fails in Python 3), the first thing to try is to add the binary modifier: instead of ‘r’ or ‘w’ for read and write, use ‘rb’ or ‘wb’. This code doesn’t need it, but a later project will.
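
A tiny illustration of the difference (Python 3, with a hypothetical file name):

>>> open('some_file.txt', 'w').write('stuff')
5
>>> type(open('some_file.txt', 'r').read())   # text mode reads str
<class 'str'>
>>> type(open('some_file.txt', 'rb').read())  # binary mode reads bytes
<class 'bytes'>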

Page 201

The code on this page is the same, but the outputs are different in Python 3. Python 3 returns how much data has been written (in this case, 36).

#Python 2.7 code: 
>>> file_object = open('p4k_test.py','w')
>>> text = "print('Hello from within the file')\n" # watch the " and '
>>> file_object.write(text) # writes it to the file
>>> file_object.write(text) # writes it to the file again!
>>> file_object.close() # finished with file, so close it
#Python 3 code:
>>> file_object = open('p4k_test.py','w')
>>> text = "print('Hello from within the file')\n" # watch the " and '
>>> file_object.write(text) # writes it to the file
36
>>> file_object.write(text) # writes it to the file again!
36
>>> file_object.close() # finished with file, so close it

Pages 202 and 203

All code on these pages is the same, and all outputs from the code are the same in Python 3 as in Python 2.7.

Page 204

All code on this page is the same in Python 3 as in Python 2.7, but some of the outputs are different.
A line has been added in the Python 3 code below to show that the file_object has been closed after leaving the
with clause – this was explicit in the printout in Python 2.7.

>>> #Python 2.7
>>> with open('p4k_test.py','r') as file_object:
        print(file_object.read())
        
        
print('Hello from within the file')
print('Hello from within the file')

>>> file_object
<closed file 'p4k_test.py', mode 'r' at 0xf7fed0>
       
       
>>> #Python 3
>>> with open('p4k_test.py','r') as file_object:
        print(file_object.read())

        
print('Hello from within the file')
print('Hello from within the file')

>>> file_object  # output different from 2.7
<_io.TextIOWrapper name='p4k_test.py' mode='r' encoding='UTF-8'>

>>> file_object.closed # but the file is still closed
True

Page 205

All code on this page is the same in Python 3 as in Python 2.7, but some of the outputs are different.
A line has been added in the Python 3 code below to show that the file_object has been closed after leaving the
with clause – this was explicit in the printout in Python 2.7. Also, because Python 3 uses a different way
of getting information from a file, it is identified differently. In Python 2.7 it's called a file – pretty
straightforward. In Python 3 it's called an _io.TextIOWrapper. Not as enlightening, but a student doesn't need
to worry about this difference in detail.

>>> #Python 2.7
>>> with open('testfile2','w') as a:
        a.write('stuff')
        
>>> with open('testfile2','r') as a, open('p4k_test.py','r') as b:
        print(a.read())
        print(b.read())
        
stuff
print('Hello from within the file')
print('Hello from within the file')

>>> a
<closed file 'testfile2', mode 'r' at 0xf6e540>
>>> b
<closed file 'p4k_test.py', mode 'r' at 0xef4ed0>


       
>>> #Python 3
>>> with open('testfile2','r') as a, open('p4k_test.py','r') as b:
	print(a.read())
	print(b.read())

	
stuff
print('Hello from within the file')
print('Hello from within the file')

>>> a
<_io.TextIOWrapper name='testfile2' mode='r' encoding='UTF-8'>
>>> a.closed
True
>>> b
<_io.TextIOWrapper name='p4k_test.py' mode='r' encoding='UTF-8'>
>>> b.closed
True                                  

Page 207

All code on this page is the same in Python 3 as in Python 2.7, but some of the outputs are different
(the write method returns the amount of data written, and this is echoed in the console in Python 3).

>>> #Python 2.7
>>> INPUT_FILE_NAME = "cryptopy_input.txt"
>>> with open(INPUT_FILE_NAME,'w') as input_file:
        input_file.write('This is some test text')

       
>>> #Python 3
>>> INPUT_FILE_NAME = "cryptopy_input.txt"
>>> with open(INPUT_FILE_NAME,'w') as input_file:
	input_file.write('This is some test text')

	
22

# this code is the same in Python 2.7 and Python 3:

INPUT_FILE_NAME = "cryptopy_input.txt"
OUTPUT_FILE_NAME = "cryptopy_output.txt"

Pages 208-218

All code on these pages is the same, and all outputs from the code are the same in Python 3 as in Python 2.7.


August 22, 2016 01:25 PM


Mike Driscoll

ANN: The wxPython Cookbook Kickstarter

Several years ago, the readers of this blog asked me to take some of my articles and turn them into a cookbook on wxPython. I have finally decided to do just that. I am including over 50 recipes that I am currently editing to make them more consistent and updating them to be compatible with the latest versions of wxPython. I currently have nearly 300 pages of content!

To help fund the initial production of the book, I am doing a fun little Kickstarter campaign for the project. The money raised will be used for the unique perks offered in the campaign as well as various production costs related to the book, such as ISBN acquisition, artwork, software expenses, advertising, etc.

In case you don’t know what wxPython is, the wxPython package is a popular toolkit for creating cross platform desktop user interfaces. It works on Windows, Mac and Linux with little to no modification of your code base.

The examples in my book will work with both wxPython 3.0.2 Classic as well as wxPython Phoenix, which is the bleeding edge of wxPython that supports Python 3. If I discover any recipes that do not work with Phoenix, they will be clearly marked or there will be an alternative example given that does work.

Here is a listing of the current set of recipes in no particular order:

  • Adding / Removing Widgets Dynamically
  • How to put a background image on a panel
  • Binding Multiple Widgets to the Same Handler
  • Catching Exceptions from Anywhere
  • wxPython’s Context Managers
  • Converting wx.DateTime to Python datetime
  • Creating an About Box
  • How to Create a Login Dialog
  • How to Create a “Dark Mode”
  • Generating a Dialog from a Config File
  • How to Disable a Wizard’s Next Button
  • How to Use Drag and Drop
  • How to Drag and Drop a File From Your App to the OS
  • How to Edit Your GUI Interactively Using reload()
  • How to Embed an Image in the Title Bar
  • Extracting XML from the RichTextCtrl
  • How to Fade-in a Frame / Dialog
  • How to Fire Multiple Event Handlers
  • Making your Frame Maximize or Full Screen
  • Using wx.Frame Styles
  • Get the Event Name Instead of an Integer
  • How to Get Children Widgets from a Sizer
  • How to Use the Clipboard
  • Catching Key and Char Events
  • Learning How Focus Works in wxPython
  • Making Your Text Flash
  • Minimizing to System Tray
  • Using ObjectListView instead of ListCtrl
  • Making a Panel Self-Destruct
  • How to Switch Between Panels
  • wxPython: Using PyDispatcher instead of Pubsub
  • Creating Graphs with PyPlot
  • Redirect Python’s Logging Module to a TextCtrl
  • Redirecting stdout / stderr
  • Resetting the Background Color
  • Saving Data to a Config File
  • How to Take a Screenshot of Your wxPython App and Print it
  • Creating a Simple Notebook
  • Ensuring Only One Instance Per Frame
  • Storing Objects in ComboBox or ListBox Widgets
  • Syncing Scrolling Between Two Grids
  • Creating Taskbar Icons
  • A wx.Timer Tutorial
  • How to Update a Progress Bar from a Thread
  • Updating Your Application with Esky
  • Creating a URL Shortener
  • Using Threads in wxPython
  • How to Create a Grid in XRC
  • An Introduction to XRC

 Note: Recipe names and order are subject to change


August 22, 2016 01:03 PM


Doug Hellmann

random — Pseudorandom Number Generators — PyMOTW 3

The random module provides a fast pseudorandom number generator based on the Mersenne Twister algorithm. Originally developed to produce inputs for Monte Carlo simulations, Mersenne Twister generates numbers with nearly uniform distribution and a large period, making it suitable for a wide range of applications. Read more…
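
For a quick taste, a few of the module's basic calls (these work the same in Python 2.7 and 3):

import random

random.seed(42)                        # seed for a reproducible sequence
print(random.random())                 # a float in [0.0, 1.0)
print(random.uniform(1, 10))           # a float between 1 and 10
print(random.choice(['a', 'b', 'c']))  # one element picked at random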

August 22, 2016 01:00 PM


Mike Driscoll

PyDev of the Week: Michele Simionato

This week we welcome Michele Simionato as our PyDev of the Week! Michele is an expert on Python and is known for his paper on Python’s Method Resolution Order, which was published on the Python website by Guido van Rossum, and for a very interesting series of articles on metaclasses that he wrote with David Mertz. They are a bit difficult to find, but you can read the first one of the 3-part series here. He is one of the founders of the Italian Python Association. Michele has a Ph.D. on the renormalization of quantum field theory. Let’s take a few moments to get to know him better!

simionato

Can you tell us a little about yourself (hobbies, education, etc):

I originally come from academia and I have a Ph. D. in Theoretical Physics. Then I worked for several years for an Analytics firm (stock market risk assessment) and now I am back to science, doing earthquake simulations at GEM.

Why did you start using Python?

It happened in 2002. At the time I was a postdoc researcher in the department of Physics and Astronomy at Pittsburgh University. I decided that it was time to learn some modern programming language, in view of a possible career outside academia. After reading a couple of long books by Bruce Eckel, first about C++ and then about Java, I decided that I did not want to program in either of them. I was in doubt between Ruby and Python, but Python won because of the better scientific libraries and of the more pragmatic philosophy.

What other programming languages do you know and which is your favorite?

A long time ago I started with Basic and Pascal, and later on I worked a lot with Mathematica and Maple. After learning Python I became interested in functional languages, and I know Scheme decently well, so much so that I nearly wrote a book on it, The Adventures of a Pythonista in Schemeland. In my daily job I had to work a lot with SQL (which I like enough) and with Javascript (which I don’t like).

What projects are you working on now?

In the last three years I have become the maintainer and the main developer of the OpenQuake Engine, which is a computational engine to produce earthquake hazard and risk assessment. It means that after several years of being a database and Web developer I have become a scientific programmer and now I spend most of my time doing performance analysis of massive distributed calculations. I also keep a blog where I document my fighting with the engine.

Which Python libraries are your favorite (core or 3rd party)?

numpy is a really well thought library, an essential tool for people doing scientific applications.

Where do you see Python going as a programming language?

Honestly, I am unsure about where Python as a language is going, and I am not even convinced I like the recent trend. Certainly I would like the language to become simpler, since that’s what attracted me to Python in the first place, and instead I see several things that are becoming increasingly complicated. Also, there are now other languages out there that are worthy of note, whereas for years Python had no competitors. If you want to know, I am thinking about Go for server-side programming and about Julia for scientific programming. Both of them look really interesting, even if I have not programmed in either of them. Python should not rest on the thought that it is better than Java and C++ (an easy win) and should instead take the new contenders seriously.

What is your take on the current market for Python programmers?

It has always been a good market for Python programmers (at least from when I started, 14 years ago) and now it is even more so. I get offers for Python jobs nearly every week.

Is there anything else you’d like to say?

My tagline at the EuroPython 2016 conference was “Legacy code warrior”: that reflects my daily job in the last 10 years at least. You can see a video of my talk here:

Thanks for doing the interview!

August 22, 2016 12:30 PM


Hynek Schlawack

Better Python Object Serialization

The Python standard library is full of underappreciated gems. One of them allows for simple and elegant function dispatching based on argument types. This makes it perfect for serialization of arbitrary objects – for example to JSON in web APIs and structured logs.
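
The gem in question is functools.singledispatch (new in Python 3.4, and available to older Pythons via the singledispatch backport on PyPI). A minimal sketch of the serialization idea:

import json
from datetime import datetime
from functools import singledispatch

@singledispatch
def to_serializable(val):
    # Fallback for types we don't otherwise handle: use their string form.
    return str(val)

@to_serializable.register(datetime)
def _(val):
    # Serialize datetimes as ISO 8601 strings.
    return val.isoformat()

print(json.dumps({"when": datetime(2016, 8, 22)}, default=to_serializable))
# {"when": "2016-08-22T00:00:00"}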

August 22, 2016 12:30 PM


"Menno's Musings"

IMAPClient 1.0.2

IMAPClient 1.0.2 is out! This release comes with a few small fixes and tweaks, as well as some documentation improvements.

Specifically:

  • There's now an explicit check that the pyOpenSSL version that IMAPClient is seeing is sufficient. This is to help with situations (typically on OS X) where the (old) system pyOpenSSL takes precedence over the version that IMAPClient needs. Use of virtualenvs is highly recommended.
  • Python 3.5 is now officially supported and tested against.
  • setup.py can now be used even if it's not in the current directory.
  • Handling of RFC2822 group address syntax has been documented.
  • The INI file format used by the live tests and interactive shell has finally been documented.
  • Links to ReadTheDocs now go to readthedocs.io
  • The project README has been arranged so that all the essentials are right at the top.
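
Upgrading is the usual pip incantation (assuming an install from PyPI):

pip install --upgrade imapclient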

Some time ago I announced that the project would be moving to Git and GitHub, and this is finally happening. This will be the last release where the project is on Bitbucket.

August 22, 2016 12:14 PM


Python Software Foundation

"In the beginning, there was one Python group": Community Service Award Recipient Stéphane Wirtel

“In the beginning, there was one Python group in Charleroi, the P3B (Python Blanc Bleu Belge)”, Stéphane Wirtel recalls. This first Python group was led by Denis Frère and Olivier Laurent. Together with Aragne, the first company using Python in Belgium, and Marc-Andre Lemburg, the P3B helped organize the inaugural EuroPython in 2002. Over the years, however, the P3B disbanded. “Other groups have organized some events for the Belgian community”, Wirtel adds. These groups, however, have faced some of the same organizing challenges as the P3B.


As a Python user of 15 years, Wirtel contemplated what would be the best way to sustainably build the Belgian Python community. He originally wanted to organize the first PyCon in Belgium but eventually decided to invest his energies elsewhere. Ludovic Gasc, Fabien Benetou and Wirtel began by hosting Python events in Brussels and Charleroi.


The Python Software Foundation awarded Wirtel a Community Service Award in the second quarter of 2016 in recognition of his work organizing a Python user group in Belgium, his continued work creating marketing material for the PSF, and his ongoing outreach spreading the PSF's mission.




Outreach at PythonFOSDEM and Building a New Python Belgium Community


“FOSDEM is one of the most important events in the European development community, with over 5,000 attendees participating in a weekend event,” Wirtel explains. The importance of FOSDEM led Wirtel and Gasc to create the first PythonFOSDEM.


Since 2013 Wirtel has organized the PythonFOSDEM devroom, expanding it from 80 participants in 2013 to well over 400 in 2016. Benetou, who volunteered in the FOSDEM 2016 Python devroom, remembers the excitement, explaining that the room filled within five minutes of opening.



With the growth of the PythonFOSDEM devroom and the return of AFPyro-BE, led by Ludovic Gasc, Wirtel has been focusing his efforts on building the belgium@python.org mailing list and registering a Belgian Python website. “Stéphane continues to challenge us to organize bigger and bigger events,” Gasc says of Wirtel. Wirtel's continued work promoting Python in Belgium is providing the building blocks for a new Python community there.


Python Software Foundation Marketing Work Group


As a member of the PSF marketing work group, Wirtel is an ongoing voice in the discussion and creation of PSF marketing materials. Wirtel helped with flyer development and distribution for  PythonFOSDEM 2015, PyCon North America 2015 and PyCon Ireland 2015.


Inspiring new CPython contributors at EuroPython 2016


Wirtel spoke at EuroPython this year on the topic of CPython. His talk, titled “Exploring our Python Interpreter”, outlined the basics of how the Python interpreter works. Notably, Wirtel framed his talk for CPython novices, pointing out documentation on where to get started and resources for finding CPython core mentors. Wirtel also pointed to a CPython patch he recently submitted for the __ltrace__ feature. With his patch you can compile Python to easily show the bytecode being executed, something beginners can play with in the Python interpreter. Here is an example of his feature in action:


>>> __ltrace__ = None  # To enable tracing
>>> print("hello")     # Now, shows bytecodes run
0: LOAD_NAME, 0
push <built-in function print>
2: LOAD_CONST, 0
push 'hello'
4: CALL_FUNCTION, 1
ext_pop 'hello'
hello
ext_pop <built-in function print>
push None
6: PRINT_EXPR
pop None
8: LOAD_CONST, 1
push None
10: RETURN_VALUE
pop None


Some of Wirtel’s other projects include his work from 2008 to 2014 as a core developer of Odoo, an open source enterprise resource planner built with PostgreSQL and CPython. He has contributed to Gunicorn and is working to contribute more to CPython. Wirtel is also a member of the EuroPython Society and the Association Francophone de Python (AFPy), as well as a PSF Fellow. He has supported EuroPython for the last two years as a volunteer and as a working group member.

Wirtel’s passion for bringing new Pythonistas into the fold, be it through the creation and continued organizing of the PythonFOSDEM devroom or the spread of CPython knowledge and tools particularly suited to beginners, is profound. As he noted in his EuroPython 2016 talk, he was completely new to CPython at PyCon North America 2014 in Montreal! “Simply put, Wirtel is the type of person who gets things done,” Benetou says, adding that “these are the type of people that inspire me, that I like”.

August 22, 2016 10:30 AM


Will Kahn-Greene

pyvideo last thoughts

What is pyvideo?

pyvideo.org is an index of Python-related conference and user-group videos on the Internet. Saw a session you liked and want to share it? It's likely you can find it, watch it, and share it with pyvideo.org.

This is my last update. pyvideo.org is now in new and better hands and will continue going forward.

Read more… (2 mins to read)

August 22, 2016 04:00 AM