
Planet Python

Last update: December 05, 2016 04:48 PM

December 05, 2016


Weekly Python Chat

Reading & Writing Files

Files in Python: read them, write them, append them. Questions encouraged.
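As a quick, generic refresher ahead of the chat (this snippet is mine, not from the event description, and the filename is just a placeholder), the three modes in question look like this:

# 'w' truncates or creates the file, 'a' appends, and the default 'r' reads.
with open('notes.txt', 'w') as f:
    f.write('first line\n')

with open('notes.txt', 'a') as f:
    f.write('second line\n')

with open('notes.txt') as f:
    print(f.read())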

December 05, 2016 04:30 PM


Possibility and Probability

Improving your python: using pylint and flake8 in emacs

In a previous post I mentioned an issue I had with some python code that failed in a way I hadn’t expected. Long story short, I was in the wrong which does happen from time to time. Aside from the … Continue reading

The post Improving your python: using pylint and flake8 in emacs appeared first on Possibility and Probability.

December 05, 2016 02:30 PM


Doug Hellmann

getpass — Secure Password Prompt — PyMOTW 3

Many programs that interact with the user via the terminal need to ask the user for password values without showing what the user types on the screen. The getpass module provides a portable way to handle such password prompts securely. Read more… This post is part of the Python Module of the Week series for … Continue reading getpass — Secure Password Prompt — PyMOTW 3
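As a minimal illustration of the module's main entry point (the PyMOTW post itself goes into far more depth), getpass.getpass() prompts for input without echoing it to the terminal:

import getpass

password = getpass.getpass(prompt='Password: ')
print('Read a password of {} characters'.format(len(password)))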

December 05, 2016 02:00 PM


Mike Driscoll

PyDev of the Week: Alex Clark

This week we welcome Alex Clark (@aclark4life) as our PyDev of the Week! Alex is the person behind the Pillow project, which is a “friendly” fork of the Python Imaging Library. Alex writes at his own Python blog which is certainly worth checking out. He is also the author of Plone 3.3 Site Administration. You might also want to check out Alex’s Github profile to see what projects are worthy of his attention. Let’s take some time to get to know him!


Can you tell us a little about yourself (hobbies, education, etc):

I’ve been calling myself a “Python Web Developer” professionally for the last 10 years or so, occasionally adding “& Musician” to the end; so music is a big part of what I do, or at least think about doing, outside of Python. I’m from Baltimore, MD, USA and I attended both Loyola High School and University (College when I attended) in Maryland. It took about 10 years, but I earned my Bachelor of Science Degree in Computer Science in May of 1998. Not all who wander are lost, but in my case I had no idea what I wanted to do until I saw a room full of Digital DECStations in the UNIX lab; at that moment, I decided I wanted to be in that room and I switched my major from Accounting to Computer Science.

Why did you start using Python?

An easy one! In the early 2000s I started working at NIH (National Institutes of Health in Bethesda, MD, USA) and my small group needed a website. A neighboring group was using Plone and “the rest is history”. I built their site using Plone (Python-based CMS) and I learned all the Python I needed to know to accomplish that task. I remember at the time being impressed by the aesthetics of Python, and I began researching the language and community. To my surprise, the first PyCon had just taken place in nearby Washington, DC.

What other programming languages do you know and which is your favorite?

I’ve learned (or have been forced to learn) a lot of languages, starting in school where I took a Programming Languages class. In most classes, C was the focus but in the Programming Languages class we were forced (ahem) to look at other languages like Lisp, Smalltalk, etc. I also had to take an x86 Assembly class. After school, I learned and used Perl at my first job or two. At this point, I’d definitely say Python is my favorite language, both in terms of the language itself and the surrounding community. I’m not terribly active in the community these days, but I like knowing it’s there and I do follow events. Most importantly, I still oversee the operation of DC Python, which I very much enjoy doing. Lastly, I’ll mention JavaScript because as a Python Web Developer I can’t ignore it. I’m fascinated by it, and I dabble, but it’s not my favorite yet and I’m not sure it ever will be; again, the Python aesthetic is superior to most if not all languages, including JavaScript. Oh, and if you can call it a language: I enjoy the occasional bit of shell programming very much (bash, zsh).

What projects are you working on now?

I’m glad you asked. A few years back (2012) I got carried away and ended up building my first start up business: “Python Packages”; I was selected to exhibit at PyCon 2012’s Startup Row and it was very exciting and fun. Unfortunately, the start up failed but the desire to build a software-as-a-service (SaaS) revenue-generating business has not gone away. At the end of 2015, I got tired of using and paying for the Harvest Time Tracking & Invoicing service. It’s a very nice service, but there were some minor annoyances that gradually began to consume major space in my head. So I rolled my own with Django & PostgreSQL based on my exported data. I started using it “for real” in January 2016 and it’s been amazing. The fact that I can send “Harvest-like” invoices to my clients using software I (and others, via open source) created still blows my mind. Earlier this month, I started updating the software to make it more usable by folks who aren’t me. I hope to begin selling it on January 1, 2017.

I’ll mention two other projects, too:


The latter is marked “do not use”, but I’m about to flip the switch on that one and make it available for consumption (and I hope folks try, like and contribute).

Which Python libraries are your favorite (core or 3rd party)?

I’m glad you asked this one, too. I tend to enjoy frameworks and utilities more than libraries, so I’ll list these and hope they somewhat answer the question:

Where do you see Python going as a programming language?

I hope Python 3 “wins” by 2020 (if not sooner). I feel like most of the drama associated with the decision to build a new non-backwards-compat version of Python has passed, and I look forward to the day when all vendors include Python 3 by default.

What is your take on the current market for Python programmers?

I don’t keep too many tabs on this (other than continuously trying to be hired as a Python Web Developer for my standard rate!) but I’m hoping that the market for Python Data Scientists continues to grow. As someone involved in the process of spending lots of $$$ on Mathworks’ MATLAB licensing each year, I hope to see Python continue to compete in this area; I want to reach the point where I can easily present Python as an alternative to MATLAB.

Is there anything else you’d like to say?

Thank you for the PyDev of the week series and for inviting me to participate!

December 05, 2016 01:30 PM


hypothesis.works articles

Integrated vs type based shrinking

One of the big differences between Hypothesis and Haskell QuickCheck is how shrinking is handled.

Specifically, the way shrinking is handled in Haskell QuickCheck is bad and the way it works in Hypothesis (and also in test.check and EQC) is good. If you’re implementing a property based testing system, you should use the good way. If you’re using a property based testing system and it doesn’t use the good way, you need to know about this failure mode.

Unfortunately many (and possibly most) implementations of property based testing are based on Haskell’s QuickCheck and so make the same mistake.

Read more...

December 05, 2016 10:00 AM


Jean-Paul Calderone

Twisted Web in 60 Seconds: HTTP/2

Hello, hello. It's been a long time since the last entry in the "Twisted Web in 60 Seconds" series. If you're new to the series and you like this post, I recommend going back and reading the older posts as well.

In this entry, I'll show you how to enable HTTP/2 for your Twisted Web-based site. HTTP/2 is the latest entry in the HTTP family of protocols. It builds on work from Google and others to address performance (and other) shortcomings of the older HTTP/1.x protocols in widespread use today.

Twisted implements HTTP/2 support by building on the general-purpose H2 Python library. In fact, all you have to do to have HTTP/2 for your Twisted Web-based site (starting in Twisted 16.3.0) is install the dependencies:

$ pip install twisted[http2]

Your TLS-based site is now available via HTTP/2! A future version of Twisted will likely extend this to non-TLS sites (which requires the Upgrade: h2c handshake) with no further effort on your part.
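For concreteness, here is a rough sketch of a minimal TLS site that the above applies to. This is not code from the post; the certificate file name and port are placeholders, and the ALPN protocol list is passed explicitly just to keep the example self-contained.

from twisted.internet import endpoints, reactor, ssl
from twisted.web.resource import Resource
from twisted.web.server import Site

class Hello(Resource):
    isLeaf = True

    def render_GET(self, request):
        return b"Hello over HTTP/2 (or HTTP/1.1, depending on the client)\n"

# server.pem is a placeholder PEM file holding both the key and certificate.
with open("server.pem", "rb") as f:
    cert = ssl.PrivateCertificate.loadPEM(f.read())

options = ssl.CertificateOptions(
    privateKey=cert.privateKey.original,
    certificate=cert.original,
    acceptableProtocols=[b"h2", b"http/1.1"],  # offer h2 via ALPN
)

endpoints.SSL4ServerEndpoint(reactor, 8443, options).listen(Site(Hello()))
reactor.run()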

If you like this post or others in the Twisted Web in 60 Seconds series, let me know with a donation! I'll post another entry in the series when the counter hits zero. Topic suggestions welcome in the comment section.

December 05, 2016 07:00 AM

December 04, 2016


Django Weblog

Presenting DjangoCon Europe 2017

2017’s DjangoCon Europe takes place against the gorgeous backdrop of Florence in the springtime. Once again the event will be organised by a local committee of volunteers on behalf of the global Django community.

The event will be held in the historic Odeon Cinema in the centre of the city. It’s an architectural gem, an Art Deco interior in a Renaissance palace.

Key points

Ticket sales are now open. Early-bird rates are available until the 17th January.

The call for proposals is open too, until the 31st December.

Generous financial assistance packages are offered, to help ensure that everyone who will benefit has the opportunity to attend.

The conference can even offer discounted public transport passes (see the tickets page) valid for the duration of the event, to help you get around the city.

The call for proposals

The programme of talks will represent the vibrant diversity of interests and endeavours across the Django community, including some that you had not only never heard of, but would not have imagined. The speaker roster will also feature some of the best-known names in the world of Django. There’ll be talks from those who are leading its development into the future, and about its deepest internals - discussions on the highest technical level.

The organisers invite proposals from all. Whatever your level of technical or speaking experience, you are invited to share what you know or have done with Django with your friends and colleagues in the community.

Both the speaker line-up and the selection of talks will be curated to offer a wide and representative balance, so the platform created by DjangoCon Europe 2017 will have room for everyone.

And just in case five days in Florence are not enough, PyCon Italia immediately follows DjangoCon Europe. You’re invited to submit your talk proposal to PyCon Italia too, in the same process, by ticking a single box on the form.

The ambitions of DjangoCon Europe 2017

The conference

Each successive DjangoCon Europe has advanced new ideas about how a conference should be run and has set new standards for itself. Just measuring up to past editions is challenge enough, but the organisers of 2017’s event have ambitions for it of their own, that also extend beyond this gathering of nearly 400 Djangonauts.

The Italian context

The organisers see DjangoCon Europe 2017 as a launching pad for the whole Italian Django community's future organisation, development and activity, so that the event makes a tangible and material difference to the open-source software community and industry in Italy.

The social context

The organisers want the event to harness the energy, know-how and organisation skills in the community, and put them to work in local organisations that work to advance social inclusion, in particular, amongst women from immigrant communities, who are disproportionately marginalised and excluded socially, technologically, economically and educationally.

Responsibility and sustainability

The Django community has always generally been conscious that its technology exists in a social context and not a vacuum.

The overall themes of this DjangoCon Europe are responsibility and sustainability: responsibility to others in our industry and of our industry’s responsibility to the wider world, and the sustainability - economic, personal and social - of the industry itself.

The conference invites its attendees to participate in these discussions, and to consider how our technology’s long-term viability depends on them as much as it does on the technical brilliance of its technologists.

A Django festival of ideas and collaboration

These are ambitions and aspirations. Their vehicle will be the international festival of community that each DjangoCon Europe represents, and reinvests with new energy each year. The organisers give you Florence in the springtime, a magnificent capital of history, culture, beauty and food, and the perfect foundation for building the future with Django.

Don’t miss it.

December 04, 2016 10:03 PM


Brett Cannon

Why I took October off from OSS volunteering

December 04, 2016 11:56 AM

What to look for in a new TV

I'm kind of an A/V nerd. Now I'm not hardcore enough to have a vinyl collection or have an amp for my TV, but all my headphones cost over $100 and I have a Sonos Playbar so I don't have to put up with crappy TV speakers. What I'm trying to say is that I care about the A/V equipment I use, but not to the extent that money is no object when it comes to my enjoyment of a movie (I'm not that rich and my wife would kill me if I spent that kind of money on electronics). That means I tend to research extensively before making a major A/V purchase since I don't do it very often and I want quality within reason which does not lend itself to impulse buying.

Prior to September 1, 2016, I had a 2011 Vizio television. It was 47", did 1080p, and had passive 3D. When I purchased the TV I was fresh out of UBC having just finished my Ph.D. so it wasn't top-of-the-line, but it was considered very good for the price. I was happy with the picture, but admittedly it wasn't amazing; the screen had almost a matte finish which led to horrible glare. I also rarely used the 3D in the television as 3D Blu-Ray discs always cost extra and so few movies took the time to actually film in 3D to begin with, instead choosing to do it in post-production (basically animated films and TRON: Legacy were all that we ever watched in 3D). And to top it all off, the TV took a while to turn on. I don't know what kind of backlight bulbs were in it, but they took forever to warm up and just annoyed me (yes, very much a first-world problem).

So when UHD came into existence I started to keep an eye on the technology and what television manufacturers were doing to incorporate it to entice people like me to upgrade. After two years of watching this space and one of the TVs I was considering having a one-day sale that knocked 23% off the price, I ended up buying a 55" Samsung KS8000 yesterday. Since I spent so much time considering this purchase I figured I would try and distill what knowledge I have picked up over the years into a blog post so that when you decide to upgrade to UHD you don't have to start from zero knowledge like I did.

What to care about

First, you don't care about the resolution of the TV. All UHD televisions are 4K, so that's just taken care of for you. It also doesn't generally make a difference in the picture because most people sit too far away from their TV to make the higher resolution matter.

No, the one thing you're going to care about is HDR and everything that comes with it. And of course it can't be a simple thing to measure like size or resolution. Oh no, HDR has a bunch of parts to it that go into the quality of the picture: brightness, colour gamut, and format (yes, there's a format war; HD-DVD/Blu-Ray didn't teach the TV manufacturers a big enough lesson).

Brightness

A key part of HDR is the range of brightness to show what you frequently hear referred to as "inky blacks" and "bright whites". The way you get deep blacks and bright whites is by supporting a huge range of brightness. What you will hear about TVs is what their maximum nit is. Basically you're aiming for 1000 nits or higher for a maximum and as close to 0 as possible for a minimum.

Now of course this isn't as simple as it sounds as there's different technology being used to try and solve this problem.

LCD

Thanks to our computers I'm sure everyone reading this is familiar with LCD displays. But what you might not realize is how exactly they work. In a nutshell there are LED lightbulbs behind your screen that provide white light, and then the LCD pixels turn on and off the red/green/blue parts of themselves to filter out certain colours. So yeah, there are lightbulbs in your screen and how strong they are dictates how bright your TV screen will be.

Now the thing that comes into play here for brightness is how those LED bulbs are oriented in order to get towards that 0 nits for inky blacks. Typical screens are edge-lit, which means there is basically a strip of LEDs on the edges of the TV that shine light towards the middle of the screen. This is fine and it's what screens have been working with for a while, but it does mean there's always some light behind the pixels so it's kind of hard to keep it from leaking out a little bit.

This is where local dimming comes in. Some manufacturers are now laying out the LED bulbs in an array/grid behind the screen instead of at the edges. What this allows is for the TV to dim an LED bulb if it isn't needed at full strength to illuminate a certain quadrant of the screen (potentially even switching it off entirely). Obviously the denser the array, the more local dimming zones and thus the greater chance a picture with some black in it will be able to switch off an LED to truly get a dark black for that part of the screen. How often what you're watching will actually let you take advantage of local dimming (a dark area lining up within a zone) will vary, so whether this makes a difference to you is a personal call.

OLED

If I didn't have a budget and wanted the ultimate solution for getting the best blacks in a picture, I would probably have an OLED TV from LG. What makes these TVs so great is the fact that OLEDs are essentially pixels that provide their own light. What that means is if you want an OLED pixel to be black, you simply switch it off. Or to compare it to local dimming, it's as if every pixel was its own local dimming zone. So if you want truly dark blacks, OLED are the way to go. It also leads to better colours since the intensity of the pixel is consistent compared to an LCD where the brightness is affected by how far the pixel is from the LED bulb that's providing light to the pixel.

But the drawback is that OLED TVs only get so bright. Since each pixel has to generate its own light they can't really reach four-digit nit levels like the LCD TVs can. It's still much brighter than any HD TV, but OLED TVs don't match the maximum brightness of the higher-end LCD TVs.

So currently it's a race to see if LCDs can get their blacks down or if OLEDs can get their brightness up. But from what I have read, in 2016 your best bet is OLED if you can justify the cost to yourself (they are very expensive televisions).

Colour gamut

While having inky blacks and bright whites is nice, not everyone is waiting for Mad Max: Fury Road in black and white. That means you actually care about the rest of the rainbow, which means you care about the colour gamut of the TV for a specific colour space. TVs are currently trying to cover as much of the DCI-P3 colour space as possible. Maybe in a few years TVs will fully cover that colour space, at which point they will start worrying about Rec. 2020 (also called BT.2020), but there's still room in covering DCI-P3 before that's something to care about.

In the end colour gamut is probably not going to be something you explicitly shop for, but more of something to be aware of that you will possibly gain by going up in price on your television.

Formats

So you have your brightness and you have your colours, now you have to care about what format all of this information is stored in. Yes my friends, there's a new format war and it's HDR10 versus Dolby Vision. Now if you buy a TV from Vizio or LG then you don't have to care because they are supporting both formats. But if you consider any other manufacturer you need to decide on whether you care about Dolby Vision because everyone supports HDR10 these days but no one supports Dolby Vision at the moment except those two manufacturers.

There is one key reason that HDR10 is supported by all television makers: it's an open specification. Because it's free, it doesn't cut into TV profit margins, which every manufacturer obviously likes, and that is probably why HDR10 is the required HDR standard for Ultra Blu-Ray discs (Dolby Vision is supported on Ultra Blu-Ray, but not required). Dolby Vision, on the other hand, requires licensing fees paid to Dolby. Articles also consistently suggest that Dolby Vision requires new hardware, which would also drive up the cost of supporting it (the best I can come up with is that since Dolby Vision is 12-bit and HDR10 is 10-bit, TVs typically use a separate chip for Dolby Vision processing).

Dolby Vision does currently have two things going for it over HDR10. One is that Dolby Vision is dynamic per frame while HDR10 is static. This is most likely a temporary perk, though, because HDR10 is gaining dynamic support sometime in the future.

Two is that Dolby Vision is part of an end-to-end solution from image capture to projection in the theatres. By making Dolby Vision then also work at home it allows for directors and editors to get the results they want for the cinema and then just pass those results along to your TV without extra work.

All of this is to say that Dolby Vision seems to be the better technology, but the overhead/cost of adding it to a TV along with demand will ultimately dictate whether it catches on. Luckily all TV manufacturers have agreed on the minimum standard of HDR10 so you won't be completely left out if you buy a TV from someone other than LG or Vizio.

Where to go for advice

When it comes time to buy a TV, I recommend Rtings.com for advice. They have a very nice battery of tests they put the TV through and give you a nice level of detail on how they reached their scores for each test. They even provide the settings they used for their tests so you can replicate them at home.

You can also read what the Wirecutter is currently recommending. For me, though, I prefer Rtings.com and use the Wirecutter as a confirmation check if their latest TV round-up isn't too out-of-date.

Ultra HD Premium

If you want a very simple way to help choose a television, you can simply consider ones that are listed as Ultra HD Premium. That way you know the TV roughly meets a minimum set of specifications that are reasonable to want if you're spending a lot of money on a TV. The certification is new in 2016 and so there are not a ton of TVs yet that have the certification, but since TV manufacturers like having stamps on their televisions I suspect it will start to become a thing.

One thing to be aware of is that Vizio doesn't like the certification. Basically they have complained that the lack of standards around how to actually measure what the certification requires makes it somewhat of a moot point. That's a totally reasonable criticism, and it's why using the certification as a filter for TVs to consider is fine, but you shouldn't blindly buy a TV just because it has the Ultra HD Premium stamp of approval.

Why I chose my TV

Much like when I bought a soundbar, I had some restrictions placed upon me when considering what television I wanted. One, the TV couldn't be any larger than 55" (to prevent the TV from taking over the living room even though we should have a 65" based on the minimum distance people might sit from the TV). This immediately put certain limits on me as some model lines don't start until 65" like the Vizio Reference series. I also wasn't willing to spend CAD 4,000 on an LG, so that eliminated OLED from consideration. I also wanted HDR, so that eliminated an OLED that was only HD.

In the end it was between the 55" Samsung KS8000, 55" Vizio P-series, and the 50" Vizio P-series. The reason for the same Vizio model at different sizes is the fact that they use different display technology; the 50" has a VA display while the 55" has an IPS display. The former will have better colours but the latter has better viewing angles. Unfortunately I couldn't find either model on display here in Vancouver to see what kind of difference it made.

One other knock against the Vizio -- at least at 55" -- was that it wasn't very good in a bright room. That's a problem for us as our living room is north facing with a big window and the TV is perpendicular to those windows, so we have plenty of glare on the screen as the sun goes down. The Samsung, on the other hand, was rated to do better in a glare-heavy room. And thanks to a one-day sale it brought the price of the Samsung to within striking distance of the Vizio. So in the end with the price difference no longer a factor I decided to go with the TV that worked best with glare and maximized the size I could go with.

My only worry with my purchase is if Dolby Vision ends up taking hold and I get left in the cold somehow. But thanks to the HDR10 support being what Ultra Blu-Ray mandates I'm not terribly worried of being shut out entirely from HDR content. There's also hope that I might be able to upgrade my television in the future thanks to it using a Mini One Connect which breaks out the connections from the television. In other TVs the box is much bigger as it contains all of the smarts of the television, allowing future upgrades. There's a chance I will be able to upgrade the box to get Dolby Vision in the future, but that's just a guess at this point that it's even possible, let alone whether Samsung would choose to add Dolby Vision support.

It's been 48 hours with the TV and both Andrea and I are happy with the purchase; me because the picture is great, Andrea because I will now shut up about television technology in regards to a new TV purchase.

December 04, 2016 11:56 AM

Introducing Which Film

What I'm announcing

Today I'm happy to announce the public unveiling of Which Film! I'll discuss how the site came about and what drives it, but I thought I would first explain what it does: it's a website to help you choose what movie you and your family/friends should watch together. What you do is you go to the site, enter the Trakt.tv usernames of everyone who wants to watch a film together (so you need at least two people, each of whom has kept data like their watchlist and ratings on Trakt), and then Which Film cross-references everyone's watchlists and ratings to create a list of movies that people may want to watch together.

The list of movies is ranked based on a simple point scale. If a movie is on someone's watchlist it gets 4 points, movies rated 10 ⭐ get 3 points, 9 ⭐ get 2 points, and 8 ⭐ get 1 point. Everyone who participates contributes points and the movies are sorted from highest score to lowest. The reason for the point values is the assumption that watching a movie most people have not seen is the best, followed by movies people rated very highly. In the case of ties, the movie seen longest ago (if ever) by anyone in the group is ranked higher than movies seen more recently by someone. That way there's a bigger chance someone will be willing to watch a movie again when everyone else wants to see it for the first time.
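The site itself is written in Dart (more on that below), but the point scale is simple enough to sketch. The following is an illustrative reimplementation in Python with made-up data structures, not Which Film's actual code or the Trakt API, and it assumes ISO-formatted date strings for viewing dates.

POINTS = {'watchlist': 4, 10: 3, 9: 2, 8: 1}

def rank_movies(people):
    """people maps a username to {'watchlist': set of titles,
    'ratings': {title: stars}, 'last_seen': {title: ISO date or None}}."""
    scores = {}
    for data in people.values():
        for title in data['watchlist']:
            scores[title] = scores.get(title, 0) + POINTS['watchlist']
        for title, stars in data['ratings'].items():
            scores[title] = scores.get(title, 0) + POINTS.get(stars, 0)

    def most_recent_viewing(title):
        # Tie-break: prefer movies seen longest ago, or never, by anyone.
        dates = [p['last_seen'].get(title) for p in people.values()]
        dates = [d for d in dates if d is not None]
        return max(dates) if dates else ''  # '' sorts before any ISO date

    return sorted(scores, key=lambda title: (-scores[title],
                                             most_recent_viewing(title)))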

None of this is very fancy or revolutionary, but it's useful any time you get together with a group of friends to watch a film and you end up having a hard time choosing what to watch. It can even help between spouses, as it will identify movies both people want to watch, removing that particular point of contention.

The story behind Which Film

Now normally launching a new website wouldn't call for any backstory, but this project has been under development for about six years, so there's a bit of history to it.

One fateful night ...

The inspiration for Which Film stemmed from one night when my co-creator Karl, his wife, my wife, and I got together and decided we wanted to watch a movie. This turned out to be quite an ordeal due to disparate tastes among all four of us. Karl and I thought that there had to be a better way to figure out a film we could all happily watch together. It didn't need to necessarily be something none of us had seen (although that was preferred), but it did need to be something that had a chance of making all of us happy if we chose to watch it.

This is when I realized that at least for me I had all of the relevant data to make such a decision on IMDb. I had been keeping my watchlist and ratings up-to-date on the site for years, to the point of amassing a watchlist of over 400 movies. Karl and I realized that had all four of us done that we could have cross-referenced the data and easily have found a film we all liked. Yes, it would require convincing everyone involved to keep track of what movies they wanted to see and rating movies they had seen, but we figured that wasn't an insurmountable problem. And so we decided we should code up a solution since we're both software developers.

You need an API, IMDb

But there was trouble with this project from the beginning. It turns out that while IMDb is happy for you to store your data on their servers, they don't exactly make it easy to get the data out. For instance, when I started looking into this they had two ways of getting to your data in some programmatic way: RSS and CSV files. The problem with RSS is that it was capped at (I believe) 200 entries, so I couldn't use it to access my entire data set. The issue with CSV was that you had to be logged in to download it. And the issue with both approaches was that they were constantly broken, often for different things simultaneously; when I last looked into this, RSS was busted for one kind of list while CSV was broken for another. To top it all off the brokenness wasn't temporary, but lasted for lengths of time measured in months. That obviously doesn't work if you want to rely on the data and there's no official API (and IMDb at least used to aggressively go after anyone who used their name in a project).

Luckily I found Trakt. It has an API, it was accessible on a phone, and it wasn't ugly. The trick, though, was getting my data from IMDb to Trakt. Luckily there was a magical point when CSV exporting on IMDb worked for all of my lists, and so I downloaded the data and hacked together csv2trakt to migrate my data over (there is TraktRater for importing into Trakt as well, but at the time I had issues getting it to run on macOS).

What platform?

With my data moved over, we then had to choose what platform to have Which Film on. We toyed with the idea of doing a mobile app, but I'm an Android user and Karl is on iOS (and the same split for our wives), so that would have meant two apps. That didn't really appeal to either of us so we decided to do a website. We also consciously chose to do a single-page app to avoid maintaining a backend where we would have to worry about uptime, potential server costs, etc. It also helps that there's a local company in Vancouver called Surge that does really nice static page hosting with a very reasonable free tier (when they get Let's Encrypt support I'll probably bump up to their paid tier if people actually end up using Which Film).

Choosing a programming language is never easy for me

Since we had found a website we were willing to ask people to use to store data, I had solved my data import problem, and we had decided on doing a website solution, next was what technology stack to use. The simple answer would have been Python, but for me that's somewhat boring since I obviously know Python. To make sure we both maximized our learning from this project we endeavoured to find a programming language neither of us had extensive experience in.

Eventually we settled on Dart. At the time we made this decision I worked at Google which is where Dart comes from, so I knew if I got really stuck with something I had internal resources to lean on. Karl liked the idea of using Dart because his game developer background appreciated the fact that Dart was looking into things like SIMD for performance. I also knew that Dart had been chosen by the ads product division at Google which meant it wasn't going anywhere. That also meant choosing Angular 2 was a simple decision since Google was using Dart with Angular 2 for products and so it would have solid Dart support.

But why six years?!?

As I have said, the site isn't complicated as you can tell from its source code, so you may be wondering why it took us six years before we could finish it. Well, since coming up with this idea I at least finished my Ph.D., moved five times between two countries, and worked for two different employers (if you don't count my Ph.D.). Karl had a similar busy life over the same timespan. And having me spend a majority of those six years in a different timezone didn't help facilitate discussions. At least we had plenty of time to think through various UX and design problems. ☺

If you give Which Film a try do let Karl and/or me know on Twitter (if you just want to see how the website works and you don't have a Trakt account you can use our usernames: brettcannon and kschmidt).

December 04, 2016 11:56 AM


Lintel Technologies

Asynchronous HTTP Responses using twisted.web

In Twisted Web, a Resource may generate its response asynchronously rather than immediately upon the call to its render method. This is different from synchronous Python frameworks like Django, Flask, etc., where the response has to be returned at the end of the view function's execution.

I used NOT_DONE_YET to return the response asynchronously, implementing it with a Resource and its render_GET method.

Conceptual Overview:


from twisted.web.resource import Resource
from twisted.web.server import NOT_DONE_YET, Site
from twisted.internet import reactor

class ShowResource(Resource):
    isLeaf = True

    def printRequest(self, request):
        # Called 10 seconds later: write the body (as bytes) and end the response.
        request.write(b'Hi, your response is ready.')
        request.finish()

    def render_GET(self, request):
        # Schedule the real work for later and tell Twisted we are not done yet.
        reactor.callLater(10, self.printRequest, request)
        return NOT_DONE_YET

root = ShowResource()
factory = Site(root)
reactor.listenTCP(8082, factory)
reactor.run()

 

I used the request.write() method for the resource to respond, instead of returning a string.
This method can be called repeatedly; each call appends another string to the response body. Once the response body has been passed to request.write, the application must call request.finish(), which ends the response. To generate a response asynchronously, the render method must return NOT_DONE_YET. If you do not call request.finish(), the browser keeps waiting for the full response; Twisted won't end the request until request.finish() is called.

To run this, save the code in a file and run it as a Python script.
Then open a web browser and enter the URL “localhost:8082”.
You will see the browser waiting for about 10 seconds, and then the response arrives.

The post Asynchronous HTTP Responses using twisted.web appeared first on Lintel Technologies Blog.

December 04, 2016 10:37 AM

December 03, 2016


Python Insider

Python 2.7.13 release candidate 1 available

A release candidate for Python 2.7.13, a bug fix release in the Python 2.7 series, is now available for download on python.org.

December 03, 2016 10:54 PM


Holger Peters

Fixing Python-only feeds

I moved this blog to new blog software and lost the ability to build custom feeds on tags. This should now work for the Python tag again, so your feed readers should be able to load the Python feed. Let me know if this is not working.

The IDs of posts in that feed are probably not the same as in the static site generator I used before, so your feed reader might show some old posts as unread, although you have read them already.

December 03, 2016 08:08 PM


Kushal Das

Communication tools and submitting weekly reports

I work on the Fedora Engineering team. There are around 17 people on my team, and we cover Australia to the USA geographically. Most of us are remote; only a handful of members go into local offices. I did a blog post around 3 years back, just after I started working remotely. In this post, I am trying to write down my thoughts about our communication styles.

Communication Tools

IRC is our primary communication medium. I am in around 42 channels dedicated to various sub-projects inside Fedora. We have a few dedicated meeting channels. As all meetings involve community members, the meeting timings are based on the availability of many other people. This is the only difficult part, as I have many meetings that run past midnight. For any other discussion where we need more participation, or want to keep a record, we use our mailing lists.

A few of us also use various video chat systems regularly. It is always nice to see the faces. As a team, we mostly meet once during Flock, and some get a chance to meet each other during devconf.

Weekly reports

All of our team members send weekly work status updates to the internal mailing list. Sometimes I lag behind in this particular task. I have tried various approaches. I do maintain a text file on my laptop where I write down all the tasks I do. A Python script converts it into a proper email (with dates etc.) and sends it to the mailing list. The problem with using just a plain text file is that if I miss one day, I generally miss the next day too. The saga continues in the same way. Many of my teammates use taskwarrior as a TODO application. I used it for around 6 months. As a TODO tool it is amazing, and I have a script written by Ralph which creates a very detailed weekly report. I was filling in my text file by copy/pasting from the taskwarrior report. Annotations were my main problem with taskwarrior. I like updating a TODO note in a GUI (web/desktop) much more than in any command line tool. In taskwarrior, I was adding and ending tasks nicely, but was not doing any updates to the tasks.

I used Wunderlist a few years back. It has a very nice web UI, and also a handy mobile application. The missing part is the power to create reports. I found they have nice open source libraries to interact with the API services, both in Python and in Go. I have forked wunderpy2, which has a patch for getting the comments for any given task (I will submit a PR later). Yesterday I started writing a small tool using this library. It prints the report on STDOUT in Markdown format. There is no code for customizing the report yet; I will be adding that slowly. My idea is to run this tool, pipe the output to my report text file, and edit it manually if I want to. Finally, I execute my Python script to send out the report. In the coming weeks I will be able to say how well this method works.

December 03, 2016 04:47 PM


Experienced Django

Exploring with Cookiecutter

After reading Two Scoops of Django, I decided to explore the cookiecutter app that they recommend.  If you haven’t used it, cookiecutter is “A command-line utility that creates projects from cookiecutters (project templates), e.g. Python package projects, jQuery plugin projects.”

In particular I explored some of the Django-specific cookiecutter templates (though I did peek at a few other languages, too).  I started with the cookiecutter-django template as it was written by one of the authors of the book and it was at the top of the list I was looking at.

Cookiecutter-Django

Cookiecutter-Django is a full-on professional level template for producing enterprise-level Django apps.  As such, it comes with a LOT of bells and whistles.  Celery support?  Yep.  Docker? Yep.  Custom Windows and PyCharm setups?  Yep.  The list goes on.  This is pretty cool, but, at 134 files, many of which I do not know the use of, this is clearly not aimed at me, the hobbyist learning Django in his spare time.

It does, however, have some interesting project layout ideas that I think ARE useful and sound.  Among these are having separate doc and requirements directories and having the requirements files broken into separate environments (local, production, test) to codify and simplify the set up for the job you’re doing.

Lots of good stuff here to dig into in the future, but this is a bit over my head at the moment. Fortunately, there are a myriad of other templates to play with.  I found a good list of them at readthedocs.

Django-crud

The next one I ventured into was django-crud.  This is different from the others I looked at in that it was not a full django project, rather just the webapp portion of it.

It is pretty cool in that it gives a really detailed example of how to write a simple model (performing the CRUD database operations, of course) with full unit tests and simple templates.  At 16 files, it’s not so overwhelming and has a pretty clear intent.  I’m sure the author uses it as a starting point for webapps, but I’m finding it a useful learning tool and it will likely be the basis as I start down the unit testing road (coming soon!).

Django-paas

I also took a quick tour through Django-paas, which is billed as a “Django template ready to use in SAAS platforms like Heroku, OpenShift, etc.”.

While this isn’t likely one I’m ready to use just yet (I’m still hosting locally on my private machine for use inside my network), it definitely has a few features that are worth looking at.  Of note was the use of waitress (instead of gunicorn), which might be an interesting alternative.  It also provides a Procfile for Heroku, which I can see being handy in the future.

Simple Django

My final stop of this tour was Simple-Django.  This is a very trimmed down version of the full cookiecutter-django template I started with.  It still has separate requirements and docs directories, but does not pull in nearly the number of third-party packages that its full-blown antecedent does.

This looks like a good start place for my level of development and interest.  It will likely be the basis I use if and when I create a django template for my own use.

Summary

While none of these cookie-cutter templates are exactly what I’m looking for at the moment, I found exploring them a good way to learn about different tools, directory layouts and general best practices in the django world.  I’d encourage you to explore some of these, even if they’re not something you need at the moment.

I’d like to thank the authors of each of these templates for the work they put into them and for sharing them with us, and, in particular, Audrey Roy Greenfeld and Danny Roy Greenfeld for creating cookiecutter and the cookiecutter-django template.

Thanks!

December 03, 2016 04:34 PM


Albert Hopkins

Packaging static Sphinx-based documentation in a reusable Django app

At my job we have a few Django apps which are packaged as reusable Django apps. Some of those apps have documentation generated by Sphinx that needs to be served along with the app itself (as opposed to somewhere else, like ReadTheDocs). Ideally the Sphinx documentation would be built (as HTML) and copied to the static/ directory of the app. We used to include a custom django-admin command to do so, but you have to remember to run it and you have to remember to do so before calling collectstatic. One alternative would be to include the built files with the source, but again one must remember to build the docs every time they change and I'm not a big fan of "building" things in the source (that's what the build phase is for).

There are some third-party apps that allow you to serve Sphinx-based
documentation, but they either still require you to manually perform the build step or require a database or index server.

If all you want to do is serve the static files and not have to remember the build step, there is another alternative. You could have setup.py build the Sphinx docs into your static/ directory at build time. Here's how you could do it.

A typical reusable Django app's structure might look like this:

.
├── hello_world
│   ├── admin.py
│   ├── apps.py
│   ├── __init__.py
│   ├── migrations
│   │   └── __init__.py
│   ├── models.py
│   ├── static
│   ├── templates
│   ├── tests.py
│   └── views.py
├── MANIFEST.in
├── README.rst
└── setup.py

For this to work, we create our Sphinx documentation in, e.g., the hello_world/docs/ directory.

$ sphinx-quickstart hello_world/docs

Now for the fun part. The trick to getting setuptools to build and install the Sphinx docs at build time is to interject a step in the build process. setuptools uses the build_py step to install the Python files. build_py is a class that comes with setuptools. We can subclass it and tell setuptools to use our subclass instead. So in our setup.py we define our class as such:

# setup.py
from os import path  
from subprocess import check_call

from setuptools import find_packages, setup  
from setuptools.command.build_py import build_py

name = 'hello_world'  
version = '0.0'  
description = 'World greeter'


class build_py_with_docs(build_py):  
    def run(self):
        build_py.run(self)

        docs_source = path.join(self.get_package_dir(name), 'docs')
        docs_target = path.join(self.build_lib, name, 'static', 'docs', name)
        check_call(('sphinx-build', '-b', 'html', docs_source, docs_target))


setup(  
    name=name,
    version=version,
    description=description,
    author='Albert Hopkins',
    author_email='marduk@python.net',
    url='http://starship.python.net/crew/marduk/',
    packages=find_packages(),
    include_package_data=True,
    cmdclass={'build_py': build_py_with_docs},
)

The only thing that is out of the ordinary about this setup.py is the build_py_with_docs class definition and its accompanying cmdclass argument to setup(). The custom class simply overrides build_py's .run() method and adds a step to build the Sphinx documentation into the app's static/ directory. The cmdclass informs setuptools to use our custom class instead of the built-in build_py class for the "build_py" step.

One caveat: Sphinx must be installed on the system at build time. Likely this is already the case if you are already manually building the docs. If you are instead installing from Python wheel files, Sphinx need only be installed on the host(s) that generate the wheels.

The take-away is that simply pip installing hello_world gives us the app and the HTML-formatted Sphinx docs in the app's static directory. Once installed (and collected) by your Django app, one can access the documentation as simple static files, e.g.:

http://statichost/static/docs/hello_world/index.html

December 03, 2016 01:31 PM


Vasudev Ram

Simple directory lister with multiple wildcard arguments

By Vasudev Ram

$ python file_glob.py f[!2-5]*xt

I was browsing the Python standard library (got to use those batteries!) and thought of writing this simple utility - a command-line directory lister that supports multiple wildcard filename arguments, like *, ?, etc. - as the OS shells bash on Unix and CMD on Windows do. It uses the glob function from the glob module. The Python documentation for glob says:

[ No tilde expansion is done, but *, ?, and character ranges expressed with [] will be correctly matched. This is done by using the os.listdir() and fnmatch.fnmatch() functions in concert. ]

Note: no environment variable expansion is done either, but see os.path.expanduser() and os.path.expandvars() in the stdlib.

I actually wrote this program just to try out glob (and fnmatch before it), not to make a production or even throwaway tool, but it turns out that, due to the functionality of glob(), even this simple program is somewhat useful, as the examples of its use below show, particularly with multiple arguments, etc.

Of course, this is not a full-fledged directory lister like DIR (Windows) or ls (Unix) but many of those features can be implemented in this or similar tools, by using the stat module (which I've used in my PySiteCreator and other programs); the Python stat must be a wrapper over the C library with the same name, at least on Unix (AFAIK the native Windows SDK's directory and file system manipulation functions are different from POSIX ones, though the C standard library on Windows has many C stdio functions for compatibility and convenience).
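As a quick aside before the program itself, here is a hedged sketch (not part of file_glob.py) of that direction: os.stat can supply the size and timestamp details a fuller lister reports.

# Decorate glob results with os.stat information.
import glob
import os
import stat
import time

for filename in glob.glob('*'):
    st = os.stat(filename)
    kind = 'dir ' if stat.S_ISDIR(st.st_mode) else 'file'
    mtime = time.strftime('%Y-%m-%d %H:%M', time.localtime(st.st_mtime))
    print('{0}  {1:>10} bytes  {2}  {3}'.format(kind, st.st_size, mtime, filename))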

Here is the code for the program, file_glob.py:

from __future__ import print_function
'''
file_glob.py
Lists filenames matching one or more wildcard patterns.
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
'''

import sys
import glob

sa = sys.argv
lsa = len(sys.argv)

if lsa < 2:
    print("{}: Must give one or more filename wildcard arguments.".
        format(sa[0]))
    sys.exit(1)

for arg in sa[1:]:
    print("Files matching pattern {}:".format(arg))
    for filename in glob.glob(arg):
        print(filename)

I ran it multiple times with these files in my current directory. All of them are regular files except dir1 which is a directory.
$ dir /b
dir1
f1.txt
f2.txt
f3.txt
f4.txt
f5.txt
f6.txt
f7.txt
f8.txt
f9.txt
file_glob.py
o1.txt
out1
out3
test_fnmatch1.py
test_fnmatch2.py
test_glob.py~
Here are a few different runs of the program with 0, 1 or 2 arguments, giving different outputs based on the patterns used.
$ python file_glob.py
Must give one or more filename wildcard arguments.
$ python file_glob.py a
Files matching pattern a

$ python file_glob.py *1
Files matching pattern *1:
dir1
out1
$ python file_glob.py *txt
Files matching pattern *txt:
f1.txt
f2.txt
f3.txt
f4.txt
f5.txt
f6.txt
f7.txt
f8.txt
f9.txt
o1.txt
$ python file_glob.py *txt *1
Files matching pattern *txt:
f1.txt
f2.txt
f3.txt
f4.txt
f5.txt
f6.txt
f7.txt
f8.txt
f9.txt
o1.txt
Files matching pattern *1:
dir1
out1
$ python file_glob.py *txt *py
Files matching pattern *txt:
f1.txt
f2.txt
f3.txt
f4.txt
f5.txt
f6.txt
f7.txt
f8.txt
f9.txt
o1.txt
Files matching pattern *py:
file_glob.py
test_fnmatch1.py
test_fnmatch2.py
$ python file_glob.py f[2-5]*xt
Files matching pattern f[2-5]*xt:
f2.txt
f3.txt
f4.txt
f5.txt
$ python file_glob.py f[!2-5]*xt
Files matching pattern f[!2-5]*xt:
f1.txt
f6.txt
f7.txt
f8.txt
f9.txt
$ python file_glob.py *mat*
Files matching pattern *mat*:
test_fnmatch1.py
test_fnmatch2.py
$ python file_glob.py *e* *[5-8]*
Files matching pattern *e*:
file_glob.py
test_fnmatch1.py
test_fnmatch2.py
test_glob.py~
Files matching pattern *[5-8]*:
f5.txt
f6.txt
f7.txt
f8.txt
$ python file_glob.py *[1-4]*
Files matching pattern *[1-4]*:
dir1
f1.txt
f2.txt
f3.txt
f4.txt
o1.txt
out1
out3
test_fnmatch1.py
test_fnmatch2.py
$ python file_glob.py a *txt b
Files matching pattern a:
Files matching pattern *txt:
f1.txt
f2.txt
f3.txt
f4.txt
f5.txt
f6.txt
f7.txt
f8.txt
f9.txt
o1.txt
Files matching pattern b:
As you can see from the runs, it works, including for ranges of wildcard characters, and the negation of them too (using the ! character inside the square brackets before the range).

Enjoy.

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python   DLang   xtopdf

Subscribe to my blog by email

My ActiveState recipes



December 03, 2016 01:16 AM

December 02, 2016


Will McGugan

Strings hiding in plain sight

It's not often I come across something in Python that surprises me. Especially in something as mundane as string operations, but I guess Python still has a trick or two up its sleeve.

Have a look at this string:

>>> s = "A"

How many possible sub-strings are in s? To put it another way, how many values of x are there where the expression x in s is true?

Turns out it is 2.

2?

Yes, 2.

>>> "A" in s
True
>>> "" in s
True

The empty string is in the string "A". In fact, it's in all the strings.

>>> "" in "foo"
True
>>> "" in ""
True
>>> "" in "here"
True

Turns out the empty string has been hiding everywhere in my code.

Not a complaint, I'm sure the rationale is completely sound. And it turned out to be quite useful. I had a couple of lines of code that looked something like this:

if character in ('', '\n'):
   do_something(character)

In essence I wanted to know if character was an empty string or a newline. But knowing the empty string thang, I can replace it with this:

if character in '\n':
    do_something(character)

Which has exactly the same effect, but I suspect is a few nanoseconds faster.
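If you want to verify that suspicion, the timeit module makes the comparison a one-liner each way (this measurement snippet is mine, not from the original post):

from timeit import timeit

print(timeit("c in ('', '\\n')", setup="c = 'x'"))
print(timeit("c in '\\n'", setup="c = 'x'"))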

Don't knock it. When you look after the nanoseconds, the microseconds look after themselves.

December 02, 2016 10:09 PM


Dataquest

Pandas Tutorial: Data analysis with Python: Part 2

We covered a lot of ground in Part 1 of our pandas tutorial. We went from the basics of pandas DataFrames to indexing and computations. If you’re still not confident with Pandas, you might want to check out the Dataquest pandas Course.

In this tutorial, we’ll dive into one of the most powerful aspects of pandas – its grouping and aggregation functionality. With this functionality, it’s dead simple to compute group summary statistics, discover patterns, and slice up your data in various ways.

Since Thanksgiving was just last week, we’ll use a dataset on what Americans typically eat for Thanksgiving dinner as we explore the pandas library. You can download the dataset here. It contains 1058 online survey responses collected by FiveThirtyEight. Each survey respondent was asked questions about what they typically eat for Thanksgiving, along with some demographic questions, like their gender, income, and location. This dataset will allow us to discover regional and income-based patterns in what Americans eat for Thanksgiving dinner. As we explore the data and try to find patterns, we’ll be heavily using the grouping and aggregation functionality of pandas.
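As a generic illustration of the kind of grouping the tutorial builds on (the file name and column names below are placeholders, not necessarily the survey's actual headers):

import pandas as pd

data = pd.read_csv("thanksgiving.csv")
# Count responses per value of a categorical column...
print(data.groupby("gender").size())
# ...or aggregate a numeric column per group.
print(data.groupby("region")["income"].mean())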

...

December 02, 2016 08:00 PM


Weekly Python StackOverflow Report

(xlviii) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2016-12-02 18:37:04 GMT


  1. Is there a difference between [] and list() when using id()? - [30/1]
  2. Why does incorrect assignment to a global variable raise exception early? - [13/3]
  3. What is the fastest way to get an arbitrary element out of a Python dictionary? - [12/4]
  4. Why can't I import from a module alias? - [10/4]
  5. Fastest way to "grep" big files - [9/3]
  6. Why is @staticmethod not preserved across classes, when @classmethod is? - [9/2]
  7. How to convert a list of strings to list of dictonaries in python? - [6/6]
  8. how variable scope works in generators in classes - [6/3]
  9. Numpy: make batched version of quaternion multiplication - [6/3]
  10. Pandas count consecutive date observations within groupby object - [6/2]

December 02, 2016 06:37 PM


Andrew Dalke

Weininger's Realization

Dave Weininger passed away recently. He was very well known in the chemical informatics community because of his contribution to the field and his personality. Dave and Yosi Taitz founded Daylight Chemical Information Systems to turn some of these ideas into a business, back in the 1980s. It was very profitable. (As a bit of trivia, the "Day" in "Daylight" comes from "Dave And Yosi".)

Some of the key ideas that Dave and Daylight introduced are SMILES, SMARTS, and fingerprints (both the name and the hash-based approach). Together these made for a new way to handle chemical information search, and do so in significantly less memory. The key realization, which I think led to the business success of the company, is that the cost of memory was decreasing faster than the creation of chemical information. This trend, combined with the memory savings of SMILES and fingerprints, made it possible to store a corporate dataset in RAM, and do chemical searches about 10,000 times faster than the previous generation of hard-disk based tools, and do it before any competition could. I call this "Weininger's Realization". As a result, the Daylight Thor and Merlin databases, along with the chemistry toolkits, became part of the core infrastructure of many pharmaceutical companies.

I don't know if there was a specific "a-ha" moment when that realization occurred. It certainly wasn't what drove Dave to work on those ideas in the first place. He was a revolutionary, a Prometheus who wanted to take chemical information from what he derisively called 'the high priests' and bring it to the people.

An interest of mine in the last few years is to understand more about the history of chemical information. The best way I know to describe the impact of Dave and Daylight is to take some of the concepts back to the roots.

You may also be interested in reading Anthony Nicholls' description of some of the ways that Dave influenced him, and Derek Lowe's appreciation of SMILES.

Errors and Omissions

Before I get there, I want to emphasize that the success of Daylight cannot be attributed to just Dave, or Dave and Yosi. Dave's brother Art and his father Joseph were coauthors on the SMILES canonicalization paper. The company hired people to help with the development, both as employees and consultants. I don't know the details of who did what, so I will say "Dave and Daylight" and hopefully reduce the all too easy tendency to give all the credit to the most visible and charismatic person.

I'm unfortunately going to omit many parts of the Daylight technologies, like SMIRKS, where I don't know enough about the topic or its effect on cheminformatics. I'll also omit other important but invisible aspects of Daylight, like documentation or the work Craig James did to make the database servers more robust to system failures. Unfortunately, it's the jockeys and horses which attract the limelight, not those who muck the stables or shoe the horses.

Also, I wrote this essay mostly from what I have in my head and from presentations I've given, which means I've almost certainly made mistakes that could be fixed by going to my notes and primary sources. Over time I hope to spot and fix those mistakes in this essay. Please let me know of anything you want me to change or improve.

Dyson and Wiswesser notations

SMILES is a "line notation", that is, a molecular representation which can be described as a line of text. Many people reading this may have only a vague idea of the history of line notations. Without that history, it's hard to understand what helped make SMILES successful.

The original line notations were developed in the 1800s. By the late 1800s chemists began to systematize the language into what is now called the IUPAC nomenclature. For example, caffeine is "1,3,7-trimethylpurine-2,6-dione". The basics of this system are taught in high school chemistry class. It takes years of specialized training to learn how to generate the correct name for complex structures.

Chemical nomenclature helps chemists index the world's information about chemical structures. In short, if you can assign a unique name to a chemical structure (a "canonical" name), then you can use standard library science techniques to find information about the structure.

The IUPAC nomenclature was developed when books and index cards were the best way to organize data. Punched card machines brought a new way of thinking about line notations. In 1946, G. Malcolm Dyson proposed a new line notation meant for punched cards. The Dyson notation was developed as a way to mechanize the process of organizing and publishing a chemical structure index. It became a formal IUPAC notation in 1960, but was already on its last legs and dead within a few years. While it might have been useful for mechanical punched card machines, it wasn't easily repurposed for the computer needs of the 1960s. For one, it depended on superscripts and subscripts, and used characters which didn't exist on the IBM punched cards.

William J. Wiswesser in 1949 proposed the Wiswesser Line Notation, universally called WLN, which could be represented in EBCDIC and (later) ASCII in a single line of text. More importantly, unlike the Dyson notation, which follows the IUPAC nomenclature tradition of starting with the longest carbon chain, WLN focuses on functional groups, and encodes many functional groups directly as symbols.

Chemists tend to be more interested in functional groups, and want to search based on those groups. For many types of searches, WLN acts as its own screen, that is, it's possible to do some types of substructure search directly on the symbols of the WLN, without having to convert the name into a molecular structure for a full substructure search. To search for structures containing a single sulfur, look for WLNs with a single occurrence of S, but not VS or US or SU. The chemical information scientists of the 1960s and 1970s developed several hundred such clever pattern searches to make effective use of the relatively limited hardware of that era.
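
To make that concrete, here is a toy sketch of such a symbol-level screen in Python. It is not anything from an actual WLN system; the rule simply follows the sulfur example above, and the sample notations are made up purely to exercise it.

def single_sulfur_screen(wln):
    # Keep WLNs with exactly one 'S' symbol, as long as that S is not part
    # of the two-letter combinations mentioned above (VS, US, or SU).
    if wln.count("S") != 1:
        return False
    i = wln.index("S")
    if i > 0 and wln[i - 1] in "VU":            # ...VS or ...US
        return False
    if i + 1 < len(wln) and wln[i + 1] == "U":  # SU...
        return False
    return True

# These strings are made up purely to exercise the screen, not real WLNs.
for wln in ("1S1", "VS2", "2SU1", "QS"):
    print(wln, single_sulfur_screen(wln))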

WLNs started to disappear in the early 1980s, before SMILES came on the scene. Wendy Warr summarized the advantages and disadvantages of WLNs in 1982. She wrote "The principle disadvantage of WLN is that it is not user friendly. This can only be overcome by programs which will derive a canonical WLN from something else (but no one has yet produced a cost-effective program to do this for over 90% of compounds), by writing programs to generate canonical connection tables from noncanonical WLNs, or by accepting the intervention of a skilled "middle man"."

Dyson/IUPAC and WLNs were just two of dozens, if not hundreds, of proposed line notations. Nearly every proposal suffered from a fatal flaw - they could not easily be automated on a computer. Most required postgraduate-level knowledge of chemistry, and were error-prone. The more rigorous proposals evaluated the number of mistakes made during data entry.

Among the few exceptions are the "parentheses free" notations from a pair of 1964 papers, one by Hiz and the other by Eisman, in the same issue of the Journal of Chemical Documentation. To modern eyes, they look very much like SMILES, but represented in a postfix notation. Indeed, the Eisman paper gives a very SMILES-like notation for a tree structure, "H(N(C(HC(CIC(N(HH)C(HN(IH)))I)H)H))" and a less SMILES-like notation for a cyclic structure, before describing how to convert them into a postfix form.

I consider the parentheses-free nomenclatures a precursor to SMILES, but they were not influential to the larger field of chemical information. I find this a bit odd, and part of my research has been to try and figure out why. It's not like it had no influence. A version of this notation was in the Chemical Information and Data System (CIDS) project in the 1960s and early 1970s. In 1965, "CIDS was the first system to demonstrate online [that is, not batch processing] searching of chemical structures", and CIDS wasn't completely unknown in the chemical information field.

But most of the field in the 1970s went for WLN for a line notation, or a canonical connection table.

SMILES

Dave did not know about the parentheses free line notations when he started work on SMILES, but he drew from similar roots in linguistics. Dave was influenced by Chomsky's writings on linguistics. Hiz, mentioned earlier, was at the Department of Linguistics at the University of Pennsylvania, and that's also where Eugene Garfield did his PhD work on the linguistics of chemical nomenclature.

Dave's interest in chemical representations started when he was a kid. His father, Joseph Weininger, was a chemist at G.E., with several patents to his name. He would draw pictures of chemical compounds for Dave, and Dave would, at a non-chemical level, describe how they were put together. These seeds grew into what became SMILES.

SMILES as we know it started when Dave was working for the EPA in Duluth. They needed to develop a database of environmental compounds, to be entered by untrained staff. (For the full story of this, read Ed Regis's book "The Info Mesa: Science, Business, and New Age Alchemy on the Santa Fe Plateau.") As I recall, SMILES was going to be the internal language, with a GUI for data entry, but it turned out that SMILES was easy enough for untrained data entry people to write it directly.

And it's simple. I've taught the basics of SMILES to non-chemist programmers in a matter of minutes, while WLN, Dyson, and InChI, as examples of other line notations, are much more difficult to generate either by hand or by machine. Granted, those three notations have canonicalization rules built into them, which is part of the difficulty. Still, I asked Dave why something like SMILES didn't appear earlier, given that the underlying concepts existed in the literature by then.

He said he believes it's because the generation of people before him didn't grow up with a software development background. I think he's right. When I go to a non-chemist programmer, I say "it's a spanning tree with special rules to connect the cycles", and they understand. But that vocabulary was still new in the 1960s, and very specialized.

There's also some conservatism in how people work. Dyson defended the Dyson/IUPAC notation, saying that it was better than WLN because it was based on the longest carbon chain principle that chemists were already familiar with, even though the underlying reasons for that choice were becoming less important because of computer search. People know what works, and it's hard to change that mindset when new techniques become available.

Why the name SMILES? Dave Cosgrove told me it was because when Dave would tell people he had a new line notation, most would "groan, or worse. When he demonstrated it to them, that would rapidly turn to a broad smile as they realised how simple and powerful it is."

Exchange vs. Canonical SMILES

I not infrequently come across people who say that SMILES is a proprietary format. I disagree. I think the reason for the disagreement is that two different concepts go under the name "SMILES". SMILES is an exchange language for chemistry, and it's an identifier for chemical database search. Only the second is proprietary.

Dave wanted SMILES as a way for chemists from around the world and through time to communicate. SMILES describes a certain molecular valence model view of the chemistry. This does not need to be canonical, because you can always do that yourself once you have the information. I can specify hydrogen cyanide as "C#N", "N#C", or "[H][C]#[N]" and you will be able to know what I am talking about, without needing to consult some large IUPAC standard.
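
As a quick illustration of that point, any modern toolkit will reduce different exchange spellings to a single canonical form when you need one. A minimal sketch using the open-source RDKit (mentioned later in this essay; not a Daylight tool, and assumed to be installed):

from rdkit import Chem

# Two exchange SMILES for hydrogen cyanide, written differently.
for smi in ("C#N", "N#C"):
    print(smi, "->", Chem.MolToSmiles(Chem.MolFromSmiles(smi)))
# Both lines print the same canonical string, so either spelling works for exchange.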

He wanted people to use SMILES that way, without restriction. The first SMILES paper describes the grammar. Later work at Daylight in the 1990s extended SMILES to handle isotopes and stereochemistry. (This was originally called "isomeric SMILES", but it's the SMILES that people think of when they want a SMILES.) Daylight published the updated grammar on their website. It was later included as part of Gasteiger's "Handbook of Chemoinformatics: From Data to Knowledge in 4 Volumes". Dave also helped people at other companies develop their own SMILES parsers.

To say that SMILES as an exchange format is proprietary is opposite to what Dave wanted and what Daylight did.

What is proprietary is the canonicalization algorithm. The second SMILES paper describes the CANGEN algorithm, although it is incomplete and doesn't actually work. Nor does it handle stereochemistry, which was added years later. Even internally at Daylight, it took many years to work out all of the bugs in the implementation.

There's a good financial reason to keep the algorithm proprietary. People were willing to pay a lot of money for a good, fast chemical database, and the canonicalization algorithm was a key part of Daylight's Thor and Merlin database servers. In business speak, this is part of Daylight's "secret sauce".

On the other hand, there's little practical demand for that algorithm to be published. Abstractly speaking, publishing it would mean that different tools would generate the same canonical SMILES, so a federated data search would reduce to a text search, rather than require a re-canonicalization. This is one of the goals of the InChI project, but they discovered that Google didn't index the long InChI strings in a chemically useful way. They created the InChI key as a solution. SMILES has the same problem and would need a similar solution.

Noel O'Boyle published a paper pointing out that the InChI canonicalization assignment could be used to assign the atom ordering for a SMILES string. This would give a universal SMILES that anyone could implement. There's been very little uptake of that idea, which gives a feel for how little demand there is.

Sometimes people also point to the governance model to decide if something is proprietary or not, or to the copyright restrictions on the specification. I don't agree with these interpretations, and would gladly talk about them at a conference meeting if you're interested.

Line notations and connection tables

There are decades of debate on the advantages of line notations over connection tables, or vice versa. In short, connection tables are easy to understand and parse into an internal molecule data structure, while line notations are usually more compact and can be printed on a single line. And in either case, at some point you need to turn the text representation into a data structure and treat it as a chemical compound rather than a string.

Line notations are a sort of intellectual challenge. This alone seems to justify some of the many papers proposing a new line notation. By comparison, Open Babel alone supports over 160 connection table formats, and there are untold more in-house or internal formats. Very few of these formats have ever been published, except perhaps in an appendix in a manual.

Programmers like simple formats because they are easy to parse, often easy to parse quickly, and easy to maintain. Going back to the Warr quote earlier, it's hard to parse WLN efficiently.

On the other hand, line notations fit better with text-oriented systems. Back in the 1960s and 1970s, ISI (the Institute for Scientific Information) indexed a large subset of the chemistry literature and distributed the WLNs as paper publications, in a permuted table to help chemists search the publication by hand. ISI was a big proponent of WLN. And of course it was easy to put a WLN on a punched card and search it mechanically, without an expensive computer.

Even now, a lot of people use Excel or Spotfire to display their tabular data. It's very convenient to store the SMILES as a "word" in a text cell.

Line notations also tend to be smaller than connection tables. As an example, the connection table lines from the PubChem SD files (excluding the tag data) average about 4K per record. The PUBCHEM_OPENEYE_ISO_SMILES tag values average about 60 bytes in length.

Don't take the factor of 70 as being all that meaningful. The molfile format is not particularly compact, PubChem includes a bunch of "0" entries which could be omitted, and the molfile stores the coordinates, which the SMILES does not. The CAS search system in the late 1980s used about 256 bytes for each compact connection table, which is still 4x larger than the equivalent SMILES.

Dave is right. SMILES, unlike most earlier line notations, really is built with computer parsing in mind. Its context-free grammar is easy to parse using a simple stack, though still not as easy as a connection table. It doesn't require much in the way of lookup tables or state information. There's also a pretty natural mapping from the SMILES to the corresponding topology.
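
To make the "simple stack" point concrete, here is a toy sketch in Python. It is emphatically not the Daylight parser: it handles only single-letter atoms and parenthesized branches, and ignores bond symbols, ring closures, charges, and everything else a real parser must deal with.

def parse_branches(smiles):
    """Return the atom list and (from, to) bond pairs for a toy subset of SMILES."""
    atoms = []        # atom symbols in order of appearance
    bonds = []        # (from_index, to_index) pairs
    prev = None       # index of the atom the next atom bonds to
    stack = []        # saved 'prev' values for open branches
    for ch in smiles:
        if ch == "(":             # branch opens: remember the attachment point
            stack.append(prev)
        elif ch == ")":           # branch closes: restore the attachment point
            prev = stack.pop()
        elif ch.isalpha():        # a single-letter atom symbol
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx))
            prev = idx
    return atoms, bonds

print(parse_branches("CC(C)C"))   # a toy isobutane-like input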

What happens if you had a really fast SMILES parser? As a thought experiment which doesn't reflect real hardware, suppose you could convert 60 bytes of SMILES string to a molecule data structure faster than you could read the additional 400 bytes of connection table data. (Let's say the 10 GHz CPU is connected to the data through a 2400 baud modem.) Then clearly it's best to use a SMILES, even if it takes longer to process.

A goal for the Daylight toolkit was to make SMILES parsing so fast that there was no reason to store structures in a special internal binary format or data structure. Instead, when it's needed, parse the SMILES into a molecule object, use the molecule, and throw it away.

On the topic of muck and horseshoes, as I recall Daylight hired an outside company at one point to go through the code and optimize it for performance.

SMARTS

SMARTS is a natural recasting and extension of the SMILES grammar to define a related grammar for substructure search.

I started in chemical information in the late 1990s, with the Daylight toolkit and a background which included university courses on computer grammars like regular expressions. The analogy that SMARTS is to SMILES as regular expressions are to strings seemed obvious, and I modeled my PyDaylight API on the equivalent Python regular expression API.

Only years later did I start to get interested in the history of chemical information, though I've only gotten up to the early 1970s so there's a big gap that I'm missing. Clearly there were molecular query representations before SMARTS. What I haven't found is a query line notation, much less one implemented in multiple systems. This is a topic I need to research more.

The term "fingerprint"

Fingerprints are a core part of modern cheminformatics. I was therefore surprised to discover that Daylight introduced the term "fingerprint" to the field, around 1990 or so.

The concept existed before then. Adamson and Bush did some of the initial work in using fingerprint similarity as a proxy for molecular similarity in 1973, and Willett and Winterman's 1986 papers [1, 2] (the latter also with Bawden) reevaluated the earlier work and informed the world of the effectiveness of the Tanimoto. (We call it "Tanimoto" instead of "Jaccard" precisely because of those Sheffield papers.)
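
For readers new to it: the Tanimoto coefficient of two fingerprints is the number of bits set in both divided by the number of bits set in either. A minimal sketch, using Python integers as bit vectors:

def tanimoto(fp1, fp2):
    # fp1 and fp2 are integers whose bits are the fingerprint bits.
    both = bin(fp1 & fp2).count("1")
    either = bin(fp1 | fp2).count("1")
    return both / either if either else 0.0

print(tanimoto(0b101101, 0b100111))   # 3 bits in common, 5 bits in either -> 0.6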

But up until the early 1990s, published papers referred to fingerprints as the "screening set" or "bit screens", which describes the source of the fingerprint data, and didn't reify them into an independent concept. The very first papers which used "fingerprint" were by Yvonne Martin, an early Daylight user at Abbott, and John Barnard, who used "fingerprint", in quotes, in reference specifically to Daylight technology.

I spent a while trying to figure out the etymology of the term. I asked Dave about it, but it isn't the sort of topic which interests him, and he didn't remember. "Fingerprint" already existed in chemistry, for IR spectra, and the methods for matching spectra are somewhat similar to those of cheminformatics fingerprint similarity, but not enough for me to be happy with the connection. Early in his career Dave wrote software for a mass spectrometer, so I'm also not rejecting the possibility.

The term "fingerprint" was also used in cryptographic hash functions, like "Fingerprinting by Random Polynomial" by Rabin (1981). However, these fingerprints can only be used to test if two fingerprints are identical. They are specifically designed to make it hard to use the fingerprints to test for similarity of the source data.

I've also found many papers talking about image fingerprints or audio fingerprints which can be used for both identity and similarity testing; so-called "perceptual hashes". However, their use of "fingerprint" seems to have started a good decade after Daylight popularized it in cheminformatics.

Hash fingerprints

Daylight needed a new name for fingerprints because they used a new approach to screening.

Fingerprint-like molecular descriptions go back to at least the Frear code of the 1940s. Nearly all of the work in the intervening 45 years was focused on finding fragments, or fragment patterns, which would improve substructure screens.

Screen selection was driven almost entirely by economics. Screens are cheap, with data storage as the primary cost. Atom-by-atom matching, on the other hand, had a very expensive CPU cost. The more effective the screens, the better the screenout, the lower the overall cost for an exact substructure search.

The best screen would have around 50% selection/rejection on each bit, with no correlation between the bits. If that could exist, then an effective screen for 1 million structures would need only 20 bits, since 2^20 is just over a million. This doesn't exist, because few fragments meet those criteria. The Sheffield group in the early 1970s (who often quoted Mooers as the source of the observation) looked instead at more generic fragment descriptions, rather than specific patterns. This approach was further refined by BASIC in Basel and then at CAS to become the CAS Online screens. This is likely the pinnacle of the 1970s screen development.

Even then, it had about 4000 patterns assigned to 2048 bits. (Multiple rare fragments can be assigned to the same bit with little reduction in selectivity.)

A problem with a fragment dictionary is that it can't take advantage of unknown fragments. Suppose your fragment dictionary is optimized for pharmaceutical compounds, then someone does a search for plutonium. If there isn't an existing fragment definition like "transuranic element" or "unusual atom", then the screen will not be able to reject any structures. Instead, it will slowly go through the entire data set only to return no matches.

This specific problem is well known, and the reason for the "OTHER" bit of the MACCS keys. However, other types of queries may still have an excessive number of false positives during screening.

Daylight's new approach was the enumeration-based hash fingerprint. Enumerate all subgraphs of a certain size and type (traditionally all linear paths with up to 7 atoms), choose a canonical order, and use the atom and bond types in that order to generate a hash value. Use this value to seed a pseudo random number generator, then generate a few values to set bits in the fingerprint; the specific number depends on the size of the subgraph. (The details of the hash function and the number of bits to set were also part of Daylight's "secret sauce.")
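
Here is a toy sketch of that idea in Python. It is not Daylight's algorithm (the hash function and bit counts were never published); it simply enumerates linear paths, canonicalizes each path by taking the lexically smaller direction, seeds a PRNG with a hash of the path, and sets a couple of bits per path, with bond types ignored to keep the sketch short.

import random
import zlib

def path_fingerprint(atoms, bonds, nbits=1024, max_len=7):
    """Hash all linear atom paths of up to max_len atoms into an nbits-wide bit set.

    atoms is a list of atom symbols; bonds maps an atom index to its neighbors.
    """
    fp = set()

    def walk(path):
        symbols = "".join(atoms[i] for i in path)
        key = min(symbols, symbols[::-1])              # canonical direction for the path
        rng = random.Random(zlib.crc32(key.encode()))  # seed a PRNG with the path's hash
        for _ in range(2):                             # set a couple of bits per path
            fp.add(rng.randrange(nbits))
        if len(path) == max_len:
            return
        for nbr in bonds.get(path[-1], ()):
            if nbr not in path:
                walk(path + [nbr])

    for start in range(len(atoms)):
        walk([start])
    return fp

# A toy 1-propanol-like graph: C-C-C-O, with bond types ignored.
atoms = ["C", "C", "C", "O"]
bonds = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(sorted(path_fingerprint(atoms, bonds)))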

Information theory was not new in chemical information. Calvin Mooers developed superimposed coding back in the 1940s in part to improve the information density of chemical information on punched cards (and later complained about how computer scientists rediscovered it as hash tables). The Sheffield group also used information theory to guide their understanding of the screen selection problem. Feldman and Hodes in the 1970s developed screens by the full enumeration of common subgraphs of the target set and a variant of Mooers' superimposed coding.

But Daylight was able to combine information theory and computer science theory (i.e., hash tables) to develop a fingerprint generation technique which was completely new. And I do mean completely new.

Remember how I mentioned there are SMILES-like line notations in the literature, even if people never really used them? I've looked hard, and only with a large stretch of optimism can I find anything like the Daylight fingerprints before Daylight, and mostly as handwaving proposals for what might be possible. Nowadays, almost every fingerprint is based on a variation of the hash approach, rather than a fragment dictionary.

In addition, because of the higher information density, Daylight fingerprints were effective as both a substructure screen and a similarity fingerprint using only 1024 bits, instead of the 2048 bits of the CAS Online screen. This will be important in the next section.

Chemical data vs. compute power and storage size

CAS had 4 million chemical records in 1968. The only cost-effective way to store that data was on tape. Companies could and did order data tapes from CAS for use on their own corporate computers.

Software is designed for the hardware, so the early systems were built first for tape drives and then, as they became affordable, for the random-access capabilities of hard disks. A substructure search would first check against the screens to reject obvious mismatches, then for each of the remaining candidates, read the corresponding record off disk and do the full atom-by-atom match.

Apparently Derek Price came up with the "law of exponential increase" in "Science Since Babylon" (1961), which describes how science information has exponential growth. I've only heard about that second hand. Chemical data is no exception, and its exponential growth was noted, I believe, in the 1950s by Perry.

In their 1971 textbook, Lynch et al. observed that the doubling period was about 12 years. I've recalculated that number over a longer baseline, and it still holds. CAS had 4 million structures in 1968 and 100 million structures in 2015, which is a doubling every 10-11 years.

On the other hand, computers have gotten more powerful at a faster rate. For decades the effective computing power doubled every 2 years, and the amount of RAM and data storage for constant dollars has doubled even faster than that.

In retrospect it's clear that at some point it would be possible to store all of the world's chemistry data, or at least a corporate database, in memory.

Weininger's Realization

Disks are slow. Remember how the atom-by-atom search needed to pull a record off the disk? That means the computer needs to move the disk arm to the right spot and wait for the data to come by, while the CPU simply waits. If the data were in RAM then it would be 10,000x faster to fetch a randomly chosen record.

Putting it all in RAM sounds like the right solution, but in the early 1980s memory was something like $2,000 per MB while hard disk space was about $300 per MB. One million compounds with 256 bytes per connection table and 256 bytes per screen requires about 500MB of space. No wonder people kept the data on disk.

By 1990, RAM was about $80/MB while hard disk storage was $4/MB, while the amount of chemistry data had only doubled.

Dave, or at least someone at Daylight, must have realized that the two different exponential growth rates make for a game changer, and that the Daylight approach would give them a head start over the established vendors. This is explicit in the Daylight documentation for Merlin:

The basic idea behind Merlin is that data in a computer's main memory can be manipulated roughly five orders of magnitude faster than data on its disk. Throughout the history of computers, there has been a price-capacity-speed tradeoff for data storage: Large-capacity storage (tapes, drums, disks, CD-ROMS) is affordable but slow; high-speed storage ("core", RAM) is expensive but fast. Until recently, high-speed memory was so costly that even a modest amount of chemical information had to be stored on tapes or disks.

But technology has a way of overwhelming problems like this. The amount of chemical information is growing at an alarming rate, but the size of computer memories is growing even faster: at an exponential rate. In the mid-1980's it became possible for a moderately large minicomputer to fit a chemical database of several tens of thousands of structures into its memory. By the early 1990's, a desktop "workstation" could be purchased that could hold all of the known chemicals in the world (ca. 15 million structures) in its memory, along with a bit of information about each.

On the surface, in-memory operations seem like a straightforward good deal: A computer's memory is typically 10^5 times faster than its disk, so everything you could do on disk is 100000 times faster when you do it in memory. But these simple numbers, while impressive, don't capture the real differences between disk- and memory-based searches:

  • With disk-based systems, you formulate a search carefully, because it can take minutes to days to get your answer back. With Merlin it is usually much faster to get the answer than it is to think up the question. This has a profound effect on user's attitudes towards the EDA system.
  • In disk-based systems, you typically approach with a specific question, often a question of enough significance that you are willing to invest significant effort to find the answer. With Merlin, it is possible to "explore" the database in "real-time" - to poke around and see what is there. Searches are so fast that users adopt a whole new approach to exploratory data analysis.

Scaling down

I pointed out earlier that SMILES and fingerprints take up less space. I estimate it was 1/3 the space of what CAS needed, which is the only comparison I've been able to figure out. That let Daylight scale up to larger data sets for a given price, but also scale down to smaller hardware.

Let's say you had 250,000 structures in the early 1990s. With the Daylight system you would need just under 128 MB of RAM, which meant you could buy a Sun 3, which maxed out at 128 MB, instead of a more expensive computer.

It still requires a lot of RAM, and that's where Yosi comes in. His background was in hardware sales, and he knew how to get a computer with a lot of RAM in it. Once the system was ready, Dave and his brother Art put it in the back of a van and went around the country to potential customers to give a demo, often to much astonishment that it could be so fast.

I think the price of RAM was the most important hardware factor to the success of Daylight, but it's not the only one. When I presented some of these ideas at Novartis in 2015, Bernhard Rohde correctly pointed out that decreasing price of hardware also meant that computers were no longer big purchase items bought and managed by IT, but something that even individual researchers could buy. That's another aspect of scaling down.

While Daylight did sell to corporate IT, their heart was in providing tools and especially toolkits to the programmer-chemists who would further develop solutions for their company.

Success and competitors

By about 1990, Daylight was a market success. I have no real idea how much profit the company made, but it was enough that Dave bought his own planes, including a fighter jet. When I was at the Daylight Summer School in 1998, the students over dinner came up with an estimate of a minimum of $15 million in income and a maximum of $2 million in expenses.

It was also a scientific success, measured by the number of people talking about SMILES and fingerprints in the literature.

I am not a market analyst so I can't give that context. I'm more of a scholar. I've been reading through the old issues of JCICS (now titled JCIM) trying to identify the breakthrough transition for Daylight. There is no bright line, but there are tantalizing hints between the lines.

In 1992 or so (I'll have to track it down), there's a review of a database vendor's product. The reviewer mentions that the vendor plans to have an in-memory database the next year. I can't help but interpret it as a competitor responding to the new Daylight system, and having to deal with customers who now understood Weininger's Realization.

Dave is a revolutionary

The decreasing price of RAM and hardware may help explain Daylight's market success, but Dave wasn't driven by trying to be the next big company. You can see that in how the company acted. Before the Oracle cartridge, they catered more towards the programmer-chemist. They sold VMS and then Unix database servers and toolkits, with somewhat primitive database clients written using the XView widget toolkit for X. I remember Dave once saying that the clients were meant as examples of what users could do, rather than as complete applications. A different sort of company would have developed Windows clients and servers, more tools for non-programmer chemists, and focused more on selling enterprise solutions to IT and middle managers.

A different sort of company would have tried to be the next MDL. Dave didn't think they were having fun at MDL, so why would he want to be like them?

Dave was driven by the idea of bringing chemical information away from the "high priests" who held the secret knowledge of how to make things work. Look at SMILES - Dyson and WLN required extensive training, while SMILES could be taught to non-chemists in an hour or so. Look at fingerprints - the CAS Online screens were the result of years of research in picking out just the right fragments, based on close analysis of the types of queries people do, while the hash fingerprints can be implemented in a day. Look even at the Daylight depictions, which were well known as being ugly. But Dave liked to point out that the code, at least originally, needed only 4K. That's the sort of code a non-priest could understand, and the sort of justification a revolutionary could appreciate.

Dave is a successful revolutionary, which is rare. SMILES, SMARTS and fingerprints are part of the way we think about modern cheminformatics. Innumerable tools implement them, or variations of them.

High priests of chemical information

Revolutionary zeal is powerful. I remember hearing Dave's "high priests" talk back in 1998 and feeling empowered: that yes, even as a new person in the field, cheminformatics was something I could take on on my own.

As I learn more about the history of the field, I've also learned that Dave's view is not that uncommon. In the post-war era the new field of information retrieval wanted to challenge the high priests of library science. (Unfortunately I can't find that reference now.)

Michael Lynch would have been the high priest of chemical information if there ever was one. Yet at ICCS 1988 he comments "I can recollect very little sense, in 1973, that this revolution was imminent. Georges Anderla .. noted that the future impact of very large scale integration (VLSI) was evident only to a very few at that time, so that he quite properly based his projections on the characteristics of the mainframe and minicomputer types then extant. As a result, he noted, he quite failed to see, first, that the PC would result in expertise becoming vastly more widely disseminated, with power passing out of the hands of the small priesthood of computer experts, thus tapping a huge reservoir of innovative thinking, and, second, that the workstation, rather than the dumb terminal, would become standard."

A few years ago I talked with Egon Willighagen. He is one of the CDK developers and an advocate of free and open source software for chemical information. He also used the metaphor of taking information from the high priests to the people, but in his case he meant the previous generation of closed commercial tools, like the Daylight toolkit.

Indeed, one way to think of it is that Dave, the firebrand of Daylight, became the high priest of Daylight, and only the priests of Daylight control the secret knowledge of fingerprint generation and canonicalization.

That's why I no longer like the metaphor. Lynch and the Sheffield group published many books and papers, including multiple textbooks on how to work with chemical information. Dave and Daylight did a lot of work to disseminate the Daylight way of thinking about cheminformatics. These are not high priests hoarding occult knowledge, but humans trying to do the right thing in an imperfect world.

There's also danger in the metaphor. Firebrand revolutionaries don't tend to know history. Perhaps some of the temple should be saved? At the very least there might be bad feelings if you declare your ideas revolutionary only to find out that not only are they not new, but you are talking to the previous revolutionary who proposed them.

John Barnard told me a story of Dave and Lynch meeting at ICCS in 1988, I believe. Dave explained how his fingerprint algorithm worked. Lynch commented something like "so it's like Calvin Mooers' superimposed coding"? Lynch knew his history, and he was correct - fingerprints and superimposed coding are related, though not the same. Dave did not know the history or how to answer the question.

My view has its own danger. With 75 years of chemical information history, one might feel a paralysis of not doing something out of worry that it's been done before and you just don't know about it.

Leaving Daylight

In the early 2000s Dave became less interested in chemical information. He had grown increasingly disenchanted with the ethics of how pharmaceutical companies work and do research. Those who swear allegiance to money would have no problem just making a sale and profits, but he was more interested in ideas and people. He had plenty of money.

He was concerned about the ethics of how people used his work, though I don't know how big a role that played overall. Early on, when Daylight was still located at Pomona, they got a purchase request from the US military. He didn't want the Daylight tools to be used to make chemical weapons, and put off fulfilling the sale. The military group that wanted the software contacted him and pointed out that they were actually going to use the tools to develop defenses against chemical weapons, which helped him change his mind.

I think the Daylight Oracle cartridge marked the big transition for Daylight. The 1990s was the end of custom domain-specific databases like Thor and Merlin and the rise of object-relational databases with domain-specific extensions. Norah MacCuish (then Shemetulskis) helped show how the Daylight functionality could work as an Informix DataBlade. Oracle then developed their data cartridge, and most of the industry, including the corporate IT that used the Daylight servers, followed suit.

Most of Daylight's income came from the databases, not the toolkit. Companies would rather use widely-used technology because it's easier to find people with the skills to use it, and there can be better integration with other tools. If Daylight didn't switch, then companies would turn to competitive products which, while perhaps not as elegant, were a better fit for IT needs. Daylight did switch, and the Daylight user group meetings (known as "MUG" because it started off as the Medchem User Group) started being filled by DBAs and IT support, not the geek programmer-chemist that were interested in linguistics and information theory and the other topics that excited Dave.

It didn't help that the Daylight tools were somewhat insular and static. The toolkit didn't have an officially supported way to import SD files, though there was a user-contributed package for that, which Daylight staff did maintain. Ragu Bharadwaj developed Daylight's JavaGrins chemical sketcher, which was a first step towards a more GUI-centered Daylight system, but also the only step. The RDKit, as an example of a modern chemical informatics toolkit, includes algorithms for Murcko scaffolds, matched molecular pairs, and maximum common substructure, and continues to get new ones. But Daylight didn't go that route either.

What I think it comes down to is that Dave was really interested in databases and compound collections. Daylight was also a reseller of commercial chemical databases, and Dave enjoyed looking through the different sets. He told me there was a different feel to the chemistry done in different countries, and he could spot that by looking at the data. He was less interested in the other parts of cheminformatics, and as the importance of old-school "chemical information" diminished in the newly renamed field of "chem(o)informatics", so too did his interest, as did the interest from Daylight customers.

Post-Daylight

Dave left chemical information and switched to other topics. He tried to make theobromine-free chocolate, for reasons I don't really understand, though I see now that many people buy carob as a chocolate alternative because it's theobromine free and thus stimulant free. He was also interested in binaural recording and hearing in general. He bought the house next door to turn it into a binaural recording studio.

He became very big into solar power, as a way to help improve the world. He bought 12 solar panels, which he called The Twelve Muses, from a German company and installed them for his houses. These were sun trackers, to maximize the power. Now, Santa Fe is at 2,300 m/7,000 ft. elevation, in a dry, near-desert environment. He managed to overload the transformers because the panels produced a lot more power than the German manufacturer expected. Once that was fixed, both houses were solar powered, plus he had several electric cars and motorcycles, and could feed power back into the grid for profit. He provided a recharging station for the homebrew electric car community who wanted to drive from, say, Albuquerque to Los Alamos. (This was years before Teslas were on the market.) Because he had his own DC power source, he was also able to provide a balanced power system to his recording studio and minimize the power hum noise.

He tried to persuade the state of New Mexico to invest more in solar power. It makes sense, and he showed that it was possible. But while he was still a revolutionary, he was not a politician, and wasn't able to make the change he wanted.

When I last saw him in late 2015, he was making a cookbook of New Mexico desserts.

Dave will always have a place in my heart.

Andrew Dalke
Trollhättan, Sweden
2 December 2016

Changes

December 02, 2016 12:00 PM


PyCharm

“What’s New in PyCharm 2016.3” Webinar Recording Available Now!

Our webinar “What’s New in PyCharm 2016.3” that we organized on Wednesday November 30th is now available on YouTube:

In this webinar Paul Everitt, PyCharm Developer Advocate, shows you around the new features in PyCharm 2016.3 (and shows some other great features that were introduced in 2016).

He discusses improvements to Django (at 1:33), web development support (at 6:07), database support (at 14:45), and more. PyCharm Professional bundles all the new functionality of the latest versions of WebStorm and DataGrip.

After introducing some of these features, he shows how to create a 2D game in Python using the Python Arcade library (from 17:26 onwards). In the process he shows how to unit test the game, profile the game, and how to debug the game faster.

If you’d like to follow along and play with the code, you can get it from Paul’s GitHub repository.

Keep up with the latest PyCharm news on this blog, and follow us on Twitter (@PyCharm).

The Drive to Develop

-PyCharm Team

December 02, 2016 10:00 AM


Kushal Das

Atomic Working Group update from this week's meeting

Two days back we had a very productive meeting of the Fedora Atomic Working Group. This post is a summary of the meeting. You can find all the open issues of the working group in this Pagure repo. There were 14 people present at the meeting, which happens every Wednesday at 5PM UTC in the #fedora-meeting-1 channel on the Freenode IRC server.

Fedora 26 change proposal ideas discussion

This topic was the first point of discussion in the meeting. Walters informed us that he will continue working on the Openshift-related items, mostly the installer, system containers, etc., and also rpm-ostree. I have started a thread on the mailing list about the change ideas, and we also decided to create a wiki page to capture all of them.

During the Fedora 25 release cycle, we marked a couple of Autocloud tests as non-gating, as they had been present in the system for some time (we added the actual tests after we found the real issue). Now with Fedora 25 out, we decided to reopen the ticket and mark those tests as gating again. This means that in the future, if they fail, the Fedora 25 Atomic release will be blocked.

The suggestion of creating the rkt base image as a release artifact in the Fedora 26 cycle brought out some interesting discussion about the working group. Dusty Mabe suggested fixing the current issues first, and only then jumping into the new world. Whether this means we will support rkt in the Atomic Host was the other concern. My reaction was maybe not, as deciding that would first require coordination between many other teams. rkt is packaged in Fedora; you can install it on a normal system with the following command.

$ sudo dnf install rkt -y

But this does not mean we will be able to add support for building rkt images into OSBS, and Adam Miller reminded us that doing so will take a major development effort. It is also not on the road map of the release-infrastructure team. My proposal is to have only the base image built officially for rkt, and then let our users consume that image. I will dig into this more, as suggested by Dusty, and report back to the working group.

Next, the discussion moved towards a number of technical debt items the working group is carrying. One of the major issues (just before the F25 release) was missing Atomic builds, but we managed to fix it in time. Jason Brooks commented that this release went much more promptly, and we are making progress there :) A few other points from the discussion were:

Documentation

Then the discussion moved to documentation, the biggest pain point of the working group in my mind. For any project, documentation can define whether it will become a success or not. Users will move on unless we can provide clearly written instructions (which actually work). For the Atomic Working Group, the major problem is not enough writers. After the discussion at the last Cloud FAD in June, we managed to dig through old wiki pages. Trishna Guha is helping us to move them under this repo. The docs are live at https://fedoracloud.rtfd.io. I have sent out another reminder about this effort to the mailing list. If you can think of any example which can help others, please write it down and send in a pull request. It is perfectly okay if you write it in some other document format; we will help you convert it into the correct one.

You can read the full meeting log here.

December 02, 2016 06:09 AM


Wesley Chun

Replacing text & images with the Google Slides API with Python

NOTE: The code covered in this post is also available in a video walkthrough; however, the code here differs slightly, featuring some minor improvements over the code in the video.

Introduction

One of the critical things developers have not been able to do previously was access Google Slides presentations programmatically. To address this "shortfall," the Slides team pre-announced their first API a few months ago at Google I/O 2016—also see full announcement video (40+ mins). In early November, the G Suite product team officially launched the API, finally giving all developers access to build or edit Slides presentations from their applications.

In this post, I'll walk through a simple example featuring an existing Slides presentation template with a single slide. On this slide are placeholders for a presentation name and company logo, as illustrated below:

One of the obvious use cases that will come to mind is to take a presentation template replete with "variables" and placeholders, and auto-generate decks from the same source but created with different data for different customers. For example, here's what a "completed" slide would look like after the proxies have been replaced with "real data:"

Using the Google Slides API

We need to edit/write into a Google Slides presentation, meaning the read-write Slides API scope, plus the Google Drive API scope (both appear in the SCOPES tuple in the code below).
Why is the Google Drive API scope needed at all? Well, think of it this way: APIs like the Google Sheets and Slides APIs were created to perform spreadsheet and presentation operations. However, importing/exporting, copying, and sharing are all file-based operations, which is where the Drive API fits in. If you need a review of its scopes, check out the Drive auth scopes page in the docs. Copying a file requires the full Drive API scope, hence why it's included. If you're not going to copy any files and are only performing actions with the Slides API, you can of course leave it out.

Since we've covered the authorization boilerplate fully in earlier posts and videos, we're going to skip that here and jump right to the action.

Getting started

What are we doing in today's code sample? We start with a slide template file that has "variables" or placeholders for a title and an image. The application code will go then replace these proxies with the actual desired text and image, with the goal being that this scaffolding will allow you to automatically generate multiple slide decks but "tweaked" with "real" data that gets substituted into each slide deck.

The title slide template file is TMPLFILE, and the image we're using as the company logo is the Google Slides product icon, whose filename in my Google Drive is stored in the IMG_FILE variable. Be sure to use your own image and template files! These definitions plus the scopes to be used in this script are defined like this:
IMG_FILE = 'google-slides.png'    # use your own!
TMPLFILE = 'title slide template' # use your own!
SCOPES = (
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/presentations',
)
Skipping past most of the OAuth2 boilerplate, let's move ahead to creating the API service endpoints. The Drive API name is (of course) 'drive', currently on 'v3', while the Slides API is 'slides' and 'v1' in the following call to create a signed HTTP client that's shared with a pair of calls to the apiclient.discovery.build() function to create the API service endpoints:
HTTP = creds.authorize(Http())
DRIVE = discovery.build('drive', 'v3', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

Copy template file

The first step of the "real" app is to find and copy the template file TMPLFILE. To do this, we'll use DRIVE.files().list() to query for the file, then grab the first match found. Then we'll use DRIVE.files().copy() to copy the file and name it 'Google Slides API template DEMO':
rsp = DRIVE.files().list(q="name='%s'" % TMPLFILE).execute().get('files')[0]
DATA = {'name': 'Google Slides API template DEMO'}
print('** Copying template %r as %r' % (rsp['name'], DATA['name']))
DECK_ID = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute().get('id')

Find image placeholder

Next, we'll ask the Slides API to get the data on the first (and only) slide in the deck. Specifically, we want the dimensions of the image placeholder. Later on, we will use those properties when replacing it with the company logo, so that it will be automatically resized and centered into the same spot as the image placeholder.
The SLIDES.presentations().get() method is used to read the presentation metadata. Returned is a payload consisting of everything in the presentation: the masters, layouts, and of course, the slides themselves. We only care about the slides, so we get that from the payload. And since there's only one slide, we grab it at index 0. Once we have the slide, we loop through all of the elements on that page and stop when we find the rectangle (image placeholder):
print('** Get slide objects, search for image placeholder')
slide = SLIDES.presentations().get(presentationId=DECK_ID
        ).execute().get('slides')[0]
obj = None
for obj in slide['pageElements']:
    if obj['shape']['shapeType'] == 'RECTANGLE':
        break

Find image file

At this point, the obj variable points to that rectangle. What are we going to replace it with? The company logo, which we now query for using the Drive API:
print('** Searching for icon file')
rsp = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute().get('files')[0]
print(' - Found image %r' % rsp['name'])
img_url = '%s&access_token=%s' % (
    DRIVE.files().get_media(fileId=rsp['id']).uri, creds.access_token)
The query code is similar to when we searched for the template file earlier. The trickiest thing about this snippet is that we need a full URL that points directly to the company logo. We use the DRIVE.files().get_media() method to create that request but don't execute it. Instead, we dig inside the request object itself and grab the file's URI and merge it with the current access token so what we're left with is a valid URL that the Slides API can use to read the image file and create it in the presentation.

Replace text and image

Back to the Slides API for the final steps: replace the title (text variable) with the desired text, add the company logo with the same size and transform as the image placeholder, and delete the image placeholder as it's no longer needed:
print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'createImage': {
        'url': img_url,
        'elementProperties': {
            'pageObjectId': slide['objectId'],
            'size': obj['size'],
            'transform': obj['transform'],
        }
    }},
    {'deleteObject': {'objectId': obj['objectId']}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()
print('DONE')
Once all the requests have been created, send them to the Slides API then let the user know everything is done.

Conclusion

That's the entire script, just under 60 lines of code. If you watched the video, you may notice a few minor differences in the code. One is the use of the fields parameter in the Slides API calls. They represent the use of field masks, which is a separate topic on its own. As you're learning the API now, it may cause unnecessary confusion, so it's okay to disregard them for now. The other difference is an improvement in the replaceAllText request—the old way in the video is now deprecated, so go with what we've replaced it with in this post.

If your template slide deck and image is in your Google Drive, and you've modified the filenames and run the script, you should get output that looks something like this:
$ python3 slides_template.py
** Copying template 'title slide template' as 'Google Slides API template DEMO'
** Get slide objects, search for image placeholder
** Searching for icon file
- Found image 'google-slides.png'
** Replacing placeholder text and icon
DONE
Below is the entire script for your convenience, which runs on both Python 2 and Python 3 (unmodified!). If I were to divide the script into major sections, they would be:

  • the OAuth2 authorization boilerplate and creation of the Drive and Slides API service endpoints
  • copying the template file with the Drive API
  • finding the image placeholder on the slide and the logo image file in Drive
  • replacing the placeholder text and image via a batch update with the Slides API

Here's the complete script—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

IMG_FILE = 'google-slides.png'    # use your own!
TMPLFILE = 'title slide template' # use your own!
SCOPES = (
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/presentations',
)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
HTTP = creds.authorize(Http())
DRIVE = discovery.build('drive', 'v3', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

rsp = DRIVE.files().list(q="name='%s'" % TMPLFILE).execute().get('files')[0]
DATA = {'name': 'Google Slides API template DEMO'}
print('** Copying template %r as %r' % (rsp['name'], DATA['name']))
DECK_ID = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute().get('id')

print('** Get slide objects, search for image placeholder')
slide = SLIDES.presentations().get(presentationId=DECK_ID,
        fields='slides').execute().get('slides')[0]
obj = None
for obj in slide['pageElements']:
    if obj['shape']['shapeType'] == 'RECTANGLE':
        break

print('** Searching for icon file')
rsp = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute().get('files')[0]
print(' - Found image %r' % rsp['name'])
img_url = '%s&access_token=%s' % (
    DRIVE.files().get_media(fileId=rsp['id']).uri, creds.access_token)

print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'createImage': {
        'url': img_url,
        'elementProperties': {
            'pageObjectId': slide['objectId'],
            'size': obj['size'],
            'transform': obj['transform'],
        }
    }},
    {'deleteObject': {'objectId': obj['objectId']}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()
print('DONE')
As with our other code samples, you can now customize it to learn more about the API, integrate into other apps for your own needs, for a mobile frontend, sysadmin script, or a server-side backend!

Code challenge

Add more slides and/or text variables and modify the script to replace them too. EXTRA CREDIT: Change the image-based image placeholder to a text-based image placeholder, say a textbox with the text "{{COMPANY_LOGO}}", and use the replaceAllShapesWithImage request to perform the image replacement. By making this one change, your code should be simpler than the image-based image replacement solution we used in this post; a hedged starting sketch follows below.
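
As a hedged starting point for that extra credit (double-check the exact field names against the Slides API reference; the replaceMethod value here is my best recollection, not something covered in this post), the request list might shrink to something like:

print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'replaceAllShapesWithImage': {    # replaces the matching textbox in place
        'containsText': {'text': '{{COMPANY_LOGO}}'},
        'imageUrl': img_url,
        'replaceMethod': 'CENTER_INSIDE',
    }},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()

Because the request locates the placeholder itself, the earlier code that searched the slide for the RECTANGLE and then deleted it is no longer needed, which is where the simplification comes from.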

December 02, 2016 01:33 AM

December 01, 2016


Django Weblog

Django bugfix release issued: 1.10.4, 1.9.12, 1.8.17

Today we've issued the 1.10.4, 1.9.12, and 1.8.17 bugfix releases.

The release package and checksums are available from our downloads page, as well as from the Python Package Index. The PGP key ID used for this release is Tim Graham: 1E8ABDC773EDE252.

December 01, 2016 11:53 PM