
Planet Python

Last update: December 06, 2021 07:41 PM UTC

December 06, 2021


Python Morsels

Turn a list into a string

Let's talk about how you can convert a list into a string in Python.

Converting a list into a string with str

Let's say we have a list of strings called things:

>>> things = ["apples", "ducks", "lemons", "cats", "potatoes"]

If we'd like to turn this list into a single string, we could pass it to the built-in str function:

>>> str(things)
"['apples', 'ducks', 'lemons', 'cats', 'potatoes']"

But the output we get probably isn't what we were looking for. At least not if we're trying to pass this to an end-user rather than another programmer.

So we need to ask ourselves, what are we really trying to do here? Basically, what I'd like to do is join together each of the strings in this list by some kind of delimiter (like a space for example).

Joining a list of strings together

Many programming languages have a join method that you can call on the list or array type, passing it a delimiter to join together all the strings in that list. Python doesn't have this!

>>> things.join(" ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'join'

In Python, our list type does not have a join method. Instead, our string type has a join method:

>>> " ".join(things)
'apples ducks lemons cats potatoes'

Here, we've asked the space character (" ") to kindly use itself as a delimiter, joining together the items in the list of strings we've passed to it.

Concatenating a list of strings with an arbitrary delimiter

We can use any string as a delimiter. For example, using ", " as a delimiter would put a comma and a space between each of these strings:

>>> ", ".join(things)
'apples, ducks, lemons, cats, potatoes'

Or an empty string ("") would put nothing between each of them, concatenating them all (smooshing them together into one long word):

>>> "".join(things)
'applesduckslemonscatspotatoes'

Python's join method will accept any iterable of strings

Why does Python do it this way? Why is the join method on strings instead of the lists? This seems kind of backwards, right?

Python does it this way for the sake of flexibility.

In Python, we have lots of types of iterables (not just lists) and we tend to think in terms of duck typing (if it looks like a duck and quacks like a duck, it's a duck). That is, we care about the behavior (e.g. iterability) of an object instead of its type (e.g. list).

Because the join method is on the string type, we can pass in any iterable of strings to it. For example we can join together a tuple of strings:

>>> words = ("apple", "animal", "Australia")
>>> " ".join(words)
'apple animal Australia'

Or we can join together a file object (file objects are iterable in Python and when you loop over them you get lines):

>>> my_file = open("things.txt")
>>> "".join(my_file)
'apples\nducks\nlemons\ncats\npotatoes\n'

We can also join together a generator object:

>>> my_file = open("things.txt")
>>> ", ".join(line.rstrip("\n") for line in my_file)
'apples, ducks, lemons, cats, potatoes'

That generator expression removes new lines from the end of each line in a file and we're using the string join method to join those lines together with a comma and a space.

Converting an iterable of non-strings into a single string

What if the iterable that we're joining together isn't an iterable of strings? What if it's an iterable of numbers?

When we try to join together an iterable of items that aren't strings we'll get a TypeError:

>>> numbers = [2, 1, 3, 4, 7, 11, 18]
>>> " ".join(numbers)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sequence item 0: expected str instance, int found

Because we know that the join method accepts any iterable of strings, we could write a generator expression that converts each of our items into a string by passing it to the built-in str function:

>>> " ".join(str(n) for n in numbers)
'2 1 3 4 7 11 18'

So even if you're working with an iterable of non-strings, you can join them together as long as you do a little bit of pre-processing first.
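An equivalent approach, if you prefer it, is to use the built-in map function to apply str to each item (a small optional sketch, not part of the original examples):

```python
numbers = [2, 1, 3, 4, 7, 11, 18]

# map applies str to each number lazily; join consumes the resulting iterator
joined = " ".join(map(str, numbers))
print(joined)  # 2 1 3 4 7 11 18
```

Whether you reach for map or a generator expression is mostly a matter of taste; both hand join an iterable of strings.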

Summary

If you have a list of strings in Python, or any iterable of strings, and you'd like to turn it into a single string by joining each item together with a delimiter, you can make a delimiter string, call the join method on it, and pass in your iterable of strings.

December 06, 2021 04:30 PM UTC


Real Python

Prettify Your Data Structures With Pretty Print in Python

Dealing with data is essential for any Pythonista, but sometimes that data is just not very pretty. Computers don’t care about formatting, but without good formatting, humans may find something hard to read. The output isn’t pretty when you use print() on large dictionaries or long lists—it’s efficient, but not pretty.

The pprint module in Python is a utility module that you can use to print data structures in a readable, pretty way. It’s a part of the standard library that’s especially useful for debugging code dealing with API requests, large JSON files, and data in general.

By the end of this tutorial, you’ll:

  • Understand why the pprint module is necessary
  • Learn how to use pprint(), PrettyPrinter, and their parameters
  • Be able to create your own instance of PrettyPrinter
  • Save formatted string output instead of printing it
  • Print and recognize recursive data structures

Along the way, you’ll also see an HTTP request to a public API and JSON parsing in action.


Understanding the Need for Python’s Pretty Print

The Python pprint module is helpful in many situations. It comes in handy when making API requests, dealing with JSON files, or handling complicated and nested data. You’ll probably find that using the normal print() function isn’t adequate to efficiently explore your data and debug your application. When you use print() with dictionaries and lists, the output doesn’t contain any newlines.

Before you start exploring pprint, you’ll first use urllib to make a request to get some data. You’ll make a request to {JSON} Placeholder for some mock user information. The first thing to do is to make the HTTP GET request and put the response into a dictionary:

>>> from urllib import request
>>> response = request.urlopen("https://jsonplaceholder.typicode.com/users")
>>> json_response = response.read()
>>> import json
>>> users = json.loads(json_response)

Here, you make a basic GET request and then parse the response into a dictionary with json.loads(). With the dictionary now in a variable, a common next step is to print the contents with print():

>>> print(users)
[{'id': 1, 'name': 'Leanne Graham', 'username': 'Bret', 'email': 'Sincere@april.biz', 'address': {'street': 'Kulas Light', 'suite': 'Apt. 556', 'city': 'Gwenborough', 'zipcode': '92998-3874', 'geo': {'lat': '-37.3159', 'lng': '81.1496'}}, 'phone': '1-770-736-8031 x56442', 'website': 'hildegard.org', 'company': {'name': 'Romaguera-Crona', 'catchPhrase': 'Multi-layered client-server neural-net', 'bs': 'harness real-time e-markets'}}, {'id': 2, 'name': 'Ervin Howell', 'username': 'Antonette', 'email': 'Shanna@melissa.tv', 'address': {'street': 'Victor Plains', 'suite': 'Suite 879', 'city': 'Wisokyburgh', 'zipcode': '90566-7771', 'geo': {'lat': '-43.9509', 'lng': '-34.4618'}}, 'phone': '010-692-6593 x09125', 'website': 'anastasia.net', 'company': {'name': 'Deckow-Crist', 'catchPhrase': 'Proactive didactic contingency', 'bs': 'synergize scalable supply-chains'}}, {'id': 3, 'name': 'Clementine Bauch', 'username': 'Samantha', 'email': 'Nathan@yesenia.net', 'address': {'street': 'Douglas Extension', 'suite': 'Suite 847', 'city': 'McKenziehaven', 'zipcode': '59590-4157', 'geo': {'lat': '-68.6102', 'lng': '-47.0653'}}, 'phone': '1-463-123-4447', 'website': 'ramiro.info', 'company': {'name': 'Romaguera-Jacobson', 'catchPhrase': 'Face to face bifurcated interface', 'bs': 'e-enable strategic applications'}}, {'id': 4, 'name': 'Patricia Lebsack', 'username': 'Karianne', 'email': 'Julianne.OConner@kory.org', 'address': {'street': 'Hoeger Mall', 'suite': 'Apt. 
692', 'city': 'South Elvis', 'zipcode': '53919-4257', 'geo': {'lat': '29.4572', 'lng': '-164.2990'}}, 'phone': '493-170-9623 x156', 'website': 'kale.biz', 'company': {'name': 'Robel-Corkery', 'catchPhrase': 'Multi-tiered zero tolerance productivity', 'bs': 'transition cutting-edge web services'}}, {'id': 5, 'name': 'Chelsey Dietrich', 'username': 'Kamren', 'email': 'Lucio_Hettinger@annie.ca', 'address': {'street': 'Skiles Walks', 'suite': 'Suite 351', 'city': 'Roscoeview', 'zipcode': '33263', 'geo': {'lat': '-31.8129', 'lng': '62.5342'}}, 'phone': '(254)954-1289', 'website': 'demarco.info', 'company': {'name': 'Keebler LLC', 'catchPhrase': 'User-centric fault-tolerant solution', 'bs': 'revolutionize end-to-end systems'}}, {'id': 6, 'name': 'Mrs. Dennis Schulist', 'username': 'Leopoldo_Corkery', 'email': 'Karley_Dach@jasper.info', 'address': {'street': 'Norberto Crossing', 'suite': 'Apt. 950', 'city': 'South Christy', 'zipcode': '23505-1337', 'geo': {'lat': '-71.4197', 'lng': '71.7478'}}, 'phone': '1-477-935-8478 x6430', 'website': 'ola.org', 'company': {'name': 'Considine-Lockman', 'catchPhrase': 'Synchronised bottom-line interface', 'bs': 'e-enable innovative applications'}}, {'id': 7, 'name': 'Kurtis Weissnat', 'username': 'Elwyn.Skiles', 'email': 'Telly.Hoeger@billy.biz', 'address': {'street': 'Rex Trail', 'suite': 'Suite 280', 'city': 'Howemouth', 'zipcode': '58804-1099', 'geo': {'lat': '24.8918', 'lng': '21.8984'}}, 'phone': '210.067.6132', 'website': 'elvis.io', 'company': {'name': 'Johns Group', 'catchPhrase': 'Configurable multimedia task-force', 'bs': 'generate enterprise e-tailers'}}, {'id': 8, 'name': 'Nicholas Runolfsdottir V', 'username': 'Maxime_Nienow', 'email': 'Sherwood@rosamond.me', 'address': {'street': 'Ellsworth Summit', 'suite': 'Suite 729', 'city': 'Aliyaview', 'zipcode': '45169', 'geo': {'lat': '-14.3990', 'lng': '-120.7677'}}, 'phone': '586.493.6943 x140', 'website': 'jacynthe.com', 'company': {'name': 'Abernathy Group', 'catchPhrase': 
'Implemented secondary concept', 'bs': 'e-enable extensible e-tailers'}}, {'id': 9, 'name': 'Glenna Reichert', 'username': 'Delphine', 'email': 'Chaim_McDermott@dana.io', 'address': {'street': 'Dayna Park', 'suite': 'Suite 449', 'city': 'Bartholomebury', 'zipcode': '76495-3109', 'geo': {'lat': '24.6463', 'lng': '-168.8889'}}, 'phone': '(775)976-6794 x41206', 'website': 'conrad.com', 'company': {'name': 'Yost and Sons', 'catchPhrase': 'Switchable contextually-based project', 'bs': 'aggregate real-time technologies'}}, {'id': 10, 'name': 'Clementina DuBuque', 'username': 'Moriah.Stanton', 'email': 'Rey.Padberg@karina.biz', 'address': {'street': 'Kattie Turnpike', 'suite': 'Suite 198', 'city': 'Lebsackbury', 'zipcode': '31428-2261', 'geo': {'lat': '-38.2386', 'lng': '57.2232'}}, 'phone': '024-648-3804', 'website': 'ambrose.net', 'company': {'name': 'Hoeger LLC', 'catchPhrase': 'Centralized empowering task-force', 'bs': 'target end-to-end models'}}]

Oh dear! One huge line with no newlines. Depending on your console settings, this might appear as one very long line. Alternatively, your console output might have its word-wrapping mode on, which is the most common situation. Unfortunately, that doesn’t make the output much friendlier!

If you look at the first and last characters, you can see that this appears to be a list. You might be tempted to start writing a loop to print the items:

for user in users:
    print(user)

This for loop would print each object on a separate line, but even then, each object takes up way more space than can fit on a single line. Printing in this way does make things a bit better, but it’s by no means ideal. The above example is a relatively simple data structure, but what would you do with a deeply nested dictionary 100 times the size?

Sure, you could write a function that uses recursion to find a way to print everything. Unfortunately, you’ll likely run into some edge cases where this won’t work. You might even find yourself writing a whole module of functions just to get to grips with the structure of the data!

Enter the pprint module!

Working With pprint

pprint is a Python module made to print data structures in a pretty way. It has long been part of the Python standard library, so installing it separately isn’t necessary. All you need to do is to import its pprint() function:

>>> from pprint import pprint

Then, instead of going with the normal print(users) approach as you did in the example above, you can call your new favorite function to make the output pretty:

>>> pprint(users)

This function prints users—but in a new-and-improved pretty way:

>>> pprint(users)
[{'address': {'city': 'Gwenborough',
              'geo': {'lat': '-37.3159', 'lng': '81.1496'},
              'street': 'Kulas Light',
              'suite': 'Apt. 556',
              'zipcode': '92998-3874'},
  'company': {'bs': 'harness real-time e-markets',
              'catchPhrase': 'Multi-layered client-server neural-net',
              'name': 'Romaguera-Crona'},
  'email': 'Sincere@april.biz',
  'id': 1,
  'name': 'Leanne Graham',
  'phone': '1-770-736-8031 x56442',
  'username': 'Bret',
  'website': 'hildegard.org'},
 {'address': {'city': 'Wisokyburgh',
              'geo': {'lat': '-43.9509', 'lng': '-34.4618'},
              'street': 'Victor Plains',
              'suite': 'Suite 879',
              'zipcode': '90566-7771'},
  'company': {'bs': 'synergize scalable supply-chains',
              'catchPhrase': 'Proactive didactic contingency',
              'name': 'Deckow-Crist'},
  'email': 'Shanna@melissa.tv',
  'id': 2,
  'name': 'Ervin Howell',
  'phone': '010-692-6593 x09125',
  'username': 'Antonette',
  'website': 'anastasia.net'},

 ...

 {'address': {'city': 'Lebsackbury',
              'geo': {'lat': '-38.2386', 'lng': '57.2232'},
              'street': 'Kattie Turnpike',
              'suite': 'Suite 198',
              'zipcode': '31428-2261'},
  'company': {'bs': 'target end-to-end models',
              'catchPhrase': 'Centralized empowering task-force',
              'name': 'Hoeger LLC'},
  'email': 'Rey.Padberg@karina.biz',
  'id': 10,
  'name': 'Clementina DuBuque',
  'phone': '024-648-3804',
  'username': 'Moriah.Stanton',
  'website': 'ambrose.net'}]
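As a quick taste of the parameters the full article covers, here is a small sketch using a hypothetical nested dictionary (not the API response above) to show how the depth and width parameters change the output:

```python
from pprint import pprint

# a small, made-up nested structure for illustration
data = {"user": {"name": "Ada",
                 "address": {"city": "London",
                             "geo": {"lat": "51.5", "lng": "-0.1"}}}}

# depth=2 replaces anything nested deeper than two levels with '...'
pprint(data, depth=2)

# width=30 forces pprint to break lines much earlier than the default 80
pprint(data, width=30)
```

Both parameters are handy when you only want a high-level look at a deeply nested structure.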

Read the full article at https://realpython.com/python-pretty-print/ »



December 06, 2021 02:00 PM UTC


Mike Driscoll

PyDev of the Week: Fanilo Andrianasolo

This week we welcome Fanilo Andrianasolo (@andfanilo) as our PyDev of the Week! Fanilo is a core developer for several Streamlit extensions and a Community Ambassador for Streamlit itself. He also has several tutorials on Streamlit that you can check out.

Fanilo Andrianasolo

Let's spend some time getting to know Fanilo!

Can you tell us a little about yourself (hobbies, education, etc)

Hello everyone! My name is Fanilo, I've been doing data analysis for around 8 years now, and am currently working as a Data Science & Business Intelligence Advocate / Tech Lead at Worldline.

I graduated from Ecole Centrale Lyon, one of the French "grandes ecoles" where we are taught a broad set of scientific domains, from acoustic engineering to fluid mechanics. I liked most of the tutorials around algorithms and numerical computing, so I decided to take a semester at the University of Queensland in Australia to study software engineering and machine learning. I loved this course abroad so much I decided to make a career out of data analytics (I like to think koalas and surfing also contributed to this enthusiasm).

Aside from work, I play and coach badminton at a regional competitive level, try to play jazz piano while sipping a cup of tea, and am learning video & music production.

Why did you start using Python?

Years ago, our Data Mining stack was gravitating around SAS and R. One of my main activities was converting R code to Spark in Scala for production on a Hadoop cluster. At times it was challenging as you are juggling between two very different paradigms.

I knew there was a Python binding to Spark and I wanted to find an easier bridge between Data Scientists and Software Engineers in the Hadoop ecosystem; so I started rebuilding some of our data mining projects in Python with a senior data scientist colleague.

We both grew very fond of the language! The syntax felt simple and readable, yet we could build powerful and complex data processing pipelines. What struck me the most was the ecosystem: did you know you could "pip install wget" on Windows to have a pseudo-wget command? That day I jokingly messaged my colleague "Python has a library for everything!", and I still regularly browse PyPI for niche and useful packages.

What other programming languages do you know and which is your favorite?

I've done my share of Scala during my Apache Spark years, and know my way around Java as it is the predominant language in the company I currently work in. In the JVM space, I'd like to try Kotlin one day, it looks like their community and Data Science ecosystem are growing and the syntax looks nice.

I'm also a fan of building web applications to showcase my works. I don't pretend to be a Frontend engineer, but I can write small Typescript/Vue/React apps. I find the Javascript world has matured a lot those past years and the Typescript compiler ranting about my code has definitely helped.

Favorite language? I've been using Python pretty much everywhere now, from "check the quality of merchant data in our master customer database" to "downloading attachments from your email" processes. I have to thank the book "Automate the Boring Stuff with Python" for opening my eyes to using Python for every daily task. Who knows if Go, Rust, or Julia challenges it someday, and I'd like to add in C++ to build fancy audio processing tools.

What projects are you working on now?

I'm mostly involved with prototyping data-driven features for projects, reviewing and deploying Python code on an online learning project, as well as promoting Business Intelligence and Machine Learning to internal product/engineering teams and external customers.

Outside of work, I started editing and publishing tutorials as slide carousels, as well as short Data Science skits with the hopes to build educational yet entertaining longer videos about Data Analysis later on. I also contribute a lot to the Streamlit community, but we will talk about this in a few questions.

Which Python libraries are your favorite (core or 3rd party)?

I am a big fan of Streamlit (https://streamlit.io/) as it enables me to quickly showcase and share visual data analysis projects. For example, one of my Machine Learning demos involved using a Tensorflow model to recognize live drawings in the same vein as the "Quick, Draw" game (https://quickdraw.withgoogle.com/). I struggled for 5 days with ECharts, Fabric.js, and Tensorflow.js, having to convert Python models to their JS counterpart and agonizing on CSS. Today with Streamlit I think this would take me less than 5 hours to build. Now I pretty much build a CLI and a Streamlit app as interfaces for every data quality and processing app I create at work.

I like using Plotly Express and Altair for interactive plots, and FastAPI/Pydantic are pretty high on my list too. The collections and itertools core libraries have a lot of hidden gems I rediscover now and then.

What are your contributions to Streamlit?

I had never really contributed to any open source project or online community before. A year and a half ago when I first toyed with Streamlit, the forum and core team were still small. I would regularly see the founders Adrien, Amanda, and Thiago, along with some colleagues, advising other users on the two-month-old forum. The tone was very open and friendly, so I decided to help users on the forum too. I became very active there (I almost got the "365 days in a row" award!), so much so that I got contacted by the team, became a forum moderator, was later invited as a guest on their chatroom, and participated in multiple beta tests. I am now part of the "Streamlit Creators" program (https://streamlit.io/creators), which is like being a Community Ambassador for Streamlit, and it comes with nice goodies!

Today I am still very involved in the community in different ways:

If you needed to create a full-fledged website with Python, which framework would you use and why?

There are a lot of options nowadays to build a web application in Python. Off the top of my head I can think of Streamlit, Dash, Panel, Gradio, Voilà, Django, FastAPI delivering static pages... they all serve different use cases and come with different constraints regarding the mapping between widget and state.

Whenever I need to show off and interact with some data processing or analysis, I will use Streamlit. I love the simplicity of its design and the low barrier of entry, and I believe you can still build very complex tools with it. But I also understand developers who are put off by the "rerun the whole script on every user interaction, put into cache or session state any heavy computation" lifecycle and prefer Dash or Panel, whose callbacks define the mapping between user interaction and backend computation, especially for bigger, multipage web apps. To choose between those libraries, I don't usually give recommendations (and there are plenty of articles on the web about this); rather, I ask users to test each library, get a feeling for its API, and ask the community whether the more advanced tasks they need are possible in that framework.

I did not have the opportunity to use Django yet, as my usual ML demos are single-page static apps without authentication, so worst-case scenario I can get by with React and FastAPI. I'm pretty sure Django is here to stay as one of the preferred frameworks for building "full-fledged websites with administration tools" though, whereas Streamlit/Dash/Panel/Gradio/Voilà would tend towards "providing Python users a way to create a web UI for their works".

Is there anything else you’d like to say?

Have fun in what you do, don't be scared to contribute to online communities, and build a lot of small and silly projects to improve at first, as consistency beats intensity!

Thanks for doing the interview, Fanilo!

The post PyDev of the Week: Fanilo Andrianasolo appeared first on Mouse Vs Python.

December 06, 2021 06:05 AM UTC

December 05, 2021


ItsMyCode

How To Convert Python String To Array

ItsMyCode |

In Python, we do not have a built-in array data type. However, we can convert a Python string to a list, which can be used like an array.

Python String to Array

In the earlier tutorial, we learned how to convert list to string in Python. Here we will look at converting string to list with examples.

We will be using the String.split() method to convert string to array in Python.

Python’s split() method splits the string using a specified delimiter and returns the parts as a list. The delimiter can be passed as an argument to the split() method. If we don’t give any delimiter, whitespace is used by default.

split() Syntax

The Syntax of split() method is

string.split(separator, maxsplit)

split() Parameters

The split() method takes two parameters, and both are optional.

  1. separator - the delimiter to split on. If not given, whitespace is used.
  2. maxsplit - the maximum number of splits to perform. Defaults to -1, meaning no limit.

Example 1: Split the string using the default arguments

In this example, we are not passing any arguments to the split() method. Hence it takes whitespace as a separator and splits the string into a list.

# Split the string using the default arguments

text= "Welcome to Python Tutorials !!!"
print(text.split())

Output

['Welcome', 'to', 'Python', 'Tutorials', '!!!']

Example 2: Split the string using the specific character

In this example, we split the string using a specific character. We will be using a comma as a separator to split the string. 

# Split the string using the separator

text= "Orange,Apple,Grapes,WaterMelon,Kiwi"
print(text.split(','))

Output

['Orange', 'Apple', 'Grapes', 'WaterMelon', 'Kiwi']
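The optional maxsplit parameter from the syntax above limits how many splits are performed; whatever remains of the string is kept together as the final list item. A small sketch:

```python
text = "Orange,Apple,Grapes,WaterMelon,Kiwi"

# split at most 2 times; everything after the second comma stays together
print(text.split(',', 2))  # ['Orange', 'Apple', 'Grapes,WaterMelon,Kiwi']
```

This is useful when, for example, only the first couple of fields of a line matter to you.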

Python String to Array of Characters

If you want to convert a string to an array of characters, you can use the list() function, which is built into Python.

Note: Whitespace characters in the string are treated like any other character, so spaces will also appear as items in the resulting list.

Example 1: String to array using list() method

# Split the string to array of characters

text1= "ABCDEFGH"
print(list(text1))

text2="A P P L E"
print(list(text2))

Output

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
['A', ' ', 'P', ' ', 'P', ' ', 'L', ' ', 'E']

You can also use a list comprehension to split a string into an array of characters, as shown below.

Example 2: String to array using list comprehension

# Split the string to array of characters using list Comprehension

text1= "ABCDEFGH"
output1= [x for x in text1]
print(output1)

text2 = "A P P L E"
output2 = [x for x in text2]
print(output2)

Output

['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H']
['A', ' ', 'P', ' ', 'P', ' ', 'L', ' ', 'E']

The post How To Convert Python String To Array appeared first on ItsMyCode.

December 05, 2021 10:38 PM UTC

Python Split list into chunks

ItsMyCode |

In this tutorial, you will learn how to split a list into chunks in Python using different ways with examples.

Python Split list into chunks

Lists are mutable and heterogeneous, meaning they can be changed and can contain different data types. We can access the elements of the list using their index position.

There are five ways to split a list into chunks.

  1. Using a For-Loop
  2. Using the List Comprehension Method
  3. Using the itertools Method
  4. Using the NumPy Method
  5. Using the lambda Function

Method 1: Using a For-Loop

The naive way to split a list is to use a for loop with the help of the range() function.

For our ten-element list, the range call reads range(0, 10, 2), meaning we loop over the indices 0, 2, 4, 6, 8.

We then slice our list from i to i+chunk_size, so the first slice is 0:2, the next is 2:4, and so on.

# Split a Python List into Chunks using For Loops

sample_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
chunked_list = list()
chunk_size = 2
for i in range(0, len(sample_list), chunk_size):
    chunked_list.append(sample_list[i:i+chunk_size])
print(chunked_list)

Output

[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
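Note that this slicing approach also handles lists whose length is not a multiple of chunk_size: the final chunk simply comes out shorter, because a slice past the end of a list stops at the last element. A quick sketch with a seven-element list:

```python
sample_list = [1, 2, 3, 4, 5, 6, 7]
chunk_size = 2

chunked_list = []
for i in range(0, len(sample_list), chunk_size):
    # the last slice, 6:8, only finds one element and stops there
    chunked_list.append(sample_list[i:i + chunk_size])

print(chunked_list)  # [[1, 2], [3, 4], [5, 6], [7]]
```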

Method 2: Using the List Comprehension Method

A list comprehension is a more compact and readable way to split a list into chunks than a for loop.

We have a sample_list containing ten elements. We will split it into equal parts with a chunk_size of 2.

# Split a Python List into Chunks using list comprehensions
sample_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
chunk_size=2
result=[sample_list[i:i + chunk_size] for i in range(0, len(sample_list), chunk_size)]
print(result)

Output

[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

sample_list[i:i + chunk_size] gives us each chunk. For example, if i=0, the items included in the chunk are from index 0 up to (but not including) index 2. In the next iteration, the items included are from index 2 to 4, and so on.

Method 3: Using the itertools Method

We can leverage the itertools module to split a list into chunks. The zip_longest() function returns an iterator that yields tuples, so wrapping it in list() gives us a list of tuples; a second step converts each tuple to a list, as shown below.

# Split a Python List into Chunks using itertools

from itertools import zip_longest

sample_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
chunk_size=2
result=list(zip_longest(*[iter(sample_list)]*chunk_size, fillvalue=''))
print(result)

chunked_list = [list(item) for item in list(zip_longest(*[iter(sample_list)]*chunk_size, fillvalue=''))]
print(chunked_list)

Output

[(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
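One caveat with zip_longest(): if the list length is not a multiple of chunk_size, the final chunk is padded with fillvalue rather than left short. A small sketch with an uneven, five-element list:

```python
from itertools import zip_longest

sample_list = [1, 2, 3, 4, 5]
chunk_size = 2

# the same iterator is repeated chunk_size times, so zip_longest pulls
# chunk_size items per tuple; the missing item at the end becomes fillvalue
result = list(zip_longest(*[iter(sample_list)] * chunk_size, fillvalue=None))
print(result)  # [(1, 2), (3, 4), (5, None)]
```

If padding is unwanted, the slicing approaches from Methods 1 and 2 are a better fit for uneven lists.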

Method 4: Using the NumPy Method

We can use the NumPy library to divide the list into n-sized chunks. The array_split() function splits an array into a given number of sub-arrays; here we pass len(sample_list) / chunk_size so that each sub-array holds chunk_size elements.

# Split a Python List into Chunks using numpy

import numpy as np

# define array and chunk_size
sample_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
our_array = np.array(sample_list)
chunk_size = 2

# split the array into chunks
chunked_arrays = np.array_split(our_array, len(sample_list) / chunk_size)
print(chunked_arrays)

# Convert chunked arrays into lists
chunked_list = [list(array) for array in chunked_arrays]
print(chunked_list)

Output

[array([1, 2]), array([3, 4]), array([5, 6]), array([7, 8]), array([ 9, 10])]
[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

Method 5: Using the lambda Function

We can use a lambda function to divide the list into chunks. The lambda below wraps the same slicing list comprehension used earlier, splitting the list into N-sized chunks, as shown below.

# Split a Python List into Chunks using lambda function

sample_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
chunk_size = 2

lst= lambda sample_list, chunk_size: [sample_list[i:i+chunk_size] for i in range(0, len(sample_list), chunk_size)]
result=lst(sample_list, chunk_size)
print(result)

Output

[[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

The post Python Split list into chunks appeared first on ItsMyCode.

December 05, 2021 09:37 PM UTC

Python Print to File

ItsMyCode |

We always use print statements in Python to display the output in the console or command line terminal. However, sometimes we want to change this behavior to print to a text file instead of the console.

The print() method accepts the file parameter as one of the arguments, and using this, we can print the standard output to a file.

The default value of the file argument is sys.stdout, which prints the output on the screen.

Example 1: Print to the text file

The program below opens a text file in write mode using the open() function (if the file does not exist, a new file is created) and writes all the text specified inside the print statements. Once the content is written, we close the file using the close() method.

# Print to the text file
file = open('log.txt', 'w')
print('This is a sample print statement', file = file)
print('Hello World', file = file)

file.close()

Output

This is a sample print statement
Hello World
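The close() call above is easy to forget. An optional variant (not part of the original example) is to use a with statement, which closes the file automatically even if an error occurs:

```python
# Same example using a context manager: no explicit close() needed
with open('log.txt', 'w') as file:
    print('This is a sample print statement', file=file)
    print('Hello World', file=file)
# at this point the file has already been closed
```

The content written to log.txt is identical to Example 1.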

Example 2: Print to the text file using inline file argument

If you want to control only a few print statements to output to a text file, you could inline the file argument inside the print statement, as shown below.

# Appends the print statement into a log.txt

print("Welcome to ItsMyCode", file=open('log.txt','a'))

Output

Welcome to ItsMyCode

Redirecting the Standard Output Stream to a file

Specifying the file argument helps debug the smaller programs but is not desirable in some situations. 

In this case, we can redirect all the standard output stream to a file. Setting the standard output to a file ensures that the text specified inside the print() function will be written to a file instead of displaying in the console. 

The standard output can be set in Python by importing the sys module and setting the stdout to a file or file-like object.

import sys

# Prints in a console/terminal
print('Default mode: Prints on a console.')

# Store the reference of original standard output into variable
original_stdout = sys.stdout 

# Create or open an existing file in write mode
with open('log.txt', 'w') as file:
    # Set the stdout to file object
    sys.stdout = file
    print('File Mode: Print text to a file.')
    
    # Set the stdout back to the original or default mode
    sys.stdout = original_stdout

print('Default Mode: Prints again on a console.')

Output

Default mode: Prints on a console.
Default Mode: Prints again on a console.

And the log.txt file now contains:

File Mode: Print text to a file.
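A tidier way to do the same temporary redirection (not covered in the original tutorial, but part of the standard library) is contextlib.redirect_stdout, which restores sys.stdout automatically when the with-block exits:

```python
# contextlib.redirect_stdout temporarily swaps sys.stdout for the
# given file object and restores it automatically, even if an
# exception is raised inside the block.
from contextlib import redirect_stdout

with open('log.txt', 'w') as file:
    with redirect_stdout(file):
        print('File Mode: Print text to a file.')

print('Default Mode: Prints again on a console.')
```

This avoids having to save and restore the original sys.stdout by hand.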

Redirect the Python Script Output to File

The quickest (and dirtiest) way is to redirect the Python script's output directly from the command line while executing the script.

For example, suppose you have a Python script file named main.py with the following code.

print("Hello World !!!")

We can redirect the output of the Python script using the right angle bracket, as shown below.

python3 main.py > output.txt

If you open the output.txt file, you can see the below content written into the text file.

Hello World !!!

The post Python Print to File appeared first on ItsMyCode.

December 05, 2021 07:59 PM UTC

Remove Character From String Python

ItsMyCode |

We can remove a character from String in Python using replace() and translate() methods. In this tutorial, let’s look at How to remove a character from a string in Python with examples.

Python Remove a Character from a String

There are many scenarios where we need to remove all occurrences of a character from a string, or only a specific occurrence. The two recommended approaches are the replace() and translate() methods:

Python Remove Character from String using replace()

The replace() method replaces the character with a new character. We can use the replace() method to remove a character from a string by passing an empty string as an argument to the replace() method.

Note: In Python, strings are immutable, and the replace() function will return a new string, and the original string will be left unmodified.

Remove a single character from a string

If you want to remove the first occurrence of a character from a string, you can pass the count argument as 1 to the replace() method, as shown below.

# Python program to remove single occurrences of a character from a string
text= 'ItsMyCoode'
print(text.replace('o','',1))

Output

ItsMyCode

Note: The count argument in the replace() method indicates the maximum number of times the replacement should be performed in a string.

Remove all occurrences of a character from a string

If you want to remove all the occurrences of a character from a string, then you can exclude the count argument as shown below.

# Python program to remove all occurrences of a character from a string
text= 'Welcome, to, Python, World'
print(text.replace(',',''))

Output

Welcome to Python World

Python Remove Character from String using translate()

The other alternative is to use the translate() method. The translate() method accepts one argument: a translation table mapping the Unicode code points of the characters you want to replace to their replacements.

We can get the Unicode code point of any character using the ord() method.

Mapping a character's code point to None removes that character from the string, as shown below.

# Python program to remove a character from a string using translate() method
text= '_User_'
print(text.translate({ord('_'):None}))

Output

User
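For completeness (this is not covered in the original post), the standard library's re module can also remove characters, and it handles whole patterns rather than just fixed strings:

```python
# re.sub replaces every match of a pattern; an empty replacement
# string removes the matches entirely.
import re

text = '_User_'
print(re.sub('_', '', text))  # prints: User
```

This is overkill for a single character, but useful when the characters to remove are best described by a pattern (e.g. all digits with r'\d').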

Python remove last character from string

If you want to remove the last character from a string in Python, you can use the slice notation [:-1]. This slice returns every character from the start of the string up to, but not including, the character at index position -1 (the last character in the string).

# Python program to remove last character from a string using slice notation

text= 'Hello World!'
print(text[:-1])

Output

Hello World

Python remove spaces from string

Both approaches above also work for removing spaces: pass a space character to replace(), or map its code point to None with translate().

# Python program to remove white spaces from a string
text= 'A B C D E F G H'

# Using replace method
print(text.replace(' ',''))

# Using translate method
print(text.translate({ord(' '):None}))

Output

ABCDEFGH
ABCDEFGH
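One caveat worth noting: the snippets above remove only the space character. If tabs and newlines should go too, a common idiom (not shown in the original post) is to split on any whitespace and rejoin:

```python
# str.split() with no argument splits on any run of whitespace
# (spaces, tabs, newlines), so joining the pieces removes it all.
text = 'A B\tC\nD E'
print(''.join(text.split()))  # prints: ABCDE
```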

Python remove punctuation from a string

The string module's punctuation constant, combined with str.maketrans(), lets translate() strip all punctuation in one pass.

# Python program to remove punctuation from a string

import string
text= 'Hello, W_orl$d#!'

# Using translate method
print(text.translate(str.maketrans('', '', string.punctuation)))

Output

Hello World

The post Remove Character From String Python appeared first on ItsMyCode.

December 05, 2021 03:20 PM UTC


PyPy

Error Message Style Guides of Various Languages

Error Message Style Guides of Various Languages

PyPy has been trying to produce good SyntaxErrors and other errors for a long time. CPython has also made an enormous push to improve its SyntaxErrors in the last few releases. These improvements are great, but the process feels somewhat arbitrary sometimes. To see what other languages are doing, I asked people on Twitter whether they know of error message style guides for other programming languages.

Wonderfully, people answered me with lots of helpful links (full list at the end of the post), thank you everybody! All those sources are very interesting and contain many great points, I recommend reading them directly! In this post, I'll try to summarize some common themes or topics that I thought were particularly interesting.

Language Use

Almost all guides stress the need for plain and simple English, as well as conciseness and clarity [Flix, Racket, Rust, Flow]. Flow suggests putting coding effort into making the grammar correct, for example in the case of plurals or to distinguish between "a" and "an".

The suggested tone should be friendly and neutral, and the messages should not blame the programmer [Flow]. Rust and Flix suggest not using the term 'illegal' and using something like 'invalid' instead.

Flow suggests avoiding "compiler speak". For example, terms like 'token' and 'identifier' should be avoided in favor of terms more familiar to programmers (e.g. "name" is better). The Racket guide goes further and has a list of allowed technical terms and some prohibited terms.

Structure

Several guides (such as Flix and Flow) point out a 80/20 rule: 80% of the times an error message is read, the developer knows that message well and knows exactly what to do. For this use case it's important that the message is short. On the other hand, 20% of the times this same message will have to be understood by a developer who has never seen it before and is confused, and so the message needs to contain enough information to allow them to find out what is going on. So the error message needs to strike a balance between brevity and clarity.

The Racket guide proposes to use the following general structure for errors: 'State the constraint that was violated ("expected a"), followed by what was found instead.'

The Rust guide says to avoid "Did you mean?" and questions in general, and wants the compiler to instead be explicit about why something was suggested. The example the Rust guide gives is: 'Compare "did you mean: Foo" vs. "there is a struct with a similar name: Foo".' Racket goes further and forbids suggestions altogether because "Students will follow well‐meaning‐but‐wrong advice uncritically, if only because they have no reason to doubt the authoritative voice of the tool."

Formatting and Source Positions

The Rust guide suggests to put all identifiers into backticks (like in Markdown), Flow formats the error messages using full Markdown.

The Clang, Flow and Rust guides point out the importance of using precise source code spans to point to errors, which is especially important if the compiler information is used in the context of an IDE to show a red squiggly underline or some other highlighting. The spans should be as small as possible to point out the source of the error [Flow].

Conclusion

I am quite impressed by how advanced and well-thought-out the approaches are. I wonder whether it would make sense for Python to adopt a (probably minimal, to get started) subset of these ideas as guidelines for its own errors.

Sources

December 05, 2021 02:00 PM UTC

December 04, 2021


Jaime Buelta

“Python Architecture Patterns” book announced!

We are getting close to the end of the year, but I have great news! A new Python book is on the way and will be released soon. Current software systems can be extremely big and complex, and Software Architecture deals with the design and tweaking of the fundamental structures that shape the big picture of those systems. The book discusses in greater detail what Software Architecture is, what its challenges are, and describes different architecture patterns that can be used when dealing with complex systems. All to create an “Architectural View”... Read More

December 04, 2021 08:01 PM UTC


Sandipan Dey

Implementing a few algorithms with python

In this blog, we shall focus on implementing a few famous algorithms with Python – the algorithms will be from various topics, such as graph theory, compiler construction, theory of computation, numerical analysis, data structures etc., and all of the implementations will be from scratch. Let’s start with a problem called Segmented Least Squares, … Continue reading Implementing a few algorithms with python

December 04, 2021 07:08 PM UTC


Paolo Amoroso

Suite8080 0.4.0

I released version 0.4.0 of Suite8080, the suite of Intel 8080 Assembly cross-development tools I’m writing in Python. It bundles some minor features and changes I did while thinking about what major task to work on next.


New features

There are two main new features in this release.

SID debugging session in the z80pack CP/M emulator
A SID debugging session in the z80pack CP/M emulator. SID loaded the greet.com hello world program assembled with asm80, along with the greet.sym symbol table. The l command disassembled the program and showed the symbols MESSAGE and BDOS. The d command dumped memory from the address of the MESSAGE symbol.

The first is the ability of asm80 to save the assembled program’s symbol table in the .sym CP/M file format. The other feature enhances the assembler to accept the double-quote character as a string delimiter, which means strings and character constants may be written as "This is a string" and "F".

In addition, the output of the assembler's help message (-h option) and verbose mode (-v) is now slightly more descriptive and complete.


Saving the symbol table in .sym format

The Digital Research assembler that came with CP/M, the most popular operating system for 8080 microcomputers in the 1970s and early 1980s, could save symbol tables to .sym files, which map the labels of Assembly programs to their memory addresses.

Why did I add .sym support to the Suite8080 asm80 assembler?

Because the Digital Research symbolic debugger, SID, lets me load symbol tables and refer to program labels by name instead of raw memory addresses. For example, from SID I can load a program, run it from an instruction pointed to by a label, set a breakpoint at a label, and so on.


Implementing .sym support

Section 1.1 "SID Startup" on page 4 of the SID Users Guide (SID is the Digital Research symbolic debugger) describes this simple text file format. A .sym file contains entries comprising a hexadecimal 4-digit address and the corresponding symbol name truncated to 16 characters, separated by a space. Although a line can hold more entries separated by tabs, for simplicity the files asm80 saves have only one per line.

What does a symbol table file look like? This is the greet.sym file of the greet.asm hello world CP/M Assembly demo of Suite8080:

0005 BDOS
0009 WSTRF
0109 MESSAGE

All I did to implement .sym support was adding the single function write_symbol_table() to the assembler. Here’s the code:

def write_symbol_table(table, filename):
    symbol_count = len(table)
    if symbol_count == 0:
        return symbol_count

    with open(filename, 'w', encoding='utf-8') as file:
        for symbol in table:
            print(f'{table[symbol]:04X} {symbol[:16].upper()}', file=file)

    return symbol_count

symbol_table is a global dictionary holding the symbol table.

The only specification of the file format I found makes no mention of symbol case. But the Digital Research assembler uppercases symbols, so asm80 does too.

Although the Digital Research assembler sorts the entries of .sym files alphabetically by symbol name, asm80 diverges from this practice and leaves them unsorted. Again, the SID manual doesn’t mention sorting and some examples it presents are unsorted. SID accepts unsorted files.
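If sorted output were ever desired, the change would be a one-liner. This is a hypothetical variant, not the actual Suite8080 code; the table literal below is just the greet.sym data from above:

```python
# Hypothetical sorted variant: iterating over sorted(table) emits
# the entries alphabetically by symbol name.
table = {'MESSAGE': 0x0109, 'BDOS': 0x0005, 'WSTRF': 0x0009}
for symbol in sorted(table):
    print(f'{table[symbol]:04X} {symbol[:16].upper()}')
```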

To complete the implementation, I added to function main() the code to process the -s command line option for saving the symbol table. With the new option, and the way it combines with other options, the logic to process them grew a bit in main(). Perhaps the argument processing code is ripe for breaking out of main() into its own function.


Documentation and demos

Suite8080 0.4.0 includes a new document describing the usage of the Suite8080 tools and expands the design notes by covering the assembler’s parser. Some minor updates to the README.md file list only the currently available Suite8080 tools and add links to the source tree directories it references.

I changed the memcpy.asm and upcase.asm Assembly demos to make them run on CP/M. These programs produce no visible output. Therefore, to inspect how they alter the machine state, you need to run them inside a debugger such as SID or DDT.


Bug fixes and other changes

The work on the 0.4.0 release involved also some refactoring of asm80 to remove redundant declarations of global variables, break the code that saves the assembled program out of main() into its own function, and tweak the code that processes the default output file name.

Finally, I fixed a bug that made the assembler ignore the -o option when building the output file name and - was supplied as the input file.

December 04, 2021 07:04 PM UTC


Weekly Python StackOverflow Report

(ccciv) stackoverflow python report

These are the ten most rated questions at Stack Overflow last week.
Between brackets: [question score / answers count]
Build date: 2021-12-04 16:23:51 GMT


  1. Sorting multiple lists together in place - [8/1]
  2. Trouble printing a generator expression as a list - [6/2]
  3. Getting the value of a return after a yield in python - [5/5]
  4. Round off date in data frame in Python - [5/3]
  5. In pandas, how to pivot a dataframe on a categorical series with missing categories? - [5/2]
  6. on what systems does Python not use IEEE-754 double precision floats - [5/1]
  7. understanding sklearn calibratedClassifierCV - [5/1]
  8. Fastest way to find all pairs of close numbers in a Numpy array - [4/5]
  9. Python best way to 'swap' words (multiple characters) in a string? - [4/4]
  10. How to reorder rows in pandas dataframe by factor level in python? - [4/3]

December 04, 2021 04:24 PM UTC


Hynek Schlawack

How to Ditch Codecov for Python Projects

Codecov’s unreliability breaking CI on my open source projects has been a constant source of frustration for me for years. I have found a way to enforce coverage over a whole GitHub Actions build matrix that doesn’t rely on third-party services.

December 04, 2021 12:00 AM UTC

December 03, 2021


"Mathspp Pydon'ts"

Why mastering Python is impossible, and why that's ok | Pydon't 🐍

Let me tell you why it is impossible to truly master Python, but also show you how to get as close to it as possible.

A rocky trail heading up a hill.Photo by Migle Siauciulyte on Unsplash.

Introduction

It has been said that you need 10,000 hours to master a skill. I won't dispute if that's true or not. What I'll tell you is that, even if that's true, I'm not sure it applies to Python!

In this Pydon't, I'll explain why I think you can't really master Python, but I'll also tell you why I think that's ok: I'll give you a series of practical tips that you can use to make sure you keep improving your Python knowledge.

Finally, by the end of the Pydon't, I'll share a little anecdote from my own personal experience with Python, to support my claims.

You can now get your free copy of the ebook “Pydon'ts – Write beautiful Python code” on Gumroad to help support the series of “Pydon't” articles 💪.

“to master”, verb

Here's the dictionary definition of the verb “to master”:

“master”, verb – to learn or understand something completely

From my personal experience, there are two levels at which I believe one cannot master Python; I'll lay both of them down now.

Python is an evolving language

The Python language is an evolving language: it isn't a finished product. As such, it keeps growing:

Therefore, I can never know everything about it! As soon as I think I just learned all the things there are to learn, new things pop up.

This is something I believe in, but it is also almost a philosophical point of view. There is also a practical side to this argument.

Python is just too big

Not only does the language keep changing, one can argue that the Python language is already too big for you to be able to master it.

For example, most of us are familiar with the list methods .append or .pop. But, from my experience, most people aren't familiar with the list methods .copy, or .extend, for example. In fact, let's do an experiment: can you name the 11 existing list methods?

Scroll to the bottom of the page and write them down as a comment. If not the 11, write down as many as you can remember.

Here they are:

>>> [name for name in dir(list) if not name.startswith("__")]
['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

No idea what dir is? Just scroll down.

Maybe you even knew about all of them, but being able to name them is hard, right?

Let's do a similar thing for strings! First, jot down as many string methods that you can remember.

Done?

Great. Now count them. How many did you get?

Now, how many string methods do you think there are?

There are 47 (!) string methods!
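You can verify the count yourself with the same dir() trick used above for lists (the exact number depends on your Python version; 47 holds as of Python 3.9):

```python
# Count the public (non-dunder) string methods.
str_methods = [name for name in dir(str) if not name.startswith("__")]
print(len(str_methods))
```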

Probably, you never even heard about some...

December 03, 2021 11:00 PM UTC


Lucas Cimon

Undying Dusk : a PDF video game

UNDYING DUSK

Undying Dusk is a video game in a PDF format, with a gameplay based on exploration and logic puzzles, in the tradition of dungeon crawlers.

A curse set by the Empress keeps the world in an eternal dusk. You have recently found shelter in an eerie monastery.

GIF trailer

Featuring:

  • ~ 200 …

Permalink

December 03, 2021 02:17 PM UTC


Real Python

The Real Python Podcast – Episode #88: Discussing Type Hints, Protocols, and Ducks in Python

There seem to be three kinds of Python developers: those who are unaware of type hints or have no opinion about them, those who embrace them, and those who have an allergic reaction at the mention of them. Python is famously a dynamically typed language, but there are advantages to adding type hints to your code. This week on the show, we have Luciano Ramalho to discuss his recent talk titled, "Type hints, protocols, and good sense."


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

December 03, 2021 12:00 PM UTC


Python Bytes

#261 Please re-enable spacebar heating

Watch the live stream: https://www.youtube.com/watch?v=ySdl6JIxzec

About the show

Sponsored by us:
  • Check out the courses over at Talk Python (https://training.talkpython.fm/courses/all)
  • And Brian’s book too (https://pythontest.com/pytest-book/)!

Special guest: Dr. Chelle Gentemann (https://twitter.com/ChelleGentemann)

Michael #1: rClone (https://rclone.org/)
  • via Mark Pender
  • Not much Python but useful for Python people :)
  • Rclone is a command line program to manage files on cloud storage.
  • Over 40 cloud storage products support rclone, including S3 object stores.
  • Rclone has powerful cloud equivalents to the unix commands rsync, cp, mv, mount, ls, ncdu, tree, rm, and cat.

Brian #2: check-wheel-contents (https://pypi.org/project/check-wheel-contents/)
  • Suggested by several listeners, thank you.
  • “Getting the right files into your wheel is tricky, and sometimes we mess up and publish a wheel containing __pycache__ directories or tests/”
  • usage: check-wheel-contents … <wheel or dir>
  • ex:

      (venv) $ pwd
      /Users/okken/projects/cards
      (venv) $ check-wheel-contents dist
      dist/cards-1.0.0-py3-none-any.whl: OK

  • Checks:

      W001 - Wheel contains .pyc/.pyo files
      W002 - Wheel contains duplicate files
      W003 - Wheel contains non-module at library toplevel
      W004 - Module is not located at importable path
      W005 - Wheel contains common toplevel name in library
      W006 - __init__.py at top level of library
      W007 - Wheel library is empty
      W008 - Wheel is empty
      W009 - Wheel contains multiple toplevel library entries
      W010 - Toplevel library directory contains no Python modules
      W101 - Wheel library is missing files in package tree
      W102 - Wheel library contains files not in package tree
      W201 - Wheel library is missing specified toplevel entry
      W202 - Wheel library has undeclared toplevel entry

  • The README has a good description of each check, including common causes and solutions.

Chelle #3: xarray (http://xarray.pydata.org/en/stable/)
  • Where can I find climate and weather data?
  • Binary to netCDF to Zarr… data in all its gory-ness
  • Data formats are critical for data providers but should be invisible to users
  • What is Xarray
  • An example reading climate data and making some maps

Michael #4: JetBrains Remote Development (https://www.jetbrains.com/remote-development/)
  • If you can SSH to it, that can be your dev machine
  • Keep sensitive code and connections on a dedicated machine
  • Reproducible environments for the team
  • Spin up pre-configured environments (venvs, services, etc)
  • Treat your dev machine like a temp git branch checkout for testing PRs, etc
  • They did bury the lead with Fleet in here too (https://www.jetbrains.com/fleet/)

Brian #5: The XY Problem
  • This topic is important because many of us, including listeners, are
    • novices in some topics and ask questions, sometimes without giving enough context.
    • experts in some topics and answer questions of others.
  • The XY Problem: “… You are trying to solve problem X, and you think solution Y would work, but instead of asking about X when you run into trouble, you ask about Y.” (from a Stack Exchange answer: https://meta.stackexchange.com/a/66378)
  • Example from xyproblem.info:

      [n00b] How can I echo the last three characters in a filename?
      [feline] If they're in a variable: echo ${foo: -3}
      [feline] Why 3 characters? What do you REALLY want?
      [feline] Do you want the extension?
      [n00b] Yes.
      [feline] There's no guarantee that every filename will have a three-letter extension,
      [feline] so blindly grabbing three characters does not solve the problem.
      [feline] echo ${foo##*.}

  • Reason why it’s common and almost unavoidable: almost all design processes for software work this way:
    • I can achieve A if I do B and C.
    • I can achieve B if I do D and E.
    • And I can achieve C if I do F and G.
    • … I can achieve X if I do Y.
  • More important questions than “What is the XY Problem?”:
    • Is it possible to avoid? - not really
    • Is it possible to mitigate when asking questions? - yes
    • When answering questions where you expect XY might be an issue, how do you pull out information while providing information and be respectful to the asker?
  • One great response (https://meta.stackexchange.com/a/269222) included:
    • Asking questions where you risk falling into XY:
      • State your problem
      • State what you are trying to achieve
      • State how it fits into your wider design
    • Giving answers to XY problems:
      1. Answer the question (answer Y)
      2. Discuss the attempted solution (ask questions about context): “Just curious. Are you trying to do (possible X)? If so, Y might not be appropriate because …” or “What is the answer to Y going to be used for?”
      3. Solve X
  • Also interesting reading: the Einstellung effect (https://en.wikipedia.org/wiki/Einstellung_effect), the negative effect of previous experience when solving new problems.

Chelle #6: kerchunk (https://github.com/fsspec/kerchunk) - Making data access fast and invisible
  • S3 is pretty slow, especially when you have LOTS of files
  • We can speed it up by creating json files that just collect info from files and act as a reference
  • Then we can collate the references into MEGAJSON and just access lots of data at once
  • Make it easy to get data!

Extras

Michael:
  • Xojo - like modern VB6? (https://twitter.com/_rlivingston/status/1463277526210416644)
  • 10 Reasons You'll Love PyCharm Even More in 2021 webcast (https://www.youtube.com/watch?v=sJriZQsMHrw)
  • Users revolt as Microsoft bolts a short-term financing app onto Edge (https://arstechnica.com/information-technology/2021/11/microsoft-plans-to-integrate-a-buy-now-pay-later-app-into-edge/)

Chelle:
  • Why we need python & FOSS to solve the climate crisis

Joke: Spacebar Heating (https://xkcd.com/1172/)

December 03, 2021 08:00 AM UTC

December 02, 2021


Python for Beginners

Postorder Tree Traversal Algorithm in Python

Binary trees are very useful in representing hierarchical data. In this article, we will discuss how to print all the elements in a binary tree using postorder tree traversal. We will also implement the postorder tree traversal algorithm in python.

What is the postorder tree traversal algorithm?

The postorder traversal algorithm is a depth-first traversal algorithm. Here, we start from the root node and traverse a branch of the tree until we reach the end of the branch. After that, we move to the next branch. This process continues until all the nodes in the tree have been printed.

The postorder tree traversal algorithm gets its name from the order in which the nodes of a tree are printed. In this algorithm, we first traverse the left sub-tree of the current node, then its right sub-tree, and print the current node last. The process is recursive in nature: a node is printed only when all the nodes in its left and right sub-trees have already been printed.

Let us understand the process using the binary tree given in the following image.

Binary Tree

Let us print all of the nodes in the above binary tree using the postorder traversal algorithm.

You can observe that we have printed the values in the order 11, 22, 20, 52, 78, 53, 50. Let us now formulate the postorder tree traversal algorithm.

Algorithm for postorder tree traversal

As you have an overview of the entire process, we can formulate the algorithm for postorder tree traversal as follows.

  1. Start from the root node.
  2. If the root is empty, return.
  3. Traverse the left sub-tree recursively.
  4. Traverse the right sub-tree recursively.
  5. Print the root node.
  6. Stop.

Implementation of postorder tree traversal in Python

Now that we have understood the postorder tree traversal algorithm and how it works, let us implement it and execute it for the binary tree given in the above image.

class BinaryTreeNode:
    def __init__(self, data):
        self.data = data
        self.leftChild = None
        self.rightChild = None


def postorder(root):
    # if root is None,return
    if root is None:
        return
    # traverse left subtree
    postorder(root.leftChild)

    # traverse right subtree
    postorder(root.rightChild)
    # print the current node
    print(root.data, end=" ,")


def insert(root, newValue):
    # if binary search tree is empty, create a new node and declare it as root
    if root is None:
        root = BinaryTreeNode(newValue)
        return root
    # if newValue is less than value of data in root, add it to left subtree and proceed recursively
    if newValue < root.data:
        root.leftChild = insert(root.leftChild, newValue)
    else:
        # if newValue is greater than value of data in root, add it to right subtree and proceed recursively
        root.rightChild = insert(root.rightChild, newValue)
    return root


root = insert(None, 50)
insert(root, 20)
insert(root, 53)
insert(root, 11)
insert(root, 22)
insert(root, 52)
insert(root, 78)
print("Postorder traversal of the binary tree is:")
postorder(root)

Output:

Postorder traversal of the binary tree is:
11 ,22 ,20 ,52 ,78 ,53 ,50 ,

Conclusion

In this article, we have discussed and implemented the postorder tree traversal algorithm. To learn more about other tree traversal algorithms, you can read this article on Inorder tree traversal algorithm or level order tree traversal algorithm in python.

The post Postorder Tree Traversal Algorithm in Python appeared first on PythonForBeginners.com.

December 02, 2021 04:26 PM UTC


Inspired Python

Testing your Python Code with Hypothesis

Testing your Python Code with Hypothesis

I can think of several Python packages that greatly improved the quality of the software I write. Two of them are pytest and hypothesis. The former adds an ergonomic framework for writing tests and fixtures and a feature-rich test runner. The latter adds property-based testing that can ferret out all but the most stubborn bugs using clever algorithms, and that’s the package we’ll explore in this course.

In an ordinary test you interface with the code you want to test by generating one or more inputs to test against, and then you validate that it returns the right answer. But that, then, raises a tantalizing question: what about all the inputs you didn’t test? Your code coverage tool may well report 100% test coverage, but that does not, ipso facto, mean the code is bug-free.

One of the defining features of Hypothesis is its ability to generate test cases automatically in a manner that is:

So let’s look at how Hypothesis can help you discover errors in your code.
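To make the idea concrete before diving in, here is the essence of property-based testing hand-rolled with nothing but the standard library. This is a deliberately simplified sketch of the concept, not Hypothesis's API; Hypothesis automates the input generation and adds failing-case shrinking and much smarter strategies:

```python
import random

def prop_sort_is_idempotent(xs):
    # Property: sorting an already-sorted list changes nothing.
    once = sorted(xs)
    return sorted(once) == once

# Generate many random inputs and check the property for each one,
# instead of hand-picking a few example inputs.
random.seed(0)
for _ in range(100):
    xs = [random.randint(-1000, 1000) for _ in range(random.randint(0, 20))]
    assert prop_sort_is_idempotent(xs)
print("property held for 100 random inputs")
```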



Read More ->

December 02, 2021 03:31 PM UTC


John Ludhi/nbshare.io

Amazon Review Summarization Using GPT-2 And PyTorch

Amazon Review Summarization Using GPT-2 And PyTorch

Since its reveal in 2017 in the popular paper Attention Is All You Need (https://arxiv.org/abs/1706.03762), the Transformer quickly became the most popular model in NLP. The ability to process text in a non-sequential way (as opposed to RNNs) allowed for training of big models. The attention mechanism it introduced proved extremely useful in generalizing text.

Following the paper, several popular transformers surfaced, the most popular of which is GPT. GPT models are developed and trained by OpenAI, one of the leaders in AI research. The latest release of GPT is GPT-3, which has 175 billion parameters. The model is so advanced that OpenAI chose not to open-source it; people can access it through an API after a signup process and a long queue.

However, GPT-2, their previous release, is open-source and available on many deep learning frameworks.

In this exercise, we use Huggingface and PyTorch to fine-tune a GPT-2 model for review summarization.

Overview:

  • Imports and Data Loading
  • Data Preprocessing
  • Setup and Training
  • Summary Writing

Imports and Data Loading

In [ ]:
!pip install transformers
In [2]:
import re
import random
import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelWithLMHead
import torch.optim as optim

We set the device to enable GPU processing.

In [3]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
device
Out[3]:
device(type='cuda', index=0)
In [4]:
from google.colab import drive
drive.mount("/content/drive")
Mounted at /content/drive

The data we will use for training summarization is the Amazon review dataset, which can be found at https://www.kaggle.com/currie32/summarizing-text-with-amazon-reviews.

When writing a review on Amazon, customers write a review and a title for the review. The dataset treats the title as the summary of the review.

In [5]:
reviews_path = "/content/drive/My Drive/Colab Notebooks/reviews.txt"

We use the standard Python method of opening txt files:

In [6]:
with open(reviews_path, "r") as reviews_raw:
    reviews = reviews_raw.readlines()

Showing 5 instances:

In [7]:
reviews[:5]
Out[7]:
['I have bought several of the Vitality canned dog food products and have found them all to be of good quality. The product looks more like a stew than a processed meat and it smells better. My Labrador is finicky and she appreciates this product better than  most. = Good Quality Dog Food\n',
 'Product arrived labeled as Jumbo Salted Peanuts...the peanuts were actually small sized unsalted. Not sure if this was an error or if the vendor intended to represent the product as "Jumbo". = Not as Advertised\n',
 'This is a confection that has been around a few centuries.  It is a light, pillowy citrus gelatin with nuts - in this case Filberts. And it is cut into tiny squares and then liberally coated with powdered sugar.  And it is a tiny mouthful of heaven.  Not too chewy, and very flavorful.  I highly recommend this yummy treat.  If you are familiar with the story of C.S. Lewis\' "The Lion, The Witch, and The Wardrobe" - this is the treat that seduces Edmund into selling out his Brother and Sisters to the Witch. = "Delight" says it all\n',
 'If you are looking for the secret ingredient in Robitussin I believe I have found it.  I got this in addition to the Root Beer Extract I ordered (which was good) and made some cherry soda.  The flavor is very medicinal. = Cough Medicine\n',
 'Great taffy at a great price.  There was a wide assortment of yummy taffy.  Delivery was very quick.  If your a taffy lover, this is a deal. = Great taffy\n']

As shown, each sample consists of the review followed by its summary, separated by the equals (=) sign.

In [8]:
len(reviews)
Out[8]:
70993

There are ~71K instances in the dataset, which is sufficient to fine-tune a GPT-2 model.

Data Preprocessing

The beauty of GPT-2 is its ability to multi-task. The same model can be trained on more than one task at a time. However, we should adhere to the correct task designators, as specified by the original paper.

For summarization, the appropriate task designator is the TL;DR symbol, which stands for "too long; didn't read".

The "TL;DR" token should be between the input text and the summary.

Thus, we will replace the equals symbol in the data with the correct task designator:

In [9]:
reviews = [review.replace(" = ", " TL;DR ") for review in reviews]
In [10]:
reviews[10]
Out[10]:
'One of my boys needed to lose some weight and the other didn\'t.  I put this food on the floor for the chubby guy, and the protein-rich, no by-product food up higher where only my skinny boy can jump.  The higher food sits going stale.  They both really go for this food.  And my chubby boy has been losing about an ounce a week. TL;DR My cats LOVE this "diet" food better than their regular food\n'

So far, so good.

Finally, for preprocessing, we need a fixed-length input. We use the average review length (in words) as an estimate:

In [11]:
avg_length = sum([len(review.split()) for review in reviews])/len(reviews)
avg_length
Out[11]:
53.41132224303804

Since the average instance length in words is 53.4, we can assume that a max length of 100 will cover most of the instances.

In [12]:
max_length = 100
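As a quick sanity check (not part of the original notebook), we can count how many reviews actually fit within this word budget. The tiny `sample` list below is a stand-in for illustration; on the real data you would pass the full `reviews` list loaded earlier.

```python
# Illustrative stand-in reviews; replace `sample` with the real `reviews` list.
sample = [
    "Great taffy at a great price. Delivery was very quick. TL;DR Great taffy",
    "Product arrived labeled as Jumbo Salted Peanuts. TL;DR Not as Advertised",
]

max_length = 100
# Count how many reviews fit within the word budget.
covered = sum(len(review.split()) <= max_length for review in sample)
print(f"{covered}/{len(sample)} reviews fit within {max_length} words")
```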

Setup and Training

Before creating the Dataset object, we download the model and the tokenizer. We need the tokenizer in order to tokenize the data.

In [ ]:
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelWithLMHead.from_pretrained("gpt2")
In [ ]:
model_pth = "/content/drive/My Drive/Colab Notebooks/gpt2_weights_reviews"
model.load_state_dict(torch.load(model_pth))

We send the model to the device and initialize the optimizer.

In [14]:
model = model.to(device)
In [15]:
optimizer = optim.AdamW(model.parameters(), lr=3e-4)

To correctly pad and truncate the instances, we find the number of tokens used by the designator " TL;DR ":

In [16]:
tokenizer.encode(" TL;DR ")
Out[16]:
[24811, 26, 7707, 220]
In [17]:
extra_length = len(tokenizer.encode(" TL;DR ")) 

We create a simple dataset that extends the PyTorch Dataset class:

In [18]:
class ReviewDataset(Dataset):  
    def __init__(self, tokenizer, reviews, max_len):
        self.max_len = max_len
        self.tokenizer = tokenizer
        self.eos = self.tokenizer.eos_token
        self.eos_id = self.tokenizer.eos_token_id
        self.reviews = reviews
        self.result = []

        for review in self.reviews:
            # Encode the text using tokenizer.encode(). We add EOS at the end
            tokenized = self.tokenizer.encode(review + self.eos)
            
            # Padding/truncating the encoded sequence to max_len 
            padded = self.pad_truncate(tokenized)            

            # Creating a tensor and adding to the result
            self.result.append(torch.tensor(padded))

    def __len__(self):
        return len(self.result)


    def __getitem__(self, item):
        return self.result[item]

    def pad_truncate(self, name):
        name_length = len(name) - extra_length
        if name_length < self.max_len:
            difference = self.max_len - name_length
            result = name + [self.eos_id] * difference
        elif name_length > self.max_len:
            result = name[:self.max_len + 3]+[self.eos_id] 
        else:
            result = name
        return result
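To see what `pad_truncate` does without loading a tokenizer, here is a standalone sketch of the same logic with dummy token ids; `EOS_ID` and `EXTRA` stand in for the tokenizer's `eos_token_id` and the 4 tokens that " TL;DR " encodes to. Note that every output ends up `max_len + EXTRA` tokens long.

```python
EOS_ID = 50256  # GPT-2's eos_token_id
EXTRA = 4       # number of tokens " TL;DR " encodes to

def pad_truncate(tokens, max_len):
    # Length of the sequence excluding the task-designator tokens.
    body_length = len(tokens) - EXTRA
    if body_length < max_len:
        # Pad with EOS tokens up to the fixed length.
        return tokens + [EOS_ID] * (max_len - body_length)
    elif body_length > max_len:
        # Truncate, keeping room for a final EOS
        # (max_len + 3 == max_len + EXTRA - 1).
        return tokens[:max_len + 3] + [EOS_ID]
    return tokens

short = pad_truncate([1] * 10, max_len=8)   # padded with EOS tokens
long = pad_truncate([1] * 20, max_len=8)    # truncated, EOS appended
exact = pad_truncate([1] * 12, max_len=8)   # already the right length
```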

Then, we create the dataset:

In [19]:
dataset = ReviewDataset(tokenizer, reviews, max_length)

Using a batch_size of 32, we create the dataloader (since the reviews are long, increasing the batch size can result in out-of-memory errors):

In [20]:
dataloader = DataLoader(dataset, batch_size=32, shuffle=True, drop_last=True)

GPT-2 is capable of several tasks, including summarization, generation, and translation. To train it for summarization, we use the same sequence as both input and labels:

In [21]:
def train(model, optimizer, dl, epochs):    
    for epoch in range(epochs):
        for idx, batch in enumerate(dl):
             with torch.set_grad_enabled(True):
                optimizer.zero_grad()
                batch = batch.to(device)
                output = model(batch, labels=batch)
                loss = output[0]
                loss.backward()
                optimizer.step()
                if idx % 50 == 0:
                    print("loss: %f, %d"%(loss, idx))
In [22]:
train(model=model, optimizer=optimizer, dl=dataloader, epochs=1)
loss: 6.946306, 0
loss: 2.313275, 50
loss: 2.081371, 100
loss: 2.080384, 150
loss: 2.071196, 200
loss: 2.179309, 250
loss: 1.935419, 300
loss: 2.011451, 350
loss: 1.980574, 400
loss: 1.855210, 450
loss: 1.986903, 500
loss: 2.003548, 550
loss: 2.083431, 600
loss: 1.981340, 650
loss: 1.922457, 700
loss: 2.141630, 750
loss: 2.239510, 800
loss: 2.168324, 850
loss: 2.148268, 900
loss: 1.916848, 950
loss: 1.999705, 1000
loss: 2.087286, 1050
loss: 1.794339, 1100
loss: 2.022352, 1150
loss: 1.772905, 1200
loss: 2.076683, 1250
loss: 1.713505, 1300
loss: 1.870195, 1350
loss: 1.819874, 1400
loss: 2.044860, 1450
loss: 1.827045, 1500
loss: 2.027030, 1550
loss: 1.979240, 1600
loss: 1.786424, 1650
loss: 2.288711, 1700
loss: 1.786224, 1750
loss: 2.204020, 1800
loss: 1.959004, 1850
loss: 1.924462, 1900
loss: 1.971964, 1950
loss: 1.797068, 2000
loss: 1.862133, 2050
loss: 1.898281, 2100
loss: 2.193818, 2150
loss: 2.005977, 2200

The online server I used was going to go offline, so I had to stop training a few batches early. The KeyboardInterrupt should not be an issue, since the model's weights are saved.

The loss decreased consistently, which means that the model was learning.

Review Summarization

The summarization methodology is as follows:

  1. A review is initially fed to the model.
  2. A choice from the top-k choices is selected.
  3. The choice is added to the summary and the current sequence is fed to the model.
  4. Repeat steps 2 and 3 until either max_len is achieved or the EOS token is generated.
In [23]:
def topk(probs, n=9):
    # The scores are initially softmaxed to convert to probabilities
    probs = torch.softmax(probs, dim= -1)
    
    # PyTorch has its own topk method, which we use here
    tokensProb, topIx = torch.topk(probs, k=n)
    
    # The new selection pool (9 choices) is normalized
    tokensProb = tokensProb / torch.sum(tokensProb)

    # Send to CPU for numpy handling
    tokensProb = tokensProb.cpu().detach().numpy()

    # Make a random choice from the pool based on the new prob distribution
    choice = np.random.choice(n, 1, p = tokensProb)
    tokenId = topIx[choice][0]

    return int(tokenId)
In [24]:
def model_infer(model, tokenizer, review, max_length=15):
    # Preprocess the init token (task designator)
    review_encoded = tokenizer.encode(review)
    result = review_encoded
    initial_input = torch.tensor(review_encoded).unsqueeze(0).to(device)

    with torch.set_grad_enabled(False):
        # Feed the init token to the model
        output = model(initial_input)

        # Flatten the logits at the final time step
        logits = output.logits[0,-1]

        # Make a top-k choice and append to the result
        result.append(topk(logits))

        # For max_length times:
        for _ in range(max_length):
            # Feed the current sequence to the model and make a choice
            input = torch.tensor(result).unsqueeze(0).to(device)
            output = model(input)
            logits = output.logits[0,-1]
            res_id = topk(logits)

            # If the chosen token is EOS, return the result
            if res_id == tokenizer.eos_token_id:
                return tokenizer.decode(result)
            else: # Append to the sequence 
                result.append(res_id)
    # If no EOS is generated, return after max_length
    return tokenizer.decode(result)

Generating unique summaries for 5 sample reviews:

In [30]:
sample_reviews = [review.split(" TL;DR ")[0] for review in random.sample(reviews, 5)]
sample_reviews
Out[30]:
["My local coffee shop has me addicted to their 20 oz vanilla chai lattes. At $3.90 a pop I was spending a lot of money.  I asked what brand they used, need nutritional information, of course!  They told me it was Big Train Chai Vanilla.<br />It's important to follow the directions on the can.  I made mine with just milk with a yucky result.  Use the water with a little milk as there is milk powder in the mix.<br /><br />WARNING:It's addicting!!!",
 'popcorn is very good. but only makes about half of it.tast so good like moive theater popcorn.so so so goooooooooooooooooood',
 "Love these chips. Good taste,very crispy and very easy to clean up the entire 3 oz. bag in one sitting.  NO greasy after-taste.  Original and barbecue flavors are my favorites but I haven't tried all flavors.  Great product.",
 'We have not had saltines for many years because of unwanted ingredients.  This brand is yummy and contains no unwanted ingredients.  It was also a lot cheaper by the case than at the local supermarket.',
 "Best English Breakfast tea for a lover of this variety and I've tried so many including importing it from England.  After s 20 year search I've found a very reasonable price for a most falvorful tea."]
In [31]:
for review in sample_reviews:
    summaries = set()
    print(review)
    while len(summaries) < 3:
        summary = model_infer(model, tokenizer, review + " TL;DR ").split(" TL;DR ")[1].strip()
        if summary not in summaries:
            summaries.add(summary)
    print("Summaries: "+ str(summaries) +"\n")
My local coffee shop has me addicted to their 20 oz vanilla chai lattes. At $3.90 a pop I was spending a lot of money.  I asked what brand they used, need nutritional information, of course!  They told me it was Big Train Chai Vanilla.<br />It's important to follow the directions on the can.  I made mine with just milk with a yucky result.  Use the water with a little milk as there is milk powder in the mix.<br /><br />WARNING:It's addicting!!!
Summaries: {'ADDICTING!!!', 'Addictive!!!', 'Beware!!!'}

popcorn is very good. but only makes about half of it.tast so good like moive theater popcorn.so so so goooooooooooooooooood
Summaries: {'very good', 'good taste', 'not bad, but not great.'}

Love these chips. Good taste,very crispy and very easy to clean up the entire 3 oz. bag in one sitting.  NO greasy after-taste.  Original and barbecue flavors are my favorites but I haven't tried all flavors.  Great product.
Summaries: {'very yummy', 'Love these chips!', 'My favorite Kettle chip'}

We have not had saltines for many years because of unwanted ingredients.  This brand is yummy and contains no unwanted ingredients.  It was also a lot cheaper by the case than at the local supermarket.
Summaries: {'yummo', 'yummy', 'Great product!'}

Best English Breakfast tea for a lover of this variety and I've tried so many including importing it from England.  After s 20 year search I've found a very reasonable price for a most falvorful tea.
Summaries: {'Wonderful Tea', 'The BEST tea for a lover of a cup of tea', 'Excellent tea for a lover of tea'}

The summaries reflect the content of the review. Feel free to try other reviews to test the capabilities of the model.

In this tutorial, we learned how to fine-tune the Huggingface GPT model to perform Amazon review summarization. The same methodology can be applied to any language model available on https://huggingface.co/models.

December 02, 2021 10:38 AM UTC

December 01, 2021


ItsMyCode

Python ImportError: No module named PIL Solution


If you use the Python image library and import PIL, you might get ImportError: No module named PIL while running the project. It happens because the PIL library has been deprecated. To resolve the issue, install and use its successor, the Pillow library, instead.

What is ImportError: No module named PIL?

If you use Python version 3 and try to install and use the PIL library, you will get the ImportError: No module named PIL while importing it, as shown below.

PIL is the Python Imaging Library developed by Fredrik Lundh and Contributors. PIL is now deprecated, and Pillow is the friendly PIL fork by Alex Clark and Contributors. As of 2019, Pillow development is supported by Tidelift.

How to fix ImportError: No module named PIL?

If you are using Python version 3, the best way to resolve this is by uninstalling the existing PIL package and performing a clean installation of the Pillow package, as shown below.

Step 1: Uninstall the PIL package.

pip uninstall PIL

Step 2: Install the Pillow using pip as shown below on different operating systems.

On Windows

python3 -m pip install --upgrade pip
python3 -m pip install --upgrade Pillow

On Linux

easy_install Pillow 

On OSX  

brew install Pillow 

Note: Sometimes, while importing matplotlib in your Jupyter notebook, you might face this issue and doing a standard install of Pillow may not work out. You could do a force install of Pillow, as shown below, to resolve the error.

pip install --upgrade --force-reinstall Pillow
pip install --upgrade --force-reinstall matplotlib

Step 3: The most crucial class in the Python Imaging Library is the Image class, and you can import this as shown below.

from PIL import Image
im = Image.open("myimage.jpg")

If successful, this function returns an Image object. You can now use instance attributes to examine the file contents:

print(im.format, im.size, im.mode)

#Output: PPM (512, 512) RGB

Note: If you use Python version 2.7, you need to install image and Pillow packages to resolve the issue.

python -m pip install image 
python -m pip install Pillow

The post Python ImportError: No module named PIL Solution appeared first on ItsMyCode.

December 01, 2021 07:51 PM UTC


Python GUIs

Simple multithreading in PyQt/PySide apps with QThreadPool.start() — Easily run Python functions and methods in a separate CPU thread

In PyQt5 version 5.15.0, the .start() method of QThreadPool gained the ability to take a Python function, a Python method, or a PyQt/PySide slot, in addition to a QRunnable object.

PySide was slow to join the party, but that ability was finally added in version 6.2.0.

And this ability dramatically simplifies running Python code in a separate CPU thread, avoiding the hassle of creating a QRunnable object.

For more information on creating a QRunnable object for CPU threading, see the multithreading tutorial.

The .start() method schedules the execution of a given function/method/slot on an entirely new CPU thread using QThreadPool, so it avoids blocking the main GUI thread of your app. Therefore, if you have a long-running task that needs to be done on another CPU thread, pass it to .start() and be done.

We'll build a simple demo GUI app that simulates a long-running task to demonstrate how .start() can move a user-defined Python function/method or a PyQt/PySide slot to a separate CPU thread.

But first, let’s begin with a flawed approach.

A flawed GUI design approach

Our demo GUI app is a simple sheep counter that counts upwards from 1. While this is happening, you can press a button to pick a sheep. And since this is challenging, it takes some time for the task to complete.

This is what the demo GUI app looks like.

If you use PyQt5, make sure that its version is 5.15.0 or greater; otherwise, the demo GUI app won’t work for you.

python
import time

from PySide6.QtCore import Slot, QTimer
from PySide6.QtWidgets import (
    QLabel,
    QWidget,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QApplication,
)


class MainWindow(QMainWindow):

    def __init__(self, *args, **kwargs):
        super(MainWindow, self).__init__(*args, **kwargs)

        self.setFixedSize(250, 100)
        self.setWindowTitle("Sheep Picker")

        self.sheep_number = 1
        self.timer = QTimer()
        self.picked_sheep_label = QLabel()
        self.counted_sheep_label = QLabel()

        self.layout = QVBoxLayout()
        self.main_widget = QWidget()
        self.pick_sheep_button = QPushButton("Pick a sheep!")

        self.layout.addWidget(self.counted_sheep_label)
        self.layout.addWidget(self.pick_sheep_button)
        self.layout.addWidget(self.picked_sheep_label)

        self.main_widget.setLayout(self.layout)
        self.setCentralWidget(self.main_widget)

        self.timer.timeout.connect(self.count_sheep)
        self.pick_sheep_button.pressed.connect(self.pick_sheep)

        self.timer.start()

    @Slot()
    def count_sheep(self):
        self.sheep_number += 1
        self.counted_sheep_label.setText(f"Counted {self.sheep_number} sheep.")

    @Slot()
    def pick_sheep(self):
        self.picked_sheep_label.setText(f"Sheep {self.sheep_number} picked!")
        time.sleep(5)  # This function represents a long-running task!


if __name__ == "__main__":
    app = QApplication([])

    main_window = MainWindow()
    main_window.show()

    app.exec()
python
import time

from PyQt6.QtCore import pyqtSlot, QTimer
from PyQt6.QtWidgets import (
    QLabel,
    QWidget,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QApplication,
)


class MainWindow(QMainWindow):

    def __init__(self, *args, **kwargs):
        super(MainWindow, self).__init__(*args, **kwargs)

        self.setFixedSize(250, 100)
        self.setWindowTitle("Sheep Picker")

        self.sheep_number = 1
        self.timer = QTimer()
        self.picked_sheep_label = QLabel()
        self.counted_sheep_label = QLabel()

        self.layout = QVBoxLayout()
        self.main_widget = QWidget()
        self.pick_sheep_button = QPushButton("Pick a sheep!")

        self.layout.addWidget(self.counted_sheep_label)
        self.layout.addWidget(self.pick_sheep_button)
        self.layout.addWidget(self.picked_sheep_label)

        self.main_widget.setLayout(self.layout)
        self.setCentralWidget(self.main_widget)

        self.timer.timeout.connect(self.count_sheep)
        self.pick_sheep_button.pressed.connect(self.pick_sheep)

        self.timer.start()

    @pyqtSlot()
    def count_sheep(self):
        self.sheep_number += 1
        self.counted_sheep_label.setText(f"Counted {self.sheep_number} sheep.")

    @pyqtSlot()
    def pick_sheep(self):
        self.picked_sheep_label.setText(f"Sheep {self.sheep_number} picked!")
        time.sleep(5)  # This function represents a long-running task!


if __name__ == "__main__":
    app = QApplication([])

    main_window = MainWindow()
    main_window.show()

    app.exec()
python
import time

from PyQt5.QtCore import pyqtSlot, QTimer
from PyQt5.QtWidgets import (
    QLabel,
    QWidget,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QApplication,
)


class MainWindow(QMainWindow):

    def __init__(self, *args, **kwargs):
        super(MainWindow, self).__init__(*args, **kwargs)

        self.setFixedSize(250, 100)
        self.setWindowTitle("Sheep Picker")

        self.sheep_number = 1
        self.timer = QTimer()
        self.picked_sheep_label = QLabel()
        self.counted_sheep_label = QLabel()

        self.layout = QVBoxLayout()
        self.main_widget = QWidget()
        self.pick_sheep_button = QPushButton("Pick a sheep!")

        self.layout.addWidget(self.counted_sheep_label)
        self.layout.addWidget(self.pick_sheep_button)
        self.layout.addWidget(self.picked_sheep_label)

        self.main_widget.setLayout(self.layout)
        self.setCentralWidget(self.main_widget)

        self.timer.timeout.connect(self.count_sheep)
        self.pick_sheep_button.pressed.connect(self.pick_sheep)

        self.timer.start()

    @pyqtSlot()
    def count_sheep(self):
        self.sheep_number += 1
        self.counted_sheep_label.setText(f"Counted {self.sheep_number} sheep.")

    @pyqtSlot()
    def pick_sheep(self):
        self.picked_sheep_label.setText(f"Sheep {self.sheep_number} picked!")
        time.sleep(5)  # This function represents a long-running task!


if __name__ == "__main__":
    app = QApplication([])

    main_window = MainWindow()
    main_window.show()

    app.exec()

When you run the demo GUI app and press the Pick a sheep! button, you’ll notice that for 5 seconds, the GUI is completely unresponsive. That's not good.

The delay in GUI responsiveness comes from the line time.sleep(5) which pauses the execution of Python code for 5 seconds. This was added to simulate a long-running task – which can be helped by CPU threading, as you’ll see later on.

You can experiment by increasing the length of the delay – pass a number greater than 5 to .sleep() – and you may notice that your operating system starts complaining about the app not responding.

A proper GUI design approach

So, how can we fix the GUI responsiveness problem and improve the demo GUI app? Well, this is where the new .start() method of QThreadPool comes in!

First, we need to import QThreadPool, so let’s do that.

python
from PySide6.QtCore import QThreadPool
python
from PyQt6.QtCore import QThreadPool
python
from PyQt5.QtCore import QThreadPool

Next, we need to create a QThreadPool instance. Let’s add

python
self.thread_manager = QThreadPool()

to the __init__ block of the MainWindow class.

Now, let’s create a pick_sheep_safely() slot. This new slot will use the .start() method of QThreadPool to call the long-running pick_sheep() slot and move it from the main GUI thread to an entirely new CPU thread.

python
@Slot()
def pick_sheep_safely(self):
    self.thread_manager.start(self.pick_sheep)  # This is where the magic happens!
python
@pyqtSlot()
def pick_sheep_safely(self):
    self.thread_manager.start(self.pick_sheep)  # This is where the magic happens!

Also, make sure that you connect the pick_sheep_safely() slot with the pressed signal of self.pick_sheep_button. So, in the __init__ block of the MainWindow class, you should have

python
self.pick_sheep_button.pressed.connect(self.pick_sheep_safely)

And if you have made all these changes, the code of our improved demo GUI app should now be:

python
import time

from PySide6.QtCore import Slot, QThreadPool, QTimer
from PySide6.QtWidgets import (
    QLabel,
    QWidget,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QApplication,
)


class MainWindow(QMainWindow):

    def __init__(self, *args, **kwargs):
        super(MainWindow, self).__init__(*args, **kwargs)

        self.setFixedSize(250, 100)
        self.setWindowTitle("Sheep Picker")

        self.sheep_number = 1
        self.timer = QTimer()
        self.picked_sheep_label = QLabel()
        self.counted_sheep_label = QLabel()

        self.layout = QVBoxLayout()
        self.main_widget = QWidget()
        self.thread_manager = QThreadPool()
        self.pick_sheep_button = QPushButton("Pick a sheep!")

        self.layout.addWidget(self.counted_sheep_label)
        self.layout.addWidget(self.pick_sheep_button)
        self.layout.addWidget(self.picked_sheep_label)

        self.main_widget.setLayout(self.layout)
        self.setCentralWidget(self.main_widget)

        self.timer.timeout.connect(self.count_sheep)
        self.pick_sheep_button.pressed.connect(self.pick_sheep_safely)

        self.timer.start()

    @Slot()
    def count_sheep(self):
        self.sheep_number += 1
        self.counted_sheep_label.setText(f"Counted {self.sheep_number} sheep.")

    @Slot()
    def pick_sheep(self):
        self.picked_sheep_label.setText(f"Sheep {self.sheep_number} picked!")
        time.sleep(5)  # This function doesn't affect GUI responsiveness anymore...

    @Slot()
    def pick_sheep_safely(self):
        self.thread_manager.start(self.pick_sheep)  # ...since .start() is used!


if __name__ == "__main__":
    app = QApplication([])

    main_window = MainWindow()
    main_window.show()

    app.exec()
python
import time

from PyQt6.QtCore import pyqtSlot, QThreadPool, QTimer
from PyQt6.QtWidgets import (
    QLabel,
    QWidget,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QApplication,
)


class MainWindow(QMainWindow):

    def __init__(self, *args, **kwargs):
        super(MainWindow, self).__init__(*args, **kwargs)

        self.setFixedSize(250, 100)
        self.setWindowTitle("Sheep Picker")

        self.sheep_number = 1
        self.timer = QTimer()
        self.picked_sheep_label = QLabel()
        self.counted_sheep_label = QLabel()

        self.layout = QVBoxLayout()
        self.main_widget = QWidget()
        self.thread_manager = QThreadPool()
        self.pick_sheep_button = QPushButton("Pick a sheep!")

        self.layout.addWidget(self.counted_sheep_label)
        self.layout.addWidget(self.pick_sheep_button)
        self.layout.addWidget(self.picked_sheep_label)

        self.main_widget.setLayout(self.layout)
        self.setCentralWidget(self.main_widget)

        self.timer.timeout.connect(self.count_sheep)
        self.pick_sheep_button.pressed.connect(self.pick_sheep_safely)

        self.timer.start()

    @pyqtSlot()
    def count_sheep(self):
        self.sheep_number += 1
        self.counted_sheep_label.setText(f"Counted {self.sheep_number} sheep.")

    @pyqtSlot()
    def pick_sheep(self):
        self.picked_sheep_label.setText(f"Sheep {self.sheep_number} picked!")
        time.sleep(5)  # This function doesn't affect GUI responsiveness anymore...

    @pyqtSlot()
    def pick_sheep_safely(self):
        self.thread_manager.start(self.pick_sheep)  # ...since .start() is used!


if __name__ == "__main__":
    app = QApplication([])

    main_window = MainWindow()
    main_window.show()

    app.exec()
python
import time

from PyQt5.QtCore import pyqtSlot, QThreadPool, QTimer
from PyQt5.QtWidgets import (
    QLabel,
    QWidget,
    QMainWindow,
    QPushButton,
    QVBoxLayout,
    QApplication,
)


class MainWindow(QMainWindow):

    def __init__(self, *args, **kwargs):
        super(MainWindow, self).__init__(*args, **kwargs)

        self.setFixedSize(250, 100)
        self.setWindowTitle("Sheep Picker")

        self.sheep_number = 1
        self.timer = QTimer()
        self.picked_sheep_label = QLabel()
        self.counted_sheep_label = QLabel()

        self.layout = QVBoxLayout()
        self.main_widget = QWidget()
        self.thread_manager = QThreadPool()
        self.pick_sheep_button = QPushButton("Pick a sheep!")

        self.layout.addWidget(self.counted_sheep_label)
        self.layout.addWidget(self.pick_sheep_button)
        self.layout.addWidget(self.picked_sheep_label)

        self.main_widget.setLayout(self.layout)
        self.setCentralWidget(self.main_widget)

        self.timer.timeout.connect(self.count_sheep)
        self.pick_sheep_button.pressed.connect(self.pick_sheep_safely)

        self.timer.start()

    @pyqtSlot()
    def count_sheep(self):
        self.sheep_number += 1
        self.counted_sheep_label.setText(f"Counted {self.sheep_number} sheep.")

    @pyqtSlot()
    def pick_sheep(self):
        self.picked_sheep_label.setText(f"Sheep {self.sheep_number} picked!")
        time.sleep(5)  # This function doesn't affect GUI responsiveness anymore...

    @pyqtSlot()
    def pick_sheep_safely(self):
        self.thread_manager.start(self.pick_sheep)  # ...since .start() is used!


if __name__ == "__main__":
    app = QApplication([])

    main_window = MainWindow()
    main_window.show()

    app.exec()

When you press the Pick a sheep! button now, the pick_sheep slot is executed separately on a new CPU thread and does not block the main GUI thread. The sheep counting continues, and our GUI remains responsive – even though our demo GUI app still has to do a long-running task in the background.

Try increasing the length of the delay in our improved version of the demo GUI app – for example, time.sleep(10) – and notice that it does not affect the responsiveness of the GUI anymore.

Conclusion

And that’s it! I hope you’ll find this .start() method of QThreadPool useful for any of your PyQt/PySide GUI apps with a long-running task to be done in the background.

For an in-depth guide to building Python GUIs with PySide6 see my book, Create GUI Applications with Python & Qt6.

December 01, 2021 04:57 PM UTC


Paolo Amoroso

An Intel 8080 Assembly Suite in Python

A blog post I stumbled upon made me start a new project, crank out lots of Python code, slip down a rabbit hole of arcane and fascinating corners of retrocomputing, and overflow with fun.

The project is Suite8080, a suite of Intel 8080 Assembly cross-development tools comprising an assembler and a disassembler. I developed it in Python entirely with Replit. At over 1,500 lines of code, it’s my second and largest Python project after Spacestills, a NASA TV still image viewer of about 340 lines of code.

Hello world Intel 8080 program running in the z80pack CP/M emulator
A hello world Intel 8080 program running in the z80pack CP/M emulator on Crostini Linux. I assembled the program with asm80, the Suite8080 assembler.

Why did I write software for a half-century old CPU? This is the story of how I got started with Suite8080, how I developed it, the challenges I faced, and what I learned.

Let’s start before the beginning.

Background

I’m a hobby programmer and a Python beginner, not a professional developer.

To practice with the language, I finally set out to work on programming projects. Like Spacestills, which draws on my major interests in astronomy and space, and on my work experience in outreach and education in these fields.

I’ve always been a computer enthusiast too. I’m old enough to have lived through the personal computer revolution of the 1970s and 1980s as a teenager, a revolution to which the Intel 8080 contributed a lot. It was the CPU of the early popular personal computers running the CP/M operating system, such as the Altair 8800 and IMSAI.

Witnessing the birth of technologies, devices, and companies that promised to change the world was a unique experience.

Back then, there was little distinction between using and programming computers. So, along with the popular high-level languages like BASIC and Pascal, I experimented with Zilog Z80 and Motorola 68000 Assembly programming. These experiences seeded my interest in low-level programming, languages, and development tools.

The stars aligned again in 2021, when I stumbled upon the idea that became Suite8080.


Origin of the project

A post shared on Hacker News from the blog of Dr. Brian Robert Callahan, a computer science academic and developer, kick-started my project. While browsing his blog archive I found an intriguing series of posts on demystifying programs that create programs, where he describes the development of an Intel 8080 disassembler, an assembler, and other tools.

Brian demystifies these technologies through a combination of simplified design, a preference for straightforward techniques over established algorithms, and clear commentary. He wrote:

«Second, we are not necessarily writing the assembler that someone who has decades of experience writing such tools would write. [...] We are writing an assembler that someone who has little to no programming experience or knowledge can come to understand purely through engagement with the code itself and a series of blog posts.»

Brian’s code and explanations are not just clear, they are motivating and fun.

The code in his posts is written in D (also known as Dlang), a system programming language in the C family I didn’t even know existed. But D is so readable, and Brian’s commentary so good, that his published D code is as effective as pseudocode.

As I read the posts, I frequently nodded thinking to myself "I can do that". So the idea of rewriting the Assembly tools in Python was born.

Developing assemblers seemed beyond my abilities, but two insights on Brian’s assembler sold me on the project idea.

First, Brian pointed out an assembler needs to examine just one source line with a simple syntax at a time, so the complexity of processing an entire program reduces to processing one line. Second, no fancy traditional parsing algorithms are required. An assembler can just search for a few symbols that separate the syntactic elements and split the line into the elements.


Goals

What could I learn from the project?

Besides improving my Python skills, my goal for Suite8080 was to explore interesting application domains such as Assembly programming and development tools. And to have fun, lots of it.

The venerable Intel 8080 is a valuable learning playground at a sweet spot between simplicity, functionality, and depth. The chip comes from an era when CPUs and computer systems were simple and could be understood in full, memory layouts were fixed and simple, and there were no concerns over instruction pipelines, timing, power management issues, or other complications of modern processors.

Another goal was extending Brian’s tools with useful features, deviating from his design as necessary, and following the path to where it led.

Running executable 8080 code is not only a necessity for testing an assembler but also an opportunity for experiencing low-level programming. Therefore, I wanted my code to run on 8080 and CP/M emulators, another retrocomputing rabbit hole.

As for the Python learning opportunities, I knew my next project after Spacestills would be larger and require more structure, a few modules, and packages. Therefore, I wanted to try tools and techniques I skipped with Spacestills such as multi-module systems, automated testing, command-line scripts, and publishing packages on PyPI.

Don’t tell anyone, but I enjoy writing tests.

In a previous life, I was a Lisp enthusiast and always interactively tested expressions, functions, and code snippets in the REPL. This drove home the importance of creating tested and reliable building blocks to combine and expand on.


The Suite8080 tools

Suite8080 includes the first two tools Brian covers in his series, an 8080 disassembler and a cross-assembler. I named mine dis80 and asm80, which are command-line Python scripts.

This Linux shell session demonstrates how to run the tools and what their output looks like:

(venv) paoloamoroso@penguin:~/python/suite8080$ asm80 -v greet.asm 
Wrote 35 bytes
(venv) paoloamoroso@penguin:~/python/suite8080$ dis80 greet.com
0000 0e 09 mvi c, 09h
0002 11 09 01 lxi d, 0109h
0005 cd 05 00 call 0005h
0008 c9 ret
0009 47 mov b, a
000a 72 mov m, d
000b 65 mov h, l
000c 65 mov h, l
000d 74 mov m, h
000e 69 mov l, c
000f 6e mov l, m
0010 67 mov h, a
0011 73 mov m, e
0012 20 nop
0013 66 mov h, m
0014 72 mov m, d
0015 6f mov l, a
0016 6d mov l, l
0017 20 nop
0018 53 mov d, e
0019 75 mov m, l
001a 69 mov l, c
001b 74 mov m, h
001c 65 mov h, l
001d 38 nop
001e 30 nop
001f 38 nop
0020 30 nop
0021 2e 24 mvi l, 24h
(venv) paoloamoroso@penguin:~/python/suite8080$

The session starts by executing asm80 to assemble the greet.asm source file, a hello world program that runs on CP/M, and then disassembling it with dis80. The -v verbose option prints how many bytes the assembler wrote to the output file greet.com.

As with most other disassemblers, mine can’t tell code from data bytes. The above session shows this as the instructions after the address 0008 holding the ret instruction are spurious. The data area, which stores the $-terminated string Greetings from Suite8080.$ the program prints, starts at address 0009 and dis80 disassembles it without realizing there’s no code there.
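To see why, note that the ASCII codes of the data bytes happen to be valid 8080 opcodes. A tiny Python sketch (using a hand-picked subset of the opcode table, not Suite8080's actual code) shows the effect:

```python
# A few entries from the 8080 opcode table, keyed by opcode byte.
OPCODES = {0x47: 'mov b, a', 0x72: 'mov m, d', 0x65: 'mov h, l'}

# The first bytes of the "Greetings..." string stored at address 0009.
data = b'Gre'

for byte in data:
    # Each ASCII code is also a valid opcode, so a disassembler
    # with no code/data distinction happily prints it as code.
    print(f'{byte:02x} {OPCODES[byte]}')
```

This reproduces the spurious mov instructions shown in the session above: 'G' is 0x47, 'r' is 0x72, and 'e' is 0x65.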

Brian’s assembler accepts a good fraction of the Assembly language and directives of early 8080 assemblers, such as the ones by Intel, Microsoft, and Digital Research. But for explanatory purposes, he skipped some features to keep the code simple. For example, the db memory allocation directive takes only one argument. The consequence is that initializing a memory block with several data bytes requires as many one-argument db clauses as there are bytes or strings to initialize.

As the work proceeded and I gained more experience with Intel 8080 programming, I extended the Suite8080 tools with features that add convenience and expressiveness.

For example, I changed the assembler to accept db with multiple arguments that may be character constants, such as ‘C’ or ‘*’, or labels. Character constants may also be immediate operands of Assembly instructions and the equ directive supports character constants too. Why is it important? Imagine an Assembly instruction for comparing the value of the accumulator with an ASCII character, which in the source can be the character itself instead of an integer number.
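For illustration, here is one way such a multi-argument db line might be split. This is a hypothetical helper written for this post, not asm80's actual parsing code:

```python
def parse_db_arguments(args: str) -> list[str]:
    """Split the argument string of a db directive on commas,
    keeping commas that appear inside character constants like ','."""
    parts = []
    current = ''
    in_quote = False
    for ch in args:
        if ch == "'":
            in_quote = not in_quote
        if ch == ',' and not in_quote:
            parts.append(current.strip())
            current = ''
        else:
            current += ch
    parts.append(current.strip())
    return parts

print(parse_db_arguments("1, 'C', '*', msg"))
# ['1', "'C'", "'*'", 'msg']
```

Arguments may thus mix numbers, character constants, and labels, which is exactly the kind of convenience the extended db directive provides.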

Brian’s assembler doesn’t support macros but, by cheating a bit, I rigged mine to turn asm80 into a macro-assembler. All I did was to extend asm80 to read the input file from standard input, which gave me macros for free via the standard Unix program m4. Here is a Linux session showing how to use m4 macros with asm80:

The file ldabcmac.m4 holds a macro definition included by the ldabc.m4 Assembly program. The session prints the source files, assembles them with asm80, and disassembles the executable program with the dis80 disassembler. The following commands are executed:

$ cat ldabcmac.m4
$ cat ldabc.m4
$ cat ldabc.m4 | more
$ cat ldabc.m4 | m4 | asm80 - -o ldabc.com
$ dis80 ldabc.com

In the above shell session, as well as the previous one, the hexadecimal dump of the opcode and operands next to the instruction address is an extension to Brian’s disassembler I added to dis80.


Development environment

On the desktop I use Chrome OS only and my daily driver is an ASUS Chromebox 3, an Intel i7 machine with 16 GB RAM and 256 GB storage.

Developing in Python with Replit is the best fit for my cloud lifestyle.

The Suite8080 Replit workspace on an ASUS Chromebox 3
The Suite8080 Replit workspace on my ASUS Chromebox 3.

I love Replit, a multi-language development environment that works fully in the cloud, requires no software installation, and synchronizes across devices out of the box. Replit also comes with a nice client for GitHub, which hosts the Suite8080 project repo.

Although I develop on Replit, I install and test Suite8080 also on Crostini, the Chrome OS Linux container that runs Debian. This helps ensure my code is portable and has no obvious system dependencies, as well as check that installing Suite8080 from PyPI works as intended.


Design and coding

Suite8080 began as a Python port of Brian’s D code.

I originally used equivalent Python features, focusing on making the tools run rather than writing Pythonic code that could come later. Implementing minimal or no input validation helped evolve the tools quickly to a mostly complete state, deferring refinements to later.

I closely followed Brian’s design and code structure, introducing a few changes to make identifiers less terse or more descriptive, or reorganizing functions for adapting the code to the features I wanted.

I decided to start this way and evolve the system to where it would lead, such as new features, better algorithms, or more Pythonic code.

I worked quickly and confidently on the disassembler and the assembler because of Brian’s clear design and commentary. I tried to make the code easily understandable — well, to me — with strategic comments and documentation.

Brian took two key design decisions that simplified his assembler, which I adopted in mine.

First, he didn’t rely on recursive descent or traditional parsing algorithms. Instead, Brian’s parser scans a source line for the symbols that separate syntactic elements, and splits the line at the symbols to isolate the elements.

For example, consider the syntax of a source line:

[label:] [mnemonic [operand1[, operand2]]] [; comment]

If the parser finds a semicolon, it splits the line at ; to isolate the comment text from the rest of the line to parse. Next, it looks for a comma separating the operands of an assembly instruction and splits there, thus isolating the second operand from the rest of the line to parse. And so on.
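In Python terms, this splitting strategy can be sketched like so. This is a simplified illustration built on str.partition, not Suite8080's actual parser:

```python
def split_line(line: str):
    """Split one assembly source line into its syntactic elements."""
    # Isolate the comment, if any.
    rest, _, comment = line.partition(';')
    # Isolate the label, if any.
    label, sep, rest = rest.partition(':')
    if not sep:
        label, rest = '', label
    # Split the mnemonic from the operands, then operand1 from operand2.
    mnemonic, _, operands = rest.strip().partition(' ')
    operand1, _, operand2 = operands.partition(',')
    return (label.strip(), mnemonic.strip(),
            operand1.strip(), operand2.strip(), comment.strip())

print(split_line('start:  mvi c, 09h  ; BDOS print-string call'))
# ('start', 'mvi', 'c', '09h', 'BDOS print-string call')
```

Each partition call peels off one syntactic element, so the whole line is parsed with a handful of string splits and no grammar machinery.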

This approach has limitations, such as fragility and no support for arbitrary arithmetic expressions.

The other design decision Brian took for explanatory purposes is to store state in global variables, half a dozen in the assembler and a couple in the disassembler. As a result, most functions don’t return values but perform side effects.

This simplifies the implementation and description of the tools but, as with Spacestills where I used global state too, I’m not completely satisfied. Not so much for the implications for the maintainability and extensibility of the system, but because global state complicates testing.

It’s not that we beginners aren’t aware of the drawbacks of global variables, it’s just that few Python instructional resources explain the alternatives well.


Testing and debugging

I flew blind when developing the disassembler, the first tool I coded.

Since I had no matching pairs of 8080 sources and binaries handy, I had to wait for the early work on the assembler to test dis80 by comparing the disassembled binaries assembled by asm80 with the corresponding sources.

I was lucky. Making the disassembler is really straightforward, and the tool worked correctly almost immediately.

Checking out the assembler was trickier.

Testing involved manually evaluating in the Python REPL expressions that called the various assembler functions and components, comparing the output with the expected behavior.

I implemented one new Assembly mnemonic or directive at a time and checked it out by assembling a one-line test program exercising the instruction or directive. Next, I changed the program to add variations, such as different operands or arguments, a label or comment, and so on. I occasionally tested by feeding the assembler longer demo 8080 programs I collected in the asm directory of the source tree.

This workflow gave me confidence in the correctness of processing individual instructions and directives.

Pytest helped a lot with automation. However, the global state and side effects, as well as the fact that most functions don’t return values, made the tests hard to write and limited coverage to a fraction of the system. In the tests, I found it difficult to reference the global variables and mock them, an area of Python that still confuses me.
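For what it's worth, one common pattern for taming module-level globals in tests is to patch and restore them around each test, for example with the standard library's unittest.mock. The sketch below uses a hypothetical stand-in for an assembler module, not the actual suite8080 module layout (pytest's monkeypatch fixture offers the same idea):

```python
import types
from unittest.mock import patch

# Stand-in for an assembler module that keeps state in globals.
asm = types.SimpleNamespace(output=b'', address=0)

def emit(mod, byte):
    """Append a byte to the module-level output, as a side effect."""
    mod.output += bytes([byte])
    mod.address += 1

# patch.object swaps in fresh values and restores the originals on exit,
# so each test starts from a known state.
with patch.object(asm, 'output', b''), patch.object(asm, 'address', 0):
    emit(asm, 0xC9)  # ret
    assert asm.output == b'\xc9' and asm.address == 1

# Outside the with block, the original globals are back untouched.
assert asm.output == b'' and asm.address == 0
```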

As for debugging, the problem was not so much the manual tasks, but getting stuck with no clue on how to fix an issue.

Print-debugging saved the day on more than one such occasion. Don’t tell the good folks at Replit, but I didn’t use their debugger. It’s awesome, but I couldn’t figure out how to invoke it in a multi-file system. Instead, I used strategically placed print statements that displayed the process state and made clear where my assumptions were wrong.


The code

I’m mostly pleased with the code I wrote.

Aside from the debugging and testing issues, it’s still easy for me to read, understand, and extend the code. To keep the code just as accessible and open later, when it won’t be as fresh in my mind, I’m adding more documentation and comments.

To give an idea of what the code looks like, here is the full source of the disassembler, minus the comments and with the giant instruction table abbreviated for clarity.

import argparse

MNEMONIC = 0
SIZE = 1

instructions = [
    ('nop', 1),
    ('lxi b,', 3),
    ('stax b', 1),
    ...
    ('call', 3),
    ('cpi', 2),
    ('rst 7', 1)
]

program = b''


def disassemble():
    address = 0
    mnemonic = ''
    program_length = len(program)

    while address < program_length:
        opcode = program[address]
        instruction = instructions[opcode]
        mnemonic = instruction[MNEMONIC]
        size = instruction[SIZE]

        if address + size > program_length:
            break

        arg1 = arg2 = ' '
        lsb = msb = ''

        if size > 1:
            if size == 3:
                arg2 = f'{program[address + 2]:02x}'
                msb = f'{program[address + 2]:02x}'
            arg1 = f'{program[address + 1]:02x}'
            lsb = f'{program[address + 1]:02x}h'
        output = f'{address:04x} {opcode:02x} {arg1} {arg2}\t\t{mnemonic} {msb}{lsb}'
        print(output)

        address += size


def main():
    global program
    dis80_description = f'Intel 8080 disassembler / Suite8080'

    parser = argparse.ArgumentParser(description=dis80_description)
    parser.add_argument('filename', type=str, help=' A file name')
    args = parser.parse_args()
    filename = args.filename

    with open(filename, 'rb') as file:
        program = file.read()
    disassemble()


if __name__ == '__main__':
    main()

After a couple of constants, there’s a table sorted by opcode holding entries comprising the symbolic instructions and their byte length. Next come the disassembly function and the main function. The disassembler dispatches on the opcode, fetches the argument bytes if any, and prints the resulting source line preceded by the address and a dump of the opcode and argument bytes.

To understand the details of the disassembly function note that:

That’s it. Brian really demystified these tools.

The assembler code is longer and less polished. The major source of complexity is the parser, the longest function. I made major changes to the parser to extend the functionality of the db directive and make it accept multiple arguments. Brian’s parser instead limits Assembly mnemonics or directives to take from zero to two arguments. This and other new features I implemented caused changes in a few more places.

See the full code of the assembler in the source tree.


Challenges

Working on Suite8080 meant learning pytest, 8080 programming, Assembly tools, and CP/M at the same time. Aside from bootstrapping my understanding of these topics, there were other challenges.

Some challenges came from Python itself. For example, I still don’t fully grasp importing modules and packages and referencing identifiers. My confusion also had consequences for testing with pytest, as discussed earlier.

Other challenges came from the porting work.

The parser ended up as a bit of a mess because I misunderstood how D’s findSplit() function works. Brian’s parser uses this function, which scans a string from left to right, several times. However, in Python I replaced it with str.rpartition(), which instead scans from right to left. Also, I didn’t fully comprehend one of the special cases in Brian’s parser. All this made my parser more difficult to understand than I’d like.
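The difference in scanning direction is easy to demonstrate: on a line with two commas, str.partition splits at the first one, while str.rpartition splits at the last:

```python
line = 'lxi d, label, extra'

# partition scans left to right and splits at the first comma.
left_first = line.partition(',')
print(left_first)   # ('lxi d', ',', ' label, extra')

# rpartition scans right to left and splits at the last comma.
right_first = line.rpartition(',')
print(right_first)  # ('lxi d, label', ',', ' extra')
```

With a single separator in the line the two behave identically, which is how such a substitution can go unnoticed until a line with multiple separators comes along.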


Lessons learned

Part of the value of Suite8080 comes from what I learn, which I hope will help me in future projects.

The main takeaway is that design is the limit to the growth and functionality of a system. If the design is not robust and scalable, its complexity hinders growth without a complete rewrite.

With the current Suite8080 design, I’m still at a stage at which I know where to put my hands to make changes or add features despite the relatively large code base. And I’m aware the system has design issues, even if I don’t have a solution yet — hey, it’s progress.

Suite8080 is making me love Python even more. But sometimes the Python code I write, which is supposedly at a higher level of abstraction than D, seems more verbose, like in the assembler’s parser.

Finally, I’m learning a lot about the tools I build Suite8080 with. For example, I ran across 4 things tutorials don’t tell about PyPI.


Next steps

My initial plan was to implement a couple of 8080 Assembly tools and call it a day. But this rabbit hole turned out so deep, and the fun was so great, I couldn’t stop working on Suite8080.

The next step is likely an IDE to tie together the assembler and the disassembler.

I’d also like to write a basic Intel 8080 simulator.

Why a simulator? I tried several 8080 emulators for testing my Assembly programs, but none are completely satisfactory for inspecting and debugging low-level code and machine state. Some are missing useful code or memory visualization features. Others don’t support directives or specific Assembly features. A few are just broken.

I can fix this only by building my own simulator. I’m not aiming at an emulator, which seems much more complex.

In his post series, Brian also discusses a linker and an object library archiver. I may implement these tools too, extending them to store object files in sqlite3 databases to see how far this gets me in terms of versatility.

It’s not as glamorous as making new tools, but I also need to refactor to more Pythonic code and add to the existing tools the features that daily usage will suggest.

My adventure with Suite8080 continues. If you’re interested in keeping up to date, subscribe to the RSS feed of the Python posts of my blog, or the complete RSS feed if you’re curious what other content I publish. You can also follow my @amoroso Twitter profile.

December 01, 2021 03:44 PM UTC


PyCharm

Introducing PyCharm 2021.3!

PyCharm 2021.3 release banner

We’ve been working hard to deliver features in PyCharm that will make you more productive and your coding smoother. This release cycle introduces Poetry support, the new FastAPI project type, the Beta version of our remote development support, a redesigned Jupyter notebook experience, and much more!

Download PyCharm 2021.3

In this blog post, we will go through some of the highlights.

Poetry Support

Poetry is becoming more and more popular among Python developers, and there were a lot of issues reported to our tracker asking us to add support for it. The good news for Poetry users (and for those wanting to try it) is that PyCharm now supports Poetry out of the box. But that’s not all! We’ve also bundled the TOML plugin so that you also get code completion for your pyproject.toml file.

Poetry support

This feature was made possible by merging the plugin created by Koudai Aono into the PyCharm source code. Thank you Koudai for all your hard work!

FastAPI Support [Pro]

FastAPI is a popular, high-performance, Python web framework for building APIs, and PyCharm Pro now offers support for it as a project and run configuration type.

To create a new FastAPI project, select the FastAPI project type and let PyCharm install its dependencies and create the run/debug configurations for you.

FastAPI project type

Alternatively, you can also open an existing FastAPI project with PyCharm Pro, let PyCharm handle creating the virtual environment for you, and create a FastAPI run configuration yourself. PyCharm will then detect your application and run Uvicorn for you.

FastAPI run configuration type

When working with endpoints in FastAPI, you will frequently need to test them to ensure that everything is working as expected. In PyCharm Pro you can do this from the comfort of your editor using the HTTP Client integration.

Just open the “file_name.http” file (already present for new FastAPI projects created using the PyCharm wizard) and use it to send requests to your application endpoints. You can read more about this in the documentation.

FastAPI http.test file

New Endpoints Tool Window for FastAPI and Flask [Pro]

If you develop APIs, we also have a great new feature to help you manage your endpoints. PyCharm Pro will scan FastAPI and Flask project routes and list them in the new Endpoints tool window, where you can have an overview of all your URLs, as well as code completion, navigation, and refactoring capabilities. The Endpoints tool window also displays documentation for each endpoint and allows you to test it using the HTTP client.

Endpoints tool window

New Jupyter Notebook Experience [Pro]

Our team has been working hard to improve PyCharm for software engineers working in the data science sphere. PyCharm Pro 2021.3 comes with new and improved support for Jupyter notebooks.

JetBrains dataspell logo
Jupyter support in PyCharm Pro is powered by DataSpell, our new IDE designed for professional data scientists. Are you a data scientist? Try DataSpell now!

New Notebook UI

For starters, PyCharm Pro now comes with support for the classic Jupyter notebook user interface out of the box, including full compatibility with its popular shortcuts.

New jupyter UI

Interactive Outputs

We’ve also added full support for both static and JavaScript-based outputs used by popular scientific libraries such as Plotly, Bokeh, Altair, ipywidgets, and others, as well as rich support for DataFrames, which you can explore in situ or open in a dedicated tab.

Interactive outputs in Jupyter notebooks

IDE capabilities

Having Jupyter support inside the IDE has its perks. It means that you get to benefit from all of the powerful PyCharm tools such as auto import, code completion, debugging and refactoring capabilities, and more in your Jupyter notebooks. To debug inside a Jupyter notebook, just add a breakpoint and run the cell under the debugger.

Debugging Jupyter notebooks

Remote development Beta [Pro]

Remote development support has also been a commonly requested feature from our users, and 2021.3 brings beta support for it. PyCharm users can now connect to remote machines from anywhere in the world, run PyCharm’s backend, and take advantage of remote computing power while feeling that everything is running locally.

To try it out, just click Remote Development on the Welcome screen, select the SSH option and follow the wizard to provide your credentials, establish the connection, download the IDE on the server, and open your remote project in PyCharm. You can read the documentation for more details.

New beta support for remote development

This feature is still in Beta, and your feedback on it is highly appreciated.

User experience

We have also been working to improve the overall PyCharm user experience. Some of the highlights include:

Feature trainer: New onboarding tour and a series of Git lessons

For users who are new to PyCharm or just want a refresher on how to use it, the IDE feature trainer now comes with an onboarding tour and new lessons. Trying it out is as easy as clicking ‘Learn PyCharm’ on the Welcome screen or clicking Help | Learn IDE in the main menu in the IDE.

Updated feature trainer

Reorganized Version Control Settings

We’ve reorganized the VCS settings and made them more discoverable. In Preferences / Settings | Version Control you will now find a list of all the available settings, serving as a starting point for configuring your VCS. The settings inside the sections are now organized by the most important processes: Commit, Push, and Update.

Reorganized VCS

Data editor Aggregate view [Pro]

We’ve implemented an Aggregate view for ranges of cells. This highly anticipated feature will go a long way towards helping you manage your data by taking away the burden of writing additional queries! It also makes the data editor easier to use, bringing it a step closer to Microsoft Excel and Google Sheets.

To use this feature you will first need to select the cell range you want to see the view for, then right-click and select Show Aggregate View from the menu.

New aggregate view

End of support for Mako, Buildout, and Web2Py

As announced in our previous release (2021.2 announcements), from PyCharm 2021.3 onwards there will no longer be support for Mako, Buildout, or Web2Py.

These are all the features I’d like to highlight for this release. If you want to read more about the other features included in this release check out our What’s New page or read the release notes for the full list of features implemented and bugs fixed.

As always, your feedback is highly appreciated. Feel free to share it with us on Twitter (@pycharm), or by reporting any bugs you encounter to our tracker.

Happy coding!

The PyCharm team

December 01, 2021 03:40 PM UTC


Python for Beginners

Preorder Tree Traversal Algorithm in Python

Binary trees are very useful in representing hierarchical data. In this article, we will discuss how to print all the elements in a binary tree in Python. For this, we will use the preorder tree traversal algorithm. We will also implement the preorder tree traversal in Python.

What is the preorder tree traversal?

Preorder tree traversal is a depth-first traversal algorithm. Here, we start from the root node and traverse a branch of the tree until we reach the end of the branch. After that, we move to the next branch. This process continues until all the nodes in the tree have been printed.

The preorder tree traversal algorithm gets its name from the order in which the nodes of a tree are printed. In this algorithm, we first print a node. After that, we print the left child of the node. And, at last, we print the right child of the node. This process is recursive in nature. Here, the right child of a node is only printed when all the nodes in the left subtree of the current node and the current node itself have already been printed.  

Let us understand the process using the binary tree given in the following image.

Binary Tree

Let us print all of the nodes in the above binary tree using the preorder traversal.

You can observe that we have printed the values in the order 50, 20, 11, 22, 53, 52, 78. Let us now formulate an algorithm for the preorder tree traversal.

Algorithm for preorder tree traversal

As you have an overview of the entire process, we can formulate the algorithm for preorder tree traversal as follows.

  1. Start from the root node.
  2. If the root is empty, go to step 6.
  3. Print the root node.
  4. Traverse the left subtree recursively.
  5. Traverse the right subtree recursively.
  6. Stop.

Implementation of preorder tree traversal in Python

Now that we have discussed the algorithm for preorder tree traversal and how it works, let us implement it and execute it for the binary tree given in the above image.

class BinaryTreeNode:
    def __init__(self, data):
        self.data = data
        self.leftChild = None
        self.rightChild = None


def preorder(root):
    # if root is None,return
    if root is None:
        return
    # print the current node
    print(root.data, end=" ,")
    # traverse left subtree
    preorder(root.leftChild)

    # traverse right subtree
    preorder(root.rightChild)


def insert(root, newValue):
    # if binary search tree is empty, create a new node and declare it as root
    if root is None:
        root = BinaryTreeNode(newValue)
        return root
    # if newValue is less than value of data in root, add it to left subtree and proceed recursively
    if newValue < root.data:
        root.leftChild = insert(root.leftChild, newValue)
    else:
        # if newValue is greater than value of data in root, add it to right subtree and proceed recursively
        root.rightChild = insert(root.rightChild, newValue)
    return root


root = insert(None, 50)
insert(root, 20)
insert(root, 53)
insert(root, 11)
insert(root, 22)
insert(root, 52)
insert(root, 78)
print("Preorder traversal of the binary tree is:")
preorder(root)

Output:

Preorder traversal of the binary tree is:
50 ,20 ,11 ,22 ,53 ,52 ,78 ,

You can observe that the code gives the same output that we have derived while discussing this algorithm.

Conclusion

In this article, we have discussed and implemented the preorder tree traversal algorithm in Python. To learn more about other tree traversal algorithms, you can read these articles on inorder tree traversal in Python and level order tree traversal in Python.

The post Preorder Tree Traversal Algorithm in Python appeared first on PythonForBeginners.com.

December 01, 2021 02:23 PM UTC