
Planet Python

Last update: June 24, 2024 01:43 PM UTC

June 24, 2024


Daniel Roy Greenfeld

London Tech Zero Hackathon on July 1 and 2!

On the 1st and 2nd of July is the first-ever London Tech Zero Hackathon, supported by Kraken Tech.

Taking place at the Vinyl Factory in Soho, the two-day event will see developers, designers, and others hack out MVPs of solutions to real-life sustainability and climate problems. APIs and guidance will be provided, and contestants can build out software or hardware solutions. Individuals are welcome to attend, and companies are invited to send teams. There will be prizes besides bragging rights, including a £20k mini grant to develop the winning idea.

I'll be there to help! As an employee of the hosts, I can't build your projects for you but I can provide assistance. :-)


Space is limited, register your interest here.


June 24, 2024 10:37 AM UTC


Zato Blog

Getting started with network automation in Python


Automation in Python

All network engineers and architects understand that automation is essential for efficiency and reliability. This is especially true with the increasing shift to cloud-based environments, where it is no longer sufficient to merely connect network elements without taking the bigger IT picture into account.

The latest article about network automation will introduce you to a Python-based integration platform designed to automate network tasks, connect diverse systems, and manage complex workflows.

You'll see not only how to automate a network device, but also how to connect cloud applications outside of what a typical automation tool would do, including Jira for tickets and Microsoft 365 to manage email, all using basic Python code.

Read the article here: Network Automation in Python.

More resources

➤ Python API integration tutorial
➤ What is a Network Packet Broker?
➤ How to automate networks in Python?
➤ What is an integration platform?
➤ Python Integration platform as a Service (iPaaS)
➤ What is an Enterprise Service Bus (ESB)?
➤ What is SOA?

June 24, 2024 08:00 AM UTC

June 22, 2024


Brett Cannon

My impressions of ReScript

I maintain a GitHub Action called check-for-changed-files. For the purpose of this blog post what the action does isn't important, but the fact that I authored it originally in TypeScript is. See, one day I tried to update the NPM dependencies. Unfortunately, that update broke everything in a really bad way due to how the libraries I used to access PR details changed and how the TypeScript types changed. I had also gotten tired of updating the NPM dependencies for security concerns I didn't have, since this code was only run in CI by others for their own use (i.e. regex denial-of-service isn't a big concern). As such I was getting close to burning out on the project, as it was nothing but a chore to keep it up-to-date and I wasn't motivated to keep the code up-to-date since TypeScript felt more like a cost than a benefit for such a small code base where I'm the sole maintainer (there's only been one other contributor to the project since the initial commit 4.5 years ago). I converted the code base to JavaScript in hopes of simplifying my life and it went better than I expected, but it still wasn't enough to keep me interested in the project.

And so I did what I needed to in order to be engaged with the project again: I rewrote it in another programming language that could run easily under Node. 😁 I decided I wanted to do the rewrite piecemeal to make sure I could tell quickly whether I was going to like the eventual outcome, rather than doing a complete rewrite from scratch and being unhappy with where I ended up (doing this while on parental leave made me prioritize my spare time immensely, so failing fast was paramount). During my parental leave I learned Gleam because I loved their statement on expectations for community conduct on their homepage, but while it does compile to JavaScript, I realized it works better when JavaScript is used as an escape hatch instead of using Gleam to port an existing code base, and so it wasn't a good fit for this use case.

My next language to attempt the rewrite with was ReScript, thanks to my friend Dusty liking it. One of the first things I liked about the language was that it had a clear migration path from JavaScript to ReScript in 5 easy steps. And since step 1 was "wrap your JavaScript code in %%raw blocks and change nothing" and step 5 was the optional "clean up" step, there were really only 3 main steps (I did have a hiccup with step 1, though, due to a bug not escaping backticks for template literals appropriately, but it was a mostly mechanical change to undo the template literals and switch to string concatenation).

A key thing that drew me to the language is its OCaml history. ReScript can have very strict typing, but ReScript's OCaml background also means there's type inference, so the typing doesn't feel that heavy. ReScript also has a functional programming leaning, which I appreciate.

💡
When people say "ML" for "machine learning" it still throws me as I instinctively think they are actually referring to "Standard ML".

But having said all of that, ReScript does realize folks will be migrating or working with a preexisting JavaScript code base or libraries, and so it tries to be pragmatic for that situation. For instance, while the language has roots in OCaml, the syntax would feel comfortable to JavaScript developers. While supporting a functional style of programming, the language still has things like if/else and for loops. While the language is strongly typed, ReScript has things like its object type, where the types of the fields can be inferred based on usage to make it easier to bring over JavaScript objects.

As part of the rewrite I decided to lean in on testing to help make sure things worked as I expected them to. But I ran into an issue where the first 3 testing frameworks I looked into didn't work with ReScript 11 (which came out in January 2024 and is the latest major version as I write this). Luckily the 4th one, rescript-zora, worked without issue (it also happens to be by my friend, Dusty, so I was able to ask questions of the author directly 😁; I initially avoided it so I wouldn't pester him about stuff, but I made up for it by contributing back). Since ReScript's community isn't massive, it isn't unexpected to have some delays in projects keeping up with stuff. Luckily the ReScript forum is active, so you can get your questions answered quickly if you get stuck. But aside from this hiccup and the one involving %%raw and template literals, the process was overall rather smooth.

In the end I would say the experience was a good one. I liked the language and transitioning from JavaScript to ReScript went relatively smoothly. As such, I have ported check-for-changed-files over to ReScript permanently in the 1.2.1 release, and hopefully no one noticed the switch. 🤞

June 22, 2024 11:35 PM UTC


Python Software Foundation

Affirm your PSF Membership Voting Status

Every PSF voting-eligible Member (Supporting, Managing, Contributing, and Fellow) needs to affirm their membership to vote in this year’s election.

If you wish to vote in this year's election, you must affirm your intention to vote no later than Tuesday, June 25th, 2:00 pm UTC. This year's Board Election vote begins Tuesday, July 2nd, 2:00 pm UTC and closes on Tuesday, July 16th, 2:00 pm UTC. You should have received an email from "psf@psfmember.org <Python Software Foundation>" with the subject "[Action Required] Affirm your PSF Membership voting status" that contains information on how to affirm your voting status. If you were expecting to receive the email but have not (make sure to check your spam!), please email psf-elections@pyfound.org and we'll assist you.

PSF Bylaws

Section 4.2 of the PSF Bylaws requires that “Members of any membership class with voting rights must affirm each year to the corporation in writing that such member intends to be a voting member for such year.”

The PSF did not enforce this before 2023 because it was technically challenging. Now that we can track our entire voting membership on psfmember.org, we are implementing this requirement.

Our motivation is to ensure that our elections can meet quorum as required by Section 3.9 of our bylaws. As our membership has grown, we have seen that an increasing number of Contributing, Managing, and Fellow members with indefinite membership do not engage with our annual election, making quorum difficult to reach. 

An election that does not reach quorum is invalid. This would cause the whole voting process to be re-held as well as create an undue amount of effort on the part of the PSF Staff.

Need to check your membership status?

You can see your membership record and status on your PSF Member User Information page (note you must be logged in to psfmember.org to view that page). If you are a voting-eligible member (active Supporting, Managing, Contributing, and Fellow members of the PSF) and do not already have a login, please create an account on psfmember.org first and then email psf-donations@python.org so we can link your membership to your account. Please ensure you have an account linked to your membership so that we can have the most up-to-date contact information for you in the future.

What happens next?

You’ll get an email from OpaVote with a ballot on or before July 2nd and then you can vote!

Check out our PSF Membership page to learn more. If you have questions about membership or nominations please email psf-donations@python.org. Join the PSF Discord for the Board Office Hours, June 11th, 4-5 PM UTC, and June 18th, 12-1 PM UTC. You are also welcome to join the discussion about the PSF Board election on our forum.

June 22, 2024 06:32 PM UTC

It’s time to make nominations for the PSF Board Election!

This year’s Board Election Nomination period opens tomorrow and closes on June 25th. Who runs for the board? People who care about the Python community, who want to see it flourish and grow, and also have a few hours a month to attend regular meetings, serve on committees, participate in conversations, and promote the Python community. Check out our Life as Python Software Foundation Director video to learn more about what being a part of the PSF Board entails. We also invite you to review our Annual Impact Report for 2023 to learn more about the PSF mission and what we do.

Current Board members want to share what being on the Board is like and are making themselves available to answer all your questions about responsibilities, activities, and time commitments via online chat. Please join us on the PSF Discord for the Board Election Office Hours (June 11th, 4-5 PM UTC and June 18th, 12-1 PM UTC)  to talk with us about running for the PSF Board. 

Board Election Timeline

Nominations open: Tuesday, June 11th, 2:00 pm UTC
Nominations close: Tuesday, June 25th, 2:00 pm UTC
Voter application cut-off date: Tuesday, June 25th, 2:00 pm UTC
Announce candidates: Thursday, June 27th
Voting start date: Tuesday, July 2nd, 2:00 pm UTC
Voting end date: Tuesday, July 16th, 2:00 pm UTC 

Not sure what UTC is for you locally? Check here!

Nomination details

You can nominate yourself or someone else. We encourage you to reach out to people before you nominate them to ensure they are enthusiastic about the potential of joining the Board. Nominations open on Tuesday, June 11th, 2:00 PM UTC, and end on June 25th, 2:00 PM UTC.

Please send nominations and questions regarding nominations to psf-board-nominations@pyfound.org. Include the name, email address, and an endorsement of the nominee's candidacy for the PSF Board. To get an idea of what a nomination looks like, check out the Nominees for 2023 PSF Board page. After the nomination period closes, we will request a statement and other relevant information from the nominees to publish for voter review.

Voting Reminder!

Every PSF Voting Member (Supporting, Managing, Contributing, and Fellow) needs to affirm their membership to vote in this year’s election. You should have received an email from "psf@psfmember.org <Python Software Foundation>" with subject "[Action Required] Affirm your PSF Membership voting status" that contains information on how to affirm your voting status.

You can see your membership record and status on your PSF Member User Information page. If you are a voting-eligible member and do not already have a login, please create an account on psfmember.org first and then email psf-donations@python.org so we can link your membership to your account.

June 22, 2024 06:32 PM UTC

June 21, 2024


DataWars.io

Replit Teams for Education Deprecation: All you need to know | DataWars

June 21, 2024 09:30 PM UTC


Ned Batchelder

Coverage at a crossroads

This is an interesting time for coverage.py: I’m trying to make use of new facilities in Python to drastically reduce the execution-time overhead, but it’s raising tricky questions about how coverage should work.

The current situation is a bit involved. I’ll try to explain, but this will get long, with a few sections. Come talk in Discord about anything here, or anything about coverage at all.

How coverage works today

Much of this is discussed in How coverage.py works, but I’ll give a quick overview here to set the stage for the rest of this post.

Trace function

Coverage knows what code is executed because it registers a trace function which gets called for each line of Python execution. This is the source of the overhead: calling a function is expensive. Worse, for statement coverage, we only need one bit of information for each line: was it ever executed? But the trace function will be invoked for every execution of the line, taking time but giving us no new information.
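
For illustration, a bare-bones line-recording trace function might look something like this sketch (not coverage.py's actual implementation, which does much more):

import sys

executed_lines = set()

def tracer(frame, event, arg):
    if event == "line":
        # Called on *every* execution of a line, even ones already
        # recorded -- pure overhead from the second call onward.
        executed_lines.add((frame.f_code.co_filename, frame.f_lineno))
    return tracer  # keep tracing this scope

sys.settrace(tracer)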

Arcs

The other thing to know about how coverage works today is arcs. Coverage measures branch coverage by tracking the previous line that was executed, then the trace function can record an arc: the previous line number and the current line number as a pair. Taken all together, these arcs show how execution has moved through the code.

Most arcs are uninteresting. Consider this simple program:

1print("Hello")

2print("world")
3print("bye")

This will result in arcs (1, 2) and (2, 3). Those tell us that lines 1, 2, and 3 were all executed, but nothing more interesting. Lots of arcs are this kind of straight-line information.

But when there are choices in the execution path, arcs tell us about branches taken:

1  a = 1
2  if a == 1:
3      print("a is one!")
4  else:
5      print("a isn't one!")
6  print("Done")

Now we’ll collect these arcs during execution: (1, 2), (2, 3), (3, 6). When coverage.py analyzes the code, it will determine that there were two possible arcs that didn’t happen: (2, 5) and (5, 6).

The set of all possible arcs is used to determine where the branches are. A branch is a line that has more than one possible arc leaving it. In this case, the possible arcs include (2, 3) and (2, 5), so line 2 is a branch, and only one of the possible arcs happened, so line 2 is marked as a partial branch.
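
As a rough sketch (not coverage.py's actual code), that analysis amounts to something like this, using the arcs from the example above:

from collections import defaultdict

possible_arcs = {(1, 2), (2, 3), (2, 5), (3, 6), (5, 6)}
executed_arcs = {(1, 2), (2, 3), (3, 6)}

possible_dests = defaultdict(set)
for src, dest in possible_arcs:
    possible_dests[src].add(dest)

# A branch is a line with more than one possible next line.
branches = {src for src, dests in possible_dests.items() if len(dests) > 1}

executed_dests = defaultdict(set)
for src, dest in executed_arcs:
    executed_dests[src].add(dest)

# A partial branch is a branch where only some destinations happened.
partial = {src for src in branches
           if 0 < len(executed_dests[src]) < len(possible_dests[src])}

print(branches, partial)  # {2} {2}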

SlipCover

SlipCover is a completely different implementation of a coverage measurement tool, focused on minimal execution overhead. They've accomplished this in a few clever ways, primarily by instrumenting the code to directly announce what is happening rather than using a trace function: synthetic bytecode or source lines are inserted into your code to record data during execution.

SlipCover’s author (Juan Altmayer Pizzorno) and I have been talking for years about how SlipCover and coverage.py each work, with the ultimate goal to incorporate SlipCover-like techniques into coverage.py. SlipCover is an academic project, so was never meant to be widely used and maintained long-term.

One of the ways that SlipCover reduces overhead is to remove instrumentation once it has served its purpose. After a line has been marked as executed, there is no need to keep that line’s inserted bytecode. The extra tracking code can be removed to avoid its execution overhead.

Instrumenting and un-instrumenting code is complex. With Python 3.12, we might be able to get the best aspects of instrumented code without having to jump through complicated hoops.

Python 3.12: sys.monitoring

Python 3.12 introduced a new way for Python to track execution. The sys.monitoring feature lets you register for events when lines are executed. This is similar to a classic trace function, but the crucial difference is you can disable the event line-by-line. Once a line has reported its execution to coverage.py, that line’s event can be disabled, and the line will run at full speed in the future without the overhead of reporting to coverage. Other lines in the same file will still report their events, and they can each be disabled once they have fired once. This gives us low overhead because lines mostly run at full speed.
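
The core of this approach looks roughly like the following sketch (simplified from what coverage.py actually does):

import sys

mon = sys.monitoring
TOOL = mon.COVERAGE_ID  # the tool id reserved for coverage tools

executed_lines = set()

def handle_line(code, line_number):
    executed_lines.add((code.co_filename, line_number))
    # Disable this event for this specific line: it will now run at
    # full speed, while other lines still report their first execution.
    return mon.DISABLE

mon.use_tool_id(TOOL, "sketch-coverage")
mon.register_callback(TOOL, mon.events.LINE, handle_line)
mon.set_events(TOOL, mon.events.LINE)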

Coverage.py has supported line coverage in 3.12 since 7.4.0 with only 5% overhead or so.

Unfortunately, branch coverage is a different story. sys.monitoring has branch events, but they are disabled based only on the “from” line, not on the “from/to” pair. In our example above, line 2 could branch to 3 or to 5. When sys.monitoring tells us line 2 branched to 3, we have to keep the event enabled until we get an event announcing line 2 branching to line 5. This means we continue to suffer event overhead during execution of the 2-to-3 branch, plus we have to do our own bookkeeping to recognize when both branch destinations have been seen before we can disable the event.

As a result, branch coverage doesn’t get the same advantages from sys.monitoring as statement coverage.

I’ve been talking to the core Python devs about changing how sys.monitoring works for branches. But even if we come to a workable design, because of Python’s yearly release cycle it wouldn’t be available until 3.14 in October 2025 at the earliest.

Using lines for branches

In SlipCover, Juan came up with an interesting approach to use sys.monitoring line events for measuring branch coverage. SlipCover already had a few ways of instrumenting code, rewriting the code during import to add statements that report execution data. The new idea was to add lines at the destinations of branches. The lines don’t have to do anything. If sys.monitoring reports that the line was executed, then it must mean that the branch to that line happened.

As an example, our code from above would be rewritten to look something like this:

a = 1

if a == 1:                  # line 2
    NO-OP                   # line A, marked as 2->3
    print("a is one!")      # line 3
else:                       
    NO-OP                   # line B, marked as 2->5
    print("a isn't one!")   # line 5
print("Done")

If sys.monitoring reports that line A was executed, we can record a branch from 2 to 3 and disable the event for line A. This reduces the overhead and still lets us later get an event for line B when line 2 branches to line 5.

This seems to give us the best of all the approaches: events can be disabled for each choice from a branch independently, and the inserted lines can be as minimal as possible.

Problems

There are a few problems with adapting this clever approach to coverage.py.

Moving away from arcs

This technique changes the data we collect during execution. We no longer get a (1, 2) arc, for example. Overall, that’s fine because that arc isn’t involved in computing branches anyway. But arcs are used as intermediate data throughout coverage.py, including its extensive test suite. How can we move to arc-less measurement on 3.12+ without refactoring many tests and while still supporting Pythons older than 3.12?

I’ve gotten a start on adapting some test helpers to avoid having to change the tests, so this might not be a big blocker.

Is every multi-arc a branch?

Another problem is how coverage.py determines branches. As I mentioned above, coverage.py statically analyzes code to determine all possible arcs. Any line that could arc to more than one next line is considered a branch. This works great for classic branches like if statements. But what about this code?

1  def func(x):
2      try:
3          if x == 10:
4              print("early return")
5              return
6      finally:
7          print("finally")
8      print("finished")

If you look at line 7, there are two places it could go next. If x is 10, line 7 will return from the function because of the return on line 5. If x is not 10, line 7 will be followed by line 8. Coverage.py's static analysis understands these possibilities and includes both (7, return) and (7, 8) in the possible arcs, so it considers line 7 a branch. But is it? The conditional here is really on line 3, which is already considered a branch.

I mention this as a problem here because the clever NO-OP rewriting scheme depends on being able to insert a line at a destination that clearly indicates where the branch started from. In this finally clause, where would we put the NO-OP line for the return? The rewriting scheme breaks down if the same destination can be reached from different starting points.

But maybe those cases are exactly the ones that shouldn’t be considered branches in the first place?

Where are we?

I’ve been hacking on this in a branch in the coveragepy repo. It’s a complete mess with print-debugging throughout just to see where this idea could lead.

For measuring performance, we have a cobbled-together benchmark tool in the benchmark directory of the repo. I could use help making it more repeatable and useful if you are interested.

I’m looking for thoughts about how to resolve some of these issues, and how to push forward on a faster coverage.py. There’s now a #coverage-py channel in the Python Discord where I’d welcome feedback or ideas about any of this, or anything to do with coverage.py.

June 21, 2024 12:14 PM UTC


Real Python

The Real Python Podcast – Episode #209: Python's Command-Line Utilities & Music Information Retrieval Tools

What are the built-in Python modules that can work as useful command-line tools? How can these tools add more functionality to Windows machines? Christopher Trudeau is back on the show this week, bringing another batch of PyCoder's Weekly articles and projects.



June 21, 2024 12:00 PM UTC


death and gravity

reader 3.13 released – scheduled updates

Hi there!

I'm happy to announce version 3.13 of reader, a Python feed reader library.

What's new? #

Here are the highlights since reader 3.12.

Scheduled updates #

reader now allows updating feeds at different rates via scheduled updates.

The way it works is quite simple: each feed has an update interval that determines when the feed should be updated next; calling update_feeds(scheduled=True) updates only feeds that should be updated at or before the current time.

The interval can be configured by the user globally or per-feed through the .reader.update tag. In addition, you can specify a jitter; for an interval of 24 hours, a jitter of 0.25 means the update will occur any time in the first 6 hours of the interval.
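
In code, using scheduled updates might look like the following sketch; note that the exact .reader.update tag value format shown here is an assumption, so check the user guide for the real schema:

from reader import make_reader

reader = make_reader("db.sqlite")
reader.add_feed("https://example.com/feed.xml")  # example feed URL

# Assumed tag value: an interval in minutes, plus an optional jitter.
reader.set_tag(
    "https://example.com/feed.xml",
    ".reader.update",
    {"interval": 24 * 60, "jitter": 0.25},
)

# Update only the feeds that should be updated at or before now.
reader.update_feeds(scheduled=True)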

In the future, the same mechanism will be used to handle 429 Too Many Requests.

Improved documentation #

As part of rewriting the Updating feeds user guide section to talk about scheduled updates, I've added a new section about being polite to servers.

Also, we have a new recipe for adding custom headers when retrieving feeds.

mark_as_read reruns #

You can now re-run the mark_as_read plugin for existing entries by adding the .reader.mark-as-read.once tag to a feed. Thanks to Michael Han for the pull request!


That's it for now. For more details, see the full changelog.

Want to contribute? Check out the docs and the roadmap.

Learned something new today? Share this with others, it really helps!

What is reader? #

reader takes care of the core functionality required by a feed reader, so you can focus on what makes yours different.

reader allows you to:

...all these with:

To find out more, check out the GitHub repo and the docs, or give the tutorial a try.

Why use a feed reader library? #

Have you been unhappy with existing feed readers and wanted to make your own, but:

Are you already working with feedparser, but:

... while still supporting all the feed types feedparser does?

If you answered yes to any of the above, reader can help.

The reader philosophy #

Why make your own feed reader? #

So you can:

Obviously, this may not be your cup of tea, but if it is, reader can help.

June 21, 2024 07:43 AM UTC


Seth Michael Larson

CPython vulnerability data infrastructure (CVE and OSV)

Published 2024-06-21 by Seth Larson

This critical role would not be possible without funding from the Alpha-Omega project. Massive thank-you to Alpha-Omega for investing in the security of the Python ecosystem!

Let's talk about some vulnerability data infrastructure for the Python Software Foundation (PSF). In the recent past, most of the vulnerability data processes were manual. This worked okay because the PSF processes a low double-digit number of vulnerabilities each year (2023 saw 12 published vulnerabilities).

However, almost all of this manual processing was being done by me as full-time staff. Imagining this work being done by either someone else on staff or a volunteer isn't great, because it's a non-zero amount of extra work. Automation to the rescue!

How the vulnerability data flows

The PSF uses the CVE database as its “source of truth” for vulnerability data which then gets imported into our Open Source Vulnerability (OSV) database by translating CVE records into OSV records.

We manually update CVE information in a program called Vulnogram which provides a helpful UI for CVE services.

So what is the minimum amount of information we need to manually create to automatically generate the rest? This is the current list of data the PSF CVE Numbering Authority team creates manually:

  • Advisory text and description
  • CVE reference to the advisory
  • GitHub issue (as a CVE reference)

[Diagram: GitHub repository artifacts (GitHub issue, merged PR, git commit, git tag such as v3.12.4) feed into CVE Services (CVE record, CVE affected versions, CVE references), which in turn feed the PSF OSV database (OSV record, OSV affected commits, OSV affected tags, OSV references).]
Blue items are manually created, green items are automatically generated.

Advisories are sent to the security-announce@python.org mailing list, and then the description is reused as the CVE record's description. The linkage between a GitHub pull request and a GitHub issue is maintained by Bedevere, which automatically updates metadata as new pull requests are opened for an issue.

From this information we can use scripts to generate the rest in two stages:

  • CVE affected versions and references are populated by finding git tags that contain git commits.
  • All OSV record information is generated from CVE records. New OSV records are automatically assigned their IDs by OSV tooling. The central OSV database calculates affected git tags on our behalf.

The PSRT publishes advisories and patches once they're available in the latest CPython versions. For low-severity vulnerabilities there typically isn't an immediate release of all bugfix and security branches (i.e. 3.12, 3.11, 3.10, 3.9, etc.). This means that many vulnerability records will need to be updated over time as fixes are merged and released into other CPython versions, so these scripts run periodically to avoid tracking these updates manually.

Other items

  • This week marks my one-year anniversary in the role of Security Developer-in-Residence. Woohoo! 🥳
  • Advisories and records were published for CVE-2024-0397 and CVE-2024-4032.
  • Wrote the draft for "Trusted Publishers for All Package Repositories" for the OpenSSF Securing Software Repositories WG. This document would be used by other package repositories looking to implement Trusted Publishers like PyPI has.
  • Working on documenting more of PSRT processes like membership.
  • Triaging reports to the PSRT.
  • Reviewing PEP 740 and other work for generating publish provenance from William Woodruff.
  • Google Summer of Code contributor, Nate Ohlson has been making excellent progress on the effort for CPython to adopt the "Hardened Compiler Options Guide" from the OpenSSF Best Practices WG. You can follow along with his progress on Mastodon and on GitHub.

That's all for this post! 👋 If you're interested in more you can read the last report.

Thanks for reading! ♡ Did you find this article helpful and want more content like it? Get notified of new posts by subscribing to the RSS feed or the email newsletter.


This work is licensed under CC BY-SA 4.0

June 21, 2024 12:00 AM UTC

June 20, 2024


The Python Show

45 - Computer Vision and Data Science with Python

In this episode, we welcome Kyle Stratis from Real Python to the show to chat with us about computer vision and Python.

We chatted about several different topics.


June 20, 2024 10:11 PM UTC


Paolo Melchiorre

Django 5 by Example preface

The story of my experience writing the preface of the book “Django 5 by Example” by Antonio Melé.

June 20, 2024 10:00 PM UTC


PyBites

Introducing eXact-RAG: the ultimate local Multimodal Rag

Exact-RAG is a powerful multimodal model designed for Retrieval-Augmented Generation (RAG). It seamlessly integrates text, visual and audio information, allowing for enhanced content understanding and generation.

In the rapidly evolving landscape of large language models (LLMs), the quest for more efficient and versatile models continues unabated.

One of the latest advancements in this realm is the emergence of eXact-RAG, a multimodal RAG (Retrieval Augmented Generation) system that leverages state-of-the-art technologies to deliver powerful results.

eXact-RAG stands out for its integration of LangChain and Ollama for backend and model serving, FastAPI for REST API service, and its adaptability through the utilization of ChromaDB or Elasticsearch.

Coupled with an intuitive user interface built on Streamlit, eXact-RAG represents a significant leap forward in LLM capabilities.

A Step back: what is a RAG?

Retrieval Augmented Generation, or RAG, is an architectural approach that can improve the efficacy of Large Language Model (LLM) applications by leveraging custom data. This is done by retrieving data/documents relevant to a question or task and providing them as context for the LLM. RAG has shown success in support chatbots and Q&A systems that need to maintain up-to-date information or access domain-specific knowledge.

RAG process schema. Source: neo4j.com

As the name suggests, RAG has two phases: retrieval and content generation. In the retrieval phase, algorithms search for and retrieve snippets of information relevant to the user’s prompt or question. In an open-domain, consumer setting, those facts can come from indexed documents on the internet; in a closed-domain, enterprise setting, a narrower set of sources are typically used for added security and reliability.
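
As a toy illustration of the two phases (a fake keyword-overlap retriever stands in for a real embedding search against a vector store, and a prompt string stands in for the LLM call):

def retrieve(query: str, documents: list[str], top_k: int = 3) -> list[str]:
    # Real systems use embeddings and a vector store (e.g. ChromaDB or
    # Elasticsearch); keyword overlap is just a stand-in here.
    words = set(query.lower().split())
    scored = sorted(documents, key=lambda d: -len(words & set(d.lower().split())))
    return scored[:top_k]

def generate(query: str, context: list[str]) -> str:
    # Real systems send this prompt to an LLM and return its answer.
    return f"Context: {context}\nQuestion: {query}"

docs = ["RAG retrieves context.", "Polars is fast.", "Streamlit builds UIs."]
print(generate("What does RAG do?", retrieve("What does RAG do?", docs, top_k=1)))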

Understanding eXact-RAG

At its core, eXact-RAG combines the principles of RAG, which focuses on retrieval-based conversational agents, with multimodal capabilities, enabling it to process and generate responses from various modalities such as text, images, and audio.

This versatility makes eXact-RAG well-suited for a wide range of applications, from chatbots to content recommendation systems and beyond.

Technologies Powering eXact-RAG

  1. LangChain and Ollama: LangChain and Ollama serve as the backbone of eXact-RAG, providing robust infrastructure for model development, training, and serving.

    LangChain offers a comprehensive suite of tools for natural language understanding and processing, while Ollama specializes in multimodal learning, enabling eXact-RAG to seamlessly integrate and process diverse data types.
  2. FastAPI for REST API Service: FastAPI, known for its high performance and simplicity, serves as the interface for eXact-RAG, facilitating seamless communication between the backend system and external applications.

    Its asynchronous capabilities ensure rapid response times, crucial for real-time interactions.
  3. ChromaDB or Elasticsearch: eXact-RAG offers flexibility in data storage and retrieval by supporting both ChromaDB and Elasticsearch.

    ChromaDB provides a lightweight solution suitable for simpler tasks, while Elasticsearch caters to more complex operations involving vast amounts of data. This versatility enables users to tailor eXact-RAG to their specific needs, balancing performance and scalability accordingly.

User-Friendly Interface with Streamlit for demo purposes

The user interface of eXact-RAG is built on Streamlit, a popular framework for creating interactive web applications with Python. Streamlit’s intuitive design and seamless integration with Python libraries allow users to interact with eXact-RAG effortlessly.

Through the interface, users can input queries, explore results, and interact with generated content across various modalities, enhancing the overall user experience.

Related article: From concepts to MVPs: Validate Your Idea in few Lines of Code with Streamlit

Applications of eXact-RAG

The versatility of eXact-RAG opens up a myriad of applications across different domains.

eXact-RAG flow schema

Let’s play with eXact-RAG

The first step to using eXact-RAG is to run your preferred LLM model using Ollama or to get an OpenAI token. Both are supported by the RAG, and to make sure they're configured correctly, you need to fill in settings.toml with your preferred options. Here's an example:

[embedding]
type = "openai"
api_key = ""
chat.model_name = "gpt-3.5-turbo"
...

[database]
type = "chroma"
persist_directory = "persist"
...

In this example, you can configure the LLM model and the vector database in which to store the embeddings of your local data.

eXact-RAG is a multimodal RAG, so it can ingest different kinds of data, like audio files or images. You can choose which “backends” to install during the installation phase:

poetry install # -E audio -E image

* This feature makes it possible to process images even if you don't have the (hardware) capability to run a vision model like llava locally but still want to pass images as data.

Now the job is done! The following command

poetry run python exact_rag/main.py

starts the server, and at http://localhost:8080/docs you'll find the OpenAPI document (Swagger UI) with all the available endpoints.

Demo

eXact-RAG was built as a server for multimodal RAGs, but we also provide a user interface, just for demo purposes, to test all the features.
To run the UI, use the command:

poetry run streamlit run frontend/ui.py

Now, the page at http://localhost:8501 will show a chat interface like in the following example:

Conclusion

eXact-RAG represents a significant advancement in the field of multimodal RAG systems, offering unparalleled versatility and performance through its integration of cutting-edge technologies.

With its robust backend powered by LangChain and Ollama, flexible data storage options, and user-friendly interface built on Streamlit, eXact-RAG is poised to revolutionize various applications of natural language processing and multimodal learning.

As the demand for sophisticated LLM solutions continues to grow, eXact-RAG stands ready to meet the challenges of tomorrow’s digital landscape.

June 20, 2024 04:57 PM UTC

What is the Repository Pattern and How to Use it in Python?

The repository pattern is a design pattern that helps you separate business logic from data access code.

It does so by providing a unified interface for interacting with different data sources, bringing the following advantages to your system:

A practical example

Let’s use Python and sqlmodel (PyPI) to demonstrate this pattern (code here):

from abc import ABC, abstractmethod
from sqlmodel import SQLModel, create_engine, Session, Field, select


# Define the model
class Item(SQLModel, table=True):
    id: int = Field(default=None, primary_key=True)
    name: str


# Repository Interface
class IRepository(ABC):
    @abstractmethod
    def add(self, item: Item):
        pass

    @abstractmethod
    def get(self, name: str) -> Item | None:
        pass


# SQLModel implementation
class SQLModelRepository(IRepository):
    def __init__(self, db_string="sqlite:///todo.db"):
        self.engine = create_engine(db_string)
        SQLModel.metadata.create_all(self.engine)
        self.session = Session(self.engine)

    def add(self, item: Item):
        self.session.add(item)
        self.session.commit()

    def get(self, name: str) -> Item | None:
        statement = select(Item).where(Item.name == name)
        return self.session.exec(statement).first()


# CSV implementation
class CsvRepository(IRepository):
    def __init__(self, file_path="todo.csv"):
        self._file_path = file_path

    def add(self, item: Item):
        with open(self._file_path, "a") as f:
            f.write(f"{item.id},{item.name}\n")

    def get(self, name: str) -> Item | None:
        with open(self._file_path, "r") as f:
            return next(
                (
                    Item(id=int(id_str), name=item_name)
                    for line in f
                    if (id_str := line.strip().split(",", 1)[0])
                    and (item_name := line.strip().split(",", 1)[1]) == name
                ),
                None,
            )


if __name__ == "__main__":
    repo = SQLModelRepository()
    repo.add(Item(name="Buy Milk"))
    sql_item = repo.get("Buy Milk")

    # Swap out the repository implementation
    csv_repo = CsvRepository()
    csv_repo.add(Item(id=1, name="Buy Milk"))
    csv_item = csv_repo.get("Buy Milk")

    print(f"{sql_item=}, {csv_item=}, {sql_item == csv_item=}")
    # outputs:
    # sql_item=Item(name='Buy Milk', id=1), csv_item=Item(id=1, name='Buy Milk'), sql_item == csv_item=True

Note: In this implementation, we did not add error handling to keep the example relatively small. You should ensure that if something goes wrong (e.g., database connection issues or file access errors), the program can handle the error gracefully and provide useful feedback.


The example might be a bit contrived, but it shows how we can leverage the repository pattern in Python.

Again, the advantage is that we have a flexible design that allows us to swap out the data access implementation without changing the business logic. This makes our code easier to test and maintain.

Have you used this pattern yourself and if so, how? Hit us up on social media and let us know: @pybites …

June 20, 2024 12:17 PM UTC


Talk Python to Me

#467: Data Science Panel at PyCon 2024

I have a special episode for you this time around. We're coming to you live from PyCon 2024. I had the chance to sit down with some amazing people from the data science side of things: Jodie Burchell, Maria Jose Molina-Contreras, and Jessica Greene. We cover a whole set of recent topics from a data science perspective. Though we did have to cut the conversation a bit short as they were coming from and going to talks they were all giving, it was still a pretty deep conversation.

Episode sponsors

Sentry Error Monitoring, Code TALKPYTHON: https://talkpython.fm/sentry
Code Comments: https://talkpython.fm/code-comments
Talk Python Courses: https://talkpython.fm/training

Links from the show

Jodie Burchell: @t_redactyl (https://twitter.com/t_redactyl)
Jessica Greene: https://www.linkedin.com/in/jessica0greene
Maria Jose Molina-Contreras: https://www.linkedin.com/in/mjmolinacontreras/
Talk Python's free Shiny course: https://talkpython.fm/shiny
Watch this episode on YouTube: https://www.youtube.com/watch?v=QYiRrHnEomw
Episode transcripts: https://talkpython.fm/episodes/transcript/467/data-science-panel-at-pycon-2024

Stay in touch with us:
Subscribe to us on YouTube: https://talkpython.fm/youtube
Follow Talk Python on Mastodon: @talkpython (https://fosstodon.org/web/@talkpython)
Follow Michael on Mastodon: @mkennedy (https://fosstodon.org/web/@mkennedy)

June 20, 2024 08:00 AM UTC


Matt Layman

Password Resets and Signal Handling - Building SaaS #194

In this episode, we hooked up the email confirmation signal to the prompt sending code so that new users can use JourneyInbox immediately. Then we focused on handling all the functionality related to the password reset feature. This meant customizing a bunch of django-allauth forms.

June 20, 2024 12:00 AM UTC

June 19, 2024


Django Weblog

DjangoCon US: Call for Venue Proposal 2025

DEFNA is seeking proposals for a venue for DjangoCon US 2025 and ideally 2026. You can read the details on DEFNA’s site.

For 2025, we are looking at conference dates of October 5-10, 2025 or October 12-17, 2025.

The deadline for submissions is July 28, 2024. If you have any questions or concerns, please reach out to the DEFNA board at hello AT defna.org. We look forward to hearing from you!

June 19, 2024 04:55 PM UTC


Real Python

Build a Guitar Synthesizer: Play Musical Tablature in Python

Have you ever wanted to compose music without expensive gear or a professional studio? Maybe you’ve tried to play a musical instrument before but found the manual dexterity required too daunting or time-consuming. If so, you might be interested in harnessing the power of Python to create a guitar synthesizer. By following a few relatively simple steps, you’ll be able to turn your computer into a virtual guitar that can play any song.

In this tutorial, you’ll:

  • Implement the Karplus-Strong plucked string synthesis algorithm
  • Mimic different types of string instruments and their tunings
  • Combine multiple vibrating strings into polyphonic chords
  • Simulate realistic guitar picking and strumming finger techniques
  • Use impulse responses of real instruments to replicate their unique timbre
  • Read musical notes from scientific pitch notation and guitar tablature

At any point, you’re welcome to download the complete source code of the guitar synthesizer, as well as the sample tablature and other resources that you’ll use throughout this tutorial. They might prove useful in case you want to explore the code in more detail or get a head start. To download the bonus materials now, visit the following link:

Get Your Code: Click here to download the free sample code that you’ll use to build a guitar synthesizer in Python.

Take the Quiz: Test your knowledge with our interactive “Build a Guitar Synthesizer” quiz. You’ll receive a score upon completion to help you track your learning progress:


Interactive Quiz

Build a Guitar Synthesizer

In this quiz, you'll test your understanding of what it takes to build a guitar synthesizer in Python. By working through this quiz, you'll revisit a few key concepts from music theory and sound synthesis.

Demo: Guitar Synthesizer in Python

In this step-by-step guide, you’ll build a plucked string instrument synthesizer based on the Karplus-Strong algorithm in Python. Along the way, you’ll create an ensemble of virtual instruments, including an acoustic, bass, and electric guitar, as well as a banjo and ukulele. Then, you’ll implement a custom guitar tab reader so that you can play your favorite songs.

By the end of this tutorial, you’ll be able to synthesize music from guitar tablature, or guitar tabs for short, which is a simplified form of musical notation that allows you to play music without having to learn how to read standard sheet music. Finally, you’ll store the result in an MP3 file for playback.

Below is a short demonstration of the synthesizer, re-creating the iconic soundtracks of classic video games like Doom and Diablo. Click the play button to listen to the sample output:

E1M1 - At Doom's Gate (Bobby Prince), Tristram (Matt Uelmen)

Once you find a guitar tab that you like, you can plug it into your Python guitar synthesizer and bring the music to life. For example, the Songsterr website is a fantastic resource with a wide range of songs you can choose from.
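
If you're curious what Karplus-Strong synthesis boils down to before diving in, here's a minimal NumPy sketch of the idea (the tutorial builds a far more complete version):

import numpy as np

def pluck(frequency: float, duration: float, sample_rate: int = 44100) -> np.ndarray:
    """A minimal Karplus-Strong pluck: noise through an averaging delay line."""
    num_samples = int(duration * sample_rate)
    delay = int(sample_rate / frequency)  # delay-line length sets the pitch
    buffer = np.random.uniform(-1.0, 1.0, delay)  # initial burst of white noise
    samples = np.empty(num_samples)
    for i in range(num_samples):
        samples[i] = buffer[i % delay]
        # Averaging neighboring samples acts as a low-pass filter,
        # modeling the string's energy loss over time.
        buffer[i % delay] = 0.5 * (buffer[i % delay] + buffer[(i + 1) % delay])
    return samples

audio = pluck(frequency=110.0, duration=1.0)  # an A2 string, one second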

Project Overview

For your convenience, the project that you’re about to build, along with its third-party dependencies, will be managed by Poetry. The project will contain two Python packages with distinctly different areas of responsibility:

  1. digitar: For the synthesis of the digital guitar sound
  2. tablature: For reading and interpreting guitar tablature from a file

You’ll also design and implement a custom data format to store guitar tabs on disk or in memory. This will allow you to play music based on a fairly standard tablature notation, which you’ll find in various places on the Internet. Your project will also provide a Python script to tie everything together, which will let you interpret the tabs with a single command right from your terminal.

Now, you can dive into the details of what you’ll need to set up your development environment and start coding.

Prerequisites

Although you don’t need to be a musician to follow along with this tutorial, a basic understanding of musical concepts such as notes, semitones, octaves, and chords will help you grasp the information more quickly. It’d also be nice if you had a rough idea of how computers represent and process digital audio in terms of sampling rate, bit depth, and file formats like WAV.

But don’t worry if you’re new to these ideas! You’ll be guided through each step in small increments with clear explanations and examples. So, even if you’ve never done any music synthesis before, you’ll have a working digital guitar or digitar by the end of this tutorial.

Note: You can learn music theory in half an hour by watching an excellent and free video by Andrew Huang.

The project that you’ll build was tested against Python 3.12 but should work fine in earlier Python versions, too, down to Python 3.10. In case you need a quick refresher, here’s a list of helpful resources covering the most important language features that you’ll take advantage of in your digital guitar journey:

Other than that, you’ll use the following third-party Python packages in your project:

  • NumPy to simplify and speed up the underlying sound synthesis
  • Pedalboard to apply special effects akin to electric guitar amplifiers
  • Pydantic and PyYAML to parse musical tablature representing finger movements on a guitar neck

Read the full article at https://realpython.com/python-guitar-synthesizer/ »



June 19, 2024 02:00 PM UTC

Quiz: Creating Great README Files for Your Python Projects

Test your understanding of how a great README file can make your Python project stand out and how to create your own README files.

Take this quiz after reading our Creating Great README Files for Your Python Projects tutorial.



June 19, 2024 12:00 PM UTC


PyCharm

How to Move From pandas to Polars

This is a guest post from Cheuk Ting Ho, a data scientist who contributes to multiple open-source libraries, such as pandas and Polars.

How to Move From pandas to Polars banners

You’ve probably heard about Polars – it is now firmly in the spotlight in the data science community. 

Are you still using pandas and would like to try out Polars? Are you worried that it will take a lot of effort to migrate your projects from pandas to Polars? You might be concerned that Polars won’t be compatible with your existing pipeline or the other tools you are currently using.

Fear not! In this article, I will answer these questions so you can decide whether to migrate to using Polars or not. I will also provide some tips for those of you who have already decided to migrate.

How is Polars different from pandas?

Polars is known for its speed and security, as it is written in Rust and based on Apache Arrow. For details about Polars vs. pandas, you can see our other blog post here. In short, while Polars’ backend architecture is different from pandas’, the creator and community around Polars have tried to maintain a Python API that is very similar to pandas’. At first glance, Polars code is very similar to pandas code. Fun fact – some contributors to pandas are also contributors to Polars. Due to this, the barrier for pandas users to start using Polars is relatively low. However, as it is still a different library, it is worth double-checking the differences between the two.

Advantages of using Polars

Have you struggled when using pandas for a relatively large data set? Do you think pandas is using too much RAM and slowing your computer down while working locally? Polars may solve this problem by using its lazy API. Intermediate steps won’t be executed unless needed, saving memory for the intermediate steps in some cases.

Another advantage Polars has is that, since it is written in Rust, it can make use of concurrency much better than pandas. Python is traditionally single-threaded, and although pandas uses the NumPy backend to speed up some operations, it is still mainly written in Python and has certain limitations in its multithreading capabilities.

Tools that make the switch easy

As Polars’ popularity grows, there is more and more support for Polars in popular tools for data scientists, including scikit-learn and HoloViz.

PyCharm, the most popular IDE used by data scientists, provides a similar experience when you work with pandas and Polars. This makes the process of migration smoother. For example, interactive tables allow you to easily see the information about your DataFrame, such as the number of rows and columns.

Try PyCharm for free

PyCharm interactive tables

PyCharm has an excellent pagination feature – if you want to see more results per page, you can easily configure that via a drop-down menu:

Rows

You can see the statistical summary for the data when you hover the cursor over the column name:

Statistical summary

You can also sort the data for inspection with a few clicks in the header. You can also use the multi-sorting functionality – after sorting the table once, press and hold (macOS) or Alt (Windows) and click on the second column you want the table to be sorted by. For example, here, we can sort by island and bill_length_mm in the table.

To get more insights from the DataFrame, you can switch to chart view with the icon on the left:

DataFrame chart view

You can also change how the data is shown in the settings, showing different columns and using different graph types:

Graph types

It also helps you auto-complete methods when using Polars, which is very handy when you're starting to use Polars and aren't yet familiar with all of the methods it provides. To understand more about full line code completion in JetBrains IDEs, please check out this article.

Polars autocompletion

You can also access the official documentation quickly by clicking the Polars icon in the top-right corner of the table, which is really handy.

How to migrate from pandas to Polars

If you’re now convinced to migrate to Polars, your final questions might be about the extent of changes needed for your existing code and how easy it is to learn Polars, especially considering your years of experience and muscle memory with pandas.

Similarities between pandas and Polars

Polars provides APIs similar to pandas, most notably read_csv(), head(), tail(), and describe() for a glance at what the data looks like. It also provides similar data manipulation functions like join() and groupby()/group_by(), and aggregation functions like mean() and sum().

Before going into the migration, let’s look at these code examples in Polars and pandas.

Example 1 – Calculating the mean score for each class

pandas

import pandas as pd

df_student = pd.read_csv("student_info.csv")
print(df_student.dtypes)

df_score = pd.read_csv("student_score.csv")
print(df_score.head())

df_class = df_student.join(df_score.set_index("name"), on="name").drop("name", axis=1)
df_mean_score = df_class.groupby("class").mean()
print(df_mean_score)

Polars

import polars as pl

df_student = pl.read_csv("student_info.csv")
print(df_student.dtypes)

df_score = pl.read_csv("student_score.csv")
print(df_score.head())

df_class = df_student.join(df_score, on="name").drop("name")
df_mean_score = df_class.group_by("class").mean()
print(df_mean_score)

Polars provides similar I/O methods, like read_csv. You can also inspect the dtypes, do data cleaning with drop, and do a groupby with aggregation functions like mean.

Example 2 – Calculating the rolling mean of temperatures

pandas

import pandas as pd

df_temp = pd.read_csv("temp_record.csv", index_col="date", parse_dates=True, dtype={"temp": int})
print(df_temp.dtypes)
print(df_temp.head())

df_temp.rolling(2).mean()

Polars

import polars as pl

df_temp = pl.read_csv("temp_record.csv", try_parse_dates=True, dtypes={"temp": int}).set_sorted("date")
print(df_temp.dtypes)
print(df_temp.head())

df_temp.rolling("date", period="2d").agg(pl.mean("temp"))

Reading with date as index in Polars can also be done with read_csv, with a slight difference in the function arguments. Rolling mean (or other types of aggregation) can also be done in Polars.

As you can see, these code examples are very similar, with only slight differences. If you are an experienced pandas user, I am sure your journey using Polars will be quite smooth.

Tips for migrating from pandas to Polars

As for code that was previously written in pandas, how can you migrate it to Polars? What are the differences in syntax that may trip you up? Here are some tips that may be useful:

Selecting and filtering

In pandas, we use .loc / .iloc and [] to select part of the data in a data frame. However, in Polars, we use .select to do so. For example, in pandas df["age"] or df.loc[:,"age"] becomes df.select("age") in Polars.

In pandas, we can also create a mask to filter out data. However, in Polars, we will use .filter instead. For example, in pandas df[df["age"] > 18] becomes df.filter(pl.col("age") > 18) in Polars.

All of the code that involves selecting and filtering data needs to be rewritten accordingly.
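
Putting the two together, a pandas selection plus filter such as df.loc[df["age"] > 18, ["name"]] might translate to Polars like this (a minimal sketch with a made-up DataFrame):

import polars as pl

df = pl.DataFrame({
    "name": ["Ann", "Ben", "Cas"],
    "age": [25, 17, 19],
})

# Filter the rows, then select the columns.
adults = df.filter(pl.col("age") > 18).select("name")
print(adults)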

Use .with_columns instead of .assign

A slight difference between pandas and Polars is that, in pandas we use .assign to create new columns by applying certain logic and operations to existing columns. In Polars, this is done with .with_columns. For example:

In pandas

df_rec.assign(
    diameter=lambda df: (df.x + df.y) * 2,
    area=lambda df: df.x * df.y,
)

becomes

df_rec.with_columns(
    diameter=(pl.col("x") + pl.col("y")) * 2,
    area=pl.col("x") * pl.col("y"),
)

in Polars.

.with_columns can replace groupby

In addition to assigning a new column with simple logic and operations, .with_columns offers more advanced capabilities. With a little trick, you can perform operations similar to groupby in pandas by using window functions:

In pandas

df = pd.DataFrame({
    "class": ["a", "a", "a", "b", "b", "b", "b"],
    "score": [80, 39, 67, 28, 77, 90, 44],
})
df["avg_score"] = df.groupby("class")["score"].transform("mean")

becomes

df.with_columns(
    pl.col("score").mean().over("class").alias("avg_score")
)

in Polars.

Use scan_csv instead of read_csv if you can

Although read_csv also works in Polars, using scan_csv instead switches to lazy evaluation, which lets your queries benefit from the lazy API mentioned above.
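
For instance, Example 1 above could be written lazily like this (a sketch; the "score" column name is assumed from the sample data). Nothing is read or computed until .collect() is called, which lets Polars optimize the whole query plan:

import polars as pl

df_mean_score = (
    pl.scan_csv("student_info.csv")  # returns a LazyFrame instead of a DataFrame
    .join(pl.scan_csv("student_score.csv"), on="name")
    .drop("name")
    .group_by("class")
    .agg(pl.col("score").mean())  # "score" is an assumed column name
    .collect()  # execute the optimized plan
)
print(df_mean_score)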

Building pipelines properly with lazy API

In pandas, we usually use .pipe to build data pipelines. However, since Polars works a bit differently, especially when using the lazy API, we want the pipeline to be executed only once. So, we need to adjust the code accordingly. For example:

Instead of this pandas code snippet:

def discount(df):
    df["30_percent_off"] = df["price"] * 0.7
    return df

def vat(df):
    df["vat"] = df["price"] * 0.2
    return df

def total_cost(df):
    df["total"] = df["30_percent_off"] + df["vat"]
    return df

(df
 .pipe(discount)
 .pipe(vat)
 .pipe(total_cost)
)

We will have the following one in Polars:

def discount(input_col):
    return pl.col(input_col).mul(0.7).alias("30_percent_off")

def vat(input_col):
    return pl.col(input_col).mul(0.2).alias("vat")

def total_cost(input_col1, input_col2):
    return pl.col(input_col1).add(pl.col(input_col2)).alias("total")

# Expressions in a single with_columns context cannot reference columns
# created in that same context, so total_cost goes in a second call:
df.with_columns(
    discount("price"),
    vat("price"),
).with_columns(
    total_cost("30_percent_off", "vat"),
)

Missing data: No more NaN

Do you find NaN in pandas confusing? Polars draws a clear line here: since it doesn’t use NumPy as its backend, missing data is always represented as null, for every dtype, while NaN is treated as an ordinary floating-point value rather than as missing data. For details about null and NaN in Polars, check out the documentation.
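
Here is a minimal sketch of the distinction, with illustrative values:

import polars as pl

df = pl.DataFrame({"x": [1.0, None, float("nan")]})
print(df.select(
    pl.col("x").is_null().alias("is_null"),  # True only for the missing (None) value
    pl.col("x").is_nan().alias("is_nan"),    # True only for the NaN float
))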

Exploratory data analysis with Polars

Polars provides an API similar to pandas, and with hvPlot you can easily compute statistics and plot graphs for exploratory data analysis in Polars. Here I will show two examples: one generating simple statistical information from your dataset, and the other plotting simple graphs to understand the data.

Summary statistics from dataset

When using pandas, the most common way to get a summary statistic is to use describe. In Polars, we can also use describe in a similar manner. For example, we have a DataFrame with some numerical data and missing data:

DataFrame with numerical data

We can use describe to get summary statistics:

Describe in pandas

Notice how object types are treated – in this example, the column name gives a different result compared to pandas. In pandas, a column with object type is summarized as categorical data like this:

Categorical data

In Polars, the result looks like that of numeric data, which makes less sense:

Polars numeric data

Simple plotting with Polars DataFrame


To better visualize the data, we might want to plot some graphs to help us evaluate the data more efficiently. Here is how to do so with the plot method in Polars.

First of all, since Polars uses hvPlot as its plotting backend, make sure that it is installed. You can find the hvPlot User Guide here. Next, since hvPlot will output the graph as an interactive Bokeh graph, we need to use output_notebook from bokeh.plotting to make sure it will show inline in the notebook. Add this code at the top of your notebook:

from bokeh.plotting import output_notebook

output_notebook()

Also, make sure your notebook is trusted. This is done by simply checking the checkbox in the top-right of the display when using PyCharm.

Trusted notebook in PyCharm

Next, you can use the plot method in Polars. For example, to make a scatter plot, you have to specify the columns to be used as the x- and y-axis, and you can also specify the column to be used as color of the points:

df.plot.scatter(x="body_mass_g", y="bill_length_mm", color="species")

This will give you a nice plot of the different data points of different penguin species for inspection:

Of course, scatter plots aren’t your only option. In Polars, you can use similar steps to create any type of plot that is supported by hvPlot. For example, hist can be done like this:

df.plot.hist("body_mass_g", by=["species","sex"])
hvPlot histogram output

For a full list of plot types supported by hvPlot, you can have a look at the hvPlot reference gallery.

Conclusion

I hope the information provided here will help you on your way with using Polars. Polars is an open-source project that is actively maintained and developed. If you have suggestions or questions, I recommend reaching out to the Polars community.

About the author

Cheuk Ting Ho

Cheuk has been a Data Scientist at various companies – a job that demands high numerical and programming skills, especially in Python. Following her passion for the tech community, Cheuk has been a Developer Advocate for three years. She also contributes to multiple open-source libraries like Hypothesis, Pytest, pandas, Polars, PyO3, Jupyter Notebook, and Django. Cheuk is currently a consultant and trainer at CMD Limes.

June 19, 2024 11:48 AM UTC


EuroPython

EuroPython June 2024 Newsletter

🐍 EuroPython 2024: The Ultimate Python Party Awaits! 🎉


Hello Pythonistas,

Get ready to code, connect, and celebrate at EuroPython 2024! We’re thrilled to bring you an unforgettable conference experience filled with enlightening talks, engaging workshops, and a whole lot of fun. Whether you're a seasoned developer or just starting your Python journey, there's something for everyone. Let's dive into the details!

⏰ THREE DAYS LEFT TO BUY YOUR TICKETS!! 🎟️

Don't miss out on the Python event of the year! Secure your spot today and be part of the magic.

🎟️ Buy your tickets here!!! 🎟️

The Late Bird prices kick in this Saturday (June 22nd).

SCHEDULE 📅

The schedule is OUT! Check out all the awesome stuff we have planned for you in Prague this year.

🎤 Keynote Speakers

We're excited to announce our stellar lineup of keynote speakers who will inspire and challenge you with their insights and experiences:

🥳 Social Events

🍽  PyLadies Lunch

PyLadies Lunch at EuroPython 2023 at the Prague Conference Centre

On Thursday, 11th July 2024, from 12:30 to 14:00, join us for a special lunch event aimed at fostering community and empowerment among women in tech.

Thank you to our sponsor 🐙 Kraken Tech 🐙 for supporting the lunch event.

We’re excited to announce a range of events for underrepresented groups in computing this year! 🎉 Whether you’re new to PyLadies or a long-time supporter, we warmly welcome you to join us and be part of our supportive community.

Sign up for any of the sessions above here

🌍 Community Organiser's Lunch

On Friday (July 12th) at 1 pm. This is a great opportunity for community leaders to network and discuss strategies for success. The lunch will include an Open Space discussion about Python organizations and how we deal with challenges.

Sign up for the Community Organiser’s session here

👩‍💻 Learn to Build Websites With Django Girls

Are you interested in learning how to build websites and belong to an underrepresented group in computing? Join us for a one-day workshop!

No prior programming experience is required. The workshop is open to everyone, regardless of participation in EuroPython. For more information, click here

👶 Childcare

This year, we're once again partnering with Susie's Babysitting Prague to offer childcare at the main conference venue (Prague Conference Centre).

If you're interested, please let us know at the latest two weeks before the event by filling out this form.

You will be asked about the Childcare add-on when you buy your ticket.

💻 Sprint Weekend

EuroPython 2024 Sprints will be during the weekend of the 13th and 14th of July. This year, the event will happen at a different venue from the conference and it will be free for anyone with a conference ticket to join!

As per our tradition, EuroPython will provide the rooms and facilities, but the sprints are organised by YOU. It is a great chance to contribute to open-source projects large and small, learn from each other, geek out, and have fun. 🐍

Lunch and coffee will be provided.

🤭 Py.Jokes

~ pyjoke
There are only two hard problems in Computer Science: cache invalidation, naming things and off-by-one-errors.

📱 Stay Connected

Share your EuroPython experience on social media!

Use the hashtag #EuroPython2024 and follow us on:

With so much joy and excitement,

EuroPython 2024 Team 🤗

June 19, 2024 10:13 AM UTC

June 18, 2024


PyCoder’s Weekly

Issue #634 (June 18, 2024)

#634 – JUNE 18, 2024
View in Browser »



Should Python Adopt Calendar Versioning?

Python’s use of semantic-style version numbers causes confusion, as breaking changes can be present in the “minor” number position. This proposal, given at the Python Language Summit, is to switch to calendar-based versioning. A PEP is forthcoming.
PYTHON SOFTWARE FOUNDATION

Python Mappings: A Comprehensive Guide

In this tutorial, you’ll learn the basic characteristics and operations of Python mappings. You’ll explore the abstract base classes Mapping and MutableMapping and create a custom mapping.
REAL PYTHON

How Do You Turn Data Science Insights into Business Results? Posit Connect


Data scientists use Posit Connect to get their work into the hands of decision-makers. Securely deploy python analytic work & distribute that across teams. Publish data apps, documents, notebooks, and dashboards. Deploy models as APIs & configure reports to run & get distributed on a custom schedule →
POSIT sponsor

NumPy 2.0.0 Release Notes

The long-awaited 2.0 release of NumPy landed this week. Not all the docs are up to date yet, but this final draft of the release notes shows you what is included.
NUMPY.ORG

Quiz: Python Mappings

In this quiz, you’ll test your understanding of the basic characteristics and operations of Python mappings. By working through this quiz, you’ll revisit the key concepts and techniques of creating a custom mapping.
REAL PYTHON

Discussions

Personal Red Flags When You’re Interviewing at a Company?

HACKER NEWS

Articles & Tutorials

Proposed Bylaws Changes for the PSF

As part of the upcoming board election, three new bylaws are also being proposed for your consideration. The first makes it easier to qualify for membership for Python-related volunteer work, the second makes it easier to vote, and the third gives the board more options around the code of conduct.
PYTHON SOFTWARE FOUNDATION

Python Interfaces: Object-Oriented Design Principles

In this video course, you’ll explore how to use a Python interface. You’ll come to understand why interfaces are so useful and learn how to implement formal and informal interfaces in Python. You’ll also examine the differences between Python interfaces and those in other programming languages.
REAL PYTHON course

Prod Alerts? You Should be Autoscaling


Rest easy with Judoscale’s web & worker autoscaling for Heroku, Render, and Amazon ECS. Traffic spike? Scaled up. Quiet night? Scaled down. Work queue backlog? No problem →
JUDOSCALE sponsor

Listing All Files in a Directory With Python

In this video course, you’ll be examining a couple of methods to get a list of files and folders in a directory with Python. You’ll also use both methods to recursively list directory contents. Finally, you’ll examine a situation that pits one method against the other.
REAL PYTHON course

Python Logging: The Log Levels

Logging levels allow you to control which messages you record in your logs. Think of log levels as verbosity levels. How granular do you want your logs to be? This article teaches you how to control your logging.
MIKE DRISCOLL

How Do You Program for 8h in a Row?

You may get paid for 8 hours a day, but that doesn’t necessarily mean you’re coding that whole time. This article touches on the variety of the job and what you should expect if you are new to the field.
BITE CODE!

Python Language Summit 2024: Lightning Talks

A summary of the six lightning talks given at the 2024 Python Language Summit. Topics include Python for iOS, improving asserts in 3.14, sharing data between sub-interpreters, and more.
PYTHON SOFTWARE FOUNDATION

Starting and Stopping uvicorn in the Background

Learn how to start and stop uvicorn in the background using a randomly selected free port number. Useful for running test suites that require live webservers.
CHRISTOPH SCHIESSL • Shared by Christoph Schiessl

How I Built a Bot Publishing Italian Paintings on Bluesky

This article describes Nicolò’s project to build a bot that retrieves images from Wikimedia and selects the best ones, and how he deployed it to the cloud.
NICOLÒ GISO • Shared by Nicolò Giso

Testing async MongoDB AWS Applications With Pytest

This article shows real life techniques and fixtures needed to make the test suite of your MongoDB and AWS-based application usable and performant.
HANDMADESOFTWARE • Shared by Thorin Schiffer

DjangoCon Europe 2024 Bird’s-Eye View

Thibaud shares some of the best moments of DjangoConEU 2024. He highlights some of the talks, workshops, and the outcome of the sprints.
THIBAUD COLAS

Storing Django Static and Media Files on DigitalOcean Spaces

This tutorial shows how to configure Django to load and serve up static and media files, public and private, via DigitalOcean Spaces.
MICHAEL HERMAN

CPython Reference Counting and Garbage Collection Internals

A detailed code walkthrough of how CPython implements memory management, including reference counting and garbage collection.
ABHINAV UPADHYAY

The Decline of the User Interface

“Software has never looked cooler, but user interface design and user experience have taken a sharp turn for the worse.”
NICK HODGES

Ruff: Internals of a Rust-Backed Python Linter-Formatter

This article dives into the structure of the popular ruff Python linter written in Rust.
ABDUR-RAHMAAN JANHANGEER • Shared by Abdur-Rahmaan Janhangeer

Projects & Code

FinanceDatabase: Financial Database as a Python Module

GITHUB.COM/JERBOUMA

prettypretty: Build Awesome Terminal User Interfaces

GITHUB.COM/APPAREBIT

smbclient-ng: Interact With SMB Shares

GITHUB.COM/P0DALIRIUS

wakepy: Cross-Platform Keep-Awake With Python

GITHUB.COM/FOHRLOOP

django-mfa2: Django MFA; Supports TOTP, U2F, & More

GITHUB.COM/MKALIOBY

Events

Weekly Real Python Office Hours Q&A (Virtual)

June 19, 2024
REALPYTHON.COM

Wagtail Space US

June 20 to June 23, 2024
WAGTAIL.SPACE

PyData Bristol Meetup

June 20, 2024
MEETUP.COM

PyLadies Dublin

June 20, 2024
PYLADIES.COM

Chattanooga Python User Group

June 21 to June 22, 2024
MEETUP.COM

PyCamp Leipzig 2024

June 22 to June 24, 2024
BARCAMPS.EU


Happy Pythoning!
This was PyCoder’s Weekly Issue #634.
View in Browser »


[ Subscribe to 🐍 PyCoder’s Weekly 💌 – Get the best Python news, articles, and tutorials delivered to your inbox once a week >> Click here to learn more ]

June 18, 2024 07:30 PM UTC


PyBites

Learn Python From Scratch: We Extended Our Newbie Bite Exercises From 25 to 50 🐍 📈

We are excited to announce that we’ve extended our Newbie Bites from 25 to 50 Python exercises! 🎉

The importance of exercising when learning how to code 💡

We’re passionate about this new batch of exercises because they require active engagement, which is crucial for learning how to code. Passive methods like reading books or watching videos don’t help concepts click or stick.

Our exercises involve writing code that is validated with pytest. This immediate feedback helps you understand mistakes and learn more effectively. You’ll encounter errors, re-think your approach, and practice deliberately.

Why double the number of Newbie exercises? 🐍

The first 25 exercises taught the fundamentals well, but many found it hard to tackle the intro and beginner Bites afterward. The extra 25 Newbie Bites (#26-50) bridge the gap between complete beginners and intermediate Python programmers. 💪

These new exercises cover essential concepts like error handling, type hints, default arguments, special characters, working with dates, classes (!), list comprehensions, constants, exception handling, and more. 💡

We believe these challenges will provide a deeper understanding and more robust skill set to tackle the regular Bites and become more proficient in Python. 📈

Overview of the new exercises

Reading Errors: Learn to read and understand error messages in Python. These messages provide valuable information for debugging and fixing issues efficiently.

Failing Tests: Practice reading and interpreting failing test outputs with the pytest framework. This skill is crucial for resolving Bites and any Python development.

Type Hints: Explore type hints introduced in Python 3.5, which help you write more readable and maintainable code by specifying expected data types.

Default Arguments: Understand how to define functions with default values, making your functions more flexible and easier to use.

Special Chars: Learn about special characters in Python strings, such as \n and \t, for better formatting and readability.

Word Count: Use string methods like .split() and .splitlines() to manipulate and process text data effectively.

Dict Retrieval – Part 2: Explore advanced techniques for retrieving values from dictionaries to enhance your data handling skills.

Dict Retrieval – Part 3: Learn safer methods to retrieve values from dictionaries, providing defaults if keys are not present.

Random Module: Use Python’s random module to write a number guessing game, showcasing the practical use of standard library modules.

Working With Dates – Part 1: Explore the datetime module, focusing on the date class and the .weekday() method to work with dates.

Working With Dates – Part 2: Continue working with the datetime module, focusing on importing specific objects versus entire modules.

Make a Class: Learn about Python classes, which serve as blueprints for creating objects, starting from the basics.

Class With Str: Build upon the previous exercise by learning special methods like __str__ for adding string representations to classes.

Make Dataclass: Simplify class creation with Python dataclasses, introduced in Python 3.7.

Scope: Understand variable scope to write clearer and less error-prone code. Scope determines the visibility and lifespan of variables.

String Manipulations: Practice fundamental string manipulations, essential for processing and transforming text data.

List Comprehension: Learn to write concise and efficient list comprehensions, a powerful feature in Python for creating new lists.

Named Tuple: Explore namedtuples, which allow attribute access to elements and support type hints, enhancing your data handling capabilities.

Constants: Learn to assign and use constants, which are fixed values that improve code readability and maintainability.

Exceptions: Master exception handling to write resilient code and establish clear boundaries for function callers.

For Loop With Break And Continue: Control loop flow using break and continue statements to manage iterations based on conditions.

In Operator: Use the in operator to check for item presence in collections, a common practice in Python programming.

String Module: Combine list comprehensions with the string module to check and manipulate characters, then join them back into strings.

Formatting Intro: Learn string interpolation using the .format() method to insert variables into strings dynamically.


The exercises build upon each other, so they must be done in order. After completing them, you’ll earn your Pybites Newbie Certificate.
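
To give a flavor of the concepts these Bites cover (a hypothetical sketch, not an actual exercise), here is what combining type hints, a dataclass, and a __str__ method might look like:

from dataclasses import dataclass

@dataclass
class Bite:
    number: int
    title: str

    def __str__(self) -> str:
        return f"Bite {self.number}: {self.title}"

print(Bite(26, "Reading Errors"))  # Bite 26: Reading Errors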

We’re also working on a series of screencasts to further explain the second half of the Newbie Bites. Stay tuned by subscribing to our YouTube channel.

About the Pybites Platform

We offer a rich platform for learning Python through practical exercises.

Our Bite exercises are designed to challenge you and help you apply Python in real-world scenarios, ranging from basic syntax to advanced topics.

Whether you’re a beginner or an experienced coder, our exercises aim to improve your coding skills in an effective and fun manner.

Start coding today: https://codechalleng.es/

And join our growing community of passionate Pythonistas: https://pybites.circle.so/

June 18, 2024 06:42 PM UTC


Real Python

Rounding Numbers in Python

With many businesses turning to Python’s powerful data science ecosystem to analyze their data, understanding how to avoid introducing bias into datasets is absolutely vital. If you’ve studied some statistics, then you’re probably familiar with terms like reporting bias, selection bias, and sampling bias. There’s another type of bias that plays an important role when you’re dealing with numeric data: rounding bias.

Understanding how rounding works in Python can help you avoid biasing your dataset. This is an important skill. After all, drawing conclusions from biased data can lead to costly mistakes.
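
As a tiny taste of why this matters, Python’s built-in round() uses the round-half-to-even strategy (sometimes called banker’s rounding) rather than always rounding halves up, precisely to limit this bias:

>>> round(2.5)
2
>>> round(3.5)
4
>>> round(2.675, 2)  # binary floating-point makes this 2.67, not 2.68
2.67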

In this video course, you’ll learn:


[ Improve Your Python With 🐍 Python Tricks 💌 – Get a short & sweet Python Trick delivered to your inbox every couple of days. >> Click here to learn more and see examples ]

June 18, 2024 02:00 PM UTC


Mike Driscoll

How to Publish a Python Package to PyPI

Do you have a Python package that you’d like to share with the world? You should publish it on the Python Package Index (PyPI). The vast majority of Python packages are published there. The PyPI team has also created extensive documentation to help you on your packaging journey. This article does not aim to replace that documentation. Instead, it is just a shorter version of it using ObjectListView as an example.

The ObjectListView project for Python is based on a C# wrapper around the .NET ListView control, ported to wxPython. You use ObjectListView as a replacement for wx.ListCtrl because its methods and attributes are simpler. Unfortunately, the original implementation died out in 2015, while a fork, ObjectListView2, died in 2019. In this article, you will learn how I forked it again, created ObjectListView3, and packaged it up for PyPI.

Creating a Package Structure

When you create a Python package, you must follow a certain type of directory structure to build everything correctly. For example, your package files should go inside a folder named src. Within that folder, you will have your package sub-folder that contains something like an __init__.py and your other Python files.

The src folder’s contents will look something like this:

package_tutorial/
| -- src/
|    -- your_amazing_package/
|        -- __init__.py
|        -- example.py

The __init__.py file can be empty. You use that file to tell Python that the folder is a package and is importable. But wait! There’s more to creating a package than just your Python files!
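
For illustration (using the hypothetical module names from the layout above), the __init__.py can also re-export your public API so users can import directly from the package:

# src/your_amazing_package/__init__.py
# Assumes example.py defines a function named do_something.
from .example import do_something

__all__ = ["do_something"]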

You also should include the following: a LICENSE file, a pyproject.toml file, a README.md file, and a tests folder.

Go ahead and create these files and the tests folder. It’s okay if they are all blank right now. At this point, your folder structure will look like this:

package_tutorial/
| -- LICENSE
| -- pyproject.toml
| -- README.md
| -- src/
|    -- your_amazing_package/
|        -- __init__.py
|        -- example.py
| -- tests/

Picking a Build Backend

The packaging tutorial mentions that you can choose various build backends for when you create your package. There are examples for the following: Hatchling, setuptools, Flit, and PDM.

You add this build information in your pyproject.toml file. Here is an example you might use if you picked setuptools:

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

This section in your config is used by pip or build. These tools don’t actually do the heavy lifting of converting your source code into a wheel or other distribution package. That is handled by the build backend. You don’t have to add this section to your pyproject.toml file though. You will find that pip will default to setuptools if there isn’t anything listed.

Configuring Metadata

All the metadata for your package should go into a pyproject.toml file. In the case of ObjectListView, it didn’t have one of these files at all.

Here’s the generic example that the Packaging documentation gives:

[project]
name = "example_package_YOUR_USERNAME_HERE"
version = "0.0.1"
authors = [
  { name="Example Author", email="author@example.com" },
]
description = "A small example package"
readme = "README.md"
requires-python = ">=3.8"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
]

[project.urls]
Homepage = "https://github.com/pypa/sampleproject"
Issues = "https://github.com/pypa/sampleproject/issues"

Using this as a template, I created the following for the ObjectListView package:

[project]
name = "ObjectListView3"
version = "1.3.4"
authors = [
  { name="Mike Driscoll", email="mike@somewhere.org" },
]
description = "An ObjectListView is a wrapper around the wx.ListCtrl that makes the list control easier to use."
readme = "README.md"
requires-python = ">=3.9"
classifiers = [
    "Programming Language :: Python :: 3",
    "License :: OSI Approved :: MIT License",
    "Operating System :: OS Independent",
]

[project.urls]
Homepage = "https://github.com/driscollis/ObjectListView3"
Issues = "https://github.com/driscollis/ObjectListView3/issues"

Now let’s go over what the various parts are in the above metadata: name is the unique name your package will have on PyPI; version is the package’s current version; authors lists the package authors’ names and email addresses; description is a short summary shown in search results; readme points to the file used as the long description on the package page; requires-python tells pip which Python versions you support; classifiers are standardized labels that help users find your package; and [project.urls] holds links such as your homepage and issue tracker.

You can specify other information in your TOML file if you’d like. For full details, see the pyproject.toml guide.
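
For example, if your package has runtime dependencies, you can declare them under the same [project] table, with optional extras under [project.optional-dependencies]. This is a hypothetical sketch (ObjectListView3 would depend on wxPython, but the version pin here is illustrative):

[project]
# ... metadata from above ...
dependencies = [
    "wxPython>=4.0",
]

[project.optional-dependencies]
dev = [
    "pytest",
]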

READMEs and Licenses

The README file is almost always a Markdown file now. Take a look at other popular Python packages to see what you should include. Here are some recommended items: a short description of what the package does, installation instructions, usage examples or screenshots, and links to documentation and the issue tracker.

There are many licenses out there. Don’t take advice on this from anyone but a lawyer unless you know a lot about the topic yourself. However, you can look at the licenses of other popular packages and use the same license or one similar to it.

Generating a Package

Now that you have the files you need, you are ready to generate your package. You will need to make sure you have PyPA’s build tool installed.

Here’s the command you’ll need for that:

python3 -m pip install --upgrade build

Next, you’ll want to run the following command from the same directory that your pyproject.toml file is in:

python3 -m build

You’ll see a bunch of output from that command. When it’s done, you will have a dist folder containing a wheel (*.whl) file and a gzipped tarball. The tarball is a source distribution, while the wheel file is called a built distribution. When you use pip, it will try to find the built distribution first, but pip will fall back to the source distribution if necessary.
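
For instance, with the hypothetical package name from the earlier folder structure, the dist folder would look something like this:

dist/
| -- your_amazing_package-0.0.1-py3-none-any.whl
| -- your_amazing_package-0.0.1.tar.gz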

Uploading / Publishing to PyPI

You now have the files you need to publish your package and share it with the world on the Python Package Index (PyPI). However, you need to register an account on TestPyPI first. TestPyPI is a separate package index intended for testing and experimentation, which is perfect when you have never published a package before. To register an account, go to https://test.pypi.org/account/register/ and complete the steps on that page. It will require you to verify your email address, but other than that, it’s a very straightforward signup.

To securely upload your project, you’ll need a PyPI API token. Create one from your account and make sure to set the “Scope” to the “Entire account”. Don’t close the page until you have copied and saved your token or you’ll need to generate another one!

The last step before publishing is to install twine, a tool for uploading packages to PyPI. Here’s the command you’ll need to run in your terminal to get twine:

python3 -m pip install --upgrade twine

Once that has finished installing, you can run twine to upload your files. Make sure you run this command in your package folder where the new dist folder is:

python3 -m twine upload --repository testpypi dist/*

You will see a prompt for your TestPyPI username and/or a password. The password is the API token you saved earlier. The directions in the documentation state that you should use __token__ as your username, but I don’t think it even asked for a username when I ran this command. I believe it only needed the API token itself.

After the command is complete, you will see some text stating which files were uploaded. You can view your package at https://test.pypi.org/project/example_package_name

To verify you can install your new package, run this command:

python3 -m pip install --index-url https://test.pypi.org/simple/ --no-deps example-package-name

If you specify the correct name, the package should be downloaded and installed. TestPyPI is not meant for permanent storage, though, so it will likely delete your package eventually to conserve space.
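
As an extra smoke test after installing from TestPyPI, you can try importing the package (using the hypothetical import name from the folder structure above):

python3 -c "import your_amazing_package; print(your_amazing_package.__file__)"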

When you have thoroughly tested your package, you’ll want to upload it to the real PyPI site. The steps are much the same as before: register an account, this time at https://pypi.org, create an API token for that account, and run twine again, as shown below.
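
Without the --repository flag, twine uploads to the real PyPI by default:

python3 -m twine upload dist/*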

That’s it! You’ve just published your first package!

Wrapping Up

Creating a Python package takes time and thought. You want your package to be easy to install and easy to understand. Be sure to spend enough time on your README and other documentation to make using your package easy and fun. It’s not good to look up a package and not know which versions of Python it supports or how to install it. Make the process easy by uploading your package to the Python Package Index.

The post How to Publish a Python Package to PyPI appeared first on Mouse Vs Python.

June 18, 2024 01:01 PM UTC