Google Summer of Code is coming to an end. I've spent the summer working on optimizing the VOC compiler, and I’m super excited to share the results.
There are a couple of ways to evaluate the performance improvement from my project.
Firstly, we introduced a microbenchmarking suite. Each microbenchmark is a small piece of Python code that tests a single and specific Python construct, or datatype, or control flow. The benchmarking infrastructure itself is crude (essentially it just tells you the total amount of processor time it took to run, with no fancy statistics) but it has been extremely useful to me while working on performance features to verify performance gain.
The idea is that the benchmarking suite is not to be run as part of the full test suite, but rather as needed and manually whenever an optimization is implemented. It also provides a way to check and prevent performance regression, especially on the "optimized" parts of VOC. While it doesn't really make sense to record specific numbers, as they will always vary from machine to machine, it should be reasonably easy to compare two versions of VOC. Benchmark numbers are included on each optimization-related PR I've worked on this summer (see PR log below), and I hope that more benchmarks will be added as more performance efforts are carried out in the future.
May 10th, 2018:
$ python setup.py test -s tests.test_pystone test_pystone (tests.test_pystone.PystoneTest) ... Pystone(1.2) time for 50000 passes = 101.833 This machine benchmarks at 490.998 pystones/second
$ python setup.py test -s tests.test_pystone test_pystone (tests.test_pystone.PystoneTest) ... Pystone(1.2) time for 50000 passes = 101.298 This machine benchmarks at 493.595 pystones/second
$ python setup.py test -s tests.test_pystone test_pystone (tests.test_pystone.PystoneTest) ... Pystone(1.2) time for 50000 passes = 102.247 This machine benchmarks at 489.014 pystones/second
On current master (Aug 14th, 2018):
$ python setup.py test -s tests.test_pystone test_pystone (tests.test_pystone.PystoneTest) ... Pystone(1.2) time for 50000 passes = 11.2300 This machine benchmarks at 4452.37 pystones/second
$ python setup.py test -s tests.test_pystone test_pystone (tests.test_pystone.PystoneTest) ... Pystone(1.2) time for 50000 passes = 10.9833 This machine benchmarks at 4552.36 pystones/second
$ python setup.py test -s tests.test_pystone pystone (tests.test_pystone.PystoneTest) ... Pystone(1.2) time for 50000 passes = 10.9498 This machine benchmarks at 4566.29 pystones/second
Some things that I learned about VOC while working on this project:
1. Object creation in the JVM is expensive. This definitely does not mean that the VOC user writing Python should think about minimizing the number of objects that she creates, but rather that any time we can non-trivially reduce the number of objects created during bytecode transpilation or in VOC-defined function calls, we can expect to see a huge performance boost. Integer and boolean preallocation, which is about reusing objects that have already been created, was one of the most significant improvements we made this summer.
2. Method calls in VOC are expensive. This is essentially due to the process of invoking a callable: you have to check that the method is defined on the object, then construct it (read: object creation!), and check the arguments, before it can actually be called. (This is done using reflection, which is super interesting and confusing in itself.) And this is the reason why refactoring the Python comparison functions made such a big performance impact, because we were able to circumvent this process.
3. Exception-heavy code is expensive. Again, this is not to say that the programmer is on the hook for being frugal when throwing exceptions, but that VOC benefits greatly by avoiding the use of exceptions internally except when strictly necessary. For instance, Python uses StopIteration exceptions to signal the end of a for loop, and they quickly rack up when you have nested loops (everything is ultimately related to object creation!). That was the motivation for the nested loops optimization.
If I may be a bit more reflective here, one of the a-ha! moments I had this summer was realizing that to really optimize something, you have to understand where its biggest problems are first. I remember pitching to Russ at the start of the summer things like loop unrolling, constant folding, even converting to SSA-form (you know, stuff I heard about optimzation in my compilers class) and he was saying to me, think simpler. While working on my project, I used a profiler to understand exactly which parts of VOC were slow, and that information drove the changes we implemented. I think it worked out pretty well!
- Minimize boxing of primitive types like String and Int. As VOC is written half in Python, half in Java, a single integer can be found in various representations on its way through the compiler -- as a Python object, unboxed to a primitive Java int, then packaged back up to a Python object. This problem was (somewhat incoherently) addressed in my proposal, but ultimately we couldn't come up with a good abstraction to support it.
- Build a peephole optimizer. CPython's peephole optimizer scans generated bytecode to identify sequences of bytecode that can be optimized, VOC could benefit from this too.
- Hook up more benchmarks, which serve as both proof of the kinds of programs VOC can currently compile and areas ripe for performance improvement.
I will wrap this up by giving big thanks to Russ, my mentor. The time you spent helping me form my ideas, patiently answering my questions and reviewing my work was invaluable to me. It couldn't have been easy keeping up with what I was doing especially since I started improvising halfway through the summer. I am so grateful for your help, thank you.
PR Log (in chronological order)
Bug Fixes and Miscellaneous
- Fix Method repr
- Fix custom substitutions
- Fix List Bug
- Remove Unnecessary Instruction
- Fix __setitem__ error messages
- Fix contains/not contains bugs and refactor
- Add tests for problematic exception raising
- Add test for with + exception combo
- Add test for wrong iter error message
- Add tests for globals bug
- Remove unnecessary type casts and clean up
- Add test for problematic builtin function call
- Introduce org/python/Object type tests
In the blink of an eye, Google Summer of Code (GSoC) 2018 has come to an end. During the three months long coding period, I have contributed several patches in VOC repository of BeeWare, all working towards the ultimate end goal of running asyncio module in VOC. In this blog post (which is my first actual blog post by the way 😄), I will document what I have done so far, why I couldn't make it to the end goal (yea, unfortunately I couldn't get asyncio to work at the end of GSoC 2018), and what's left that needs to be done in order to achieve the end goal (or at least make part of asyncio work).
The first error that the transpiler throws when attempting to compile asyncio module was "No handler for YieldFrom", so it makes sense to start from this issue first.
Another feature related to generator was Yield expression. Before GSoC 2018, Yield statement in VOC was just a statement, meaning yield could not be used as expression. Generator methods such as generator.send, generator.throw and generator.close were not supported as well. Those features are what make asynchronous programming with generator possible, so I spent a few weeks to extend generator functionality in VOC, laying down the path to asyncio module.
PRs related to generator are listed below:
Nonlocal statement was another syntax not supported by VOC. After completion of generator's features, implementing this is the next step towards compiling asyncio module.
Implementing this feature took about 3 ~ 4 weeks as this is not as trivial as it seems. I took several approaches on this, while some of them do work, the code is not pretty and hacky, which could come back to bite me/other contributors in the long run. After many discussions with Russell, I refactored the closure mechanism in VOC and took a much cleaner approach in nonlocal implementations. I must admit that I took some short-cuts for the sake of "making nonlocal works" in the process of implementing nonlocal statement, resulting in poor design and messy codes. Many thanks to Russell, who helped me to improve my coding style and told me not to be discouraged when I'm stuck. 😄
The Collections Module
Next item on my hit list was pure Java implementations of the collections module. asyncio module depends on 3 data structures from collections, namely defauldict, Deque and OrderedDict. Two of them (defaultdict and Deque) are implemented in C in CPython, plus they have good analog in Java, so it makes senses to implement the module in Java. Porting defauldict, Deque and OrderedDict to Java in VOC is relatively straight-forward, taking about 1.5 weeks to complete.
Other PRs submitted during GSoC 2018
- PR #817 : Added coroutine related exception class [WIP] (closed due to not needed)
- PR #836 : Changed Bool construction to use getBool instead (merged)
- PR #847 : Add custom exceptions test cases (closed due to more comprehensive handling in PR #844)
- PR #849 : Fixed Unknown constant type <class 'frozenset'> in function definition (merged)
- PR #858 : Added test case for Issue #857 (merged)
- PR #860 : Added test case for Issue #859 (merged)
- PR #862 : Added test case for Issue #861 (merged)
- PR #867 : Fixed Issue #866 RunTimeError when generator is nested in more than 1 level of function definition (merged)
- PR #868 : Fixed Issue #861 Redefining nested function from other function overrides original nested function (merged)
- PR #879 : Fixed Incompatible Stack Height caused by expression statement (merged)
- PR #901 : Added test case for Issue #900 (merged)
- PR #788 : Implements asyncio.coroutines [WIP] (open, the dream 😎)
Issues submitted during GSoC 2018
- Issue #861 : Redefining nested function from other function overrides original nested function (fixed in PR #868)
- Issue #866 : RunTimeError when generator is nested in more than 1 level of function definition (fixed in PR #867)
- Issue #828 : Finally block of generator is not executed during garbage collection (open)
- Issue #857 : Complex datatype in set cause java.lang.StackOverflowError (open)
- Issue #859 : Duplicated values of equivalent but different data types in set (open)
- Issue #900 : Exception in nested try-catch suite is 'leaked' to another enclosing try-catch suite (open)
- Issue #827 : Maps reserved Java keywords to Python built-in function/method call (closed)
Towards The Ultimate End Goal
Unfortunately, three months of GSoC coding period was not enough for me to bring asyncio module to VOC. The nonlocal statement implementation was the biggest blocker for me mainly because I didn't think thoroughly before writing code. If I were to plan carefully and lay out a general coding direction, I would've completed it in much shorter time and have time for other implementations. An advice for the aspiring and upcoming GSoC-er, don't rush your code, make sure you know 100% about what you're doing before diving into the codes.
With that said, following are the list of modules to be implemented/ported to Java before asyncio will work in VOC:
- socket module (a bit tricky since Java doesn't support Unix domain socket natively)
- selectors module (high level I/O operations)
- threading module (might be easier to implement this first since threading in Python is an emulation of Java's Thread)
- time module (partially implemented in VOC)
Huge thanks to my mentor, Russell Keith-Magee for accepting my proposal, providing guidance and encouraging me when things didn't go as intended. It is truly an honor to be a part of the BeeWare community. I had a blast contributing to BeeWare project, and I'm sure I will stick around as a regular contributor. Also shout out to the BeeWare community for answering my queries and reviewing my pull requests. 😄
This article was originally published on the BeeWare Enthusiasts mailing list. If you'd like to receive regular updates about the BeeWare project, why not subscribe?
When you're designing a GUI app - be it for desktop, mobile, or browser - one of the most fundamental tasks is describing how to lay widgets out the screen. Most widget toolkits will use a grid or box packing model of some kind to solve this problem. These models tend to be relatively easy to get started, but rapidly fall apart when you have complex layout needs - or when you have layouts that need to adapt to different screen sizes.
Instead of inventing a new grid or box model, the Toga widget toolkit takes a different approach, using a well known scheme for laying out content: Cascading Style Sheets, or CSS. Although CSS is best known for specifying layout in web pages, there's nothing inherently web specific about it. At the end of the day, it's a system for describing the layout of a hierarchical collection of content nodes. However, to date, every implementation of CSS is bound to a browser, so the perception is that CSS is a browser-specific standard.
That's where Colosseum comes in. Colosseum is a browser independent implementation of a CSS rendering engine. It takes a tree of content "nodes" - such as a DOM from a HTML document - an applies CSS styling instructions to layout those nodes as boxes on the screen. In the case of Toga, instead of laying out <div> and <span> elements, you lay out Box and Button objects. This allows you to specify incredibly complex, adaptive layouts for Toga applications.
But Colosseum as a project has many other possible uses. It could be used anywhere that there is a need for describing layout outside a browser context. For example, Colosseum could be the cornerstone of a HTML to PDF renderer that doesn't require the involvement of a browser. It could also be used as a test harness and reference implementation for the CSS specification itself, providing a lightweight way to encode and test proposed changes to the specification.
This week, we started a big project: rewriting Colosseum to be a fully standard-compliant CSS engine. The work so far can be found in the globe branch of the colosseum repository on Github. The first goal is CSS2.1 compliance, with an implementation of the traditional CSS box model and flow layout. Once we've got a reasonable implementation of that, we'll look to adding Grid and FlexBox layouts from the CSS3 specification set.
This is obviously a big job. CSS is a big specification, so there's a lot of work to be done - but that also means there's lots of places to contribute! Pick a paragraph of the CSS specification, build some test cases that demonstrate the cases described in that paragraph, and submit a patch implementing that behaviour!
It also highlights why your financial support is so important. While we could do this entirely with volunteered effort, we're going to make much faster progress if a small group of people could focus on this project full time. Financial support would allow up to significantly ramp up the development speed of Colosseum, and the rest of the BeeWare suite.
If you would like to see Colosseum and the rest of BeeWare develop to the point where it can be used for commercial applications, please consider supporting BeeWare financially. And if you have any leads for larger potential sources of funding, please get in touch.
After almost 4 months of work on Google Summer of Code 2017, finally I'm completing my proposal. Every widget migration and every commit/PR/issue/discussion with my mentors about Cricket , Toga and rubicon-objc were detailed on the Issue 58.
"Eating your own dog food"
The best way to show that a product is reliable to the customers is use it. So, the way to show that Toga is an effective tool to build a GUI is to build a complete application using it.
Cricket is a graphical tool that helps you run your test suites. Its current version is implemented using Tkinter as the main GUI framework. So, why not test Toga inside of another product from BeeWare? That's what I have acomplished during my GSoC work.
The proposal focus not only on the port of Tkinter to Toga, but on mapping the necessary widgets for a real application using Toga framework. To help me to map this I studied more about Tkinter, Toga, Colosseum, rubicon-objc, Objective-C, Cocoa and CSS.
The work I did during GSoC were sent throught the PR 65, reported on the Issue 58 and the final demonstration of the work can be seen in this link. There were widgets used on Cricket that weren't ready yet on Toga, so some improvements were necessary on Toga so that I could use them on Cricket. In summary here are some PRs and issues that I contributed to get my work done in Cricket:
Open PR that I sent to Toga:
- PR 201 : [Core][Cocoa] Refactoring of the Tree widget
Merged PRs that I sent to Toga:
- PR 112 : [Core][Cocoa] Enable/disable state for buttons, solved Issue 91
- PR 170 : [Cocoa] Content and retry status for stack trace dialog
- PR 172 : [Cocoa] Window resize
- PR 173 : [Core][Cocoa] Button color
- PR 174 : [Doc] Examples folder and button features example
- PR 178 : [Doc] Fix tutorial 2 setup
- PR 180 : [Doc] Update Toga widgets roadmap
- PR 182 : [Cocoa] Update the label of the Stack trace button for critical dialog
- PR 184 : [Core][Cocoa] Hide/show boxes widget
- PR 188 : [Cocoa] Fix error on MultilineTextInput widget, solved Issue 187
- PR 204 : [Core][Cocoa] Clear method to MultilineTextInput widget, solved Issue 203
- PR 206 : [Core][Cocoa] Readonly and placeholder for MultilineTextInput widget
- PR 208 : [Cocoa] Fix apply style to a SplitContainer widget, solved Issue 207
Merged PR that I sent to Cricket:
Merged PR that I sent to rubicon-objc:
- PR 34 : [Doc] Add reference to NSObject
Open issues that I sent to Toga:
- Issue 175 : [Core] Add more properties for Label and Font widgets
- Issue 176 : [Core] Add "rehint()" on the background of the widget after changing font size
- Issue 186 : [Core] Set initial position of the divisor of a SplitContainer
- Issue 197 : [Core] Get the id of the selected Tab View on the OptionContainer
Closed issues that I reported to Toga:
- Issue 167 : [Cocoa] Addition of a SplitContainer on a Box doesn't show the SplitContainer, was fixed by Russell Keith-Magee
- Issue 168 : [Cocoa] Addition of 2 boxes on an OptionContainer emits Rubicon's error, was fixed by Russell Keith-Magee
- Issue 169 : [Cocoa] Addition of 2 empty boxes on an OptionContainer emits error from Toga Cocoa platform, was fixed by Russell Keith-Magee
- Issue 181 : [Core][Cocoa] "Hide" option for widgets, was solved by me
- Issue 187 : [Cocoa] Errors on MultilineTextInput, was fixed by me
- Issue 189 : [Cocoa] ProgressBar doesn't appears in a Box, was fixed by Jonas Schell
- Issue 194 : [Cocoa] The frame of the MultilineTextInput doesn't appear, was fixed by Russell Keith-Magee
- Issue 195 : [Cocoa] ProgressBar doesn't appear inside of a Box oriented by row, was fixed by Russell Keith-Magee
- Issue 196 : [Cocoa] Set max value and value on a ProgressBar doesn't make any effect on the layout, was fixed by Russell Keith-Magee
- Issue 203 : [Core][Cocoa] Clear text on MultilineTextInput widget, was solved by me
- Issue 207 : [Cocoa] Set SplitContainer height doesn't update its size, was solved by me
Closed issues that I didn't reported but I solved on Toga:
Closed issue that I reported to Cricket:
- Issue 59 : Run selected doesn't count/ runs every test selected in a test module, was fixed by me
- Issue 1 : Seg Fault when iterate through a NSIndexSet using block notation
There are some features on Cricket that I want to help develop in a near future, for example:
- A button to refresh all the tests tree
- Cricket settings
- A gap between the output and error boxes when there is no output message
- Run a test if the user click on it
I truly believe that Toga will be the oficial framework on Python to build GUI for multiplatforms applications, so I'll continue to contribute to this project because I want to use in every application that I would need a GUI.
I would like to truly thank my mentors Russell Keith-Magee and Elias Dorneles for guide and help me so much during this period. The opportunity to be part of this community was a great honor to me, thank you so much to accept me in this program Russell Keith-Magee. Also, I want to thank Philip James that made some reviews in my PRs and Jonas Schell that fixed one issue that I sent to Toga.
With Google Summer of Code 2017 program nearing its end, it is time to summarize what I got done during the summer working on Batavia.
Batavia is a part of BeeWare's collection of projects. As it is still in its early stage of development, for my part I offered to implement a number of features missing from Batavia, ranging from elemental data types, through JSON manipulation and language constructs such as generators. I posted my proposal in this GitHub thread and kept it updated with my progress on a weekly basis.
Note that by the end of GSoC, we have decided to diverge from the proposal and forgo implementation of contextlib in favor of support for Python 3.6 2-byte wide opcodes.
Overall it was great learning experience and fun. Big thanks to my mentors Russell Keith-Magee and Katie McLaughlin, and the whole BeeWare community.
Lists and dicts
Python 3.6 compatibility
Google Summer of Code 2017 is coming to an end. After three month of working on the BeeWare project, I would like to summarise my work and share my experiences.
“No Battle Plan Survives First Contact.”
This was one of the first things Russell said to me after we decided to fundamentally restructure my proposed GSoC timetable and goals. During the community bounding period we discovered that Toga was even harder to test as we expected. The tight coupling between the platform independent Toga-Core part and the platform dependent implementations for (Windows, macOS, iOS, Linux, Android, Django, ...) was giving us a hard time to write meaningful tests.
We expect Toga to become a decently sized project, therefore we want it to have a solid and well tested base. Given that, we decided that I would spend most of GSoC to restructure Toga to make it more testable. Besides that, I would also add implementation tests to check if a given backend is implemented in the right way. If I would finish before the end of the summer I would just start with my original project proposal.
The Big Restructure of Toga
With the new goals and a fresh branch I started the journey to restructure the Toga project to make it more testable.
After hacking around and testing different things on a separate branch. I identified that the intertwined platform dependencies are the main problem. To separate the Toga-Core module form its backend implementations we decided to use the factory pattern instead of the inheritance model that we had before. Now every backend has its own factory that produces the right widgets for the platform it is running on. This way we have a clear separation between Toga-Core and the implementation level. Platform dependencies are now enclosed in the implementation level.
After the new structure was clear I ported Toga-Core as well as the backends for cocoa, iOS and GTK. I did this in the Toga branch (The Big Restructure of Toga [WIP] #185). In practice this meant that I had to manually touch almost every widget of all backends to port them to the new factory pattern.
Toga talks to native GUI frameworks, hence I had to get a good understanding about the principles and concepts behind each and every of these frameworks. At times I felt overwhelmed by the combined complexity of all the parts that make up Toga. The following is list:
- Every Toga backend wraps around a existing and unique framework. To wrap the framework you have to understand the framework.
- “I love Python, why do I have to understand Objective C”? To effectively work on the iOS and macOS backends I had to learn the basics of Objective C – if only to read the Apple docs.
- Toga has a lot of moving parts. There are backends, frameworks, libraries to talk to backends, libraries to perform the layout of the UI and more. I spend a good amount of time to understand all of these parts. The following is just a overview of newly acquired knowledge during GSoC:
Other work I did
- PR: Translated part of the beeware.org webpage into german. (PR #173)
- Helped newcomers on Gitter and GitHub.
- Tested if Toga would profit from static typing (toga/static_typing).
- Created an implementation test suite based on the AST module.
- Added test for Toga-Core.
- Updated and extended documentation on Toga-Core as well as the macOS and iOS Backend.
- Created a toga-dummy backend.
- First draft of the Settings API and working backend implementation for macOS.
- Many small and big fixes on Toga-Core, cocoa, iOS, and GTK backends. All in the main PR beeware/toga The Big Restructure of Toga [WIP]
- PR: beeware/toga fix for getting the length of the filenames array
- PR: beeware/toga Fixed #189 cocoa.progressbar with rehint
- PR: beeware/briefcase-template Fix for spaces in app name. Issue #2
- PR: beeware/toga Toga Settings API [WIP]
Future Work to be Done
- All my work sits in the PR “The Big Restructure of Toga [WIP]”. After the missing backends, namely Windows and Android, are ported, everything can be merged into master. We have to wait for the missing backends, because the new is incompatible with the old versions and they can’t coexist.
- The Settings API from my original proposal is not finished because of the above mentioned reasons. I have a first working draft and I will continue working on it after GSoC in this PR.
I would like to thank Russell Keith-Magee for being an awesome Mentor and for all the time he invested in me during GSoC. I also would like to thank the BeeWare community for helping me when ever I had a problem. Thank you!
Testing is a skill that is a vital part of every programmers training. Learning how to write good tests helps you write more robust code, and ensures that when you've written code that works, it keeps working long into the future. It can also help you write better code in the first place. It turns out that well architected code, with high cohesion and low coupling, is also easy to test - so writing code that is easy to test will almost always result in better overall code quality.
An important step in "levelling up" your testing experience is to start using a Continuous Integration, or CI service. A CI service is a tool that automatically runs your test suite every time someone makes a change - or every time someone proposes a change in the form of a pull request. Using a CI service makes sure that your code always passes your test suite - you can't accidentally slip in a change that breaks a test, because you'll get a big red warning notification.
CI is such an important service that many companies exist solely to provide CI-as-a-service. The BeeWare project has, for various projects, used TravisCI and CircleCI. Both these tools provide free tiers for open source projects, and have generously sponsored BeeWare with capacity upgrades at various times.
However, the BeeWare has had an interesting relationship with commercial CI services. This is for two reasons.
We've been able to speed up the duration of a test run by splitting up the test suite and running parts of the suite in parallel, but that forces us up against the second problem. Commercial CI services usually operate on a subscription model; higher subscriptions providing more simultaneous resources. However, our usage pattern is highly unusual. Most of the year, we get a slow trickle of pull requests that require testing. However, a couple of times a year, we have a large sprint, and we have a flood of contributions over a short period of time. At PyCon US, we have had groups of 40 people submitting patches - and they all need their submissions tested by CI. And time is a factor - the sprints only last a couple of days, so a fast turnaround on testing is essential.
If we were to subscribe to the top tier subscription levels of CircleCi and TravisCI, we still wouldn't have enough resources to support a sprint - but we'd be massively overresourced for the rest of the year. We'd also have to pay $750 or more a month for this service, which is a budget we can't afford.
So - we had a problem. To run our test suite effectively, we needed massively parallelized resources to run a test suite quickly; and at certain times of the year, we need extremely large numbers of these resources.
We also had other automated tasks that we wanted to perform. We wanted to do code linting (automated checking of code style) before a PR was tested. We wanted to check spelling of documentation. And we wanted these tasks to feed back into GitHub as automated comments and specific code review status markers, informing contributors not just that a problem occurred - but what problem occurred, and where in their code.
We also wanted to manage pipeline builds - there's no point in doing a full test of multiple versions of Python once you've established the tests are failing on one version. And there's no point testing at all if there are code style problems.
We also wanted to do things that weren't just testing. We wanted to check that contributor agreements have been signed. We wanted to automate deployment of websites and documentation.
What we had wasn't just a CI problem. It was a problem where we wanted to automatically run arbitrary code, in a safe way, in response to a GitHub event.
I've been trying to find a CI service that can meet our needs for over a year. But over the last year, a few thoughts started to congeal in my head.
- Amazon provides a API (EC2) that allows you to spool up machines of varying complexity (up to 64 CPUs, with almost 500GB of RAM), and pay by the minute for that usage.
- Docker provides the tools for configuring, launching, and running code in an isolated fashion
- Amazon also provides an API (ECS) to control the execution of Docker containers.
There's nothing specific about AWS EC2 or ECS either - you could just as easily use Linode and Kubernetes, or Docker Swarm, or Microsoft Azure... you just need to have the ability to easily spool up machines and run a Docker container. After all: a test suite is just a Docker container that runs a script that starts your test suite. A linting check is a Docker container that runs a script that lints your code. A contributor agreement check is a Docker container that checks the metadata associated with a pull request.
All you need then is a website that can receive GitHub event notifications, and start Docker containers in response.
At the start of July, I found myself between jobs, and uttered the fateful question "How hard could it be?" And so, today, I'm announcing BeeKeeper - BeeWare's own CI service.
BeeKeeper deploys as a Heroku website, written using Django. After configuring it with Github and AWS credentials, it listens for Github webhooks. When a Pull Request or Push is detected, BeeKeeper creates a build request; that build request inspects the code in the repository looking for a beekeeper.yml configuration file. That configuration file describes the pipeline of tasks that will be performed, and for each task, the type of machine that should be used, any environment variables that are required, and the Docker image that will be used.
BeeKeeper also allows the site admin to describe what resources will be used to satisfy builds. A task can say it needs a "High CPU" instance; but the BeeKeeper instance can determine what "High CPU" means - is it 4 CPUs or 32? When those machines are spooled up, how long will they be allowed to sit idle before being shut down again? How many machines should be sitting in the pool permanently? And what is the upper limit on machines that will be started?
A companion tool to BeeKeeper is Waggle. Waggle is a tool that prepares a local definition of a task so it can be used by BeeKeeper - it compiles the Docker image, and uploads it into ECS so that it can be referenced by tasks. (It's called "Waggle" because when worker bees discover a good source of nectar, they return to the hive and do a waggle dance that tells other bees how to find that source).
We've also provided a repository called Comb (named after honey comb, the place bees store all the nectar they find) that defines the task configurations that a BeeKeeper instance can use. We've provided some simple definitions as part of the base Comb repository; each BeeKeeper deployment should have one of these repositories of it's own.
There's still a lot of work to do, but we're already using BeeKeeper to Batavia and VOC, and the upcoming PyCon AU sprints will be our first outing under high-load conditions. Some back-of-envelope calculations predict that for around $50, we'll be able to provide enough CPU resources for each test run to complete running in 10 minutes or less, supporting a sprint of dozens of people.
Although BeeKeeper was written to meet the needs of the BeeWare project, it's an open source tool available for anyone to use. If you'd like to take BeeKeeper for a spin, come along to the sprints, or check out the code on GitHub.
BeeKeeper is also an example of the sort of product you'd see more of if BeeWare development was funded full time. I was able to build BeeKeeper because I had a spare couple of weeks between jobs. There is no end to the tools and libraries like BeeKeeper and Waggle that could be built to support the software development process - all that is missing is the resources needed to develop those tools. If you'd like to see more tools like BeeKeeper in the world, please consider joining the BeeWare project as a financial member. Every little bit helps, and if we can reach a critical mass of supporters, I'll be able to start working on BeeWare full time.
Around 4 years ago, I made the first commit to Cricket - the first tool that would eventually become part of the BeeWare suite. Since then, the BeeWare project has grown to encompass support for mobile platforms, two alternate Python implementations, and a cross platform widget set - as well as the developer tools that started the project originally.
For most of this time, BeeWare has been a volunteer effort. Initially, it was a solo project; however, over the last year in particular, the number of people who have contributed to BeeWare has grown rapidly. Over 300 people have now made contributions to the various BeeWare tools and libraries - due, at least in part, to the Challenge Coins we've been offering at sprints.
Of particular note is the team of 7 people who have joined the project as apiarists, helping share the load of project maintainence. I can't thank these people enough - without their assistance, I wouldn't have made anywhere near as much progress over the last year.
You may have noticed that over the last few months, progress has been especially rapid. This is because, for the last six months, BeeWare development has been partially funded by very accomodating employers at Jambon Software. My contract with Jambon allowed me to spend significant periods of time being paid to work on BeeWare - and, not surprisingly, it was possible to make enormous progress as a result. The last 6 months has seen:
- Extensive improvements to Batavia and VOC;
- An Android backend for Toga;
- A Django backend for Toga, enabling Toga apps to be deployed as web apps;
- A Winforms backend for Toga, enabling Toga apps to run on Windows with a modern appearance;
Unfortunately, my contract with Jambon is coming to a close - which means my contributions to BeeWare will go back to being what my spare time allows.
This also means the rate of progress will also slow. There's still plenty to do: there's plenty of the Python standard library to port to Batavia and VOC; Toga needs much wider widget support; Colosseum needs to be extended so it supports the full CSS box model, not just CSS3 Flexbox; and the tools that started it all - Cricket, Bugjar, Duvet, and others - all need to be ported to Toga.
I would like to be able to work on BeeWare full time. However, the simple reality is that unless I can find a way to pay for this work, it will only be able happen in my spare time.
So - this is an appeal to you - the Python community. If you are excited by the prospect of having access to Python on mobile platforms, or you would like to write applications in Python that have completely native user interfaces - I need your help.
For just US$10 a month - you can join the BeeWare project as a member, and help make this dream become a reality. If I can find 1000 people in the Python community who want these these tools and are willing to support their development financially, I can start working on BeeWare full time. Of course, this target is even easier if companies get involved and sponsor at higher tiers.
If I can find more than 1000 people, then even more becomes possible. The obvious option would be to hire other experienced developers to assist with the work. The idea of having others to bounce ideas off during the development process is very appealing. However, we can also use this as an opportunity to do some social good.
For some time, I've had an open offer to mentor anyone who wants to get involved with Open Source contribution using the BeeWare project. However, not many people have been able to seriously take up this offer, because the time required to seriously take up the offer is prohibitive. I'd like to be able to extend my offer to more than just a casual mentoring relationship. I'd like to be able to hire - and pay - one or more junior developers for the BeeWare project, and focus on giving that opportunity to people from underrepresented demographics.
It's still early days for BeeWare. Financial support means faster progress. More widgets. Better documentation. More of everything you’ve seen so far from BeeWare. If I can find full time funding for myself - or better still, for myself and a small team - then I have no doubt that the BeeWare suite will become a viable alternative for commercial projects in very short order. Best of all, we will be able to do this without having to give up on the ideals of the open source movement.
I'm excited for what the future holds for BeeWare. I hope you'll join me on this journey.
(And if you’re contemplating signing up, and you’re coming to PyCon US in Portland this May, let me drop a gentle hint… sign up now. It will be worth it #cryptic)
The tickets for PyCon US 2017 have been given away. We look forward to seeing everyone who can make it to the conference at Booth 103!
Want to to go @PyCon, can’t afford it? @PyBeeWare has 2 tickets to give away. Email firstname.lastname@example.org and tell us why you want to bee there!
— PyBee (@PyBeeWare) January 30, 2017
PyCon US 2017 is running in Portland, Oregon from May 17 - 25, and it's bound to be another amazing conference.
With this booth, we get two tickets to the conference. This includes:
- access to the opening reception
- 3 days of conference talks, expo hall / job fair
- breakfasts, breaks, lunches, and
- swag bag
Thing is, both Russell and I have already registered.
So, we want to give you the ticket.
If you can get yourself to Portland on the conference days, we want to give you our free ticket.
What do we want in return?
Just a little bit of your time.
The Bee Team will be helping to staff our booth, but we'd also like to see (and give!) talks, so helping us out by running the booth would be lovely. (Plus, and I'm sure Russell would agree, just helping out on the booth earns you a coin)
Plus, we'd love for you to stick around for the famous coding Sprints. These are held in the four days after the event, while people are still in town. We get coffee, lunch, and a room full of tables, chairs and copious amount of power points, and we code on projects. We'll be running a BeeWare sprint where we will be mentoring and helping first time contributors earn their shiny challenge coin
Does this sound like something you'd be interested in?
Please, email us!
Tell us about yourself! Who you are, what you do, why you want to go to PyCon and what makes you interested in Python.
We need to allocate our tickets early, so please email us by February 12, 2017
We'd love to see you there! ✨
[This article has been cross-posted on glasnt.com/blog]
At PyCon AU 2015, and again at DjangoCon US 2015, I gave a talk entitled "Money Money Money: Writing software, in a rich (wo)man's world". The talk was a summary of the issues around one of the biggest problems that I see facing the open source community: how to provide the resources that are needed to develop and maintain the software that we, as a community depend upon. This means providing maintenance and support for established projects, large and small; but also providing an ecosystem where new ideas can be incubated, developed and matured until they present compelling alternatives or significant benefits over closed source offerings.
It's been almost 18 months since I first presented this talk, but the issue persists. I haven't been alone in noticing and drawing attention to this issue, either. Nadia Eghbal was commissioned to write a white paper for the Ford Foundation entitled Roads and Bridges highlighting the chronic need for resources to support the basic infrastructure that underpins large parts of the modern economy. Eric Holscher (maintainer of Read the Docs) blogged about the problems he's had raising funds, despite the fact that the service he delivers is a widely used - arguably indispensible - part of the Python ecosystem.
However, despite this attention, it still doesn't get anywhere near as much attention as it should. And it's an issue that is of pressing importance to me, as the BeeWare project is looking for ways to fund the development needed to take us from "interesting technical demo" to "compelling technical solution".
A few months back, it was suggested that I should publish a blog post to accompany the video presentation. I dragged my feet on doing this, until industrious BeeWare contributor and all-around nice guy Elias Dorneles offered to take my speakers notes and convert it into a transcript.
So - here it is. Money Money Money: Writing software, in a rich (wo)man's world. If you've got any questions, disagreements, requests to present this at your conference, or just a generic offer of a bag of cash, you can reach me at email@example.com.
Hi, I'm Russell Keith-Magee, if you've heard of me before, it's probably because of my work on the Django project; I've been a core team member for almost 10 years, and president of the Django Software Foundation since 2011.
One of the big challenges of the Django project - and any big project for that matter - has been to secure certainty in its development future.
My day job is as CTO and co-founder of TradesCloud. We’re a Software as a Service for tradespeople -- plumbers, electricians, carpenters and the like. TradesCloud depends on a number of open source projects - Django, Apache, memcache, countless others. So I've got a business interest in the continuity of these open source projects - but I certainly don't have the resources to fund them all myself.
I've also got a declared interest in GUI/UI issues, especially as they relate to development tools. I've got grand visions of what I'd like to do with this project, and I've received some great contributions from the community, but it's still largely my own work. But my startup would be able to make great use of these tools if they were mature.
I've also a maintainer of some smaller projects, like the Python wrapper to the Xero API. I started the project because I had an itch; and my itch has been scratched. But now I've open sourced that project, which means I've inherited a maintenance task. I've accepted help from a couple of people - most notably Matthew Schinkel and Aidan Lister, who have done great work. But if I'm honest, the maintenance burden of PyXero well exceeds the time I can reasonably dedicate to it.
So - I've got vested interests in free software. I've got an interest as a producer of a successful project with a high profile; as the producer of a small project with lots of users but little personal incentive; and as a producer of smaller projects with almost no profile but grand plans. These projects all have different resourcing needs, reflecting their maturity as projects.
I've also got interests as a consumer of free software - both in terms of the software I rely on to develop my own projects, and in terms of my commercial interest in the long term maintenance of tools and platforms. I need these projects to continue to develop, survive and thrive.
Based on my experience, I'd like to make a bold assertion:
Absent of any other constraints, given equivalent resources, the free software approach produces vastly superior engineering outcomes than the closed-source approach.
The catch is the operative clause "given equivalent resources". Most free software projects aren't developed using anything close to "equivalent" resources.
In some cases, this is a blessing in disguise - regardless of the project, having scarce resources is an excellent crucible for burning away the unnecessary to leave only the base metal. But it's not always a blessing.
The moral high road is littered with the corpses of our allies
Talk to any prominent free software developer, and amongst the success stories, you'll also hear some consistent troubles - that they've got great ideas and grand plans, but no time to execute; that they're about to burn out due to the pressues of maintaining their project; or that they've had yet another mailing list discussion with someone who doesn't understand why you didn't drop everything to help them fix their problem. And there are plenty of examples.
OpenSSL is the software that drives pretty much every "secure" connection on the internet - and yet it took until the discovery of Heartbleed - a critical vulnerability that sent the internet into a tailspin - before it could find funding to pay for maintainers.
Another example - GnuPG - Werner Koch almost went bankrupt trying to support GPG - a project that many others depends upon for trust in their release process. He was rescued, at deaths door, by the Linux Foundation's Core Infrastructure initiative.
Those are both examples that ended with funding; but it's not all happy endings.
Take the example of Capistrano. Hugely popular configuration management tool around 2007-8, maintained by Jamis Buck. In 2008, citing burnout and maintenance overhead, he famously withdrew support for Windows, saying "Windows may be the 800 pounds gorilla in the room, but it's not my gorilla, and it's not in my room". This was an incredibly unpopular move; but even with that scale down, Jamis burned out in 2009, abandoning Capistrano, and a number of other projects, without maintainers.
"Hi I'm an engineer at a well-funded company and we need this feature can someone implement it for free?" -- Every FOSS mailing list.— Christophe (@Xof) July 17, 2015
The thing is - that this is a community that has lots of cash. In the grand scheme of things, software development is a well funded industry. If companies can find money for foosball tables and meditative ball pits, they should be able to find resources to help maintain the software on which they've based their success. And if you're on the receiving end of the problem - a developer of free software - that can be really frustrating.
For me, this is the great unanswered question of the free software movement - how to reconcile the discrepancy between the clear demand for a software product, and the ability to convert that demand into the time and resources needed to service that demand.
Free software: Dream vs Realty
Although the theory says that anyone can contribute to a free software project, in reality, every single project of any significance has leaders. At the most basic level, it's whoever has the commit bit. And you need that leadership, especially when anything to do with design is involved. The running gag is that a camel is a horse designed by committee. The worst APIs we deal with on a daily basis are the ones that were designed by committee. You need someone with taste running the show.
But there's a bigger problem - there's the extent of the engagement that users have with a project. Here's a though experiment to prove my point. We're in a room full of Python users. Django is a free software project.
- Who in this room has found a bug in Django, or has a niggling thing they'd like to see fixed Django API?
- Who has turn that niggle into a bug report to Django?
- Who has submitted a patch to Django for that niggle?
- Of those, who have had that patch committed?
When this is done in front of a live audience, at each question, there are less people raising their hands
Products vs Projects
So what's going on here? Well, it reflects 2 different ways of looking at a piece of software - Projects and Products. And it's a matter of personal perspective - my project can be your product, and vice versa.
When I view some code as a project, it's a body of code where I'm contributing to a larger goal. I'm willing to spend resources focussing on other people's needs in the hope that their needs will help make the project as a whole better. I'm willing to do this because I get some personal gain, like an enhanced public profile; or if the tool is really close to my work coalface; or where I know I can make a substantive contribution.
But the further I get away from my coalface, and the harder it is to make a contribution, the less inclined I am to even want to contribute to the project. Most of the time, it's a product that I'm using, warts and all. If a product has bugs, I'll work around them, or find an alternative, rather than navigating the contribution process and contributing a patch back upstream.
In the case of a product, the freedoms afforded by Free software are a bit like freedom of speech - it's a freedom I definitely want, it's comforting to know it's there, but I don't spend every day making sure that I fully exploit the extents of those freedoms. There are people who do - protestors, advocates for controversial social positions - and one day, given the right circumstances, I might join in and help them. But most of the time, I just want to be pragmatic, and get on with living.
This dichotomy between the theory and practice is also the reason why comments like "Patches Welcome" get made in Free software mailing lists. On the one hand, it's completely correct. Anyone can contribute, and on most projects patches are welcome. But most people don't look at a new open source project as an opportunity to engage and contribute. Most people just want to use the damn software.
And you can argue it's because people are focussing on the wrong interpretation of "free", and haven't "captured the spirit of free software". Which is 100% true, but utterly counterproductive as a position. Anyone who has done any UX work knows that if users are consistently making a mistake, blaming the user gets you nowhere. You were in charge of what the user experienced, and they made the "error" because of a fundamental cognitive disconnect.
And even if everyone did get the right message - let's be realistic - it wouldn't work anyway. The mythical man month showed us that you don't deliver a project faster or better by throwing more people at it. 9 women can't make a baby in 1 month - a project doesn't just need resources - it needs the right resources, in the right quantities. And ultimately, that means projects finding a way to get the resouces they need to be self sustaining.
So, that means that if we want free software to be maintained, and maintained well, we need to find a way to fund it's maintenance.
Use value vs Sale value
18 years ago, Eric S Raymond published an essay entitled "Cathedral and the Bazaar". This essay catalyzed the start of the open source movement - a redefinition of "free software" to make it clear that the openness, not price, was the important property. What isn't as well remembered is that "Cathedral and the Bazaar" isn't the only essay Raymond wrote at that time. He published 2 other essays shortly after CatB - "Homesteading the Noosphere", about the social organization and motivations behind free software projects, and The Magic Cauldron, about the economics of Free software.
One of the key distinctions that Raymond highlights in that essay is the difference between Use value and Sale value.
The sale value of a product is its value as a salable commodity - literally, what you sell it for.
The use value of a product is its economic value as a tool, as a productivity multiplier. This is how much economic benefit the user will get from the product.
The important the is that the two aren't necessarily connected. In a traditional manufacturing environment, the focus is on sale value, because that's usually linked to the manufacturing cost - the cost of parts and materials.
But most software isn't produced for sale value - it's in house software produced for use value. In the case of free software, the "sale" value for free software is 0. But that doesn't mean there isn't use value, and the catch is to work out how to exploit the use value that exists within an organization as a monetization channel.
So, what are your options?
Well, you can sell merchandise. And while this is relatively easy to do, lets be honest - you're not going to fund a software empire selling t-shirts.
You can get users to pay for documentation. Write a book, get users to pay for it.
Unfortunately, the dirty little secret of the tech writing scene is that nobody gets rich writing tech books. They're incredibly valuable resources for the community, they're great for padding your resume, but not so great as a revenue stream.
One area that does pay well is selling training. Employers are willing to pay big bucks for training courses; if you can put together a 3 day intensive workshop, you can sell it over and over again.
You can productize your offering. The source code remains free, but a simple, easy-to-use installer costs money. This works really great when when what you have is a clearly identifiable product - like an IDE.
One specific model of productization is SaaS - give the code away for free, but pay for the convenience of having someone else administer it for you. Any open source web software is a good example of this - you can install the software on your own servers, but honestly, unless you've got a reason, you use someone elses hosted solution and let them take care of it for you.
But, SaaS is only viable if you can deliver as a service - which means it's really only viable for web. And, as technologies like docker commodify deployment, it's possible that even this revenue stream might evaporate.
So what else can you sell?
You can sell access to the developers. If you're the maintainer of a project, you're in the best position to provide support, or debug complex problems - which means you're in a prime position to sell support and consulting. Honza Král <https://twitter.com/honzakral> from ElasticSearch calls this the "Ghostbusters" business model - "Who you gonna call?"
You can sell access to the software. Trolltech did this with Qt; Riverbank still does with the PyQt bindings. The library itself is GPL - but if you want to use it on a closed source project, you can, for a hefty fee. This has the advantage that it forces commercial interests to pay for what they're using; but it also discourages small scale commercial tinkering - if I'm writing a new tool, and I'm forced to choose between open sourcing my tool or a $1000 price tag, I'll probably choose a different toolkit.
You can also get into the business of providing certifications and guarantees - auditing code to ensure quality or compatibility, providing guarantees about fixing bugs, and certifying that individuals are skilled in the use of the product. This is a big part of RedHat's business model - auditing packages, making sure they all interoperate as expected, ensuring that security updates are available and meet the same standards, and certifying system administrators.
This is a set of products that appeals to the "enterprise" high end of town - and that's is a lucrative end of the market. But those customers also tend to only want these guarantees for certain types of software - broadly speaking, "Boring fuddy-duddy" software. Your operating system, your database - these are pieces of software that need to be rock solid in an enterprise setting. Your debug toolbar - not so much, unless maybe they've being wrapped as part of a suite of tools.
Undermining your value proposition
Another problem with many of these revenue sources is that if you run your project well, a lot of them are ruled out - or at least severely curtailed.
If you make something truly simple to use, you've just removed the need for books and training courses - or, at the very least, moved the ground to "advanced" courses - which are harder to write, and have a smaller audience.
Django saw this first hand - Django's documentation was famously very good from a very early stage, and as a result, it was quite hard to get publishers to take on Django book projects - because the documentation was too good, and was undermining the market for books. It's taken a long time, and mostly self-publishing, to get good Django books available for sale.
If you have a project mailing list, where the community answers questions for free, and it's a healthy, responsive mailing list... why would I pay for support?
If your software is well designed, and modular, and those interfaces are well documented, and I need a modification - why wouldn't I just write the mod myself?
Now, Ok - I'm exaggerating. There are legitimate reasons to pay for support or to bring in a consultant even if the interfaces are clean and well documented. But my point is that the better you do a job as a free software engineer, it becomes harder to make the business case for your revenue stream, because the value proposition becomes less obvious.
And if it isn't directly undermining your value proposition, it can still undermine you indirectly, because the more time you spend earning money, the less you spend doing what the money pays for. Administering a certification program takes time. Writing and delivering training courses takes time. Consulting can be lucrative, but developing a sales pipeline takes time. And if you're consulting, you need to make sure that sales pipeline is full, which means you're going to err on the side of taking on more work than less... which means you've just closed the door on the the time you have available to work on open source.
You also need to be careful that in selling your business case, you don't undermine your main project. If every hard question on the mailing list is answered "We can answer that for a fee", you're going to sound like a shill.
If a project says it needs money to make sure we stay on top of security issues... you have to be really careful how you say it, because the easy interpretation is "well, the project must be insecure, because they're not on top of security now".
A question of scale
There's also a question of scale. Python, Django - these are large projects with large communities. Making a business case for Python or Django isn't too hard. Not trivial, but possible. But is the maintainer of a smaller project - like, say Django Debug Toolbar - are they seriously expected to write and sell a book about Debug Toolbar? Or training and certification in its use?
Smaller projects are no less important to the vitality of the overall Django ecosystem, but those projects aren't afforded the same opportunties for fundraising as larger projects.
It's also important to realize that not all products are afforded all these opportunities for revenue. An IDE can be productized; a developer library probably can't. Different projects will need different mixes of revenue streams.
Do you have to sell something?
Ok - but can we do this without selling anything at all?
"How to make money from open source" is like "How to make money from clean water" or public education or science.— Pieter Hintjens (@hintjens) May 27, 2015
Free software stems from a moral imperative - so can we fund development through altruism and patronage? Well, it's harder, but there's evidence that it can be done.
Crowdfunding and Bug Bounties
One option that has seen a lot of activity recently is crowdfunding. Platforms like Kickstarter and Indiegogo provide a way to a group of people to pitch in and contribute to a goal.
The Django community has a couple of examples of very successful crowdfunding projects, each raising tens of thousands of dollars (Django Migrations, Multiple Template engines, django.contrib.postgres, Django Rest Framework). But - if you survey the people who did these projects, they didn't make money on these projects. The amount of engineering time that went into them well exceeded the money raised.
Another idea that gets floated regularly is the idea of bug bounties - placing a price on specific bugs. This is really just another form of crowdfunding - You want a specific bug fixied or feature added? Contribute money. You really want it - contribute more. And eventually someone will take the bait and fix the bug or implement the feature.
It's an attractive model - but it also has problems - most notably, Who gets paid. Do you pay by lines of code written? That ignores the contribution of reviewers. How much more important than writing code is reviewing code or triaging a bug?
But the biggest problem I see with crowdfunding approaches is that they are in conflict with establishing a working income. If you're hiring yourself out, the shorter the engagement, the higher your hourly rate. This is a hedge against unemployment at the end of the contract. If you're doing a Kickstarter, you're best served by having a small, clearly described, clearly achievable goal. So - a month or two of work.
That means you need to charge your short term rate in order to guarantee long term income. But it's also in your interest to have a low fundraising target, so that the campaign actually tips. If it's too high, it could scare off potential bidders. It also means the community pays a premium for any new feature, which isn't the best use of already scarce community resources.
Grants, Fellowships, Patreon
Ok - so if you want longer term income, you need to look at longer term engagements. That means that you stop looking at funding specific projects, and start looking at funding the person, with grants and fellowship positions.
I'm including Patreon in that list because it's effectively "crowdsourced patronage". You're not paying for a specific thing - it's "keep doing what you're doing" money.
Over the last 10 months, Django has been using this model. We hired a fellow at the end of last year. The fellow - whose name is Tim Graham - is responsible for keeping the project wheels greased.
From a technical perspective, this has been a roaring success. Tim's made great inroads into our ticket backlog, and response times on ticket reviews are lower than they've ever been.
The hard part has been raising the money to pay him.
We did a fundraiser at the start of the year specifically to fund the Fellow; that campaign has raised just over $50k US. That's no small chunk of change - but it's also not a lot when you're talking about a full time employee. We're going to need to do another fundraiser really soon if the Fellowship is going to continue.
Almost everyone agrees it's been money well spent, but converting that into donations has been hard work.
Another other approach for raising money is to embrace the devil, and go commercial. Commercial interests have the money, so why shouldn't they pay for it? This can be very successful under certain circumstances - but you have to be able to make a business case for the corporate owner.
OpenStack is a great example of this. Why are Rackspace, HP, Redhat, Ubuntu all interested in OpenStack? Because they sell products that benefit from commodifying the hosting environment. By making it cheap and easy to control cloud deployments, they increase the size of the market for cloud hosting, which means more money for their core business - either directly (like Rackspace), or indirectly (like RedHat, because cloud servers still need operating systems).
Node.js is funded by Joyent for similar reasons - Joyent is making a long play that if it's easier to develop web software, more people will write web software, increasing the market for Joyent's services.
However, Node.js is also a cautionary tale about corporate interests - what happens when the community and the corporate sponsor disagree about the direction of the project? Well, you get project splits, like the io.js split. Corporate money can corrupt. You need to be careful that project governance is independent of the funding source.
Having a single corporate master also puts the project the risk if that corporate master loses interest. Django was originally under the wing of the World Company. Eventually, that interest waned, and the corporate support dried up, too.
The other corporate answer that gets floated is to go get venture money. And there are some examples out there of people doing this. Meteor, for example, is funding their development with money from a venture fund. They've got plenty of money to hire engineers, designers - whatever they need.
What makes me nervous about this model? We've been here before. Who here remembers Eazel? For those who don't remember, In the late 90s, Eazel was a dot-com company founded to develop the Nautilus file manager. And, we got 2 years of open source code. And then the bubble burst, and the company went bust. Now, the good thing is that we still have the code. But it would be better to have the code still being actively developed.
Developing a patronage culture
And again, all this discussion is happening in an industry where billion dollar valuations are kicking around.
Large parts of our industry, are built on a foundation of FOSS - but those with the most capacity to contribute, in many cases, aren't contributing.
Some are - there are some companies that do give huge amounts back to open source communities. But there are also a lot of companies that don't give back.
And even those that do give back - many of them give the help they're able to give, which isn't necessarily the help that is needed. The Django Software Foundation regularly receives offers for free or subsidized hosting from hosting providers. And don't get me wrong - that's great, and incredibly useful.
But what we really need is someone on the payroll to review bugs. Earlier on in the lifecycle of Django, we needed people to write new big features. We're in a constant need of graphic designers, artists, tech writers. We need people who know how to do sysadmin, and community outreach. And we don't need 400 people donating 1 hour; we need 10 very specific people donating 40 hours.
When a disaster hits, organizations like the Red Cross call for donations. And they will usually say "please give us money, not tins of food or blankets". The reason? Money can buy what is needed. You can't control what people donate, and even if they do happen to donate exactly what you need, there's a huge logistic task of working out what has been donated, and getting it where it's needed.
FOSS projects are much the same - they require resources. Some companies donate in kind, and that's great, but it's rarely what is needed, and it's easy to get distracted working out what to do with all resources that have been donated, but you don't have an immediate use for.
And sadly, just as with charities - many companies don't donate at all. And I'm not claiming these companies are being deliberately malicious in not funding open source - if anything, we as a community have failed them because we haven't helped them help us.
Most importantly, we don't have a mechanism in place to make it easy to spend money, and easy to receive money. The current state of affairs clearly demonstrates that it's not enough that there is a lot of money in a community - you have to make the mechanism for donating that money as obvious and seamless as possible, and you have to have someone to direct that money towards.
One model community that I think does this really well is the Wordpress community. Wordpress is GPL'd software. There are books, videos, blogs on how to write wordpress plugins and themes - the same as there would be for any open source software community. But criticially, there are also books, videos and blogs on how to make a business writing Wordpress plugins and themes. Wordpress is GPL. And therefore, so are all the plugins. They've got a store where you can buy plugins (and get free ones), and easily install them in your Wordpress instance.
The Wordpress ecosystem has fundamentally embraced the fact that money needs to be part of the equation, and by doing so, they've created an industry that is self-funding - and this, I'd argue, is one big reason for the success of Wordpress as a platform.
A controversial proposal
In that vein, I'd like to make a controversial proposal.
What if PyPI was a revenue stream? What if, when I registered an application with PyPI, I could specify a price - either per install, or per version, or per-app, or per-month. When I pip install, the cost just gets added to my PyPI account, and I get a monthly charge on my credit card; and that gets passed back to the developer projects.
If PyPI stats are to be believed, Django is downloaded from PyPI more than 1 million times per month. If there was a 10c toll on each download, that would raise $100,000 per month - enough to pay for 10 full-time well paid developers, or a metric buttload of diversity, outreach, DjangoGirls, and community support.
You could also leverage PyPi dependency information to pay it forward - if you're writing a project that depends on another, you could opt to pass some of your revenue upstream. Or, request that downstreams provide a tithe.
And then there's the biggest dependencies of all - PyPI and Python itself. If PyPI took a small cut of the revenue raised, that could help pay for the development and maintenance of PyPI - and potentially Python as well.
You could also take some of the money collected and put it into a development pool - so if there's a new project that needs some bootstrapping cash to get started, or an established project that needs some help for a big task (like a Python 3 upgrade), the community as a whole can easily funnel money towards that project.
What about people who don't have money? Well, we can offer free passes. You're at a DjangoGirls event? Use this promo code to get free access.
You can still give away your software at no cost. And it's still free software - you're still delivering Python, so you're still going to get the source. You could probably even make the signing up process completely optional. All we'd be doing is paving the cowpath - making it easy to collect a toll on the easy way to use your software.
I'm not going to say that any of this would be easy to implement. Completely aside from any technical challenge and nailing down details, it's a big philosophical change for the community, and that's a much bigger issue. But this philosophical question about money is a discussion that we, as a community need to have.
What we've got here is a tragedy of the commons. Everyone agrees this software is good. Everyone knows that software needs to be maintained. But why should one individual company take on the economic burden of funding maintenance when their competitors are getting the benefit for free?
Learn from the MPAA and RIAA: Make it easy to do the right thing
For me, the key lesson from the music and movie industry from the last 20 years is that if you make it easy to do the right thing, people will do the right thing. If you make it simple and seamless for a large company to swipe their credit card and get charged $100 a month for using all this open source software that has been developed, they will. Yes, people will cheat. Don't worry about it, as long as cheating isn't easier than doing the right thing.
The biggest reason that this isn't happening right now is that it isn't obvious what the "right" thing is. I could donate to the DSF or PSF... but it's a bit fiddly to organize, and it's not clear where the money goes... I could sign up to a few Patreons, or sign up to a bug bounty program... but not everybody is there, and those that are aren't really earning enough to make a living, and I don't want to just contribute beer money.
So what are we to do? The short answer is "I don't know". There probably isn't going to be a silver bullet. I'd like to think that a monetization channel on PyPI would work, but I'll be the first to admit that it might not be the answer, or there might be problems I haven't forseen.
But I firmly believe this conversation about money is conversation we have to have. Free software as a movement is over 30 years old. Open source is almost 20. We've conquered the technical hurdles. We're even conquered the policy hurdles. Now it's time to tackle how we make this a long term sustainable industry, not just commercial interests exploiting the naïveté of a stream of doe-eyed volunteers.
I'm actually going to take another slightly controversial position, and not take questions at this point, for two reasons - firstly, because this sort of talk is a magnet for "I have a comment, not a question" responses; but also because this isn't a subject where I have the answers.
This is a discussion we need to have as a community - and I'm eager to have that discussion - just not on this stage.