Python supercharged for fastai development
Welcome to fastcore
Python is a powerful, dynamic language. Rather than bake everything into
the language, it lets the programmer customize it to make it work for
them. fastcore uses this flexibility to add to Python features inspired
by other languages we’ve loved, like multiple dispatch from Julia,
mixins from Ruby, and currying, binding, and more from Haskell. It also
adds some “missing features” and cleans up some rough edges in the
Python standard library, such as simplifying parallel processing, and
bringing ideas from NumPy over to Python’s list type.
Installing
To install fastcore run: conda install fastcore (if you use Anaconda,
which we strongly recommend) or pip install fastcore. For an editable
install, clone this repo and run: pip install -e ".[dev]".
fastcore is tested to work on Ubuntu, macOS and Windows, for the
versions tagged with the -latest suffix in these docs.
A tour
fastcore contains many features. See the docs for all the details,
which cover the modules provided:
- test: Simple testing functions
- foundation: Mixins, delegation, composition, and more
- xtras: Utility functions to help with functional-style programming, parallel processing, and more
- dispatch: Multiple dispatch methods
- transform: Pipelines of composed partially reversible transformations
Here’s a (somewhat) quick tour of a few highlights, showing examples from each of these modules.
Documentation
All fast.ai projects, including this one, are built with nbdev, which is a full literate programming environment built on Jupyter Notebooks. That means that every piece of documentation, including the page you’re reading now, can be accessed as interactive Jupyter notebooks. In fact, you can even grab a link directly to a notebook running interactively on Google Colab - if you want to follow along with this tour, click the link below, or click the badge at the top of the page:
colab_link('index')
The full docs are available at fastcore.fast.ai. The code in the
examples and in all fast.ai libraries follows the fast.ai style guide.
In order to support interactive programming, all fast.ai libraries are
designed to allow for import * to be used safely, particularly by
ensuring that __all__ is defined in all packages. In order to see where
a function is from, just type it:
coll_repr
<function fastcore.foundation.coll_repr(c, max_n=10)>
For more details, including a link to the full documentation and source
code, use doc, which pops up a window with this information:
doc(coll_repr)
The documentation also contains links to any related functions or classes, which appear like this: coll_repr (in the notebook itself you will just see a word with back-ticks around it; the links are auto-generated in the documentation site). The documentation will generally show one or more examples of use, along with any background context necessary to understand them. As you’ll see, the examples for each function and method are shown as tests, rather than example outputs, so let’s start by explaining that.
Testing
fastcore’s testing module is designed to work well with nbdev, which is a full literate programming environment built on Jupyter Notebooks. That means that your tests, docs, and code all live together in the same notebook. fastcore and nbdev’s approach to testing starts with the premise that all your tests should pass. If one fails, no more tests in a notebook are run.
Tests look like this:
test_eq(coll_repr(range(1000), 5), '(#1000) [0,1,2,3,4...]')
That’s an example from the docs for coll_repr. As you see, it’s not showing you the output directly. Here’s what that would look like:
coll_repr(range(1000), 5)
'(#1000) [0,1,2,3,4...]'
So, the test is actually showing you what the output looks like, because
if the function call didn’t return '(#1000) [0,1,2,3,4...]', then the
test would have failed.
So every test shown in the docs is also showing you the behavior of the library — and vice versa!
Test functions always start with test_, and then follow with the
operation being tested. So test_eq tests for equality (as you saw in the
example above). This includes tests for equality of arrays and tensors,
lists and generators, and many more:
test_eq([0,1,2,3], np.arange(4))
When a test fails, it prints out information about what was expected:
test_eq([0,1,2,3], np.arange(3))
----
AssertionError: ==:
[0, 1, 2, 3]
[0 1 2]
If you want to check that objects are the same type, rather than just containing the same collection, use test_eq_type.
You can test with any comparison function using test, e.g. to test whether one object is less than another:
test(2, 3, operator.lt)
You can even test that exceptions are raised:
def divide_zero(): return 1/0
test_fail(divide_zero)
…and test that things are printed to stdout:
test_stdout(lambda: print('hi'), 'hi')
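To make the test_ naming convention concrete, here is a minimal sketch of how a generic comparison helper and an equality helper of this kind could be written in plain Python. The names sketch_test and sketch_test_eq are hypothetical, and this is not fastcore's implementation, which also handles arrays, tensors, and generators.

```python
import operator

def sketch_test(a, b, cmp, cname=None):
    """Assert that cmp(a, b) holds; report both values on failure."""
    if cname is None:
        cname = cmp.__name__
    assert cmp(a, b), f"{cname}:\n{a}\n{b}"

def sketch_test_eq(a, b):
    """Equality test built on the generic comparison helper."""
    sketch_test(a, b, operator.eq, "==")

sketch_test_eq([1, 2], [1, 2])     # passes silently
sketch_test(2, 3, operator.lt)     # passes because 2 < 3

# A failing test raises AssertionError carrying both values.
try:
    sketch_test_eq([0, 1, 2, 3], [0, 1, 2])
except AssertionError as e:
    failure_message = str(e)
```

Because a passing test is silent and a failing one raises, a line such as sketch_test_eq(f(x), expected) doubles as documentation of what f(x) returns.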
Foundations
fast.ai is unusual in that we often use mixins in our code. Mixins are widely used in many programming languages, such as Ruby, but not so much in Python. We use mixins to attach new behavior to existing libraries, or to allow modules to add new behavior to our own classes, such as in extension modules. One useful example of a mixin we define is Path.ls, which lists a directory and returns an L (an extended list class which we’ll discuss shortly):
p = Path('images')
p.ls()
(#6) [Path('images/mnist3.png'),Path('images/att_00000.png'),Path('images/att_00005.png'),Path('images/att_00007.png'),Path('images/att_00006.png'),Path('images/puppy.jpg')]
You can easily add your own mixins with the patch decorator, which takes advantage of Python 3 function annotations to say what class to patch:
@patch
def num_items(self:Path): return len(self.ls())
p.num_items()
6
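The idea of choosing the target class from an annotation can be sketched with the standard library alone. The decorator name sketch_patch and the method name num_entries below are hypothetical, and the real patch decorator does more than this; treat it as an illustration of the mechanism only.

```python
import inspect
import tempfile
from pathlib import Path

def sketch_patch(f):
    """Attach f as a method to the class named in its first parameter's annotation."""
    first_param = next(iter(inspect.signature(f).parameters.values()))
    cls = first_param.annotation          # e.g. Path for `self: Path`
    setattr(cls, f.__name__, f)
    return f

@sketch_patch
def num_entries(self: Path):
    # Count the entries directly inside this directory.
    return len(list(self.iterdir()))

# Usage: every Path object now has the new method.
with tempfile.TemporaryDirectory() as d:
    Path(d, "a.txt").write_text("x")
    Path(d, "b.txt").write_text("y")
    n = Path(d).num_entries()
```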
We also use **kwargs frequently. In Python, **kwargs in a parameter
list means “put any additional keyword arguments into a dict called
kwargs”. Normally, using kwargs makes an API quite difficult to work
with, because it breaks things like tab-completion and popup lists of
signatures. utils provides use_kwargs and delegates to avoid this
problem. See our detailed article on delegation for more on this topic.
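One way such a fix can work is by rewriting the function's reported signature. The sketch below uses only inspect from the standard library; sketch_delegates, base, and wrapper are hypothetical names, and the approach is an assumption about the general technique rather than a copy of how delegates is implemented.

```python
import inspect

def sketch_delegates(to):
    """Replace **kwargs in the decorated function's signature with to's keyword defaults."""
    def decorator(f):
        sig = inspect.signature(f)
        params = dict(sig.parameters)
        params.pop("kwargs")   # assumes the catch-all parameter is literally named kwargs
        extra = {name: p.replace(kind=inspect.Parameter.KEYWORD_ONLY)
                 for name, p in inspect.signature(to).parameters.items()
                 if p.default is not inspect.Parameter.empty and name not in params}
        # Tools that read __signature__ (help, tab-completion) now see the real names.
        f.__signature__ = sig.replace(parameters=list(params.values()) + list(extra.values()))
        return f
    return decorator

def base(a, b=2, c=3):
    return a + b + c

@sketch_delegates(base)
def wrapper(x, **kwargs):
    return base(x, **kwargs)

shown_signature = str(inspect.signature(wrapper))
```

Only the displayed signature changes; calls such as wrapper(1, b=10) still flow through **kwargs exactly as before.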
GetAttr solves a similar problem (and is also discussed in the article
linked above): it allows you to use Python’s exceptionally useful
__getattr__ magic method, but avoids the problem that normally in Python
tab-completion and docs break when using this. For instance, you can see
here that Python’s dir function, which is used to find the attributes of
a Python object, finds everything inside the attribute named by _default
(here, author):
class Author:
    def __init__(self, name): self.name = name
class ProductPage(GetAttr):
    _default = 'author'
    def __init__(self,author,price,cost): self.author,self.price,self.cost = author,price,cost
p = ProductPage(Author("Jeremy"), 1.50, 0.50)
[o for o in dir(p) if not o.startswith('_')]
['author', 'cost', 'name', 'price']
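The combination of __getattr__ and __dir__ that makes this possible can be sketched as follows. SketchGetAttr and Page are hypothetical names, and this is a simplified illustration of the pattern, not the GetAttr class itself.

```python
class SketchGetAttr:
    """Forward unknown attribute lookups to the attribute named by _default."""
    _default = "default"

    def __getattr__(self, name):
        # Only called when normal lookup fails. Private names and the delegate
        # itself are not forwarded, which avoids infinite recursion.
        if name.startswith("_") or name == self._default:
            raise AttributeError(name)
        return getattr(getattr(self, self._default), name)

    def __dir__(self):
        # Merge in the delegate's attributes so dir() and tab-completion see them.
        return sorted(set(super().__dir__()) | set(dir(getattr(self, self._default))))

class Author:
    def __init__(self, name): self.name = name

class Page(SketchGetAttr):
    _default = "author"
    def __init__(self, author, price): self.author, self.price = author, price

page = Page(Author("Jeremy"), 1.50)
public = [o for o in dir(page) if not o.startswith("_")]
```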
Looking at that ProductPage example, it’s rather verbose and duplicates
a lot of attribute names, which can lead to bugs later if you change
them only in one place. fastcore provides store_attr to simplify this
common pattern. It also provides basic_repr to give simple objects a
useful repr:
class ProductPage:
    def __init__(self,author,price,cost): store_attr()
    __repr__ = basic_repr('author,price,cost')
ProductPage("Jeremy", 1.50, 0.50)
ProductPage(author='Jeremy', price=1.5, cost=0.5)
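For readers curious how a zero-argument helper can know which attributes to store, here is one possible sketch that inspects the caller's frame. It relies on sys._getframe, which is CPython-specific, and the names sketch_store_attr and Product are hypothetical; it is an assumption about the general technique, not a description of store_attr's actual code.

```python
import sys

def sketch_store_attr():
    """Copy the calling method's arguments onto its first argument (self)."""
    frame = sys._getframe(1)                  # the caller's frame (CPython-specific)
    code = frame.f_code
    self_name, *names = code.co_varnames[:code.co_argcount]
    obj = frame.f_locals[self_name]
    for name in names:
        setattr(obj, name, frame.f_locals[name])

class Product:
    def __init__(self, author, price, cost):
        sketch_store_attr()                   # sets self.author, self.price, self.cost

    def __repr__(self):
        fields = ", ".join(f"{k}={v!r}" for k, v in vars(self).items())
        return f"{type(self).__name__}({fields})"

item = Product("Jeremy", 1.50, 0.50)
```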
One of the most interesting fastcore functions is the funcs_kwargs
decorator. This allows class behavior to be modified without
sub-classing. This can allow folks that aren’t familiar with
object-oriented programming to customize your class more easily. Here’s
an example of a class that uses funcs_kwargs:
@funcs_kwargs
class T:
    _methods=['some_method']
    def __init__(self, **kwargs): assert not kwargs, f'Passed unknown args: {kwargs}'
p = T(some_method = print)
p.some_method("hello")
hello
The assert not kwargs above is used to ensure that the user doesn’t
pass an unknown parameter (i.e. one that’s not in _methods). fastai
uses funcs_kwargs in many places; for instance, you can customize any
part of a DataLoader by passing your own methods.
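The mechanism can be approximated in a few lines by wrapping __init__ and pulling the named methods out of the keyword arguments. The names sketch_funcs_kwargs and Greeter are hypothetical, and the real decorator has more features; this is only a sketch of the idea.

```python
import functools

def sketch_funcs_kwargs(cls):
    """Let callers override the methods listed in cls._methods via keyword arguments."""
    original_init = cls.__init__

    @functools.wraps(original_init)
    def __init__(self, *args, **kwargs):
        for name in cls._methods:
            if name in kwargs:
                # Stored on the instance as a plain function, so no self is passed.
                setattr(self, name, kwargs.pop(name))
        original_init(self, *args, **kwargs)

    cls.__init__ = __init__
    return cls

@sketch_funcs_kwargs
class Greeter:
    _methods = ["greet"]
    def __init__(self, **kwargs):
        assert not kwargs, f"Passed unknown args: {kwargs}"
    def greet(self, name):
        return f"hello {name}"

default = Greeter()
custom = Greeter(greet=lambda name: f"hi {name}")
```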
fastcore also provides many utility functions that make a Python
programmer’s life easier, in fastcore.utils. We won’t look at many
here, since you can easily look at the docs yourself. To get you
started, have a look at the docs for chunked (remember, if you’re in a
notebook, type doc(chunked)), which is a handy function for creating
lazily generated batches from a collection.
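Lazily generated batches can be sketched with itertools.islice. The function name sketch_chunked and its chunk_sz parameter are hypothetical and do not mirror the signature of chunked; see its docs for the real options.

```python
from itertools import islice

def sketch_chunked(items, chunk_sz):
    """Lazily yield successive lists of at most chunk_sz items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, chunk_sz))   # pulls only chunk_sz items at a time
        if not chunk:
            return
        yield chunk

batches = list(sketch_chunked(range(7), 3))
```

Because it is a generator, nothing is read from the source until a batch is requested, so it also works on iterators too large to hold in memory.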
Python’s ProcessPoolExecutor is extended to allow max_workers to be set
to 0, to easily turn off parallel processing. This makes it easy to
debug your code in serial, then run it in parallel. It also allows you
to pass arguments to your parallel function, and to ensure there’s a
pause between calls, in case the process you are running has race
conditions.
parallel makes parallel processing even easier to use, and even adds an
optional progress bar.
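The serial fallback can be sketched with the standard library's ProcessPoolExecutor. The function sketch_parallel_map and its n_workers parameter are hypothetical names chosen for this illustration; they are not the API of fastcore's executor or of parallel.

```python
from concurrent.futures import ProcessPoolExecutor

def sketch_parallel_map(f, items, n_workers=0):
    """Map f over items, serially when n_workers is 0, otherwise in worker processes."""
    if n_workers == 0:
        # Serial path: same results, but easy to step through in a debugger.
        return [f(o) for o in items]
    with ProcessPoolExecutor(max_workers=n_workers) as ex:
        return list(ex.map(f, items))

def square(x):
    return x * x

serial_result = sketch_parallel_map(square, range(5), n_workers=0)
```

When n_workers is greater than 0, f must be picklable (for example, a top-level function), and on platforms that spawn new processes the call should sit under an if __name__ == "__main__": guard.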
L
Like most languages, Python allows for very concise syntax for some very
common types, such as list, which can be constructed with [1,2,3].
Perl’s designer Larry Wall explained the reasoning for this kind of
syntax:
In metaphorical honor of Huffman’s compression code that assigns smaller numbers of bits to more common bytes. In terms of syntax, it simply means that commonly used things should be shorter, but you shouldn’t waste short sequences on less common constructs.
On this basis, fastcore has just one type that has a single letter
name: L. The reason for this is that it is designed to be a replacement
for list, so we want it to be just as easy to use as [1,2,3]. Here’s
how to create that as an L:
L(1,2,3)
(#3) [1,2,3]
The first thing to notice is that an L object includes in its
representation its number of elements; that’s the (#3) in the output
above. If there are more than 10 elements, it will automatically
truncate the list:
p = L.range(20).shuffle()
p
(#20) [0,10,7,16,5,1,14,17,9,8...]
L contains many of the same indexing ideas that NumPy’s array does,
including indexing with a list of indexes, or a boolean mask list:
p[2,4,6]
(#3) [7,5,14]
It also contains other methods used in array, such as L.argwhere:
p.argwhere(ge(15))
(#5) [3,7,11,14,18]
As you can see from this example, fastcore also includes a number of
features that make a functional style of programming easier, such as a
full range of boolean functions (e.g. ge, gt, etc.) which give the same
answer as the functions from Python’s operator module if given two
parameters, but return a curried function if given one parameter.
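The two-or-one-argument behavior described here can be sketched on top of the operator module. The names sketch_curried and ge_ are hypothetical, and the sketch only illustrates the idea; it is not how fastcore defines ge or gt.

```python
import operator

_MISSING = object()   # sentinel so that None can still be compared

def sketch_curried(op):
    """With two arguments, apply op directly; with one, return a partially applied test."""
    def f(a, b=_MISSING):
        if b is _MISSING:
            return lambda o: op(o, a)   # ge_(15)(x) means x >= 15
        return op(a, b)
    return f

ge_ = sketch_curried(operator.ge)

direct = ge_(3, 2)                       # same as operator.ge(3, 2)
values = [10, 20, 15, 3]
hits = [i for i, o in enumerate(values) if ge_(15)(o)]   # indexes of items >= 15
```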
There’s too much functionality to show it all here, so be sure to check
the docs. Many little things are added that we thought should have been
in list in the first place, such as making this do what you’d expect
(which is an error with list, but works fine with L):
1 + L(2,3,4)
(#4) [1,2,3,4]
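A few of these behaviors (indexing with several indexes or a boolean mask, a size-prefixed repr, and adding a scalar on the left) can be imitated with a small list subclass. SketchL is a hypothetical class written for this illustration; it is not how L is implemented and covers only a fraction of what L does.

```python
class SketchL(list):
    """A list subclass with a few array-like conveniences."""

    def __getitem__(self, idx):
        if isinstance(idx, (list, tuple)):
            if idx and all(isinstance(i, bool) for i in idx):
                # Boolean mask: keep the items whose mask entry is True.
                return SketchL(o for o, keep in zip(self, idx) if keep)
            # Several indexes: p[2,4,6] arrives here as the tuple (2, 4, 6).
            return SketchL(list.__getitem__(self, i) for i in idx)
        res = list.__getitem__(self, idx)
        return SketchL(res) if isinstance(idx, slice) else res

    def __radd__(self, other):
        # Called for `1 + SketchL(...)`, where a plain list would raise TypeError.
        other = list(other) if isinstance(other, (list, tuple)) else [other]
        return SketchL(other + list(self))

    def __repr__(self):
        items = ",".join(repr(o) for o in list(self)[:10])
        more = "..." if len(self) > 10 else ""
        return f"(#{len(self)}) [{items}{more}]"

p = SketchL(range(10, 20))
picked = p[2, 4, 6]
masked = SketchL([1, 2, 3])[[True, False, True]]
prepended = 1 + SketchL([2, 3, 4])
```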
Function dispatch and Transforms
Most Python programmers use object oriented methods and inheritance to
allow different objects to behave in different ways even when called
with the same method name. Some languages use a very different approach,
such as Julia, which uses multiple dispatch generic functions. Python
provides single dispatch generic functions as part of the standard
library. fastcore provides multiple dispatch, with the typedispatch
decorator (which is actually an instance of DispatchReg):
@typedispatch
def _f(x:numbers.Integral, y): return x+1
@typedispatch
def _f(x:int, y:float): return x+y
_f(3,2.0), _f(3,2)
(5.0, 4)
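For comparison, the single dispatch support in Python's standard library is functools.singledispatch, which selects an implementation from the type of the first argument only. The function name describe is made up for this illustration.

```python
import numbers
from functools import singledispatch

@singledispatch
def describe(x):
    # Fallback used when no more specific type is registered.
    return f"object: {x!r}"

@describe.register
def _(x: numbers.Integral):
    return f"integer: {x}"

@describe.register
def _(x: float):
    return f"float: {x}"

results = (describe(3), describe(2.0), describe("a"))
```

Only the first argument takes part in the choice here, whereas the typedispatch example above also looks at the type of the second argument.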
This approach to dispatch is particularly useful for adding implementations of functionality in extension modules or user code. It is heavily used in the Transform class. A Transform is the main building block of the fastai data pipelines. In the most general terms a transform can be any function you want to apply to your data, however the Transform class provides several mechanisms that make the process of building them easy and flexible (see the docs for information about each of these):
- Type dispatch
- Dispatch over tuples
- Reversibility
- Type propagation
- Preprocessing
- Filtering based on the dataset type
- Ordering
- Appending new behavior with decorators
Transform looks for three special methods, encodes, decodes, and
setups, which provide the implementation for __call__, decode, and
setup respectively. For instance:
class A(Transform):
    def encodes(self, x): return x+1
A()(1)
2
For simple transforms like this, you can also use Transform as a decorator:
@Transform
def f(x): return x+1
f(1)
2
Transforms can be composed into a Pipeline:
@Transform
def g(x): return x/2
pipe = Pipeline([f,g])
pipe(3)
2.0
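To illustrate what composing partially reversible transformations means, here is a plain-Python sketch in which each step has an encoder and an optional decoder, and the pipeline decodes in reverse order. SketchTransform and SketchPipeline are hypothetical classes; they leave out type dispatch, setup, ordering, and the other features listed above.

```python
class SketchTransform:
    """Pair an encoder with an optional decoder (identity when not given)."""
    def __init__(self, enc, dec=None):
        self.enc = enc
        self.dec = dec if dec is not None else (lambda x: x)
    def __call__(self, x): return self.enc(x)
    def decode(self, x): return self.dec(x)

class SketchPipeline:
    """Apply transforms in order; decode by undoing them in reverse order."""
    def __init__(self, tfms): self.tfms = list(tfms)
    def __call__(self, x):
        for t in self.tfms:
            x = t(x)
        return x
    def decode(self, x):
        for t in reversed(self.tfms):
            x = t.decode(x)
        return x

add_one = SketchTransform(lambda x: x + 1, lambda x: x - 1)
halve = SketchTransform(lambda x: x / 2, lambda x: x * 2)
pipe = SketchPipeline([add_one, halve])

encoded = pipe(3)               # (3 + 1) / 2
decoded = pipe.decode(encoded)  # (2.0 * 2) - 1
```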
The power of Transform and Pipeline is best understood by seeing how they’re used to create a complete data processing pipeline. This is explained in chapter 11 of the fastai book, which is available for free in Jupyter Notebook format.
Contributing
After you clone this repository, please run nbdev_install_hooks in
your terminal. This sets up git hooks, which clean up the notebooks to
remove the extraneous stuff stored in the notebooks (e.g. which cells
you ran) which causes unnecessary merge conflicts.
To run the tests in parallel, launch nbdev_test.
Before submitting a PR, check that the local library and notebooks
match.
- If you made a change to the notebooks in one of the exported cells, you can export it to the library with nbdev_prepare.
- If you made a change to the library, you can export it back to the notebooks with nbdev_update.