Some basic Python Optimizations


For CPython, if you want performance you should prefer algorithmic optimizations (i.e. find better algorithms for what you're doing) or, if that's not possible, rewrite parts of your code in Cython.

You should generally aim for readability and avoid micro-optimizations - except in cases where there's no trade-off, like comprehension expressions: they are both more readable and faster (they are special-cased and emit fewer instructions than loops, because there's no possibility of break, continue, or return - see compile.c, Ctrl+F, "List and set comprehensions").
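For example, here is a quick and informal comparison (the helper names and iteration counts are mine, and timings vary by machine and Python version):

```python
from timeit import timeit

def with_loop():
    result = []
    for i in range(1000):
        result.append(i * 2)   # attribute lookup + method call every iteration
    return result

def with_comprehension():
    return [i * 2 for i in range(1000)]   # special-cased by the compiler

# Both produce the same list.
assert with_loop() == with_comprehension()

print("loop:         ", timeit(with_loop, number=2000))
print("comprehension:", timeit(with_comprehension, number=2000))
```

On CPython the comprehension typically comes out measurably faster, but the point stands: prefer it because it reads better, and treat the speedup as a bonus.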

You should always profile your application before optimizing, with real (or realistic) data and in as complete a state as possible - concentrate on optimizing the things that actually matter, not the things you think matter.
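A minimal sketch of that workflow with the standard library's cProfile (the function names here are placeholders for your own code):

```python
import cProfile
import io
import pstats

def expensive():
    return sum(i * i for i in range(100000))

def cheap():
    return 42

def main():
    expensive()
    cheap()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Show the biggest time sinks first - optimize those, not your hunches.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
print(stream.getvalue())
```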

You have been warned. On to the fun part.

    >>> from dis import dis
    >>> def fn(a):
    ...     b = a
    >>> dis(fn)
      2           0 LOAD_FAST                0 (a)
                  3 STORE_FAST               1 (b)
                  6 LOAD_CONST               0 (None)
                  9 RETURN_VALUE
    >>> dis(compile("b = a", "wtf", "exec"))
      1           0 LOAD_NAME                0 (a)
                  3 STORE_NAME               1 (b)
                  6 LOAD_CONST               0 (None)
                  9 RETURN_VALUE

As you can probably guess, LOAD_FAST and its counterpart STORE_FAST are faster than LOAD_NAME and STORE_NAME. The reason is that local variable (and parameter) access in functions is optimized: instead of full dictionary lookups, it uses integer offsets into an array. All local variables must be known during compilation for this offset table to work, which is why you can't dynamically add local variables to functions.
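You can poke at that offset table yourself - the compiler records every local name in the function's code object (a small illustration; the function names are mine):

```python
from timeit import timeit

x = 1  # global: looked up by name at run time

def f_global():
    return x

def f_local():
    x = 1  # local: accessed by integer offset (LOAD_FAST)
    return x

# All locals are known at compile time and stored in co_varnames;
# LOAD_FAST/STORE_FAST index into this table.
print(f_local.__code__.co_varnames)   # ('x',)
print(f_global.__code__.co_varnames)  # ()

print("global:", timeit(f_global))
print("local: ", timeit(f_local))
```

The local version is typically a bit faster, though the gap for a single lookup is small - this matters mostly in tight loops.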

About why a = a + 1 is faster than a += 1:

    >>> def fn(n):
    ...     n = n + 1
    >>> dis(fn)
      2           0 LOAD_FAST                0 (n)
                  3 LOAD_CONST               1 (1)
                  6 BINARY_ADD
                  7 STORE_FAST               0 (n)
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE
    >>> def fn(n):
    ...     n += 1
    >>> dis(fn)
      2           0 LOAD_FAST                0 (n)
                  3 LOAD_CONST               1 (1)
                  6 INPLACE_ADD
                  7 STORE_FAST               0 (n)
                 10 LOAD_CONST               0 (None)
                 13 RETURN_VALUE

So the only difference is INPLACE_ADD versus BINARY_ADD ("binary" has nothing to do with binary numbers here - it just means an operation with two operands).

These instructions are defined in ceval.c - to spare you the reading, they only differ in that for numbers one calls PyNumber_Add() while the other calls PyNumber_InPlaceAdd(). Those two are defined in abstract.c, where the main difference is that they call either binary_op1 or binary_iop1, defined in the same file.

binary_iop1 first checks whether the object supports in-place operations (i.e., modifying the existing object instead of creating and returning a new one), but the answer in the case of integers (intobject.c) and floats (floatobject.c) is "no" - their nb_inplace_add slot is zero, so a new object is always created (except that integers from -5 to 256 inclusive are reused from a pool, which is transparent to this code).

However, this won't be determined before a couple of pointer dereferences and conditional branches have happened and execution has fallen back to the non-in-place version. That is the overhead you are seeing (less than 5% on my computer; sometimes I even get identical results).
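You can observe the difference from Python with a type that does implement nb_inplace_add, like list (a small demonstration; the variable names are mine, and the id() check relies on CPython specifics):

```python
# Ints don't support in-place add: += rebinds the name to a new object.
n = 1000
old_id = id(n)
n += 1
assert id(n) != old_id  # different object (1000 is still alive as a constant)

# Lists do: list.__iadd__ mutates the existing object, so aliases see it.
lst = [1, 2]
alias = lst
lst += [3]          # in-place: same object is modified
assert alias == [1, 2, 3]

lst2 = [1, 2]
alias2 = lst2
lst2 = lst2 + [3]   # non-in-place: builds a brand new list
assert alias2 == [1, 2]
```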

Using a similar workflow, you can probably answer the rest of your questions yourself - just use the dis module to disassemble the Python code, then check how the instructions are handled in the CPython source code on GitHub.

It's getting late, so I'll just leave this here and go to bed:

    static long
    int_hash(PyIntObject *v)
    {
        /* XXX If this is changed, you also need to change the way
           Python's long, float and complex types are hashed. */
        long x = v->ob_ival;
        if (x == -1)
            x = -2;
        return x;
    }

    >>> [hash(n) for n in range(-4, 5)]
    [-4, -3, -2, -2, 0, 1, 2, 3, 4]
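The reason -1 is remapped is that CPython's C-level hash functions use -1 as their error return value, so no object may legitimately hash to it. You can see the same remapping with a custom __hash__ (the class name is mine):

```python
class MinusOne:
    def __hash__(self):
        return -1

# CPython silently turns a hash of -1 into -2, for any type:
print(hash(-1))          # -2
print(hash(MinusOne()))  # -2
assert hash(-1) == hash(-2) == hash(MinusOne()) == -2
```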

Guest Author

Vladislav Zorov

Programming enthusiast.
Lives in Bulgaria.

