The Decathlon of Computer Science
30 Mar 2007 from Jodok Batlogg, posted in Home, Plone, development, snowsprint and zope

or: “how to make zope/python at least 10 times faster”. Kudos to jukart, dobee, benji and j1m!

Over the last few weeks the lovely team has been working hard on adding speed to some of our zope3-based applications. We were fighting problems that ranged from wrong bits in the EEPROM of network interfaces to high-level conceptual and architectural problems. That’s why I had the idea to write the story of our Decathlon :)
Our sites have pretty different “load profiles”. I’ll pick two of them:

Lovely Books
It’s a lovely books community site :) Here are some stats: about 1,000 users, 82,000 books, 45,000 authors, 6,500 tags, 17,000 reviews,… in total a few million different objects in ZODB (and groooooowing).

  • Users are mainly accessing “the long tail”.
    The database is being accessed randomly.
  • Every user gets a personalized view.
    As soon as the user is logged in, 99% of all pages can’t be cached.
  • We have a lot of pages online.
    Google has roughly 170,000 different pages in its cache, so caching all the pages is pretty senseless.
  • Asynchronous tasks might block application server threads.
    Calls to Amazon’s web services and the like block server threads while they are running.
  • Adding books, rating, commenting, tagging and making friends change the relations, results and friends all the time.
    An intelligent solution for cache invalidation is needed.

Videoportal
Is a local video portal with quite some traffic. A few months after the official launch we’re serving up to 60MBit/s of video and have roughly half a terabyte of data online. The top 10 videos have been viewed more than 100,000 times.

  • Users are hammering top videos.
    The main traffic is static (video data) – and can be cached.
  • Logged in users get personalized pages.
    Sometimes it’s just the name of the logged in user on top of the page.
  • Live stats are important.
    We need to keep track of the number of videos viewed, and we need to keep track of it on a pro rata temporis basis.

“Premature optimization is the root of all evil.” (C. A. R. Hoare)

That’s what everybody says :) And that’s what we did. But sooner or later it’s time to think about optimizations in a dedicated way. Here’s the Lovely Systems Decathlon:

100 m: lovely.viewcache 0.1
Let’s start with a sprint: at snow-sprint we started implementing a view cache for zope3. Just a few words about it. Lovely Portals are usually built with Viewlets. The image below shows an example of one viewlet. The idea was to store already rendered Viewlets in RAM and speed up the rendering of expensive pages. Yes, the speedup was great (iirc 2x)! Check it out: http://svn.zope.org/lovely.viewcache/trunk/src/lovely/viewcache/
[Image: Viewlet]
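
To make the idea concrete, here is a minimal sketch of caching rendered viewlet HTML in RAM. It is not the lovely.viewcache API: the class `ViewCache`, its methods, the key format and the dependency ids are all hypothetical names chosen for illustration. It also shows one simple way to invalidate cached snippets when an object they depend on changes.

```python
import time


class ViewCache:
    """Tiny in-RAM cache for rendered viewlet HTML (illustrative only)."""

    def __init__(self):
        # key -> (html, expires_at, dependencies)
        self._entries = {}

    def render(self, key, render_func, lifetime=60.0, dependencies=()):
        """Return cached HTML for key; call render_func only on a miss."""
        entry = self._entries.get(key)
        now = time.time()
        if entry is not None and entry[1] > now:
            return entry[0]
        html = render_func()
        self._entries[key] = (html, now + lifetime, frozenset(dependencies))
        return html

    def invalidate(self, dependency):
        """Drop every cached entry that depends on the given object id."""
        stale = [key for key, (_, _, deps) in self._entries.items()
                 if dependency in deps]
        for key in stale:
            del self._entries[key]
        return len(stale)


calls = []


def render_book_teaser():
    calls.append(1)  # count how often the expensive rendering really runs
    return "<div class='teaser'>Book #42</div>"


cache = ViewCache()
first = cache.render("teaser:42", render_book_teaser, dependencies=["book:42"])
second = cache.render("teaser:42", render_book_teaser, dependencies=["book:42"])
removed = cache.invalidate("book:42")  # e.g. after someone rates the book
third = cache.render("teaser:42", render_book_teaser, dependencies=["book:42"])
```

The second call is served from RAM; only the first and third call actually render. Tracking dependencies per entry is one possible answer to the invalidation problem mentioned for Lovely Books, not necessarily the one the package uses.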

long jump: from mappingstorage via tempstorage to memcached
As we are using multiple Zope clients, we need a ZEO-based cache storage. At first we used the regular mapping storage to keep the cached viewlets.
But since mappingstorage has no conflict resolution, it is better to use tempstorage. It is available in the zope repository http://svn.zope.org/Zope/trunk/lib/python/tempstorage.

We still had lots of expensive conflict errors on the tempstorage, so we kept searching for a better solution and decided to switch to memcached. Memcached is “a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.”
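
The pattern behind such a switch can be sketched as a "get or render" helper that talks to a shared key/value cache instead of the object database. This is not the lovely.memcache API. `InMemoryClient` is a hypothetical stand-in that only mimics the `get`/`set` calls a memcached client typically offers, so the example runs without a memcached server, and `cache_key` and `cached_render` are names made up for this sketch.

```python
import hashlib
import pickle


class InMemoryClient:
    """Stand-in for a memcached client: only get/set on byte strings."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value
        return True


def cache_key(*parts):
    """Build a short key without whitespace from arbitrary parts."""
    raw = "|".join(str(part) for part in parts).encode("utf-8")
    return "view:" + hashlib.sha256(raw).hexdigest()


def cached_render(client, parts, render_func):
    """Return the cached value for parts, rendering and storing on a miss."""
    key = cache_key(*parts)
    blob = client.get(key)
    if blob is not None:
        return pickle.loads(blob)
    value = render_func()
    client.set(key, pickle.dumps(value))
    return value


shared = InMemoryClient()  # one cache shared by all application servers
renders = []


def render_teaser():
    renders.append(1)
    return "<p>top videos</p>"


from_server_a = cached_render(shared, ("teaser", 42, "en"), render_teaser)
from_server_b = cached_render(shared, ("teaser", 42, "en"), render_teaser)
```

Hashing the key parts keeps keys short and free of whitespace, which memcached requires. Because cache writes no longer go through ZODB transactions, they cannot cause conflict errors there.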

Well – the result was impressive. After switching to lovely.memcache we were able to serve roughly 50% more traffic with 50% less load…

[Image: Traffic]

[Image: Load]

Shot put: (Fighting) The hardware

A simple thing is to add more hardware – so we added a new cluster of 6 Boxes. Nice machines. Ubuntu 6.06 LTS, XEN Virtualization, Intel Dual-Cores, 4GB RAM, 3ware RAIDs, Intel E1000 Gigabit Ethernet, Linksys Switches. The setup in a few words: Reverse Proxies, PXE boot Zope Clients, ZEO Servers.

Wow! (you might think) – actually the performance was even worse :(
After several hours of debugging we found that our version of the Intel E1000 Server NIC caused the problems. The story is almost unbelievable: “Several NIC’s with the 82573 chipset display ‘TX unit hang’ messages during normal operation with the linux e1000 driver. The issue appears both with TSO enabled and disabled, and is caused by a power management function that is enabled in the EEPROM. Early releases of the chipsets to vendors had the EEPROM bit that enabled the feature. After the issue was discovered newer adapters were released with the feature disabled in the EEPROM.”

After setting some bits in EEPROM we had no problem transferring 400-600MBit/s over the wire. (If someone from Intel is reading this – apologies and chocolate accepted :))

high jump: make as much asynchronous as possible

As mentioned, we fetch data from external sites pretty often. The duration of Amazon queries ranges from several milliseconds to multiple seconds. Due to the nature of Zope, the threads handling requests are blocked during that time.
The amazonasync module provides a way to do amazon queries asynchronously to the zope threads.
There is another problem when doing amazon lookups: if we send too many queries in too short a time, amazon may stop answering our queries and lock us out completely for some time.

We are using lovely.remotetask to do the amazon lookup. As a first step we put all amazon tasks into one queue, which serializes all queries.
lovely.remotetask would also allow us to forward the queries to any machine which runs the amazon async service.

  1. browser request wants to do an amazon search
  2. a job is inserted into the remotetask
  3. the server returns some ajax code to the browser, which then polls for the result
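
The three steps above can be sketched with a single worker thread and a queue. This is not the lovely.remotetask API: `TaskService`, its methods and `fake_amazon_search` are hypothetical names, and the worker lives in the same process only to keep the example self-contained.

```python
import queue
import threading
import uuid


class TaskService:
    """Single-worker job queue: jobs run one at a time, off the request thread."""

    def __init__(self, handler):
        self._handler = handler
        self._jobs = queue.Queue()
        self._results = {}  # job id -> (state, payload)
        self._lock = threading.Lock()
        threading.Thread(target=self._run, daemon=True).start()

    def add(self, payload):
        """Step 2: enqueue a job and return its id immediately."""
        job_id = uuid.uuid4().hex
        with self._lock:
            self._results[job_id] = ("queued", None)
        self._jobs.put((job_id, payload))
        return job_id

    def status(self, job_id):
        """Step 3: what the polling ajax call would ask for."""
        with self._lock:
            return self._results[job_id]

    def join(self):
        """Block until all queued jobs are processed (for tests and demos)."""
        self._jobs.join()

    def _run(self):
        while True:
            job_id, payload = self._jobs.get()
            try:
                outcome = ("done", self._handler(payload))
            except Exception as exc:
                outcome = ("error", str(exc))
            with self._lock:
                self._results[job_id] = outcome
            self._jobs.task_done()


def fake_amazon_search(term):
    # Placeholder for the slow external call.
    return ["result for " + term]


service = TaskService(fake_amazon_search)
job = service.add("zope")       # the request thread returns right away
service.join()                  # a real browser would poll status() instead
state, result = service.status(job)
```

Having exactly one worker means the external queries are serialized, which also limits how fast the remote service is hit.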

That saved our threads once more :) There are now several places where we’re queueing asynchronous tasks in lovely.remotetask.

Btw: The status monitor Jürgen added to twisted is pretty helpful to diagnose the server :)

[Image: Threads]

400 m: Counting in Zope

We still had conflicts, especially when a lot of people were hammering the portal in parallel. The counting utility (every video view had to increment the counter) wasn’t able to handle the load. We invested a lot of time in resolving the conflicts and finally decided to switch to a MySQL-based counting utility. lovely.counting uses Python logging to make everything threadsafe and conflict-free. It can count hundreds of requests per second in an easy way:

>>> import lovely.counting
>>> counter = lovely.counting.getCounter()

>>> counter.count(intid)

>>> counter.mostViewed(fromTime, toTime, limit=10)
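
How lovely.counting works internally is not shown here, so the following is only a sketch of one way to combine logging with an SQL table for conflict-free counting. Every view appends a row instead of updating a shared counter object. `CountingHandler` and `most_viewed` are hypothetical names, and sqlite3 stands in for MySQL so the example runs anywhere.

```python
import logging
import sqlite3
import time


class CountingHandler(logging.Handler):
    """Append one row per counted view; rows are never updated in place."""

    def __init__(self, connection):
        super().__init__()
        self._db = connection
        self._db.execute("CREATE TABLE IF NOT EXISTS hits (intid INTEGER, ts REAL)")

    def emit(self, record):
        # The logging framework holds the handler's lock while emit() runs,
        # so concurrent threads write one after another.
        intid, ts = record.args
        self._db.execute("INSERT INTO hits VALUES (?, ?)", (intid, ts))


def most_viewed(connection, from_time, to_time, limit=10):
    """Return (intid, views) pairs for the time window, most viewed first."""
    rows = connection.execute(
        "SELECT intid, COUNT(*) AS n FROM hits WHERE ts >= ? AND ts < ? "
        "GROUP BY intid ORDER BY n DESC, intid LIMIT ?",
        (from_time, to_time, limit),
    )
    return rows.fetchall()


db = sqlite3.connect(":memory:", check_same_thread=False)
counter_log = logging.getLogger("counting.sketch")
counter_log.setLevel(logging.INFO)
counter_log.propagate = False
counter_log.addHandler(CountingHandler(db))

now = time.time()
for intid in (7, 7, 7, 3, 3, 9):
    counter_log.info("view %s at %s", intid, now)

top = most_viewed(db, now - 60, now + 60, limit=2)
```

Storing a timestamp per view is what makes time-window queries such as "most viewed between fromTime and toTime" possible.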

110m hurdles: through the ZODB and back

We still suspected that something was wrong somewhere in the ZEO protocol:

  • requests took very long, although the CPUs were idle
  • after startup the first requests took very, very long (loading large btrees)
  • tcpdump showed only a little traffic between zope and zeo (1 mbit out of 400 :))
  • apache kept a lot of connections open (hitting the MaxClients setting)

Benji and Jürgen were tcpdumping and wiresharking for several hours until they found that a request never took less than 10ms!!!
After further investigation we tracked the problem down to ZEO’s connection.py. Although it stated it should wait for only 1ms, the Zope client was always waiting 10ms!!!

And the reason is inside the Linux kernel: the default kernel frequency of our Ubuntu Server (and OSX and a lot of other systems) is 100Hz, which means the system can only wake up sleeping processes in “10ms steps” (you can read some background discussion at kerneltrap.org).
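
The effect is easy to check. The sketch below asks for a 1ms sleep and reports the shortest duration actually observed; `measure_sleep` is a name made up for this example. On a system whose sleeps are tied to a 100Hz tick one would expect results of around 10ms or more, while kernels with high-resolution timers should report values close to 1ms.

```python
import time


def measure_sleep(requested_seconds, repeats=20):
    """Return the shortest observed duration of time.sleep(requested_seconds)."""
    shortest = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        time.sleep(requested_seconds)
        shortest = min(shortest, time.perf_counter() - start)
    return shortest


requested = 0.001
observed = measure_sleep(requested)
print("asked for %.1f ms, got at least %.2f ms" % (requested * 1000, observed * 1000))
```
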
We fixed the problem three ways :):

  • Oliver and I built a kernel with a frequency of 1000Hz
  • Jürgen introduced nanosleep to wait less than 1ms

To describe the third measure I need to start the next discipline :)

discus throw: Let J1m fix it the right way :)

“Removed a needless timeout to a condition wait call. Using timeouts can cause significant delays, especially on systems with very coarse-grained sleeps, like most linux systems. This change makes the ZEO tests run about 25% faster on an Ubuntu desktop system. We suspect the production impact to be much greater, at least on some systems.
Removed some non-async code, now that we no-longer have a non-async mode.”
THAT added a huge performance gain!!! WOHOO!
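
Why dropping the timeout helps can be illustrated with Python’s `threading.Condition`. This is not the ZEO code, and `Mailbox` is a hypothetical class. A waiter that calls `wait()` without a timeout is woken directly by the notifying thread, so its latency no longer depends on how coarse the system’s sleep granularity is.

```python
import threading


class Mailbox:
    """Hand one reply from a network thread to a waiting request thread."""

    def __init__(self):
        self._cond = threading.Condition()
        self._reply = None
        self._ready = False

    def put(self, reply):
        with self._cond:
            self._reply = reply
            self._ready = True
            self._cond.notify_all()  # wakes the waiter right away

    def get(self):
        with self._cond:
            # The flag guards against spurious wakeups and against the
            # reply arriving before we start waiting.
            while not self._ready:
                self._cond.wait()    # no timeout: sleep until notified
            return self._reply


box = Mailbox()
sender = threading.Thread(target=box.put, args=("pong",))
sender.start()
reply = box.get()
sender.join()
```

A polling loop built on `wait(0.001)` would look similar but would be at the mercy of the timer, which is exactly the 1ms-becomes-10ms effect described above.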

pole vault: fixing the hardware, part #2

I’m totally bored by hardware, but this needs to be mentioned too: while PDBing we kept seeing TCP CRC errors, which was really strange. After even more googling it turned out that it is wise to switch off some offload features of the network card (TCP segmentation offload and TX checksumming). What a shame!
‘ethtool -K <interface> tso off; ethtool -K <interface> tx off’

There are two disciplines left, and believe me, we’re working on them right now:

Javelin throw: Let Apache or the client assemble the page.

It turns out that we need to assemble the pages within Apache – or if possible (Google search) even with Ajax within the Browser. This makes it possible to cache Viewlets / Snippets with Squid or Varnish in an unassembled state.
Stay tuned for the first results.
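
As a rough illustration of the idea, the sketch below assembles a page from separately cached snippets. The `<!-- include ... -->` marker syntax, the fragment names and the `assemble` function are all made up for this example. A real setup would use the include mechanism of Apache, or Ajax in the browser, rather than Python.

```python
import re

# Hypothetical marker format: <!-- include fragment-name -->
INCLUDE = re.compile(r"<!--\s*include\s+([\w:.-]+)\s*-->")


def assemble(skeleton, fetch_fragment):
    """Replace every include marker with the fragment it names."""
    return INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), skeleton)


# Fragments as a cache like Squid or Varnish might hold them, keyed by name.
fragments = {
    "viewlet:topvideos": "<ul><li>Video A</li></ul>",
    "viewlet:greeting:jodok": "<p>Hello jodok</p>",
}

skeleton = (
    "<body>"
    "<!-- include viewlet:greeting:jodok -->"
    "<!-- include viewlet:topvideos -->"
    "</body>"
)
page = assemble(skeleton, fragments.__getitem__)
```

The skeleton and the shared snippets (such as the top videos) stay cacheable for everyone; only the small personalized snippet differs per user.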

1500 m: Using PGStorage as Database, multiple writers

We (to be correct: dobee) invested a lot of time in polishing the PGStorage layer (as a replacement for ZEO). We’re able to export the ZODB and import the entire data into PostgreSQL (and back). Right now the speed looks really promising.
Stay tuned too :)