WHY AND HOW I SWITCHED FROM PYTHON TO ERLANG

A PYTHON PROGRAMMER’S ADVENTURE

Version 1.1 | 7 Jul 2016

Farsheed Ashouri | CTO/RashaVas LTD.

Summary

15 Years from Day 1

Choosing The Right Framework

FULL STACK PYTHON Frameworks Are Honey Pots

PYTHON Frameworks Are Also Honey Pots!

I Don't Have Any Of These Problems!  Go On…

I have solved all of these issues. What’s wrong with it?

How I Switched?

Conclusion

Summary

In this article, I explain my journey from Python to Erlang. If you are not a Python developer (probably one with a deep understanding of Python-based web services), or you don't need or want to scale things to a very large scale, you won't find this article very useful. If you don't want to build an infrastructure for your business, or if you develop simple blogs, small asset management systems, or Hello World-ish websites, this article won't help you at all, and if you are about to choose your first language, please do not decide based on my words. I am going to tell you what problems I encountered using Python and how Erlang solves those specific problems for me.

I will start with a brief history and end with my conclusion. If you disagree with my reasoning, let's discuss it! That's the whole purpose of sharing my experience.

15 Years from Day 1

I started programming with MEL (Maya Embedded Language). Then I found a job and got my first paycheck. I quickly switched to Python for more serious development opportunities, and I read K&R cover to cover so I could develop C extensions for Python. Years passed and I grew increasingly interested in web stuff. I quit the animation industry for good (my demo) and was hired by a famous tech company here in Tehran.

Recently I developed Appido.IR, a video/music streaming service, in Python.

Let me explain what my problems were.

Choosing The Right Framework

Everyone loves Django. I hate it for no good reason! Probably because I helped Massimo develop Web2py, or maybe the simplicity of Web2py made it impossible for me to pick another full-stack framework. I finally gave Django a proper try later, on a boring project.

FULL STACK PYTHON Frameworks Are Honey Pots

So, what's wrong with Django, or even Web2py? Nothing! Until you start using Bottle/Falcon with a templating engine and a database ORM [read: Mako and SQLAlchemy] and realize how slow those enterprise frameworks feel. For a simple RESTful API service, you waste CPU cycles for no good reason. And for complex API services, you need to find a way to jailbreak and put together a completely new architecture inside your so-called full-stack framework. Here is an example:

For Appido.ir's streaming technology, we implemented the DASH protocol with the help of FFMPEG and tons of other open-source tools. Appido has its own OAuth2 server, an authorization system, a workflow engine, scheduling, monitoring, logging, debugging, reporting, accounting... you name it! After a few weeks of R&D and struggling with Web2py/Django, it was obvious it wasn't going to work inside a single framework. Creating some folders and trying to be MVC, or having a database admin panel in a BIG FRAMEWORK, won't help you scale! And sooner or later, some of those unwanted features will bite you. Yes, there is some honey there, but you must be careful of angry bees. So I built my own Falcon-based framework.
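To give a sense of the building block involved, here is roughly what a bare Falcon resource looks like (a minimal sketch, assuming Falcon 1.x/2.x where the application object is falcon.API; the route and resource names are illustrative, not Appido's actual code):

import json

import falcon


class HealthResource(object):
    # GET /health -> {"status": "ok"}
    def on_get(self, req, resp):
        resp.content_type = 'application/json'
        resp.body = json.dumps({'status': 'ok'})


app = falcon.API()
app.add_route('/health', HealthResource())

# Serve it with any WSGI server, e.g.: gunicorn myservice:app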

PYTHON Frameworks Are Also Honey Pots!

So you start using Bottle/Falcon/Flask… and you find yourself needing to install task queues and scheduling modules (Celery or RQ, for example). Why do you need them? Because every request that takes more than 500 ms in Web 2.0 needs to be stateful! That's an unwritten rule. You need to give your users status updates about their long-running requests. Your clients can't wait for your calculations. You need to push the heavy burden of sending emails, converting images, and so on onto Celery (or your own custom-built multi-process queue). What's wrong with that? Let's see:

Imagine you have a streaming service. A client uploads a 20 GB raw 4K video file; you convert it to 10 different resolutions and email him/her that the results are ready.
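With Celery, the offloading described above looks roughly like this (a sketch only: the task names, the Redis broker URL, and the ffmpeg step are my illustrative assumptions, not Appido's actual code):

from celery import Celery

app = Celery('streaming', broker='redis://localhost:6379/0')

@app.task
def transcode(video_path, resolution):
    # shell out to ffmpeg here and write the converted file
    pass

@app.task
def notify_user(email, result_urls):
    # send the "your files are ready" email
    pass

# The web request only enqueues the work and returns immediately, e.g.:
#   transcode.delay('/uploads/raw_4k.mov', '1080p')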

You use Celery with 40 workers. Videos start converting, and your server overloads and gets slower and slower. So you find a genius solution! Install another server with the streaming code and tools on it, running Celery as a worker. Cool, problem solved! NO! Not so fast!! In the middle of the night, you find out that 5 of your 6 servers have 0% CPU usage and the 6th is at 100%. WHY? It turns out that Redis has a timing problem with Celery that prevents workers from picking up jobs. (It has a solution, of course: welcome to the world of fanout patterns and visibility timeouts.) Installing RabbitMQ would solve your problem (and would introduce other new, strange problems). (If you have built such a system and never hit any of these problems, congratulations! You are the luckiest guy on earth.) You found honey without getting stung by the bees.
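For the record, the visibility timeout that bites you here is configurable when Redis is the broker. A minimal sketch, assuming Celery 4-style lowercase settings (older versions use the uppercase BROKER_TRANSPORT_OPTIONS name); the 12-hour value is only an example, not a recommendation:

from celery import Celery

app = Celery('streaming', broker='redis://localhost:6379/0')

# Tasks not acknowledged within this window get redelivered to another
# worker, so it must be longer than your longest-running task.
app.conf.broker_transport_options = {'visibility_timeout': 43200}  # 12 hours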

I Don't Have Any Of These Problems!  Go On…

So your service is working great, and what you need now is to increase your web service's RPS (requests per second), either by adding WSGI workers or by using Tornado/Gevent, and finally load balancing with HAProxy or Nginx, with some help from Varnish Cache. Perfectly OK solutions. You also start noticing that SQLAlchemy slows down your queries (besides SQLAlchemy's extreme complexity). Writing raw SQL fixes your problem. Let's look at a simple example against a Postgres database with a few million records:

def pure_python():
    max_per_task = db.DBSession.query(
        Version.task_id, func.max(Version.version_number).label('max'))\
        .join(Task)\
        .filter(Task.project_id == proj_id)\
        .group_by(Version.task_id)\
        .subquery()
    return Version.query\
        .join(max_per_task,
              tuple_(max_per_task.c.task_id, max_per_task.c.max) ==
              tuple_(Version.task_id, Version.version_number))\
        .all()


def simple_sql():
    sql = """
    select
        max("Versions".id) as id
    from "Versions"
        join "Tasks" on "Versions".task_id = "Tasks".id
        join "Projects" on "Tasks".project_id = "Projects".id
        where "Projects".id = %s
        group by "Versions".task_id, "Versions".take_name
    """ % proj_id
    conn = db.DBSession.connection()
    result = conn.execute(sql)
    return result.fetchall()

def complex_sql():
    sql = """
    select
        "Links".id as id,
        "Links".full_path as full_path
    from
        (select
            max("Versions".id) as id
        from "Versions"
        join "Tasks" on "Versions".task_id = "Tasks".id
        join "Projects" on "Tasks".project_id = "Projects".id
        where "Projects".id = %s
        group by "Versions".task_id, "Versions".take_name) as "sub_versions", "Links"
    where "Links".id = "sub_versions".id
    """ % proj_id
    conn = db.DBSession.connection()
    result = conn.execute(sql)
    return result.fetchall()

And the results are impressive:

pure_python: 3.284 sec
simple_sql:  0.228 sec
complex_sql: 0.463 sec

I hope you are getting the point.
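One side note on the raw-SQL versions above: instead of %-formatting proj_id into the query string, you can keep the same speed and let the driver bind the parameter. A sketch assuming SQLAlchemy 1.x-style execute (2.0 expects a parameter dict); the function name is mine:

from sqlalchemy import text

def simple_sql_bound(proj_id):
    # Same query as simple_sql(), with proj_id passed as a bound parameter.
    sql = text("""
        select max("Versions".id) as id
        from "Versions"
        join "Tasks" on "Versions".task_id = "Tasks".id
        join "Projects" on "Tasks".project_id = "Projects".id
        where "Projects".id = :proj_id
        group by "Versions".task_id, "Versions".take_name
    """)
    conn = db.DBSession.connection()
    return conn.execute(sql, proj_id=proj_id).fetchall()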

I have solved all of these issues. What’s wrong with it?

Have you? Well, you are trying to bolt Erlang features onto Python code. None of those distribution/scaling problems even exist in Erlang. Finding workarounds for scalability problems is routine in the Python world. For scaling you need distribution, and for distribution you need a well-architected, service-oriented system with interoperability. You need failover and fault tolerance. And I accept that all of these features are possible with Python. But greatness comes with a price. You need to circumvent Python's problems. You need to write raw SQL and create custom indexes (maybe materialized views). Python's global interpreter lock is not a trivial problem. Shared state in general is useful for a quick start but will sometimes end in disastrous results. Besides that, for every real-world calculation you need to write an extension for Python (the native C API, SWIG, Cython, or maybe PyPy).
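To make the GIL point concrete, here is a minimal, self-contained sketch (my own example, not Appido code): the same CPU-bound function mapped over a thread pool barely speeds up, while a process pool actually uses the cores.

import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def burn_cpu(n):
    total = 0
    for i in range(n):
        total += i * i
    return total

def timed(executor_cls):
    start = time.time()
    with executor_cls(max_workers=4) as pool:
        list(pool.map(burn_cpu, [5000000] * 4))
    return time.time() - start

if __name__ == '__main__':
    print('threads:   %.2fs' % timed(ThreadPoolExecutor))    # serialized by the GIL
    print('processes: %.2fs' % timed(ProcessPoolExecutor))   # runs on all cores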

Erlang is not as fast as C, but its distribution model makes it fairly easy to write programs that harness every single core in your datacenter (and if you know NIFs, you're golden). Connecting to databases in Erlang requires SQL knowledge, same as in Python.

How I Switched?

I started investigating and profiling my code. My goal was to make the service horizontally scalable and to decrease system complexity. I began by developing the ticketing system in Erlang, using Cowboy as my first candidate. The result was awesome. Erlang needs ZERO configuration to be ready for production. No monkey patching needed. No Celery needed. You simply spawn a few processes to build simple event clients/servers:

%% Sketch of the event server's receive loop; run_func_1/0 and
%% run_func_2/0 stand in for the real handlers. Calling loop/1 again
%% keeps the process alive for the next message.
loop(S = #state{}) ->
    receive
        {Pid, MsgRef, {subscribe, Client}} ->
            run_func_1(),
            loop(S);
        {Pid, MsgRef, {match2, Client}} ->
            run_func_2(),
            loop(S)
        %% ... more message clauses ...
    end.

Understanding monitors and links is the key to controlling your processes. But once you master those concepts, nothing can bring them down. Creating named nodes on different servers and connecting them together is also very natural:

{ evserver, 'second@appido'} ! { self(), "do_the_task_then_call_me" }.

What you need is a few named Erlang nodes (port 4369, used by the Erlang port mapper daemon, must be open). There are plenty of modules out there to help you migrate to Erlang.

I have also studied tons of benchmarks showing that mathematical calculations are quite slow in Erlang. That is somewhat true (as it is for Python), but as I mentioned above, when you need intensive arithmetic, C/OpenCL might be your best friend.

Conclusion

If you need to create a cloud service, or a scalable system for thousands of users, USE THE RIGHT TOOLS. Python is great for quick tests and mock-ups. It's great for getting you the contract! You can develop complex ideas overnight using Python. It's good for big projects with hundreds of users. But when it comes to huge scalable systems, I personally prefer to stop hurting myself and my business by trying to fix the wrong problems created by the wrong tools.

To cut a long story short: use Python to win the contract, use Erlang to get the job done.

Farsheed Ashouri  <rodmena @ me.com>

Chief Technology Officer,

RashaVas Telecom LTD.

 
