
25.01.10

HTML5 Video - Introduction and Commentary on Video Codecs

There is a raging discussion out there on the web about the upcoming HTML5 standard and the inclusion of the video tag - not about the tag itself, but about the codec used by the videos played inside it.
There is a firestorm from free software advocates who want the only codec used inside this tag to be the - largely - patent-free, open-source Theora codec - the other side wants the ubiquitous, high-quality H264 video codec. I think I can weigh in on that debate. If you don't care about, or already know about, codecs, containers and their history, jump below to "my take on the codec war".

I am a content producer, have been following video on the web since the very, very beginning, have advised companies on how to handle video on the web, have struggled countless hours trying to find the best solution for putting video on the web, and have so far refused to use Flash to display video on the web. I always believed that the web should be fundamentally free of technology that is owned by one company which can then take the whole web hostage to its world-domination plans. I had hoped the video tag would be introduced much earlier in the game, and I have watched with horror as YouTube & co. made Adobe - a company that basically stopped innovating ten years ago - the ruler of the web when it comes to moving pixels.
Now this is finally about to change - or at least that is the intention of Google, Apple, Mozilla and others who are pretty fed up with Flash for very obvious reasons (it's slow, development sucks, it's proprietary, the source code of the creations is always closed, it's slow, it's slow as fuck, it eats processor power like nothing else). It never really made any sense to put a container inside a container inside a container to display a video - the second most power-hungry thing you as a consumer can do on your computer (the first being 3D/gaming).
Yet a video is not a video. A video needs to be compressed - heavily - to fit through the net. Compression technology is nothing new, but it evolves over years and years. It's always a tradeoff between size, quality and processing power. The "Video" codec by Apple - probably the first "commercial" codec available to a wider audience - looks rubbish, but it is insanely fast (it uses almost no processor time on a modern machine) and the file size is pretty alright. It was capable of running video on an 8 MHz processor, mind you.
Over the years lots and lots of codecs have sprung up - some geared toward postproduction and some toward media delivery - and there is a fine line between them. With postproduction codecs you need full quality and only try to save a bit of storage. These are videos that still need work, so you want mostly uncompressed or losslessly compressed video. Processing power for decompression is an issue, because you need to scrub through the video; compression processing power (to make the video) matters too - you don't want encoding to take ages, because you would like to work in realtime and not wait for your computer to re-encode a video just because you clicked on a pixel.

The other codecs are the "end of the line" codecs - delivery codecs - made to compress the crap out of the video while "visually" losing the least amount of quality and producing the smallest possible file sizes. Here it doesn't matter how long the compression side takes, as long as decompression is fast enough to work on low-end computers, to reach the largest available audience.

While production codecs are fairly fluid - people switch as soon as something better becomes available, and it takes less than five months for a new codec to establish itself (recently Apple released the ProRes4444 codec, and most postproduction companies are already using it; those that don't use single pictures, but that's a whole different story) - the delivery codecs are here to stay for a very, very long time, because in the age of the web people just don't re-encode their stuff and re-upload it - if it's there, it's there.

Now, before I go into the format war and my take on it, there is one more concept I need to explain briefly: containers. Flash is a container for a video, with a certain codec displayed inside it. So is QuickTime, so is Windows Media, so is Real Media. It gets confusing because MP4 can be a container and a codec at the same time. A container just holds the video and adds some metadata to it - the raw video can be ripped out of the container and put into another one without re-encoding. This is what Apple does with YouTube on the iPhone. Adobe's last "great" innovation (or best market move ever) was to enable the Flash container to play H264-encoded (a codec) videos. Since Apple (along with everybody else who isn't a flashy designer) thinks that Flash sucks, they pull the video out of the Flash container and put it into the (now equally bad) QuickTime container - and so you can enjoy Flash-free YouTube on your iPhone.
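
To make the container-versus-codec distinction concrete, here is a toy sketch in plain Python (the classes and names are made up for illustration, not any real library): remuxing moves the encoded bitstream into a new container without ever touching the pixels.

    from dataclasses import dataclass, field

    @dataclass
    class Container:
        """A toy media container: some metadata plus an untouched codec bitstream."""
        format: str         # e.g. "flv", "mp4", "mov"
        codec: str          # e.g. "h264", "theora"
        bitstream: bytes    # the compressed video itself
        metadata: dict = field(default_factory=dict)

    def remux(src: Container, new_format: str) -> Container:
        """Move the same encoded video into a different container.
        No decoding, no quality loss, nearly instant."""
        return Container(new_format, src.codec, src.bitstream, dict(src.metadata))

    # What Apple does for the iPhone, conceptually: Flash container in,
    # QuickTime container out - the H264 bitstream stays byte-for-byte the same.
    flv = Container("flv", "h264", b"...encoded frames...", {"title": "cat video"})
    mov = remux(flv, "mov")
    assert mov.bitstream == flv.bitstream

Transcoding, by contrast, would decode and re-encode - slow and lossy - which is exactly what nobody wants to do to a video that is already out there.
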
Now, with the technicalities out of the way, what is all the fuss about?
HTML5 promises - once it becomes a standard - to advance the web into a media-rich one without bolted-on add-ons and plugins that differ from platform to platform and browser to browser - a pretty awesome development for most people who develop anything on the web. Part of the process of making it the new standard is to involve everybody who has something to say, every mover and shaker on the web, in setting the direction this standard takes. It's a tough, rough ride - everybody and their mother wants to put in their tech, their knowledge, their thinking - I really would not want to be the decision maker in this process if you paid me a trillion.
The biggest and most awesome change in HTML5 - and the one most obvious to the end user - will be the inclusion of media content without a freaking container that needs a plugin (one that only half or less of the internet population has) to display that content. To make this happen, at least all the big browser makers need to agree on what can be played inside the new tags (video & audio).
This is where the debate heats up. I really don't understand why audio doesn't spur the same public debate as video does - but that is probably because Google is involved in the video debate and can change its direction completely, on their own, with whatever they choose to support on YouTube.
The two competing codecs are Ogg Theora and H264. Now, I am less familiar with the Ogg codec (though I have tried it), so first a small history of H264. Back around 2000-2001 a company called Sorenson developed the first video codec that was actually usable on the web - there were others before, but they all sucked balls in at least one of the departments that make a great delivery codec. Sorenson made a lot of video people who wanted to present their work on the web very happy. Apple bought in and shipped QuickTime with the Sorenson codec and the ability to encode (make) video with it - albeit with a catch. To really get the full quality of Sorenson you had to buy a pro version - which cost a lot of money. The version Apple included could play Sorenson (pro or not) just fine, but the encoder was crippled to one-pass encoding only. The real beauty and innovation was in two-pass encoding - basically, the computer looks at the video first and decides where it can get away with more compression and where with less.
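
A minimal sketch of the two-pass idea in plain Python (toy numbers and a made-up function, not any real encoder): pass one measures how "busy" each frame is, pass two hands each frame a share of a fixed bit budget proportional to that complexity - quiet scenes get squeezed harder than fast action.

    def two_pass_budget(complexities, total_bits):
        """Toy two-pass rate control.
        complexities: per-frame scores from the analysis pass
                      (more motion/detail = higher = harder to compress).
        Returns a per-frame bit budget for the actual encoding pass."""
        total = sum(complexities)
        return [total_bits * c / total for c in complexities]

    # A still scene (low scores) followed by an action scene (high scores):
    scores = [1, 1, 1, 8, 9, 10]
    print([round(b) for b in two_pass_budget(scores, total_bits=30000)])
    # -> [1000, 1000, 1000, 8000, 9000, 10000]

A one-pass encoder has to guess, because it cannot see the future of the video - which is why the crippled free encoder looked so much worse at the same file size.
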
Apple and the users were not really happy with this situation at all. So for a long time (in web terms) there was no real alternative to that codec. The situation was even worse because to play Sorenson you had to have QuickTime installed - before the advent of the iPod, a losing proposition - I think they had a 15% installed user base on the web. Those were the days of the video format wars - Microsoft hit back with Windows Media (which sucked balls quality-wise but had a huge installed user base), and on top of that there was Real Media (which was the only really viable solution for streaming video back then).
In the meantime another phenomenon happened on the audio side - MP3 became the de facto standard, and a high-quality one at that (back then), in the audio field. We video people looked over with envy. When producing audio you could just encode it as MP3, pretty much for free in shareware apps, upload it to the web, and everybody could listen to it. There was nothing even close happening on the video side. The irony is, of course, that MP3 is MPEG1 Layer 3 - part of a video standard - but the video side of MPEG1 sucked really, really, really bad. Quality: shit. Size: not really small. Only the processor use was alright-ish, but not great.

Jumping forward a couple of years (and jumping over the creation of MPEG2 - the codec used for media delivery on DVDs, totally unsuitable for web delivery), the Moving Picture Experts Group - a consortium of industry companies and experts that developed (and bought in) MPEG1 and MPEG2 - decided to do something for cross-platform standard video delivery and created the MPEG4 standard (skipping over MPEG3 for various reasons, mostly the confusion with MP3, i.e. MPEG1 Layer 3). MPEG4 is a container format - mostly - but it allows for reference codecs, and the first of these was H263. This was already on par with Sorenson quality-wise, yet in a container playable by QuickTime and Windows Media - the two last-standing titans of media playback (by this time Real Media had pretty much lost the format war already). Great, you think - well, not quite. Microsoft wasn't enormously happy and created its own standard (as they do) based on MPEG4 H263, called VC1 (I am not really familiar with that side of the story, so I leave you to Wikipedia to look it up yourself if you are so inclined). Web video delivery was sadly still not cross-platform, and the format war became a codec war, but there was now a standard underlying all of this, and the quality - oh my, the quality was getting good. Then the MPEG group enhanced the H263 codec and called it H264, and oh my, this codec was a pure godsend for the media delivery world - it looked great scaled up to huge resolutions and could be used online, for streaming and on high-definition DVDs, and in the beginning it was all pretty much for free.
It looked like an Apple comeback in web video delivery, because QuickTime was for a while the only container that could play H264 without problems. Around that time Flash started to include a function in its web plugin to play video - interestingly enough, they chose Sorenson video as the only supported codec; word on the street was that Sorenson was very unhappy with Apple's decision to ditch them as the codec of choice and push H263/H264 instead. Now the story could have ended right there, with Apple winning the format war and all computers having QuickTime installed by default, but it didn't - because out of nowhere YouTube emerged, and YouTube used Flash, and YouTube scaled big time and made it - for the first time ever - really easy for Joe the Plumber and anybody else to upload a video to the web and share it with the rest of the world family. It changed the landscape in less than six months (I watched it; it was crazy). So as a content producer you finally had a really good codec for uploading video in very good quality, but the world chose the worse one, inside a player that sucked up 90% of your processing power, with a codec of choice that needed another 90%, and all that came out was shit-looking video that everybody was happy to get past - but the user experience of hitting an upload button and having everybody watch your video was just unbeatable. Eventually, just as people realized how bad these videos looked compared to some Hollywood trailers that still used QuickTime and H264, Adobe included H264 in Flash and thereby prolonged its death once again (without innovating at all, it must be said).
Now fast forward to today - again a group of clever people, big companies and such have sat down to bring us HTML5 and the video tag. That tag, as mentioned, is going to rid us of all plugins and containers and instead just play pure video as fast as possible right inside any browser that supports HTML5. The problem is that people cannot agree on the codec to be used. Why, you ask, if H264 is so great? Because H264 was developed by a for-profit group of people, and they want to make money, and they have freaking patents on it - not that this has hampered web video to this day in any way - but for the future standard, a lot of people seem to have taken offense at that. There is in fact a whole ecosystem of alternative codecs (audio and video) in the open-source world, and the most prominent is Ogg and its video incarnation, Theora. They are mostly patent-free because the company that originally developed these codecs gave the patents to the open-source community (though it is still not clear whether that covers the whole codec). What happens when patents enter the World Wide Web could be seen with GIFs. The GIF graphic (moving or not) was once a cornerstone of the web - a more popular choice for graphics than anything else (small, readable by anything, blah blah) - then a company found out that it held the patents on it (luckily just shortly before they ran out) and sued a lot of big websites for patent infringement, demanding royalties of $5000 from every website that used GIFs. They would have killed the web with that move (and they were in the right, law-wise) if the big companies they sued first had not dragged out the court cases until the patents ran out - now the GIF file format is in the public domain.
Now, it's understandable that this lesson should be learned, BUT - and here is

my take on the codec war:

Flash is a MUCH bigger threat than patents on the codecs used. Because not only does it use the patent-encumbered codec inside its container, but the container itself is totally owned by one company - and a company that has shown often enough (PDF) that it will do everything to take control of anybody using its technology, even if that is the whole world.
Now, 95% of all web videos are delivered via Flash these days, and to change that a lot of things need to happen. First, Google needs to drop it on YouTube - they just announced a beta that does exactly that - but even with Google's might, that alone is not enough - content producers need to hop on board as well. And here is where the chain breaks for the "free and open codecs" of Ogg. As the history above shows, H264 has been the industry standard on a wide range of devices, including the web, for years now. The whole backend has settled on it, and there are really good workflows for creating H264 video. With the video tag, YouTube is less relevant than it was at its beginning, because all of a sudden it is easy to incorporate video into your own webpage. So if Google were to say "we use Theora only", high-quality content producers would just say "fuck you" and post their videos on their own sites in much better quality, without the hassle of finding a workflow to produce Theora videos (for non-terminal-using people there still just isn't an easy way to do that which could be used in a professional, non-fiddly environment - we like to create, not code for a delivery format, sorry).
But that's not enough - almost ALL consumer cameras released over the last two years, including the hot-shit DSLRs with video functionality, produce H264 that can be "just" uploaded to the web without re-encoding - that saves YouTube and Vimeo a lot of processing capacity - and with their lousy revenues they sure don't want to add another server farm just to re-encode each and every video they host into a codec with worse quality. You know, 90% of all videos on the web are already encoded in H264 as of now (and Theora maybe has 0.2% of the remaining 10%). It is uneconomical and not sensible to re-encode all of that, any way you look at it - especially since the quality is not surpassed by any other codec out there, patent-free or not.
I would say: go H264 now, and have a new consortium of browser developers and other companies develop a new codec (or build upon Theora) from scratch - one that is patent-free AND high quality AND has a good workflow (meaning it is supported by hardware vendors and OS vendors across the board). That can then take over from H264 (just like PNG took over from GIF in less than two years following the patent threat). Leave the codec question open for now and let the web sort it out for itself (for the moment) - like with the img tag: it doesn't matter whether you put PNGs, GIFs or JPGs in it (or any of the plethora of other image formats); as long as the browsers support it, it's viewable - and so far that has shaken out into a good road to take (see the switch to PNG with transparency, which in my opinion also helped a lot to bring down IE5, which didn't fully support it - so the market sorted it out quickly (as in five-years quickly)).
BTW, the only browser maker that just does not want to go down that route (and would rather cripple itself with Flash in the meantime) is the oh-so-open-minded Firefox. Sorry, I fail to see your point, dear Mozilla developers - you are not going to make a lot of friends that way outside of the very, very small open-source community (and even there your approach is not universally liked - for example by those who cannot install a Flash plugin because Flash is not supported on their platform, PPC Linux for instance).

Get rid of Flash first; then get rid of H264 later, when you have something equally good on all counts. Going backward with technology is just never the way forward - open source or not.

16.01.10

PTEX & OpenCL - or How Steve Jobs' Companies Are Changing 3D

Something amazing came through my ticker today - something that is a game changer and, together with another technology, will change the way I work and make it much more enjoyable.

First, some basics, for those who have no clue about any of this, to understand what I am talking about. There are basically the following six steps to get to a final 3D picture.

1. Modelling: Multiple approaches get you to a mesh model that consists - in the end, mostly - of polygons. You can scan, you can push little points around in 3D space, you can use mathematical formulas to create, subtract or add simple forms, or other formulas to round edges, revolve lines or extrude them. The end product is almost always a polygon mesh with x amount of polygons - the more, the higher the resolution of the model and the closer you can look at it. About ten years ago a really nice way to model high-resolution meshes came into existence, called subdivision surfaces, which lets you model a coarse-resolution cage - much easier to understand and alter - and then generates a high-res model out of it. That was the first real game changer in the 3D industry and the reason why character modelling became so "easy" and so many people are making so many great models. A toy sketch of the refinement idea follows below.
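
Here is that sketch in plain Python (made-up minimal code: pure midpoint subdivision, which only splits the faces - real subdivision surfaces like Catmull-Clark also move the points to smooth the surface): every quad becomes four, so the polygon count quadruples with each step.

    def subdivide(vertices, quads):
        """One naive subdivision step: split every quad into four smaller quads
        by inserting edge midpoints and a face center."""
        verts = list(vertices)
        midpoints = {}  # edge -> midpoint index, so shared edges aren't duplicated

        def midpoint(i, j):
            key = (min(i, j), max(i, j))
            if key not in midpoints:
                verts.append(tuple((a + b) / 2 for a, b in zip(verts[i], verts[j])))
                midpoints[key] = len(verts) - 1
            return midpoints[key]

        new_quads = []
        for a, b, c, d in quads:
            verts.append(tuple(sum(verts[i][k] for i in (a, b, c, d)) / 4 for k in range(3)))
            f = len(verts) - 1  # index of the face-center point
            ab, bc, cd, da = midpoint(a, b), midpoint(b, c), midpoint(c, d), midpoint(d, a)
            new_quads += [(a, ab, f, da), (ab, b, bc, f), (f, bc, c, cd), (da, f, cd, d)]
        return verts, new_quads

    # A single square becomes 4, then 16 quads - while the coarse cage stays easy to edit.
    v = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    q = [(0, 1, 2, 3)]
    for _ in range(2):
        v, q = subdivide(v, q)
    print(len(q))  # -> 16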

2. UV Preparation: Now, a model made of triangles looks less than realistic, of course, so you need to tell the program what kind of material is on the model. A lot of options are available here, but especially for film work and characters you want something realistic, and you do that by starting from something realistic - like a photo - and altering it in such a way that it fits on your model (or you paint one from scratch). Now, for such a picture to be put onto the model, you need to flatten the model out into a two-dimensional surface. You can imagine this like taking a dead animal, skinning it, and then making the skin of the animal flat. Like so:
[hide.jpg: photo of an animal hide stretched out flat]
(There is actually a program that stretches the "hides" in a way pretty similar to this very analog process.) It is a very dull task on a complex model - mostly you have to take your nice model apart and do all kinds of voodoo to get it artifact-free. No fun, and certainly not really creative. A sketch of the simplest possible unwrap follows below.
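
In plain Python (made-up minimal code), the most primitive unwrap imaginable is a planar projection - drop one axis and normalize the rest into the 0..1 texture square. Real unwrapping tools have to minimize stretching and cut the mesh into islands, which is exactly the tedious part:

    def planar_uv(vertices):
        """Project 3D points onto the XY plane and normalize to the 0..1 UV square.
        Tolerable for flat-ish objects; anything curved gets badly stretched -
        which is why real UV unwrapping is so much work."""
        xs = [v[0] for v in vertices]
        ys = [v[1] for v in vertices]
        min_x, min_y = min(xs), min(ys)
        span_x = (max(xs) - min_x) or 1.0
        span_y = (max(ys) - min_y) or 1.0
        return [((x - min_x) / span_x, (y - min_y) / span_y) for x, y, _ in vertices]

    corners = [(0, 0, 0), (2, 0, 0), (2, 1, 0), (0, 1, 0)]
    print(planar_uv(corners))  # -> [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]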

3. Texturing: Once you have your nice model with a more or less nice UV map, you start to apply your texture - photo-based or procedural or a mixture of the two. There is a lot of "fun" to be had here as you add little bumps and tell the software how shiny the model will be, how reflective, how refractive, and lots of other things I don't really want to go into - but it is a nice step in general. A tiny example of a procedural texture follows below.
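
"Procedural" simply means the color is computed from the UV coordinates of the previous step rather than read from a photo - a tiny plain-Python sketch, the classic checkerboard:

    import math

    def checker(u, v, squares=8):
        """A procedural texture: 0 (black) or 1 (white), computed purely
        from the UV coordinates in a checkerboard pattern."""
        return (math.floor(u * squares) + math.floor(v * squares)) % 2

    print(checker(0.05, 0.05))  # -> 0, a black square
    print(checker(0.20, 0.05))  # -> 1, the white square next to it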

4. Light & Camera: Without light there wouldn't be anything visible. So you set up some virtual lights, which act and react just like the different kinds of light sources you find in reality, plus some other tricks that don't exist in reality but can add to a realistic picture. You also set up a camera - your virtual eye - which again acts just like a photographic camera in real life (almost). Both a creative and fun process. (How "acting like a real light" boils down to math is sketched below.)
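
A minimal plain-Python sketch of the oldest rule in the book, Lambert's law: a surface gets brighter the more directly it faces the light.

    import math

    def lambert(normal, light_dir, intensity=1.0):
        """Diffuse brightness at a surface point: the dot product of the surface
        normal and the direction toward the light, clamped at zero (a surface
        facing away from the light receives nothing)."""
        dot = sum(n * l for n, l in zip(normal, light_dir))
        return intensity * max(0.0, dot)

    up = (0.0, 0.0, 1.0)                        # a surface facing straight up
    s = 1 / math.sqrt(2)
    print(lambert(up, (0.0, 0.0, 1.0)))         # light from above    -> 1.0
    print(round(lambert(up, (s, 0.0, s)), 3))   # light at 45 degrees -> 0.707
    print(lambert(up, (0.0, 0.0, -1.0)))        # light from below    -> 0.0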

5. Animation: Then you animate your model - push objects around, apply physics, deform the model. You can either do that by hand or get animation data from motion capture - you might have seen those people in black suits with ping-pong balls attached to them, or faces with dots all over them, for example. This step is both fun and frustrating, with hand-made and captured data alike. The human eye is so susceptible to small problems in movement that not even a certain 500-million-dollar production can fully perfect this step and make it realistically convincing. (A sketch of what hand animation looks like under the hood follows below.)
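
Under the hood, hand animation is mostly keyframes plus interpolation - a bare-bones plain-Python sketch (linear interpolation; real animation curves use splines and easing, which is part of why this step is so fiddly):

    def sample(keyframes, t):
        """Evaluate an animation channel at time t by interpolating linearly
        between the surrounding keyframes. keyframes: sorted (time, value) pairs."""
        if t <= keyframes[0][0]:
            return keyframes[0][1]
        for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
            if t <= t1:
                return v0 + (t - t0) / (t1 - t0) * (v1 - v0)
        return keyframes[-1][1]

    # An object's x position: move right over one second, then settle back a bit.
    x_channel = [(0, 0.0), (24, 10.0), (36, 8.0)]  # (frame, value)
    print(sample(x_channel, 12))  # -> 5.0, halfway through the first move
    print(sample(x_channel, 30))  # -> 9.0, halfway through the settle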

6. Rendering: Then comes the process that is mostly free of human intervention, but not free of hassles and frustration: the rendering. It could take up to 50 hours per frame on Avatar on a normal stock computer. At 24-25 frames per second (or, in the case of stereo 3D, double that) you get an idea how much processing power is needed. And if you make a mistake - render it all over again. Rendering is also a complex mathematical problem, and there are bound to be errors in the software, so prepare for the worst here. (The brutal arithmetic is spelled out below.)
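
The arithmetic is brutal enough to be worth spelling out, using the 50-hours-per-frame figure from above (plain Python, toy shot length):

    def machine_hours(seconds_of_film, fps=24, hours_per_frame=50.0, stereo=False):
        """Total machine-hours to render a shot once."""
        frames = seconds_of_film * fps * (2 if stereo else 1)
        return frames * hours_per_frame

    # A single 10-second stereo shot at Avatar-like cost:
    h = machine_hours(10, stereo=True)
    print(h)              # -> 24000.0 machine-hours
    print(h / 1000 / 24)  # -> 1.0: a full day, even spread across 1000 machines
    # ...and one mistake means paying all of that again.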

Now, why am I telling you all this? Well, one step, it seems, has just been eliminated. Progress in the field of visual effects is very erratic - you have 2-4 years of no progress at all, and then all of a sudden a floodgate opens and something changes dramatically, or several things do. I would say we had a quiet period over the last 2-4 years - mostly because the development of the really cool stuff happened "in-house", meaning that really smart programmer people were hired by the big VFX companies to program certain things for certain needs and problems. A lot of problems in the above pipeline have already been solved, I think, but have never seen the light of the broader world and instead stayed - and sometimes even died - within certain companies. It is really frustrating: the software companies struggle with the most basic problems (plagued by slow sales and a bad economy), and then you see Pirates of the Caribbean, for example, where they completely figured out how to motion capture live actors (record their movement) on set with no real special equipment - and that technology is still only available behind the locked doors of Industrial Light & Magic. For me as an artist, that is a valuable tool that has been created, and I could do cool stuff with it, but I can't get my hands on it because of corporate policies.
So it is REALLY amazing to see that Disney - the intellectual-property-hoarding company for whom copyright law has been rewritten at least once - is releasing a piece of software/API/file standard as open source as of today. Code that, no less, promises to completely eliminate step two of my list above. In their own words, they have already produced one short animation and are in the process of making one full feature animation completely without doing any UV mapping. I can only try to explain the joy this brings me. UV mapping has been my biggest hurdle to date - I never really mastered it - I hated it. It is such a painstakingly long, tedious process. I normally used every workaround I could find to avoid doing UV mapping. It is crazy to think they finally figured out a way to get there without it, and I think this will drop like a bomb into every 3D app and supporting 3D app on the market within a year (wishful thinking here) - at least I can hope it does, and I hope that Blender, Autodesk and SideFX are listening very closely.
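
The core idea, as the white paper linked below describes it, is per-face texturing: every face of the mesh gets its own small texture, addressed by face ID plus a local coordinate, so no global unwrap is ever needed. A plain-Python cartoon of that idea (made-up code, not the actual Ptex API - the hard part Ptex really solves, seamless filtering across face borders, is left out):

    class PerFaceTexture:
        """One small texture grid per mesh face, looked up by
        (face id, local u, local v) - no global UV atlas required."""
        def __init__(self, num_faces, res=4):
            self.res = res
            self.faces = [[[0.0] * res for _ in range(res)] for _ in range(num_faces)]

        def _texel(self, u, v):
            return (min(int(u * self.res), self.res - 1),
                    min(int(v * self.res), self.res - 1))

        def paint(self, face, u, v, value):
            x, y = self._texel(u, v)
            self.faces[face][y][x] = value

        def lookup(self, face, u, v):
            x, y = self._texel(u, v)
            return self.faces[face][y][x]

    tex = PerFaceTexture(num_faces=6)  # e.g. a cube: one little grid per face
    tex.paint(face=3, u=0.6, v=0.1, value=1.0)
    print(tex.lookup(3, 0.6, 0.1))     # -> 1.0, and face 3 was never unwrapped
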
Combine that with the recent advancement in render technology of using OpenCL (developed by Apple, released as part of Snow Leopard, and made an open standard, with ports for Linux and Windows now available) to render partially on the graphics card (GPU) - which speeds up rendering by up to 50 times. That means a frame from Avatar would take only one hour to render instead of 50 - or, in a more realistic case: the current render time for an HD shot here is 2-5 minutes on average, and that gets cut down to 10 seconds to 1 minute, which would actually make rendering a fun part of the process.
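
For a taste of what "render on the GPU" looks like in practice, here is a hedged sketch using the pyopencl bindings (assuming pyopencl and numpy are installed and an OpenCL device is present - the kernel and numbers are mine, purely for illustration): a per-pixel brightness adjustment, the kind of embarrassingly parallel work a GPU runs with one thread per pixel.

    import numpy as np
    import pyopencl as cl

    # The kernel runs once per pixel, massively in parallel -
    # which is where the big speedups come from.
    KERNEL = """
    __kernel void brighten(__global const float *src, __global float *dst, float gain) {
        int i = get_global_id(0);
        dst[i] = min(src[i] * gain, 1.0f);
    }
    """

    ctx = cl.create_some_context()
    queue = cl.CommandQueue(ctx)
    program = cl.Program(ctx, KERNEL).build()

    pixels = np.random.rand(1920 * 1080).astype(np.float32)  # a fake HD frame
    mf = cl.mem_flags
    src = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=pixels)
    dst = cl.Buffer(ctx, mf.WRITE_ONLY, pixels.nbytes)

    program.brighten(queue, pixels.shape, None, src, dst, np.float32(1.5))
    result = np.empty_like(pixels)
    cl.enqueue_copy(queue, result, dst)
    print(result.max())  # ~2 million pixels processed in one parallel dispatch

Real renderers do vastly more per pixel, of course, but the pattern - ship the data to the GPU, run the same small program over millions of elements - is the same.
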
Now, we all know who is behind both of the companies releasing and opening this up: the mighty Steve Jobs. You could almost say there is an agenda behind it, to make 3D a far more pleasurable, creative kind of work than it currently is - maybe Mr. Jobs wants us all to model and render amazing virtual worlds to inhabit, where he can play god ;)
Good times indeed.

What's left? Well, animation is still not fully worked out, but with muscle simulation and easy face and bone setups it has become easier over the past years - it is still a hideously tedious process to make it look right, and I don't know if there will ever be a solution for it as revolutionary as PTEX. Motion sensors might help a bit in the near future, as might techniques that make models physically accurate, so that things can't pass through each other and gravity is applied automatically. High-quality texture maps that hold up to very, very close scrutiny are still memory hogs and bring down the most powerful workstations. The rest will get better with faster, bigger, better computers, as always (like all the nice lighting models that are almost unusable in production to date because they take too long to render). Generally, with the UV mapping and rendering problems out of the picture, we are so much further along that I might get back into 3D much, much more.

ptex.us - the official website
The PTEX white paper
PTEX sample objects and a demo movie

Disclaimer: I have been doing 3D since 1992, when I rendered a 320x240 scene of two lamps on an Amiga 2000 with raytracing - it took two days to render. My first animation, in 1993, took a month to render. Then I switched to Macintosh (exclusively) in 1995 and did 3D there for a while. It was so frustrating that I never made a serious effort to get really good at it - these days I still do it alongside compositing / VFX supervision, but more as an add-on and for previz than as main work.