HTML5 Video - Introduction and Commentary on Video Codecs
There is a raging discussion out there on the web about the upcoming HTML5 standard and the inclusion of the video tag. Not about the tag itself but about the codec used by videos played inside that tag.
There is a firestorm from free software advocates who want the only codec allowed inside this tag to be the (largely) patent-free, open source Theora codec. The other side wants the ubiquitous, high-quality H.264 video codec. I think I can weigh in on that debate. If you don't want to read about codecs, containers and their history, or know it all already, jump below to "my take on the codec war".
I am a content producer and have been following video on the web since the very beginning. I have advised firms on how to handle video on the web, have struggled for countless hours to find the best way to put video on the web, and have so far refused to use Flash to display it. I have always believed that the web should be fundamentally free of technology that is owned by one company, which can then take the whole web hostage for its world domination plans. I had hoped that the video tag would be introduced much earlier in the game, and I have watched with horror how YouTube & co. made Adobe - a company that basically stopped innovating 10 years ago - a ruler of the web when it comes to moving pixels.
Now this is finally about to change - or at least that is the intention of Google, Apple, Mozilla and others who are pretty fed up with Flash for very obvious reasons (it's slow, development sucks, it's proprietary, the source code of the creations is always closed, it's slow, it's slow as fuck, it eats processor energy like nothing else). It never really made sense to put a container inside a container inside a container to display a video - the second most power-hungry thing you as a consumer can do on your computer (the first would be 3D/gaming).
Yet a video is not a video. To fit through the net a video needs to be compressed - heavily. Compression technology is nothing new, but it keeps evolving over the years. It's always a tradeoff between size, quality and processing power. The "Video" codec by Apple - probably the first "commercial" codec available to a wider audience - looks rubbish but is insanely fast (it uses almost no processor time on a modern machine) and the file size is pretty alright. It was capable of running video on an 8 MHz processor, mind you.
Over the years lots and lots of codecs have sprung up - some geared toward postproduction and some toward media delivery, and there is a fine line between them. For the postproduction codecs you need full quality and try to save a bit of storage. These are videos that still need work, so you want to work with mostly uncompressed or losslessly compressed video. Processing power for decompression is an issue because you need to scrub through the video. Compression should not take ages either, because you like to work in realtime and not wait for your computer to re-encode a video just because you clicked on a pixel.
The other codecs are the "end of line" codecs - delivery codecs - made to compress the crap out of the video while visually losing the least amount of quality and producing the smallest possible file sizes. Here it doesn't matter how long the compression side takes as long as the decompression is fast enough to work on low-end computers and reach the largest possible audience.
Production codecs are fairly fluid: people switch as soon as a better one becomes available, and it takes less than 5 months for a new codec to get established. Apple recently released the ProRes 4444 codec and most postproduction companies are already using it (those that don't work with single pictures, but that's a whole different story). The delivery codecs, on the other hand, are here to stay for a very, very long time, because on the web people just don't re-encode their stuff and re-upload it - if it's there, it's there.
Now before I go into the format war and my take on it there is one more concept I need to explain briefly - containers. Flash is a container for a video with a certain codec displayed in it. So is QuickTime, so is Windows Media, so is Real Media. It gets confusing because MPEG-4 names both a container (MP4) and codecs. A container just holds the video and adds some metadata to it - the raw video can be ripped out of one container and put into another without re-encoding. This is what Apple is doing with YouTube on the iPhone. Adobe's last "great" innovation (or best market move ever) was to enable the Flash container to play H.264 (a codec) encoded videos. Since Apple (along with everybody else who isn't a flashy designer) thinks that Flash sucks, they pull the video out of the Flash container and put it into the (now equally bad) QuickTime container, and so you can enjoy Flash-free YouTube on your iPhone.
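
To make the container/codec split a bit more concrete, here is a minimal sketch of moving a video from one container to another without re-encoding. It assumes the ffmpeg command line tool is installed (my assumption, nothing in this post depends on it) and uses its `-c copy` stream copy option. The helper names and file names are made up for illustration, and this is certainly not a description of how Apple does it.

```python
import subprocess
from typing import List


def remux_command(source: str, target: str) -> List[str]:
    """Build an ffmpeg call that changes the container but keeps the encoded streams."""
    return [
        "ffmpeg",
        "-i", source,   # input file, e.g. H.264 video inside a Flash (.flv) container
        "-c", "copy",   # copy the streams as they are instead of re-encoding them
        target,         # ffmpeg picks the output container from the file extension
    ]


def remux(source: str, target: str) -> None:
    """Run the remux; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(remux_command(source, target), check=True)


if __name__ == "__main__":
    # Hypothetical file names: same video data, different wrapper.
    print(" ".join(remux_command("clip.flv", "clip.mp4")))
```

Because nothing gets re-encoded, a copy like this takes about as long as copying the file and loses no quality. It only works if the target container can hold the codec in question.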
Now with the technicalities out of the way, what's all the fuss about?
HTML5 promises - once it becomes a standard - to advance the web into a media-rich one without bolted-on add-ons and plugins that differ from platform to platform and browser to browser. It's a pretty awesome development for most people who have ever developed anything on the web. Part of the process of making this the new standard is to involve everybody who has something to say and is a mover and shaker on the web in setting the direction the standard takes. It's a tough, rough ride - everybody and their mother wants to put in their tech, their knowledge, their thinking. I really would not want to be the decision maker in this process if you gave me a trillion.
The biggest and most awesome change in HTML5 - and the one most obvious to the end user - will be the inclusion of media content without a freaking container that needs a plugin which only half or less of the internet population has. To make this happen, at least all the big browser makers need to agree on what can be played inside the new tags (video & audio).
This is where the debate heats up. I really don't understand why audio doesn't spark the same public debate as video does - but it's probably because Google is involved in the video debate and can change the direction completely on its own with whatever it chooses to support on YouTube.
The two competing codecs are Ogg Theora and H.264. I am less familiar with Theora (though I have tried it), so first a small history of H.264. Back around 2000-2001 a company called Sorenson developed the first video codec that was actually usable on the web - there were others before, but they all sucked balls in at least one of the departments that matter for a great delivery codec. Sorenson made a lot of video people who wanted to present their work on the web very happy. Apple bought in and shipped QuickTime with the Sorenson codec and the ability to encode (make) video with it - albeit with a catch. To really get the full quality of Sorenson you had to buy a pro version, which cost a lot of money. The version that Apple included could play Sorenson (pro or non-pro) just fine, but the encoder was crippled to one-pass encoding only. The real beauty and innovation was two-pass encoding - basically the computer looks at the whole video first and decides where it can get away with more compression and where it needs less.
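
For the curious, the two-pass idea can be sketched in a few lines. This is a toy illustration under my own assumptions, not how Sorenson or any real encoder works: every scene is just a list of brightness numbers, "complexity" is how much those numbers jump around, and the function names are made up.

```python
from typing import List


def first_pass(scenes: List[List[int]]) -> List[float]:
    """Pass 1: look at every scene and score how hard it is to compress.

    The score is the mean absolute change between neighbouring samples,
    a crude stand-in for motion and detail.
    """
    scores = []
    for samples in scenes:
        changes = [abs(b - a) for a, b in zip(samples, samples[1:])]
        mean_change = sum(changes) / max(len(changes), 1)
        # Floor of 1.0 so that a completely static scene still gets some bits.
        scores.append(max(mean_change, 1.0))
    return scores


def second_pass(scores: List[float], total_bits: int) -> List[int]:
    """Pass 2: split a fixed bit budget in proportion to the pass-1 scores."""
    total = sum(scores)
    return [int(total_bits * score / total) for score in scores]


def one_pass(scene_count: int, total_bits: int) -> List[int]:
    """What an encoder without the first look could do: give every scene the same share."""
    return [total_bits // scene_count] * scene_count


if __name__ == "__main__":
    static_scene = [10, 10, 10, 10]   # a talking head in front of a wall
    busy_scene = [0, 50, 0, 50]       # lots of change from sample to sample
    scores = first_pass([static_scene, busy_scene])
    print(one_pass(2, 5100))
    print(second_pass(scores, 5100))
```

With the same total budget, the two-pass split starves the static scene (which doesn't need the bits) and spends them on the busy one, which is exactly where a one-pass encode falls apart visually.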
Apple and the users were not really happy with this situation at all, yet for a long time (in web terms) there was no real alternative to that codec. The situation was even worse because to play Sorenson you had to have QuickTime installed - a losing prospect before the advent of the iPod; I think QuickTime had a 15% installed user base on the web. Those were the days of the video format wars: Microsoft hit with Windows Media (which sucked balls quality-wise but had a huge installed user base), and on top of that there was Real Media (the only really viable solution for streaming video back then).
In the meantime another phenomenon happened on the audio side: MP3 became the de facto standard - and a high-quality one at that (back then). We video people looked over with envy. When producing audio you could just encode it as MP3 pretty much for free with shareware apps, upload it to the web, and everybody could listen to it. There was nothing even close happening on the video side. The irony is of course that MP3 is MPEG-1 Audio Layer III - part of a video standard - but the video side of MPEG-1 sucked really, really, really, really bad. Quality shit, size not really small; only processor use was alrightish, but not great.
Jumping forward a couple of years (and jumping over the creation of MPEG-2 - the codec used for media delivery on DVDs, totally unsuitable for web delivery), the Moving Picture Experts Group - a consortium of industry companies and experts that developed (and bought in) MPEG-1 and MPEG-2 - decided to do something for cross-platform standard video delivery and created the MPEG-4 standard (skipping MPEG-3, whose planned work was folded into MPEG-2, which also avoids confusion with MP3, i.e. MPEG-1 Layer 3). MPEG-4 is a container format - mostly - but it allowed for reference codecs, and the first of these was H.263. It was already on par with Sorenson quality-wise, yet in a container that was playable by QuickTime and Windows Media - the two last standing titans of media playback (by this time Real Media had mostly lost the format war).
Great, you think - well, not quite. Microsoft wasn't enormously happy and created its own standard (as they do) based on MPEG-4 H.263, called VC-1 (I am not really familiar with this side of the story, so I leave it to you to look that up on Wikipedia if you are so inclined). Web video delivery was sadly still not cross-platform and the format war became a codec war, but there was now a standard underlying all of this, and the quality - oh my, the quality was getting good.
Then the MPEG group, together with the ITU's video experts, enhanced the H.263 codec and called it H.264, and oh my, this codec was a pure godsend in the media delivery world. It looked great, scaled up to huge resolutions, could be used online, for streaming and on high-definition discs, and in the beginning it was all pretty much free.
It looked like an Apple comeback in web video delivery, because for a while QuickTime was the only container that could play H.264 without problems. Around that time Flash started to include a function in its web plugin to play video. Interestingly enough they chose Sorenson video as the only supported codec - word on the street was that Sorenson was very unhappy with Apple's decision to ditch them as codec of choice and push H.263/H.264 instead.
Now the story could have ended with Apple winning the format war right there, with QuickTime installed on all computers by default. It didn't, because out of nowhere YouTube emerged, and YouTube used Flash, and YouTube scaled big time and made it - for the first time ever - really easy for Joe the Plumber and anybody else to upload a video to the web and share it with the rest of the world. It changed the landscape in less than 6 months (I watched it, it was crazy).
So as a content producer you finally had a really good codec to upload video in very good quality, but the world chose the worse quality, inside a player that sucked up 90% of the processing power with the codec of choice needing another 90%, and all that came out was shit-looking video that everybody is happy to be done with. But the user experience of hitting an upload button and having everybody watch your video was just unbeatable. Eventually, just when people realised how bad these videos looked compared to the Hollywood trailers that still used QuickTime and H.264, Adobe included H.264 in Flash and by doing so postponed their death once again (without innovating at all, it must be said).
Now fast forward to today. Again a group of clever people, big companies and such have sat down to bring us HTML5 and the video tag. That tag, as said, is going to rid us of plugins and containers and instead just play pure video as fast as possible right inside any browser that supports HTML5. The problem is that people cannot agree on the codec to be used. Why, you ask, if H.264 is so great? Because H.264 was developed by a for-profit group of people who want to make money and have freaking patents on it. Not that this has hampered web video in any way to this day, but for the future standard it seems lots of people have taken offense at that.
There is in fact a whole ecosystem of alternative codecs (audio and video) in the open source world, and the most prominent is Ogg with its video incarnation Theora. They are mostly patent-free because the company that originally developed these codecs gave the patents to the open source community (yet it's still not clear whether that covers the whole codec).
What happens when patents enter the World Wide Web could be seen with GIFs. GIF graphics (moving or non-moving) were once a cornerstone of the web - a more popular choice for graphics than anything else (small, could be read by anything, blah blah). Then a company found out that it held the patents on the format (luckily only shortly before they ran out), sued a lot of big websites for patent infringement and wanted royalties of $5000 from every website that used GIFs. They would have killed the web with that move (and they were in the right, law-wise) if the big companies they sued first had not dragged out the court cases until the patents ran out. Now the GIF file format is in the public domain.
Now it's understandable that this lesson should be learned, BUT here is
my take on the codec war:
Flash is a MUCH bigger threat than patents on the codecs used. Not only does it use the patent-encumbered codec inside its container, but the container itself is totally owned by one company - a company that has often shown (PDF) that it will do everything to take control of anybody using its technology, even if that is the whole world.
Now 95% of all web videos are delivered by Flash these days, and to change that a lot of things need to happen. First Google needs to drop it on YouTube - they just announced a beta which does just that - but even with Google's might that's just not enough; content producers need to hop on board as well. And here is where the chain breaks for the "free and open codecs of Ogg". As you can see from the history above, H.264 has been the industry standard on a wide range of devices, including the web, for years now. The whole backend has settled on it and there are really good workflows to create H.264 video. With the video tag, YouTube is less relevant than it was at its beginning, because all of a sudden it's easy to incorporate video into your own webpage. Now if Google were to say "we use Theora only", high-quality content producers would just say "fuck you" and post their videos on their own sites in much better quality, without the hassle of finding a workflow to produce Theora videos (for people who don't use a terminal there still isn't an easy way to do that in a professional, non-fiddly environment - we like to create, not code for a delivery, sorry).
But that's not all - almost ALL consumer cameras released over the last 2 years, including the hot-shit DSLRs with video functionality, produce H.264 that can "just" be uploaded to the web without re-encoding. That saves YouTube and Vimeo a lot of processing capacity, and with their loss-making operations they sure don't want to add another server farm just to re-encode each and every video they have into a codec of worse quality. You know, 90% of all videos on the web are already encoded in H.264 as of now (and Theora maybe has 0.2% of the 10% that are left over). It's uneconomical and not sensible to re-encode all of that any way you look at it - especially since the quality is not surpassed by any other codec out there, patent-free or not.
I would say go H.264 now and have a new consortium of browser developers and other companies develop a new codec from scratch (or build upon Theora) that is patent-free AND high quality AND has a good workflow (meaning it is supported by hardware vendors and OS vendors across the board). That can then take over from H.264 (just like PNG took over from GIF in less than 2 years following the patent threat). Leave the codec question open for now and let the web sort it out for itself (for the moment), like with the img tag: it doesn't matter whether you put PNGs, GIFs or JPGs in it (or any of the plethora of other image formats) - as long as the browsers support it, it's viewable. So far that approach has shaken out a good road to take (see the switch to PNG with transparency, which in my opinion also helped a lot to bring down IE5, which didn't fully support it - so the market sorted it out quickly (as in 5 years quickly)).
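
Leaving the codec question open is less scary than it sounds, because a page can offer the same video in more than one codec and let each browser take the first one it can play. Here is a rough sketch of that selection logic, as I understand it and under my own assumptions: the function name, the file names and the codec labels are all made up for illustration.

```python
from typing import List, Optional, Set, Tuple


def pick_source(sources: List[Tuple[str, str]], playable: Set[str]) -> Optional[str]:
    """Return the URL of the first offered source whose codec the browser can play."""
    for url, codec in sources:
        if codec in playable:
            return url
    # Nothing playable: the page would have to fall back to a plugin or a download link.
    return None


if __name__ == "__main__":
    # Hypothetical page that offers the same talk twice, H.264 first.
    offered = [("talk.mp4", "h264"), ("talk.ogv", "theora")]
    print(pick_source(offered, {"h264", "theora"}))  # browser that plays both
    print(pick_source(offered, {"theora"}))          # Theora-only browser
    print(pick_source(offered, set()))               # browser without video support
```

The price is that the producer has to encode twice, which is exactly the workflow pain described above, but nobody is locked out while the market sorts out the winner.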
BTW the only browser that just does not want to go down that route (and would rather cripple itself with Flash in the meantime) is the oh-so-open-minded Firefox. Sorry, I fail to see your point, dear Mozilla developers - you are not gonna make a lot of friends that way outside of the very, very small open source community (and even there your approach is not universally liked, for example by those who cannot install a Flash plugin because Flash is not supported on their platform, e.g. PPC Linux).
Get rid of Flash first, then get rid of H.264 later when you have something equally good on all counts. Going backward with technology is just never the way forward - open source or not.