Sunday, September 6, 2009

Notes on Speex

I've been working on improving the Speex support in FFmpeg. I pretty much have everything completed except for getting it all through review and into SVN.

I started out just making an encoder for libavcodec based on libspeex, along with adding support for Speex in the Ogg muxer. This was fairly simple. Art Clarke of Xuggler also submitted similar patches around the same time, so there was some good discussion as well as some shared code.

The Ogg muxer is obviously needed first, so I tested it using stream copy from Ogg-to-Ogg and FLV-to-Ogg. In both cases I ran into problems with a small pop near the beginning of the output file when decoded using speexdec. After about a full day of debugging, I managed to figure out the problem. The first frame is supposed to have skipped samples due to transmission delay. The way this is communicated is through Ogg granule positions. However, the first granule position reflects the last sample of the last frame in the first Ogg page. In order to determine how much of the first frame to skip, one must calculate what that granule position would normally be without skipping, then calculate the difference. I submitted a patch to ffmpeg-devel to handle this in the Ogg demuxer. As a side note, it seems that Adobe Flash Media Server does not take this into account, and therefore the timestamps skip samples in the middle of the stream instead of at the start. So stream copy of those files from FLV-to-Ogg will always produce incorrect granule positions.
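The skip calculation described above can be sketched roughly like this. It is only an illustration, not the actual patch: the function name and its parameters are hypothetical, and it assumes the first audio page holds only complete packets, each with the same number of frames.

```python
def first_page_skip(granule_pos, packets_in_page, frames_per_packet, frame_size):
    """Return how many samples to skip at the start of the stream.

    Hypothetical helper, not FFmpeg code. granule_pos is the granule
    position of the first Ogg page that carries audio packets.
    """
    # Granule position the page would have if no samples were skipped.
    expected = packets_in_page * frames_per_packet * frame_size
    # The shortfall is the number of samples to drop from the first frame.
    skip = expected - granule_pos
    # Assumption: a non-positive difference means there is nothing to skip.
    return skip if skip > 0 else 0


# 10 wideband packets of 1 frame (320 samples) each would normally end at
# sample 3200; a first granule position of 3000 means 200 samples are skipped.
print(first_page_skip(3000, 10, 1, 320))
```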

The libspeex decoder should also be modified to skip these samples in the first frame. I have a patch ready for that. Cutting off samples from the last frame is not possible though because you never know if you're encoding the last frame. That's not as big a deal though.

Speex-in-FLV muxing support is a pretty simple and straightforward patch, but should not be committed until the Ogg timestamp issues are fixed to prevent more broken files in the wild. The same goes for the Speex-in-Ogg muxing.

Today I have been cleaning up the libspeex encoder for lavc. Speex is weird in that it can use either a quality setting or bitrate to set the CBR rate. My proposal will be to add a flag2 for explicitly enabling VBR. That way it will not just depend on CODEC_FLAG_QSCALE like in most other audio encoders. For setting frames-per-packet, the user can set AVCodecContext.frame_size to a multiple of the Speex frame size (e.g. 960 in wideband mode would be 3 frames-per-packet). Complexity is set using AVCodecContext.compression_level.
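As a rough illustration of the frames-per-packet idea, here is a small sketch of the arithmetic. The helper name and the mode strings are made up for this example; the per-mode frame sizes (160, 320, and 640 samples) are the standard Speex values.

```python
# Samples per Speex frame in each mode (8, 16, and 32 kHz respectively).
SPEEX_FRAME_SIZE = {"narrowband": 160, "wideband": 320, "ultra-wideband": 640}


def frames_per_packet(frame_size, mode):
    """Derive frames-per-packet from a requested frame_size (hypothetical helper)."""
    base = SPEEX_FRAME_SIZE[mode]
    if frame_size <= 0 or frame_size % base != 0:
        raise ValueError("frame_size must be a positive multiple of %d" % base)
    return frame_size // base


print(frames_per_packet(960, "wideband"))  # 960 / 320 = 3
```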

After I get all this through review (hopefully it won't take terribly long) I think I'll try adding support for vorbiscomment muxing in lavf.

Sunday, August 9, 2009

wow, great quote

"The first question I ask myself when something doesn't seem to be beautiful is why do I think it's not beautiful. And very shortly you discover that there is no reason." -- John Cage

Saturday, July 25, 2009

Remember the Milk

I stumbled across a pretty neat TODO list website called Remember the Milk. I started to use it for work, but work soon became too crazy to have time to keep up with a TODO list in addition to doing the things on it. But I do like it for keeping up with my personal long-term and short-term programming tasks. I've made my list public so anyone can see what I'm working on.

Thursday, April 2, 2009

more fail-safe FLAC parser

I submitted my FLAC parser (as mentioned in my last post) to the ffmpeg-devel mailing list and got a suggestion (a.k.a. requirement) from Michael for decreasing the likelihood of false-positive frame detection. Basically it would analyze a sequence of potential frame headers and pick the most likely sub-sequence based on concurrent non-overlapping CRC matches. It would also take header parameter changes into account. I am currently working on an implementation of this idea.

Thursday, March 26, 2009

more FLAC parsing

As noted in my previous post, I was able to make a working FLAC parser, but I was not completely happy with it. I just had a feeling if I kept trying I could get it to work without buffering max_frame_size bytes and use a state variable instead.

Well, after many hours I finally got it working! At least it works for all samples I've tried so far. I still want to do more tests just to be sure though. Here is how it works:
  • The FLACParseContext has a 31-byte state buffer and state size.
  • I had to modify ff_combine_frame() to pad the parser buffer with 16 bytes instead of 8 bytes to accommodate negative values for start of frame less than -8.
  1. If state buffer has data, copy data from current buffer to fill state buffer (or as much as possible). Search the first 16 bytes of the state buffer for start of frame and CRC of previous frame. If it passes, return start position within the buffer (will be negative).
  2. Search the current buffer, up to 16 bytes before the end, for start of frame and CRC of previous frame. If the next frame is found, return the start position within the buffer. If none is found, check last 16 bytes for a frame sync word and if found, store in the state buffer. Return END_NOT_FOUND.
  3. There are 2 exceptions to the CRC check. One is if an inline header is found since it will not have a CRC match. The other is if there are 10 CRC mismatches in a single frame. The reason for that is to abort the frame search for damaged files. It is extremely unlikely that a valid frame will have 10 CRC mismatches.
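To make the CRC check and the 10-mismatch escape hatch a bit more concrete, here is a simplified sketch that works on one complete buffer. It is not the parser code: it ignores the state buffer and the inline-header exception, the function names are hypothetical, and the 11-byte minimum frame size is just used as a starting offset for the search.

```python
def crc16(data):
    """CRC-16 with polynomial 0x8005 and initial value 0, as used by FLAC."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def find_next_frame(buf, frame_start, min_frame_size=11, max_mismatches=10):
    """Return the offset of the next frame in buf, or None if not found."""
    mismatches = 0
    pos = frame_start + min_frame_size
    while pos + 1 < len(buf):
        # Sync code candidate: 0xFF followed by 0xF8 or 0xF9.
        if buf[pos] == 0xFF and (buf[pos + 1] & 0xFE) == 0xF8:
            # The previous frame should end with its CRC-16 (big-endian).
            stored = int.from_bytes(buf[pos - 2:pos], "big")
            if crc16(buf[frame_start:pos - 2]) == stored:
                return pos
            mismatches += 1
            if mismatches >= max_mismatches:
                # Probably a damaged file: stop searching and accept this one.
                return pos
        pos += 1
    return None  # comparable to returning END_NOT_FOUND
```

Returning the candidate after too many mismatches keeps a damaged file from stalling the search, at the cost of possibly splitting a frame in the wrong place.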

Monday, March 23, 2009

passing off work

I've been wanting to take some time to write an MPEG-4 ALS decoder for a while now, but haven't had the motivation. So this year I decided to propose the project for GSoC '09. Hopefully a good student will qualify and want to do the project.

FFmpeg Summer Of Code 2009 - MultimediaWiki

After writing a raw FLAC parser, then looking at the raw ALS format, I've decided to aggressively push for not touching raw .als files with a 10-foot pole. We just need to stick to MP4 first, and other containers like NUT and MKV second.

FLAC parsing

Let me start by saying that I very much dislike the raw FLAC format! After a lot of failed attempts I finally got a parser working. Part of the frustration was due to FLAC, and the other part was the way FFmpeg's parsing works.

First issue: reliable frame detection
  • Using only the 16-bit frame header gives many, many false-positive frame start matches.
  • Using the frame header + frame CRC-16 gives a handful of false positives.
  • The only reliable way seems to be a full frame header validation including the header CRC-8, along with checking the CRC-16 for the whole frame.
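Here is a rough sketch of what checking both CRCs looks like. It only covers the sync code and the two checksums, not the other header fields a full validation would also check, and the function names are hypothetical. The polynomials (0x07 for CRC-8, 0x8005 for CRC-16, both starting at 0) are the ones the FLAC format defines.

```python
def crc8(data):
    """CRC-8 with polynomial 0x07 and initial value 0 (FLAC frame header)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) if crc & 0x80 else (crc << 1)
            crc &= 0xFF
    return crc


def crc16(data):
    """CRC-16 with polynomial 0x8005 and initial value 0 (FLAC frame footer)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x8005) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def header_ok(header):
    """Check sync code and CRC-8; header includes its trailing CRC-8 byte."""
    if len(header) < 6:  # headers are at least 6 bytes long
        return False
    if header[0] != 0xFF or (header[1] & 0xFE) != 0xF8:
        return False
    return crc8(header[:-1]) == header[-1]


def frame_ok(frame, header_len):
    """Check the header, then the CRC-16 stored in the last 2 bytes of the frame."""
    if not header_ok(frame[:header_len]):
        return False
    return crc16(frame[:-2]) == int.from_bytes(frame[-2:], "big")
```

Note that header_len has to come from actually parsing the variable-sized header, which this sketch leaves out.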
Second issue: using FFmpeg's parser to combine pieces of frames from chunks of data
  • Frame headers are variable-sized from 6 bytes to 16 bytes.
  • The whole frame can be as small as 11 bytes, and silent frames are often less than 16 bytes.
  • ff_combine_frame() only keeps up to 8 bytes of state information.
So, the way I finally ended up with a working parser was to basically use the same strategy as the buffering that's done currently in the FLAC decoder, but do it with ff_combine_frame().
  1. read first header to get starting point for block size/channels/bps
  2. estimate maximum frame size
  3. buffer enough data for maximum frame
  4. detect next header
  5. update maximum frame size with header info
  6. update buffer
  7. return frame start and size
  8. goto 3
There are special cases for when there is no more data to read and for checking up to 11 bytes from the end of the buffer to make sure there isn't a small frame that's not being detected due to the 16-byte state size.
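For step 2, one way to estimate the maximum frame size is to assume the worst case of uncompressed samples plus header overhead. The sketch below is my own rough upper bound under that assumption, with a hypothetical function name; the per-subframe overhead term in particular is a guess at a safe margin rather than an exact figure.

```python
def max_frame_size(blocksize, channels, bps):
    """Rough upper bound, in bytes, for one FLAC frame (hypothetical helper)."""
    size = 16                              # largest possible frame header
    size += channels * ((bps + 14) // 8)   # assumed per-subframe header overhead
    sample_bits = channels * bps * blocksize
    if channels == 2:
        # Stereo decorrelation can need 1 extra bit per sample in one channel.
        sample_bits += blocksize
    size += (sample_bits + 7) // 8         # samples stored uncompressed
    size += 2                              # CRC-16 footer
    return size


print(max_frame_size(4096, 2, 16))  # 16 + 6 + 16896 + 2 = 16920
```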

Thursday, October 30, 2008

TODO list

My TODO list is getting long, so I'm writing it down here. I'll keep updating this post if I decide to add/remove items or when I complete an item.
  1. finish simplifying Aften's asm structure
  2. FFmpeg AC3 decoder channel reordering (after channel order patch is applied)
  3. add FLAC 24-bit encoding to Flake and FFmpeg
  4. finish Alsophila bit allocation improvement and port to Aften
  5. port Alsophila frame encoding structure to Aften
  6. add a trellis mode for variable frame size in Flake
  7. FFmpeg FLAC parser
  8. better FFmpeg raw FLAC muxer
  9. better FFmpeg raw FLAC demuxer
  10. MP4ALS decoder for FFmpeg
  11. clean up MP4ALS encoder for FFmpeg
  12. AC3 metadata editor

Monday, August 4, 2008

ALAC update

Jai's ALAC encoder for FFmpeg SoC is going very well. Entropy coding and prediction are working. Adaptive prediction order and stereo decorrelation have been implemented, but are still works-in-progress. There is still an odd bug which generates non-lossless output when decoded. It's very rare from what I've found, and hopefully the cause will be tracked down and fixed.

Even with the encoder as it is right now, it could be very easily cleaned-up for inclusion in FFmpeg SVN. The compression isn't great, but a decent working ALAC encoder is better than no encoder, and it's definitely on the right path to being a great encoder.

Saturday, May 10, 2008

MPEG-4 ALS encoder

Last weekend, I started working on an mp4als encoder for FFmpeg. Here is a summary of what I have implemented so far.
  • rice coding (slightly different from jpeg-ls)
  • random access (progressive prediction order)
  • lpc prediction with fixed order
  • joint stereo
  • entropy block partitioning
  • constant and zero block modes
I have quite a bit left to do though for a complete encoder. What I have so far gets comparable compression to the RM20 reference encoder with default settings, and is 3x faster.

Here is a patch against FFmpeg SVN.

Monday, April 28, 2008

staying busy

Some of the things I've been working on or will be soon...

I am mentoring 2 Google Summer of Code projects this year, an ALAC encoder and an MLP/TrueHD encoder.

Jai Menon will be creating the ALAC encoder. I'm very excited about the possibilities of this project. The format seems fairly straightforward, and Jai seems very bright and motivated, so I'm optimistic about him having time this summer for some good experimentation with the various encoding features.

Ramiro Polla will be making the MLP encoder. This seems like a bit more complex task, but I'm confident that Ramiro can complete it. This will be a wonderful addition to FFmpeg and will be very valuable for the open source community in general.

In addition to starting my mentoring, I'm also working on the E-AC3 decoder and on FLAC. E-AC3 is very close to being ready for FFmpeg SVN. The FLAC decoder just needs some work... It's acting as a demuxer, parser, and decoder all in one, with some not-so-fun side-effects. Now I just have to be persistent to get through Michael's tough reviews. :)

Maybe one of these days I'll get back to working on Aften and Flake... But FFmpeg definitely takes priority for now.

Monday, March 10, 2008

March update

I haven't posted in a while. Here is what I've been working on.

Alsophila: experimental AC3 encoder. I wanted to make some major changes to Aften, namely channel coupling and E-AC3 encoding. It was really difficult trying to get it working with Aften. I tried channel coupling at least 5 times, all without success. So I stripped the code down to the bare bones (no SIMD, no threading, simplified interface). I quickly got channel coupling working, so I decided to keep the stripped-down version around for future experimental stuff before porting it over to Aften.

I've also been trying to get the FFmpeg E-AC3 decoder up-to-snuff. In the process, I'm also working on the normal AC3 decoder. I am mainly dealing with stability (segfault) issues and speeding up the downmixing without sacrificing quality.

Unrelated to coding, I've been playing PlaneShift, a fun (and free) MMORPG. They just released a new version, which is great for the added features, but has made the server a bit more unstable than usual.

I got a new camera for Christmas! It's a Fujifilm FinePix F50fd. The "fd" stands for face-detection, which is pretty cool. But the real reason I chose it was for the great reviews. It is one of the best in its class for image quality in general, and puts the rest to shame when it comes to high-ISO/low-light picture-taking. I like to avoid flash if possible, and using a high ISO setting makes this much easier to do even without a tripod. If I were wealthy, I probably would have gotten a digital SLR, most likely a Fujifilm, but this camera is a close second place!

Saturday, November 17, 2007

git is fun

I've been trying to learn how to use git to make my software development easier. Right now I'm still trying to get the hang of it and have had to wipe the tree clean and reload from SVN several times. I think I just need to play around with it some more.

Git is great for adding changes with multiple commits and for reducing the need to have multiple working copies of different ideas.

Saturday, October 27, 2007

Flake updates

I decided to do some work on Flake recently. Among other things, I've tweaked the compression level settings, modified the API, replaced the internal WAVE reader with the pcm_io library from Aften, and changed the build system to CMake. I added more stuff to the TODO list as well, so more changes will be forthcoming.

Saturday, September 8, 2007

flake vs flac comparison

Today I did some speed/compression tests to compare Flake 0.11 to FLAC 1.2.0. The compression ratio for FLAC -8 has gotten a bit better since my last tests, so it currently beats Flake -8 in compression, but is still nowhere close to it in speed. I would recommend using Flake -9 now, which is still better than FLAC in both speed and compression.

I also included 2 high compression Flake modes in the tests. Both modes use large/variable frame size and high prediction orders, which makes them non-Subset-compliant. Flake -99 gives the highest possible compression without regard to speed. Flake -99 -v1 gives almost as good compression and is significantly faster. Just for fun I also included summary tests using WavPack, Monkey's Audio, and La.

All 6 Tests
Playing Time: 26:47.0
File Size: 270.34 MiB

encoder          time     rate   size (MiB)   compression
--------------   ------   ----   ----------   -----------
FLAC -8          0:34.7    46x       161.61        59.78%
Flake -8         0:17.4    92x       161.86        59.87%
Flake -9         0:20.1    80x       158.95        58.80%
Flake -99 -v 1   1:25.0    19x       152.73        56.50%
Flake -99        7:27.2     4x       151.22        55.94%

FLAC -0          0:05.1   315x       188.32        69.66%
WavPack -hh      0:24.3    66x       157.15        58.13%
WavPack -hhx3    4:37.7     6x       153.83        56.90%
MAC -c3000       0:26.4    61x       152.81        56.53%
MAC -c4000       0:44.8    36x       145.92        53.98%
La -high         6:29.5     4x       143.06        52.92%
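
For anyone wanting to check the numbers, the rate and compression columns can be reproduced from the other values. This is a small sketch with hypothetical helper names, assuming rate means playing time divided by encoding time and compression means encoded size as a percentage of the original size.

```python
def parse_time(text):
    """Convert a 'minutes:seconds' string such as '26:47.0' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def encode_rate(playing_time, encode_time):
    """How many times faster than real time the encoder ran."""
    return parse_time(playing_time) / parse_time(encode_time)


def compression_percent(encoded_mib, original_mib):
    """Encoded size as a percentage of the original size."""
    return 100.0 * encoded_mib / original_mib


# FLAC -8 row: roughly 46x and 59.78%
print(round(encode_rate("26:47.0", "0:34.7")))
print(round(compression_percent(161.61, 270.34), 2))
```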

Sunday, September 2, 2007

new computer

Yay! I finally got a new computer! My previous computer was zapped by lightning or a power surge or something... All I know is that there was a big thunderstorm that day and when I returned from work my computer was dead. Luckily, my hard drive was ok. Here are some specs on my shiny new machine.

AMD64 X2 6000+
2 GB RAM
nVidia GeForce 7600 GT
22" widescreen LCD monitor
AC'97 7.1 onboard audio

I've never built a computer from parts before, so it took me the better part of 2 days to get it working properly. I'm still convinced that my first IDE channel is defective, but I worked around it. I also haven't been able to get the floppy drive working. But everything else seems to be working great. I haven't yet gotten used to the noise of 8 fans!

Anyway, I finally got Ubuntu up and running at the proper resolution and playing DVDs, both of which were a PITA to get working right. To even get started I actually had to manually edit xorg.conf to run in vesa mode, update the nvidia driver, then edit xorg.conf again. I also had to manually add the monitor's 1680x1050 optimal resolution to xorg.conf. As for playing DVDs, luckily there were some good instructions online.

Sunday, August 19, 2007

FFmpeg E-AC-3 decoder

Well, the Google Summer of Code is almost over. Bartek has done a great job with the E-AC-3 decoder. The code is now nice and cleaned-up, and it works well for all known samples. Unfortunately, there aren't many very good known samples. Despite this setback, most of the unused features have already been implemented, just not tested. With some luck, adding the decoder to FFmpeg's main SVN branch will bring on reports of samples which utilize more features.

Friday, August 10, 2007

MP4-ALS

I've been doing work recently on the MP4-ALS audio format. I am in the process of writing a decoder by first writing my own specification document based on the reference software. I may eventually break down and purchase the ISO specification, but doing my own specs seems to be going okay for now. Just to have a place to keep my work online and organized, I've applied for a new Sourceforge project which I'm calling Decals. I'm already stretched thin as far as programming projects, so there may not be a release for a while. Another reason I'm putting the work online is because I hope to get some help as well.

So far, my documentation is about 600 lines of text and already covers most of the decoding process. There are some modes which are more complex that haven't been done yet though. As for the code, it's now at about 750 lines, and consists of a partially-working parser covering much of what I've documented up to this point.

Sunday, August 5, 2007

Flake 0.11

I finally got around to creating a new release for Flake. Version 0.11 was released today. I also updated the website. I had created the new design last year, but it never made its way onto the web...until today. I know I haven't done a lot with Flake for a while, but it's not dead...just lower priority at the moment. Although some fancy meta-data features would be more fun to add, I think my next move will be to get a testing framework up and running using the official FLAC testsuite. I mainly want to do this so the Winamp developers can confidently recommend using their FLAC encoder (which currently uses Flake r117) for archival use.

Thursday, July 19, 2007

scattered thoughts

I just started a new job this week, so I'm in information-overload mode right now. I'll post more technical stuff this weekend. As far as programming goes, I am currently working on the FFmpeg AC-3 decoder, Aften, and mp4-als decoding. The FFmpeg work is mostly just figuring out logical steps to take the code from point A to point B. My Aften development is experiencing a temporary slow-down until I finish the other AC-3 work.

Working on mp4-als is pretty exciting to me. If you don't know, mp4-als is a lossless audio codec based on LPAC and standardized by ISO as part of the MPEG-4 specification. I don't want to fork out the money for the ISO specs, so I'm trying to create my own format notes from the very incomplete draft specs and the reference software. Then I'm using the format notes to implement a decoder.

As for non-programming stuff... I think I'm going to purchase a digital camera this weekend. I love photography, but I haven't done it much since my wonderful Minolta SLR film camera was stolen a few years back. I desperately want a camera which supports raw image format, but I don't think I'll be able to afford it.

I've been trying for a while to find a really good music collection organizer / player for Linux. The best I've found so far is Amarok. I don't like that I had to install the qt libraries (I use gnome), but it's worth it at this point. I do wish it was more customizable though.