David Haywood's Homepage
MAME work and other stuff
January 18, 2012 Haze Categories: General News. 70 Comments on Taking the FLAC

One of the criticisms often made of MAME / MESS’s CHD format is that it doesn’t actually provide very efficient compression, especially when it comes to CD AUDIO data. I’ve had a number of people ask me if I can look into improving this, especially when you consider that with the current format a complete Saturn set is almost 1TB, with a large portion of that being AUDIO.

The reason it’s inefficient is that it uses zlib’s inflate algorithm on the blocks, which are kept rather small to ensure that data is decompressed quickly. While this is fine for DATA (it’s the same thing that ZIP files use) it’s absolutely hopeless for AUDIO.

There are dedicated lossless audio compressors out there; FLAC is a popular one.

I’ve spent the last 4-5 days solid integrating support for this into the MAME / MESS tree, and extending the CHD format to not only support its native blocks (hunks), but also to reference embedded streams via ‘virtual hunks’ which point at a stream and allow the actual FLAC codec to do the seeking and decoding work.

By doing this I can achieve a good level of compression with FLAC, far better than trying to split it into CHD hunks, due to the lower overhead and the improved ability of the compression algorithm to predict how the data best compresses. I also still get good decoding speed, as the FLAC format is designed to be quick to seek, and has built-in seektable support of its own which I’m leveraging.

I have to say FLAC is an absolute joy to work with, the API does everything you can expect, the documentation is great, and it’s very good at letting you know if something is wrong. (the only issue I had with the documentation / API was with the seektables, whereby calling things in the wrong order / wrong time during encoding could cause data to be overwritten without throwing an error)

I’ve also added support to the MAME SAMPLE interface to play back files from FLAC sources; this should allow the recently dumped tape loops to be compressed much better than they are now (they’re uncompressed PCM .wav files).

The other possibilities for this are endless: -wavwrite could also output FLAC data if support was added, and MESS could potentially load cassette-based software from FLAC images. It’s an incredibly useful codec to have around.

I’ve uploaded my first pass of this code Here (link offline for the time being, there is definitely still an error). This should be considered ALPHA SOFTWARE and I won’t be held responsible if you end up destroying your CHDs with it. I’m currently in the process of batch converting many images and haven’t found a broken case yet, but still, it’s in testing. While I’m happy with the current format extensions and CHD format created it could change in a final version, you have been warned.

This code has been submitted to R.Belmont, who is currently making some portability fixes. FLAC is designed to be portable, so this shouldn’t be too much of a problem, so fingers crossed it can be sorted out soon.

Usage is simple: I’ve added an additional -createcdflac command line option which uses the FLAC routines when compressing AUDIO. If you already know how to use CHDMAN then it should be straightforward.
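For example (assuming the new option mirrors the existing -createcd usage; the filenames here are just placeholders):

chdman -createcdflac game.cue game.chd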

Have fun :-)

70 Comments


>a complete Saturn set is almost 1TB

Wait, what? We have a complete Saturn set? Even if I count iso/mp3 and “found somewhere on snesorama” copies, I don’t think we are past 90% completion. Depends on what you consider complete though…

I don’t know about the efficiency of CHDs, but if a full Saturn set is 1TB, that means it achieves very minimal compression ratio. So using FLAC for this is a welcome addition.

well ‘complete’…

a lot of demo discs and such are missing, but the majority of games actually seem accounted for in one form or another.. re-release discs are missing, but more often than not they’re just different pressings of the same data anyway.

To be fair, it’s almost 1tb because there are a lot of duplicates (same game from different sources) at the moment, but that’s another thing on my todo list: give CHDman some kind of comparison option if possible, or at least allow it to spit out more metadata about the discs so I can more easily conclude how the rips differ.

http://git.redump.net/mess/tree/hash/saturn.xml

Aaah, that’s an interesting xml.

You may want to drop me an email. I can clear up a few of your assumptions there, but there are a few things I’d rather not say in public. Suffice to say that I should have a MUCH more accurate list than the one you are using there.

imho, while FLAC is definitely interesting for tapes and samples, CD audio tracks should simply be ripped as binary tracks (e.g. like in redump.org sets)

I understand the advantage of compressing old rips with the new method, but it does not make them good dumps ;)

there’s no real difference….

the redump.org ‘binary’ dumps are just headerless, byteswapped data compared to the tosec wavs. It’s still 2352 bytes per sector PCM audio data. .cue / .bin dumps are exactly the same. The only thing that changes is the representation, or the ‘format’

from a point of view of the internal CHD structure it makes no difference at all, they’re all stored in the same way. Zip / Inflate simply isn’t suitable for storing data of this nature, so we can be intelligent and store them as an audio stream. It’s still lossless compression of the same data, you still get the same out as you put in; trying to say one is better than the other is like saying ZIP is unsuitable for ROMs because its internal data storage is in a different endian to the one you want ;-)

I’ve even made sure that if there is any subdata in the audio (REAL binary audio tracks) we have the possibility to store that too, it simply compresses the subdata part with inflate, and passes the audio data part quite sensibly to FLAC.
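To illustrate the principle (just a sketch, not the actual CHD code; the helper name, buffer handling and error handling here are invented): a 2448-byte raw sector splits into a 2352-byte audio payload and 96 bytes of subchannel data, with the audio handed to the FLAC encoder and the subchannel part to zlib.

#include <zlib.h>
#include <FLAC/stream_encoder.h>

#define SECTOR_AUDIO_BYTES 2352   /* 588 stereo 16-bit sample frames */
#define SECTOR_SUB_BYTES   96     /* RW_RAW subchannel data */

/* hypothetical helper: compress one raw 2448-byte sector, audio via FLAC, subdata via zlib */
static int compress_raw_sector(FLAC__StreamEncoder *encoder, const unsigned char *raw_sector,
                               unsigned char *sub_out, uLongf *sub_out_len)
{
	FLAC__int32 pcm[SECTOR_AUDIO_BYTES / 2];
	int i;

	/* expand the 16-bit little-endian PCM into the int32 layout libFLAC expects */
	for (i = 0; i < SECTOR_AUDIO_BYTES / 2; i++)
		pcm[i] = (FLAC__int32)(FLAC__int16)(raw_sector[i * 2] | (raw_sector[i * 2 + 1] << 8));

	/* 588 interleaved L/R sample frames per sector go to the FLAC encoder */
	if (!FLAC__stream_encoder_process_interleaved(encoder, pcm, SECTOR_AUDIO_BYTES / 4))
		return -1;

	/* the subchannel part goes through plain zlib, as before */
	if (compress2(sub_out, sub_out_len, raw_sector + SECTOR_AUDIO_BYTES, SECTOR_SUB_BYTES, Z_BEST_COMPRESSION) != Z_OK)
		return -1;

	return 0;
}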

You could argue that CHD should not exist at all, and that MESS/MAME should be listing the checksums for the individual tracks in binary format, which would be treated like ROMs, but a decision was taken long ago to store CD data in CHDs…

As usual, Haze, you’re taking the initiative to produce original, interesting and useful stuff.

Using flac to compress cassette tape data is a cool use that didn’t immediately spring to mind!

So, what size will that 1TB Saturn set be after flac compression? I reckon around 40% smaller?

I’m guessing it will save around 200gig, there are a lot of discs with many audio tracks.

Obviously it’s not going to make much difference for video based games, or ones with lots of FMV, so some images won’t compress better. If I’m to tackle those I’ll have to look at LZMA2 blocks or similar, which will give better compression than the inflate stuff (their suitability would obviously be based on factors like decompression speed too tho)

For most of the PCE discs (which are usually small data track + audio) I’m seeing around 150-200meg savings on each CHD. For the Saturn discs which are audio heavy I’ve seen similar.

(and there are around 1900 saturn CHDs, so if I assume half of them to be giving such savings, it’s still rather impressive)

For comparison, the ~1900 saturn games I have (all unique titles, no dupes, all bin/cue dumped from original disc) is 525 GB in RAR.

Apples to oranges, I know.

Yeah, but you can’t stream RAR straight into an emulator; it needs to be decompressed first. The nature of the compression makes it compress better, but it’s not really suitable for streaming :-) That’s kinda the point of CHD, it’s a compressed format, like ZIP, but also suitable for streaming directly. Of course to use it in non-MAME/MESS emus you’d still have to extract it first (just like RAR), although an interesting project might be to extend something like http://wincdemu.sysprogs.org/ to a) support audio tracks and b) mount CHDs directly….

Anyway, MESS lists can only document what’s public… I can’t document what I don’t have access to, and we don’t document what people can’t test. The point of the lists is to have a fixed point of reference, an image which can be pointed at by setname for any bugs etc. That’s why I wanted to build the most complete list of what was available to the public.

Like I said, apples to oranges. I was just giving some numbers for comparisons sake regards the compression ratio.

And I’ve been building that list you’ve started with for near a decade already.

Alright, well I’ll drop you an email at some point (bit busy right now)

basically I needed some kind of checklist to cross reference things, so I used that list as the most complete one I could find. Improvements are obviously always welcome.

It’s probably going to *take* decades for such lists to evolve and become complete, but I still feel MESS can serve the same purpose as MAME when it comes to such things :-)

Fwiw there does appear to be a lingering issue with this still, I’ve noticed one or two PCE images don’t decode properly.

To be expected at this stage, I’m going to look into it :-)

and it appears the powers that be have rejected this…

guess they’d rather continue more pointless rewrites of the basic code, with no end user benefits.

I’m wondering if it’s some kind of edge case library bug with the built in FLAC seeking actually..

we get here

/* make sure we are not seeking in corrupted stream */
if (this_frame_sample < lower_bound_sample) {
    decoder->protected_->state = FLAC__STREAM_DECODER_SEEK_ERROR;
    return false;
}

but this_frame_sample has acquired a wacky value from somewhere else…

If I reset to the upper seek boundary

/* make sure we are not seeking in corrupted stream */
if (this_frame_sample < lower_bound_sample) {
    decoder->protected_->state = FLAC__STREAM_DECODER_SEEK_ERROR;
    // printf("corrupt stream error %d %d\n", this_frame_sample, lower_bound_sample);
    this_frame_sample = upper_bound_sample;
    // return false;
}

then the CHD decodes just fine, so I don’t think it’s the CHD that’s corrupt, just something completely throws the seeking in a few rare cases… (seektables are valid..)

hmm ok.. can’t paste code in comments ;-)

http://pastebin.com/brvHXDFS

(note, I don’t consider that a fix, just a hack.. looking into the real problem now)

I think you’re making this too complicated. Unlike zlib, FLAC splits the audio into small hunks that are compressed independently anyway. There is no benefit to feeding it whole tracks of data. It would be cleaner to use the existing CHD hunk support. I believe the FLAC__stream_encoder_* functions in libFLAC can give you a simple compressed stream without any metadata crud.

hmm, not that I saw, it does split it into blocks, yes, but from what I can see it expects the metadata to be read in etc.

Anyway there’s clearly a bug in the actual FLAC library with seeking, not in my code. I’ve been examining it.

The code in read_frame_header_ manages to pass the check for ‘Magic Number’ as well as the 8-bit CRC check.
The check

if(x == 0xffffffff) { /* i.e. non-UTF8 code… */

doesn’t catch it either, so eventually it results in frame.header.number.frame_number being set to something invalid.

that eventually falls down into the seek code, which gets a wacky current sample / frame number, and it dies.

also the most efficient block size for FLAC differs from that being used as the hunk size in the CHDs, so the other option would require me to add the ability to have variable (or at least 2) different hunk sizes in a CHD, one optimal for zip, the other optimal for flac..

either way, it’s a significant amount of work :-)

http://pastebin.com/JgXuYbjh

ok….

the code FAILS a CRC16 check on the footer, then continues anyway, using the bad data..

it needs to return to a seek / sync state after detecting the CRC16 fail surely?

anyway I’m going to try the alt approach of using the FLAC data directly in standard blocks, even if it means crafting pseudo headers to keep the decoder happy :-)

let’s see what happens..

(btw, if anybody did convert CHDs with the code posted earlier, they’re not corrupt, they just need that correction to the FLAC library code in order for them all to decode.. however, as I said, what I’m doing is subject to change anyway, and I do plan on using this for myself, even if it means keeping my own fork of the CHD code)

hmm no, there are still some seek problems even with that patch, right, I’m definitely going to have to try doing my own frame / seek system using the CHD blocks, the FLAC one just doesn’t seem reliable.

FLAC is designed to work in streaming environments where there’s no header or index, just endless audio packets. I think it allows parameters like bit depth to be supplied out of band by the streaming protocol. So it should work here without header hacks. Regarding optimal block sizes: FLAC only looks at a few previous samples to predict the next one. The only thing that’s done at the block level (as far as I know) is computing the most efficient way to store the residual for that block, which depends on the nature of the waveform, which I wouldn’t expect to change very often. So I’d expect you to see little dependence on block size in the “reasonable” range (1K-16K, or whatever). Obviously, it would be better to gather some actual data…

It would be interesting if libFLAC has a seeking bug after all these years. It may be that it’s not designed to support sample-accurate seeking.

well, it certainly seems to be a bug.

My seektables are valid (and I can even disable them and get the same behavior)

There are just various conditions under which it can get an invalid frame, then fall through and attempt to use that data for seeking.

It’s meant to do absolute seeking, there is a function call to do exactly that, which I’m using, but maybe it fails when combined with unusual block sizes and heavy use. I’ll try to isolate a test case from it later, outside of MAME/CHDMAN, using the standard FLAC file i/o stuff, but after adding a few billion printf lines and analysing what goes wrong it looks like the library just doesn’t cope well with certain scenarios.
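A standalone test along those lines might look roughly like this (a minimal sketch using only stock libFLAC file I/O; the seek stride, duration and callback bodies are arbitrary, and none of it is taken from the actual CHD code):

#include <stdio.h>
#include <FLAC/stream_decoder.h>

static FLAC__StreamDecoderWriteStatus write_cb(const FLAC__StreamDecoder *dec,
	const FLAC__Frame *frame, const FLAC__int32 *const buffer[], void *client)
{
	/* discard the decoded samples, we only care whether seeking survives */
	(void)dec; (void)frame; (void)buffer; (void)client;
	return FLAC__STREAM_DECODER_WRITE_STATUS_CONTINUE;
}

static void error_cb(const FLAC__StreamDecoder *dec,
	FLAC__StreamDecoderErrorStatus status, void *client)
{
	(void)dec; (void)client;
	printf("decoder error: %s\n", FLAC__StreamDecoderErrorStatusString[status]);
}

int main(int argc, char **argv)
{
	FLAC__uint64 pos;
	FLAC__StreamDecoder *dec = FLAC__stream_decoder_new();

	if (argc < 2)
		return 1;

	FLAC__stream_decoder_init_file(dec, argv[1], write_cb, NULL, error_cb, NULL);
	FLAC__stream_decoder_process_until_end_of_metadata(dec);

	/* hammer absolute seeks at a spread of sample positions */
	for (pos = 0; pos < 44100ULL * 60; pos += 12345)
		if (!FLAC__stream_decoder_seek_absolute(dec, pos))
			printf("seek to %llu failed, state = %s\n", (unsigned long long)pos,
				FLAC__stream_decoder_get_resolved_state_string(dec));

	FLAC__stream_decoder_delete(dec);
	return 0;
}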

I’m going to give direct inclusion of the FLAC data another go anyway, that was my initial thought, that it should be fine like that, but I was getting bad compression, although thinking about it, I think I know why now.

If I can compress any regular sized block as FLAC it also opens up another possibility, I can attempt to compress *every* block of the CHD both ways, that way if there is PCM audio data stored in regular tracks it will stand to benefit too.

(Naturally, I’m thinking it could be expanded to cover the HDD based games, as long as the block sizes are suitable. I’d be very surprised if the BeatMania CHDs aren’t packed full of potentially suitable data)

ok.. I have it working in a more generic way.. going to do some tests, then post a new version.

Right, here is a revised (rewritten) patch

— removed, use latest code from MAMESVN / GIT —

this uses the MAME CHD block structure, no FLAC seeking.

Compression isn’t quite as good, I’m losing 6-10 meg on some images compared to the previous patch, but hopefully it’s safer. I’ve tested a few images without problems, but then I had managed to test the previous version on a few images without problems too, so I’m still expecting to find some bugs.

As before, consider this a ‘technology preview’, not something for stable use just yet

No surprise the code refactorers rejected another logical proposal. After a year or so, maybe they will wake up with the same brilliant idea.

VERY interested if you can find and squash a flac bug Haze, since that applies to a lot more than mame.

FLAC is an important feature, thank you haze!
Have the MAME team considered FFV1 (or any other good lossless video codec) for video based games (e.g. LaserDisc games)?

1. you don’t know what you’re talking about ;)
2. without refactoring, there would be (among other stuff) no discarding of polygon output to a second CPU nor DRC alternatives for any CPUs… maybe you should say thanks to Aaron any time you start a 3D game which runs close to full speed in MAME
3. I’m not sure where Haze got the idea of the rejection: there were no hostile comments about the patch (the worst was that reducing size per se is probably not a priority, but the additional FLAC availability for tapes and samples was not criticized by anyone)… the point is that if you send code to Arbee, then you have to wait for the moment Arbee has time to integrate such code; there is nothing unusual in this. Emulation in the end is a hobby and devs can only spend their free time on it…

the previous comment was directed to Huggybaby of course, but also to all other conspiracy theorists around here

RB was saying it was rejected in the shoutbox, now he’s fine with it… If the only issue was that he wanted it rewritten to a different standard (which I’ve now done anyway) he should have said that instead ;-)

Anyway the LD games are (AFAIK) already losslessly compressed, the problem is lossless compression really doesn’t buy you much with video, especially not gritty, dirty video from old laserdiscs.

As for the FLAC bug, I’ll go back to it and try to isolate some test case, it’s definitely a weird one, the very fact the code can fall right back down to the seeking routines, oblivious of the fact there was a CRC error in the block footer just doesn’t seem right at all, but that’s not the whole problem because there are still other instances where the seek code ends up with a screwy number.

It does make me wonder if I’m somehow giving it the wrong data, but I’ve checked over the (old) code a number of times and can’t see what I would be doing wrong, could be something incredibly stupid tho. The FLAC code in the area concerned isn’t the most readable (the excessive overuse of a single pair of temporary variables ‘x’ and ‘xx’ is a bit annoying to follow).

If it is some combination of the exact blocksize I’ve used amongst other logic, it probably really doesn’t affect anything other than the case here, as 99.99% of FLAC users will be sticking with the standard settings.

As for moaning about the refactoring, of course I’m not moaning about actual functional improvements, however there’s a lot of distance between functional improvements and simply converting stuff over to different coding standards en masse. I was praising the stability of the codebase a few weeks back, and now it’s ‘re-learn everything again’ time. I can understand wanting to simplify the core, but now extra work is moved to the drivers (although to be fair both STV and Megadrive did need converting away from 15-bit modes anyway, as the Saturn has a 24-bit mode, and 32x too). I just don’t really see where things are going, other than another cycle of break things and fix some of them with no obvious end-user improvements? The projects are still crying out for functional improvements, be it real multi-channel sound routing (it’s 2012 and the best we do is stereo?), a better decoupling of game screen updates and UI updates so that the UI is always responsive (I had hoped Aaron might have something like this on the cards with the screen update changes), or even just things like this I’m doing here, keeping the projects competitive with others by adding features people want to see. Of course, it’s good to see some things, like the Laserdiscs finally fixed again with these changes, but the only reason they’re broken in the first place is due to the last lot ;-)

I might be the only one stating this in public, but there are 2 other ‘currently active’ developers moaning exactly the same in private, and that concerns me. I’d hate to see people just give up developing because it becomes too much work to simply keep track of, and maintain an active memory of, what the current standards are, especially when they aren’t that well defined (strcmps instead of the old driver inits in the ‘modern’ system E driver really doesn’t look like progress?). I’ve said before MESS has more ‘Modern’ stuff than MAME by far.

To end on a more positive note, if MG goes through with what he was saying about converting the MAME layout / presentation layer into something closer to a properly optimized game engine, that would be fantastic tho, and would really help when it comes to things like the Mechanical games, or dare I even say it, Pinball simulations.

It looks like there is no way to prevent FLAC__stream_encoder_init_stream() from writing the fLaC signature and two metadata blocks (STREAMINFO and, for some reason, VORBIS_COMMENT). But the decoder does not need any of that. Instead of discarding exactly 86 bytes on encode as you currently do, you can ignore calls of flac_encoder_write_callback with samples==0 (or discard everything written during FLAC__stream_encoder_init_stream, which I think is the same). On decode, you don’t need to prepend a fake 86-byte header: just supply the data that’s actually in the CHD.
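A minimal sketch of that callback trick (the chd_flac_state struct and its fields are invented for illustration; only the libFLAC callback signature and the samples == 0 behaviour described above are assumed):

#include <string.h>
#include <FLAC/stream_encoder.h>

typedef struct { unsigned char *dest; size_t offset; } chd_flac_state;   /* illustrative only */

static FLAC__StreamEncoderWriteStatus flac_encoder_write_callback(const FLAC__StreamEncoder *encoder,
	const FLAC__byte buffer[], size_t bytes, unsigned samples, unsigned current_frame, void *client_data)
{
	chd_flac_state *state = (chd_flac_state *)client_data;
	(void)encoder; (void)current_frame;

	/* samples == 0 means header / metadata output rather than a compressed audio frame: drop it */
	if (samples == 0)
		return FLAC__STREAM_ENCODER_WRITE_STATUS_OK;

	/* keep only the compressed audio frames for the hunk */
	memcpy(state->dest + state->offset, buffer, bytes);
	state->offset += bytes;
	return FLAC__STREAM_ENCODER_WRITE_STATUS_OK;
}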

libFLAC seems to use blocks of 4096 samples (16K bytes) at maximum compression, which is larger than I’d expected. That could explain the reduced compression you’re seeing.

yeah, the sweet spot for the compression appears to be around there, which is double what I’m currently using. I’ll probably move the hunk size multiplier to the commandline.

There’s a balance issue of course, if any single decode request takes longer than 1/60th of a second then it’s going to introduce stutter into the emulation.

I’ll optimize / rework the code a little anyway, the initial pass there was just to make sure it would be happy by not changing the data at all from what got encoded.

That’s interesting, I was thinking something similar (but not nearly as radical). I was thinking of making some sort of ‘extent’ mechanism in there which would compress ranges with different compression algos. Then chdman could also compress things 3-5 different ways and pick the best one. I was also thinking of, at the very least, just wholesale swapping out the zlib lib for something better like the 7zip lib (which usually does 1-3% better) and produces equivalent streams. It also opens up 2-3 other compression types.

Well as said earlier, you have to consider the performance too. I will look at LZMA2 (the main part of 7z) but general indications are that it’s significantly slower than standard zip.

It’s essential here that things decompress *quickly*. If the emulation wants data from a block it has to wait until that block is decompressed, if that block takes above 1/60th of a second to decompress (including actual time needed for the emulation that frame) then it WILL cause stuttering.

Wavpack (for audio) looks comparable to FLAC, so I’ll probably throw that in and see if anything compresses better with that than FLAC.. APE, which others have suggested to me is basically unsuitable due to performance considerations.

ok the basic version of this is in now

http://git.redump.net/mame/commit/?id=569d44e26e1b8a575b3d52b2c7bc7eaf665b5e74

obviously I’ll be improving the efficiency of the code with some of the suggestions above now, but the format is unlikely to change :-)

Good point, I had forgotten about the performance one. That’s one of my many reasons (mostly I lack time) why I had never really committed to it. I think just the zlib replacement (using 7zip’s equivalent) would be a nice to have, then leave the decompressor that is there in place (unless it is slower). Does the decompressor in the core have a bit of cache in the middle (to avoid decompressing over and over)? That would be useless for a streaming sort of situation, but for program data it would probably work out ok? Course that would mean some sort of hash lookup and what not… Then the question also is how much cache to use? Too much and you are just wasting memory. Too little and you are thrashing the cache.

Took a look at the patch. Looks decent. The only comment I would make is that in a few functions you pass in a void * for the last item; set the actual type instead of having to cast it, or inside the functions create a local var to hold it as the right type. Would help readability a bit. Very cool stuff :)

Yeah the CHD code caches the last block at least, so that it doesn’t need to decode it for every read request.

Turning *off* the cache would probably give a better indication of what the performance cost of the CHD code is tho.

Yep, there are various ways in which the code could be cleaned up, I wanted to get the functionality in first.

The use for Samples needs some work too, right now MAME has the filenames of the samples hardcoded (with extension) in the drivers, instead they should probably just be the basenames, with the emulator checking for the supported formats. Also it’s about time I updated the sample code to support stereo, the tape loop recorded for one game has already had to be split into 2 channels as a result of the limitation.

Beyond that I’m also going to look at multi-channel audio in MAME as a whole, it would be nice to be able to properly support the titles which had more than 2 speakers (TX1 and the like)

Haze, eventually my idea will converge with your inspiration, so I remind you again of my request to at least look at adding convolution support. Whereas now we hear the direct output of the sound generator (I think), imagine that sound going first through a filter that sounds like the arcade cabinet and speakers.

The code already exists, for example here’s a GPL licensed app: http://convolver.sourceforge.net/

This type of filtering is not without precedent, it’s analogous to HLSL isn’t it?

I’m sure somebody could add support for VST plugins, but I wouldn’t really be keen on the idea myself (I don’t like MAME linking / using any closed library binary blobs, and the license doesn’t really permit it)

As for in MAME itself, yeah I’m sure it could be done eventually, the cabinets / speaker types used do add dynamics of their own to the sound, a number of games in MAME right now have an annoying hiss / high pitched whine simply because the speakers on the real cabinets are incapable of responding to such frequencies and it was never noticed at the time.

Somebody would have to actually write the code tho, the only reason you have HLSL is because MG wrote it, and he’s done a fair bit of work on shaders as part of his job, so it was of interest to him. Sound engineers are probably harder to come by, as most people just use premade solutions these days rather than writing their own.

Seems createcdflac has been taken out of mame for the time being (something about compatibility with v4). (So much for doing that git pull!) So if this was still in there, would someone just run chdman -createcdflac input.chd output.chd to get the audio in the chd compressed with FLAC?

I don’t think chdman accepts chds as inputs on such options, so you’d have to extract the existing CHD first.

Not sure why Aaron thinks it violates the standards, from my understanding of the existing standard it’s designed so that new compression types can be added, and I’d added a new one, and added a new compression type header.

Obviously older tools won’t decompress that, but they should recognize it’s an unsupported compression type and throw a warning, which is how I thought the system had been designed, I don’t really think it warrants being called a ‘new version’ as such.

I’d actually increased the version number the first time round, for the submission that wasn’t accepted, and decided not to as part of the simplification process of the submission to get it accepted…

My worry now is that Aaron will miss some of the finer details of the implementation if he does it himself, things like needing to only checksum end-of-track hunk padding data as if it were padded to the old hunk size, which might only be needed due to an annoying lack of foresight with the original CD addition, but will completely break compatibility with all the SHA1s in the softlists if it’s changed now (basically invalidating them all). That wouldn’t be good. The same encoded image should ALWAYS have the same checksum regardless of format version, otherwise -romident will fail to correctly identify a CHD of a different version.

Likewise, if he changes the code that shuffles the WAV data around so that the sub-data ends up being encoded as part of the FLAC stream it will ruin the compression by a significant amount over the course of everything…

So I’m kinda hoping he just bumps the version number and calls what I sent the standard, it’s also tested, and trusted on a large number of images. I designed everything to be as painless as possible, and minimal impact to everybody, with nobody being forced to upgrade CHDs or lists.

But I guess we’ll see… quite how the lists will get updated if the SHA1s change I don’t know, I don’t even have the sets anymore, needed the HDD for an emergency repair of another machine (the current HDDs being sold are not only overpriced, but absolute junk quality, no wonder everybody has dropped the warranty to 1 year)

@Haze: rather than ranting here have you considered sending an email to Aaron explaining which details are important to be kept?

Also, you are mixing up different criticisms of the original submission.
I think your code was “violating standard” because you said you modeled it on the MESS tools while you should have modeled it on the other libraries used (e.g. softfloat), and apparently the code did not compile on MSVC.
Anyway, this was already rectified by Aaron in the commit before the one in which he disabled the option. The latter temporary removal is due to Aaron considering the half-hunk code a bit hacky. I haven’t checked the code so I cannot judge it, nor do I have any idea of what Aaron plans to do about it. We will see, I guess.
Anyway your code is still available if you roll back the mess svn and copy the library files into an updated tree… so nothing is lost

Finally, the reason why FLAC has not yet been re-enabled is that
1. also Aaron is not willing to force people to update their CHDs (Haze’s not the only one thinking of the users, what a surprise huh) and the devs to update all the checksums, so he’s deciding in which way to implement it
2. in the past two days he has been busy with fixing the device tag syntax

and before people start complaining once again about the n-th refactoring of working code, maybe they should consider that without the updates from old drivers to the driver struct, and then to c++ classes and c++ devices, things like deciding how many 1541 drives to connect to a C64 (up to 4), or proper emulation of IEEE expansions in the VIC20 and PET, or proper video cards in the PC slots would never have been possible, since the slot expansion code would have been impossible to implement with the old core.

I hope this will give something to think about to the fanboys who think core rewrites are a pointless waste of time…

etabeta, you’re a blowhard. It’s haze’s blog and he can fucking rant as much as he wants. If he “rants” on YOUR blog then you have a say.

Nice try, but nobody ever said refactoring per se was a bad thing. But anybody who reads the release notes can easily deduce that all this refactoring is an ad hoc one man show, with no long term plan other than to scratch whatever itch Aaron has today, which has caused sets to needlessly break over and over again. But now you say the latest changes by Haze are no good because they break sets. Riiight.

And, you claim to be on the side of the end user…too bad the only end users that count, the ones who actually care enough to contribute, to test, to make suggestions, observations and bug reports, are the “fan boys” you denigrate.

If there’s a fan boy here, it is you etabeta. Why don’t you pull your head out of mamedev’s ass and look around once in a while.

The future is here and breathing down your neck. Haze has some awesome ideas and you can fight them but you will lose. The end users want ultimate mame, and flac, and more. But NONE of these truly fresh and useful ideas, ideas that actually enhance the end user experience, not just tickle the fancy of a coder obsessed with form, have come from you, or Aaron, or anyone else.

> 1. also Aaron is not willing to force people to update their CHDs (Haze’s not the only one thinking of the users, what a surprise huh) and the devs to update all the checksums, so he’s deciding in which way to implement it

you don’t have to update the checksums or be forced to do anything with my code, that was the point, that’s how I designed it, that’s why the ‘half block’ thing exists, because of a design flaw in the previous version. I have no desire to put users through the pain of breaking checksum compatibility with previous versions.

If Aaron changes all the checksums then the Saturn list at least is dead, I no longer have ANY of the images, I cannot provide you with updated checksums. As far as I’m concerned I’ve filled my role in that, listed the software, and I’m done with it. Also it will just piss people off and break compatibility with older versions because people will be forced to upgrade to get things to identify. I wrote the code the way I did for VERY good reasons.

and yes, I modelled it on the MESS build system, which was no big deal, I assumed the MESS system was more modern, like the rest of MESS, turns out it wasn’t… again, having a single project would do wonders for the clarity there.

(and when I say previous version, I mean the original CHDCD code RB wrote)

basically I’ve provided you with something good, that works, that is completely seamless and painless for all users, and it looks like you guys are hell-bent on fucking it up somehow from the quotes and misguided views I’ve seen posted.

and if new CHDs decompressing with older builds is such a big problem then just call the compression format with the half block markers something else, so you get an ‘unsupported’ warning if you try and use such a CHD on an old version.

I’d say a small workaround, for a problem which I didn’t even create, is far less of an issue than changing the checksums for everything. It’s storing the exact same ‘real’ data, the checksum should not change.

or, if the FLAC mode isn’t used for a CHD, just switch the hunk size back down, that would require a variable rather than a define on the hunk size, but it would give you full compatibility still with older versions as long as FLAC was turned off.

there are multiple ‘good’ solutions to this.

(and that’s the other thing I can’t understand, if the half hunk padding checksum thing is what Aaron is taking offense to, why is it the only thing he’s left enabled, ie, by not reverting the hunk size)

The newly created (non-flac) files work fine with older versions anyway, they just won’t verify as a result of the bigger hunk size, but again, that’s not my flaw in the first place and is a far lesser evil than breaking compatibility completely IMHO.

Can’t you talk to him to find out?

FLAC is a drop-in alternative to zlib that happens to work well on different types of data. Any argument for increasing the block size to improve compression applies just as well to zlib as to FLAC. Combining these two unrelated things in one patch — adding FLAC and bumping the block size — was probably a mistake. Especially since the benefit from FLAC is huge, up to 40-50%, while the benefit from the larger block size is a few measly percentage points at most.

The existing block size was too small for FLAC to be effective (it actually failed to effectively compress most blocks) so had to be boosted. (Optimal is actually double what I’m currently using)

There were actually good benefits to boosting it with zlib as well tho, yes.

The problem was, in the infinite wisdom of whoever wrote the original CD code, the CD data/audio tracks were padded to the hunk size used (with code to ignore the padding in the CD code). In even further infinite wisdom, the padding data was checksummed, meaning if you change the hunk size you change the data checksum, which is a terrible thing because you’re actually still representing the same data.

Now, in an ideal world, that would have never happened, but it did, and all the existing lists have been built around the CRCs you get with padded data for the old hunk size. Therefore, I had to deal with it _without_ invalidating all those lists. There were a number of possible approaches; I simply chose to mark the last block so that it knew not to checksum the excess padding and the CRC remained equal. The alternative would be to only compress partial data (padded to the old hunk size), and check if the data being decompressed decompresses to less than the expected hunk size. Note, this applies to both the zlib compressed hunks and the FLAC ones, which is why Aaron’s simple disabling of the FLAC bit makes no sense; it was done to support the larger hunks, not FLAC.

I consider the problem to be a bug in the original CHDMAN implementation, but short of breaking compatibility with the existing lists (forcing all the CRCs to change, and old CHDs failing to identify because their CRCs are no longer listed) I had to pick a suitable workaround.

I know the checksums will change eventually, once more ‘raw’ scrambled CD dumps are used, but that will happen slowly, one game at a time, if ever, but the checksums will be changing because we’re actually representing different data, not due to a silly padding issue. Hopefully when support is added for multisession discs we don’t find other similar issues :/

Adding the FLAC support was meant to be a nice user friendly option, which people could make use of if they desired. Not something forced upon people.

If the change becomes too aggressive, not just ‘convert/use if you want’ I think the public reception will be more negative.

It’s out of my hands now anyway, but I fear for the worst given MAME’s general ‘F**k You, Deal with it’ attitude towards both code and users. Coders might tolerate it, forced changes to all the code standards, interfaces, names, the code changing all the time, but at least it’s usually clear why. Users won’t tolerate being messed about like that. Hopefully a good balance can be found, and as much as I hate to say it, maybe listing both the legacy and ‘new’ CRCs if they do change (for -ident purposes) would be the most user friendly way to go (and allow getting rid of the initial padding problem for good while retaining the ability to properly identify things) but IMHO is uglier than the solution I came up with.

As I’ve said tho, I can but watch now, I’ve given my input on the issues, given my code, the changes from this point forward are down to the rest of Mamedev, but if they’re expecting all the existing softlists and CRCs to be updated they’re going to have to do that themselves too, I don’t even have the material anymore to help there. I’m done with that side of things now, and I’m going to look at improving the sample interface instead so that it supports more than single channel wavs (the tape loops are stereo..) and also so that it doesn’t expect the file extensions hardcoded in the drivers.

I guess I’m also not appreciating some of the aggression being thrown my way, as if it was my fault. I’ve said for years I’m not sure the CHDCD standard is currently good enough, but been told basically I was rude and disrespectful for saying that. Now I’ve made a few (large) software lists using CHDs I’m being told I’m stupid and wrong for making those, because the CHDCD isn’t currently good enough. Welcome to trying to work with Mamedev…..

Right.. I’m told now that the CRCs / SHA1s will not be changing, that’s good news at least.

The block size will also be bumped back down for the time being, which makes more sense until the FLAC stuff is turned on for real.

FLAC failed to effectively compress blocks of 2352*4 bytes? That makes no sense, given how FLAC works, and it doesn’t happen in my tests.

Q&D compression ratio test of Ana Ng by TMBG:

2352-byte blocks: 63.8%
4704-byte blocks: 62.8%
9408-byte blocks: 62.5%
18816-byte blocks: 62.6%
37632-byte blocks: 63.1%
flac.exe -8 -P0: 62.6%

Four-sector blocks are the best for this song, better even than the single-FLAC-stream approach you originally tried.

Looking at your code I do see one bug: you’re passing the block size in bytes to FLAC__stream_encoder_set_blocksize, instead of the size in samples. When I tried that it worsened the compression by about 3%, so it should be fixed, but it’s not enough to explain your problem. The code is so messy (sorry, but it’s true) that there could be other bugs in there. Maybe you accidentally compressed every sample twice when testing the smaller block size, or something like that.
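For reference, a minimal sketch of that fix (the wrapper function, encoder variable and example hunk size are illustrative): FLAC__stream_encoder_set_blocksize() expects sample frames, and for 16-bit stereo CD audio one frame is 4 bytes, so the byte count gets divided by four.

#include <FLAC/stream_encoder.h>

static void configure_flac_encoder(FLAC__StreamEncoder *encoder, unsigned hunk_bytes)
{
	FLAC__stream_encoder_set_channels(encoder, 2);
	FLAC__stream_encoder_set_bits_per_sample(encoder, 16);
	FLAC__stream_encoder_set_sample_rate(encoder, 44100);
	/* was: FLAC__stream_encoder_set_blocksize(encoder, hunk_bytes);  (bytes, wrong) */
	FLAC__stream_encoder_set_blocksize(encoder, hunk_bytes / 4);      /* sample frames: 2 channels * 16 bits = 4 bytes each */
}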

the old hunk size is 4 * 2352 bytes, and at least on the track I was testing with at the time that didn’t seem to produce good results at all… even just using 8 * 2352 instead of 16 * 2352 is costing a good 4gb across the PCE set. Maybe it just wasn’t a good test case I was using, or I had another error at the time. Either way, the smaller hunk sizes don’t give as good results, and are less desirable.

I can’t say I know why, the initial tests done there were just part of my feasibility study, to see if the idea was going to work at all, they weren’t even based in the MAME code, but a hacked up version of the standalone encoder / decoder, which I could easily have misunderstood / made a mistake somewhere with.

And yeah, you’re right, block size should be in samples, I misread that as bytes somewhere along the line. That’s an easy enough fix, surprised it has an impact tho, but I guess it must set up some default assumptions based on it. Could even explain the 4gb loss I’m seeing with the lower size anyway.

I wouldn’t really call the code that messy, I’ve had to work with far worse from other projects, including the stuff in the actual FLAC library which is horrendous in places (the stream_decoder stuff is just plain spaghetti code in places, and even RB, who fixed it up to compile / link on Linux, found some of it questionable). Maybe it’s a bit overly verbose (or at least was) but the majority is just based on the ‘this is how you use FLAC’ examples anyway. I know some of the memory copying could be stripped out, but again I was getting the functionality there before optimizing things and risking breaking it. Personal opinion I guess, I’m used to working out how things work, making them work, and ensuring the code that makes them work is as straightforward as possible in case somebody else wants to pick it up and clean it up; IMHO it meets that criterion.

Personally I’d rather stick to emulation tasks than core work, but getting things done these days seems to require a hands on approach, even outside of the drivers. Given the constant code churn and change in MAME as a whole, I’ve no idea what the expectations are, people seem to have made a living out of rewriting what’s already there, so they’re welcome to update the code if they don’t like it from a presentation point of view, heck I even did the initial .mak file to be in its own sub-folder because I thought that was the new standard, turns out it wasn’t….. It’s not necessarily an arrangement I dislike, I enjoy figuring things out, putting them in code, and having them there for future generations, giving the functionality.. if others want to alter the form that complements what I’m doing nicely. I think one of the big problems MAME has right now tho is that there are too many major players *only* caring about the form, and doing absolutely nothing in terms of the functionality, hence why I’m having to work in areas of the core in the first place.

Ultimately MAME will be judged by users on what it does. If we can get the likes of Raiden 2 or Space Lords working it will mean a lot more to people, and add a lot more value to the project (how they work is then written in stone) than changes which might allow running 2 copies of Pacman side-by-side, because even if that would be a great achievement it adds little value unless you want to open an arcade with a single MAME box driving 16 Pacman cabinets or something ;-)

Haze, no offense meant, but the “F**k You, Deal with it attitude” has never had anything to do with the FLAC code submission and inclusion
I had already written that Aaron was trying not to change the CRCs to make life easier for the users even before you started complaining, so it’s not that he changed his plans because of your comments

my original point was: if there is any specific detail of your implementation that you fear might get lost, it would be more effective to drop a line about it to Arbee or Aaron, instead of commenting about it here.

no more. no less.

p.s. and updates to xml lists are better sent by mail to me, than linked in the shoutbox, where usually they get scrolled away pretty fast ;)

No, I’m saying the “F**k You, Deal with it attitude” is more one of MAME in general*, and if it ended up being applied here it would piss a lot of people off. Again tho, I’m getting mixed messages from different people, on one hand I’m being told the CRCs will *definitely* change, on the other I’m told they won’t. It’s incredibly frustrating.

It was bad enough last time all the CHDs changed checksums, but then it was needed, because the old design was inherently insecure (no metadata checksumming, so you could produce broken images which said they were the expected one)

As for email, I didn’t have your address handy, I’ve closed the email account I usually use because of repeated hack attempts / spam, and your PM box was full.

* and it’s not ALWAYS a bad thing, but when you’re talking about something people have reservations over anyway, such as the CHDs, it’s best to tread carefully.

I was able to get createcdflac working again (it was just commented out), and ran a little test on my chd collection. After making copies of all chds that had an audio track, I extracted them and re-created them with FLAC. For 33 items I went from 9.4GB down to 9.1GB. Not huge but still some savings. Given I don’t have a full set, and I don’t have any of the “dance” ones (I believe those are cds), which I believe would compress quite well, it shows the potential of the flac stuff. I’m pretty sure I was using the larger hunk size (though not positive on how to check that), and I didn’t encounter any errors. The hardest bit was creating a script to locate/duplicate/process the chds, and it was a pretty good return for the effort.

The non-digital dance ones would probably show a good saving, yes, the digital ones are already MPEG on the CD so wouldn’t.

The PCE set, as I’ve said, is where you see the biggest benefit because most of the games are tiny data tracks + audio.

A few things need fixing, as pointed out (mainly a /4 on the blocksize passed to the encoder, because it’s meant to be in samples, not bytes) These will further improve compression, and if Aaron hasn’t modified those once he’s done with it then I’ll send a quick patch to do so.

fwiw with the encoder blocksize fixed (samples, not bytes) the test case I’m using ended up 2meg smaller with the smaller hunk size, rather than significantly bigger.. so yeah, that was a pretty nasty bug :-) It effectively makes the small hunk size currently used the better one, at least on audio tracks…

So now that there is more than one compression type, does it dynamically try more than one or is it the same across the whole archive?

I just ran the updated chdman over my PCE CD stuff (not a complete set) and I saw a huge savings, in the 6 GB range. I reran my chdman flac convert script with the updated chdman over my mame chds and in total they came out 100MB larger than last time, which is kind of strange.

which version? Maybe the larger block size on zip is actually worse for stuff in the MAME set, in which case sticking with the current size makes more sense.

the *latest* code should be the best, old block size, with bug fix described in previous posts.

should never come out larger with the same block size tho, it attempts both types of compression and picks the best one, so if the old zip blocks were smaller, those get used.

(or are you talking about old flac code with larger blocks vs. new flac code with smaller?)

So I did a git clone today of mame to make sure I had the most up-to-date code (proper blocksize and smaller hunk size) and none of my changes were causing an issue. I turned the createcdflac on, and ran my conversion script over the mame chds I have that also have audio tracks. These I will call “new FLACs” (hunk size of 9792). I also have my FLACs from back on 1/30, which have a larger hunk size (19584) but also with the improper blocksize (i.e. using bytes instead of samples), and those will be “old FLACs”.

The old FLACs as a whole were smaller than the new FLACs (in total they were 100MB smaller). The new FLACs were the same size as the original CHDs in most cases (some of the new FLACs were 4 to 12 kilobytes smaller than the original CHDs, and I am lumping those in as being “the same size”). Of 33 CHDs only a few (3 or 4) had significant size savings with either the new or old FLACs.

So it seems that the larger hunk size, even with improper block size, had the better compression. I guess my next step is to see if I can figure out where to change the hunk size in the code up to the larger value, and then re-run the conversion script. In that way the blocksize will be correct, and the hunk size will be “better” (at least from my testing so far).

Anyway, interesting stuff (I’m finding it fascinating at least), and keep up the good work.

Hmm.. I guess I should look at the MAME CHDs.

It’s possible they were ripped with audio sub-data (which isn’t a bad thing) but given the majority of images used in MESS weren’t, I fall back to zip mode for hunks with sub-data. If you’re getting sizes close to the original CHDs that seems the most likely explanation. If that’s the case I can shuffle the sub-data to the end when encoding instead of not bothering with tracks containing it (encoding the blank data inline throws off the algorithm too much in normal no-subdata cases). I could also encode the sub-data part with zlib as a sort of hybrid hunk.
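A rough sketch of that shuffle (illustrative only, not the committed code): gather the audio payloads of every sector in the hunk first, then append all the subchannel parts, so the PCM portion can be handed to FLAC as one contiguous run and the subchannel data can go through zlib separately.

#include <string.h>

/* rearrange [audio|sub][audio|sub]... into [audio...audio][sub...sub] */
static void shuffle_subdata_to_end(const unsigned char *src, unsigned char *dst, int sectors_per_hunk)
{
	const int audio = 2352, sub = 96, raw = audio + sub;
	int s;
	for (s = 0; s < sectors_per_hunk; s++)
	{
		memcpy(dst + s * audio, src + s * raw, audio);
		memcpy(dst + sectors_per_hunk * audio + s * sub, src + s * raw + audio, sub);
	}
}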

I won’t lie, it was developed more with MESS in mind than MAME, because MESS is where you really have huge numbers of CD based games. I don’t actually keep a complete set of the CHDs needed for MAME around, but I’ll take a look at some point.

The larger block size does work better on (some) zip data at least, which is probably where your 100meg is coming from.

It’s in Aaron’s hands now tho… so without permission to make further tweaks to my code there isn’t actually much I can do until it gets enabled officially, at which point it might be too late to make further improvements.

And that would be the answer. I just checked (with chdman -info) the three MAME CHDs that had a file size savings with “new FLAC” (as referenced in my earlier post), and all of them had SUBTYPE:NONE. I did a random check of three MAME CHDs that did not have a file savings with “new FLAC” and all had SUBTYPE:RW_RAW for their tracks.

Nonetheless the savings dealing with PCE CDs have been significant, and greatly appreciated, and I assume the same can be said for numerous other MESS CD based systems.
