David Haywood's Homepage
MAME work and other stuff
CHD v5
February 16, 2012, by Haze. Categories: General News. 56 Comments

Aaron has checked in a change to the CHD format featuring his further work on the format, and bumping the version to v5.

This includes a refined / cleaned-up version of the FLAC support I was working on (which is now official) as well as a number of other improvements (LZMA support based on the 7-zip code is also in there).

I haven’t tested / used this code yet, so I can’t say how the compression compares (if data is being optimally ordered for FLAC etc.), but you should see substantial gains even over the previous versions I posted, due to the use of LZMA, which in tests on regular romsets usually yields around a 20% improvement in compression. The LZMA improvements mean there will also be benefits for non-CD CHDs. LD ones are also likely to shrink very slightly, as FLAC is now being used for the audio part of those (don’t expect big savings tho, they’ve always been video-heavy and lossless video compression is always going to give big files).

The created CD CHDs are SHA1-compatible with the previous ones, so none of the software lists need updating thankfully.

Just a heads up, because unless there are any bugs this will be the expected version from the next release, as opposed to the previous FLAC trial runs.

In case you haven’t been following any of this here’s a summary:
Old V4 (and below) CHD format used ‘zip’ compression internally.
New V5 format supports ‘zip’, ‘lzma’ (7-zip) and ‘flac’

FLAC seems to give about 40% better compression than ‘zip’ for audio data (CD AUDIO tracks)
LZMA seems to give about 20% better compression than ‘zip’ on regular data.

Of course some data just plain doesn’t compress well, so you’re not going to see those savings everywhere, but you’re likely to see on average a 20-40% reduction in CHD size for all non-LaserDisc CHDs.
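To put rough numbers on that (purely an illustration using the percentages above; the 500MB size and the 80/20 audio/data split are assumptions, not a real disc):

#include <cstdio>

int main()
{
    const double total_mb = 500.0;                    // hypothetical zlib-compressed CD CHD
    const double audio_mb = total_mb * 0.8;           // assume it's mostly CD AUDIO tracks
    const double data_mb  = total_mb * 0.2;
    const double new_mb   = audio_mb * (1.0 - 0.40)   // FLAC: ~40% better on audio
                          + data_mb  * (1.0 - 0.20);  // LZMA: ~20% better on regular data
    printf("%.0f MB -> %.0f MB (%.0f%% smaller)\n",
           total_mb, new_mb, 100.0 * (1.0 - new_mb / total_mb));
    return 0;
}

That works out to around 320MB, i.e. a ~36% reduction, right in the middle of that 20-40% range.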

This naturally makes CHD better for storing large amounts of data, which was its original purpose.

56 Comments


Excellent work :)

Will have to fire up my conversion scripts and see what happens.

looks like Aaron is making the same mistake I made and passing blocksize in bytes, which is costing a bit on the encoding..

// configure the encoder
m_encoder.set_sample_rate(44100);
m_encoder.set_num_channels(2);
m_encoder.set_block_size(chd.hunk_bytes());
m_encoder.set_strip_metadata(true);

he should be alerted to this shortly.
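(For what it’s worth, the fix is just a bytes-to-samples conversion before handing the value to the encoder; a minimal sketch, assuming 16-bit stereo CD audio where one sample frame is 2 channels * 2 bytes = 4 bytes, and the helper name here is made up for illustration:)

#include <cstdint>

// FLAC block sizes are counted in sample frames, not bytes; for 16-bit stereo
// CD audio a hunk of N bytes holds N / 4 sample frames (illustrative helper,
// not a MAME function)
inline uint32_t flac_block_size_for_hunk(uint32_t hunk_bytes)
{
    return hunk_bytes / 4;
}

(which matches the chd.hunk_bytes() / 4 change mentioned further down)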

Initial tests indicate that even with that fixed, for pure audio CDs compression isn’t *quite* as good as the FLAC-CD code I was using before.

I’d guess (from looking over the code) this is because Aaron has opted for a simpler solution, and is passing cd frame data as-is to the encoder, rather than rearranging it so that the sub-data doesn’t have a negative effect on the compression. Hopefully the LZMA compression makes up for the losses in other areas tho, for images which are less audio-heavy.

“I’d guess (from looking over the code) this is because Aaron has opted for a simpler solution, and is passing cd frame data as-is to the encoder, rather than rearranging it so that the sub-data doesn’t have a negative effect on the compression.”

Any reason the code can’t or won’t be optimized as you originally suggested?

“The created CD CHDs are SHA1-compatible with the previous ones, so none of the software lists need updating thankfully.” Hallelujah!

There is a group of rom collectors who are inordinately excited by having the latest perfect set, and this LZMA business is going to keep them excited for a while! Not to mention you just gave the internet some of its bandwidth back.

I don’t know, but it’s costing on average 40meg per CD when they’re audio heavy, which isn’t ideal.

I think Aaron wants it more generic and more flexible, less hardcoded for CDs, so it’s probably unlikely he’ll add a mode with the sub-data shifted around.

fwiw using

#define CD_FRAMES_PER_HUNK (8)

in cdrom.h

(or -hs 19584 on the commandline) works better than the default in all cases I’ve tested now for CDs. (and significantly better on non-audio data)
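(For reference, the 19584 figure is just 8 raw CD frames’ worth of data; a quick sketch of the arithmetic, with constant names borrowed from cdrom.h but best treated as illustrative:)

#define CD_MAX_SECTOR_DATA   (2352)   // raw sector payload in bytes
#define CD_MAX_SUBCODE_DATA  (96)     // subchannel data per frame
#define CD_FRAME_SIZE        (CD_MAX_SECTOR_DATA + CD_MAX_SUBCODE_DATA)   // 2448 bytes
#define CD_FRAMES_PER_HUNK   (8)
#define CD_HUNK_SIZE         (CD_FRAMES_PER_HUNK * CD_FRAME_SIZE)         // 8 * 2448 = 19584

The old default of 4 frames per hunk works out to 4 * 2448 = 9792 bytes.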

unfortunately, as I’ve said, it’s quite a lot worse for audio tracks than the previous FLAC code, so if you’re expecting savings on audio-heavy sets like PCE vs. that then you’re going to be disappointed. It’s still better than the *old* CHDs, but nowhere near as much as it was.

I’ve sent Aaron a mail explaining the problem with the way things have been setup anyway. If there are any further changes to claw back what’s been lost it’s up to him.

btw with 4 frames per hunk LZMA isn’t really much better than ZIP; with 8 the gains are noticeable, but unfortunately they’re not around 20% as you get with the MAME set as a whole.

Hopefully these things will be tweaked a bit over the coming days. If not, it’s a shame, but you do still have something which works better than before, although not really living up to its full potential.

I don’t mess with CHDs much. Are they all created using the same compression settings?

If so, it would be better if they were all created with individually optimized settings. These could be easily stored in a text file.

there’s not much difference between settings (at a certain point) compared to the difference between actual approaches, which is why having the data in a less desirable order is causing bigger issues than any setting can resolve.

right now the large hunk size is always better, but the actual approach being used for the data ordering is costing ~50meg on an average-sized audio CD. As I said, I’ve mailed Aaron about it; I’m crossing my fingers he can implement some data re-ordering at the compression/decompression stage for CDs before this is too extensively used.

Indeed. OTOH anyone who starts recompressing their files at this stage deserves what they get later.

The block size has been changed:
“m_encoder.set_block_size(chd.hunk_bytes() / 4);”

Is there some place where all of chdman’s command line arguments are listed? I don’t see any reference to -hs when I fire off chdman.

If someone were to create a cd with the -hs command, would they have to extract it with a similar command or would chdman realize it automatically?

chdman would handle it fine when extracting..

I’m not sure the extract command currently works tho, doesn’t seem to accept what I think is a valid commandline ;-)

the blocksize fix there isn’t the issue I’m talking about anyway, that was just making the problem even worse ;-)

Since multiple compression algorithms are supported now in a CHD, will there be an option in CHDMAN to compress every hunk (regardless of content) with all three algorithms and keep whichever is smallest?

On a related note, I assume the 7-Zip support for ROM archives will only support a subset of the archives that could possibly be produced with the official 7-Zip executable due to non implementation of some of the other algorithms (LZMA2, PPMD, etc…) and features (i.e. solid vs non-solid). Do we know at this point what MAME will support and what is “optimal” for size vs decompression performance?

>>>> Since multiple compression algorithms are supported now in a CHD, will there be an option in CHDMAN to compress every hunk (regardless of content) with all three algorithms and keep whichever is smallest?

that’s what it does, and what my code posted before did too.
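(Roughly speaking the per-hunk logic amounts to something like the sketch below; the types and names are made up for illustration and are not the actual chd.cpp interfaces:)

#include <cstdint>
#include <vector>

// one entry per supported codec (zlib, LZMA, FLAC, ...)
struct hunk_compressor
{
    virtual uint32_t compress(const uint8_t *src, uint32_t srclen, uint8_t *dst) = 0;
    virtual ~hunk_compressor() { }
};

// try every codec on the hunk and keep whichever output is smallest; returns the
// index of the winning codec, or -1 if nothing beats storing the hunk uncompressed
int best_codec_for_hunk(const uint8_t *hunk, uint32_t hunklen,
                        const std::vector<hunk_compressor *> &codecs,
                        std::vector<uint8_t> &best_output)
{
    int best = -1;
    uint32_t bestlen = hunklen;
    std::vector<uint8_t> temp(hunklen);
    for (size_t codecnum = 0; codecnum < codecs.size(); codecnum++)
    {
        uint32_t complen = codecs[codecnum]->compress(hunk, hunklen, &temp[0]);
        if (complen != 0 && complen < bestlen)
        {
            bestlen = complen;
            best = int(codecnum);
            best_output.assign(temp.begin(), temp.begin() + complen);
        }
    }
    return best;
}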

>>>>> On a related note, I assume the 7-Zip support for ROM archives will only support a subset of the archives that could possibly be produced with the official 7-Zip executable due to non implementation of some of the other algorithms (LZMA2, PPMD, etc…) and features (i.e. solid vs non-solid). Do we know at this point what MAME will support and what is “optimal” for size vs decompression performance?

It decompresses LMZA2, PPMD and BZ2 internal compression methods just fine, I wouldn’t claim .7z support otherwise ;-) IMHO ‘optimal’ is just LMZA, ultra mode, reasonable dictionary size, ~16meg solid tho. Turning off solid makes it easier to update files however.

*EDIT* Aaron has checked in his own version of this now :-)

essentially to gain FLAC ratios close to what we were getting before you need to do something like this

— removed —

*WARNING* proof of concept code, will create CHDs incompatible with what’s currently checked in.

that quickly rearranges the data before sending it to the FLAC encoder, although it would be more efficient to do it when actually encoding.
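(The kind of rearrangement being described looks roughly like this; the sizes are the standard CD values, and since the actual snippet was removed above, this is only an illustrative sketch, not the code that was posted:)

#include <cstdint>
#include <cstring>
#include <vector>

static const uint32_t SECTOR_DATA  = 2352;                       // audio payload per CD frame
static const uint32_t SUBCODE_DATA = 96;                         // subchannel bytes per CD frame
static const uint32_t FRAME_SIZE   = SECTOR_DATA + SUBCODE_DATA; // 2448

// gather all the audio payloads together at the front of the buffer and push the
// sub-data to the end, so the FLAC encoder sees one contiguous PCM stream instead
// of audio interleaved with subcode every 2352 bytes
void split_audio_and_subcode(const uint8_t *hunk, uint32_t frames, std::vector<uint8_t> &dest)
{
    dest.resize(size_t(frames) * FRAME_SIZE);
    if (frames == 0)
        return;
    uint8_t *audio   = &dest[0];
    uint8_t *subcode = &dest[size_t(frames) * SECTOR_DATA];
    for (uint32_t f = 0; f < frames; f++)
    {
        memcpy(audio + size_t(f) * SECTOR_DATA, hunk + size_t(f) * FRAME_SIZE, SECTOR_DATA);
        memcpy(subcode + size_t(f) * SUBCODE_DATA, hunk + size_t(f) * FRAME_SIZE + SECTOR_DATA, SUBCODE_DATA);
    }
}

(decompression would just do the inverse shuffle before handing the hunk back)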

Doing this saves you 22Mb on
akumajou dracula x – chi no rondo (scd)(jpn).chd

vs the current checked in code, which is an 8% improvement.

I’ve sent this to Aaron, I’m hoping he can implement something similar ‘his way(tm)’

http://mamedev.emulab.it/haze/files2012/chdoptimal.zip

some tweaks to the encoder defaults, files are 100% compatible, compression is improved.

I have some questions:

A) What about CHD mounting in Windows? Are there any plans for supporting WinCDEmu or Daemon Tools? Or perhaps a Pismo File Mount (http://www.pismotechnic.com/pfm/ap/) plugin, so we get .bin/.cue files after mounting a CHD?
That would make CHD more useful for other people, and perhaps CD images would be preserved as CHD instead of as compressed bin/cue in 7z.

B) On the subject of making MAME more open minded, what about getting GPU acceleration, not to improve graphics but to use that piece of expensive hardware that can make little things such as triangles fly…

C) Why are you using 8.3 names for roms in MESS? I know the last debate was about how difficult it is to rename all the sets again, but in MESS that makes no sense. There are sets already named (No-Intro, TOSEC) and those names are understandable by human beings. 8.3 names (they remind me of something like Hungarian notation) are horrible; I need a tool just to know what a game is (even if I only have 10 or 12 games for a system).

A) Yes, I’d like to see something like that. There are some open source CD mounting programs on Windows, but they need a bit of work. Something like http://wincdemu.sysprogs.org/ as a template for further development would be good.

It’s something I’ll have to look into at some point if nobody beats me to it. Emulators like SSF (Saturn) expect a CD in a real or virtual drive anyway, so it would be ideal to be able to have something capable of doing that with CHDs.

B) It’s been much discussed, but never finalized. Basically the plan would be to write a software renderer on the GPU using the high-level languages you have available today, which could fall back to standard PC software rendering if no GPU was available. Easier said than done; whatever solution is chosen, it has to be both portable and offer a fallback (much like the dynamic recompiler cores can be built with a C backend).

C) They’re easier to remember, easier to type, and easier to use. Remember, all development occurs from the commandline. Assigning an 8-letter name to a set is done for that reason (as well as easy management of bug reports, which require the name of the set the bugs occur on).

It has nothing to do with DOS anymore (clones and actual CHD filenames are even allowed to go >8 letters to ensure neat naming). It’s enforced simply to keep names short and snappy. End users are expected to use a frontend like QMC2 where the actual names don’t matter and will most likely be hidden from them.

Thanks for your answers

>>>> Since multiple compression algorithms are supported now in a CHD, will there be an option in CHDMAN to compress every hunk (regardless of content) with all three algorithms and keep whichever is smallest?

>>that’s what it does, and what my code posted before did too.

100% correct.
anyway, I think there is a command to force chdman to use specific algorithms if the user wants to (e.g. to experiment with the various outputs), or at least that is what I understood from the emails on the list. I still have to wait until I return from this work trip to test.

“IMHO ‘optimal’ is just LMZA, ultra mode, reasonable dictionary size, ~16meg solid tho.”

LZMA :)
It will be interesting to see some test results… I’ve never found the Ultra setting to be worth much. Tiny to no savings compared to Maximum, but MUCH longer compression/decompression times.

Fair enough, I’ve not really noticed much difference between Ultra and the setting below as far as MAME is concerned tho. Maybe for huge sets you’d notice, but I can compress all 15,000 cheat XMLs into a solid 690kb archive and it decompresses with no noticeable delay (less than half a second)

Maybe if you’re on a low-end system you might see a real difference, but standard LZMA (yeah I know I keep typing LMZA….) isn’t that slow. The PPMD is slow, granted; I’m not really sure what that’s meant to be optimized for, because in every case I’ve tested it also gives worse ratios.

PPMD works best for text files.

Original file:

URL: http: //shakespeare.mit.edu/macbeth/full.html
Compressed gzip on server: 55186 bytes
Decompressed html on disk: 195747 bytes

PPMd Test:

Format: 7z
Compression level: Ultra
Dictionary size: 192 MB
Word size: 32
Solid block size: ‘solid’
SIZE: 37211 bytes

GZIP Test:

Format: GZip
Compression level: Ultra
Dictionary size: 32 KB
Word size: 258 bytes
SIZE: 52530 bytes

LZMA Test:

Format: 7z
Compression level: Ultra
Dictionary size: 64 MB
Word size: 273
Solid block size: ‘solid’
SIZE: 47921 bytes

BZIP2 Test:

Format: 7z
Compression level: Ultra
Dictionary size: 900 KB
Solid block size: ‘solid’
SIZE: 39892 bytes

FWIW, I still doubt I will use 7z for roms, be they for MESS or for MAME. cmpro seems slower when rebuilding to 7z, and if you rebuild from 7z it can fail to remove the original files. zip otoh works like a charm, and in both my external HDs there is still a lot of spare space, so I doubt I will ever convert my romsets. but I can understand if others prefer to do so

For CHDs things are quite the opposite: FLAC+7z really does seem to produce a marvelous improvement over the previous zlib-only approach, at a limited cost (longer compression times, but it’s a one-time procedure so not a big deal)!

yeah, some operations will be slower with 7z (removing matching source files from solid archives will be difficult for example; you’d effectively have to decompress everything, and recompress what was left)

right now ClrMame is just calling external 7z packers / unpackers too, rather than handling it internally.

That doesn’t mean 7z ROM support isn’t useful tho, but as I’ve said before people are free to weigh up the pros and cons on their own. Plain zip isn’t going away, although there may end up being a few cases where it doesn’t work due to the size limits.

Interestingly the cheat xmls compress *worse* as PPMD than they do as LZMA, but I guess they’re more ‘code’ than ‘text’

one more q :)

Did you guys switch out the zlib compressor for the 7-zip version of the same stream type? That should get you a bit more.

I believe it still uses zlib for those cases

Nice work.

FWIW forcing FLAC-BE as the final compression type gives significant savings on the Bemani CHDs too (Beatmania); I noticed ~150mb shaved off the 850mb one.

Well 0.145u1 is now out… let’s see how this has officially landed.

I don’t really consider it all stable enough yet; people have been having issues with the new CHD tools and file access. Aaron has seemingly introduced some bugs at the same time as rewriting it (unsurprising, it happens). Also there are new bugs in the FLAC sample handling caused by Aaron’s refactoring of that code too (probably 8-bit samples.. as I found those to work in a slightly unexpected way when I did the code myself).

Not sure why Kale has rushed out a u1 build, he’s even missed a bunch of fixes I gave him for the combined build and some other stuff. IMHO u1 is still pre-alpha quality, no better than a GIT pull.

Anyway, for creating optimal CHDs you’ll probably want to apply this patch:

http://www.sendspace.com/file/ikgp19

It adjusts the default hunk size on CD CHDs, uses a hardcoded FLAC internal block size (more optimal; some images compress 50-60meg better) and turns on FLAC-BE for HDD CHDs (which saves 600meg on bmfinal, for example!). Basically it provides more optimal defaults than stock CHDMAN, the files are 100% compatible, and it has been submitted to Aaron.

The main benefit is still of course the MESS set, where you’ve got several hundred gig shaved off the CD CHDs ;-)

I strongly advise waiting until 0.146 hits before doing any amount of mass conversion of your CHDs, things will no doubt be ironed out between now and then.

After playing with one of the earlier builds, I noticed that you can override the defaults, yet there are only 4 types? Why not just try all the types? I can see why you would want to override them to test, but in normal use why not just do all? Or does that break something?

Also, any suggestions on a default hunk size for HDs? I noticed some decent savings using higher ones. However, that would probably slow some stuff down?

Bigger block sizes aren’t really advisable; as mentioned, you need things to decompress quickly. The moment a game requests a previously undecoded hunk, that hunk *must* decode within the space of the frame, including normal emulation overhead. If the block is too big that decompression will be noticeable, and cause a stutter, even if the emulation is easily running at over 100%

Double the current CD hunk size seems fine; beyond that is probably going to cause issues.

It’s the same reason APE isn’t really suitable for CHDs: while a PC is perfectly capable of decompressing it in realtime, the requirements are significantly higher if you have to decode a large hunk without impacting the emulation.
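(A back-of-the-envelope version of that constraint; the 60Hz frame rate and the 12ms emulation cost are assumptions picked for the example, not measurements:)

#include <cstdio>

int main()
{
    const double frame_ms  = 1000.0 / 60.0;       // one 60Hz video frame: ~16.7ms
    const double emu_ms    = 12.0;                // assumed cost of normal emulation that frame
    const double budget_ms = frame_ms - emu_ms;   // what's left for a worst-case hunk decode
    printf("a freshly requested hunk has to decompress in under %.1f ms to avoid a stutter\n", budget_ms);
    return 0;
}

Doubling the hunk size roughly doubles that worst-case decode time, which is why double the current CD hunk size is about the comfortable limit.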

The current CHD hunk size is quite small, at 4096 bytes, but interestingly a FLAC block of that size on the Beatmania CHDs actually seems optimal (I’m guessing it’s NOT 16-bit stereo samples, so there might be better overall settings, but even as it is there are significant savings)

As for ‘why not try everything’, I guess it’s just a compression speed concern. It already takes a while to try the existing options, and adding more would slow compression further. When you’re talking about compressing TB of data that’s quite significant, so the routines which are most likely to be useful are prioritized.

Ironically, just some days ago I was noticing how we’re lacking a “lingua franca” file format for appropriately storing multimedia-heavy media.
I was ripping some old PlayStation 1 games and late-’90s Windows games, to play them on a diskless HTPC machine.

the most common file formats (i.e. ISO and the BIN/CUE combo) are not exactly meant to be used as a storage technique, with tools creating files that heavily depart from the standard (Nero and Toast are the biggest culprits) and non-existent or almost comical storage optimizations: for example, most 7th Level / Disney games (Lion King, Aladdin, etc…) are usually less than 5 Megabytes on their data track, and the remaining 700MB are usually just the soundtrack.

On the other side, we have file formats stemming from proprietary software like Daemon Tools, Alcohol etc.: poorly documented, and usually associated with warez and the like. Somehow, I doubt that the solution to playing my original copy of Final Fantasy IX using a file format that both ePSXe and other emulators can digest correctly is “download it from somewhere else”.

Maybe this isn’t something that directly affects MAME or MESS (for now, at least), but it seems that CHD v5 could solve some annoyances that affect most emulation / retro software users.

I hope to see CHD become a decent archival format. It’s still lacking in some features needed for that

(namely multisession support, and ability to handle certain protection types, as well as import/export ability for more advanced image formats capable of representing those things)

There are plans for allowing it to store lower-level ‘RAW’ CD images, although you’ll probably find those are significantly *bigger* than standard CD images (similar to IPF files for Amiga)

I also hope for CHD to become a decent archival format and to gain more use outside of MAME. Taking a quick look at the CHD code, I can see it and its dependencies spread across at least tools, lib and osd. Perhaps it could be made a bit more standalone by giving it its own folder and makefile like ldplayer?

The way MAME builds is up to Aaron. Ldplayer and the like get missed for fixes so often it’s not funny tho.

Yeah, I’ve seen, which is another reason I’m saying this should have had some more internal testing before going out.

Aaron turned a trivial feature addition most people probably wouldn’t have actively noticed* (FLAC-compressed blocks) into a complete rewrite of the format, so this kind of thing is expected, but I would have definitely delayed u1 by a week for more internal test time; most of these issues would have been caught and fixed before people started converting what they have that way.

*unless they attempted to use them on older versions

As I’ve said several times, nobody should be mass converting anything at this point. The defaults are not optimal with the current CHDMAN (you need my patch, and a forced rebuild of the CHD code after it) and there are bugs which need squashing.

Converting *now* just means you’ll spend a day or two converting, then even longer reconverting (because the new ones are a bit slower to decompress) and possibly find you’ve trashed several CHDs in the process.

Apparently the SHA1 updates on some older CDs *are* expected, because they used old bad metadata formats, which got updated. Worryingly, some of those are recent dumps (I noticed it myself with the CHDs added in 0.145; people were making them with 3-4 year old, insecure versions of CHDMAN!). SHA1s for these should have been updated in the source by this logic, but again, because u1 was pushed out prematurely, there wasn’t a chance.

The SHA1 updates on GD-ROMs are *not* expected at this time. That is going to need investigation

The SHA1 update on LDs are *not* expected at this time, likewise that needs investigation. (probably just a metadata change tho)

The random hangs (thread safety issue?) during conversion of some HDDs are *not* meant to happen, obviously.

You have been warned.

Also several other features are broken in u1 owing to bits of code being rewritten.

MAME now only reports SHA1 errors when loading roms, for example, even if a CRC32 is present and fails to match. This is, as I’ve found, a bit annoying when you’re adding new clones and want it to report errors.

Code I’d sent to improve the stability of the Ultimate build, as well as some other driver improvements were also missed.

Hopefully people get the picture that using u1 at this very moment in time is not recommended ;-)

Right.. the GD-ROM SHA1s *are* also expected to change; they were using the old metadata format, which no longer exists.

The LD issue and random hanging on conversion of some HDDs remain outstanding tho.

just to sum up the current situation about v5 conversions: the main problem reported so far is chdman randomly hanging forever (but without corrupting any file) and requiring you to kill the process.

concerning SHA1s
1- CHDs containing HD images: no problems reported with the conversions
2- CHDs containing CD images: all CHDs update with no issues, but some CHDs were created by dumpers using a very old chdman and as a result they did not take into account the pregap/postgap data; after being updated these CHDs will have a different SHA1, but it’s not a mistake, because they need to be redumped (the next MAME version will have the updated SHA1 and a BAD_DUMP flag)
3- CHDs containing GD-ROM images: also in this case, all CHDs update with no issues, but some CHDs were created by dumpers using a very old chdman; after being updated these CHDs will have a different SHA1, but it’s not a mistake (the next MAME version will have the updated SHA1)
4- CHDs containing LaserDisc images: a few CHDs get converted with a wrong SHA1 (but the contained data remains perfectly fine, it’s just a metadata issue) and a few others are reported to be converted to a broken image (but without any damage to the original file)

for CHDs which got different SHA1s in cases 2 & 3, you will in any case have to extract and recompress the CHDs, either with any post-2009 chdman or with the latest, to get proper metadata and the right SHA1s that will be required in the next MAME updates. In this case, you might as well use the latest chdman to get better compression

summing up, there are no big risks in converting CHDs, but I’d strongly suggest skipping LaserDiscs for the moment, or at least not deleting the original files for them :)
concerning fixing chdman, Jurgen has reported a few memory issues spotted by valgrind which might be the reason for all the problems with LD metadata and the hangs (+ Apple GCC miscompiling chdman with default OPTIMIZE values)

Converting now you’d still be converting to non-optimal settings tho.

optimal settings + improvements to use FLAC for HDs too just hit the repository

as usual, it’s just a matter of waiting for cool things to happen, rather than being too impatient ;)

when there’s absolutely no acknowledgment of what you’ve sent I wouldn’t call it impatient, just frustrating, because it leaves me in a position of not knowing a) if the submission has been seen, b) if there was something ‘wrong’ with the submission, or c) what I should do next.

note, others have found the same thing when trying to work with Mamedev, not even as much as a ‘thank you for your mail’

it’s not a new problem, this is exactly how mamedev ended up with so many bad dumps of unique games back in the day. Nobody acknowledging they were dumped, nobody looking at the dumps, no indication to the dumpers that they should redump something or keep the PCBs.

anyway, yes, a reworked version of the submission is in, and also of note the way Aaron implemented FLAC on the HDDs means you *might* see some benefits on other CHDs too, not just the Beatmania ones (although I doubt many have raw audio in a suitable LE format, but we’ll see)

It seems a “chdman copy” still defaults to a 9,792-byte hunk size when converting v4 CHDs, while a “chdman createcd” uses the optimized 19,584.

I believe that’s expected behavior; the copy command gets its hunk size from the source CHD

// process hunk size
UINT32 hunk_size = input_chd.hunk_bytes();

You could make a case that it’s not ideal behavior, but it does look intentional.

For CDs extract + create is probably the way to go. Hopefully none (currently) have data which isn’t representable when extracted to cue/bin (I know some were toc/dat sourced, so it’s possible). CHDMAN should probably warn you if you’re trying to extract data which can’t be represented, but I have a strange feeling it doesn’t ;-)

(ok, confirmed, extracting to cue/bin can be lossy, without warning, don’t do it, I’m not sure the files produced are even valid.

Anything with

TRACK:xx TYPE:AUDIO SUBTYPE:RW_RAW FRAMES:2086 PREGAP:0 PGTY

MUST be extracted to toc/dat (because it would have almost certainly been sourced from such)
)

Is CHD v5 considered safe with the release of u2?

I don’t think it’s safe for Laserdiscs yet.

Also as mentioned above, if you’re *converting* CDs you really want to extract them to cue/bin or toc/dat (depending on how they were sourced) then re-encode them for best results.

Finally for cases where the hashes DO change (old metadata CHDs in MAME) the MAME sources haven’t been updated with the new SHA1s so it won’t recognize them…

I’d still hold on a while

Thanks haze.

RB also claims it might not be safe for GD-ROM ones either, because the previous conversion code only worked ‘by chance’

Great job with FLAC Haze, hope it’s all coming together soon.
According to the new maintainer, FLAC is getting close to the first release in four years:
http://lists.xiph.org/pipermail/flac-dev/2012-February/003225.html
Perhaps you can try and get some of your changes included before the release?

I think it’s probably safer if we stick with the established release until a new one is better field-tested.

The 4-year-old version has been proven safe and reliable over those 4 years, and aside from that dodgy seek behavior I was seeing (which we don’t even use in the new spec) it fits our needs well. MAME doesn’t really have anything to submit back upstream at this time; we’re simply using the stock library with a couple of little tweaks to keep it compiling with the tools / platforms commonly used by MAME.

Might be that the new code makes no difference in MAME cases either, but we’ll see what happens when it’s released :-)
