Sims 3:DBPF/Compression
Latest revision as of 19:16, 10 February 2015
Overview
Based on the write-up of prior DBPF/Compression formats.
The idea behind the compression is to reuse previously decoded strings. For example, if the word "heureka" occurs twice in a file, the second occurrence would be encoded by pointing to the first, thus lowering the size of the file.
The compression is done by defining control characters that tell three things:
- How many characters of plain text that follow should be appended to the output.
- How many characters should be read from the already decoded text (and appended to the output)
- Where to read the characters from in the already decoded text.
Thus, the algorithm to decompress these files goes like this:
Read the header, which is formatted like so (these are BYTEs):
Offset 00 - Compression type (0x10 | 0x40 | 0x80)
Offset 01 - 0xFB
Offset 02 - Uncompressed Size of file (4 bytes if type contains 0x80, otherwise 3 bytes)
IMPORTANT: The size bytes are arranged in Big-Endian order; the byte with the highest offset is the least significant byte.
Note: Offsets 00 and 01 may be read as a little-endian WORD, in which case the high octet will be 0xFB and the low octet will be the compression type.
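As an illustration of the header layout above, here is a minimal Python sketch that reads the type byte, checks the 0xFB marker, and decodes the big-endian size. The function name `parse_header` and its return shape are hypothetical, chosen only for this example.

```python
def parse_header(data):
    """Return (compression_type, uncompressed_size, header_length).

    Hypothetical helper: the name and return shape are not part of the format.
    """
    if len(data) < 5:
        raise ValueError("too short to hold a compression header")
    ctype, marker = data[0], data[1]
    if marker != 0xFB:
        raise ValueError("offset 01 is not 0xFB")
    # The size field is 4 bytes when the 0x80 flag is set, otherwise 3 bytes.
    size_len = 4 if ctype & 0x80 else 3
    if len(data) < 2 + size_len:
        raise ValueError("truncated size field")
    # Big-endian: the byte at the highest offset is the least significant.
    size = int.from_bytes(data[2:2 + size_len], "big")
    return ctype, size, 2 + size_len
```

The third element of the result is the offset at which the compressed data starts (5 or 6).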
After the header (Offset 5 or 6, depending on the compression type flags) is the start of the actual compressed file data, which is handled like so:
{
- Read the next control character.
- Depending on the control character, read 0-3 more bytes that are a part of the control character.
- Inspect the control character. From this, find out how many characters should be read and where from.
- Read 0-n characters from source and append them to the output. (n being the "how many" data from above)
- Copy 0-n characters from somewhere in the output to the end of the output. (n in this case is the number of characters to copy)
}
Control Characters
There are 5 types of control characters, distinguished by the value of the first byte. They place different restrictions on how many characters can be read and from how far back they can be read. The following conventions are used to describe them:
- CC length: Length of control character.
- Num plain text: Number of characters immediately after the control character that should be read and appended to output.
- Num to copy: Number of chars that should be copied from somewhere in the already decoded output and added to the end of the output.
- Copy offset: Where to start reading characters when copying from somewhere in the already decoded output. This is given as an offset from the current end of the output buffer, i.e. an offset of 0 means that you should copy the last character in the output and append it to the output. An offset of 1 means that you should copy the second-to-last character.
- byte0: First byte of control character.
- Bits: Bits of the control character.
  - p - num plain text
  - c - num to copy
  - o - copy offset
  - i - identifier.
Note: It can be confusing when a control character states that you should copy, for example, 10 characters starting 5 steps from the end of the output. Clearly, you cannot read more than 5 characters before you reach the end of the buffer. The solution is to read and write one character at a time. Each time you read a character, you copy it to the end, thereby increasing the size of the output. By doing this, even offset 0 is possible and results in duplicating the last character a number of times. The compression uses this to recreate repeating text, for example bars of repeating dashes.
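The byte-at-a-time copy described in this note can be sketched in Python as follows. The name `copy_from_output` and the `distance` parameter are illustrative only. The sketch assumes that `distance` counts back from the end of the output so that 1 means the last byte, i.e. it corresponds to the zero-based offset described above plus one.

```python
def copy_from_output(out, distance, count):
    """Append `count` bytes to `out`, reading `distance` bytes back from its end.

    Assumption: distance 1 refers to the last byte already in `out`.
    """
    if distance < 1 or distance > len(out):
        raise ValueError("copy source lies outside the decoded output")
    for _ in range(count):
        # Re-evaluated on every step, so bytes appended by this very copy
        # can be read again when count is larger than distance.
        out.append(out[-distance])


out = bytearray(b"abcde")
copy_from_output(out, 5, 10)   # copy 10 bytes starting 5 from the end
print(out.decode())            # abcdeabcdeabcde
```

Copying the whole range with a single slice would not work here, because the source range does not fully exist until the copy is under way.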
0x00 - 0x7F
CC length: 2 bytes
Num plain text: byte0 & 0x03
Num to copy: ((byte0 & 0x1C) >> 2) + 3
Copy offset: ((byte0 & 0x60) << 3) + byte1 + 1

Bits: 0oocccpp oooooooo
Num plain text limit: 0-3
Num to copy limit: 3-10
Maximum Offset: 1023
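A small Python sketch of the formulas for this range, applied to a concrete pair of bytes. The function name `decode_2byte` is hypothetical. The third value is the result of the copy offset formula exactly as written, including its trailing `+ 1`.

```python
def decode_2byte(byte0, byte1):
    """Return (num_plain_text, num_to_copy, copy_offset) for byte0 in 0x00-0x7F."""
    if not 0x00 <= byte0 <= 0x7F:
        raise ValueError("byte0 is not in the 0x00-0x7F range")
    num_plain = byte0 & 0x03
    num_copy = ((byte0 & 0x1C) >> 2) + 3
    copy_offset = ((byte0 & 0x60) << 3) + byte1 + 1
    return num_plain, num_copy, copy_offset


# 0x0F = 0 00 011 11 in the "0oocccpp" layout
print(decode_2byte(0x0F, 0x02))   # (3, 6, 3)
```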
0x80 - 0xBF
CC length: 3 bytes
Num plain text: ((byte1 & 0xC0) >> 6) & 0x03
Num to copy: (byte0 & 0x3F) + 4
Copy offset: ((byte1 & 0x3F) << 8) + byte2 + 1

Bits: 10cccccc ppoooooo oooooooo
Num plain text limit: 0-3
Num to copy limit: 4-67
Maximum Offset: 16383
0xC0 - 0xDF
CC length: 4 bytes
Num plain text: byte0 & 0x03
Num to copy: ((byte0 & 0x0C) << 6) + byte3 + 5
Copy offset: ((byte0 & 0x10) << 12) + (byte1 << 8) + byte2 + 1

Bits: 110occpp oooooooo oooooooo cccccccc
Num plain text limit: 0-3
Num to copy limit: 5-1028
Maximum Offset: 131071
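Because the copy count and the offset are each split across non-adjacent bits in this form, a short Python sketch may help. The function name `decode_4byte` is hypothetical; the expressions are the formulas above, unchanged.

```python
def decode_4byte(byte0, byte1, byte2, byte3):
    """Return (num_plain_text, num_to_copy, copy_offset) for byte0 in 0xC0-0xDF."""
    if not 0xC0 <= byte0 <= 0xDF:
        raise ValueError("byte0 is not in the 0xC0-0xDF range")
    num_plain = byte0 & 0x03
    # Two high bits of the count live in byte0, the low eight in byte3.
    num_copy = ((byte0 & 0x0C) << 6) + byte3 + 5
    # One high bit of the offset lives in byte0, the rest in byte1 and byte2.
    copy_offset = ((byte0 & 0x10) << 12) + (byte1 << 8) + byte2 + 1
    return num_plain, num_copy, copy_offset


# 0xC5 = 110 0 01 01 in the "110occpp" layout
print(decode_4byte(0xC5, 0x01, 0x00, 0x10))   # (1, 277, 257)
```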
0xE0 - 0xFB
This is the simplest form of control character. The only thing it does is tell how many plain text characters follow. The formula for this is: (C - 0xDF) * 4. Thus a value of 0xE0 means that you should read 4 characters of plain text and append to the output.
CC length: 1 byte
Num plain text: ((byte0 & 0x1F) << 2) + 4
Num to copy: 0
Copy offset: -

Bits: 111ppppp
Num plain text limit: 4-112 (Multiples of 4)
Num to copy limit: 0
Maximum Offset: -
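The two ways of writing the length for this range, (C - 0xDF) * 4 and ((byte0 & 0x1F) << 2) + 4, give the same result for every code from 0xE0 to 0xFB. A quick Python check, with the hypothetical helper name `plain_run_length`:

```python
def plain_run_length(byte0):
    """Number of plain text bytes that follow a 0xE0-0xFB control character."""
    if not 0xE0 <= byte0 <= 0xFB:
        raise ValueError("byte0 is not in the 0xE0-0xFB range")
    return ((byte0 & 0x1F) << 2) + 4


# Both formulas agree across the whole range.
for c in range(0xE0, 0xFC):
    assert plain_run_length(c) == (c - 0xDF) * 4

print(plain_run_length(0xE0), plain_run_length(0xFB))   # 4 112
```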
0xFC - 0xFF
CC length: 1 byte
Num plain text: (byte0 & 0x03)
Num to copy: 0
Copy offset: -

Bits: 111111pp
Num plain text limit: 0-3
Num to copy limit: 0
Maximum Offset: -
Compressed data MUST end with a code in the range 0xFC to 0xFF. If the data is an exact fit to the size, 0xFC can be used as a null code. While community tools properly handle data without the ending byte, Sims 3 will happily keep reading until it encounters it, usually resulting in a crash.
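Putting the header, the five control character ranges, and the terminating code together, a decompressor might look like the Python sketch below. It is a sketch under stated assumptions, not a reference implementation: the name `decompress` is hypothetical, the value produced by each copy offset formula is treated as a distance back from the end of the output (1 meaning the last byte), and the 0x40 flag is ignored because the coding is described as identical.

```python
def decompress(data):
    """Decode a buffer that starts with the type byte and the 0xFB marker."""
    ctype = data[0]
    if data[1] != 0xFB:
        raise ValueError("offset 01 is not 0xFB")
    size_len = 4 if ctype & 0x80 else 3
    expected = int.from_bytes(data[2:2 + size_len], "big")
    pos = 2 + size_len
    out = bytearray()

    while True:
        b0 = data[pos]
        copy = dist = 0
        if b0 <= 0x7F:                       # 2-byte control character
            b1 = data[pos + 1]
            pos += 2
            plain = b0 & 0x03
            copy = ((b0 & 0x1C) >> 2) + 3
            dist = ((b0 & 0x60) << 3) + b1 + 1
        elif b0 <= 0xBF:                     # 3-byte control character
            b1, b2 = data[pos + 1], data[pos + 2]
            pos += 3
            plain = (b1 & 0xC0) >> 6
            copy = (b0 & 0x3F) + 4
            dist = ((b1 & 0x3F) << 8) + b2 + 1
        elif b0 <= 0xDF:                     # 4-byte control character
            b1, b2, b3 = data[pos + 1], data[pos + 2], data[pos + 3]
            pos += 4
            plain = b0 & 0x03
            copy = ((b0 & 0x0C) << 6) + b3 + 5
            dist = ((b0 & 0x10) << 12) + (b1 << 8) + b2 + 1
        elif b0 <= 0xFB:                     # plain text only
            pos += 1
            plain = ((b0 & 0x1F) << 2) + 4
        else:                                # 0xFC-0xFF: final code
            pos += 1
            plain = b0 & 0x03

        # Plain text first, then the copy from already decoded output.
        out += data[pos:pos + plain]
        pos += plain
        if copy and dist > len(out):
            raise ValueError("copy source lies outside the decoded output")
        for _ in range(copy):
            out.append(out[-dist])           # one byte at a time, may overlap

        if b0 >= 0xFC:
            break

    if len(out) != expected:
        raise ValueError("decoded length does not match the header")
    return bytes(out)


sample = bytes([0x10, 0xFB, 0x00, 0x00, 0x09, 0x0F, 0x02]) + b"abc" + bytes([0xFC])
print(decompress(sample))   # b'abcabcabc'
```

In the sample, the control character 0x0F 0x02 supplies 3 plain bytes and then copies 6 bytes from 3 back, and 0xFC ends the stream without adding any data. A buffer that lacks the final code makes this sketch fail with an IndexError rather than read past the end.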
Compression Types
In addition to compression tagged as 0x10FB, there are other values that can be ORed into the 0x10 byte.
- 0x40 : Sims 3 seems to have two compression and decompression routines. The coding is identical between them; however, data tagged with 0x40 only uses a subset of the available codes and limits the window to a much smaller size than would otherwise be possible. If data is written that goes beyond these limits, it will crash Sims 3. (Q: What codes are restricted? What window size is allowed?)
- 0x80 : If the uncompressed data is 16 MB (0x1000000 bytes) or longer, the size won't fit in the normal 3 bytes in the header. ORing 0x80 into the compression type increases the uncompressed size field to 4 bytes.
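For the writing side, the choice between the 3-byte and 4-byte size field can be sketched in Python as follows. The function name `build_header` is hypothetical, and the sketch never sets 0x40, since the limits that flag imposes are not known.

```python
def build_header(uncompressed_size):
    """Return the 5- or 6-byte header for the given uncompressed size."""
    if not 0 <= uncompressed_size <= 0xFFFFFFFF:
        raise ValueError("size does not fit in 4 bytes")
    ctype = 0x10
    size_len = 3
    if uncompressed_size > 0xFFFFFF:
        # Too large for 3 bytes: set the 0x80 flag and widen the field.
        ctype |= 0x80
        size_len = 4
    return bytes([ctype, 0xFB]) + uncompressed_size.to_bytes(size_len, "big")


print(build_header(9).hex())           # 10fb000009
print(build_header(0x1000000).hex())   # 90fb01000000
```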
Example Code
Example code for the Sims 2 variety of DBPF compression can be found at DBPF/Compression.
See Also
- DBPF/Compression