
Chasing Ghosts: Decoding ID3 Tags


Occasionally, I find myself digging in places I wish I had never known about. When I get to the bottom of such holes, I usually enjoy the journey and learn a lot from it. This post is my tribute to one of those times, so that next time I start digging, I won’t wish otherwise 😆 To my credit, I had this mindset when I began exploring this time and constantly took notes to turn them into a blog post.

More than 10 years ago, I wrote a small Python script to convert SYLT frames from ID3 tags to the Lyrics3v2 format, and I had completely forgotten about it. It used mutagen to parse the ID3 tag, and I wrote the Lyrics3 part by hand.

The code

This was the gist of it.

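import math

import mutagen

file_path = 'input.mp3'     # hypothetical input path
output_path = 'output.mp3'  # hypothetical output path

song = mutagen.File(file_path)
# mutagen stores SYLT frames under keys like 'SYLT:<desc>:<lang>',
# so fetch them by frame ID with getall() and take the first one.
lines = song.tags.getall('SYLT')[0].text  # list of (text, timestamp) tuples

header = 'LYRICSBEGININD00003110LYR'  # LYRICSBEGIN + IND field ("110") + LYR field ID
footer = 'LYRICS200'

lyrics = []
for text, timestamp in lines:
    text = text.strip('\n')
    # Assumes millisecond timestamps (SYLT time stamp format 2).
    seconds = math.floor(timestamp / 1000)
    formatted_time = '[%02d:%02d]' % (seconds // 60, seconds % 60)
    lyrics.append('%s%s\r\n' % (formatted_time, text))

lyrics = ''.join(lyrics)
# The LYR field size (5 digits) counts the encoded lyric bytes.
start = '%s%05d%s' % (header, len(lyrics.encode('iso-8859-2')), lyrics)
# The 6-digit block size covers everything from LYRICSBEGIN up to itself.
tag = '%s%06d%s' % (start, len(start.encode('iso-8859-2')), footer)

# Copy the original file, then append the Lyrics3v2 block at the end.
with open(file_path, 'rb') as src, open(output_path, 'wb') as dst:
    dst.write(src.read())
    dst.write(tag.encode('iso-8859-2'))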

It’s not much: parse, do some string formatting, and append the required bytes to a copy of the file.
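The two format strings above encode the whole Lyrics3v2 layout. It starts with a LYRICSBEGIN marker, followed by fields made of a 3-character ID, a 5-digit size and the data. A 6-digit size of everything so far and the LYRICS200 trailer close it. Below is a minimal sketch that spells this out. It assumes the same IND value ("110") and iso-8859-2 encoding the script uses. `lyrics3v2_field` and `build_lyrics3v2` are hypothetical names, not part of any library.

```python
def lyrics3v2_field(field_id: str, data: str, encoding: str = "iso-8859-2") -> str:
    """One Lyrics3v2 field: 3-character ID, 5-digit size in bytes, then the data."""
    if len(field_id) != 3:
        raise ValueError("field IDs are exactly three characters")
    size = len(data.encode(encoding))
    if size > 99999:
        raise ValueError("field data does not fit in a 5-digit size")
    return "%s%05d%s" % (field_id, size, data)


def build_lyrics3v2(lyrics: str, indications: str = "110",
                    encoding: str = "iso-8859-2") -> bytes:
    """LYRICSBEGIN, an IND and a LYR field, a 6-digit size, then LYRICS200."""
    body = ("LYRICSBEGIN"
            + lyrics3v2_field("IND", indications, encoding)
            + lyrics3v2_field("LYR", lyrics, encoding))
    # The 6-digit size counts every byte from LYRICSBEGIN up to the size itself.
    size = len(body.encode(encoding))
    if size > 999999:
        raise ValueError("block does not fit in a 6-digit size")
    return ("%s%06d%s" % (body, size, "LYRICS200")).encode(encoding)


block = build_lyrics3v2("[00:01]Hello\r\n")
print(block)  # b'LYRICSBEGININD00003110LYR00014[00:01]Hello\r\n000044LYRICS200'
```

For the same lyrics, this produces the same bytes as the script. The difference is that the size checks make an oversized field fail loudly, instead of `%05d` silently writing six digits and corrupting the tag.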

A few days ago, this script’s primary (only 😅) user came back and told me something broke. After some back and forth, I obtained the files to test and confirmed it wasn’t working.

Down the rabbit hole

I chased several ideas before I found the right one.

1) Maybe the SYLT tags were no longer present in the files

This was quickly refuted by opening them up in a hex editor. They were there all right.

2) The SYLT spec must have changed

This belief lasted too long, as I was too lazy to check whether the specification had changed significantly. Instead, I decided to try parsing the files with other libraries. There must be some out there.

I did check the specs later, and they haven’t changed since 1st November 2000 🤪

3) Let’s try other libraries

After some searching, I settled on two libraries. Neither parsed the SYLT frames: tinytag didn’t parse these frames at all, while eyed3 only gave me the raw frame bytes, with a happy message: “Frame 'SYLT' is not yet supported, using raw Frame to parse.”

4) Let’s combine two libraries 🧐

This was my attempt.

from mutagen.id3 import SYLT
from mutagen.id3._tags import ID3Header

import eyed3
FILE_PATH = "./data/file.mp3"

file = eyed3.load(FILE_PATH)
sylt_frame = file.tag.frame_set[b'SYLT'][0]

h = ID3Header()
h.version = (2, 4, 0)

lyric = SYLT._fromData(h, 0x0, sylt_frame)

This was a total failure. I quickly found myself bit hacking, adding a null byte here and ignoring a few bytes there. None led to the solution. I took a deep breath and dove further down.

Even if this one worked, I wasn’t comfortable using _fromData and ID3Header from _tags. Python people would probably frown at this, but I believe in keeping things private 😇.

5) Okay, let’s look at the parsing

Since mutagen was the original choice, I dove right into it. Identifying the parsing bits didn’t take long.

class SynchronizedTextSpec(EncodedTextSpec):
    def read(self, header, frame, data):
        texts = []
        encoding, term = self._encodings[frame.encoding]
        while data:
            try:
                value, data = decode_terminated(data, encoding)
            except ValueError:
                raise SpecError("decoding error")

            if len(data) < 4:
                raise SpecError("not enough data")
            time, = struct.unpack(">I", data[:4])

            texts.append((value, time))
            data = data[4:]
        return texts, b""

Interestingly, the error raised was “not enough data”, and indeed the last iteration of the decoding loop left only 1 byte, not the 4 bytes a timestamp needs. I don’t know how that happened; maybe the software that wrote the SYLT frames was less strict about following the rules. That software parsed the file fine, probably by ignoring errors.

I wanted to fix the input file by hand, rewriting bytes here and there to make it spec-compliant. And for that, guess what you need? The ID3v2.4 reference.

6) Time for the reference

No problem, it’s at https://id3.org/id3v2.4.0. As I headed down there, this greeted me.

To their credit, the HTTP -> HTTPS redirection worked 🙃

Luckily, it’s all on GitHub as well: ID3v2.4.0-frames. A quick search through it yielded the bytes I had to nudge. I ended up removing the extra byte the parser complained about; it was a useless \n anyway. Now, if you have done any byte digging, you know that removing or adding something isn’t that easy. There is usually a length or a checksum somewhere that you must fix.

There is no checksum in this case, and the length sits in the SYLT frame header, at the very beginning of the frame. Adjusting it yielded a parsable file.

Nice work! 😎
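For the curious, here is roughly what “the length at the beginning of the SYLT frame” means in ID3v2.4. Every frame starts with a 10-byte header: a 4-byte frame ID, a 4-byte size and 2 flag bytes. The size excludes the header and is a synchsafe integer, which stores 7 bits per byte so that the top bit of every byte stays zero. Below is a minimal sketch for locating a frame and reading its size. `find_frame` and its helpers are hypothetical names. The sketch assumes a v2.4 tag with no extended header and no unsynchronisation, and the tag bytes in it are made up.

```python
def synchsafe_to_int(data: bytes) -> int:
    """Decode a 4-byte synchsafe integer: 7 significant bits per byte."""
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
    return value


def int_to_synchsafe(value: int) -> bytes:
    """Encode a value below 2**28 as a 4-byte synchsafe integer."""
    if not 0 <= value < 1 << 28:
        raise ValueError("value does not fit in a synchsafe integer")
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


def find_frame(tag: bytes, frame_id: bytes):
    """Return (offset, body_size) of the first frame_id frame, or None."""
    if tag[:3] != b"ID3" or tag[3] != 4:
        raise ValueError("not an ID3v2.4 tag")
    if tag[5] & 0xC0:  # unsynchronisation (0x80) or extended header (0x40)
        raise ValueError("flags not handled in this sketch")
    end = 10 + synchsafe_to_int(tag[6:10])  # tag size excludes its 10-byte header
    pos = 10
    while pos + 10 <= end:
        fid = tag[pos:pos + 4]
        if fid == b"\x00\x00\x00\x00":
            break  # reached the padding
        size = synchsafe_to_int(tag[pos + 4:pos + 8])  # excludes the frame header
        if fid == frame_id:
            return pos, size
        pos += 10 + size
    return None


# A tiny made-up tag: one TIT2 frame, one SYLT frame, then 10 bytes of padding.
tit2 = b"TIT2" + int_to_synchsafe(3) + b"\x00\x00" + b"\x00Hi"
sylt = b"SYLT" + int_to_synchsafe(200) + b"\x00\x00" + bytes(200)
frames = tit2 + sylt + bytes(10)
tag = b"ID3\x04\x00\x00" + int_to_synchsafe(len(frames)) + frames
print(find_frame(tag, b"SYLT"))  # (23, 200)
print(int_to_synchsafe(200))     # b'\x00\x00\x01H', not b'\x00\x00\x00\xc8'
```

ID3v2.3 stores frame sizes as plain 32-bit big-endian integers. The synchsafe decoding above therefore only applies to v2.4 frames.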

Towards the final solution

There are a couple of ways this could continue.

1) Check the app that generated this file and see if that \n can be removed. Maybe it’s a dangling new line. This solution requires the least work from my side but depends on the app user to watch out for dangling newlines.

2) Ignore incorrect “packets” during parsing. Patching the library to ignore the error let it parse the file immediately. This is great, but it requires monkey patching a dependency. Ouch 😟
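Option 2 boils down to changing what happens when fewer than 4 bytes remain after a text entry. Below is a standalone sketch of the same loop shape as the mutagen excerpt, with a hypothetical `strict` switch, rather than a monkey patch of mutagen itself. The sketch only handles encodings whose strings end in a single 0x00 byte (latin-1, UTF-8), and the input bytes are made up.

```python
def parse_sync_entries(data: bytes, encoding: str = "latin-1", strict: bool = True):
    """Split the entry part of a SYLT body into (text, timestamp) tuples.

    With strict=False an incomplete trailing entry is dropped instead of
    raising, which is the "ignore incorrect packets" behaviour.
    """
    entries = []
    while data:
        term = data.find(b"\x00")
        if term == -1:
            if strict:
                raise ValueError("decoding error")
            break
        text, data = data[:term].decode(encoding), data[term + 1:]
        if len(data) < 4:  # no room left for a 4-byte timestamp
            if strict:
                raise ValueError("not enough data")
            break
        entries.append((text, int.from_bytes(data[:4], "big")))
        data = data[4:]
    return entries


# Made-up input: one complete entry, then a text followed by only 1 byte.
raw = b"Hello\x00" + (1000).to_bytes(4, "big") + b"Bye\x00\n"
print(parse_sync_entries(raw, strict=False))  # [('Hello', 1000)]
```

The catch is that a lenient parser silently drops whatever it cannot make sense of.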

3) Fix the file before parsing. This isn’t that hard; given the spec, I already know which bytes to change and can quickly code the function that does it.
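A sketch of such a fix-up function, under stated assumptions. It takes exactly one ID3v2.4 SYLT frame (10-byte header plus body) and handles only the latin-1 and UTF-8 text encodings. It drops whatever trails the last complete (text, timestamp) entry and rewrites the synchsafe size in the frame header. `trim_sylt_frame` is a hypothetical helper, and the frame bytes are made up.

```python
def int_to_synchsafe(value: int) -> bytes:
    """Encode a value below 2**28 as a 4-byte ID3v2.4 synchsafe integer."""
    if not 0 <= value < 1 << 28:
        raise ValueError("value does not fit in a synchsafe integer")
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


def trim_sylt_frame(frame: bytes) -> bytes:
    """Drop bytes after the last complete (text, timestamp) entry of a SYLT frame."""
    header, body = frame[:10], frame[10:]
    if header[:4] != b"SYLT":
        raise ValueError("not a SYLT frame")
    if body[0] not in (0, 3):
        raise ValueError("UTF-16 text encodings are not handled by this sketch")
    # Skip encoding (1), language (3), timestamp format (1), content type (1)
    # and the null-terminated content descriptor.
    pos = body.index(b"\x00", 6) + 1
    good_end = pos
    while pos < len(body):
        term = body.find(b"\x00", pos)
        if term == -1 or term + 5 > len(body):
            break  # leftovers cannot hold a terminated text plus a 4-byte timestamp
        pos = term + 5  # skip the terminator and the timestamp
        good_end = pos
    trimmed = body[:good_end]
    # Rewrite the size field (bytes 4-7), keeping the two flag bytes as they were.
    return header[:4] + int_to_synchsafe(len(trimmed)) + header[8:10] + trimmed


body = (b"\x00eng\x02\x01" + b"\x00"          # header fields, empty descriptor
        + b"Hello\x00" + (1000).to_bytes(4, "big")
        + b"Bye\x00" + (2500).to_bytes(4, "big")
        + b"\n")                               # made-up stray byte
frame = b"SYLT" + int_to_synchsafe(len(body)) + b"\x00\x00" + body
fixed = trim_sylt_frame(frame)
print(len(frame) - len(fixed))  # 1
```

Writing the trimmed frame back into the file also shrinks the tag. The tag size in the 10-byte ID3 header would then need the same adjustment, or one byte of padding would need to be added per removed byte.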

4) Write a custom SYLT parser. No external dependencies for the win. This one is the most interesting. I have been working a lot with AI code generation lately, so I tried this with ChatGPT 4 with Bing browsing.

You are a Python expert specializing in writing parsers. Based on this documentation: https://raw.githubusercontent.com/id3/ID3v2.4/master/id3v2.4.0-frames.txt 
Write me a SYLT frame parser in Python. The output should be a list of tuples containing a line and the timestamp.

Here is what it came up with.

import io


def parse_sylt_frame(sylt_frame_data):
    sylt_frame = io.BytesIO(sylt_frame_data)

    # Determine the text encoding
    encoding_byte = sylt_frame.read(1)
    encoding = {
        b'\x00': 'iso-8859-1',
        b'\x01': 'utf-16',
        b'\x02': 'utf-16be',
        b'\x03': 'utf-8'
    }.get(encoding_byte, 'iso-8859-1')

    # Read the remaining header
    language = sylt_frame.read(3).decode('iso-8859-1')
    timestamp_format = sylt_frame.read(1)
    content_type = sylt_frame.read(1)
    content_descriptor = read_string(sylt_frame, encoding)

    entries = []

    while True:
        text_entry = read_string(sylt_frame, encoding)
        if not text_entry:
            break  # End of frame
        timestamp = int.from_bytes(sylt_frame.read(4), 'big')
        entries.append((text_entry, timestamp))

    return entries

def read_string(stream, encoding):
    bytes_ = bytearray()
    while True:
        char = stream.read(2 if 'utf-16' in encoding else 1)
        if char in (b'\x00', b'\x00\x00'):
            break
        bytes_.extend(char)
    return bytes_.decode(encoding)

It’s pretty good. It ignores that a frame has a 10-byte header, and it gets into an infinite loop in read_string() 😅. Both can be fixed relatively quickly, even by the AI model itself. It did take me a couple of rounds of back and forth to get it this far, but it sure was faster than I would have been.
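For reference, here is a sketch with both fixes applied. The frame is expected to include its 10-byte header, which is skipped. `read_string` returns `None` at the end of the data instead of looping forever. An empty text entry also no longer ends parsing early. The sketch assumes `frame` holds exactly one SYLT frame, and the test bytes are made up.

```python
import io

# Text encoding byte -> (Python codec, string terminator), per the ID3v2.4 spec.
ENCODINGS = {
    0: ("iso-8859-1", b"\x00"),
    1: ("utf-16", b"\x00\x00"),     # UTF-16 with BOM
    2: ("utf-16-be", b"\x00\x00"),  # UTF-16BE without BOM
    3: ("utf-8", b"\x00"),
}


def read_string(stream, encoding, term):
    """Read a terminated string; return None if the data ends first."""
    buf = bytearray()
    while True:
        unit = stream.read(len(term))
        if len(unit) < len(term):
            return None  # end of data before a terminator: stop instead of spinning
        if unit == term:
            return buf.decode(encoding)
        buf.extend(unit)


def parse_sylt_frame(frame):
    """Parse one SYLT frame (10-byte header + body) into (text, timestamp) tuples."""
    if frame[:4] != b"SYLT":
        raise ValueError("not a SYLT frame")
    stream = io.BytesIO(frame[10:])  # skip the 10-byte frame header
    encoding, term = ENCODINGS[stream.read(1)[0]]
    stream.read(3)  # language
    stream.read(1)  # timestamp format (2 = milliseconds)
    stream.read(1)  # content type
    if read_string(stream, encoding, term) is None:  # content descriptor
        raise ValueError("truncated content descriptor")
    entries = []
    while True:
        text = read_string(stream, encoding, term)
        stamp = stream.read(4)
        if text is None or len(stamp) < 4:
            break  # end of frame, or an incomplete trailing entry
        entries.append((text, int.from_bytes(stamp, "big")))
    return entries


body = b"\x00eng\x02\x01\x00" + b"Hello\x00" + (1000).to_bytes(4, "big") + b"\x00" + (1500).to_bytes(4, "big")
print(parse_sylt_frame(b"SYLT" + bytes(6) + body))  # [('Hello', 1000), ('', 1500)]
```

Silently dropping an incomplete trailing entry is the same trade-off as option 2 above. Raising an error there instead would be the stricter choice.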

I might end up using this generated parser. Obviously, do not do this in production. Parsing data is complex, and AI-generated code is not yet at the level where it reliably accounts for edge cases and security. It will get there shortly, though.

This journey has ended, and most of the lessons have already been learned.

I consider this problem solved. Now, someone has to type this in 😃

It’s been a nice ride. 🙏

Time log: ~3 hours including writing this post.

P.S. I wondered why and how this happened, so I kept digging. The root cause turned out to be the text encoding: the app that handles the encoding is slightly less strict than mutagen. Everything clicked into place after I removed the offending characters and stuck to latin1 (iso-8859-1) instead of utf-8. That was the ultimate solution.
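Sticking to latin1 means every character has to fit in ISO-8859-1, which covers exactly the code points U+0000 to U+00FF. Below is a hypothetical helper for spotting offending characters before they end up in a frame. Curly quotes such as ’ (U+2019) are a common example of such characters.

```python
def non_latin1_chars(lines):
    """Return (line_index, character) pairs that ISO-8859-1 cannot encode."""
    # latin-1 maps code points 0x00-0xFF one-to-one onto single bytes.
    return [(i, ch) for i, line in enumerate(lines) for ch in line if ord(ch) > 0xFF]


print(non_latin1_chars(["It’s a test", "café"]))                  # [(0, '’')]
print(len("café".encode("utf-8")), len("café".encode("latin-1")))  # 5 4
```

The second line also shows why mixing encodings is risky in a length-prefixed format: the same text takes a different number of bytes depending on the encoding.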

