I have two Karaoke players, a Songken MD-388 and a Malata MDVD-6619. Between the two of them they have all the features I think I need from Karaoke players. These are
The Malata is really good in that it shows the notes of the melody and also shows the notes that you are singing. But it has a pathetic range of English songs and doesn't show the Pinyin for the Chinese songs. The Songken has a good selection of both and shows the Pinyin, but doesn't show the notes and has a simplistic scoring system.
So I want to take the songs off my Songken DVD and either play them on the Malata or on my PC. Playing them on my PC is preferred because then I am only limited by the programs that I can write and am not so dependent on the vendor's machines. So my immediate goal is to get the songs off the Songken DVD and start playing them in the ways that I want.
The files on the Songken DVD are in DKD format. This is an undocumented format, probably standing for Digital Karaoke Disk. Many people have worked on this format, and there has been much discussion in forums such as Karaoke Engineering. The threads include Understanding the HOTDOG files on DVD of California electronics, Decoding JBK 6628 DVD Karaoke Disc and Karaoke Huyndai 99.
When I started looking at my disk, I went about it in a different direction to many of the posters in these forums. Also, the results in the forums were presented in an ad hoc and often confusing manner - as could be expected. So I ended up re-inventing a lot of what had already been discovered, as well as coming up with some new stuff.
In hindsight, I could have saved myself weeks of work if I had paid proper attention to what was said in the forums. So this document is my attempt to lay out the results in a simple and logical enough way so that people trying to do similar things with their own disks can easily work out what is applicable to their situation and what is different.
What this document will cover is
Then I want to follow up with what I have done with this:
Apple have claimed a patent, METHODS AND SYSTEMS FOR PROVIDING REAL-TIME FEEDBACK FOR KARAOKE. It states
Systems and methods for providing real-time feedback to karaoke users are provided. The systems and methods for providing users with real-time feedback while they are singing karaoke generally relate to receiving the user's vocals, determining whether the user is singing on key/pitch and providing real-time feedback to the user while the karaoke song is being sung. The feedback will be positive feedback if user is on key/pitch and it will be negative feedback if user is off key/pitch.
This would appear to mean that any attempt to display the notes actually being sung against the correct notes is covered by this patent. If that is the case, then it should be invalidated by the prior art of the Malata machine (from 2006), which had this capability before the Apple patent was even filed. Apple states in the background section that
[0005] Current karaoke systems, however, do not address one of the biggest obstacles faced by amateur singers: singing on key/pitch. As a result, karaoke users seldom improve the quality of their singing.
The Malata system shows that this is false.
Claim 6 of the patent says
The method defined in claim 1, wherein providing comprises: playing audible feedback signals to the user.
and Claim 7 says
The method defined in claim 1, wherein providing comprises: playing positive feedback audible signals when the user is on key/pitch; and playing negative feedback audible signals when the user is off key/pitch.
An explanation suggests that this can be done by exaggerating the vocal output:
"For example, if the user is singing 20 Hz high, the voice signal can be changed to 60 Hz high. Control circuitry 210 can output the exaggerated voice through audio output 202 "
Now I'm not going to be doing any of that auditory feedback - if anything, I will be doing the Malata-style feedback. So I don't think I will be in breach of this patent because I will not be doing the same as the patent claim.
Update:
I found Australian Patent AU-B-10227/92 filed on 14/1/92 by Mihoji Tsumura and Shinnosuke Taniguchi entitled "Lyric Display for Karaoke" which states in the Summary
The US equivalent patent is #5208413 and there are many others for this.
I think this confirms the lie in the claim by Apple that there was no prior art. Really, the current state of software patents, and how poor the vetting process for granting new patents is, really sucks. Companies having to build "patent portfolios" to guard against patent trolls and even other so-called reputable companies is a waste of money that could be used to foster innovation.
The substance of the Tsumura claim is what I am trying to do. The duration of patents in Australia is 20 years from the date of filing, which means it is now out of the patent period. So there.
Isn't it illegal to copy your DVDs? Not in Australia, under the right conditions ( Copyright Amendment Act 2006 - FAQs):
Will I be able to copy my music collection onto my iPod?
Yes. You can format-shift music that you own to devices such as an MP3 player, X-Box 360 or your computer.
I am just copying the music I legally bought from the Songken DVD to my computer for personal use. That is within the revised Australian copyright act. You should check if your country allows the same rights.
Don't ask for any copies of the files off my DVD. That would be illegal, and I'm not going to do it.
My Songken DVD disk contains these files:
BACK01.MPG
DTSMUS00.DKD
DTSMUS01.DKD
DTSMUS02.DKD
DTSMUS03.DKD
DTSMUS04.DKD
DTSMUS05.DKD
DTSMUS06.DKD
DTSMUS07.DKD
DTSMUS10.DKD
DTSMUS20.DKD
BACK01.MPG: This is the MP3 file that plays in the background.
DTSMUS00.DKD - DTSMUS07.DKD: These are the song files. The number of these depends on how many songs are on the DVD.
DTSMUS10.DKD: No-one has worked out what this file is for yet.
DTSMUS20.DKD: This file contains the list of song number/song title/artist as given in the song book. The song number in this file is one less than the song number in the book.
I'm on a Linux system and I use Linux/Unix utilities and applications. Equivalents exist under other O/S's such as Windows and Apple.
The Unix command strings lists all the ASCII 8-bit encoded strings in a file that are at least 4 characters long. Running this command on all the DVD files shows that DTSMUS20.DKD is the only one with lots of English-language strings, and these strings are the song titles on the DVD.
A brief selection is
Come To Me
Come To Me Boy
Condition Of My Heart
Fly To The Sky
Cool Love
Count Down
Cowboy
Crazy
The actual strings that show on your disk depend, of course, on the songs on it. You need some English-language titles on it for this to work!
To make further progress you need a binary editor. I use bvi. emacs has a binary editor mode as well. Search in there for a song title you know is on the disk. For example, searching for the Beatles' "Here Comes The Sun" shows the block
000AA920 12 D3 88 48 65 72 65 20 43 6F 6D 65 73 20 54 68 ...Here Comes Th
000AA930 65 20 52 61 69 6E 20 41 67 61 69 6E 00 45 75 72 e Rain Again.Eur
000AA940 79 74 68 6D 69 63 73 00 1F 12 D3 89 48 65 72 65 ythmics.....Here
000AA950 20 43 6F 6D 65 73 20 54 68 65 20 53 75 6E 00 42 Comes The Sun.B
000AA960 65 61 74 6C 65 73 00 1B 12 D3 8A 48 65 72 65 20 eatles.....Here
000AA970 46 6F 72 20 59 6F 75 00 46 69 72 65 68 6F 75 73 For You.Firehous
The string "Here Comes The Sun" starts at 0xAA94C and is followed by a null byte. This is followed at 0xAA95F by the null-terminated "Beatles". Immediately before the title are 4 bytes. The total length of the two strings (including their null bytes) plus these 4 bytes is 0x1F, and that is the value of the first of the 4 preceding bytes.
So the block consists of a 4-byte header followed by a null-terminated song title followed by a null-terminated artist.
Byte 1 is the length of the song information block including the
4 byte header.
Byte 2 of the header block is 0x12. jim75 at Decoding JBK 6628 DVD Karaoke Disc discovered the document JBK_Manual%5B1%5D.doc . In there is a list of country codes:
00 : KOREAN
01 : CHINESE( reserved )
02 : CHINESE
03 : TAIWANESE
04 : JAPANESE
05 : RUSSIAN
06 : THAI
07 : TAIWANESE( reserved )
08 : CHINESE( reserved )
09 : CANTONESE
12 : ENGLISH
13 : VIETNAMESE
14 : PHILIPPINE
15 : TURKEY
16 : SPANISH
17 : INDONESIAN
18 : MALAYSIAN
19 : PORTUGUESE
20 : FRENCH
21 : INDIAN
22 : BRASIL
The Beatles song has 0x12 in byte 2 of the header and this matches the ENGLISH code in the table. This is confirmed by looking at other language files (later).
I've discovered later that the WMA files have their own codes. So far I have seen
83 : CHINESE WMA
92 : ENGLISH WMA
94 : PHILIPPINE WMA
I guess you can see the pattern with the earlier ones!
Bytes 3 and 4 of the header are 0xD389 which is 54153 in decimal. This is one less than the song number in the book (54154). So bytes 3 and 4 are a 16-bit short integer, one less than the song index in the book.
This pattern is repeated throughout the file, so that each record is of this format.
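The record layout described above can be sketched in Java as follows. This is a minimal sketch, not the code in SongInformation.java: the class name SongRecord, its field names and the parse method are made up for illustration, and the caller is assumed to pass a character set that suits the language code.

```java
import java.nio.charset.Charset;

/** One record of the song list: a 4-byte header, then title and artist. */
final class SongRecord {
    final int length;        // byte 1: record length, header included
    final int languageCode;  // byte 2: e.g. 0x12 for English
    final int songNumber;    // bytes 3-4, big-endian: book number minus one
    final String title;
    final String artist;

    private SongRecord(int length, int languageCode, int songNumber,
                       String title, String artist) {
        this.length = length;
        this.languageCode = languageCode;
        this.songNumber = songNumber;
        this.title = title;
        this.artist = artist;
    }

    /** Parses the record that starts at offset (hypothetical helper). */
    static SongRecord parse(byte[] data, int offset, Charset charset) {
        // Java bytes are signed, so mask with 0xFF to get 0..255.
        int length = data[offset] & 0xFF;
        int languageCode = data[offset + 1] & 0xFF;
        int songNumber = ((data[offset + 2] & 0xFF) << 8)
                       | (data[offset + 3] & 0xFF);

        int end = Math.min(offset + length, data.length);
        int titleStart = offset + 4;
        int titleEnd = findNull(data, titleStart, end);
        int artistStart = Math.min(titleEnd + 1, end);
        int artistEnd = findNull(data, artistStart, end);

        String title = new String(data, titleStart,
                                  titleEnd - titleStart, charset);
        String artist = new String(data, artistStart,
                                   artistEnd - artistStart, charset);
        return new SongRecord(length, languageCode, songNumber, title, artist);
    }

    /** Returns the index of the first null byte in [from, limit), or limit. */
    private static int findNull(byte[] data, int from, int limit) {
        int i = from;
        while (i < limit && data[i] != 0) {
            i++;
        }
        return i;
    }
}
```

Since byte 1 holds the length of the whole record, the next record starts at offset + length, so a loop can walk the song list one record at a time.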
There is a long sequence of bytes near the beginning of the file "01 01 01 01 01 ...". This finishes on my file at 0x9F23. By comparing the index number with those in my song book, I confirm this is the start of the Korean songs, and probably the start of all songs. I haven't found any table giving me this start value.
Checking a number of songs gives me this table:
The end of the block is signalled by a sequence of "FF FF FF FF ..." at 0x136C92.
But there is lots of stuff both before and after the song information block. I don't know what it means.
The first English song in my book is "Gump by Al Wierd", song number 24452. In the table of contents file DTSMUS20.DKD this is at 0x9562D (611885). The entry before this is "20 03 3A 04 CE D2 B4 F2 C1 CB D2 BB CD A8 B2 BB CB B5 BB B0 B5 C4 B5 E7 BB B0 B8 F8 C4 E3 00 00". The song code is "3A 04" i.e. 14852 which is song number 14853 (one offset, remember!). When I play that song on my karaoke machine I'm in luck: the first character of the song is "我", which I recognise as the word "I" (in Pinyin: wo3). Its encoding in the file is "CE D2". I've got Chinese input installed on my computer so I can search for this Chinese character.
A Google search for "unicode value of 我" shows me
[RESOLVED] Converting Unicode Character Literal to Uint16 variable ...
www.codeguru.com › ... › C++ (Non Visual C++ Issues)
5 posts - 2 authors - 1 Jul 2011
I've determined that the unicode character '我' has a hex value of
0x6211 by looking it up on the "GNOME Character Map 2.32.1"
and if I do this.
and then looking up 0x6211 on
Unicode Search
gives gold:
Unicode 6211 (25105)
GB Code CED2 (4650)
Big 5 Code A7DA
CNS Code 1-4A3C
There's the CED2 in the second line as GB Code.
So there you go: the character set is GB
(probably GB2312 with EUC-CN encoding) with code for 我 as CED2.
Just to make sure: using the table by Mary Ansell at GB Code Table the bytes "CE D2 B4 F2 C1 CB D2 BB CD A8 B2 BB CB B5 BB B0 B5 C4 B5 E7 BB B0 B8 F8 C4 E3" translate into "我 打 了 一 通 ..." which is indeed the song.
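The same check can be made in Java, which is one reason for liking its encoding support. The following is only a sketch: the class GbTitleDecoder and its methods are hypothetical names, and it assumes the running JVM provides a charset called "GB2312" (standard JDK builds do, but a stripped-down runtime might not).

```java
import java.nio.charset.Charset;

/** Decodes GB-encoded title bytes into a Java (Unicode) string. */
final class GbTitleDecoder {

    /** Assumes the charset name "GB2312" is available in this JVM. */
    static String decode(byte[] raw) {
        return new String(raw, Charset.forName("GB2312"));
    }

    /** Turns a dump such as "CE D2 B4 F2" into the bytes it represents. */
    static byte[] hexToBytes(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // parseInt gives 0..255; the cast keeps the low 8 bits.
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }
}
```

Passing the first ten bytes of the title ("CE D2 B4 F2 C1 CB D2 BB CD A8") through hexToBytes and then decode should give a five-character string starting with 我.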
I'm not familiar with other language encodings so haven't investigated the Thai, Vietnamese, etc. The Korean seems to be EUC-KR.
The earlier investigations by others have created programs in C or C++. These are generally standalone programs. I would like to build a collection of reusable modules, so I have chosen Java as implementation language. At this stage there are only two relevant classes: a song and a table of songs.
Java is a good O/O language which supports good design. It includes a Midi player and Midi classes. It supports multiple language encodings so it is easy to switch from, say GB-2312 to Unicode. It has good cross-platform GUI support.
Java doesn't support unsigned integer types. This sucks really badly here since so many data types are unsigned for these programs. Even bytes in Java are signed :-(. Here are some of the tricks :-(.
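To store an unsigned value n (0 to 255) in a byte: (byte) n
To get the unsigned value of a byte b: n = b >= 0 ? b : 256 + b
To get an unsigned 16-bit value from two bytes b1 and b2: n = ((b1 >= 0 ? b1 : 256 + b1) << 8) + (b2 >= 0 ? b2 : 256 + b2)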
(no joke!)
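A shorter way to write the same conversions is to mask with 0xFF, which discards the sign extension Java applies when a byte is widened to an int. The helper class below is a sketch with made-up names (Unsigned, u8, u16, u32), and it assumes big-endian byte order, which is what the song numbers in the headers use.

```java
/** Helpers for reading unsigned values out of Java's signed bytes. */
final class Unsigned {

    /** Unsigned value (0..255) of a single byte. */
    static int u8(byte b) {
        return b & 0xFF;
    }

    /** Unsigned 16-bit value from two bytes, high byte first. */
    static int u16(byte hi, byte lo) {
        return ((hi & 0xFF) << 8) | (lo & 0xFF);
    }

    /** Unsigned 32-bit value from four bytes, high byte first. */
    static long u32(byte b0, byte b1, byte b2, byte b3) {
        // The top byte is widened to long so the result never goes negative.
        return ((long) (b0 & 0xFF) << 24)
             | ((b1 & 0xFF) << 16)
             | ((b2 & 0xFF) << 8)
             | (b3 & 0xFF);
    }
}
```

For example, u16 applied to the header bytes D3 and 89 gives 54153, the stored number for "Here Comes The Sun".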
The song class contains information about a single song and is given here: SongInformation.java
The song table class holds a list of song information objects and is given by SongTable.java You may need to adjust the constant values in the file-based constructor for this to work properly for you.
A Java program using Swing to allow display and searching of the song titles is SongTableSwing.java It will also attempt to decode and play a selected Midi-format song, but you may need to adjust some of the external programs to do this.
The files DTSMUS00.DKD - DTSMUS07.DKD contain the music files. There are two formats for the music: Microsoft WMA files and Midi files. In my song books some songs are marked as having a singer. These turn out to be the WMA files. Those without a singer are Midi files.
The WMA files are just that. The Midi files are slightly compressed and have to be decoded before they can be played.
Each song block has at the beginning a section containing the lyrics. These are compressed and have to be decoded.
The data for one song forms a record of contiguous bytes. These records are collected into blocks, also contiguous. The blocks are separate. There is a "super block" of pointers to these blocks. Part of the song number is an index into the super block, selecting the block. The rest of the song number is an index of the record in the block.
I came backwards into this and only arrived at understanding what others had accomplished after some time. So in case it helps any others, here is my route.
I used the Unix command strings to discover the song information in DTSMUS20.DKD. On the other files it didn't seem to produce much. But there were ASCII strings in these files and some were repeated. So I wrote a shell pipeline to sort these strings and count them. The pipeline for one file was
strings DTSMUS05.DKD | sort |uniq -c | sort -n -r |less
This produced results
1229 :^y|
1018 j?wK
843 ]/<=
756 Seh
747 Ser
747 _\D+P
674 :^yt
234 IRI$
The results weren't inspiring. But when I looked inside the files to see where "Ser" was occurring, I also saw:
03C3E230 F6 01 00 00 00 02 00 16 00 57 00 69 00 6E 00 64 .........W.i.n.d
03C3E240 00 6F 00 77 00 73 00 20 00 4D 00 65 00 64 00 69 .o.w.s. .M.e.d.i
03C3E250 00 61 00 20 00 41 00 75 00 64 00 69 00 6F 00 20 .a. .A.u.d.i.o.
03C3E260 00 39 00 00 00 24 00 20 00 34 00 38 00 20 00 6B .9...$. .4.8. .k
03C3E270 00 62 00 70 00 73 00 2C 00 20 00 34 00 34 00 20 .b.p.s.,. .4.4.
03C3E280 00 6B 00 48 00 7A 00 2C 00 20 00 73 00 74 00 65 .k.H.z.,. .s.t.e
03C3E290 00 72 00 65 00 6F 00 20 00 31 00 2D 00 70 00 61 .r.e.o. .1.-.p.a
03C3E2A0 00 73 00 73 00 20 00 43 00 42 00 52 00 00 00 02 .s.s. .C.B.R....
03C3E2B0 00 61 01 91 07 DC B7 B7 A9 CF 11 8E E6 00 C0 0C .a..............
03C3E2C0 20 53 65 72 00 00 00 00 00 00 00 40 9E 69 F8 4D Ser.......@.i.M
Wow! Two-byte characters!
The strings command has options to look at e.g. 2-byte big-endian character strings. The command
strings -e b DTSMUS05.DKD
turned up
IsVBR
DeviceConformanceTemplate
WM/WMADRCPeakReference
WM/WMADRCAverageReference
WMFSDKVersion
9.00.00.2980
WMFSDKNeeded
0.0.0.0000
These are all part of the WMA format.
According to http://www.garykessler.net/library/file_sigs.html, the signature of a WMA file is given by the header
30 26 B2 75 8E 66 CF 11
A6 D9 00 AA 00 62 CE 6C
and that pattern does occur, with the above strings appearing some time later.
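Searching for that signature is easy to automate. The sketch below scans a byte array for the 16-byte header; the class and method names are hypothetical, and it assumes the data (or a chunk of it) has already been read into memory, since the real files are around 1 GB each and are better scanned in pieces.

```java
/** Finds the 16-byte ASF/WMA header signature in a block of bytes. */
final class AsfHeaderFinder {
    // The signature quoted above, kept as ints to avoid byte casts.
    private static final int[] ASF_HEADER = {
        0x30, 0x26, 0xB2, 0x75, 0x8E, 0x66, 0xCF, 0x11,
        0xA6, 0xD9, 0x00, 0xAA, 0x00, 0x62, 0xCE, 0x6C
    };

    /** Returns the first index at or after from where it occurs, or -1. */
    static int indexOf(byte[] data, int from) {
        for (int i = Math.max(from, 0);
             i + ASF_HEADER.length <= data.length; i++) {
            if (matchesAt(data, i)) {
                return i;
            }
        }
        return -1;
    }

    private static boolean matchesAt(byte[] data, int pos) {
        for (int j = 0; j < ASF_HEADER.length; j++) {
            // Mask so that bytes above 0x7F compare as 128..255.
            if ((data[pos + j] & 0xFF) != ASF_HEADER[j]) {
                return false;
            }
        }
        return true;
    }
}
```

When scanning in chunks, consecutive chunks need to overlap by at least 15 bytes so that a signature straddling a chunk boundary is not missed.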
The spec for the ASF/WMA file format is at http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=14995
So on that basis I could identify the start of WMA files. The 4 bytes preceding each WMA file are the length of the file. From that I could find the end of the file, which turned out to be the start of the next record, containing some stuff and then the next WMA file.
In these records I could see patterns I couldn't understand, but also from byte 36 on I could see strings like
AIN'T IT FUNNY HOW TIME SLIPS AWAY, Str length: 34
00000000 10 50 41 10 50 49 10 50 4E 10 50 27 10 50 54 10 .PA.PI.PN.P'.PT.
00000010 50 20 11 F1 25 12 71 05 04 61 05 05 51 21 13 01 P ..%.q..a..Q!..
00000020 02 05 91 2B 10 20 48 10 50 4F 10 50 57 13 40 00 ...+. H.PO.PW.@.
00000030 12 61 02 12 01 02 04 D1 05 04 51 3B 05 31 05 04 .a........Q;.1..
00000040 C1 29 10 20 50 10 51 45 10 21 28 10 21 1E 10 21 .). P.QE.!(.!..!
00000050 3A 14 F1 05 13 31 02 10 C1 0E 11 A1 58 15 A0 00 :....1......X...
00000060 15 70 00 13 A0 A9 .p....
Can you see "A.I.N.'.T"?
But I couldn't figure out what the encoding was or how to find the table of song starts. That's when I was ready to look at the earlier stuff and understand how it applied to me. ( Understanding the HOTDOG files on DVD of California electronics , Decoding JBK 6628 DVD Karaoke Disc and Karaoke Huyndai 99 ).
The file DTSMUS00.DKD starts with a bunch of nulls. At 0x200 it starts to kick in with data. This was identified as the start of a "table of tables" i.e. a superblock. Each entry in this superblock is a 4-byte integer, which turns out to be an index to tables in the data files. The superblock is terminated by a sequence of nulls (for me at 0x5F4) and there are fewer than 256 indexes in the table.
The value of these superblock entries seems to have changed in different versions. In the JBK disk and also on mine, the values have to be multiplied by 0x800 to give a "virtual offset" in the data files.
To give meaning to this: on my disk at 0x200 is
00000200 00 00 00 01 00 00 08 6C 00 00 0F C1 00 00 17 7A
00000210 00 00 1E 81 00 00 25 21 00 00 2B 8D 00 00 32 B7
So the table values are 0x1, 0x86C, 0xFC1, 0x177A, ...
The "virtual addresses" are 0x800,
0x436000 (0x86C * 0x800) and so on.
If you go to these addresses, then before the address is a bunch of nulls,
and at that address is data.
I call them virtual addresses because there are 8 data files on my DVD and most addresses are larger than any of the files. The files in my case are all 1065353216 bytes long (except the last). The "obvious" solution works: the file number is address / file size, and the offset into the file is address % file size. You can check this by looking for the nulls before the address of each block.
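The mapping from a superblock value to a file and an offset is simple arithmetic, but it has to be done with long values because the addresses exceed the range of a Java int. A minimal sketch, with hypothetical class and method names, and with the multiplier and file size taken from my disk (other disks may differ):

```java
/** Maps superblock values to a data file number and an offset within it. */
final class VirtualAddress {
    static final long BLOCK_UNIT = 0x800L;     // multiplier for superblock values
    static final long FILE_SIZE = 1065353216L; // size of each data file but the last

    /** Virtual address of the table that a superblock entry points to. */
    static long fromSuperBlockValue(long superBlockValue) {
        return superBlockValue * BLOCK_UNIT;
    }

    /** Which data file the address falls in: 0 means DTSMUS00.DKD. */
    static int fileNumber(long address, long fileSize) {
        return (int) (address / fileSize);
    }

    /** Offset of the address inside that data file. */
    static long offsetInFile(long address, long fileSize) {
        return address % fileSize;
    }
}
```

For the second superblock entry, 0x86C, this gives the virtual address 0x436000, which lies in file 0 at offset 0x436000.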
Each of the tables indexed from the super block is a table of song indexes. Each table contains 4-byte indexes. Each table has at most 0x100 entries, or is terminated by a zero index. Each index is the offset from the table start of the beginning of a song entry.
Given a song number such as 54154 "Here Comes The Sun" we can now find the song entry. Reduce the song number by one to 54153. It is a 16-bit number. The top 8 bits are the index of the song index table in the superblock. The bottom 8 bits are the index of the song entry in the song index table.
Pseudocode:
songNumber = get number for song from DTSMUS20.DKD
superBlockIdx = songNumber >> 8
indexTableIdx = songNumber & 0xFF
seek(DTSMUS00.DKD, 0x200 + 4 * superBlockIdx)
superBlockValue = read 4-byte int from DTSMUS00.DKD
locationIndexTable = superBlockValue * 0x800
fileNumber = locationIndexTable / fileSize
indexTableStart = locationIndexTable % fileSize
seek(fileNumber, indexTableStart + 4 * indexTableIdx)
entryOffset = read 4-byte int from file fileNumber
entryLocation = indexTableStart + entryOffset
seek(fileNumber, entryLocation)
read song entry
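The arithmetic of the lookup can be sketched in Java without touching any files. It follows the descriptions of the superblock and the song index tables given earlier: the superblock starts at 0x200 in DTSMUS00.DKD, entries in both tables are 4 bytes, and an index table entry is an offset from the start of its table. The class SongLocator and its method names are hypothetical, and the constants are the ones from my disk.

```java
/** Arithmetic for finding a song entry from its stored song number. */
final class SongLocator {
    static final long SUPERBLOCK_START = 0x200L; // superblock start in DTSMUS00.DKD
    static final long BLOCK_UNIT = 0x800L;       // multiplier for superblock values
    static final int ENTRY_SIZE = 4;             // bytes per table entry

    /** Top 8 bits of the stored number select the superblock entry. */
    static int superBlockIndex(int storedSongNumber) {
        return (storedSongNumber >> 8) & 0xFF;
    }

    /** Bottom 8 bits select the entry in the song index table. */
    static int indexTableIndex(int storedSongNumber) {
        return storedSongNumber & 0xFF;
    }

    /** File offset in DTSMUS00.DKD of the superblock entry to read. */
    static long superBlockEntryOffset(int storedSongNumber) {
        return SUPERBLOCK_START
             + (long) ENTRY_SIZE * superBlockIndex(storedSongNumber);
    }

    /** Virtual address of the index table entry for this song. */
    static long indexEntryAddress(long superBlockValue, int storedSongNumber) {
        return superBlockValue * BLOCK_UNIT
             + (long) ENTRY_SIZE * indexTableIndex(storedSongNumber);
    }

    /** Virtual address of the song entry, given the offset just read. */
    static long songEntryAddress(long superBlockValue, long entryOffset) {
        return superBlockValue * BLOCK_UNIT + entryOffset;
    }
}
```

For stored number 54153 (0xD389) the superblock index is 0xD3 and the index table index is 0x89, so the superblock entry to read sits at 0x54C in DTSMUS00.DKD. The virtual addresses returned still have to be split into a file number and an offset using the file size.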
Each song entry has a header and is followed by two blocks that I call the information block and the song data block. Each header block has a 2-byte type code and a 2-byte integer length. The type code is either 0x0800 or 0x0000. The code signals the encoding of the song data: 0x0800 is a WMA file while 0x0000 is a Midi file.
If the type code is 0x0 such as the Beatles "Help!" (song number 51765) then the information block has the length in the header block and starts 12 bytes further in. The song data block immediately follows this.
If the type code is 0x8000 then the information block starts 4 bytes in for the length given in the header. The song block starts on the next 16-byte boundary from the end of the information block.
The song block starts with a 4-byte header which is the length of the song data for all types.
If the song type is 0x8000 then the song data is a WMA file. All songs looked at have a singer included in this file.
If the song type is 0x0 then (from the book) there is no singer in the songs looked at. The file is encoded, and decodes to a Midi file.
All files have a lyric block followed by a music block. The lyric block is compressed and it has been discovered that this is LZW compression. This decompresses to a set of 4-byte chunks. The first two bytes are characters of the lyric. For 1-byte encodings such as English or Vietnamese, the first byte is one character and the second is either zero or another character (two bytes such as "\r\n"). For two-byte encodings such as GB-2312, the two bytes form one character.
The next two bytes are the length of time the character string plays for.
Each lyric block starts with strings such as "#0001 @@00@12 @Help Yourself @ @@Tom Jones " The language code is in there as NN in "@00@NN": for English it is "12" and so on. The song title, writer, singer are clear. (Note: these characters are all 4 bytes apart!)
Bytes 0 and 1 of each block are a character in the lyric. Bytes 2 and 3 are the duration of each character. To turn them into Midi data, the durations have to be turned into start/stop of each character.
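Turning durations into start/stop times is a running sum. The sketch below is not the code in SongExtracter.java; the names LyricTiming and TimedText are made up, the durations are left in whatever unit the disk uses, and it assumes the chunks play back to back starting from time zero.

```java
import java.util.ArrayList;
import java.util.List;

/** Converts per-character durations into start and end times. */
final class LyricTiming {

    /** A piece of lyric text with the interval during which it plays. */
    static final class TimedText {
        final String text;
        final long start;
        final long end;

        TimedText(String text, long start, long end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    /**
     * texts[i] is the text of chunk i and durations[i] its duration.
     * Assumption: each chunk starts when the previous one ends.
     */
    static List<TimedText> toTimeline(String[] texts, int[] durations) {
        if (texts.length != durations.length) {
            throw new IllegalArgumentException("texts and durations differ in length");
        }
        List<TimedText> timeline = new ArrayList<>();
        long clock = 0;
        for (int i = 0; i < texts.length; i++) {
            long end = clock + durations[i];
            timeline.add(new TimedText(texts[i], clock, end));
            clock = end;
        }
        return timeline;
    }
}
```

Each TimedText then supplies the tick at which a Midi lyric event for that character would start and the tick at which it would stop.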
My Java program to do this is SongExtracter.java
The Midi files extracted from the disk can be played using standard Midi players such as Timidity. The lyrics are included and the melody line is in Midi channel one. I've written a batch of Java programs using Swing and also the Java Sound framework which can play and do things to Midi files. At the same time as playing Midi files I can also do cool karaoke things like show the lyrics, show the notes that should be played and show progress through the lyrics. I'm still working on those, they will get posted later.
WMA files are "evil." They are based on two Microsoft proprietary formats. The first is the Advanced Systems Format (ASF) file format which describes the "container" for the music data. The second is the codec, Windows Media Audio 9.
The ASF is the primary problem. Microsoft have a published specification. This specification is strongly antagonistic to anything open source. The license states that if you build an implementation based on that specification then you:
Just to make it a little worse, Microsoft have Patent 6041345 "Active stream format for holding multiple media streams" filed on Mar 7, 1997. The patent appears to cover the same ground as many other such formats which were in existence at the time, so the standing of this patent (were it to be challenged) is not clear. However, it has been used to block the GPL-licensed project VirtualDub from supporting ASF. The status of patenting a file format is a little suspect anyway, but may become a little clearer after Oracle wins or loses its claim to patent the Java API.
The FFmpeg project has nevertheless done a clean-room implementation of ASF, reverse-engineering the file format and not using the ASF specification at all. It has also reverse-engineered the WMA codec. This allows players such as mplayer and VLC to play ASF/WMA files. FFmpeg itself can also convert from ASF/WMA to better formats such as Ogg Vorbis.
There is no Java handler for WMA files, and given the license there is unlikely to be one unless it is based on FFmpeg.
The WMA files that I have extracted from the DVD have the following characteristics:
The Songken player plays the right channel if no-one is singing into the microphones, but switches to the left channel (effectively muting the lead singer) as soon as someone sings into a microphone. Simple and effective.
The lyrics are still there in the track data as Midi and can be extracted as before. They can be played by a Midi player. I have no idea (yet) how to synchronise playing the Midi and the WMA files.